Wednesday, November 24, 2010

Sparql - Google maps mashup - 2 markers hand coded

I'd like to have Sparql locations charted on a google map, but I have to take it one baby step at a time.
I put in 2 markers which I hand coded.

Monday, November 22, 2010

Sparql - Google maps mashup - Auto generated map

I'd like to have Sparql locations charted on a google map, but I have to take it one baby step at a time.

This used code generated by The Google Map Wizard. I just wanted to see if I could get a map to appear with a single point on it.

Wednesday, September 15, 2010

Using Named Graphs - Joan Baez and Bob Dylan

Let's use the SPARQL named graphs functionality to process only the dbpedia records for Joan Baez and Bob Dylan. Your query is almost instantaneous because you process only the graphs you specify, rather than searching through all of dbpedia. These two graphs have only a few hundred triples between them.

This use of named graphs is quite similar to Oracle's use of multiple tables in the FROM clause of the select statement. The database may have hundreds or thousands of tables, but you only want information from the specified tables.

SHOW HOW THESE SINGERS ARE CLASSIFIED

The way to name the source graphs is shown in the FROM clauses.

We'll show how the singers are classified by using the skos:subject predicate.

We'll order results so we can see what classification these artists share when we do a visual inspection.

SELECT ?o ?s 
FROM <http://dbpedia.org/resource/Bob_Dylan>
FROM <http://dbpedia.org/resource/Joan_Baez>
WHERE { ?s <http://www.w3.org/2004/02/skos/core#subject> ?o . }
order by ?o ?s

See the results.

GROUPING BY HOW THESE SINGERS ARE CLASSIFIED

In order to have only 1 row per classification, I had to output the classification, ?o, without ?s.

I did process the ?s variable within group functions.

count(distinct ?s) as ?singerCount

shows 2 where both singers have the same classification, and 1 when the classification is unique to either artist.

min(?s) as ?firstSinger
max(?s) as ?lastSinger

shows which artist(s) had the classification. When there is only 1 artist with that classification, their name appears in both positions. It's a little less elegant than I'd like, but it worked.

SELECT ?o 
count(distinct ?s) as ?singerCount 
min(?s) as ?firstSinger 
max(?s) as ?lastSinger
FROM <http://dbpedia.org/resource/Bob_Dylan>
FROM <http://dbpedia.org/resource/Joan_Baez>
WHERE 
{ ?s <http://www.w3.org/2004/02/skos/core#subject> ?o .
}
order by desc(?singerCount) ?o

See the results.
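To make the grouping logic concrete, here is a small Python sketch of what count(distinct ?s), min(?s) and max(?s) do per classification. It is not part of the original queries: the rows are hypothetical stand-ins for the ?s / ?o bindings, and summarize is a helper name I made up for illustration.

```python
from collections import defaultdict

# Hypothetical (singer, classification) pairs standing in for the
# ?s / ?o bindings; illustrative values, not actual dbpedia output.
rows = [
    ("Bob_Dylan", "Category:American_folk_singers"),
    ("Joan_Baez", "Category:American_folk_singers"),
    ("Bob_Dylan", "Category:Harmonica_players"),
    ("Joan_Baez", "Category:Sopranos"),
]

def summarize(rows):
    """Group by classification; return (classification, count, first, last)."""
    singers_by_class = defaultdict(set)
    for singer, classification in rows:
        # A set keeps each singer once, which mirrors COUNT(DISTINCT ?s).
        singers_by_class[classification].add(singer)
    summary = [(c, len(s), min(s), max(s)) for c, s in singers_by_class.items()]
    # Same ordering as the query: count descending, then classification.
    summary.sort(key=lambda r: (-r[1], r[0]))
    return summary

for line in summarize(rows):
    print(line)
```

When a classification belongs to only one singer, min and max return the same name, which is why that name shows up in both positions.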

SHOW ONLY THE CLASSIFICATIONS THEY BOTH SHARE

The final query shows just the classifications that are exactly the same for both artists. This is accomplished by having the same object, ?o, in both graphs.

SELECT ?o
WHERE
{
  GRAPH <http://dbpedia.org/resource/Bob_Dylan>
  {
    ?Dylan <http://www.w3.org/2004/02/skos/core#subject> ?o
  } .
  GRAPH <http://dbpedia.org/resource/Joan_Baez>
  {
    ?Baez <http://www.w3.org/2004/02/skos/core#subject> ?o
  }.
}
order by ?o

See the results.
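Sharing the variable ?o across the two GRAPH blocks behaves like a set intersection. A minimal Python sketch, assuming two hypothetical sets of classifications (one per graph) and a made-up helper called shared:

```python
# Hypothetical classification sets, one per named graph (illustrative only).
dylan = {"Category:American_folk_singers", "Category:Harmonica_players"}
baez = {"Category:American_folk_singers", "Category:Sopranos"}

def shared(a, b):
    """Return the classifications present in both sets, sorted like ORDER BY ?o."""
    return sorted(a & b)

print(shared(dylan, baez))
```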

Friday, August 27, 2010

Counting and Summing - US States and their populations

Let's switch to the Census Bureau's SPARQL test page, http://www.rdfabout.com/demo/census/sparql.xpd, and find out some information about the 50 states of the United States.

It always helps to have the SPARQL engine give some guidance, or some starting queries. Still, the bulk of your SPARQL queries will come from self-discovery of the subjects, predicates and the objects in the database.

Note: After playing a little with the Census Bureau's SPARQL test page, I switched back to the dbpedia Virtuoso browser, http://dbpedia.org/sparql, because it has better support for aggregate functions such as COUNT and SUM, and it is easier to save the queries I am discussing and easier to show you the live results.

ANALYZING THE QUERY ABOUT THE UNITED STATES

Let's look at the query in the query box more closely.

<http://www.rdfabout.com/rdf/usgov/geo/us> is our subject.

?p represents any predicate, or column name, about the US.

?o represents any object or column value.

This is a very common way to initially explore what the data set contains about the subject you are interested in. After you see the values returned, you can begin to restrict the predicates or objects to close in on what you are interested in. This wild card type of querying also helps with copying a workable predicate or object in case your first attempts at removing the PREFIX fail.

FILTER(!isBlank(?o)) . - removes rows whose object is a blank node, since those carry no meaningful information.

FILTER(!regex(str(?o), "zcta")) . removes about 100 different census groups, subdivisions of the United States that are probably only meaningful to the census team.

See the results.

JUST SHOW THE STATES

We'll focus on just the states by specifying the predicate <http://purl.org/dc/terms/hasPart> using the PREFIX representation.

We'll also order by state to make duplicate checking easier.

PREFIX     rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dcterms:  <http://purl.org/dc/terms/>

SELECT * WHERE {
  <http://www.rdfabout.com/rdf/usgov/geo/us>
    dcterms:hasPart
     ?o .
  FILTER(!isBlank(?o)) .
  FILTER(!regex(str(?o), "zcta")) .
}
order by ?o

See the results.

Now we have only states, but do we have all 50 states?

LET'S COUNT THE STATES

Counting in SPARQL is a little bit of a pain with the Virtuoso browser because you have to feed the count function another, duplicate set of data. In this case, I called the objects ?state when counting them.

PREFIX     rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dcterms:  <http://purl.org/dc/terms/>

SELECT ?o, count(distinct ?state) as ?stateCount WHERE 
{{
  <http://www.rdfabout.com/rdf/usgov/geo/us>
    dcterms:hasPart
     ?o .
  FILTER(!isBlank(?o)) .
  FILTER(!regex(str(?o), "zcta")) .
 } UNION {
  <http://www.rdfabout.com/rdf/usgov/geo/us>
    dcterms:hasPart
     ?state .
  FILTER(!isBlank(?state)) .
  FILTER(!regex(str(?state), "zcta")) .
 }
}
order by ?o

See the results.

We got 52 states. Let's add the state name to find the extra records.

WHAT IS THE STATE NAME CALLED?

To find out what the state name is called, we have to explore the state data by getting another set of triples using the clause ?o ?p2 ?o2. This means: for the same object ?o, return all its predicates, identified as ?p2, and their detail objects, identified as ?o2.

We also add the new fields ?p2 and ?o2 to our select list, the variables we want to display.

PREFIX     rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dcterms:  <http://purl.org/dc/terms/>

SELECT ?o, count(distinct ?state) as ?stateCount, ?p2, ?o2 WHERE 
{{
  <http://www.rdfabout.com/rdf/usgov/geo/us>
        dcterms:hasPart
           ?o .
       FILTER(!isBlank(?o)) .
       FILTER(!regex(str(?o), "zcta")) .
          ?o
               ?p2
                  ?o2
              .
          
         }
         UNION 
         {
          <http://www.rdfabout.com/rdf/usgov/geo/us>
             dcterms:hasPart
               ?state .
           FILTER(!isBlank(?state)) .
           FILTER(!regex(str(?state), "zcta")) .
         }
        }
        order by ?o

See the results.

I search all the records for the string Alaska and find it as the object of the predicate, http://purl.org/dc/elements/1.1/title. That is the predicate for state name.

DISPLAY THE STATE NAME

We will break the state name predicate into 2 pieces, a prefix and a local name, and now show just the fields we really want.

PREFIX           rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX       dcterms:  <http://purl.org/dc/terms/>
PREFIX    dcelements:  <http://purl.org/dc/elements/1.1/>

SELECT ?o,?stateName, count(distinct ?state) as ?stateCount WHERE 
{{
  <http://www.rdfabout.com/rdf/usgov/geo/us>
        dcterms:hasPart
           ?o .
       FILTER(!isBlank(?o)) .
       FILTER(!regex(str(?o), "zcta")) .
            ?o
               dcelements:title
                  ?stateName            
             .
         }
         UNION 
         {
          <http://www.rdfabout.com/rdf/usgov/geo/us>
             dcterms:hasPart
               ?state .
           FILTER(!isBlank(?state)) .
           FILTER(!regex(str(?state), "zcta")) .
         }
        }
        order by ?o

See the results.

Now I see why we have 52 records listed -- the count record adds 1 to the tally and the District of Columbia appears as a part of the United States.

DISPLAY THE STATE POPULATION

Going back to the same query that revealed the state name, I gathered that the state population was given by
http://www.rdfabout.com/rdf/schema/census/population.
I broke the population predicate into 2 pieces and now also show the population in the select list.

PREFIX        census: <http://www.rdfabout.com/rdf/schema/census/>
PREFIX    dcelements: <http://purl.org/dc/elements/1.1/>
PREFIX       dcterms: <http://purl.org/dc/terms/>
PREFIX           rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>


SELECT ?o, ?stateName,?population, count(distinct ?state) as ?stateCount WHERE 
{{
  <http://www.rdfabout.com/rdf/usgov/geo/us>
        dcterms:hasPart
           ?o .
       FILTER(!isBlank(?o)) .
       FILTER(!regex(str(?o), "zcta")) .
            ?o
               dcelements:title
                  ?stateName;            
               census:population
                  ?population
              .
          
         }
         UNION 
         {
          <http://www.rdfabout.com/rdf/usgov/geo/us>
             dcterms:hasPart
               ?state .
           FILTER(!isBlank(?state)) .
           FILTER(!regex(str(?state), "zcta")) .
         }
        }
        order by ?o

See the results.

Now I see population figures for each state, but I'm not sure at what time they were recorded. I would need to check them against other sources as well if I were trying to speak authoritatively about population.

SUM THE STATE POPULATIONS TO CALCULATE THE TOTAL US POPULATION

Now in the select list I'll sum up the individual state population figures that were produced in the 2nd unioned subquery, the one I had previously used to compute the state count. This produces a grand total population.

PREFIX        census: <http://www.rdfabout.com/rdf/schema/census/>
PREFIX    dcelements: <http://purl.org/dc/elements/1.1/>
PREFIX       dcterms: <http://purl.org/dc/terms/>
PREFIX           rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>


SELECT ?o, ?stateName, 
?population,sum(?statePopulation) as ?totalPopulation,
count(distinct ?state) as ?stateCount WHERE 
{{
  <http://www.rdfabout.com/rdf/usgov/geo/us>
        dcterms:hasPart
           ?o .
       FILTER(!isBlank(?o)) .
       FILTER(!regex(str(?o), "zcta")) .
            ?o
               dcelements:title
                  ?stateName;            
               census:population
                  ?population
              .
          
         }
         UNION 
         {
          <http://www.rdfabout.com/rdf/usgov/geo/us>
             dcterms:hasPart
               ?state .
                ?state
                 census:population
                   ?statePopulation
               .
            
           FILTER(!isBlank(?state)) .
           FILTER(!regex(str(?state), "zcta")) .
         }
        }
        order by ?o

See the results.

Now I see total population for the US, but at some unspecified period of time.

Prefixes in SPARQL Queries

While still using the dbpedia Virtuoso browser, http://dbpedia.org/sparql, pointing to a single page about Joan Baez, let's explore the use of prefixes to shorten our links.

Prefixes facilitate communication by helping you use standard predicates and objects to describe your data so the user doesn't have to work so hard to understand you.

SUPPLYING THE RIGHT PREDICATE FOR RDF:TYPE

Viewing the source of the Come from the Shadows page, and looking for the string, rdf:type, I find a simpler URI to use, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, which works.

SELECT * WHERE 
{?s 
    <http://dbpedia.org/ontology/artist>
       <http://dbpedia.org/resource/Joan_Baez> ;
    <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
       ?type.
}

See the results.

ABBREVIATE A PREDICATE USING A PREFIX

Instead of using the full URI in the previous example, I can:

1) define a prefix, rdf:,

2) list the URI that the prefix stands for, enclosed in angle brackets, and then

3) use the prefix to express a URI more compactly, rdf:type

Doing that gives us the same results.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT * WHERE 
{?s 
    <http://dbpedia.org/ontology/artist>
       <http://dbpedia.org/resource/Joan_Baez> ;
     rdf:type 
       ?type.
}

See the results.

Using the URI of the prefix, we can go to that web page and learn more about related predicates that might be available in that namespace. These namespaces are important for explaining whether, for example, the chip we are discussing is a computer chip, a potato chip, a paint chip or a cow chip.
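Under the hood, a prefix is just text substitution: the part before the colon is swapped for the namespace URI. Here is a minimal Python sketch of that idea; expand is a hypothetical helper name, and the two namespaces are the rdf: and dbo: ones used in this post.

```python
# Prefix table; both namespace IRIs appear in this post.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dbo": "http://dbpedia.org/ontology/",
}

def expand(prefixed_name, prefixes=PREFIXES):
    """Turn 'rdf:type' into a full IRI by swapping the prefix for its namespace."""
    prefix, _, local = prefixed_name.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    return prefixes[prefix] + local

print(expand("rdf:type"))
print(expand("dbo:artist"))
```

An undeclared prefix raises an error in this sketch, much as a SPARQL engine rejects a query that uses a prefix it has not been told about.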

OTHER COMMON PREFIXES

PREFIX owl: <http://www.w3.org/2002/07/owl#>: Provides metadata about concepts and how they are related to other concepts.
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>: Provides information about how the information is structured and often used to specify the data type of a particular piece of information.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>: Provides metadata about the RDF data such as related concepts, user friendly labels, comments about the information.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>: Provides very fundamental classification of the information.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>: Provides information about people and their inter-relationships (friend of a friend), and how to contact them.
PREFIX dc: <http://purl.org/dc/elements/1.1/>: Provides the Dublin Core items of metadata. It is often used to describe published information and the publisher of that information.
PREFIX : <http://dbpedia.org/resource/>: With this empty prefix, we could then refer to the dbpedia resource <http://dbpedia.org/resource/Joan_Baez> simply as :Joan_Baez
PREFIX dbpedia2: <http://dbpedia.org/property/>: This describes the properties that dbpedia uses to describe things or subjects it has information about. These properties are appropriate to the thing, so a river would have a different set of pre-defined properties than a mathematician.
PREFIX dbpedia: <http://dbpedia.org/>: Provides information coming from dbpedia.
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>: Simple Knowledge Organization System (SKOS) is a common data model for sharing and linking knowledge organization systems via the Semantic Web.
PREFIX dbo: <http://dbpedia.org/ontology/>: Provides metadata about what entities exist, how they can be grouped, how they are related within a hierarchy, and how they can be subdivided according to similarities and differences.

Thursday, August 26, 2010

Dbpedia Reverse Links - is dbpedia-owl:artist of

While still using the dbpedia Virtuoso browser, http://dbpedia.org/sparql, pointing to a single page about Joan Baez, let's explore what it takes to retrieve data from reverse links, in this case is dbpedia-owl:artist of.

When dbpedia shows is somePredicate of, then the subject of the current page, in this case Joan Baez, becomes the object on another page, while the values to the right become the subjects of those other pages. It will become clearer when you examine the results of the following query.

In the Joan Baez Page:

We see the predicate, is dbpedia-owl:artist of to the left, and a number of other dbpedia links listed as the object to the right. I'm showing just the first few entries because Joan Baez is quite a prolific artist.

In the Come From the Shadows Page:

I clicked on the first object link from the Joan Baez page, dbpedia:Come_from_the_Shadows to see what it was.

We see the predicate, dbpedia-owl:artist to the left, and dbpedia:Joan_Baez listed as the object to the right.

JOAN BAEZ AS THE OBJECT INSTEAD OF THE SUBJECT

SELECT * WHERE 
{?s 
    <http://dbpedia.org/ontology/artist>
       <http://dbpedia.org/resource/Joan_Baez>
 .
}

This query asks SPARQL to list all subjects, ?s, where the artist is Joan Baez.

RUN QUERY

See the results.
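A reverse link is the same triple read from the object's side. The Python sketch below shows the lookup with a tiny hypothetical triple list; only Come_from_the_Shadows comes from this post, the other album name is a placeholder, and subjects_of is a helper name I made up.

```python
# Hypothetical triples; only Come_from_the_Shadows is named in this post,
# Some_Other_Album is a made-up placeholder.
triples = [
    ("Come_from_the_Shadows", "artist", "Joan_Baez"),
    ("Some_Other_Album", "artist", "Joan_Baez"),
    ("Come_from_the_Shadows", "type", "Album"),
]

def subjects_of(predicate, obj, triples):
    """Return every subject that has the given predicate and object."""
    return [s for s, p, o in triples if p == predicate and o == obj]

# "is artist of" on the Joan Baez page is this lookup read backwards.
print(subjects_of("artist", "Joan_Baez", triples))
```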

WHAT DID WE RETRIEVE?

It looks like these are her albums, but let's make sure. Let's get the rdf:type of each subject. I copied the link for rdf:type as I have done all the other links so far. This time it caused some problems.

SELECT * WHERE 
{?s 
    <http://dbpedia.org/ontology/artist>
       <http://dbpedia.org/resource/Joan_Baez>;
    <http://dbpedia.org/about/html/http://www.w3.org/1999/02/22-rdf-syntax-ns%01type>
       ?type.
}

RUN QUERY

See the results.

The dreaded blank column headers don't tell me what's wrong, but the long predicate picked up from the link rdf:type looks suspicious. <http://dbpedia.org/about/html/http://www.w3.org/1999/02/22-rdf-syntax-ns%01type> is very long and different from our other links that worked.

SUPPLYING THE RIGHT PREDICATE FOR RDF:TYPE

Viewing the source of the Come from the Shadows page, and looking for the string, rdf:type, I find a simpler URI to use, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, which works.

SELECT * WHERE 
{?s 
    <http://dbpedia.org/ontology/artist>
       <http://dbpedia.org/resource/Joan_Baez> ;
    <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
       ?type.
}

See the results.

IT WORKED BUT EACH ALBUM IS REPEATED MULTIPLE TIMES

The last set of results lists Come from the Shadows 4 times because its page gave it 4 different types: an owl thing, a MusicalWork, a Work and an Album. Most of the albums appear multiple times as well.

LIST EACH ALBUM ONCE

I looked through the Come from the Shadows page to find a description of what it was that was a single entry. I found the predicate, http://dbpedia.org/ontology/type. Let's try that as our predicate and sort by subject to see if we did indeed list each album only once.

SELECT * WHERE 
{?s 
    <http://dbpedia.org/ontology/artist>
       <http://dbpedia.org/resource/Joan_Baez> ;
    <http://dbpedia.org/ontology/type>
       ?type.
}
order by ?s

See the results.

Yes, mission accomplished! We have each album listed only once, and dbpedia has classified the albums as Album, Compilation_album, Live_album or a Studio_album.

Wednesday, August 25, 2010

Processing Dates - When was Joan Baez born

While still using the dbpedia Virtuoso browser, http://dbpedia.org/sparql, pointing to a single page about Joan Baez, let's explore what it takes to retrieve date information.

WHEN SHE WAS BORN AND WHEN SHE BECAME ACTIVE

SELECT * WHERE 
{?s 
    <http://dbpedia.org/ontology/birthDate>
      ?birthDate;
    <http://dbpedia.org/ontology/activeYearsStartYear>
      ?firstActive.
}

The subject ?s will point back to this page.

The first predicate was the dbpedia-owl:birthDate link copied from the page.

The first object ?birthDate will be the info returned from the query.

It is followed by a semicolon instead of a period meaning repeat the same subject for the next predicate - object pair. I use indentation to remind myself that I am seeking info about the same subject, but the indentation is not significant to the SPARQL engine.

The second predicate was the dbpedia-owl:activeYearsStartYear link copied from the page.

The second object, ?firstActive, will be when she became active musically, presumably when she appeared on the world's stage, rather than the first time she sang or played a guitar.

RUN QUERY

See the results.

FILTERING DATES

Your first attempt to filter by a date in the form of yyyy-mm-dd will fail - giving you the dreaded empty column headers feedback.

SELECT * WHERE 
{?s 
    <http://dbpedia.org/ontology/birthDate>
      ?birthdate;
    <http://dbpedia.org/ontology/activeYearsStartYear>
      ?firstActive.
 FILTER(?birthdate > "1940-01-01").
}

RUN QUERY

See the results.

FILTERING DATES USING DATETIME TYPE

Looking at the SPARQL manual on 11 Testing Values you see that you can't match a dateTime against a simple string, but rather you must match against a dateTime string as indicated by the 2 circumflex characters ^^ then the prefix xsd: and finally the datatype, dateTime.

By matching a dateTime to a dateTime, you will now see your results.

SELECT * WHERE 
{?s 
    <http://dbpedia.org/ontology/birthDate>
      ?birthdate;
    <http://dbpedia.org/ontology/activeYearsStartYear>
      ?firstActive.
 FILTER(?birthdate > "1940-01-01"^^xsd:dateTime).
}

RUN QUERY

See the results.

COMPARING STRINGS TO STRINGS

The same results would be obtained with the following filter condition:

FILTER(str(?birthdate) > "1940-01-01"). because you are first converting the ?birthdate to a string before comparing it to another string.

The str function is also handy for changing URIs (or web identifiers) into strings for simpler filtering.
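The string comparison agrees with the date comparison only because zero-padded yyyy-mm-dd values sort alphabetically in the same order as they sort on the calendar. A quick Python sketch of both approaches, using illustrative dates rather than query output; the two function names are hypothetical.

```python
from datetime import date

def born_after_as_string(birthdate, cutoff):
    """Compare the text forms, as str(?birthdate) > "1940-01-01" does."""
    return birthdate > cutoff

def born_after_as_date(birthdate, cutoff):
    """Parse both values first, then compare them as calendar dates."""
    return date.fromisoformat(birthdate) > date.fromisoformat(cutoff)

# Illustrative values only, not query output.
for value in ["1939-12-31", "1940-01-01", "1941-01-09"]:
    print(value,
          born_after_as_string(value, "1940-01-01"),
          born_after_as_date(value, "1940-01-01"))
```

Dates written without zero padding, or in another order such as dd-mm-yyyy, would break the string version, so the typed comparison is the safer habit.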

Tuesday, August 24, 2010

Retrieving Other Info From a Dbpedia Page

While still using the dbpedia Virtuoso browser, http://dbpedia.org/sparql, pointing to a single page about Joan Baez, let's explore what it takes to retrieve different types of data.

DISPLAYING THE ABSTRACT

The subject ?s will point back to this page.

The predicate was the dbpedia-owl:abstract link copied from the page.

The object ?o will be the abstract returned from the query.

RUN QUERY

See the results.

GET ONLY THE ENGLISH ABSTRACTS

The last query returned abstracts in many different languages. To return just the English version, we modify the Sparql query as follows:

select distinct * 
where {?s <http://dbpedia.org/ontology/abstract> ?o .
 FILTER(langMatches(lang(?o),"en")).
}

The filter condition is how we specify the language tag associated with English,
abbreviated "en". The built-in function lang returns the language tag of a value, and langMatches tests that tag against the language range we supply.

RUN QUERY

See the results.

GET ONLY THE ENGLISH ABSTRACTS OR ABSTRACTS AS SIMPLE STRINGS

Sometimes you might have no language tag at all, just a simple text string. To pick that up, you need the UNION operator to handle this possibility.

Note: The clause describing the abstracts with the @en label, for English, is enclosed in braces {} followed by the keyword UNION, and then braces must enclose the clause for label-less abstracts.

select distinct * 
where 
{
{
  ?s <http://dbpedia.org/ontology/abstract> ?o .
  FILTER(langMatches(lang(?o),"en")).
 
 }
 UNION
 {
  ?s <http://dbpedia.org/ontology/abstract> ?o .
  FILTER(!langMatches(lang(?o),"*")).
 }
}

The second filter condition is how we specify a string without a language tag, or rather, it does not match the object that has a language value of *, meaning any language.

In this case it doesn't make a difference because all abstracts include a language tag.

RUN QUERY

See the results.
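To see why !langMatches(lang(?o),"*") picks up untagged strings, here is a simplified Python sketch of the matching rule as I understand it: a range matches a tag when they are equal or when the tag continues with a hyphen, and "*" matches any tag that is not empty. The function name and the sample data are hypothetical.

```python
def lang_matches(tag, language_range):
    """Simplified basic language-range matching, case-insensitive."""
    tag, language_range = tag.lower(), language_range.lower()
    if language_range == "*":
        return tag != ""  # "*" matches any tag, but not a missing one
    return tag == language_range or tag.startswith(language_range + "-")

# Hypothetical (text, language tag) pairs; "" stands for a plain string.
abstracts = [
    ("text-en", "en"),
    ("text-fr", "fr"),
    ("text-plain", ""),
    ("text-en-us", "en-US"),
]

english = [a for a in abstracts if lang_matches(a[1], "en")]
untagged = [a for a in abstracts if not lang_matches(a[1], "*")]
print(english + untagged)
```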

Monday, August 23, 2010

Querying a Single DBPedia Page - for Joan Baez

Still using the dbpedia Virtuoso browser, http://dbpedia.org/sparql, I applied the last query to a single dbpedia page by specifying the default graph uri as http://dbpedia.org/resource/Joan_Baez. This will show the various ways dbpedia classifies Joan Baez.

Virtuoso SPARQL Query Form

RUN QUERY

See the results.

COMPARE TO THE DBPEDIA PAGE FOR JOAN BAEZ

When starting with SPARQL, I often pored over the dbpedia resource file in order to compare it with my SPARQL output to see if I got the SPARQL code right. These results match the entries under rdf:type (the predicate which was abbreviated as a).

Note: I did not specify the Joan Baez human readable page, http://dbpedia.org/page/Joan_Baez, in the Default Graph URI.

Instead, I specified the RDF page, http://dbpedia.org/resource/Joan_Baez, which is designed for SPARQL querying. However, if you point your browser to the resource page, dbpedia will show you the human readable page. The Sparql query engine, however, can go straight to the underlying RDF.

Tuesday, August 17, 2010

An Online SPARQL Query Form

I must apologize to my reader(s) about hogging all the Sparql fun. Here I've been querying and revising queries and not even given you a bit of help on running your own Sparql queries. Forgive me.

Virtuoso SPARQL Query Form

The link for the website is http://dbpedia.org/sparql

I like it for querying dbpedia because of its immediacy. I put in the Sparql query and get out my answer, nothing to install on my machine, sweet and simple SPARQL power over the web.

Default Graph URI

The default graph textbox allows you to tell SPARQL what information source to query when you do not explicitly name a source in your query. The Virtuoso browser prefills the field with http://dbpedia.org meaning query all of dbpedia.

THE QUERY TEXT

The query text box is where you enter your SPARQL query. I generally open a second window pointing to the Virtuoso query, then I can copy the SPARQL queries from this blog or any other SPARQL learning resource, put it in this box, and run it. That way I can not only see the queries in action, but can tweak them and explore a little further on my own.

select distinct ?Concept where { [] a ?Concept }

This says: Show me the distinct Concepts in http://dbpedia.org.

The symbols [] indicate I don't care what the subject is.

The predicate a is shorthand for rdf:type. I translate it as "is a" .

The object is called a ?Concept.

DISPLAY RESULTS AS

The Virtuoso browser provides many different output formats. I use HTML most of the time. If you write a query that outputs only a few columns and a few records, then running the same query in each of the different formats is a very clear way to appreciate the differences between each output format.

RIGOROUS CHECK OF THE QUERY

The rigorous check checkbox probably provides more feedback about less desirable SPARQL, but unchecking it won't help you run a query with fatal errors. I personally find the SPARQL compiler's error feedback less helpful than the feedback from Oracle's sql compiler, but Oracle has been around a lot longer than SPARQL.

EXECUTION TIMEOUT IN MILLISECONDS

I haven't used it.

RUN QUERY

When you click the Run Query button you can see the output of your query in the format you specified, in this example, HTML. Such a raw dump of all the distinct concepts in dbpedia shows an enormous number of concepts that appear unintelligible. Farther down the list you see concepts that make sense even at a casual glance.

See the results.

Monday, August 16, 2010

All Simpson's Blackboard gags - Filtered and Ordered

FILTER AND ORDER THE RESULTS

There are extra rows showing because a season sometimes has more than 1 label assigned to it. We now want to FILTER the rows to just those containing The Simpsons episodes in the season title.

THE SPARQL QUERY - FILTERED AND ORDERED

SELECT distinct ?season_title,?episode,?chalkboard_gag
WHERE 
{
 ?episode
   <http://www.w3.org/2004/02/skos/core#subject>
     ?season
 .
 ?episode
   <http://dbpedia.org/property/blackboard>
     ?chalkboard_gag
 .
     ?season
       <http://www.w3.org/2000/01/rdf-schema#label>
         ?season_title
     .
    FILTER(regex(?season_title,"The Simpsons episodes","i")).
}
ORDER BY ?season_title

Show results

FILTER WITH REGULAR EXPRESSIONS

13)FILTER(regex(?season_title,"The Simpsons episodes","i")).

Line 13) FILTER lets us restrict the rows in our output to only those that make our expression true. These are similar to Oracle's where clauses. In this example we used the regex function in a very simple way to search ?season_title for the string, The Simpsons episodes, ignoring any variations in uppercase or lower case letters.

ORDER THE RESULTS

15) ORDER BY ?season_title

Line 15) We would like to order by season title, so we see the gags over time. This command orders the title as a string, at least keeping the seasons together, but putting season 1 and season 11 before season 2.
We could work harder to isolate the number at the end of the season and sort on its
numeric representation, but let's not.
Some episodes have an airdate, the first time the episode was shown, but the data item is too sparse to order all our results. Let's move on.
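For the curious, here is roughly what sorting on the numeric part would look like outside SPARQL. This is a Python sketch on made-up titles modeled on the season labels, with a hypothetical helper, season_number, that pulls the trailing number out of the title.

```python
import re

def season_number(title):
    """Return the trailing integer of a title, or 0 when there is none."""
    match = re.search(r"(\d+)\s*$", title)
    return int(match.group(1)) if match else 0

# Titles modeled on the season labels discussed above (illustrative values).
titles = [
    "The Simpsons episodes, season 2",
    "The Simpsons episodes, season 11",
    "The Simpsons episodes, season 1",
]

print(sorted(titles))                     # string order: 1, 11, 2
print(sorted(titles, key=season_number))  # numeric order: 1, 2, 11
```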

All Simpson's Blackboard gags - With Season Title

DISPLAY THE SEASON LABEL

Looking at the dbpedia page The Simpsons episodes Season 12, we see it has a user-friendly label, The Simpson episodes, season 12.

THE QUERY DIAGRAM

THE SPARQL QUERY - WITH THE SEASON TITLE

SELECT distinct ?season_title,?episode,?chalkboard_gag
WHERE 
{
 ?episode
   <http://www.w3.org/2004/02/skos/core#subject>
     ?season
 .
 ?episode
   <http://dbpedia.org/property/blackboard>
     ?chalkboard_gag
 .
   ?season
       <http://www.w3.org/2000/01/rdf-schema#label>
         ?season_title
     .
}

Show results

PICK UP THE SEASON TITLE

12)?season
13) <http://www.w3.org/2000/01/rdf-schema#label>
14) ?season_title
15).

Line 12) For each season
Line 13) return its user-friendly label
Line 14) and call it ?season_title.

Friday, July 30, 2010

All Simpson's Blackboard gags

Let's expand our second query and get all the blackboard gags from all the Simpson seasons and episodes.

THE QUERY DIAGRAM

THE SPARQL QUERY - ANY EPISODE ANY SEASON WITH A BLACKBOARD GAG

SELECT distinct ?episode,?chalkboard_gag
WHERE 
{
 ?episode
   <http://www.w3.org/2004/02/skos/core#subject>
     ?season
 .
 ?episode
   <http://dbpedia.org/property/blackboard>
     ?chalkboard_gag
 .
}

Show results.

GET ANY EPISODE - ANY SEASON

4) ?episode  
5)  <http://www.w3.org/2004/02/skos/core#subject>  
6)   ?season  
7) .

Note: I'll only discuss the newest items of interest instead of endlessly repeating that the select clause lists the variables you want to see.

Line 4) The subject we are talking about.
Line 5) We already learned that the skos:subject referred to the season this episode belonged to.
Line 6) Instead of limiting our results to The Simpsons season 12, we will put in ?season, meaning any season.
Line 7) The period marks the end of the first selection clause.
Recapping, lines 4-7 ask the SPARQL engine to search all of dbpedia to find the page(s) that I will call ?episode that have a skos:subject value of anything. We will call that anything a season. We will have to count on the 2nd clause to restrict our results to ?episodes that have a blackboard variable for the blackboard gag.

LET'S CHECK OUR RESULTS

The results look like Simpson blackboard gags, but let's make sure.

Thursday, July 29, 2010

The Simpson's Blackboard gags for Season 12

Let's expand your first query and get all the blackboard gags from the 12th Simpson season.

STARTING FROM ONE EPISODE

We'll start with the first sparql query's dbpedia page, the dbpedia version of The Worst Episode.
Here are the dbpedia page's sections identifying the episode, the blackboard gag for this episode, and lastly, the season this episode is part of. We will use this information for our query.

Note: click any picture to enlarge it.

THE QUERY DIAGRAM

THE SPARQL QUERY - SEASON 12 EPISODES WITH BLACKBOARD GAGS

SELECT distinct ?episode,?chalkboard_gag
WHERE 
{
 ?episode
   <http://www.w3.org/2004/02/skos/core#subject>
     <http://dbpedia.org/resource/Category:The_Simpsons_episodes%2C_season_12>
 .
 ?episode
   <http://dbpedia.org/property/blackboard>
     ?chalkboard_gag
 .
}

Show results.

SPECIFY YOUR OUTPUT VARIABLES

1) SELECT distinct ?episode,?chalkboard_gag

The select clause specifies what fields you want in your output.
The distinct keyword removes duplicate values of ?episode - ?chalkboard_gag combinations.
I specified the same two variables, ?episode and ?chalkboard_gag.

START OF YOUR SELECTION CRITERIA

2) WHERE { . . . } Enough said.

IDENTIFY THE EPISODES IN SEASON 12

4) ?episode  the subject
5)  <http://www.w3.org/2004/02/skos/core#subject>  the predicate
6)   <http://dbpedia.org/resource/Category:The_Simpsons_episodes%2C_season_12>  the object
7) .

Line 4) The subject we are talking about.
Line 5) At first sight, skos:subject didn't mean much to me, but when I saw its value referred to the Simpson's season 12, I figured it had to describe the season that this episode belonged to. I copied the link's value into my sparql query.
Line 6) For the object, I picked up the value of the long link to the right of skos:subject.
Line 7) The period marks the end of the first selection clause.
Recapping, lines 4-7 ask the SPARQL engine to search all of dbpedia to find the page(s) that I will call ?episode that have a skos:subject value of http://dbpedia.org/resource/Category:The_Simpsons_episodes%2C_season_12.

GET THE BLACKBOARD GAG


8) ?episode  the subject 
9)  <http://dbpedia.org/property/blackboard>  the predicate 
10)   ?chalkboard_gag  the object 
11) .

Line 8) By using the same variable, ?episode as subject in lines 4 and 8, I am telling SPARQL that I want to see the blackboard gag for the episode defined above.
Line 9) I copied the link, dbprop:blackboard, from the dbpedia page as mentioned before.
Line 10) I specified the object as a variable called ?chalkboard_gag, meaning pick it up from the page (or really from the RDF data underlying the page).
Because ?episode now matches every season 12 episode, this will show a different blackboard gag for each episode that has one.
Line 11) End of the 2nd search clause.
Line 12) The closing bracket ends your selection criteria.
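If it helps to see the matching spelled out, here is a minimal Python sketch of how the two clauses work together. It is not how dbpedia's engine is implemented. The episode names, the gags, and the helper functions match and query are all made up for illustration, and the short names like skos:subject stand in for the full links.

```python
# Hypothetical sample data: a tiny stand-in for dbpedia, not real query results.
SUBJECT = "skos:subject"
BLACKBOARD = "dbprop:blackboard"
SEASON_12 = "Category:The_Simpsons_episodes,_season_12"

triples = [
    ("Episode_A", SUBJECT, SEASON_12),
    ("Episode_A", BLACKBOARD, "gag A"),
    ("Episode_B", SUBJECT, SEASON_12),           # season 12, but no gag
    ("Episode_C", SUBJECT, "Category:Some_other_season"),
    ("Episode_C", BLACKBOARD, "gag C"),          # has a gag, wrong season
]


def match(pattern, binding):
    """Yield copies of `binding` extended so that `pattern` matches a triple."""
    for triple in triples:
        new = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable: bind it, or check it against its earlier value.
                if new.setdefault(term, value) != value:
                    break
            elif term != value:
                # A fixed link or literal must match exactly.
                break
        else:
            yield new


def query(patterns):
    """Apply each clause in turn, keeping only bindings that satisfy all of them."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match(pattern, b)]
    return bindings


rows = query([
    ("?episode", SUBJECT, SEASON_12),             # lines 4-7
    ("?episode", BLACKBOARD, "?chalkboard_gag"),  # lines 8-11
])
print(rows)
```

Only Episode_A survives, because it is the only subject that satisfies both clauses with the same value of ?episode.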

LET'S CHECK OUR RESULTS

So that's your second query. Let's compare our results with the object link to the Simpsons episodes for season 12.

We have a little problem. Our results show 7 gags and season 12 had 21 episodes. What gives?
Not all the episodes had blackboard gags, so let's ask for each episode in season 12 and then optionally show the blackboard gag if it exists.

SPARQL QUERY WITH THE BLACKBOARD GAG OPTIONAL

SELECT distinct ?episode,?chalkboard_gag
WHERE 
{
 ?episode
   <http://www.w3.org/2004/02/skos/core#subject>
     <http://dbpedia.org/resource/Category:The_Simpsons_episodes%2C_season_12>
 .
 OPTIONAL
 { 
 ?episode
   <http://dbpedia.org/property/blackboard>
     ?chalkboard_gag
 .
 } # end of optional blackboard gag
} # end of where clause

Show results
Now there are 21 episodes and still the same 7 blackboard gags. That clears up the mystery.
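Here is a rough Python sketch of the difference OPTIONAL makes. The episode names and gags are made-up sample data, and I assume at most one gag per episode to keep it short.

```python
# Hypothetical sample data, not real dbpedia results.
season_12_episodes = ["Episode_A", "Episode_B", "Episode_C"]
blackboard_gags = {"Episode_A": "gag A"}  # only some episodes have a gag

# Without OPTIONAL: both clauses must match, so gag-less episodes drop out.
required = [
    (episode, blackboard_gags[episode])
    for episode in season_12_episodes
    if episode in blackboard_gags
]

# With OPTIONAL: every episode is kept; a missing gag stays unbound (None here).
optional = [
    (episode, blackboard_gags.get(episode))
    for episode in season_12_episodes
]

print(required)
print(optional)
```

People coming from Oracle can think of the first list as an inner join and the second as a left outer join.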

Thursday, July 22, 2010

Your first Sparql Query

Let's jump right in and make your first query. We will get the blackboard gag from a single Simpsons episode, Worst Episode Ever.

FIND THE RDF DATA TO QUERY

I searched Google for "simpson episodes complete list".
I found a page with a fairly comprehensive list of seasons and selected season 12.
From the page for season 12, I picked a catchy title, Worst Episode Ever. This returned a Wikipedia article about the individual episode I am interested in, but the wiki page is designed for people to read and enjoy.
I wanted the dbpedia version of the page to make my Sparql querying simpler, so I queried Google for dbpedia + "Worst Episode Ever" and found the dbpedia version of this page.
Here is the dbpedia page's abstract, the section identifying the episode and the blackboard gag for this episode. We will use this information for our query.

Note: click any picture to enlarge it.

THE QUERY DIAGRAM

THE SPARQL QUERY - 1 EPISODE'S BLACKBOARD GAG

SELECT distinct ?episode,?chalkboard_gag
WHERE 
{
 ?episode
   <http://xmlns.com/foaf/0.1/page>
     <http://en.wikipedia.org/wiki/Worst_Episode_Ever>
 .
 ?episode
   <http://dbpedia.org/property/blackboard>
     ?chalkboard_gag
 .
}

Show results.

SPECIFY YOUR OUTPUT VARIABLES

1) SELECT distinct ?episode,?chalkboard_gag

The select clause specifies what fields you want in your output.
The distinct keyword removes duplicate values of ?episode - ?chalkboard_gag combinations. I usually use distinct to get the output as compact as possible in terms of the number of rows returned.
Variables are indicated by a question mark before an identifier that you specify. My two variables are ?episode and ?chalkboard_gag, although I could have called them something else.

START OF YOUR SELECTION CRITERIA

2) WHERE { . . . }

Within the brackets is the specification of which rows out of the potentially billions of rows in the RDF datastore you want to see at this moment.

IDENTIFY THE EPISODE

4) ?episode  the subject
5)  <http://xmlns.com/foaf/0.1/page>  the predicate
6)   <http://en.wikipedia.org/wiki/Worst_Episode_Ever>  the object
7) .

Line 4) As in English, the subject must be established so we know what we are talking about. We make the subject a variable by using a leading question mark.
Line 5) If you examine the dbpedia page, there are several ways to identify this episode, something akin to Oracle's primary key. I chose to identify the record by its foaf:page.

The link foaf:page is my predicate and I need to capture its web address.
In Firefox I simply right click the link and select Copy link location.
In Internet Explorer, I right click the link, select properties, and then copy the link from the address (URL) field.
This gives me http://xmlns.com/foaf/0.1/page. I enclose it in angled brackets, making it <http://xmlns.com/foaf/0.1/page>. Now I have a sparql predicate.
Line 6) To pick up the object, or value of foaf:page, I do the same to the link on the right. Enclosing it in angled brackets gives me:

<http://en.wikipedia.org/wiki/Worst_Episode_Ever>
Line 7) The period marks the end of the first selection clause. It doesn't have to be on a separate line, but I like to use it on its own line to better separate each clause, something important to me as the complexity of queries grows.
Recapping, lines 4-7 ask the SPARQL engine to search all of dbpedia to find the page(s) that I will call ?episode that have a foaf:page value of http://en.wikipedia.org/wiki/Worst_Episode_Ever.

GET THE BLACKBOARD GAG


8) ?episode  the subject 
9)  <http://dbpedia.org/property/blackboard>  the predicate 
10)   ?chalkboard_gag  the object 
11) .

Line 8) By using the same variable, ?episode as subject in lines 4 and 8, I am telling SPARQL that I want to see the blackboard gag for the episode defined above.
Line 9) I copied the link, dbprop:blackboard, from the dbpedia page as mentioned before.
Line 10) I specified the object as a variable called ?chalkboard_gag, meaning pick it up from the page (or really from the RDF data underlying the page).
This will show: "I will not hide the teacher's medicine."
Line 11) End of the 2nd search clause.

Line 12) The closing bracket ends your selection criteria.
So that's your first query. Complicated at first, but as you do more and more, it will become second nature to you.
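If you want to run the query from a program rather than a web form, one option is to pass the query text to the endpoint as a URL parameter. The sketch below only builds that URL and does not contact any server. I am assuming dbpedia's public endpoint address, and the format parameter is my assumption about what that endpoint accepts, so treat both as things to check. The function name build_request_url is made up for this example.

```python
from urllib.parse import urlencode

# Assumption: the address of dbpedia's public SPARQL endpoint.
ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """
SELECT distinct ?episode,?chalkboard_gag
WHERE
{
 ?episode
   <http://xmlns.com/foaf/0.1/page>
     <http://en.wikipedia.org/wiki/Worst_Episode_Ever>
 .
 ?episode
   <http://dbpedia.org/property/blackboard>
     ?chalkboard_gag
 .
}
"""


def build_request_url(endpoint, query):
    """Return a GET URL that carries the query text in the `query` parameter."""
    params = {
        "query": query.strip(),
        # Assumption: the endpoint honours a `format` parameter for JSON results.
        "format": "application/sparql-results+json",
    }
    # urlencode escapes spaces, question marks, angle brackets and newlines.
    return endpoint + "?" + urlencode(params)


url = build_request_url(ENDPOINT, QUERY)
print(url)
```

Pasting the printed URL into a browser should behave much like a Show results link, provided the endpoint and the data are still as described here.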

Wednesday, July 7, 2010

What is the RDF data format

RDF stands for Resource Description Framework. It is a way of representing information so that related facts can be easily combined.
For the technically inclined, see the RDF Primer.

THE RELATIONAL APPROACH

In a typical database, you have tables that collect related information about a particular thing, e.g. an employee. The table has columns such as id, name, age, etc. and each of those columns has values and usually a type. The id might be a 4 digit number, the name could be a 40 character string, and the age a 3 digit number.

A RELATIONAL EMPLOYEE TABLE

Id	Name	Age
1	Bill Townsend	47
2	Mary Maxwell	33

The number of columns in a table varies from 1 to several hundred.
In this employee table, 1 row of the table stores all the various items about a single employee.

THE RDF APPROACH

Right now I am oversimplifying, but in the RDF model, the data would be stored something like this.

EMPLOYEE INFORMATION IN RDF FORM

?subject	?predicate	?object
<http://www.BillTownsend.com/me>	name	"Bill Townsend"
<http://www.BillTownsend.com/me>	age	47
<http://www.bestandbrightest.com/mmaxwell>	name	"Mary Maxwell"
<http://www.bestandbrightest.com/mmaxwell>	age	33

The basic rdf "table" structure will always have these same 3 columns.
In the RDF model, 1 row stores the subject and 1 item of information about that subject. It will take many rows, or triples (subject, predicate, object combinations), to fully describe the subject.

The subject is a web address that the whole world could use to uniquely identify this person. Bill might have created that web page for his resume, but now that web address could be used by anyone to record information about him.

The predicates are comparable to the relational database world's column names. It's best to use predicates that are already known to the RDF/SPARQL community so people know what you are talking about.

The objects correspond to the database column values. They could be literals or other web addresses that become the subject of additional predicates and objects.
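To make the comparison concrete, here is a small Python sketch that turns relational-style rows into triples. The rows and web addresses come from the tables above; the dictionary layout and the function name row_to_triples are my own choices for illustration, and the numeric Id column is replaced by the web address that identifies each person.

```python
# One dictionary per employee row; "subject" holds the identifying web address.
employees = [
    {"subject": "http://www.BillTownsend.com/me",
     "name": "Bill Townsend", "age": 47},
    {"subject": "http://www.bestandbrightest.com/mmaxwell",
     "name": "Mary Maxwell", "age": 33},
]


def row_to_triples(row):
    """Split one wide row into one (subject, predicate, object) triple per column."""
    subject = row["subject"]
    return [
        (subject, column, value)
        for column, value in row.items()
        if column != "subject"
    ]


triples = [triple for row in employees for triple in row_to_triples(row)]
for triple in triples:
    print(triple)
```

Two rows with two columns each become four triples, which matches the RDF table above.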

Graphical Representation of RDF Data

In my time grappling with Sparql and RDF, I've found it most helpful to plan my queries visually, and even to analyze queries visually. I will describe the convention I will use in this blog.

The subjects for my little diagrams will be rounded, because they must be web addresses.

The predicates will be labels on the arrows.

The objects will be square if they are literals and will be rounded if they are a web address, which can have other information hanging off it.
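I draw my diagrams by hand, but the same convention can be sketched in code. The Python below writes Graphviz DOT text, using ellipses for the rounded shapes and boxes for the square ones. The function name to_dot and the knows predicate are made up for this example, and the sketch assumes values contain no double quotes.

```python
def to_dot(triples):
    """Return DOT text: web addresses as ellipses, literals as boxes."""
    lines = ["digraph rdf {"]
    for subject, predicate, obj in triples:
        # Subjects are always web addresses, so they are always rounded.
        lines.append(f'  "{subject}" [shape=ellipse];')
        # Objects are rounded only when they are web addresses themselves.
        shape = "ellipse" if str(obj).startswith("http://") else "box"
        lines.append(f'  "{obj}" [shape={shape}];')
        # The predicate becomes the label on the arrow.
        lines.append(f'  "{subject}" -> "{obj}" [label="{predicate}"];')
    lines.append("}")
    return "\n".join(lines)


sample = [
    ("http://www.BillTownsend.com/me", "name", "Bill Townsend"),
    ("http://www.BillTownsend.com/me", "knows",
     "http://www.bestandbrightest.com/mmaxwell"),  # hypothetical predicate
]
dot = to_dot(sample)
print(dot)
```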

Thursday, July 1, 2010

What is the semantic web

The semantic web consists of web pages that are organized so that they not only make sense to people reading them, but allow software to easily pull information out of them.

I give thanks to the weblog of Bob DuCharme which showed how much fun Sparql could be. He showed a query used to get all the blackboard gags that Bart wrote on the blackboard during season 12. My first queries brought me a lot of laughs as I began to retrieve joke after joke without visiting each and every web page describing each episode.

I deal with government fishery databases every work day. I never realized that various details about each episode of The Simpsons were being methodically recorded in Wikipedia, and that these descriptions of each episode could be processed as if they were records in a database. This is like the metadata that computer programmers are supposed to write to document their programs.

What is Linked Data from Internet pioneer Sir Tim Berners-Lee

Here is Sir Tim Berners-Lee on this at TED:

Michael Hausenblas's introductions: