Wednesday, November 24, 2010

Sparql - Google maps mashup - 2 markers hand coded

I'd like to have Sparql locations charted on a Google map, but I have to take it one baby step at a time.
I put in 2 markers which I hand coded.


Monday, November 22, 2010

Sparql - Google maps mashup - Auto generated map

I'd like to have Sparql locations charted on a Google map, but I have to take it one baby step at a time.

This used code generated by The Google Map Wizard. I just wanted to see if I could get a map to appear with a single point on it.

Wednesday, September 15, 2010

Using Named Graphs - Joan Baez and Bob Dylan

Let's use the SPARQL named graphs functionality to process only the dbpedia records for Joan Baez and Bob Dylan. The query is almost instantaneous because it processes only the graphs you specify, rather than searching through all of dbpedia. These two graphs have only a few hundred triples between them.

This use of named graphs is quite similar to Oracle's use of multiple tables in the FROM clause of the select statement. The database may have hundreds or thousands of tables, but you only want information from the specified tables.

SHOW HOW THESE SINGERS ARE CLASSIFIED

The source graphs are named in the two FROM clauses of the query.

We'll show how the singers are classified by using the skos:subject predicate.

We'll order results so we can see what classification these artists share when we do a visual inspection.

SELECT ?o ?s 
FROM <http://dbpedia.org/resource/Bob_Dylan>
FROM <http://dbpedia.org/resource/Joan_Baez>
WHERE { ?s <http://www.w3.org/2004/02/skos/core#subject> ?o . }
order by ?o ?s

See the results.

GROUPING BY HOW THESE SINGERS ARE CLASSIFIED

In order to have only 1 row per classification, I had to output the classification, ?o, without ?s.

I did process the ?s variable within group functions.

count(distinct ?s) as ?singerCount

shows 2 where both singers have the same classification, and 1 when the classification is unique to either artist.

min(?s) as ?firstSinger
max(?s) as ?lastSinger

shows which artist(s) had the classification. When there is only 1 artist with that classification, their name appears in both positions. It's a little less elegant than I'd like, but it worked.

SELECT ?o 
count(distinct ?s) as ?singerCount 
min(?s) as ?firstSinger 
max(?s) as ?lastSinger
FROM <http://dbpedia.org/resource/Bob_Dylan>
FROM <http://dbpedia.org/resource/Joan_Baez>
WHERE 
{ ?s <http://www.w3.org/2004/02/skos/core#subject> ?o .
}
order by desc(?singerCount) ?o

See the results.
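To make the grouping logic concrete, here is a minimal Python sketch of what the three group functions compute for each classification. The category names in `rows` are invented placeholders, not actual dbpedia data, and `group_by_classification` is a hypothetical helper, not part of any SPARQL library.

```python
from collections import defaultdict

# Hypothetical (subject, classification) pairs standing in for ?s and ?o.
# The category names are invented for illustration, not dbpedia data.
rows = [
    ("Bob_Dylan", "Category:A"),
    ("Joan_Baez", "Category:A"),
    ("Bob_Dylan", "Category:B"),
    ("Joan_Baez", "Category:C"),
]

def group_by_classification(rows):
    """Return {classification: (singer_count, first_singer, last_singer)}."""
    singers = defaultdict(set)
    for s, o in rows:
        singers[o].add(s)  # a set gives count(distinct ?s) semantics
    return {o: (len(ss), min(ss), max(ss)) for o, ss in singers.items()}

summary = group_by_classification(rows)

# Mirror "order by desc(?singerCount) ?o"
ordered = sorted(summary.items(), key=lambda kv: (-kv[1][0], kv[0]))
for o, (count, first, last) in ordered:
    print(o, count, first, last)
```

A shared classification comes out with a count of 2 and two different names, while a classification held by one artist shows the same name in both the first and last positions, just as described above.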

SHOW ONLY THE CLASSIFICATIONS THEY BOTH SHARE

The final query shows just the classifications that are exactly the same for both artists. This is accomplished by having the same object, ?o, in both graphs.

SELECT ?o
WHERE
{
  GRAPH <http://dbpedia.org/resource/Bob_Dylan>
  {
    ?Dylan <http://www.w3.org/2004/02/skos/core#subject> ?o
  } .
  GRAPH <http://dbpedia.org/resource/Joan_Baez>
  {
    ?Baez <http://www.w3.org/2004/02/skos/core#subject> ?o
  }.
}
order by ?o

See the results.
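Requiring the same ?o in both graphs amounts to a set intersection. The sketch below shows that idea in Python, assuming each graph has already been reduced to a list of classification names. The names are invented placeholders and `shared_classifications` is a hypothetical helper.

```python
def shared_classifications(graph_a, graph_b):
    """Return the classifications present in both graphs, sorted like 'order by ?o'."""
    return sorted(set(graph_a) & set(graph_b))

# Invented placeholder classifications, one list per named graph.
dylan_graph = ["Category:A", "Category:B"]
baez_graph = ["Category:A", "Category:C"]

shared = shared_classifications(dylan_graph, baez_graph)
print(shared)
```

Only values that appear on both sides survive, which is why the final query needs no count or group functions at all.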

Friday, August 27, 2010

Counting and Summing - US States and their populations

Let's switch to the Census Bureau's SPARQL test page, http://www.rdfabout.com/demo/census/sparql.xpd, and find out some information about the 50 states of the United States.

It always helps to have the sparql engine give some guidance, or some starting queries. Still the bulk of your SPARQL queries will come from self-discovery of the subjects, predicates and the objects in the database.

Note: After playing a little with the Census Bureau's SPARQL test page, I switched back to the dbpedia Virtuoso browser, http://dbpedia.org/sparql, because it has better support for aggregate functions such as COUNT and SUM, and because it is easier to save the queries I am discussing and to show you the live results.
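One way to save a query as a link is to put the query text in the URL. The sketch below only builds such a URL and sends nothing over the network. It assumes the endpoint accepts the query in a `query` parameter, as the SPARQL protocol describes, and `build_query_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_query_url(endpoint, query):
    """Return a GET URL carrying the query text in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})

query = "SELECT * WHERE { ?s ?p ?o } LIMIT 10"
url = build_query_url("http://dbpedia.org/sparql", query)
print(url)

# Decoding the URL again recovers the original query text.
recovered = parse_qs(urlsplit(url).query)["query"][0]
```

Other options, such as the result format, vary from endpoint to endpoint, so they are left out here.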

ANALYZING THE QUERY ABOUT THE UNITED STATES

Let's look at the query in the query box more closely.

<http://www.rdfabout.com/rdf/usgov/geo/us> is our subject.

?p represents any predicate, or column name, about the US.

?o represents any object or column value.

This is a very common way to initially explore what the data set contains about the subject you are interested in. After you see the values returned, you can begin to restrict the predicates or objects to close in on what you are interested in. This wild card type of querying also helps with copying a workable predicate or object in case your first attempts at removing the PREFIX fail.
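The wildcard style of exploring can be pictured as pattern matching over triples, where each variable matches anything. This is a minimal sketch under that assumption. The `ex:` names are invented for illustration, and `match` is a hypothetical helper, not how any SPARQL engine is actually implemented.

```python
# Hypothetical triples; the "ex:" names are invented for illustration.
TRIPLES = [
    ("ex:us", "ex:hasPart", "ex:alaska"),
    ("ex:us", "ex:title", "United States"),
    ("ex:alaska", "ex:title", "Alaska"),
]

def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts like a ?variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

# Like: <us> ?p ?o  (explore everything known about the subject)
everything_about_us = match(TRIPLES, s="ex:us")

# Like: <us> hasPart ?o  (restrict the predicate once you know its name)
only_parts = match(TRIPLES, s="ex:us", p="ex:hasPart")
```

Starting with every position open and then pinning positions down one at a time is the same narrowing process described above.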

FILTER(!isBlank(?o)) . removes rows whose object is a blank node, which carries no directly readable value.

FILTER(!regex(str(?o), "zcta")) . removes about 100 different census groups, subdivisions of the United States that are probably only meaningful to the census team.

See the results.

JUST SHOW THE STATES

We'll focus on just the states by specifying the predicate <http://purl.org/dc/terms/hasPart> using the PREFIX representation.

We'll also order by state to make duplicate checking easier.

PREFIX     rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dcterms:  <http://purl.org/dc/terms/>

SELECT * WHERE {
  <http://www.rdfabout.com/rdf/usgov/geo/us>
    dcterms:hasPart
     ?o .
  FILTER(!isBlank(?o)) .
  FILTER(!regex(str(?o), "zcta")) .
}
order by ?o

See the results.

Now we have only states, but do we have all 50 states?

LET'S COUNT THE STATES

Counting in SPARQL is a little bit of a pain with the Virtuoso browser because you have to feed the count function another, duplicate set of data. In this case, I called the objects ?state when counting them.

PREFIX     rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dcterms:  <http://purl.org/dc/terms/>

SELECT ?o, count(distinct ?state) as ?stateCount WHERE 
{{
  <http://www.rdfabout.com/rdf/usgov/geo/us>
    dcterms:hasPart
     ?o .
  FILTER(!isBlank(?o)) .
  FILTER(!regex(str(?o), "zcta")) .
 } UNION {
  <http://www.rdfabout.com/rdf/usgov/geo/us>
    dcterms:hasPart
     ?state .
  FILTER(!isBlank(?state)) .
  FILTER(!regex(str(?state), "zcta")) .
 }
}
order by ?o

See the results.

We got 52 states. Let's add the state name to find the extra records.

WHAT IS THE STATE NAME CALLED?

To find out what the state name is called, we have to explore the state data by getting another set of triples using the clause ?o ?p2 ?o2. This means: for the same object, return all predicates, identified as ?p2, and their detail objects, identified as ?o2.

We also add the new fields ?p2 and ?o2 to our select list, the variables we want to display.

PREFIX     rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dcterms:  <http://purl.org/dc/terms/>

SELECT ?o, count(distinct ?state) as ?stateCount, ?p2, ?o2 WHERE 
{{
  <http://www.rdfabout.com/rdf/usgov/geo/us>
        dcterms:hasPart
           ?o .
       FILTER(!isBlank(?o)) .
       FILTER(!regex(str(?o), "zcta")) .
          ?o
               ?p2
                  ?o2
              .
          
         }
         UNION 
         {
          <http://www.rdfabout.com/rdf/usgov/geo/us>
             dcterms:hasPart
               ?state .
           FILTER(!isBlank(?state)) .
           FILTER(!regex(str(?state), "zcta")) .
         }
        }
        order by ?o

See the results.

I search all the records for the string Alaska and find it as the object of the predicate, http://purl.org/dc/elements/1.1/title. That is the predicate for state name.

DISPLAY THE STATE NAME

We will split the state name predicate into 2 pieces, a prefix (dcelements:) and a local name (title), and now show just the fields we really want.

PREFIX           rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX       dcterms:  <http://purl.org/dc/terms/>
PREFIX    dcelements:  <http://purl.org/dc/elements/1.1/>

SELECT ?o,?stateName, count(distinct ?state) as ?stateCount WHERE 
{{
  <http://www.rdfabout.com/rdf/usgov/geo/us>
        dcterms:hasPart
           ?o .
       FILTER(!isBlank(?o)) .
       FILTER(!regex(str(?o), "zcta")) .
            ?o
               dcelements:title
                  ?stateName            
             .
         }
         UNION 
         {
          <http://www.rdfabout.com/rdf/usgov/geo/us>
             dcterms:hasPart
               ?state .
           FILTER(!isBlank(?state)) .
           FILTER(!regex(str(?state), "zcta")) .
         }
        }
        order by ?o

See the results.

Now I see why we have 52 records listed -- the count record adds 1 to the tally and the District of Columbia appears as a part of the United States.

DISPLAY THE STATE POPULATION

Going back to the same query that revealed the state name, I gathered that the state population was given by
http://www.rdfabout.com/rdf/schema/census/population.
I split the population predicate into 2 pieces, a prefix (census:) and a local name (population), and now also show the population in the select list.

PREFIX        census: <http://www.rdfabout.com/rdf/schema/census/>
PREFIX    dcelements: <http://purl.org/dc/elements/1.1/>
PREFIX       dcterms: <http://purl.org/dc/terms/>
PREFIX           rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>


SELECT ?o, ?stateName,?population, count(distinct ?state) as ?stateCount WHERE 
{{
  <http://www.rdfabout.com/rdf/usgov/geo/us>
        dcterms:hasPart
           ?o .
       FILTER(!isBlank(?o)) .
       FILTER(!regex(str(?o), "zcta")) .
            ?o
               dcelements:title
                  ?stateName;            
               census:population
                  ?population
              .
          
         }
         UNION 
         {
          <http://www.rdfabout.com/rdf/usgov/geo/us>
             dcterms:hasPart
               ?state .
           FILTER(!isBlank(?state)) .
           FILTER(!regex(str(?state), "zcta")) .
         }
        }
        order by ?o

See the results.

Now I see population figures for each state, but I'm not sure when they were recorded. I would also need to check them against other sources if I were trying to speak authoritatively about population.

SUM THE STATE POPULATIONS TO CALCULATE THE TOTAL US POPULATION

Now in the select list I'll sum up the individual state population figures produced by the 2nd unioned subquery, the one I had previously used to compute the state count. This produces a grand total population.

PREFIX        census: <http://www.rdfabout.com/rdf/schema/census/>
PREFIX    dcelements: <http://purl.org/dc/elements/1.1/>
PREFIX       dcterms: <http://purl.org/dc/terms/>
PREFIX           rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>


SELECT ?o, ?stateName, 
?population,sum(?statePopulation) as ?totalPopulation,
count(distinct ?state) as ?stateCount WHERE 
{{
  <http://www.rdfabout.com/rdf/usgov/geo/us>
        dcterms:hasPart
           ?o .
       FILTER(!isBlank(?o)) .
       FILTER(!regex(str(?o), "zcta")) .
            ?o
               dcelements:title
                  ?stateName;            
               census:population
                  ?population
              .
          
         }
         UNION 
         {
          <http://www.rdfabout.com/rdf/usgov/geo/us>
             dcterms:hasPart
               ?state .
                ?state
                 census:population
                   ?statePopulation
               .
            
           FILTER(!isBlank(?state)) .
           FILTER(!regex(str(?state), "zcta")) .
         }
        }
        order by ?o

See the results.

Now I see total population for the US, but at some unspecified period of time.

Prefixes in SPARQL Queries

While still using the dbpedia Virtuoso browser, http://dbpedia.org/sparql, pointing to a single page about Joan Baez, let's explore the use of prefixes to shorten our links.

Prefixes facilitate communication by helping you use standard predicates and objects to describe your data so the user doesn't have to work so hard to understand you.

SUPPLYING THE RIGHT PREDICATE FOR RDF:TYPE

Viewing the source of the Come from the Shadows page and looking for the string rdf:type, I find a simpler URI to use, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, which works.

SELECT * WHERE 
{?s 
    <http://dbpedia.org/ontology/artist>
       <http://dbpedia.org/resource/Joan_Baez> ;
    <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
       ?type.
}

See the results.

ABBREVIATE A PREDICATE USING A PREFIX

Instead of using the full URI in the previous example, I can:

1) define a prefix, rdf:,

2) list the URI to identify the prefix enclosed in angle brackets, and then

3) use the prefix to express a URI more compactly, rdf:type

Doing that gives us the same results.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT * WHERE 
{?s 
    <http://dbpedia.org/ontology/artist>
       <http://dbpedia.org/resource/Joan_Baez> ;
     rdf:type 
       ?type.
}

See the results.

Using the URI of the prefix, we can go to that web page and learn more about related predicates that might be available in that namespace. These namespaces are important for explaining whether, for example, the chip we are discussing is a computer chip, a potato chip, a paint chip or a cow chip.

OTHER COMMON PREFIXES

PREFIX owl: <http://www.w3.org/2002/07/owl#>: Provides metadata about concepts and how they are related to other concepts.
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>: Provides information about how the information is structured and often used to specify the data type of a particular piece of information.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>: Provides metadata about the RDF data such as related concepts, user friendly labels, comments about the information.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>: Provides very fundamental classification of the information.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>: Provides information about people and their inter-relationships (friend of a friend), and how to contact them.
PREFIX dc: <http://purl.org/dc/elements/1.1/>: Provides the Dublin Core items of metadata. It is often used to describe published information and the publisher of that information.
PREFIX : <http://dbpedia.org/resource/>: With this empty prefix, we could then refer to the dbpedia resource <http://dbpedia.org/resource/Joan_Baez> simply as :Joan_Baez
PREFIX dbpedia2: <http://dbpedia.org/property/>: This describes the properties that dbpedia uses to describe things or subjects it has information about. These properties are appropriate to the thing, so a river would have a different set of pre-defined properties than a mathematician.
PREFIX dbpedia: <http://dbpedia.org/>: Provides information coming from dbpedia.
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>: Simple Knowledge Organization System (SKOS) is a common data model for sharing and linking knowledge organization systems via the Semantic Web.
PREFIX dbo: <http://dbpedia.org/ontology/>: Provides metadata about what entities exist, how they can be grouped, how they are related within a hierarchy, and how they can be subdivided according to similarities and differences.
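Expanding a prefixed name is just string concatenation: look up the part before the colon and glue the rest onto it. Here is a minimal Python sketch of that expansion using a few of the prefixes listed above. The `expand` helper and the `PREFIXES` table are hypothetical names, and this is not how any particular SPARQL engine does it.

```python
# A few of the prefix declarations listed above, as a lookup table.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dbo": "http://dbpedia.org/ontology/",
    "": "http://dbpedia.org/resource/",  # the empty prefix
}

def expand(name, prefixes=PREFIXES):
    """Turn a prefixed name such as 'rdf:type' into its full URI."""
    prefix, colon, local = name.partition(":")
    if not colon or prefix not in prefixes:
        raise KeyError("unknown prefix in " + repr(name))
    return prefixes[prefix] + local

print(expand("rdf:type"))
print(expand("dbo:artist"))
print(expand(":Joan_Baez"))
```

The three calls produce the same full URIs that the earlier queries spelled out inside angle brackets.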

Thursday, August 26, 2010

Dbpedia Reverse Links - is dbpedia-owl:artist of

While still using the dbpedia Virtuoso browser, http://dbpedia.org/sparql, pointing to a single page about Joan Baez, let's explore what it takes to retrieve data from reverse links, in this case is dbpedia-owl:artist of.

When dbpedia shows is somePredicate of, then the subject of the current page, in this case Joan Baez, becomes the object on another page, while the values to the right become the subjects of those other pages. It will become clearer when you examine the results of the following query.

In the Joan Baez Page:

We see the predicate, is dbpedia-owl:artist of to the left, and a number of other dbpedia links listed as the object to the right. I'm showing just the first few entries because Joan Baez is quite a prolific artist.

In the Come From the Shadows Page:

I clicked on the first object link from the Joan Baez page, dbpedia:Come_from_the_Shadows to see what it was.

We see the predicate, dbpedia-owl:artist to the left, and dbpedia:Joan Baez listed as the object to the right.

JOAN BAEZ AS THE OBJECT INSTEAD OF THE SUBJECT

SELECT * WHERE 
{?s 
    <http://dbpedia.org/ontology/artist>
       <http://dbpedia.org/resource/Joan_Baez>
 .
}

This query asks SPARQL to list all subjects, ?s, where the artist is Joan Baez.

RUN QUERY

See the results.
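A reverse link is the same triple read from the object's side. The sketch below builds such a reverse index in Python. Only the Come_from_the_Shadows triple comes from the pages discussed above. The other two album names are made-up placeholders, and `reverse_index` is a hypothetical helper.

```python
from collections import defaultdict

# (subject, predicate, object) triples. Only the first one comes from the
# pages discussed above; the "Hypothetical_" entries are made up.
TRIPLES = [
    ("Come_from_the_Shadows", "artist", "Joan_Baez"),
    ("Hypothetical_Album", "artist", "Joan_Baez"),
    ("Hypothetical_Album_2", "artist", "Bob_Dylan"),
]

def reverse_index(triples, predicate):
    """Map each object to the subjects that point at it via the predicate."""
    index = defaultdict(list)
    for s, p, o in triples:
        if p == predicate:
            index[o].append(s)
    return dict(index)

is_artist_of = reverse_index(TRIPLES, "artist")
print(is_artist_of["Joan_Baez"])
```

Looking up Joan_Baez in the index plays the role of the "is dbpedia-owl:artist of" row on her page.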

WHAT DID WE RETRIEVE?

It looks like these are her albums, but let's make sure. Let's get the rdf:type of each subject. I copied the link for rdf:type as I have done with all the other links so far. This time it caused some problems.

SELECT * WHERE 
{?s 
    <http://dbpedia.org/ontology/artist>
       <http://dbpedia.org/resource/Joan_Baez>;
    <http://dbpedia.org/about/html/http://www.w3.org/1999/02/22-rdf-syntax-ns%01type>
       ?type.
}

RUN QUERY

See the results.

The dreaded blank column headers don't tell me what's wrong, but the long predicate picked up from the link rdf:type looks suspicious. <http://dbpedia.org/about/html/http://www.w3.org/1999/02/22-rdf-syntax-ns%01type> is very long and different from our other links that worked.

SUPPLYING THE RIGHT PREDICATE FOR RDF:TYPE

Viewing the source of the Come from the Shadows page and looking for the string rdf:type, I find a simpler URI to use, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, which works.

SELECT * WHERE 
{?s 
    <http://dbpedia.org/ontology/artist>
       <http://dbpedia.org/resource/Joan_Baez> ;
    <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
       ?type.
}

See the results.

IT WORKED BUT EACH ALBUM IS REPEATED MULTIPLE TIMES

The last set of results lists Come from the Shadows 4 times because its page gave it 4 different types: an owl Thing, a MusicalWork, a Work and an Album. Most of the other albums appear multiple times as well.
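The repetition comes from joining one album row against every one of its type rows. This small Python sketch reproduces the effect with the four types named above. The `albums_with_types` helper is a hypothetical name, and the data is reduced to a single album for illustration.

```python
# One (album, artist) pair and the four types mentioned above for that album.
ARTIST_OF = [("Come_from_the_Shadows", "Joan_Baez")]
TYPES = [
    ("Come_from_the_Shadows", "owl:Thing"),
    ("Come_from_the_Shadows", "MusicalWork"),
    ("Come_from_the_Shadows", "Work"),
    ("Come_from_the_Shadows", "Album"),
]

def albums_with_types(artist):
    """Join each album by the artist with every type recorded for it."""
    return [
        (album, rdf_type)
        for album, album_artist in ARTIST_OF
        if album_artist == artist
        for subject, rdf_type in TYPES
        if subject == album
    ]

rows = albums_with_types("Joan_Baez")
distinct_albums = sorted({album for album, _ in rows})
print(len(rows), distinct_albums)
```

One album with four types yields four rows, so listing each album once needs a predicate that has a single value per album.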

LIST EACH ALBUM ONCE

I looked through the Come from the Shadows page to find a description of what it was that was a single entry. I found the predicate, http://dbpedia.org/ontology/type. Let's try that as our predicate and sort by subject to see if we did indeed list each album only once.

SELECT * WHERE 
{?s 
    <http://dbpedia.org/ontology/artist>
       <http://dbpedia.org/resource/Joan_Baez> ;
    <http://dbpedia.org/ontology/type> 
       ?type.
}
order by ?s

See the results.

Yes, mission accomplished! We have each album listed only once, and dbpedia has classified each album as an Album, Compilation_album, Live_album or Studio_album.

Wednesday, August 25, 2010

Processing Dates - When was Joan Baez born

While still using the dbpedia Virtuoso browser, http://dbpedia.org/sparql, pointing to a single page about Joan Baez, let's explore what it takes to retrieve date information.

WHEN WAS SHE BORN AND WHEN DID SHE BECOME ACTIVE

SELECT * WHERE 
{?s 
    <http://dbpedia.org/ontology/birthDate>
      ?birthDate;
    <http://dbpedia.org/ontology/activeYearsStartYear>
      ?firstActive.
}

The subject ?s will point back to this page.

The first predicate was the dbpedia-owl:birthDate link copied from the page.

The first object ?birthDate will be the info returned from the query.

It is followed by a semicolon instead of a period, meaning: repeat the same subject for the next predicate-object pair. I use indentation to remind myself that I am seeking info about the same subject, but the indentation is not significant to the SPARQL engine.

The second predicate was the dbpedia-owl:activeYearsStartYear link copied from the page.

The second object will be when she became active musically, presumably when she appeared on the world's stage, rather than the first time she sang or played a guitar.

RUN QUERY

See the results.

FILTERING DATES

Your first attempt to filter by a date in the form of yyyy-mm-dd will fail - giving you the dreaded empty column headers feedback.

SELECT * WHERE 
{?s 
    <http://dbpedia.org/ontology/birthDate>
      ?birthdate;
    <http://dbpedia.org/ontology/activeYearsStartYear>
      ?firstActive.
 FILTER(?birthdate > "1940-01-01").
}

RUN QUERY

See the results.

FILTERING DATES USING DATETIME TYPE

Looking at section 11, Testing Values, of the SPARQL manual, you see that you can't match a dateTime against a simple string. Instead you must match against a typed dateTime literal, indicated by the 2 circumflex characters ^^, then the prefix xsd:, and finally the datatype, dateTime.

By matching a dateTime to a dateTime, you will now see your results.

SELECT * WHERE 
{?s 
    <http://dbpedia.org/ontology/birthDate>
      ?birthdate;
    <http://dbpedia.org/ontology/activeYearsStartYear>
      ?firstActive.
 FILTER(?birthdate > "1940-01-01"^^xsd:dateTime).
}

RUN QUERY

See the results.

COMPARING STRINGS TO STRINGS

The same results would be obtained with the following filter condition:

FILTER(str(?birthdate) > "1940-01-01"). because you are first converting the ?birthdate to a string before comparing it to another string.

The str function is also handy for changing URIs (or web identifiers) into strings for simpler filtering.
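The string comparison gives the same answer because zero-padded yyyy-mm-dd strings sort in the same order as the dates they represent. Here is a minimal Python sketch of that point. The sample dates are arbitrary values chosen for illustration, and both function names are hypothetical.

```python
from datetime import date

def after_as_dates(value, cutoff):
    """Compare two yyyy-mm-dd values as real dates (typed comparison)."""
    return date.fromisoformat(value) > date.fromisoformat(cutoff)

def after_as_strings(value, cutoff):
    """Compare the same two values as plain strings, like str(?birthdate)."""
    return value > cutoff

cutoff = "1940-01-01"
samples = ["1941-01-09", "1939-12-31", "1940-01-01"]  # arbitrary sample dates

for value in samples:
    print(value, after_as_dates(value, cutoff), after_as_strings(value, cutoff))
```

The two comparisons agree only while every value uses the same zero-padded yyyy-mm-dd layout. A value such as 1940-1-1 would break the string version.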
