Thursday, August 26, 2010

Dbpedia Reverse Links - is dbpedia-owl:artist of

While still using the dbpedia Virtuoso browser, http://dbpedia.org/sparql, and pointing to a single page about Joan Baez, let's explore what it takes to retrieve data from reverse links, in this case is dbpedia-owl:artist of.

When dbpedia shows is somePredicate of, the subject of the current page, in this case Joan Baez, is the object on another page, and the values listed to the right are the subjects of those other pages. This becomes clearer when you examine the results of the following query.


In the Joan Baez Page:

We see the predicate, is dbpedia-owl:artist of, to the left, and a number of other dbpedia links listed as the objects to the right. I'm showing just the first few entries because Joan Baez is quite a prolific artist.


In the Come From the Shadows Page:

I clicked on the first object link from the Joan Baez page, dbpedia:Come_from_the_Shadows, to see what it was.

We see the predicate, dbpedia-owl:artist, to the left, and dbpedia:Joan_Baez listed as the object to the right.
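To make the subject/object flip concrete, here is a minimal Python sketch over an in-memory list of triples. The first row is the link seen on the two pages above; the second row is only an illustrative distractor, and the function name reverse_links is my own invention, not something dbpedia provides.

```python
# Each triple is (subject, predicate, object), written with prefixes for brevity.
TRIPLES = [
    # The link shown on the Come from the Shadows page.
    ("dbpedia:Come_from_the_Shadows", "dbpedia-owl:artist", "dbpedia:Joan_Baez"),
    # Illustrative distractor row with a different predicate.
    ("dbpedia:Come_from_the_Shadows", "rdf:type", "dbpedia-owl:Album"),
]


def reverse_links(triples, predicate, obj):
    """Return the subjects that point at obj through predicate."""
    return [s for (s, p, o) in triples if p == predicate and o == obj]


# "is dbpedia-owl:artist of" on the Joan Baez page is this lookup:
print(reverse_links(TRIPLES, "dbpedia-owl:artist", "dbpedia:Joan_Baez"))
```

The Joan Baez page never stores these links itself; it just displays every triple in which Joan Baez appears on the object side.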



JOAN BAEZ AS THE OBJECT INSTEAD OF THE SUBJECT

SELECT * WHERE 
{?s 
    <http://dbpedia.org/ontology/artist>
       <http://dbpedia.org/resource/Joan_Baez>
 .
}

This query asks SPARQL to list all subjects, ?s, whose artist is Joan Baez.
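If you would rather send the query from a script than paste it into the browser form, the request URL can be built along these lines. This is only a sketch: query is the standard SPARQL protocol parameter, but the format parameter and its value are my assumption about what this endpoint accepts, and build_request_url is a made-up helper name. Nothing is sent over the network here.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """SELECT * WHERE
{?s
    <http://dbpedia.org/ontology/artist>
       <http://dbpedia.org/resource/Joan_Baez>
 .
}"""


def build_request_url(endpoint, query):
    """Build a GET URL carrying the SPARQL query as a URL-encoded parameter."""
    params = {
        "query": query,  # standard SPARQL protocol parameter
        "format": "application/sparql-results+json",  # assumed to be accepted
    }
    return endpoint + "?" + urlencode(params)


url = build_request_url(ENDPOINT, QUERY)
print(url)
```

Opening the printed URL in a browser should be equivalent to running the query from the form.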


RUN QUERY


See the results.

WHAT DID WE RETRIEVE?

It looks like these are her albums, but let's make sure. Let's get the rdf:type of each subject. I copied the link for rdf:type as I have done with all the other links so far. This time it caused some problems.


SELECT * WHERE 
{?s 
    <http://dbpedia.org/ontology/artist>
       <http://dbpedia.org/resource/Joan_Baez>;
    <http://dbpedia.org/about/html/http://www.w3.org/1999/02/22-rdf-syntax-ns%01type>
       ?type.
}

RUN QUERY

See the results.

The dreaded blank column headers don't tell me what's wrong, but the long predicate picked up from the rdf:type link looks suspicious. <http://dbpedia.org/about/html/http://www.w3.org/1999/02/22-rdf-syntax-ns%01type> is very long and unlike the other links that worked.
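A quick way to see why that predicate looks wrong is to decode it and look for a few warning signs. The checks below are my own rough heuristics, not rules from the SPARQL specification, and check_predicate_iri is a hypothetical helper name.

```python
from urllib.parse import unquote

COPIED_LINK = "http://dbpedia.org/about/html/http://www.w3.org/1999/02/22-rdf-syntax-ns%01type"


def check_predicate_iri(iri):
    """Return a list of reasons an IRI looks unfit to use as a predicate."""
    decoded = unquote(iri)  # turns %01 into the raw control character
    problems = []
    if any(ch.isspace() for ch in iri):
        problems.append("contains whitespace")
    if any(ord(ch) < 0x20 for ch in decoded):
        problems.append("encodes a control character")
    if decoded.count("http://") > 1:
        problems.append("wraps another URL")
    return problems


print(check_predicate_iri(COPIED_LINK))
```

Both warnings fire for the copied link: it is a dbpedia page address wrapped around the real URI, with %01 standing where a separator belongs.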

SUPPLYING THE RIGHT PREDICATE FOR RDF:TYPE

Viewing the source of the Come from the Shadows page and searching for the string rdf:type, I find a simpler URI to use, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, which works.


SELECT * WHERE 
{?s 
    <http://dbpedia.org/ontology/artist>
       <http://dbpedia.org/resource/Joan_Baez> ;
    <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
       ?type.
}
See the results.

IT WORKED BUT EACH ALBUM IS REPEATED MULTIPLE TIMES

The last set of results lists Come from the Shadows 4 times because its page gives it 4 different types: an owl:Thing, a MusicalWork, a Work and an Album. Most of the other albums appear multiple times as well.
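The repetition is easier to see if the result rows are grouped by subject. The sketch below uses hand-written rows that mirror the four types just described; they are illustrative, not copied from the actual result set, and group_types is a made-up helper name.

```python
from collections import defaultdict

# Illustrative (subject, type) rows: one album with the four types described above.
ROWS = [
    ("dbpedia:Come_from_the_Shadows", "owl:Thing"),
    ("dbpedia:Come_from_the_Shadows", "dbpedia-owl:MusicalWork"),
    ("dbpedia:Come_from_the_Shadows", "dbpedia-owl:Work"),
    ("dbpedia:Come_from_the_Shadows", "dbpedia-owl:Album"),
]


def group_types(rows):
    """Collect every type seen for each subject, so each subject appears once."""
    grouped = defaultdict(list)
    for subject, rdf_type in rows:
        grouped[subject].append(rdf_type)
    return dict(grouped)


albums = group_types(ROWS)
print(len(ROWS), "rows ->", len(albums), "album")
```

Each ?s / ?type pair is its own row in the results, so an album with four types takes four rows. If the types are not needed at all, SELECT DISTINCT ?s in the query itself should have a similar effect.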


LIST EACH ALBUM ONCE

I looked through the Come from the Shadows page for a property that describes what the resource is and has only a single value. I found the predicate http://dbpedia.org/ontology/type. Let's try that as our predicate and sort by subject to check that each album is listed only once.


SELECT * WHERE 
{?s 
    <http://dbpedia.org/ontology/artist>
       <http://dbpedia.org/resource/Joan_Baez> ;
    <http://dbpedia.org/ontology/type>
       ?type.
}
order by ?s
See the results.

Yes, mission accomplished! We have each album listed only once, and dbpedia has classified the albums as Album, Compilation_album, Live_album or Studio_album.
