Thursday, July 22, 2010

Your First SPARQL Query

Let's jump right in and write your first query. We will retrieve the blackboard gag from a single Simpsons episode, Worst Episode Ever.

FIND THE RDF DATA TO QUERY

  • I searched Google for "simpson episodes complete list".
  • I found a page with a fairly comprehensive list of seasons and selected season 12.
  • From the page for season 12, I picked a catchy title, Worst Episode Ever. This led to a Wikipedia article about the individual episode I was interested in, but the Wikipedia page is designed for people to read and enjoy.
  • I wanted the DBpedia version of the page to make my SPARQL querying simpler. I searched Google for dbpedia + "Worst Episode Ever" and found the DBpedia version of this page.
  • Here is the DBpedia page's abstract, the section identifying the episode, and the blackboard gag for this episode. We will use this information for our query.

THE QUERY DIAGRAM

 

THE SPARQL QUERY - 1 EPISODE'S BLACKBOARD GAG

SELECT DISTINCT ?episode ?chalkboard_gag
WHERE 
{
 ?episode
   <http://xmlns.com/foaf/0.1/page>
     <http://en.wikipedia.org/wiki/Worst_Episode_Ever>
 .
 ?episode
   <http://dbpedia.org/property/blackboard>
     ?chalkboard_gag
 .
}

Show results.


SPECIFY YOUR OUTPUT VARIABLES

1) SELECT DISTINCT ?episode ?chalkboard_gag
  • The SELECT clause specifies which variables you want in your output.
  • The DISTINCT keyword removes duplicate combinations of ?episode and ?chalkboard_gag. I usually use DISTINCT to keep the number of rows returned as small as possible.
  • A variable is a question mark followed by a name that you choose. My two variables are ?episode and ?chalkboard_gag, although I could have called them something else.
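
To illustrate that variable names are just labels, here is a sketch of the same query with both variables renamed. The names ?ep and ?gag are chosen only for this illustration and do not come from the DBpedia page; the query should match the same data, only under different column headings.

```sparql
# Same query as above; only the variable names differ.
# ?ep and ?gag are illustrative names, not anything defined by DBpedia.
SELECT DISTINCT ?ep ?gag
WHERE
{
 ?ep
   <http://xmlns.com/foaf/0.1/page>
     <http://en.wikipedia.org/wiki/Worst_Episode_Ever>
 .
 ?ep
   <http://dbpedia.org/property/blackboard>
     ?gag
 .
}
```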

START OF YOUR SELECTION CRITERIA

2) WHERE { . . . }

Inside the braces you specify which rows, out of the potentially billions of rows in the RDF datastore, you want to see right now.
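
As a rough sketch of why the selection criteria matter, the query below puts a variable in every position of its single clause, so it places no restriction on the data at all. The variable names ?s, ?p and ?o and the LIMIT of 10 are assumptions made for this illustration; LIMIT simply caps the number of rows returned so the result stays readable.

```sparql
# A clause made only of variables matches every row in the datastore.
# LIMIT 10 (an arbitrary cap chosen for this sketch) keeps the output small.
SELECT ?s ?p ?o
WHERE
{
 ?s ?p ?o .
}
LIMIT 10
```

Replacing any of the three variables with a fixed value in angle brackets narrows the match, which is what the following steps do.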

IDENTIFY THE EPISODE

4) ?episode  the subject
5)  <http://xmlns.com/foaf/0.1/page>  the predicate
6)   <http://en.wikipedia.org/wiki/Worst_Episode_Ever>  the object
7) .

  • Line 4) As in English, the subject must be established so we know what we are talking about. We make the subject a variable by using a leading question mark.
  • Line 5) If you examine the DBpedia page, there are several ways to identify this episode, much like choosing a primary key in Oracle. I chose to identify the record by its foaf:page.

    The foaf:page link is my predicate, and I need to capture its web address.
    In Firefox, I right-click the link and select Copy Link Location.
    In Internet Explorer, I right-click the link, select Properties, and then copy the link from the Address (URL) field.
    This gives me http://xmlns.com/foaf/0.1/page. I enclose it in angle brackets, making it <http://xmlns.com/foaf/0.1/page>. Now I have a SPARQL predicate.
  • Line 6) To pick up the object, that is, the value of foaf:page, I do the same with the link on the right. Enclosing it in angle brackets gives me:

    <http://en.wikipedia.org/wiki/Worst_Episode_Ever>
  • Line 7) The period marks the end of the first selection clause. It doesn't have to be on a separate line, but I like to use it on its own line to better separate each clause, something important to me as the complexity of queries grows.
  • Recapping, lines 4-7 ask the SPARQL engine to search all of DBpedia for the page(s), which I will call ?episode, that have a foaf:page value of http://en.wikipedia.org/wiki/Worst_Episode_Ever.
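
One way to check this first clause on its own is to run it without the blackboard lines. The sketch below is simply lines 1-7 of the query with only ?episode selected and the closing brace added. No output is shown here because what comes back depends on the data DBpedia holds when you run it.

```sparql
# Only the first selection clause: which page(s) have this foaf:page value?
SELECT DISTINCT ?episode
WHERE
{
 ?episode
   <http://xmlns.com/foaf/0.1/page>
     <http://en.wikipedia.org/wiki/Worst_Episode_Ever>
 .
}
```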

GET THE BLACKBOARD GAG


8) ?episode  the subject 
9)  <http://dbpedia.org/property/blackboard>  the predicate 
10)   ?chalkboard_gag  the object 
11) .

  • Line 8) By using the same variable, ?episode, as the subject in lines 4 and 8, I am telling SPARQL that I want to see the blackboard gag for the episode identified above.
  • Line 9) I copied the link for dbprop:blackboard from the DBpedia page, as described for line 5.
  • Line 10) I specified the object as a variable called ?chalkboard_gag, meaning pick it up from the page (or really from the RDF data underlying the page).
    This will show: "I will not hide the teacher's medicine."
  • Line 11) The period ends the second selection clause.
  • Line 12) The closing brace ends your selection criteria.
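
Once the long form makes sense, the same query can be written more compactly. This is only a sketch, assuming the prefixes foaf: and dbprop: stand for the two base addresses copied earlier, which is how the DBpedia page abbreviates them. The PREFIX lines and the one-line layout are additions for illustration and are not part of the query walked through above.

```sparql
# Declare short names for the two base addresses used in the query.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbprop: <http://dbpedia.org/property/>

SELECT DISTINCT ?episode ?chalkboard_gag
WHERE
{
 # Identify the episode by its Wikipedia page.
 ?episode foaf:page <http://en.wikipedia.org/wiki/Worst_Episode_Ever> .
 # Fetch the blackboard gag for that same episode.
 ?episode dbprop:blackboard ?chalkboard_gag .
}
```

Each clause still has a subject, a predicate, and an object followed by a period; the prefixes only save typing.
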
So that's your first query. Complicated at first, but as you do more and more, it will become second nature to you.
