Monday, 28 of July of 2014

SPARQL tutorial

This is the simplest of SPARQL tutorials. The tutorial’s purpose is two-fold: 1) through a set of examples, introduce the reader to the syntax of SPARQL queries, and 2) to enable the reader to initially explore any RDF triple store which is exposed as a SPARQL endpoint.

SPARQL (SPARQL protocol and RDF query language) is a set of commands used to search RDF triple stores. It is modeled after SQL (structured query language), the set of commands used to search relational databases. If you are familiar with SQL, then SPARQL will be familiar. If not, then think of SPARQL queries as formalized sentences used to ask a question and get back a list of answers.

Also, remember, RDF is a data structure of triples: 1) subjects, 2) predicates, and 3) objects. The subjects of the triples are always URIs — identifiers of “things”. Predicates are also URIs, but these URIs are intended to denote relationships between subjects and objects. Objects are preferably URIs but they can also be literals (words or numbers). Finally, RDF objects and predicates are defined in human-created ontologies as a set of classes and properties where classes are abstract concepts and properties are characteristics of the concepts.

Try the following steps with just about any SPARQL endpoint:

  1. Get an overview- A good way to begin is to get a list of all the ontological classes in the triple store. In essence, the query below says, “Find all the unique objects in the triple store where any subject is a type of object, and sort the result by object.”

  2. Learn about the employed ontologies- Ideally, each of the items in the result will be an actionable URI in the form of a “cool URL”. Using your Web browser, you ought to be able to go to the URL and read a thorough description of the given class, but the URLs are not always actionable.
  3. Learn more about the employed ontologies- Using the following query you can create a list of all the properties in the triple store as well as infer some of the characteristics of each class. Unfortunately, this particular query is intense. It may require a long time to process or may not return at all. In English, the query says, “Find all the unique predicates where the RDF triple has any subject, any predicate, or any object, and sort the result by predicate.”

  4. Guess- Steps #2 and Step #3 are time intensive, and consequently it is sometimes easier just browse the triple store by selecting one of the “cool URLs” returned in Step #1. You can submit a modified version of Step #1′s query. It says, “Find all the subjects where any RDF subject (URI) is a type of object (class)”. Using the
    LiAM triple store, the following query tries to find all the things that are EAD finding aids.

  5. Learn about a specific thing- The result of Step #4 ought to be a list of (hopefully actionable) URIs. You can learn everything about that URI with the following query. It says, “Find all the predicates and objects in the triple store where the RDF triple’s subject is a given value and the predicate and object are of any value, and sort the result by predicate”. In this case, the given value is one of the items returned from Step #4.

  6. Repeat a few times- If the results from Step #5 returned seemingly meaningful and complete information about your selected URI, then repeat Step #5 a few times to get a better feel for some of the “things” in the triple store. If the results were not meaningful, then got to Step #4 to browser another class.
  7. Take these hints- The first of these following two queries generates a list of ten URIs pointing to things that came from MARC records. The second query is used to return everything about a specific URI whose data came from a MARC record.

  8. Read the manual- At this point, it is a good idea to go back to Step #2 and read the more formal descriptions of the underlying ontologies.
  9. Browse some more- If the results of Step #3 returned successfully, then browse the objects in the triple store by selecting a predicate of interest. The following queries demonstrate how to list things like titles, creators, names, and notes.

  10. Read about SPARQL- This was the tiniest of SPARQL tutorials. Using the
    LiAM data setas an example, it demonstrated how to do the all but simplest queries against a RDF triple store. There is a whole lot more to SPARQL than SELECT, DISTINCT, WHERE, ORDER BY, AND LIMIT commands. SPARQL supports a short-hand way of denoting classes and properties called PREFIX. It supports Boolean operations, limiting results based on “regular expressions”, and a few mathematical functions. SPARQL can also be used to do inserts and deletes against the triple store. The next step is to read more about SPARQL. Consider reading the
    canonical documentationfrom the W3C, “
    SPARQL by example“, and the Jena project’s “
    SPARQL Tutorial“. [1, 2, 3]

Finally, don’t be too intimidated about SPARQL. Yes, it is possible to submit SPARQL queries by hand, but in reality, person-friendly front-ends are expected to be created making search much easier.


Leave a comment


Comments RSS TrackBack 2 comments