Wednesday, 30 of July of 2014

Semantic Web application

This posting outlines the implementation of a Semantic Web application.

Many people seem to think the ideas behind the Semantic Web (and linked data) are interesting, but many people are also waiting to see some of the benefits before committing resources to the effort. This is what I call the “chicken & egg problem of the linked data”.

While I have not created the application outlined below, I think it is more than feasible. It is a sort of inference engine feed with a URI and integer, both supplied by a person. Its ultimate goal is to find relationships between URIs that were not immediately or readily apparent.* It is a sort of “find more like this one” application. Here’s the algorithm:

  1. Allow the reader to select an actionable URI of personal interest, ideally a URI from the set of URIs you curate
  2. Submit the URI to an HTTP server or SPARQL endpoint and request RDF as output
  3. Save the output to a local store
  4. For each subject and object URI found the output, go to Step #2
  5. Go to step #2 n times for each newly harvested URI in the store where n is a reader-defined integer greater than 1; in other words, harvest more and more URIs, predicates, and literals based on the previously harvested URIs
  6. Create a set of human readable services/reports against the content of the store, and think of these services/reports akin to a type of finding aid, reference material, or museum exhibit of the future. Example services/reports might include:
    • hierarchal lists of all classes and properties – This would be a sort of semantic map. Each item on the map would be clickable allowing the reader to read more and drill down.
    • text mining reports – collect into a single “bag of words” all the literals saved in the store and create: word clouds, alphabetical lists, concordances, bibliographies, directories, gazetteers, tabulations of parts of speech, named entities, sentiment analyses, topic models, etc.
    • maps – use place names and geographic coordinates to implement a geographic information service
    • audio-visual mash-ups – bring together all the media information and create things like slideshows, movies, analyses of colors, shapes, patterns, etc.
    • search interfaces – implement a search interface against the result, SPARQL or otherwise
    • facts – remember SPARQL queries can return more than just lists. They can return mathematical results such as sums, ratios, standard deviations, etc. It can also return Boolean values helpful in answering yes/no questions. You could have a set of canned fact queries such as, how many ontologies are represented in the store. Is the number of ontologies greater than 3? Are there more than 100 names represented in this set? The count of languages used in the set, etc.
  7. Allow the reader to identify a new URI of personal interest, specifically one garnered from the reports generated in Step #6.
  8. Go to Step #2, but this time have the inference engine be more selective by having it try to crawl back to your namespace and set of locally curated URIs.
  9. Return to the reader the URIs identified in Step #8, and by consequence, these URIs ought to share some of the same characteristics as the very first URI; you have implemented a “find more like this one” tool. You, as curator of the collection of URIs might have thought the relations between the first URI and set of final URIs was obvious, but those relationships would not necessarily be obvious to the reader, and therefore new knowledge would have been created or brought to light.
  10. If there are no new URIs from Step #7, then go to Step #6 using the newly harvested content.
  11. Done. If a system were created such as the one above, then the reader would quite likely have acquired some new knowledge, and this would be especially true the greater the size of n in Step #5.
  12. Repeat. Optionally, have a computer program repeat the process with every URI in your curated collection, and have the program save the results for your inspection. You may find relationships you did not perceive previously.

I believe many people perceive the ideas behind the Semantic Web to be akin to investigations in artificial intelligence. To some degree this is true, and investigations into artificial intelligence seem to come and go in waves. “Expert systems” and “neural networks” were incarnations of artificial intelligence more than twenty years ago. Maybe the Semantic Web is just another in a long wave of forays.

On the other hand, Semantic Web applications do not need to be so sublime. They can be as simple as discovery systems, browsable interfaces, or even word clouds. The ideas behind the Semantic Web and linked data are implementable. It just a shame that nothing is catching the attention of the wider audiences.

* Remember, URIs are identifiers intended to represent real world objects and/or descriptions of real-world objects. URIs are perfect for cultural heritage institutions because cultural heritage institutions maintain both.


Leave a comment


Comments RSS TrackBack 1 comment