Tuesday, November 27, 2007

Embedding OWL-RDFS syntax in XHTML with RDFa

Short introduction to RDFa, OWL and Microformats


The OWL language (Web Ontology Language) is a recommendation by the W3C. It is a language that allows proper definition and representation of ontologies. It is supposed to form the foundation of the next generation of the internet: the semantic web (a.k.a. web 3.0). Unfortunately, even though it has been a recommendation since 2004, it has gained very little traction in the online community.
There are several reasons for this, and some of them are discussed in this article.

Microformats are small structural mark-ups of HTML, with a very consistent format. They are somewhat of a web 2 and a half approach, allowing machine readability through consistent structure, but lacking the flexibility to capture more than what is defined in the limited Microformat standard. Microformats have more or less emerged spontaneously. Through their ease of use, Microformats have gained a lot of momentum, but as mentioned, they lack the expressivity required to really push the boundaries of the current internet experience.

This article is about coming up with a solution that reconciles the ease of use of Microformats with the expressivity of a language like OWL.
Some problems hindering OWL adoption will be highlighted, followed by a first experiment that uses RDFa mark-up to embed OWL data directly into an XHTML page. This solution can be considered a step higher than Microformats on the evolutionary ladder of the web.


Terminology frequently used in this article:
Instance

A real world occurrence of a class (a bordeaux wine you just bought in the store).

Class

A unit of meaning (wine as a 'category').

Parent

A class with a meaning a little broader (less specific) than the given class, for example 'Potable (drinkable) Liquid' in the case of wine.

Semantic items

Human and machine understandable content.



OWL-RDFS


There are three sublanguages of OWL, named OWL-Full, OWL-DL and OWL-Lite. Funnily enough, and contrary to what you might think, OWL-Lite is the most strict of the three and hardest to achieve. OWL-Full is the least interesting of the three dialects; it is often gibberish to reasoners (programs that interpret your semantic constructs) because you can intermingle anything with everything. Usually, the goal is to produce OWL-DL syntax at the least.
All OWL syntax examples in this document validate as OWL-Lite; copy-paste the syntax into the OWL Validator to check. For a more complete introduction to OWL, see the W3C OWL language guide.


Why OWL-RDFS has such poor web presence.


There are several reasons why OWL syntax is nowhere to be seen on the web, and some of these problems are as follows:

1. OWL is Verbose

It really is, but, in my opinion, this does not constitute that much of a problem. Verbosity can be good if it improves readability.
See next point for some considerations however.
On verbosity: below is an example of the minimal OWL-DL compatible syntax required to declare one class, with one parent specified and one outgoing relation, with no comments or labels included and no advanced relations. As you can see in the figure below, that is already quite a mouthful.


<rdf:RDF
xmlns ='http://this.page'
xml:base ='http://this.page'
xmlns:owl ='http://www.w3.org/2002/07/owl#'
xmlns:rdf ='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
xmlns:rdfs ='http://www.w3.org/2000/01/rdf-schema#'
xmlns:xsd ='http://www.w3.org/2001/XMLSchema#'>
<owl:Ontology rdf:about='http://example.com/dummy_ontology'/>
<owl:Class rdf:ID='id_for_this_class'>
<rdfs:subClassOf>
<owl:Class rdf:about='http://example.com/dummy_parent' />
</rdfs:subClassOf>
<rdfs:subClassOf>
<owl:Restriction>
<owl:onProperty>
<owl:ObjectProperty
rdf:about='http://example.com/dummy_property'/>
</owl:onProperty>
<owl:someValuesFrom>
<owl:Class
rdf:about='http://example.com/dummy_target' />
</owl:someValuesFrom>
</owl:Restriction>
</rdfs:subClassOf>
</owl:Class>
</rdf:RDF>


Figure 1: OWL Verbosity: Expressing one class, including one parent and one outgoing relation.


2. OWL is Incomprehensible

Well, not really, there is a lot of logic in OWL. At least computers should have no problem understanding. Honestly!?
But to the average human, it is a bit like learning to read bits and bytes. Part of the reason why Microformats are successful is that every web designer understands CSS and HTML. Before you delve into OWL, you should have a good understanding of RDF, XML namespaces, URIs, ... and even though all OWL documents are RDF documents, the language is in many ways nothing like RDF. Finally, you also need to become familiar with ontological jargon (classes, instances, subsumption, disjoints, ...) and methodology and how it impacts your ontology. This, in many ways, resembles learning algebra. The methodology is not something people are accustomed to, and there is a general 'why bother' attitude, but once learned it paves the road to new insights. So there is a bit of a learning curve.


Perhaps the most significant problem, however, is that there are multiple ways of expressing one and the same thing. Consider the examples below. For both, ONLY the syntax varies from what is expressed in Figure 1; the content remains the same (mind that this is still about the very simple set-up: one class, one parent and one relation). They both validate as OWL-Lite as well. The bottom example is the one that most closely resembles an original RDF graph, but it is also the hardest one to read. It is an important example, however, seeing that this is more or less what RDF extracted from an RDFa-XHTML page looks like (more on that later).



<rdf:RDF
xmlns ='http://this.page'
xml:base ='http://this.page'
xmlns:owl ='http://www.w3.org/2002/07/owl#'
xmlns:rdf ='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
xmlns:rdfs ='http://www.w3.org/2000/01/rdf-schema#'
xmlns:xsd ='http://www.w3.org/2001/XMLSchema#'>
<owl:Ontology rdf:about='http://example.com/dummy_ontology'/>
<owl:Class rdf:ID='id_for_this_class'>
<rdfs:subClassOf rdf:resource='dummy_parent' />
<rdfs:subClassOf>
<owl:Restriction>
<owl:onProperty rdf:resource='dummy_property' />
<owl:someValuesFrom rdf:resource='dummy_target' />
</owl:Restriction>
</rdfs:subClassOf>
</owl:Class>
<owl:Class rdf:about="dummy_parent"/>
<owl:ObjectProperty rdf:about="dummy_property"/>
<owl:Class rdf:about="dummy_target"/>
</rdf:RDF>

<!--
reference to parent & to target: must be declared
in this document as being of the owl:Class type to be
owl-DL compatible.
reference to property: must be declared as being of type
owl:ObjectProperty in this document to be owl-DL compatible.
-->


<rdf:RDF
xmlns ='http://this.page'
xml:base ='http://this.page'
xmlns:owl ='http://www.w3.org/2002/07/owl#'
xmlns:rdf ='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
xmlns:rdfs ='http://www.w3.org/2000/01/rdf-schema#'
xmlns:xsd ='http://www.w3.org/2001/XMLSchema#'>
<rdf:Description rdf:about="http://example.com/dummy_ontology">
<rdf:type
rdf:resource="http://www.w3.org/2002/07/owl#Ontology"/>
</rdf:Description>
<rdf:Description rdf:ID="id_for_this_class">
<rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Class"/>
<rdfs:subClassOf rdf:resource="#dummy_parent" />
<rdfs:subClassOf rdf:resource="#restriction1" />
</rdf:Description>
<rdf:Description rdf:ID="restriction1">
<rdf:type
rdf:resource="http://www.w3.org/2002/07/owl#Restriction"/>
<owl:onProperty rdf:resource='#dummy_property' />
<owl:someValuesFrom rdf:resource='#dummy_target' />
</rdf:Description>
<rdf:Description rdf:ID="dummy_parent">
<rdf:type
rdf:resource="http://www.w3.org/2002/07/owl#Class"/>
</rdf:Description>
<rdf:Description rdf:ID="dummy_target">
<rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Class"/>
</rdf:Description>
<rdf:Description rdf:ID="dummy_property">
<rdf:type
rdf:resource="http://www.w3.org/2002/07/owl#ObjectProperty"/>
</rdf:Description>
</rdf:RDF>


Figure 2: OWL variance: The two examples above differ only syntactically from the OWL expression found in Figure 1; the content is the same, and both validate as OWL-Lite as well. Cf. OWL Validator.
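
The flat rdf:Description form is close to a plain list of triples, and a few lines of Python can make that visible. The sketch below is only an illustration and not a conformant RDF/XML parser: it handles nothing but top-level rdf:Description elements whose children carry an rdf:resource attribute. The function name flat_triples and the trimmed-down input are made up for this example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BASE = "http://this.page"  # same xml:base as in the figures

# Trimmed-down excerpt of the flat form (illustrative input only).
FLAT = """
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <rdf:Description rdf:ID="id_for_this_class">
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Class"/>
    <rdfs:subClassOf rdf:resource="#dummy_parent"/>
  </rdf:Description>
</rdf:RDF>
"""


def flat_triples(xml_text, base=BASE):
    """Return (subject, predicate, object) tuples for the flat form only."""
    triples = []
    root = ET.fromstring(xml_text.strip())
    for desc in root.findall(f"{{{RDF}}}Description"):
        rdf_id = desc.get(f"{{{RDF}}}ID")
        # rdf:ID names a fragment of the base; rdf:about is already a URI.
        subject = f"{base}#{rdf_id}" if rdf_id else desc.get(f"{{{RDF}}}about")
        for child in desc:
            # ElementTree writes tags as "{namespace}local"; join the two parts.
            predicate = child.tag[1:].replace("}", "", 1)
            obj = child.get(f"{{{RDF}}}resource")
            if obj.startswith("#"):
                obj = base + obj
            triples.append((subject, predicate, obj))
    return triples


for triple in flat_triples(FLAT):
    print(triple)
```

Each printed tuple corresponds to one child element of an rdf:Description, which is why this form reads like a dump of the graph rather than like a class definition.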


3. No explored approaches to align / integrate OWL with current web content.

In my opinion, this is the biggest roadblock that stands in the way of public OWL adoption.
For ontological data to truly be useful, you need to somehow tie current web content to semantic classes and instances. OWL has failed miserably in this respect so far. It is much like a faraway island of Eden, and a man without a canoe. All the important data could be there but no one knows how to reach it. Granted, OWL does define a standard format for data interchange between applications, but this limited scope cannot be what the semantic vision is about.

What we need is semantic annotation. You need to be able to tag sections of your content with explicit ontology classes and even relations, without it hindering the display of your content. If you wrote a piece on a certain bordeaux, for example, you could mark up the section as being about an instance of wine, perhaps even with some properties defined. Even more amazing, properties could be extracted automatically just from knowing that the section is about wine: a mention of red is bound to be about the wine's color.

Take a look at our Freebase widget for the desired search engine functionality. You should be able to look for wine (example) by specifying color, origin, vintage year and more. The functionality present on that page is thanks to the Freebase service.
Freebase (big thumbs up) really understands the need to couple instance data (they call it 'Topics') and content (images, text) to semantic classes ('Type' in Freebase lingo).

The amazing observation here is that you would not need a service like Freebase to pull out this kind of data if:

  • It were possible to directly associate your content with certain units of meaning (semantic classes), something which is referred to as 'semantic annotation'.

  • Search engines would index these annotations.


To achieve this goal we need several things to happen:

  • The ability to easily reference ontological classes; at the least there should be one or more highly accessible repositories of ontological data to which one can refer. This is part of the rationale behind ontologyonline.org and why it offers one page per semantic class: it can be seen as an ontological tag index.

  • The ability to embed semantic content directly and invisibly within (X)HTML.


This brings us back to the topic of this section: the current divide between semantic data and web content. There is hope on the horizon, however. There seem to be possibilities, and I'd like to demonstrate one of them in this article: the use of RDFa.
Roughly speaking, RDFa is a next-generation Microformat. It is a structured mark-up that allows direct inclusion of RDF into XHTML, and seeing that OWL is more or less an extension to RDF, an interesting thought would be to try and use RDFa to embed OWL in a web page.

Integration into XHTML markup: the promise of RDFa


This article does not attempt to provide a full introduction to RDFa; for that, see W3C's RDFa Primer. As mentioned above, the use case for the majority of people would be to mark up content with reference to semantic classes (e.g. tag your content as being an instance of semantic class Y). In this experiment, however, we focus on embedding OWL class information and ontology information directly within XHTML. Embedding instance information (as required for the above) should be easier and may be discussed in a later Online Ontology Visualisation blog entry.


As said before, part of Ontology Online's design is to offer one page per semantic class. This makes referencing a lot easier: instead of having to use URIs (like a URL, but may also point to sections of a certain page or a given part of a document), you can just use a page URL to retrieve a reference to an ontological class. As will be demonstrated, it also makes the RDFa approach a lot cleaner. Of note, the mark-up has been simplified a little (information not required for this demonstration has been stripped) for easier reading. The figure below demonstrates what the content that has RDFa embedded looks like visually (the RDFa is completely invisible to the user).



Figure: Visual output of the RDFa-embedded XHTML.



Step 0. Being Standards-compliant


This is the only pain point of embedding RDFa. If you wish to get your page to validate against the W3C Validator service, then you must set up your page properly. Besides the fact that you had better have valid XHTML if you wish to extract the RDFa from the page, this means you also need to change the DOCTYPE declaration of the page and the content type you serve, and include any namespaces you are using in your document. Beware that changing the DOCTYPE may influence the display of your page a little (CSS is handled more strictly) and changing the content type may affect JavaScript functionality. This may not even be an option for you. Fortunately, this is only a long-term goal to strive for: the RDFa will still be extractable even with a less properly defined DOCTYPE and content type (hence step 0).



<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
"http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:rdfs ='http://www.w3.org/2000/01/rdf-schema#'
xmlns:dc="http://purl.org/dc/elements/1.1/">
<head>
<meta http-equiv="Content-Type"
content="application/xhtml+xml; charset=UTF-8"/>
...


Figure: Declarations you need if you wish to be Standards-compliant.


You might notice that when you run Ontology Online concept pages through the W3C validator, one error still persists: 'Conflict between Mime Type and Document Type'. This is because the server does not yet serve the correct content type (it seems to affect JavaScript functionality and that is something we do not wish to deal with at this point).
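
One common way to approach the content type problem is to let the server decide per request, based on the Accept header the browser sends. The Python sketch below shows that idea only; it is not what Ontology Online does, the function name choose_content_type is made up for this example, and it does nothing about the JavaScript issues mentioned above.

```python
def choose_content_type(accept_header):
    """Pick the XHTML media type only for clients that say they accept it."""
    for item in accept_header.split(","):
        parts = [part.strip() for part in item.split(";")]
        media_type = parts[0].lower()
        quality = 1.0  # default weight when no q parameter is given
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0  # treat a malformed weight as "not accepted"
        if media_type == "application/xhtml+xml" and quality > 0:
            return "application/xhtml+xml; charset=UTF-8"
    # Fall back to plain HTML for every other client.
    return "text/html; charset=UTF-8"


print(choose_content_type("text/html,application/xhtml+xml,application/xml;q=0.9"))
print(choose_content_type("text/html, */*"))
```

The returned string would go into the Content-Type response header. Clients that do not list application/xhtml+xml keep receiving text/html.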

Step 1. Declaring the ontology


On any page describing a semantic class (concept page) the ontology is referenced by a 'rev' attribute on an anchor element. 'rev' expresses an inverse relationship in RDFa. In this case we say that the ontology (defined at the target of the 'href' attribute) has an owl:Class (designates a class of the ontology) which is described on this page. The mechanism to declare the ontology itself is not discussed in this article, but can easily be seen by examining the source of any of the ontology HTML pages at ontologyonline.org.


<a rev="owl:Class"
href="http://ontologyonline.org/visualisation/c/CellTypeOntology/">
Cell Type Ontology</a>


Figure: Declaring the ontology on a concept page.


Step 2. Declaring the Class


In RDFa an 'about' attribute is used when talking about something that does not refer to the entire page. Seeing that the resource here is indeed the entire page (the entire page equals the semantic class), we do not use an 'about' attribute. The class is defined simply by adding a class attribute with value owl:Class in the markup. Notice how the markup does not affect the content itself, nor its display (hint: for those who did not know, it is possible to use multiple classes in HTML markup by providing a space-separated list, for example class='owl:Class concept blue'; CSS will take all of them into account). The property attribute with value rdfs:label provides a nicer representation name than just 'this page url'. Ideally an xml:lang attribute would be specified as well, denoting for example that the label is in English (xml:lang="en", work in progress).


<h1 class="owl:Class" property="rdfs:label">mature B cell</h1>

Figure: Declaring the semantic class.


The parents are declared on another page, hence a link (anchor element) is used to reference them. By wrapping the anchor element in an additional HTML element we can specify two relations. Firstly, that the class expressed on this page has a subclass relationship to something (as expressed by the 'rel' attribute with value 'rdfs:subClassOf'). Secondly, that this 'something' is an actual owl:Class, which is very important if we wish to achieve OWL-DL syntax and state that this concerns a parent. We do this by setting the 'rel' attribute on the anchor element to 'owl:Class'. In the figure below two parents are declared:


<div>
<span rel='rdfs:subClassOf'>
<a rel='owl:Class' href='c/CellTypeOntology/B_cell'>B cell</a>
</span>
<span rel='rdfs:subClassOf'>
<a rel='owl:Class'
href='c/CellTypeOntology/professional_antigen_presenting_cell'>
professional antigen presenting cell</a>
</span>
</div>

Figure: Declaring Parent(s).
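
The href values in the parent links are relative, so whatever extracts the RDFa has to resolve them against a base URL before they can act as class identifiers. The Python sketch below shows that resolution step with the standard library. The base URL is an assumption for this example (derived from the ontology link shown earlier), and absolute_parents is a made-up helper name.

```python
from urllib.parse import urljoin

# Assumed base URL for the concept pages (not confirmed by the page source).
BASE = "http://ontologyonline.org/visualisation/"

parents = [
    "c/CellTypeOntology/B_cell",
    "c/CellTypeOntology/professional_antigen_presenting_cell",
]


def absolute_parents(hrefs, base=BASE):
    """Resolve relative parent hrefs into absolute class URIs."""
    return [urljoin(base, href) for href in hrefs]


for uri in absolute_parents(parents):
    print(uri)
```

With a different base (for example the URL of the concept page itself) the same relative href resolves to a different URI, so the base in effect on the page matters for anyone who wants stable class references.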


Outgoing relations are specified by taking a similar, yet slightly more complex, approach. The outgoing relation requires a property declaration (type of relation) and a target declaration, another class of the ontology to which the class on this page relates. At this time properties at Ontology Online do not have their own page yet, so they are referenced only in text. For the target of a relation a link (anchor element) is used as reference. The top HTML element with attribute 'rel' and value 'rdfs:subClassOf' again denotes that the class expressed on this page has another subclass relationship to something. The element underneath it tells us that this subclass relationship is a kind of outgoing relationship (instead of a parent relationship) by having an attribute 'rel' with value 'owl:Restriction'. This element wraps around two elements. The first designates the property (recognized by having the attribute 'property' with value 'owl:onProperty'). The second is an anchor element for which the 'href' attribute specifies the target, while the 'rel' attribute with value 'owl:someValuesFrom' adds some additional logic to the relationship (see W3C's OWL language guide for details on someValuesFrom).

Of note, to be OWL-DL compliant it should also be expressed that the property is of the type owl:ObjectProperty and the target is of the type owl:Class; again, work in progress (still deciding on the best way to convey this...).


<div rel='rdfs:subClassOf'>
<div rel='owl:Restriction'>
<span property='owl:onProperty'>develops_from</span>
<a rel='owl:someValuesFrom'
href='c/CellTypeOntology/immature_B_cell'>immature B cell</a>
</div>
</div>

Figure: Declaring Outgoing Relations.
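
For comparison, the flat RDF form in Figure 2 expresses one outgoing relation as four triples around a restriction node. The Python sketch below spells out that target shape. It is an illustration only: the helper name some_values_from, the blank node label and the example.org URIs are made up, and it makes no claim about what an RDFa extractor actually produces from the markup above.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"


def some_values_from(subject, prop, target, node="_:restriction1"):
    """Triples for 'subject has some prop relation to target' (as in Figure 2)."""
    return [
        (subject, RDFS + "subClassOf", node),     # class -> restriction node
        (node, RDF + "type", OWL + "Restriction"),  # the node is a restriction
        (node, OWL + "onProperty", prop),         # which property it restricts
        (node, OWL + "someValuesFrom", target),   # and the class it points to
    ]


# Hypothetical URIs, used purely as placeholders.
triples = some_values_from(
    "http://example.org/mature_B_cell",
    "http://example.org/develops_from",
    "http://example.org/immature_B_cell",
)
for triple in triples:
    print(triple)
```

Seen this way, the nested div, span and anchor elements each stand in for one of these four statements.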


The last thing to do is to add a description and perhaps some additional language terms that define the class described on the page. This is done through 'rdfs:comment' and 'rdfs:label' properties (when the target is text on the same page, the attribute 'property' is used in RDFa). Again, it would be better to also add an xml:lang attribute specifying the language of the comment or language term.


<div property='rdfs:comment'>"A mature form of a B cell, a type of
lymphocyte whose defining characteristic is the expression of an
immunoglobulin complex." [GOC:add, ISBN:0781735149]</div>
<span property='rdfs:label'>mature B lymphocyte</span>
<span property='rdfs:label'>mature B-cell</span>
<span property='rdfs:label'>mature B-lymphocyte</span>

Figure: Declaring description and additional language terms.
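
To get a feel for what a machine sees in these 'property' attributes, the Python sketch below collects (property, text) pairs with the standard library's HTML parser. It is a toy and not an RDFa processor: it assumes well-formed markup in which every element is closed and elements carrying a 'property' attribute are not nested inside each other. The class name PropertyCollector is made up for this example.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Collect (property, text) pairs from elements with a 'property' attribute."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._stack = []  # one entry per open element: property name or None
        self._text = []   # text fragments of the current property element

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("property")
        self._stack.append(prop)
        if prop:
            self._text = []

    def handle_data(self, data):
        # Only keep text while inside an element that carries a property.
        if any(self._stack):
            self._text.append(data)

    def handle_endtag(self, tag):
        if not self._stack:
            return
        prop = self._stack.pop()
        if prop:
            # Collapse line breaks and repeated spaces into single spaces.
            text = " ".join("".join(self._text).split())
            self.pairs.append((prop, text))


SNIPPET = """
<h1 class="owl:Class" property="rdfs:label">mature B cell</h1>
<span property='rdfs:label'>mature B lymphocyte</span>
"""

collector = PropertyCollector()
collector.feed(SNIPPET)
collector.close()
for pair in collector.pairs:
    print(pair)
```

A real extractor also has to resolve the rdfs: prefix through the namespace declarations and attach each literal to the right subject, which this sketch leaves out.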



Testing out: Operator Plugin



It is possible to see what the RDFa extract from any of the concept pages looks like by using a browser plugin: the latest version of Mike Kaply's Operator Plugin recognizes RDFa data embedded within XHTML pages.


Browsing any of the Concept pages on OntologyOnline.org should highlight the 'Resources' button, which should show both the ontology to which the concept belongs (if on a concept page) and OWL-syntax like information about the concept itself:

Browsing an Ontology information page (e.g. Cell Type Ontology) should reveal one resource recognized, the ontology itself.



@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<http://ontologyonline.org/visualisation/c/CellTypeOntology/>
owl:Class
<http://ontologyonline.org/visualisation/c/CellTypeOntology/neutrophilic_promyelocyte>
.

<http://ontologyonline.org/visualisation/c/CellTypeOntology/neutrophilic_promyelocyte>
rdf:type owl:Class ;
rdfs:label "neutrophilic promyelocyte" ;
rdfs:label "neutrophilic premyelocyte" ;
rdfs:label "neutrophilic progranulocyte" ;
rdfs:comment ""A neutrophil precursor in the granulocytic series, being a cell
intermediate in development between a myeloblast and myelocyte, and containing
a few, as yet undifferentiated, cytoplasmic granules."
[GOC:add, ISBN:0721601464]" ;
rdfs:SubclassOf [
owl:Restriction [
owl:onProperty "develops_from" ;
owl:someValuesFrom <c/CellTypeOntology/neutrophilic_myeloblast>
]
] ;
rdfs:SubclassOf [
owl:Class <c/CellTypeOntology/promyelocyte>
] .


Figure: OWL-DL based syntax extracted from an Ontology Online concept page by the Firefox Operator plugin.



Conclusion


To my knowledge, this is a first attempt to finally reconcile OWL-RDFS with standard XHTML web pages.
The extracted RDF (as tested with RDFa Distiller) does not validate as OWL-DL just yet. But this initial venture into OWL-enabled XHTML shows promise. There is room for improvement, so RDF-Guru or not: feedback and suggestions are greatly appreciated.






