Monday, September 10, 2007

Why we need the semantic web

Two search strings with the same syntax can have dramatically different semantics. What if you're looking for pages about a less popular topic that happens to share its syntax with a highly popular topic that is totally irrelevant to you? Finding what you need will take time and ingenuity. The newly introduced concept feeds (external feeds matching the concept of a given page), for example those associated with the concept page on Cells (in its biological sense), demonstrate our dire need for semantically enabled search engines.

'Cell', as an English term, applies to a wide variety of concepts: 'cell' in the biological sense, 'cell' as in 'cell phone', 'cell' as in prison cell, 'cell' as in an aggregation of people, and many more, including some derivative uses (e.g. as a title for games, novels, etc.). No search engine at the moment is capable of disambiguating between these different meanings and filtering results accordingly. The books feed displays a novel by Stephen King and a book about 9/11. The blog feed lists a police break-up of a Nazi cell, many entries about cell phones and one about solar cells. The Digg news feed is again mainly about cell phones. If one of these engines could restrict results to 'cell' in the biological sense only, you could imagine it becoming the next-generation search engine.
And that's exactly what ontologies do: they provide a means of disambiguating syntactically equal but semantically different items. An ontology tells you that there are in fact different concepts (OWL classes), one being a designed artifact, another a part of an organism, yet another a certain aggregation of people, and so on, which all share the same language term 'cell'. Knowing this is already half the work.

You could argue that dictionaries point out the different senses as well, yet by browsing the ontology you have easy access along a variety of axes (horizontal and hierarchical) to a plethora of related concepts: organisms and cell structures for 'cell' in its biological sense, telephony in the case of cell phones, social groups and people in the case of 'cell' as an aggregation of people. You get the picture. When an ontology serves as the backbone for indexing, you might be able to figure out which concept applies by examining the context in which the term is found.
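
To make the "examine the context" idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the concept IDs, the parent classes and the related-term sets are a hand-written toy, not taken from any real ontology, and counting overlapping words is only one naive way to score context.

```python
import re

# Hypothetical toy "ontology": every concept shares the term 'cell' but has
# a different parent class and different neighbouring concepts.
CONCEPTS = {
    "cell/biology": {
        "parent": "part of an organism",
        "related": {"organism", "membrane", "nucleus", "tissue", "protein"},
    },
    "cell/phone": {
        "parent": "designed artifact",
        "related": {"phone", "telephony", "mobile", "carrier", "handset"},
    },
    "cell/group": {
        "parent": "aggregation of people",
        "related": {"group", "members", "police", "organization"},
    },
}


def disambiguate(term, text):
    """Return the concept ID whose related terms overlap most with the text.

    Returns None when no candidate concept shares a single word with the
    context, i.e. when the context gives us nothing to go on.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {
        concept_id: len(concept["related"] & words)
        for concept_id, concept in CONCEPTS.items()
        if concept_id.startswith(term + "/")
    }
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


biology_hit = disambiguate("cell", "The nucleus and membrane of the cell contain protein")
phone_hit = disambiguate("cell", "New mobile phone carrier plans for your cell")
no_context = disambiguate("cell", "Cell")
```

A real system would walk the ontology's actual relations instead of a flat word set, but the principle is the same: the concepts surrounding a term in the ontology are what let you tell one 'cell' from another.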

Tagging and indexing with concepts, as opposed to language terms, might be a core requirement for future search engines.
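
As a rough illustration of what indexing by concept rather than by term could look like, here is a small Python sketch. The `ConceptIndex` class, the concept IDs and the document IDs are all hypothetical names invented for this example; it assumes that some earlier step has already decided which concept each document is about.

```python
from collections import defaultdict


class ConceptIndex:
    """Hypothetical inverted index keyed by concept ID instead of by word."""

    def __init__(self):
        self._docs_by_concept = defaultdict(set)

    def tag(self, doc_id, concept_id):
        # Record that the document is about this concept.
        self._docs_by_concept[concept_id].add(doc_id)

    def search(self, concept_id):
        # Only documents tagged with exactly this concept come back,
        # however many other documents contain the same word.
        return sorted(self._docs_by_concept.get(concept_id, set()))


index = ConceptIndex()
index.tag("post-about-mitochondria", "cell/biology")
index.tag("post-about-new-handsets", "cell/phone")
index.tag("post-about-roaming-fees", "cell/phone")

biology_results = index.search("cell/biology")
phone_results = index.search("cell/phone")
```

All three documents would match a plain keyword search for 'cell', while the concept index returns only the one about biology when asked for `cell/biology`.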
