Posts Tagged ‘ontology elicitation’

Ontology Elicitation Defined in Encyclopedia of Database Systems

August 17, 2009

My entry (download a pre-publication draft here, with permission of Springer) on Ontology Elicitation will soon be published in the Encyclopedia of Database Systems by Springer. The Encyclopedia, under the editorial guidance of Ling Liu and M. Tamer Özsu, will be a multi-volume, comprehensive, and authoritative reference on databases, data management, and database systems. Since it will be available in both print and online formats, researchers, students, and practitioners will benefit from advanced search functionality and convenient interlinking with related online content. The Encyclopedia’s online version will be accessible on the SpringerLink platform.

De Leenheer, P. (2009) Ontology Elicitation. In Encyclopedia of Database Systems, eds. Liu, L. and Özsu, M. T., Springer, forthcoming Spring 2009.


The Virtue of Naming Concepts

July 16, 2009

Everybody knows the Pizza Ontology, which has been used for ages to demonstrate tools and methods in the Semantic Web community. Nowadays the Beer Ontology is gaining interest, and I wonder how many concept types the Belgian beer namespace will consist of, as there is no clear enumeration of them :-) Anyway, whether we talk about pizza or even about Belgian beers, we are still playing around with small ontologies.

(Too long) names to decontextualise the proliferation of concept types

Seriously, an ontology should refer to context-independent and language-neutral concepts. However, natural language (vocabulary etc.) is still needed to represent these concepts. Wittgenstein once said:

“The limits of my language mean the limits of my world.”

When building large conceptual frameworks of thousands of concept types, the vocabulary is usually exhausted before the work is finished. (By the way, is the job ever finished, given the proliferation of concepts in communities?) Moreover, as in natural language, terms will have different meanings depending on the context. E.g., the term java can refer to coffee, an island, or a programming language. In the latter case we may even doubt whether Java is a subtype or an instance of the concept type programming language. Let’s not explore how deep the philosophical rabbit hole goes here. IMHO, a formal semantic system could introduce a fuzzy parameter that switches between both perspectives.

Now, let’s get back to the ambiguity problem of vocabulary. Lacking better solutions, many of these large ontologies have chosen very long labels to refer to their concepts unambiguously (as the heading above already suggests). Usually, these labels are concatenations of a number of parameters that determine the context of the label. Consider, for example, the IFRS Taxonomy 2009, which is a complete translation of the International Financial Reporting Standards (IFRSs) as of 1 January 2009 into XBRL:

Picture 1

The label for the illustrated concept reads (first take a deep breath):

AdditionsOtherThanThroughBusinessCombinationsCopyrightsPatentsAndOtherIndustrialPropertyRightsServiceAndOperatingRights
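Splitting such a label at its capital letters gives a rough idea of how many terms are packed into one identifier. The following Python sketch assumes the label is plain CamelCase without digits or acronyms; the function name split_label is hypothetical and not part of any XBRL tooling.

```python
import re

# The IFRS label quoted above, split over several lines only for readability.
LABEL = ("AdditionsOtherThanThroughBusinessCombinations"
         "CopyrightsPatentsAndOtherIndustrialPropertyRights"
         "ServiceAndOperatingRights")

def split_label(label: str) -> list[str]:
    """Split a CamelCase label into words (assumes no digits or acronyms)."""
    return re.findall(r"[A-Z][a-z]*", label)

words = split_label(LABEL)
print(len(words))                          # number of terms in one label
print(" ".join(w.lower() for w in words))  # a more readable rendering
```

The label turns out to consist of 17 words, and nothing in the identifier itself tells the reader which of them act as context parameters.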

And this is not a single occurrence. The IFRS taxonomy counts hundreds of concept labels of this length. See for yourself:

Picture 2

This may be fine for the single person who built the ontology and actually chose the labels, but once the ontology is shared, such labels are hard to understand for machines, or even for other users. This creates a vicious circle: long labels are difficult to navigate, so users introduce new concept types because they cannot retrieve what they are looking for. When defining these new concept types, they have no choice but to invent new labels by guesswork, inexorably aggravating the situation.

Categorisation

The same problem arises when people tend to over-categorise. This is an excerpt of a product taxonomy from Kevin Jenkins, posted during a discussion on this matter on SemWeb:

Product (Root Class)
--- software
------ desktop software
--------- desktop internet software
------------ desktop internet access software (individual)
------------ desktop internet browser software (individual)
------------ desktop internet messaging software (individual)
--------- desktop multimedia software
------------ desktop multimedia 3d software (individual)
------------ desktop multimedia audio software (individual)
------------ desktop multimedia video software (individual)
------ internet software
--------- internet saas software
------------ internet saas collaboration software (individual)
------------ internet saas videosharing software (individual)
--------- internet cloud software
------ enterprise software

To differentiate a subtype from its parent, a term is added to the more general label. According to Azamat Abdoullaev, such long classification labels follow the scheme “noun specifying another noun”, as below:

((subsubclass)(subclass(class))): audio multimedia desktop software.

He compares this with URI schemes or computer directory (folder, catalog) names, where the same label would be written as a path from the root of the hierarchy:

software/desktop/multimedia/audio/...
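Both naming schemes can be derived mechanically from the same root-to-leaf classification path. A minimal Python sketch, assuming the path is given as a list of category terms; the function names noun_compound and directory_path are hypothetical.

```python
def noun_compound(path: list[str]) -> str:
    """Most specific term first, root class (head noun) last."""
    return " ".join(reversed(path))

def directory_path(path: list[str]) -> str:
    """Root first, as in URI schemes or folder names."""
    return "/".join(path)

path = ["software", "desktop", "multimedia", "audio"]
print(noun_compound(path))   # audio multimedia desktop software
print(directory_path(path))  # software/desktop/multimedia/audio
```

Either way, every extra level of classification makes the label one term longer, which is how the long labels discussed above come about.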

However, this is not how humans talk to each other. Humans tend to contextualise their concepts through sentences in which they qualify certain attributes, i.e., in terms of facts. E.g., the following example states four facts about a Person:

Person drives Car with Brand “Minerva” and is married to Woman with Name “Athena”.

The fact types used here are:

Person drives Car
Car with Brand
Person married to Woman
Woman with Name
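To make the fact-oriented view concrete, here is a minimal Python sketch that encodes the four binary fact types above and populates them with the example sentence. The class names FactType and Fact, the helper functions, and the instance identifiers person1, car1 and woman1 are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FactType:
    """A binary fact type: head term, role, tail term."""
    head: str
    role: str
    tail: str

@dataclass(frozen=True)
class Fact:
    """An instance of a fact type linking two (hypothetical) individuals or values."""
    fact_type: FactType
    head: str
    tail: str

DRIVES = FactType("Person", "drives", "Car")
CAR_BRAND = FactType("Car", "with", "Brand")
MARRIED = FactType("Person", "married to", "Woman")
WOMAN_NAME = FactType("Woman", "with", "Name")

# The example sentence, decomposed into four facts.
facts = [
    Fact(DRIVES, "person1", "car1"),
    Fact(CAR_BRAND, "car1", "Minerva"),
    Fact(MARRIED, "person1", "woman1"),
    Fact(WOMAN_NAME, "woman1", "Athena"),
]

def verbalise(fact: Fact) -> str:
    """Render a fact as a sentence-like string."""
    ft = fact.fact_type
    return f"{ft.head} {fact.head} {ft.role} {ft.tail} {fact.tail}"

def facts_about(individual: str, facts: list[Fact]) -> list[Fact]:
    """All facts in which the individual plays either role."""
    return [f for f in facts if individual in (f.head, f.tail)]

for f in facts_about("car1", facts):
    print(verbalise(f))
```

The car is characterised by the facts it takes part in rather than by a deeper position in a class hierarchy.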

Hence, using simple fact types we can, in many cases, describe very complex concept types without using categorisation at all. The terms used to refer to the concept types, of course, still need to be disambiguated. There is no deus ex machina here: context is itself a social construct that has to be included in the ontology.

Context as first-class citizen

Context is an indispensable construct when representing ontologies, as I discussed in an earlier publication, particularly when stakeholders in a community use different vocabularies to refer to the same concept types.

In our approach we use a context identifier g to articulate (map) a term t to a concept type identifier c with the following function:

(g,t)->c

Here, c is a URI that refers to a language-neutral and context-independent concept type. It can be represented in the WordNet manner in terms of a gloss (= informal description) plus a synset (= set of synonymous terms). For one of the terms in the above fact types, this would be (based on WordNet):

(drivingfordummies, person) -> (gloss, synset)
gloss = "a human being"
synset = {individual, someone, somebody}

Here we assume that this fact type was extracted from a book called Driving for Dummies. By keeping track of the elicitation context g of every fact type, we can disambiguate the involved terms properly without the need for very long labels.
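The articulation function (g, t) -> c can be sketched as a lookup table keyed on (context, term) pairs. In this minimal Python sketch, the example.org URIs, the contexts coffeeshopmenu and softwarecatalogue, and the glosses of the two java senses are hypothetical; only the person entry follows the WordNet-based example above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Concept:
    """A language-neutral, context-independent concept type (hypothetical URIs)."""
    uri: str
    gloss: str
    synset: frozenset[str]

PERSON = Concept("http://example.org/concept/person", "a human being",
                 frozenset({"individual", "someone", "somebody"}))
COFFEE = Concept("http://example.org/concept/coffee",
                 "a beverage made from roasted coffee beans",
                 frozenset({"coffee", "java"}))
JAVA_LANGUAGE = Concept("http://example.org/concept/java-language",
                        "an object-oriented programming language",
                        frozenset({"java"}))

# (g, t) -> c: the same term resolves to different concepts in different contexts.
ARTICULATION: dict[tuple[str, str], Concept] = {
    ("drivingfordummies", "person"): PERSON,
    ("coffeeshopmenu", "java"): COFFEE,
    ("softwarecatalogue", "java"): JAVA_LANGUAGE,
}

def articulate(context: str, term: str) -> Concept:
    """Resolve a term within its elicitation context; raises KeyError if unknown."""
    return ARTICULATION[(context, term)]

print(articulate("drivingfordummies", "person").gloss)  # a human being
print(articulate("coffeeshopmenu", "java").uri)
print(articulate("softwarecatalogue", "java").uri)
```

The short term java is enough in each context, because the context identifier rather than the label carries the disambiguating information.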

Further Reading

In my PhD, I developed a methodology that enables a community to collaboratively construct an ontology architecture consisting of several layers (upper common, lower common, and stakeholder level):

  • The top layer refers to language-neutral and context-independent concepts that are already agreed and applied by the community.
  • The lowest stakeholder layer consists of “stakeholder perspectives” on these upper layers, specialising the upper layer with locally relevant concept types represented by local vocabularies.
  • Gradually, these lower perspectives are reconciled in the lower common layer, and when a new version is produced, parts of it are promoted to the upper common layer.

Hence the community has to agree not only on the concept types (glosses) but also on the preferred terms (synsets) used to refer to these concept types.
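The interplay between the layers can be illustrated with a toy Python sketch. It assumes a deliberately simple reconciliation rule that is not the methodology itself: a concept type (identified here by its gloss) moves from the stakeholder perspectives into the lower common layer once a hypothetical quorum of stakeholders uses it, and the terms they use for it are collected as its synset. The stakeholder names, data, and function names are hypothetical.

```python
from collections import Counter, defaultdict

# Stakeholder layer: each (hypothetical) stakeholder maps local terms to glosses.
perspectives = {
    "insurer": {"person": "a human being", "car": "a motor vehicle"},
    "garage": {"person": "a human being", "car": "a motor vehicle", "java": "coffee"},
    "dealer": {"individual": "a human being", "car": "a motor vehicle"},
}

def reconcile(perspectives: dict[str, dict[str, str]],
              quorum: int) -> dict[str, set[str]]:
    """Lower common layer: glosses used by at least `quorum` stakeholders,
    each with the set of terms (synset) used for it."""
    support: Counter = Counter()
    synsets: defaultdict = defaultdict(set)
    for view in perspectives.values():
        for term, gloss in view.items():
            synsets[gloss].add(term)
        support.update(set(view.values()))  # count each stakeholder once per gloss
    return {gloss: synsets[gloss] for gloss, n in support.items() if n >= quorum}

def promote(upper: dict[str, set[str]],
            lower: dict[str, set[str]]) -> dict[str, set[str]]:
    """New version of the upper common layer, extended with the lower common layer."""
    new_version = {gloss: set(terms) for gloss, terms in upper.items()}
    for gloss, terms in lower.items():
        new_version.setdefault(gloss, set()).update(terms)
    return new_version

lower = reconcile(perspectives, quorum=2)
upper_v2 = promote({}, lower)
print(upper_v2)
```

The quorum merely stands in for what, in the text above, is a community agreement on both the gloss and the preferred terms.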

Ontology Elicitation in Springer Encyclopedia of Database Systems

January 15, 2009

The Encyclopedia of Database Systems (edited by Ling Liu and M. Tamer Özsu) will be a comprehensive reference to topics in database systems for students, researchers and practitioners who need a quick and authoritative reference to the subject of databases, data management, and database systems, such as basic concept definitions, data processing algorithms, key results to date, and references to source materials. The encyclopedia will feature an alphabetical organization of nearly 1000 entries, covering both topics of current interest and key research results of historical significance in all the main areas of database systems.

Publication by Springer is planned for April 2009; the Encyclopedia of Database Systems will be available as a printed volume and an online reference work.

I was invited to write the entry for Ontology Elicitation. Ontology elicitation embraces the family of methods and techniques to explicate, negotiate, and ultimately agree on a partial account of the structure and semantics of a particular domain, as well as on the symbols used to represent and apply this semantics unambiguously. Ontology elicitation only results in a partial account because the formal definition of an ontology cannot completely specify the intended structure and semantics of each concept in the domain, but at best can approximate it. Therefore, the key to scalability is to reach the appropriate amount of consensus on relevant ontological definitions through effective and efficient meaning negotiation. In this entry we give definitions, historical background, scientific fundamentals, key applications, and finally future directions for ontology elicitation.

This reference is designed to address the needs of a wide audience including researchers, graduate and undergraduate students, and other professionals and practitioners who might need speedy and reliable information in the databases, data management, and database systems subject area. We expect many to benefit from this reference, including database specialists, software developers, scientists and engineers who need to deal with large (structured, semi-structured or unstructured) datasets. In addition, database and data mining researchers and scholars in the many areas that apply database technologies, such as artificial intelligence, software engineering, robotics and computer vision, machine learning, finance and marketing, are expected to benefit from the encyclopedia.

This home page is being updated continuously during the course of this project. The Editors-in-Chief and the advisory board value feedback from the database community concerning every aspect of the Encyclopedia of Database Systems.

De Leenheer, P. (2009) Ontology Elicitation. In Encyclopedia of Database Systems, eds. Liu, L. and Özsu, M. T., Springer, forthcoming Spring 2009.