International Virtual Observatory Alliance


Ontology of Astronomical Object Types Use Cases


Version 1.1

IVOA Technical Note 17 January 2010




This version:

http://www.ivoa.net/Documents/Notes/AstrObjectOntologyUseCases/20100117/

Latest version:

http://www.ivoa.net/Documents/Notes/AstrObjectOntologyUseCases/

Previous version(s):

http://www.ivoa.net/Documents/Notes/AstrObjectOntologyUseCases/20080703/

Editors:

S. Derriere

A. Preite Martinez

A. Richard

Author(s):

    L. Cambrésy – cambresy@astro.u-strasbg.fr

S. Derriere – derriere@astro.u-strasbg.fr

P. Padovani – ppadovan@eso.org

A. Preite Martinez – andrea.preitemartinez@iasf-roma.inaf.it

A. Richard – richard@astro.u-strasbg.fr





Abstract

The Semantic Web and ontologies are emerging technologies that enable advanced knowledge management and sharing. Applied to astronomy, they offer new ways of sharing information between astronomers as well as between machines or software components, and they allow inference engines to reason over an astronomical knowledge base.

This document presents several use-cases exploiting an ontology of astronomical object types. This strongly constrained ontology allows formal reasoning including similarity measurements, concept classification and consistency checking.



Status of this document

This is an IVOA Technical Note for review by IVOA members and other interested parties. It is a draft and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use IVOA Technical Notes as reference materials or to cite them as other than “work in progress”.

A list of current IVOA Recommendations and other technical documents can be found at http://www.ivoa.net/Documents/



Acknowledgments


The Active Galactic Nuclei section of the ontology of astronomical object types was made in collaboration with Paolo Padovani of ESO, the Young Stellar Objects and diffuse matter sections with Laurent Cambrésy of the CDS, and the variable star and emission nebulae sections with Andrea Preite Martinez of INAF-Roma.


INAF and CDS acknowledge support from the VO-TECH design study project.

Contents


Abstract
Status of this document
Acknowledgments
Contents
1 Introduction
2 General Implementation
2.1 OWL API
2.1.1 Former API: Jena framework
2.1.2 Current API: Protégé-OWL API
2.1.3 Possible future API change: OWL API
2.1.4 Ontology models
2.1.5 Easier manipulation of OWL items
2.2 Reasoner
2.3 Graphic visualization
2.3.1 Choice of a visualization engine
2.3.2 A small Java API for Graphviz
2.4 Applications general structure
2.4.1 Implementation as Web Services
2.4.2 External calls
2.4.3 Development Environment
3 Use cases
3.1 Registry query builder
3.1.1 Why an ontology-based query builder?
3.1.2 Conception overview
3.1.3 Implementation choices
3.1.4 The search for keywords
3.1.5 Operation overview
3.2 Registry Resource Finder
3.2.1 From a query builder to a resource finder
3.2.2 Conception overview
3.2.3 Operation overview
3.3 Concept explorer
3.3.1 Conception overview
3.3.2 Web interface overview
3.4 Annotation hierarchy graph
3.4.1 Conception overview
3.4.2 Operation overview
3.4.3 Text output example
3.4.4 SKOS output example
3.5 SIMBAD consistency checker
3.5.1 Why a consistency checker?
3.5.2 Conception overview
3.5.3 Consistency check details
3.5.4 Operation overview
3.6 NED consistency checker
3.6.1 Differences with SIMBAD consistency checker
3.6.2 Conception overview
3.6.3 Operation overview
3.7 SIMBAD Consistency checking through instances
3.7.1 Conception overview
3.8 Keyword mapper
3.8.1 Use-case diagram
3.8.2 Keyword mapper sequence diagram
3.8.3 Web interface overview
4 Perspectives and challenges
Appendix - Changes from previous versions
Glossary
References

1 Introduction


Alongside the development of the ontology of astronomical object types, a series of use-cases taking advantage of this ontology is being implemented to complete the evaluation of the possibilities offered by strongly constrained formal ontologies in astronomy.


Ontologies are structures for representing and formalizing knowledge. They can be used to guarantee the consistency of knowledge shared between humans and machines as well as between machines. Their use ranges from basic classification, in the case of ontologies of primitive concepts, to advanced inference and reasoning, in the case of ontologies of defined concepts.


This possibility of automated consistency checks and inferences is what interests us most. A few ontologies have been built to represent part of astronomical knowledge but, since they lack formal definitions of the concepts, they allow very little reasoning. While this can be sufficient in some cases, it severely limits the applications of the ontology. Though it is much more difficult, we are willing to build such definitions in order to set up a semantic layer that can automate operations usually performed by humans, since it is humans who have the knowledge required to perform them.


To experiment with these possibilities, we are building an ontology of astronomical object types along with some applications. This ontology is initially based on the standardization of object types [1] used in the SIMBAD [2] database. These choices are motivated mainly by the possibilities offered by an astronomical knowledge engine coupled to databases, such as consistency checks of the semantics of database entries or advanced queries.


Last but not least, ontology-based systems depend very little on the evolution of the ontology. This means that when astronomical knowledge evolves, one just has to update the ontology accordingly and the systems exploiting it will take the changes into account, unlike dedicated systems, for which each change can impact the whole system.


This document covers the general implementation choices and the different use-cases considered to this day.



2 General Implementation

2.1 OWL API

2.1.1 Former API: Jena framework

To build applications exploiting the ontology, we need an API that gives direct access to an ontology written in OWL and allows us to manipulate it. Only a few exist, and nearly all of them are based on the Jena framework. Jena is an open-source Java framework for building Semantic Web applications; it provides, among various programming toolboxes, an OWL API.

Since it is reliable, mature and offers good compatibility with most other RDFS/OWL APIs, Jena was our first choice of API for building our applications. We have since switched to the Protégé-OWL API for convenience.


2.1.2 Current API: Protégé-OWL API

A shortcoming of the Jena framework for OWL exploitation comes from its very nature: Jena is a general RDF/RDFS framework and therefore lacks specific primitives for OWL-based applications. The Protégé-OWL API, by contrast, is dedicated to OWL manipulation and provides most of the functions needed to exploit an OWL ontology, which makes programming faster and simpler. Moreover, since this API powers the Protégé ontology editor, it benefits from the same development support as the editor and is not likely to be abandoned any time soon. After weighing the pros and cons of the different APIs, we therefore chose the Protégé-OWL API for our programming needs.

It is worth noting that, since these APIs are Java-based, at least the core of the applications has to be written in Java.

2.1.3 Possible future API change: OWL API

The second version of the OWL API is very promising, since it is completely centered on OWL and seems to perform considerably better than the Protégé-OWL API. We seriously considered porting our prototypes to it, but eventually judged that the API was not yet mature enough. We are still considering it for future work.


2.1.4 Ontology models

The Protégé-OWL API is centered around a model of ontologies. More precisely, the API includes classes that describe every OWL item (concept, property, etc.), and the model is an instantiation of the whole ontology. In this model each OWL item is represented as an instance of its corresponding class.

This ontology model, described by the class OWLModel, is not the only one in existence. Not only does each OWL API have its own model but, more importantly, so does each reasoner. In our implementation we use the OWLModel from the Protégé-OWL API, and the reasoner uses a translation of it into its own optimized format. We never have to manipulate the reasoner's model, since the translation is handled through the DIG protocol.

2.1.5 Easier manipulation of OWL items

On top of the classes for manipulating OWL items that can be found in the Protégé-OWL API, a class called OntoManager has been created, mainly for convenience. Its main role is to allow OWL items to be manipulated by their names instead of their URIs.

It also includes numerous methods that make operations on concepts and properties easier to code by factoring out calls to finer-grained Protégé-OWL methods, so that the prototypes call the methods of OntoManager instead of manipulating Protégé-OWL classes directly.
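The name-based access described above can be illustrated with a minimal sketch. It is not the actual OntoManager class: the class name NameIndex, its methods and the example namespace are hypothetical, and the sketch uses only the Java standard library rather than the Protégé-OWL API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: resolves short OWL item names to their full URIs. */
final class NameIndex {
    private final String namespace;                        // assumed ontology namespace
    private final Map<String, String> uriByName = new HashMap<>();

    NameIndex(String namespace) {
        this.namespace = namespace;
    }

    /** Registers an item by its full URI; the short name is the part after the namespace. */
    void register(String uri) {
        if (!uri.startsWith(namespace)) {
            throw new IllegalArgumentException("URI outside namespace: " + uri);
        }
        uriByName.put(uri.substring(namespace.length()), uri);
    }

    /** Returns the URI registered for a short name, if any. */
    Optional<String> uriOf(String name) {
        return Optional.ofNullable(uriByName.get(name));
    }

    public static void main(String[] args) {
        NameIndex index = new NameIndex("http://example.org/astro#");
        index.register("http://example.org/astro#DoubleStar");
        System.out.println(index.uriOf("DoubleStar").orElse("unknown"));
    }
}
```

With such an index, application code can refer to a concept simply as DoubleStar, and the lookup of the full URI stays in one place.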


2.2 Reasoner

Another very important issue when developing applications on ontologies is the choice of a description logics reasoning engine for the use cases that need reasoning.


Various efficient reasoners are available for description logics. They are based on different description logics; these logics and their implementations are summarized in the following table:

| Reasoner | Tested version | Logic | Implementation | License | Comments |
|---|---|---|---|---|---|
| RACER | 1.7.23 and 1.7.24 | SHRIQ(D) | CommonLISP | free license | discontinued after 1.7.24 (authors went commercial with RacerPro) |
| RacerPro | 1.9 | SHRIQ(D) | LISP | commercial | DIG-only interface is free but not as flexible as the original RACER |
| FaCT++ | 1.1.3 | SHOIQ(D) | C++ | GPL | difficulties with large scale hierarchies |
| Pellet | 1.5 | SHROIQ(D) | Java | MIT | Full support of OWL 1.1 specification, overall performance on par with RacerPro, often faster with the higher complexity description logics |
| Pellet | 2.0rc | SROIQ(D) | Java | AGPL v3 | Support of OWL 1.1, OWL2 EL, OWL2 QL, performance improved from v1.5 for most use-cases and logics, better interfacing |


To determine which is best for our needs, we performed various tests [3]. The tests compared the performance of the different reasoners for the following tasks:

This led to the following

So after starting with Racer 1.7.24 we eventually switched to Pellet since it provided better support and much higher performance.



2.3 Graphic visualization

Although the graph of a complete ontology lacks legibility because of its high number of arcs, being able to get a graphic representation of a sub-graph, such as a local neighborhood, can prove very handy.

2.3.1 Choice of a visualization engine

We needed a visualization engine with fast rendering, good support, a free license and cross-platform capabilities. Though there were interesting Java toolboxes that would have integrated perfectly with our applications' code, we settled on Graphviz and the dot algorithm. Graphviz has all the expected characteristics and is commonly used for ontology visualization through tools like OWLViz, the visualization plugin for the Protégé ontology editor.

2.3.2 A small Java API for Graphviz

The only drawback of using Graphviz is the integration. Since it is not written in Java and its API would have required many native calls, we settled on a small Java handler that communicates with Graphviz. To be exact, we adapted an existing GPL Java Graphviz API [4] to fit our needs.

Basically, this small API contains dedicated methods to easily create Graphviz scripts in DOT language, call Graphviz to generate the corresponding image and optional mapping file and retrieve them respectively as a Java byte array and string.
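The workflow just described (build a DOT script, call Graphviz, get the image back as a byte array) can be sketched as follows. This is an illustration only, not the adapted API itself: the class DotGraph and its methods are hypothetical, the optional mapping file is left out, and renderPng assumes that the Graphviz dot executable is available on the PATH.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a small Graphviz handler. */
final class DotGraph {
    private final List<String> edges = new ArrayList<>();

    /** Adds a directed edge, e.g. from a subsuming concept to a subsumed one. */
    void addEdge(String from, String to) {
        edges.add("  \"" + from + "\" -> \"" + to + "\";");
    }

    /** Builds the script in the DOT language. */
    String toDot() {
        StringBuilder sb = new StringBuilder("digraph G {\n");
        for (String edge : edges) {
            sb.append(edge).append('\n');
        }
        return sb.append("}\n").toString();
    }

    /** Pipes the script to the external dot program and returns the PNG bytes. */
    byte[] renderPng() throws IOException, InterruptedException {
        Process process = new ProcessBuilder("dot", "-Tpng")
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        try (OutputStream stdin = process.getOutputStream()) {
            stdin.write(toDot().getBytes(StandardCharsets.UTF_8));
        }
        ByteArrayOutputStream image = new ByteArrayOutputStream();
        try (InputStream stdout = process.getInputStream()) {
            byte[] buffer = new byte[8192];
            for (int n; (n = stdout.read(buffer)) != -1; ) {
                image.write(buffer, 0, n);
            }
        }
        if (process.waitFor() != 0) {
            throw new IOException("dot exited with an error");
        }
        return image.toByteArray();
    }

    public static void main(String[] args) {
        DotGraph graph = new DotGraph();
        graph.addEdge("Star", "DoubleStar");
        System.out.print(graph.toDot());   // prints the script; rendering needs Graphviz installed
    }
}
```

Running Graphviz as an external process keeps the Java side free of native calls, at the cost of one process launch per rendered graph.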



2.4 Applications general structure

2.4.1 Implementation as Web Services

Since the majority of Virtual Observatory tools are Web Services, we chose this kind of application. As the code is in Java and the reasoning operations often require rather high resources, the Servlet/JSP model was preferred. The servlets run on an Apache Tomcat server.

2.4.2 External calls

Neither Pellet nor Graphviz is integrated in the servlets. Graphviz runs as a stand-alone application and is called by the servlets. For both performance and convenience, Pellet runs as a separate server and communicates using the DIG [5] protocol (a DIG interface implementation is readily available in the Protégé-OWL API).

2.4.3 Development Environment

The implementation itself is done using the Eclipse IDE, with a Tomcat plugin for convenience.



3 Use cases

3.1 Registry query builder

Our first application exploiting the ontology of astronomical object types is a request builder for querying astronomical registries. It was presented at the XXVIth IAU General Assembly (Prague, August 2006) during Special Session 3.

3.1.1 Why an ontology-based query builder?

The goal is to have a tool able to build advanced queries on the VO registry. Since the ontology is one of astronomical object types, the queries are performed on the <subject> element of the Registry schema, which may contain a description of astronomical object types.


The idea of such a tool comes from the limitations of existing registry querying methods. Obviously, when querying on the <subject> field of registry entries, one must use existing keywords in the query in order to have some results. But the following problems arise when considering astronomical object types:


For example, if we consider the registry entries coming from VizieR [6]:

An ontology-based query builder solves the problem of finding adequate keywords through an automated process. Moreover, the process is highly reliable, since it is based on knowledge formalized with the help of experts from the user community. Finally, the tool is independent of the evolution of knowledge: if a new object type were added to the existing list, or an existing one changed, only the ontology would have to be modified accordingly, not the request builder.

3.1.2 Conception overview

3.1.2.1 Use-case diagram

3.1.2.2 Global sequence diagram

3.1.3 Implementation choices

Since the subsumption relationships in the ontology correspond to the knowledge needed to retrieve more specific or more general keywords, our request builder searches through the ontology to automatically propose adequate keywords to the user.

To make that possible, we have used annotations in the ontology. Concepts corresponding to existing <subject> keywords were annotated with those keywords. Currently the VizieR registry keywords have been implemented (cf. vizier:kwd annotations in the ontology).

The general idea is that the request builder is fed a query subject and refines or broadens the query by adding keywords found during a search among the subsumed or subsuming concepts of the ontology, the original query subject being the starting point of the search.

3.1.4 The search for keywords

The search for keywords is done in two steps. First, the subsumed concepts are searched for more specific keywords. If no keyword has been found after this step, a second search is performed, this time to get the most specific subsumer.

3.1.4.1 Activity diagram

3.1.4.2 Practical examples

For the following examples, we consider the VizieR keywords which will be written in red on the graphs.




There is no keyword attached to DoubleStar, but the tool will retrieve the ones attached to SpectroscopicBinary, EclipsingBinary, CataclysmicVariable and Nova and will suggest them to the user.




In this case, the tool suggests the keyword corresponding to XRaySource since it is the closest to the original query subject.
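The two-step search and the two examples above can be reproduced with a small self-contained sketch. Several points are assumptions made for illustration: the toy hierarchy (including the concept XRayBinary), the keyword strings, the class KeywordSearch and its methods are all hypothetical; the downward search is assumed to include the starting concept and to collect the keywords of all subsumed concepts; and the upward search is assumed to stop at the closest level of subsumers carrying a keyword. The real tool works on the OWL ontology and its vizier:kwd annotations.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the two-step keyword search on a toy hierarchy. */
final class KeywordSearch {
    private final Map<String, List<String>> children = new HashMap<>();
    private final Map<String, List<String>> parents = new HashMap<>();
    private final Map<String, String> keyword = new HashMap<>();

    /** Declares that parent subsumes child. */
    void subsumes(String parent, String child) {
        children.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
        parents.computeIfAbsent(child, k -> new ArrayList<>()).add(parent);
    }

    /** Annotates a concept with a registry keyword. */
    void annotate(String concept, String kwd) {
        keyword.put(concept, kwd);
    }

    /** Step 1: keywords of the concept and its subsumed concepts; step 2: closest subsumers. */
    Set<String> suggest(String concept) {
        Set<String> found = collect(concept, children, false);
        return found.isEmpty() ? collect(concept, parents, true) : found;
    }

    /** Level-by-level walk; with stopAtFirstLevel, only the closest annotated level is kept. */
    private Set<String> collect(String start, Map<String, List<String>> links,
                                boolean stopAtFirstLevel) {
        Set<String> result = new LinkedHashSet<>();
        Set<String> seen = new HashSet<>();
        List<String> level = Collections.singletonList(start);
        while (!level.isEmpty()) {
            List<String> next = new ArrayList<>();
            for (String c : level) {
                if (!seen.add(c)) continue;               // already visited
                if (keyword.containsKey(c)) result.add(keyword.get(c));
                next.addAll(links.getOrDefault(c, Collections.emptyList()));
            }
            if (stopAtFirstLevel && !result.isEmpty()) return result;
            level = next;
        }
        return result;
    }

    public static void main(String[] args) {
        KeywordSearch search = new KeywordSearch();
        search.subsumes("DoubleStar", "SpectroscopicBinary");
        search.subsumes("XRaySource", "XRayBinary");      // hypothetical concept
        search.annotate("SpectroscopicBinary", "kw-spectroscopic-binaries");
        search.annotate("XRaySource", "kw-x-ray-sources");
        System.out.println(search.suggest("DoubleStar"));
        System.out.println(search.suggest("XRayBinary"));
    }
}
```

In the first call the keyword comes from a subsumed concept; in the second, no subsumed concept carries a keyword, so the search moves up to the closest annotated subsumer.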


3.1.5 Operation overview
















3.2 Registry Resource Finder

3.2.1 From a query builder to a resource finder

The idea of this application was to capitalize on the first attempts to provide better registry queries with the Registry Request Builder, by extending it to registries other than VizieR and by allowing users to type free text as input instead of having to choose the concept corresponding to the object type they wish to query a registry on.

It mostly consists of a preprocessing step placed before a process similar to that of the Registry Request Builder. The input from the user is matched against annotations in the ontology, and the best match indicates the concept representing the object type to query on. The keywords to use are then retrieved as in the Registry Request Builder, and the query is built.

The main problem to solve was to interpret the input in order to know which object type the user wishes to query on. To do this, the resource finder relies on the following:

As far as querying registries goes, the application is able to query either VizieR or the Astrogrid registry. In fact, the application could even query other services, as long as the keywords for that service are present in the ontology. To show that possibility, the prototype application was made able to query the SAO/NASA Astrophysics Data System [7].


3.2.2 Conception overview

3.2.2.1 Use-case diagram

3.2.2.2 Global sequence diagram

3.2.2.3 Input conversion activity diagram

3.2.2.4 Get registry keywords for concepts activity diagram

The search for keywords is the same as the one used for the Registry Request Builder (fully covered in 3.1.4). It is done in two steps. First, the subsumed concepts are searched for more specific keywords. If no keyword has been found after this step, a second search is performed, this time to get the most specific subsumer.



3.2.3 Operation overview

3.2.3.1 Input query topic

The topic of the query can consist of words and/or keywords. The input is not case-sensitive and does not have to follow a specific word pattern.

3.2.3.2 Select the query

Several possible queries are created from the input words. The user chooses the one to execute on VizieR or Astrogrid (or even on the ADS service, which was added to show that the resource finder can be used with services other than registries).









3.2.3.3 Results screen

3.3 Concept explorer

The idea behind the development of a concept explorer is to have an application that lets the user both:

This use case was explored both to test the performance of OWL APIs and reasoners on basic operations such as ontology manipulation and classification, and to have a Web Service able to give details on concepts and allow some user-friendly browsing, especially since most ontology exploration tools are rather heavy offline programs.

Though developed with the ontology of astronomical object types in mind, the tool requires very few changes to be usable with any ontology, both for concept browsing and concept classification.

3.3.1 Conception overview

3.3.1.1 Use-case diagram

3.3.1.2 Browse concepts sequence diagram

Notes:

3.3.1.3 Add new concept sequence diagram

Note:

If the name chosen by the user for the concept is actually the name of an existing concept of the ontology, the system interprets it as a browsing request centered on this concept (the sequence is then the one presented in 3.3.1.2).

3.3.1.4 Add restriction sequence diagram



Notes:

3.3.2 Web interface overview

3.3.2.1 Concept browsing



Notes on the graph:

3.3.2.2 Adding restrictions

3.4 Annotation hierarchy graph

The annotation hierarchy graph is meant to show hierarchies of keywords with regard to the ontology.

Items of the ontology, usually concepts, can be annotated, and annotations play no part in reasoning. Annotations are therefore a good way of providing additional information on concepts. In particular, they are a convenient way of indicating which keywords in astronomy vocabularies or data sources correspond to a given concept.

It may be interesting to know how these keywords are organized within the ontology. Since they are present as annotations, they follow the same hierarchy as the concepts they annotate, which the application shows as a graph.

This service was further extended with an option to output the graphs of annotations as text, either in the form of a hierarchy or, more interestingly, as SKOS vocabularies as defined by the IVOA recommendation for vocabularies in the VO [9].


3.4.1 Conception overview

3.4.1.1 Use-case diagram

3.4.1.2 Global sequence diagram for graph output




Note:

There are two possible graphs for a given annotation, depending on whether the user wants a graph of only the annotated concepts or of the whole hierarchy of the annotated concepts, including some non-annotated intermediate ones.
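The difference between the two views can be sketched on a toy hierarchy, here rendered as indented text rather than as a graph. The class AnnotationTree, its methods, the concepts and the keyword strings are hypothetical, and the sketch assumes a tree-shaped hierarchy (a concept with several parents would simply be printed under each of them). In the annotated-only view, a non-annotated concept is skipped and its children are attached to the closest annotated ancestor.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: keyword hierarchy derived from an annotated concept hierarchy. */
final class AnnotationTree {
    private final Map<String, List<String>> children = new HashMap<>();
    private final Map<String, String> keyword = new HashMap<>();

    /** Declares that parent subsumes child. */
    void subsumes(String parent, String child) {
        children.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
    }

    /** Annotates a concept with a keyword. */
    void annotate(String concept, String kwd) {
        keyword.put(concept, kwd);
    }

    /** Renders the hierarchy below root as indented text, two spaces per level. */
    String render(String root, boolean annotatedOnly) {
        StringBuilder sb = new StringBuilder();
        walk(root, 0, annotatedOnly, sb);
        return sb.toString();
    }

    private void walk(String concept, int depth, boolean annotatedOnly, StringBuilder sb) {
        boolean shown = !annotatedOnly || keyword.containsKey(concept);
        if (shown) {
            for (int i = 0; i < depth; i++) sb.append("  ");
            // Non-annotated concepts appear in parentheses in the full view.
            sb.append(keyword.getOrDefault(concept, "(" + concept + ")")).append('\n');
        }
        for (String child : children.getOrDefault(concept, Collections.emptyList())) {
            // A skipped concept does not add an indentation level.
            walk(child, shown ? depth + 1 : depth, annotatedOnly, sb);
        }
    }

    public static void main(String[] args) {
        AnnotationTree tree = new AnnotationTree();
        tree.subsumes("Star", "VariableStar");
        tree.subsumes("VariableStar", "Nova");
        tree.annotate("Star", "kw-stars");
        tree.annotate("Nova", "kw-novae");
        System.out.print(tree.render("Star", true));
        System.out.print(tree.render("Star", false));
    }
}
```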


3.4.1.3 Global sequence diagram for SKOS output












3.4.2 Operation overview

3.4.2.1 Selection screen




1. keyword selection




2. output type selection









3. only tagged concepts selection







3.4.2.2 Example graph: only annotated concepts

Note: The annotated concepts are in blue, the others in orange.

3.4.2.3 Example graph: whole annotated concepts hierarchy

3.4.3 Text output example


An alternate output of the graph is indented plain text organized in a tree-like structure. For example, the following is an excerpt of the text output of the VizieR keywords hierarchy, with only the annotated concepts taken into account.




















3.4.4 SKOS output example

Last but not least, this application can be used to output SKOS vocabularies for annotations present in the ontology. These are generated using the tagged concepts and all their relationships.
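As an illustration of what such an export involves, the sketch below writes a keyword hierarchy as SKOS concepts linked by skos:broader. It is not the application's actual output code: the class SkosWriter, the base URI and the keyword strings are hypothetical, the Turtle serialization is chosen here only for brevity, and keywords are assumed to contain no characters that need escaping in a URI or a string literal.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: serializes a keyword hierarchy as SKOS in Turtle syntax. */
final class SkosWriter {
    static final String SKOS = "http://www.w3.org/2004/02/skos/core#";

    /** broaderOf maps each keyword to its broader keyword, or to null for a top concept. */
    static String toTurtle(String base, Map<String, String> broaderOf) {
        StringBuilder sb = new StringBuilder("@prefix skos: <" + SKOS + "> .\n\n");
        for (Map.Entry<String, String> entry : broaderOf.entrySet()) {
            sb.append('<').append(base).append(entry.getKey()).append("> a skos:Concept ;\n")
              .append("    skos:prefLabel \"").append(entry.getKey()).append("\"@en");
            if (entry.getValue() != null) {
                sb.append(" ;\n    skos:broader <").append(base).append(entry.getValue()).append('>');
            }
            sb.append(" .\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> broaderOf = new LinkedHashMap<>();
        broaderOf.put("Stars", null);          // top concept
        broaderOf.put("Novae", "Stars");       // narrower keyword
        System.out.print(toTurtle("http://example.org/kw/", broaderOf));
    }
}
```

The broaderOf map is the kind of structure an annotated-only hierarchy walk would produce: each keyword points to the keyword of its closest annotated subsumer.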





3.5 SIMBAD consistency checker

3.5.1 Why a consistency checker?

The SIMBAD database contains cross-identifications, measurements and bibliography for more than 3.8 million astronomical objects. An object classification is also provided, based on a standardized list of object types that was used as a starting point for the ontology construction. The newer version of SIMBAD (SIMBAD4, released in 2006) adds the possibility for an object to be assigned multiple object classifications.


For a given object, the ontology can be used in a simple way to check the consistency of the cross-identifications:


We can therefore use the knowledge on all object types inferred from the different identifiers of an object, to check if this is consistent with constraints that are present in the ontology, and detect some possibly wrong cross-identifications in SIMBAD.


A typical example would be a single object having one identifier corresponding to an AGB star and another corresponding to a QSO. If AGB star is a subclass of Star, QSO is a subclass of Galaxy, and Star and Galaxy are disjoint in the ontology, then the inconsistency is detected.
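The AGB star / QSO example can be reproduced with a deliberately simplified check. This is only a stand-in for what a description logics reasoner does: it looks at named superclasses and declared disjointness, nothing else. The class DisjointnessCheck and its methods are hypothetical, and the concept names are simplified from the example above.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: detects object types that inherit from disjoint classes. */
final class DisjointnessCheck {
    private final Map<String, Set<String>> parents = new HashMap<>();
    private final Set<String> disjointPairs = new HashSet<>();

    void subClassOf(String child, String parent) {
        parents.computeIfAbsent(child, k -> new HashSet<>()).add(parent);
    }

    void disjoint(String a, String b) {
        disjointPairs.add(a + "|" + b);   // stored in both orders for symmetric lookup
        disjointPairs.add(b + "|" + a);
    }

    /** The concept itself plus all of its direct and indirect superclasses. */
    Set<String> ancestors(String concept) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(concept);
        while (!todo.isEmpty()) {
            String c = todo.pop();
            if (seen.add(c)) {
                todo.addAll(parents.getOrDefault(c, Collections.emptySet()));
            }
        }
        return seen;
    }

    /** True if the given types do not, taken together, inherit from two disjoint classes. */
    boolean consistent(List<String> types) {
        Set<String> all = new HashSet<>();
        for (String type : types) all.addAll(ancestors(type));
        for (String a : all) {
            for (String b : all) {
                if (disjointPairs.contains(a + "|" + b)) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        DisjointnessCheck check = new DisjointnessCheck();
        check.subClassOf("AGBStar", "Star");
        check.subClassOf("QSO", "Galaxy");
        check.disjoint("Star", "Galaxy");
        System.out.println(check.consistent(Arrays.asList("AGBStar", "QSO")));
    }
}
```

A reasoner goes much further, since it also takes property restrictions and defined concepts into account, but the principle of the cross-identification check is the same.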

3.5.2 Conception overview

3.5.2.1 Use-case diagram

3.5.2.2 Global sequence diagram

3.5.3 Consistency check details

To check the consistency of a SIMBAD object with regard to the ontology, the following method is used:

This method requires converting some of a SIMBAD object's properties into ontology restrictions, but in return it allows the full use of the reasoner, including consistency checking and ontology classification.

3.5.4 Operation overview

3.5.4.1 Input objects to check

3.5.4.2 Results screen

3.5.4.3 Draw concept hierarchy screen

The concept hierarchy corresponding to each SIMBAD object checked can be viewed as a graph. The following example corresponds to the entry found inconsistent in the results shown in 3.5.4.2. The SIMBAD object types for that object include * and QSO, translated respectively as StellarObject and High-powerRadio-quietAGN, and a concept cannot inherit from both, hence the inconsistency.




3.6 NED consistency checker

3.6.1 Differences with SIMBAD consistency checker

Following our work on a consistency checker for the SIMBAD database, we adapted the same application for use with the NASA/IPAC Extragalactic Database (NED).

It shows very few differences from the SIMBAD version. In fact, the most important differences are that, unlike the SIMBAD consistency checker:

3.6.2 Conception overview

3.6.2.1 Use-case diagram

3.6.2.2 Global sequence diagram

3.6.3 Operation overview

As stated previously, the NED consistency checker differs from the SIMBAD checker in that it only allows one NED entry to be checked at a time. Also, the data for a given object may differ between SIMBAD and NED, possibly leading to different results.

3.6.3.1 Input objects to check

3.6.3.2 Results screen

3.6.3.3 Draw concept hierarchy screen

3.7 SIMBAD Consistency checking through instances

The latest evolution of the database semantic consistency checking consists in testing consistency not at the T-Box level (concepts) but at the A-Box level (instances).

The overall process is similar to the SIMBAD consistency checking on concepts (3.5), but instead of creating a new concept corresponding to the database entry to check, an instance of existing concepts is created. To allow this, an automated instantiation process was set up.


3.7.1 Conception overview

3.7.1.1 Use-case diagram

3.7.1.2 Global sequence diagram

3.7.2 Specificities of using instances

3.7.2.1 Automated instantiation process and instance consistency checking

Using instances changes very little in the overall process: instead of checking the object described by a SIMBAD entry by inserting a new concept corresponding to it and checking the ontology's consistency, an instance is added directly and checked for consistency with regard to the concepts.


The strategy selected to create and check the instance corresponding to the SIMBAD entry to check is the following:

3.7.2.2 Current limitations

At the moment, neither the Protégé-OWL API nor OWL API v2 makes it possible to ask a reasoner to check the consistency of a modified ABox without reclassifying the TBox first, even though it should be feasible. This directly implies that it is currently impossible to check whether an instance is consistent with regard to the concepts without reclassifying the whole ontology.

Unfortunately, this also means that as long as this problem remains unsolved, the expected performance gain from working on instances is nil. Hopefully, OWL API version 3 coupled with more recent versions of Pellet will allow this operation in the near future.

3.8 Keyword mapper

Following the direction of the annotation, a keyword mapper has been developed. Its goals are, first, to map keywords with regard to the ontology and, second, to output these mappings in various formats so that they can be either used or compared.

This last option is especially interesting, since some keyword mappings already exist: comparing them with the results of the ontology-based mapping may lead to enhancements in the ontology annotations or in the mappings actually used by other applications.
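One possible way of mapping keywords through the ontology is sketched below. The strategy is an assumption made for illustration, not a description of the actual mapper: a source keyword maps exactly to a target keyword when both annotate the same concept, and otherwise to the target keyword of the closest annotated subsumer, flagged as broader. The class KeywordMapper, its methods and the keyword strings are hypothetical, and each concept is assumed to have at most one direct subsumer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: maps keywords of one vocabulary to another via shared concepts. */
final class KeywordMapper {
    private final Map<String, String> parent = new HashMap<>();          // concept -> direct subsumer
    private final Map<String, String> sourceKw = new LinkedHashMap<>();  // concept -> source keyword
    private final Map<String, String> targetKw = new HashMap<>();        // concept -> target keyword

    void subClassOf(String child, String parentConcept) { parent.put(child, parentConcept); }
    void annotateSource(String concept, String kwd) { sourceKw.put(concept, kwd); }
    void annotateTarget(String concept, String kwd) { targetKw.put(concept, kwd); }

    /** Returns one line per source keyword: "source -> target (exact|broader)" or "source -> (none)". */
    List<String> map() {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> entry : sourceKw.entrySet()) {
            String concept = entry.getKey();
            String relation = "exact";
            // Climb the hierarchy until a concept carrying a target keyword is found.
            while (concept != null && !targetKw.containsKey(concept)) {
                concept = parent.get(concept);
                relation = "broader";
            }
            String target = concept == null
                    ? "(none)"
                    : targetKw.get(concept) + " (" + relation + ")";
            lines.add(entry.getValue() + " -> " + target);
        }
        return lines;
    }

    public static void main(String[] args) {
        KeywordMapper mapper = new KeywordMapper();
        mapper.subClassOf("Nova", "CataclysmicVariable");
        mapper.annotateSource("Nova", "src-novae");
        mapper.annotateTarget("CataclysmicVariable", "tgt-cataclysmic");
        mapper.map().forEach(System.out::println);
    }
}
```

Keeping the relation type next to each mapping makes it easier to compare the result with an existing hand-made mapping, where exact and approximate matches are usually not distinguished.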

3.8.1 Use-case diagram

3.8.2 Keyword mapper sequence diagram

Note:

3.8.3 Web interface overview

Here is an example using a version focused on the mapping between Astronomical Data Center and VizieR registry keywords.






4 Perspectives and challenges

The future of this work follows two main directions: continuing to improve the ontology of astronomical object types, and developing applications that exploit it.

Future work includes further developments on the mapping of keywords and the automated construction of an instance base of SIMBAD objects, the main goal being to dramatically improve performance, mainly by lowering the reasoning time.

Appendix - Changes from previous versions


From v1.0 to v1.1:

Glossary


defined concept

Concept which is defined by at least one set of necessary and sufficient conditions


domain (of a property)

Concept to which a property can be applied.


Jena

Java framework for building Semantic Web applications. It provides a programmatic environment for RDF, RDFS, OWL and SPARQL, and includes a rule-based inference engine. Jena is open source and grew out of work with the HP Labs Semantic Web Programme (http://jena.sourceforge.net/).


primitive concept

Concept which is not defined by at least one set of necessary and sufficient conditions.


property (role)

Binary relationship between two concepts or unions of concepts (since you can define a concept as the union of other concepts).


Protégé

Protégé is a WYSIWYG ontology editor developed by Stanford University (http://protege.stanford.edu/). It features a version dedicated to OWL ontologies, Protégé-OWL, which revolves around an API partially compatible with Jena: the Protégé-OWL API.


range (of a property)

Concept where a property takes its value.


subsumption

Relationship between concepts or properties. It can be roughly summarized as a kind of “is a” relationship, meaning that children are more specific than their parents.


References


[Horridge et al., 2004] M. Horridge, H. Knublauch, A. Rector, R. Stevens, C. Wroe A Practical Guide To Building OWL Ontologies Using The Protégé-OWL Plugin and CO-ODE Tools Edition 1.0. University Of Manchester, 2004


[Napoli, 1997] A. Napoli Une introduction aux logiques de descriptions. Rapport de recherche RR 3314, INRIA, 1997.


[Napoli, 2004] A. Napoli Description Logics (DL): general introduction. In : Summer School on Semantic Web and Ontologies, Aussois, June 23, 2004.


[Staab and Studer, 2004] S. Staab and R. Studer Handbook on Ontologies. Springer, Berlin, 2004.


[Uschold and King, 1995] M. Uschold and M. King Towards a Methodology for Building Ontologies. Workshop on Basic Ontological Issues in Knowledge Sharing, held in conjunction with IJCAI-95, 1995.



[1] Objects and object types in SIMBAD refer to a categorization of the nature of astronomical sources, not to objects and types as in object-oriented programming.

[2] http://simbad.u-strasbg.fr/

[3] http://wiki.eurovotech.org/twiki/bin/view/VOTech/InferenceEngineTests

[4] Java Graphviz API by Laszlo Szathmary, 2004

[5] Description logics Implementation Group

[6] http://vizier.u-strasbg.fr/viz-bin/VizieR

[7] http://adsabs.harvard.edu/index.html

[8] i.e. “as an anonymous subsumer of the concept”

[9] http://www.ivoa.net/Documents/latest/Vocabularies.html
