of Astronomical Object Types Use Cases
IVOA Technical Note 17 January 2010
A. Preite Martinez
L. Cambrésy – email@example.com
S. Derriere – firstname.lastname@example.org
P. Padovani – email@example.com
A. Preite Martinez – firstname.lastname@example.org
A. Richard – email@example.com
The Semantic Web and ontologies are emerging technologies which enable advanced knowledge management and sharing. Their application to Astronomy can offer new ways of sharing information between astronomers, but also between machines or software components and allow inference engines to perform reasoning on an astronomical knowledge base.
This document presents several use-cases exploiting an ontology of astronomical object types. This strongly constrained ontology allows formal reasoning including similarity measurements, concept classification and consistency checking.
This is an IVOA Technical Note for review by IVOA members and other interested parties. It is a draft and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use IVOA Technical Notes as reference materials or to cite them as other than “work in progress”.
A list of current IVOA Recommendations and other technical documents can be found at http://www.ivoa.net/Documents/
The Active Galactic Nuclei section of the ontology of astronomical object types was made in collaboration with Paolo Padovani of ESO, the Young Stellar Objects and diffuse matter sections with Laurent Cambrésy of the CDS, and the variable star and emission nebulae sections with Andrea Preite Martinez of INAF-Roma.
INAF and CDS acknowledge support from the VO-TECH design study project.
Alongside the development of the ontology of astronomical object types, a series of use-cases taking advantage of this ontology is being implemented to complete the evaluation of the possibilities offered by strongly constrained formal ontologies in astronomy.
Ontologies are structures for representing and formalizing knowledge. They can be used to guarantee the consistency of knowledge shared between humans and machines as well as between machines. Their use ranges from basic classification in the case of ontologies of primitive concepts to advanced inference and reasoning in the case of ontologies of defined concepts.
This possibility of automated consistency checks and inferences is what interests us most. A few ontologies have been built to represent part of astronomical knowledge, but since they lack formal definitions of their concepts, they allow very little reasoning. While this can be sufficient in some cases, it severely limits the applications of the ontology. Though it is much more difficult, we are willing to build such definitions in order to set up a semantic layer that automates operations usually performed by humans, since it is humans who currently hold the knowledge needed to perform them.
To experiment on these possibilities, we are building an ontology of astronomical object types along with some applications. This ontology is first based on the standardization of object types1 used in the SIMBAD2 database. These choices are motivated mainly by the possibilities offered by an astronomical knowledge engine coupled to databases, like consistency checks of the semantics of the database entries or advanced queries.
Last but not least, ontology-based systems depend little on the evolution of the ontology. When astronomical knowledge evolves, one just has to update the ontology accordingly and the systems exploiting it will take the changes into account, unlike dedicated systems for which each change can impact the whole system.
This document covers the general implementation choices and the different use-cases considered to this day.
To build applications exploiting the ontology, we need an API that gives direct access to an ontology written in OWL and allows us to manipulate it. Only a few exist and nearly all of them are based on the Jena framework. Jena is a Java framework for building semantic web applications. It is open source and provides, among various programming toolboxes, an OWL API.
Since it is reliable, mature and offers good compatibility with most of the other RDFS/OWL APIs, Jena was our first choice of API to build our applications. We have since switched to the Protégé-OWL API for convenience.
Indeed, a shortcoming of the Jena framework for OWL exploitation comes from its very nature: Jena is a general RDF/RDFS framework and therefore lacks specific primitives for OWL-based applications. The opposite is true of the Protégé-OWL API, which is dedicated to OWL manipulation and provides most of the functions needed to exploit an OWL ontology. This results in faster and simpler programming. Moreover, since this API powers the Protégé ontology editor, it benefits from the same development support as the editor and is not likely to be abandoned any time soon. So after considering the pros and cons of the different APIs, the Protégé-OWL API is our final choice for our programming needs.
It is worth noting that since these APIs are Java-based, at least the core of the applications has to be coded in Java.
The second version of the OWL API is very promising since it is completely centered around OWL and seems to yield noticeably better performance than the Protégé-OWL API. We seriously considered porting our prototypes to it but eventually judged that the API was not mature enough to do so. We are considering it for future work though.
The Protégé-OWL API is centered around a model of ontologies. More precisely, the API includes classes that describe every OWL item (concept, property, etc.) and the model is an instantiation of the whole ontology. In this model each OWL item is represented as an instance of its corresponding class.
This ontology model, described by the class OWLModel, is not the only one in existence. Not only does each OWL API have its own model, but more importantly each reasoner has one too. In our implementation we use the OWLModel from the Protégé-OWL API, and the reasoner uses a translation of it into its own optimized format. We never have to manipulate the reasoner's model since the translation is ensured by the DIG protocol.
On top of all the classes for manipulating OWL items which can be found in the Protégé-OWL API, a class called OntoManager has been created, mainly for convenience. Its main role is to allow the manipulation of OWL items by their names instead of their URIs.
It also includes numerous methods that make operations on concept and properties easier to code by factorizing calls of finer Protégé-OWL methods so that the developed prototypes call the methods from OntoManager instead of directly manipulating Protégé-OWL classes.
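The role described for OntoManager (name-based access layered over URI-keyed items) can be illustrated with a minimal sketch. This is not the Protégé-OWL API nor the authors' Java class: the Python class below, its method names and the `http://example.org/astro#` namespace are all hypothetical, chosen only to show the idea of a thin facade.

```python
class OntoManager:
    """Toy name-based facade over URI-keyed concept records (hypothetical)."""

    def __init__(self, namespace):
        self.namespace = namespace   # assumed namespace, e.g. "http://example.org/astro#"
        self._parents = {}           # concept URI -> set of direct subsumer URIs

    def uri(self, name):
        """Expand a short concept name into its full URI."""
        return self.namespace + name

    def add_concept(self, name, parents=()):
        """Register a concept and its direct subsumers, all given by short name."""
        self._parents[self.uri(name)] = {self.uri(p) for p in parents}

    def direct_subsumers(self, name):
        """Return the short names of the direct subsumers of a concept."""
        uris = self._parents[self.uri(name)]
        return sorted(u[len(self.namespace):] for u in uris)


manager = OntoManager("http://example.org/astro#")
manager.add_concept("Star")
manager.add_concept("DoubleStar", parents=["Star"])
print(manager.direct_subsumers("DoubleStar"))
```

Callers only ever handle short names; the URI expansion stays inside the facade, which is the convenience the text attributes to OntoManager.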
Another very important issue when developing applications on ontologies is the choice of a description logics reasoning engine for the use cases that need reasoning.
Various efficient reasoners are available for description logics. They are based on different description logics; the main implementations are summarized in the following table:
discontinued after 1.7.24 (authors went commercial with RacerPro)
DIG-only interface is free but not as flexible as the original RACER
difficulties with large scale hierarchies
Full support of OWL 1.1 specification, overall performance on par with RacerPro, often faster with the higher complexity description logics
Support of OWL 1.1, OWL2 EL, OWL2 QL, performance improved from v1.5 for most use-cases and logics, better interfacing
To determine which is best for our needs, we performed various tests3. The tests compared the performance of the different reasoners for the following tasks:
Checking the consistency of the ontology
Classifying the ontology (i.e. inferring subsumption relationships for both concepts and properties from the constraints on the concepts)
This led to the following conclusions:
Pellet and RacerPro are currently the best reasoners. RacerPro is probably a little more reliable while Pellet has better overall performance when it comes to complex reasoning.
Since RacerPro is commercial, its price and possible incompatibility with some APIs may be a serious problem.
In terms of compatibility with all the existing APIs, Racer 1.7.x is probably the best, while Pellet has the best support and updates (Racer 1.7 being no longer maintained).
RACER 1.7.24 is a debugged revision of RACER 1.7.23. Specifically, it handles properly complex description logics expressions like anonymous concepts as ranges, which RACER 1.7.23 reports as inconsistent.
So after starting with Racer 1.7.24 we eventually switched to Pellet since it provided better support and much higher performance.
Although the complete graph of an ontology lacks legibility because of its high number of arcs, being able to get a graphic representation of a sub-graph, such as a local neighborhood, can prove very handy.
We needed a visualization with fast rendering, good support, free license and cross-platform capabilities. Though there were interesting Java toolboxes which would have allowed perfect integration to our applications' code we settled for Graphviz and the dot algorithm. Indeed, Graphviz has all the characteristics expected and is commonly used for ontology visualization through tools like the OWLViz visualization plugin for the Protégé ontology editor.
The only drawback of using Graphviz is the integration. Since it is not written in Java and its API would have required many native calls, we settled for a small Java handler that communicates with Graphviz. To be exact, we adapted an existing GPL Java Graphviz API 4 to fit our needs.
Basically, this small API contains dedicated methods to easily create Graphviz scripts in DOT language, call Graphviz to generate the corresponding image and optional mapping file and retrieve them respectively as a Java byte array and string.
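As an illustration of this handler pattern (build a DOT script, hand it to Graphviz, get the bytes back), here is a minimal sketch in Python rather than Java. The function names `to_dot` and `render_png` are hypothetical and do not come from the adapted GPL API; the sketch assumes the standard `dot` executable, which reads a script on standard input and writes the rendered output to standard output.

```python
import shutil
import subprocess


def to_dot(edges, highlight=None):
    """Build a DOT script for a subsumption sub-graph.

    edges: iterable of (subconcept, superconcept) pairs.
    highlight: optional concept drawn as a box instead of an ellipse.
    """
    lines = ["digraph neighborhood {", "  rankdir=BT;"]
    if highlight is not None:
        lines.append(f'  "{highlight}" [shape=box];')
    for child, parent in edges:
        lines.append(f'  "{child}" -> "{parent}";')
    lines.append("}")
    return "\n".join(lines)


def render_png(dot_source):
    """Return PNG bytes, or None when the 'dot' executable is not installed."""
    if shutil.which("dot") is None:
        return None
    result = subprocess.run(["dot", "-Tpng"], input=dot_source.encode("utf-8"),
                            stdout=subprocess.PIPE, check=True)
    return result.stdout


script = to_dot([("EclipsingBinary", "DoubleStar"), ("DoubleStar", "Star")],
                highlight="DoubleStar")
print(script)
```

The same call with `-Tcmapx` instead of `-Tpng` would return a client-side image map as text, which corresponds to the optional mapping file mentioned above.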
Since the majority of Virtual Observatory tools are Web Services, we chose this kind of application. The code being Java and the reasoning operations often requiring rather high resources, the Servlet/JSP model was preferred. The servlets run on an Apache Tomcat server.
Neither Pellet nor Graphviz is integrated in the servlets. Graphviz runs as a stand-alone application and is called by the servlets; for both performance and convenience, Pellet runs as a separate server and communicates using the DIG5 protocol (a DIG interface implementation is readily available in the Protégé-OWL API).
The implementation itself is done using the Eclipse IDE and a Tomcat plugin for convenience.
Our first application exploiting the ontology of astronomical object types is a request builder for querying astronomical registries. This was presented at IAU XXVIth GA, Prague 08/2006 during Special Session 3.
The goal is to have a tool able to build advanced queries in the VO registry. Since the ontology is one of astronomical object types, the queries will be performed on the <subject> element of the Registry schema, which may contain a description of astronomical object types.
The idea of such a tool comes from the limitations of existing registry querying methods. Obviously, when querying on the <subject> field of registry entries, one must use existing keywords in the query in order to have some results. But the following problems arise when considering astronomical object types:
Some object types do not have a keyword associated.
More specific keywords are not taken into account in a broader query.
All the keywords have to be selected manually by users who want the best possible query.
For example, if we consider the registry entries coming from VizieR6:
Double stars do not have an associated VizieR keyword, so one cannot query directly on them.
But specific double stars like eclipsing binaries or cataclysmic variables do have associated VizieR keywords.
Moreover, if one queries for the keyword associated with multiple stars, the query result will only be the entries featuring this keyword, which means that multiple stars with specific keywords like cataclysmic variables will be ignored.
There is no tool for automatically retrieving more specific or more general keywords.
An ontology-based query builder solves the problem of finding adequate keywords through an automated process. Moreover, the process is highly reliable since it is based on knowledge formalized with the help of experts from the user community. Finally, the tool is independent from the evolution of knowledge. For example, if a new object type were to be added to the existing list, or an existing one changed, all there is to do is to modify the ontology accordingly, not the request builder.
Since the subsumption relationships in the ontology correspond to the knowledge needed to retrieve more specific or more general keywords, our request builder searches through the ontology to automatically propose adequate keywords for the user to use.
To make that possible, we have used annotations in the ontology. Concepts corresponding to existing <subject> keywords were annotated with those keywords. Currently the VizieR registry keywords have been implemented (cf. vizier:kwd annotations in the ontology).
The general idea is that the request builder is fed a query subject and refines or broadens the query by adding keywords found during a search among the subsumed or subsuming concepts of the ontology, the original query subject being the starting point of the search.
The search for keywords is done in two steps. The first step searches through the subsumed concepts to look for more specific keywords. If no keyword has been found for the query after this step, another search is performed, this time to get the most specific subsumer.
For the following examples, we consider the VizieR keywords which will be written in red on the graphs.
First search: more specific keywords. The user wants to make a query about double stars. Hence the tool searches for VizieR keywords attached to subconcepts of DoubleStar (DoubleStar included).
There is no keyword attached to DoubleStar but the tool will retrieve the ones attached to SpectroscopicBinary, EclipsingBinary, CataclysmicVariable and Nova and will suggest them to the user.
Second search: more general keywords. The user wants to make a query about X-ray Binaries. The tool searches within subconcepts without success. Since XRayBinary does not have an attached keyword either, the tool searches the superconcepts.
In this case, the tool suggests the keyword corresponding to XRaySource since it is the closest to the original query subject.
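The two-step search described above can be sketched as follows. This is a minimal illustration and not the request builder itself: the hierarchy is a toy fragment, the keyword strings are placeholders (the real vizier:kwd values are not reproduced here), and interpreting "most specific subsumer" as the nearest annotated level when walking upwards is an assumption.

```python
# Toy hierarchy: concept -> direct superconcepts (assumed fragment, not the real ontology).
PARENTS = {
    "Star": [],
    "XRaySource": [],
    "DoubleStar": ["Star"],
    "SpectroscopicBinary": ["DoubleStar"],
    "EclipsingBinary": ["DoubleStar"],
    "XRayBinary": ["XRaySource"],
}
# Placeholder keyword annotations attached to some concepts.
KEYWORDS = {
    "SpectroscopicBinary": "kw_spectroscopic_binary",
    "EclipsingBinary": "kw_eclipsing_binary",
    "XRaySource": "kw_xray_source",
}


def subsumed(concept):
    """Return the concept itself plus every concept it subsumes."""
    found, stack = set(), [concept]
    while stack:
        current = stack.pop()
        if current in found:
            continue
        found.add(current)
        stack.extend(c for c, ps in PARENTS.items() if current in ps)
    return found


def suggest_keywords(concept):
    # Step 1: more specific keywords, the starting concept included.
    specific = sorted(KEYWORDS[c] for c in subsumed(concept) if c in KEYWORDS)
    if specific:
        return specific
    # Step 2: walk upwards level by level and stop at the nearest annotated subsumers.
    level, visited = list(PARENTS.get(concept, [])), {concept}
    while level:
        hits = sorted({KEYWORDS[c] for c in level if c in KEYWORDS})
        if hits:
            return hits
        visited.update(level)
        level = [p for c in level for p in PARENTS.get(c, []) if p not in visited]
    return []


print(suggest_keywords("DoubleStar"))
print(suggest_keywords("XRayBinary"))
```

With this toy data the first call returns the two binary-star placeholders (step 1 succeeds), while the second falls through to step 2 and returns the placeholder attached to XRaySource.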
The user selects the astronomical object type on which to query the registry from a list of all the object types represented in the ontology.
The search for keywords is performed.
The request builder outputs a list of suggested keywords corresponding to the wishes of the user.
The request builder builds a query corresponding to the checked boxes and sends it to the registry.
The idea with this application was to capitalize on the first attempts to provide better registry queries with the Registry Request Builder, by extending it to registries other than VizieR and by allowing the user to type free text as input instead of having to choose the concept corresponding to the object type to query a registry on.
It is therefore mostly a preprocessing step placed before a process similar to that of the Registry Request Builder. The idea is to match the input from the user to annotations in the ontology. The best match indicates the concept representing the object type to query on. The keywords used to query the registry are then obtained as in the Registry Request Builder, and the query is built.
The main problem to solve was to interpret the input in order to know which object type the user wished to query on. To do this, the resource finder relies on the following:
The concepts of the ontology are tagged with keywords from various services.
On top of those keywords, the concepts of the ontology are tagged with an annotation named MISCgeneralKeywords which is a collection of words in natural language describing part or all of the object type represented by the concept.
If the input is actually identical to an existing keyword annotating a concept then the annotated concept is the one to base the query on.
Else, break the input into words and try and match them to the content of the MISCgeneralKeywords annotation. The closest match indicates the best concept to base the query on.
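A minimal sketch of this matching rule is given below. The annotation contents are invented placeholders, and "closest match" is interpreted here as the largest number of shared words, which is an assumption; the text does not specify the actual similarity measure used by the resource finder.

```python
# Placeholder service keywords attached to concepts (not real registry keywords).
SERVICE_KEYWORDS = {
    "EclipsingBinary": {"binaries:eclipsing"},
    "Galaxy": {"galaxies"},
}
# Stand-in for the MISCgeneralKeywords annotation: natural-language words per concept.
GENERAL_KEYWORDS = {
    "EclipsingBinary": {"eclipsing", "binary", "star"},
    "XRayBinary": {"x-ray", "xray", "binary"},
    "Galaxy": {"galaxy", "galaxies"},
}


def best_concept(user_input):
    """Return the concept to base the query on, or None when nothing matches."""
    text = user_input.strip().lower()   # the input is not case-sensitive
    # Rule 1: the input is identical to an existing keyword.
    for concept, keywords in SERVICE_KEYWORDS.items():
        if text in {k.lower() for k in keywords}:
            return concept
    # Rule 2: break the input into words and score the overlap with each annotation.
    words = set(text.split())
    scores = {c: len(words & kws) for c, kws in GENERAL_KEYWORDS.items()}
    best = max(scores, key=scores.get)  # ties resolve to the first concept listed
    return best if scores[best] > 0 else None


print(best_concept("Eclipsing binary stars"))
print(best_concept("BINARIES:ECLIPSING"))
```

A production matcher would likely need stemming or plural handling ("stars" does not match "star" here); the sketch only shows the exact-keyword shortcut followed by the word-overlap fallback.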
As far as querying registries goes, the application is able to query either VizieR or the Astrogrid registry. In fact, the application could even query other services as long as the keywords for that service are present in the ontology. To show that possibility, the prototype application was made able to query the SAO/NASA Astrophysics Data System 7.
The search for keywords is the same as the one used for the Registry Request Builder (fully covered in 3.1.4). It is done in two steps: the first step searches through the subsumed concepts to look for more specific keywords. If no keyword has been found for the query after this step, another search is performed, this time to get the most specific subsumer.
The topic of the query can consist of words and/or keywords. It is not case-sensitive and does not expect a specific word pattern.
From the input words, several possible queries are created. The user chooses the one to execute on VizieR or Astrogrid (or even on the ADS service, which was added to prove that the resource finder could be used with services other than registries).
The idea behind the development of a concept explorer is to have an application that lets the user both:
Browse the concepts of the ontology.
Introduce new defined concepts to see where they are classified by the reasoner within the ontology's structure (i.e. infer their subsumers/subsumees).
This use case was explored both to test the performance of OWL APIs and reasoners on basic operations like ontology manipulation and classification, and to have a web service able to give details on concepts and allow some user-friendly browsing, especially since most ontology exploring tools are rather heavy offline programs.
Though developed with the ontology of astronomical object types in mind, the tool requires very few changes to be usable with any ontology, both for concept browsing and concept classification.
get details() refers to getting the restrictions and disjunctions on the concept as well as the asserted subsumers and subsumees.
compute neighborhood() refers to getting the inferred subsumers, subsumees and equivalent concepts.
draw () refers to generating an image graph and a corresponding mapping. In this case the nodes are concepts and the arcs the subsumption hierarchy. The mapping is used as means of visual browsing of concepts.
If the name of the concept chosen by the user is actually the name of an existing concept of the ontology, the system interprets it as a browsing request centered on this concept. (Thus the sequence will be the one presented in 220.127.116.11.)
Restriction components are the property, the quantifier, a cardinality if applicable and the restricted range of the property.
Restriction components are selected through lists of possible choices. The lists are dynamically updated to match the previously chosen components and the structure of the ontology (to enforce that the resulting restriction will be valid).
After all components of a restriction have been chosen, the restriction is added to the ontology model as a condition8 on the currently browsed concept and the concept's subsumers and subsumees are re-inferred to match the new restriction.
If the consistency check fails after adding the new restriction, the restriction is removed from the model and the check failure is reported to the user.
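The add, check and roll back sequence can be sketched as follows. This is a toy model, not the Protégé-OWL API or a description logics reasoner: the consistency test below only detects a minimum cardinality exceeding a maximum cardinality on the same property, and the property name `hasComponent` is hypothetical.

```python
class ConceptModel:
    """Toy stand-in for a concept carrying cardinality restrictions."""

    def __init__(self):
        self.restrictions = []   # list of (property, "min" | "max", cardinality)

    def is_consistent(self):
        """Toy check: on each property, the largest min must not exceed the smallest max."""
        mins, maxs = {}, {}
        for prop, kind, n in self.restrictions:
            if kind == "min":
                mins[prop] = max(mins.get(prop, n), n)
            else:
                maxs[prop] = min(maxs.get(prop, n), n)
        return all(mins[p] <= maxs[p] for p in mins if p in maxs)

    def try_add(self, restriction):
        """Add a restriction; remove it again and report failure if it breaks consistency."""
        self.restrictions.append(restriction)
        if self.is_consistent():
            return True
        self.restrictions.pop()  # roll back so the model stays usable
        return False


model = ConceptModel()
print(model.try_add(("hasComponent", "min", 2)))
print(model.try_add(("hasComponent", "max", 1)))
print(model.restrictions)
```

The second call contradicts the first, so it is rejected and the model is left exactly as it was before the attempt, mirroring the behaviour described above.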
Notes on the graph:
The currently explored concept is represented by a rectangle instead of a bubble.
The defined concepts are colored in orange and the primitive ones in blue.
The annotation hierarchy graph is meant to show hierarchies of keywords with regard to the ontology.
Items of the ontology -usually concepts- can be annotated, annotations having no part in reasoning. Therefore annotations are a good way of providing additional information on concepts. In particular, it is a convenient way of indicating which keywords in astronomy vocabularies or data sources correspond to a given concept.
It may be interesting to know how these keywords are organized within the ontology. Since they are present as annotations, they follow the same hierarchy as the concepts they annotate which the application shows as a graph.
This service was further extended by adding an option to output the graphs of annotations as text, be it in the form of a hierarchy or, more interestingly, as SKOS vocabularies as defined by the IVOA recommendation for vocabularies in the VO9.
There are two possible graphs for a given annotation, depending on whether the user wants a graph of only the annotated concepts or the whole hierarchy of the annotated concepts, including some non-annotated intermediate ones.
Note: The annotated concepts are in blue, the others in orange
An alternate output of the graph is indented plain text organized in a tree-like structure. For example, the following is an excerpt of the text output of the VizieR keywords hierarchy, with only the annotated concepts taken into account.
Last but not least, this application can be used to output SKOS vocabularies for annotations present in the ontology. These are generated using the tagged concepts and all their relationships.
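To illustrate how annotated concepts and their hierarchy could be turned into a SKOS vocabulary, here is a minimal sketch that emits Turtle text. It assumes that non-annotated intermediate concepts are skipped, so that each tagged concept points to its nearest tagged ancestors through skos:broader; the hierarchy, labels and the `http://example.org/vocab#` base URI are placeholders, and this is not the application's actual output format.

```python
# Toy hierarchy and placeholder annotations (DoubleStar is deliberately not annotated).
PARENTS = {
    "Star": [],
    "DoubleStar": ["Star"],
    "EclipsingBinary": ["DoubleStar"],
    "Nova": ["DoubleStar"],
}
LABELS = {"Star": "stars", "EclipsingBinary": "eclipsing binaries", "Nova": "novae"}


def annotated_broader(concept):
    """Nearest annotated ancestors, skipping non-annotated intermediate concepts."""
    result, stack, seen = set(), list(PARENTS[concept]), set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in LABELS:
            result.add(current)
        else:
            stack.extend(PARENTS[current])
    return sorted(result)


def to_skos_turtle(base="http://example.org/vocab#"):
    """Serialize the annotated concepts as a small SKOS vocabulary in Turtle."""
    lines = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
             f"@prefix ex: <{base}> .", ""]
    for concept in sorted(LABELS):
        lines.append(f"ex:{concept} a skos:Concept ;")
        label = f'    skos:prefLabel "{LABELS[concept]}"@en'
        broader = annotated_broader(concept)
        if broader:
            lines.append(label + " ;")
            lines.append("    skos:broader " + ", ".join(f"ex:{b}" for b in broader) + " .")
        else:
            lines.append(label + " .")
    return "\n".join(lines)


print(to_skos_turtle())
```

Because DoubleStar carries no annotation, Nova and EclipsingBinary are linked directly to Star, which corresponds to the "annotated concepts only" variant of the graph.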
The SIMBAD database contains cross-identifications, measurements and bibliography for more than 3.8 million astronomical objects. An object classification is also provided, based on a standardized list of object types that was used as a starting point for the ontology construction. The newer version of SIMBAD (SIMBAD4, released in 2006) adds the possibility for an object to be assigned multiple object classifications.
For a given object, the ontology can be used in a simple way to check the consistency of the cross-identifications:
An object often has multiple identifiers, with each acronym following the Dictionary of Nomenclature of Celestial Objects.
Some of these identifiers are associated with an object type (e.g. if an acronym was created for a paper studying only globular clusters, all objects having such an identifier are globular clusters).
We can therefore use the knowledge on all object types inferred from the different identifiers of an object, to check if this is consistent with constraints that are present in the ontology, and detect some possibly wrong cross-identifications in SIMBAD.
A typical example would be a single object having one identifier corresponding to an AGB star and another corresponding to a QSO. If AGB star is a subclass of Star, QSO is a subclass of Galaxy, and Star and Galaxy are disjoint in the ontology, then the inconsistency is detected.
To check the consistency of a SIMBAD object with regard to the ontology, the following method is used:
A temporary concept is created within the ontology model, with subsumers, subsumees and restrictions matching the properties of the SIMBAD object (in other words, that temporary concept is created so that the SIMBAD object would be an instance of it.)
The temporary concept is checked for consistency.
The temporary concept is destroyed (to avoid interfering with the consistency check of the next SIMBAD object to check.)
This method requires converting some of a SIMBAD object's properties into ontology restrictions, but in return it allows the full use of the reasoner, including consistency checking and ontology classification.
The concept hierarchy corresponding to each SIMBAD object checked can be viewed as a graph. The following example corresponds to the entry found inconsistent in the results shown in 18.104.22.168. Indeed, the SIMBAD object types for that object include * and QSO, translated respectively as StellarObject and High-powerRadio-quietAGN, and a concept cannot inherit from both, hence the inconsistency.
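The create, check and destroy cycle can be sketched on the AGB star / QSO example given earlier. This is a deliberately reduced illustration: a real reasoner evaluates all restrictions of the ontology, whereas the toy check below only looks for a concept inheriting from two disjoint concepts. The concept names follow the example in the text; the dictionary layout and function names are assumptions.

```python
# Toy fragment: concept -> direct subsumers, plus one disjointness axiom.
PARENTS = {
    "Star": [],
    "Galaxy": [],
    "AGBStar": ["Star"],
    "QSO": ["Galaxy"],
}
DISJOINT = [frozenset({"Star", "Galaxy"})]


def inherited_concepts(concept):
    """Return the concept and all of its direct and indirect subsumers."""
    found, stack = set(), [concept]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(PARENTS.get(current, []))
    return found


def check_object(object_types):
    """Create a temporary concept for the database object, test it, then destroy it."""
    temp = "_TemporaryConcept"
    PARENTS[temp] = list(object_types)       # the object would be an instance of this concept
    try:
        inherited = inherited_concepts(temp)
        # Inconsistent as soon as both members of a disjoint pair are inherited.
        return not any(pair <= inherited for pair in DISJOINT)
    finally:
        del PARENTS[temp]                    # never interfere with the next object to check


print(check_object(["AGBStar"]))
print(check_object(["AGBStar", "QSO"]))
```

The `finally` clause plays the role of the third step: the temporary concept is removed whether or not the check succeeds, so each object is tested against an unmodified model.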
Following our work on a consistency checker for the SIMBAD database, we adapted the same application to be able to use it with the NASA/IPAC Extragalactic Database (NED).
It shows very few differences from the SIMBAD version. In fact, the most important differences are that, unlike the SIMBAD consistency checker:
The application queries the NED database.
Only one object can be checked at a time.
Fewer measurements from the database are used for consistency checking, mostly because different measurements are present in the NED and SIMBAD databases.
As stated previously, the NED consistency checker differs from the SIMBAD checker in that it allows checking only one NED entry at a time. Also, the data for a given object may be different in SIMBAD and NED, leading to possibly different results.
The latest evolution of the database semantic consistency checking consisted of not testing the consistency at the T-Box level (concepts) but at the A-Box level (instances).
The overall process is similar to the SIMBAD consistency checking on concepts (3.5), but instead of creating a new concept corresponding to the database entry to check, an instance of existing concepts is created. To allow this, an automated instantiation process was set up.
Using instances changes very little in the overall process: instead of checking the object described by a SIMBAD entry by inserting a new concept corresponding to it and checking the ontology's consistency, here it is directly an instance that is added and checked for consistency with regard to the concepts.
The strategy selected to create and check the instance corresponding to the SIMBAD entry to check is the following:
Get all the concepts corresponding to all the otypes of the SIMBAD entry to check.
Create the instance to check as instance of all these concepts.
Check for consistency.
If consistent, add restrictions on the instance that correspond to the data describing the entry.
Check for consistency.
Destroy the instance
At the moment, neither the Protégé-OWL API nor OWLAPI v2 allows asking a reasoner to check the consistency of a modified ABox without reclassifying the TBox first, even though it should be feasible. This directly implies that it is currently impossible to check whether an instance is consistent with regard to the concepts without reclassifying the whole ontology.
Unfortunately, this also means that while this problem is unsolved, the expected performance gain from working on instances is null. Hopefully, OWLAPI version 3 coupled with the more recent versions of Pellet will allow this operation in the near future.
Following the direction of the annotation hierarchy graph, a keyword mapper has been developed. Its goals are first to map keywords with regard to the ontology and second to output these mappings in various formats so that they can be either used or compared.
This last option is especially interesting since some keyword mappings already exist so comparing them with the results obtained with the ontology-based mapping may lead to enhancements in ontology annotations or the mapping actually used by other applications.
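One way to picture an ontology-based keyword mapping is sketched below. The text does not describe the mapper's actual algorithm, so the rule used here is an assumption: a source keyword maps to the target keyword annotating the same concept, or failing that, to the target keywords of the nearest annotated subsumers. All keyword strings and the hierarchy are placeholders.

```python
# Toy hierarchy and two placeholder keyword vocabularies annotating its concepts.
PARENTS = {"Star": [], "VariableStar": ["Star"], "Nova": ["VariableStar"]}
SOURCE_KW = {"Nova": "src:novae", "Star": "src:stars"}
TARGET_KW = {"VariableStar": "tgt:variable_stars", "Star": "tgt:stars"}


def map_keywords():
    """Map each source keyword to the closest target keywords via the hierarchy."""
    mapping = {}
    for concept, keyword in SOURCE_KW.items():
        level, seen, targets = [concept], set(), []
        # Start at the annotated concept itself, then climb one level at a time.
        while level and not targets:
            targets = sorted({TARGET_KW[c] for c in level if c in TARGET_KW})
            seen.update(level)
            level = [p for c in level for p in PARENTS[c] if p not in seen]
        mapping[keyword] = targets
    return mapping


def as_text(mapping):
    """Render the mapping as plain text, one source keyword per line."""
    return "\n".join(f"{src} -> {', '.join(tgts) or '(none)'}"
                     for src, tgts in sorted(mapping.items()))


print(as_text(map_keywords()))
```

A mapping produced this way can be diffed against an existing hand-made mapping, which is the comparison use mentioned above.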
Mappings can be output on screen as well as to text files.
The current version of the tool is centered on registries but could be parametrized with any kind of keyword.
Here is an example using a version focused on the mapping between Astronomical Data Center and VizieR registry keywords.
The future of this work is divided into two main orientations: continuing to improve the ontology of astronomical object types and developing applications that exploit it.
Future work includes further developments on the mapping of keywords and the automated construction of an instance base of SIMBAD objects, the main goal being to dramatically improve performance, mainly by lowering the reasoning time.
From v1.0 to v1.1:
Registry Resource Finder section added (3.2)
Annotation Hierarchy Graph section updated (3.4)
SIMBAD consistency checker section updated (3.5)
NED consistency checker section updated (3.6)
SIMBAD consistency checker through instances section added (3.7)
Keyword Mapper section updated (3.8)
defined concept
Concept which is defined by at least one set of necessary and sufficient conditions.
domain (of a property)
Concept to which a property can be applied.
Jena
Java framework for building Semantic Web applications. It provides a programmatic environment for RDF, RDFS, OWL and SPARQL and includes a rule-based inference engine. Jena is open source and has grown out of work with the HP Labs Semantic Web Programme. (http://jena.sourceforge.net/)
primitive concept
Concept which is not defined by at least one set of necessary and sufficient conditions.
property
Binary relationship between two concepts or unions of concepts (since a concept can be defined as the union of other concepts).
Protégé
Protégé is a WYSIWYG ontology editor developed by Stanford University (http://protege.stanford.edu/). It features a version dedicated to OWL ontologies, Protégé-OWL, revolving around an API partially compatible with Jena: the Protégé-OWL API.
range (of a property)
Concept where a property takes its value.
subsumption
Relationship between concepts or properties. It can be roughly summarized as a kind of “is a” relationship, meaning that children are more specific than their parents.
[Horridge et al., 2004] M. Horridge, H. Knublauch, A. Rector, R. Stevens, C. Wroe A Practical Guide To Building OWL Ontologies Using The Protégé-OWL Plugin and CO-ODE Tools Edition 1.0. University Of Manchester, 2004
[Napoli, 1997] A. Napoli Une introduction aux logiques de descriptions. Rapport de recherche RR 3314, INRIA, 1997.
[Napoli, 2004] A. Napoli Description Logics (DL): general introduction. In : Summer School on Semantic Web and Ontologies, Aussois, June 23, 2004.
[Staab and Studer, 2004] S. Staab and R. Studer Handbook on Ontologies. Springer, Berlin, 2004.
[Uschold and King, 1995] M. Uschold and M. King Towards a Methodology for Building Ontologies. Workshop on Basic Ontological Issues in Knowledge Sharing, held in conjunction with IJCAI-95, 1995.
1Objects and object types in SIMBAD refer to a categorization of the nature of astronomical sources, not to objects and types as in object-oriented programming.
8i.e. “as an anonymous subsumer of the concept”