International Virtual Observatory Alliance
 


Virtual Observatory Architecture Overview
Version 1.0

IVOA Note 2004-06-14

Working Group:
http://www.ivoa.net/twiki/bin/view/IVOA/IvoaArchitecture
Author(s):
Roy Williams
Bob Hanisch
Tony Linde
Johnathan McDowell
Reagan Moore
François Ochsenbein
Masatoshi Ohishi
Guy Rixon
Alex Szalay
Doug Tody

Abstract

This document provides a high-level conceptual overview of the IVOA-supported architecture of Virtual Observatories that has emerged over the last few years.

1. Vision

A car is complicated under the hood, but driving it is simple. This analogy is our vision in creating the Virtual Observatory.

In astronomical analysis packages such as IRAF and IDL, astronomers select building blocks, and build up analysis tools from them. In the VO, the methodology will be much the same, except that the building blocks will be web services that may be running on remote computers.

In these packages, there are core components that form a foundation for higher-level functionality, for example, reading a FITS file or coadding images. The VO follows exactly the same model; what is new is the shift to a world of distributed data. The data is so large that it is no longer possible to simply download all of it, and it is difficult to find the relevant data. The data will no longer be copied to a workstation; rather, the computing moves to the data.

Therefore the corresponding core components for the VO are:

We have set out to build the simplest services first, to make it easy for people to publish and discover data, and to use these services.

We hope for a network effect: the paradigm changes when most data is published in VO-compliant form, and most people are using VO protocols. The big data providers are already exposing data through VO protocols, and the VO provides a means for smaller datasets to be published as VO-compliant.

While these changes are occurring "under the hood", the paradigm of building an analysis as connected components will remain much the same -- it's just that the components will now be spread all over the world.

The computational components will be built as "web services" -- a wide-ranging industry standard -- that allows remote functionality to be available as if it were local. These services already carry a "self-description" (WSDL), and the VO projects are adding further astronomy-specific description -- the area of sky that a survey covers for example. These different descriptions of services and their underlying datasets are available through a collection of VO-compliant registries. These registry implementations are built independently, but communicate with each other to present a unified view -- analogous to the Domain Name System (DNS) of the internet.

The rest of this document is a conceptual description of how this vision is achieved in the VO architecture.

Services

As noted above, the architecture of the VO is Service Oriented, meaning that components of the system are defined by the nature of requests and responses to services. Because of this, the description of each service is based on the choice of the protocols for requests and responses, rather than classes and methods. Each service is autonomous, and its boundaries well-defined. Services are inherently distributed, so they can be deployed on any machine that seems optimal.

Data is communicated between services in two basic formats: FITS, which has been an astronomical standard for many years; and XML, a standard syntax for encoding information. In the latter case, the IVOA process allows a newly proposed schema to become a standard through a well-defined community process; successful examples so far are VOTable, for representing tabular data with rich metadata, and VOResource, for describing entities in an IVOA-compliant registry (see below). Future standards will include VOFrame, for space-time coordinate systems, and VORegion, for subsets of the celestial sphere.

Thus IVOA-compliant services are built to exchange messages that can be XML documents and dictionaries (sets of keyword-value pairs), as well as traditional binary formats such as FITS files. An IVOA-compliant service type is defined by the nature of these messages. The community of data providers is encouraged to implement such services, and the community of data consumers is encouraged to build portals that use the service types.

The following diagram shows the essential components, which will be discussed in the rest of this paper.

2. Architecture Overview

The objective of the Virtual Observatory is shown at the top of the figure: to improve and unify access to astronomical data and services for primarily professional astronomers, but also for the general public. The top bar of the figure represents this objective: discovery of data and services, reframing and analysing that data through computation, publishing and dissemination of results, and increasing scientific output through collaboration and federation. The IVOA does not specify or recommend any specific portal or library by which users can access VO data, but some examples of these portals and tools are shown in the grey box.

Different coloured vertical arrows represent the different service types and XML formats by which these portals interface to the IVOA-compliant services. In the IVOA architecture, we have divided the available services into three broad classes:

These services are implemented at various levels of sophistication, from a stateless, text-based request-response, up to an authenticated, self-describing service that uses high-performance computing to build a structured response from a structured request. In the VO, it is intended that services can be used not just individually, but also concatenated in a distributed workflow, where the output of one is the input of another.

The registry services facilitate publication and discovery of services. If a data center (or individual) puts a new dataset online, with a service to provide access to it, the next step would be to publish that fact to a VO-compliant registry. One way to do this is to fill in forms expressing who, where, and how for the service. In due course, registries harvest each other (copy new records) and so the new dataset service will be known to other VO-registries. When another person searches a registry (by keyword, author, sky region, wavelength, etc), they will discover the published services. In this way the VO advances information diffusion to a more efficient and egalitarian system.

In the VO architecture, nobody decides what is good data and what is bad data (although individual registries may impose such criteria if they wish). Instead, we expect that good data will rise to prominence organically, as it does on the World Wide Web. We note that while the web has no publishing restrictions, it is still an enormously useful resource; and we hope the same paradigm will make the VO registries useful.

Each registry has three kinds of interface: publish, query, and harvest. People can publish to a registry by filling in web forms in a web portal, thereby defining services, data collections, projects, organizations, and other entities. The registry may also accept queries in one or more languages (for example an IVOA standard Query Language), so that clients can discover entities that satisfy the specified criteria. The third interface, harvesting, allows registries to exchange information between themselves, so that a query executed at one registry may discover a resource that was published at another.
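The three interfaces can be illustrated with a toy in-memory registry. This is a minimal sketch, not an IVOA interface definition: the class name, the method signatures, the integer timestamps, and the example identifier are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Toy registry with publish, query, and harvest interfaces (illustrative only)."""
    name: str
    records: dict = field(default_factory=dict)  # identifier -> record

    def publish(self, identifier, title, keywords, updated):
        # 'updated' is a plain integer timestamp to keep the sketch simple.
        self.records[identifier] = {
            "title": title,
            "keywords": set(keywords),
            "updated": updated,
        }

    def query(self, keyword):
        # Return the identifiers of all records tagged with the keyword.
        return sorted(i for i, r in self.records.items()
                      if keyword in r["keywords"])

    def harvest(self, other, since):
        # Copy every record that the other registry updated after 'since'.
        for identifier, record in other.records.items():
            if record["updated"] > since:
                self.records[identifier] = dict(record)


# A resource published at registry A becomes discoverable at registry B
# once B has harvested A.
a = Registry("A")
b = Registry("B")
a.publish("ivo://example.org/survey", "Example survey",
          ["quasar", "optical"], updated=10)
b.harvest(a, since=0)
print(b.query("quasar"))
```

A real registry would hold full resource descriptions rather than a title and keywords, and would harvest over a network protocol instead of reading another object's records directly.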

Registry services expect to label each VO resource with a universal identifier, which can be recognized by the initial string ivo://. Resources can contain links to related resources, as well as external links to the literature, especially to the Astrophysics Data System (ADS). The IVOA registry architecture is compliant with digital library standards for metadata harvesting and metadata schema, with the intention that IVOA-compliant resources can appear as part of every university library.

Data services range from simple to sophisticated, and return tabular, image, or other data. At the simplest level (conesearch), the request is a cone on the sky (direction/angular radius), and the response is a list of "objects" each of which has a position that is within the cone. Similar services (SIAP, SSAP) can return images and spectra associated with sky regions, and these services may also be able to query on other parameters of the objects.
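The cone-search selection can be sketched as a filter on angular separation. The catalogue rows, field names, and function names below are hypothetical, and the haversine formula is simply one common way to compute the separation; a production service would normally use a spatial index rather than a linear scan.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the valid range.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(catalog, ra, dec, radius):
    """Return the objects whose position lies within the cone."""
    return [obj for obj in catalog
            if angular_separation_deg(ra, dec, obj["ra"], obj["dec"]) <= radius]


# Hypothetical three-object catalogue.
catalog = [
    {"id": "a", "ra": 10.0, "dec": 20.0},
    {"id": "b", "ra": 10.1, "dec": 20.0},
    {"id": "c", "ra": 50.0, "dec": -5.0},
]
hits = cone_search(catalog, ra=10.0, dec=20.0, radius=0.5)
print([obj["id"] for obj in hits])
```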

The OpenSkyQuery protocol drives a data service that allows querying of a relational database or a federation of databases. In this case, the request is written in a specific XML abstraction of SQL that is part of ADQL (Astronomical Data Query Language).

The IVOA architecture will also support queries written at a more semantic level, including queries to the registry and through data services. To achieve this, the IVOA is developing a structured vocabulary called UCD (Unified Content Descriptor) to define the semantic type of a quantity.

The IVOA expects to develop standards for more sophisticated services, for example for federating and mining catalogs, image processing and source detection, spectral analysis, and visualization of complex datasets. These services will be implemented in terms of industry-standard mechanisms, working in collaboration with the grid community.

Members of the IVOA are collaborating with a number of IT groups that are developing workflow software, meaning a linked set of distributed services with a dataflow paradigm. The objective is to reuse component services to build complex applications, where the services are insulated from each other through well-defined protocols, and therefore easier to maintain and debug. IVOA members also expect to use such workflows in the context of Virtual Data, meaning a data product that is dynamically generated only when it is needed, and yet a cache of precomputed data can be used when relevant.

In the diagram above, the lowest layer is the actual hardware, but above that are the existing data centers, who implement and/or deploy IVOA standard services. Grid middleware is used for high-performance computing, data transfer, authentication, and service environments. Other software components include relational databases, services to replicate frequently used collections, and data grids to manage distributed collections.

A vital part of the IVOA architecture is MySpace, which lets users store data within the VO. MySpace stores files and DB tables between operations on services; it avoids the need to recover results to the desktop for storage or to keep them inside the service that generated them. Using MySpace establishes access rights and privacy over intermediate results and allows users to manage their storage remotely.

3. Web and Grid Services

The IVOA architecture uses services at different levels of sophistication, as illustrated in the Services bar in the figure. These levels are:

The SOAP and Grid services have a self-description method, so that a bootstrap process is possible: simply knowing the location of the service (URL) allows a client to get the WSDL description, which informs the client how to call the service, which in turn allows the client to use the service for its intended purpose. But the WSDL is only part of the necessary information: it specifies only method signatures with datatypes. In the IVOA architecture, we define a VO-compliant web service as one that can also supply a VOResource description of the service, including curation, description, sky region, IVOA identifier, and other information (see Registry section below). This description may be obtained from a registry or from a method of the service itself (as the WSDL is obtained).

Therefore a VO Standard Service Interface is being developed to define the meaning of a VO-compliant service: a set of methods that all such services must implement. The interface can return the VOResource description, and other information such as availability, scheduled downtime, time of last harvesting, etc. Given that three mechanisms are being used to implement the services, the corresponding challenges are how to provide equivalent interfaces across the implementations, how to manage the scale of the data manipulation requests, how to handle public (no charge) access versus use of allocated resources, and how to handle anonymous access versus authenticated access. The choice may be to restrict some implementation requirements (authentication, authorization) to specified services (Grid services).

In addition to authentication and security, the IVOA will also need to decide on a means to deal with long-running compute services, and with events and notification from services.

4. Standard Data Models

Data Models represent a view of an entity as an object in the sense of object-oriented programming; the object can be a subclass of another through inheritance; or the object can be coerced into looking a certain way by implementing a given interface. Data models can be expressed in several semantically-equivalent ways, such as C++ header files, Java interfaces and classes, Unified Modelling Language (UML), or as XML schemata.

Data Models provide the semantic protocol for exchange of queries, metadata, and data between the services and the clients. Registries, data services and compute services will describe data and resources with standard sets of descriptions, so that the same kind of data is described the same way. The data model schemata will provide definitions of the metadata needed for particular kinds of data as well as place that metadata within a standard structure.

The following sections distinguish data structure models and semantic models, reflecting alternate paradigms for building and using these data formats. In the data-structure approach, a generic structure is created, for example a table, array, or dictionary, defined with attention to the specialized astronomical and scientific semantics beyond those needed by the basic computer-science concept. Generic tools can be built that handle the semantics of the data structure, no matter what the astronomy-level semantic content of the data itself. For example, the records of a table could represent quasars, people, or purchase orders; similarly, the keyword-value pairs of a dictionary could represent many things. These structural models can then be refined by restriction: for example a cone-search result object is a table such that every record is located on the sky with coordinates, and every record has an ID attribute.

In the semantic modelling approach, a schema is built corresponding to a well-understood scientific data object, such as a spectral passband, a sky image, or a light curve. Data models can also be built in this way to cover very basic scientific concepts, for example a Quantity is defined in terms of its value, a unit, a semantic type, and an error estimate.
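As an illustration of such a basic model, the Quantity described above might be written as a small class. This is only a sketch: the attribute names are our own, and the semantic-type string is a made-up placeholder rather than an entry from any real vocabulary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Quantity:
    """A value with a unit, a semantic type, and an optional error estimate."""
    value: float
    unit: str
    semantic_type: str  # placeholder label, not taken from a real vocabulary
    error: Optional[float] = None

    def __str__(self):
        err = f" +/- {self.error}" if self.error is not None else ""
        return f"{self.value}{err} {self.unit} [{self.semantic_type}]"


# Hypothetical exposure time of 300 s with a 0.5 s uncertainty.
exposure = Quantity(value=300.0, unit="s",
                    semantic_type="example.exposure_time", error=0.5)
print(exposure)
```

Carrying the unit and semantic type alongside the value addresses the ambiguity that a bare keyword such as "EXPOSURE" leaves open.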

The distinction between the two approaches is fuzzy. For many generic astronomical algorithms (smoothing, filtering, etc.) a semantic description which is rather general and close to the data structure is all the algorithm needs - no need to know whether the thing you are smoothing is an image or a time series - while for more specialized algorithms the detailed properties of the physics do matter.

Data-Structure Modelling

FITS

The venerable FITS format will remain a vital part of astronomical data storage and transmission. Its semantics are simple and generic: sets of keyword-value pairs and blocks of binary data. The VO formats below can be used, for example, to add further metadata to a FITS file, to define a collection of FITS files, or to express the FITS keywords in a different syntax (XML).

Building bridges from FITS to XML allows the use of sophisticated tools developed in the business world: to implement web services, to generate code automatically, and to ease the development of complex data models. In particular, the new VO formats strive to create depth of meaning for metadata. Take for example a FITS keyword such as "EXPOSURE". This may mean that the value is an exposure time, but it is difficult to discover or verify this, or to find out the units. The keyword may not be a time at all; it may be a boolean value (exposure was/wasn't made). In the VO formats, by contrast, we expect metadata to belong to a namespace that can be drilled down to documents and organizational contacts.

VOTable Format

The VOTable format is an XML standard for representing a set of tables, aiming at exchanging properly described data between agents acting in the framework of the Virtual Observatory. In this context, a table is an unordered set of rows, each of a uniform format, as specified in the table metadata. Each row in a table is a sequence of table cells, and each of these contains either a primitive data type, or an array of such primitives.
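The table model just described (uniform rows whose cell format is given by the table metadata) can be sketched with a small reader. The sample document below is a stripped-down illustration using the VOTABLE, RESOURCE, TABLE, FIELD, DATA, TABLEDATA, TR, and TD elements; real documents carry much more metadata and may declare an XML namespace, and this sketch handles only scalar cells of two datatypes.

```python
import xml.etree.ElementTree as ET

# Minimal illustrative document; names and values are made up.
VOTABLE = """<VOTABLE>
  <RESOURCE>
    <TABLE name="demo">
      <FIELD name="id" datatype="int"/>
      <FIELD name="ra" datatype="double" unit="deg"/>
      <FIELD name="dec" datatype="double" unit="deg"/>
      <DATA><TABLEDATA>
        <TR><TD>1</TD><TD>10.0</TD><TD>20.0</TD></TR>
        <TR><TD>2</TD><TD>10.1</TD><TD>20.0</TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""

# Only two datatypes are converted here; anything else stays a string.
CASTS = {"int": int, "double": float}


def read_votable(text):
    """Return the rows of the first table as dictionaries keyed by field name."""
    table = ET.fromstring(text).find("./RESOURCE/TABLE")
    fields = [(f.get("name"), CASTS.get(f.get("datatype"), str))
              for f in table.findall("FIELD")]
    rows = []
    for tr in table.findall("./DATA/TABLEDATA/TR"):
        cells = [td.text for td in tr.findall("TD")]
        rows.append({name: cast(cell)
                     for (name, cast), cell in zip(fields, cells)})
    return rows


rows = read_votable(VOTABLE)
print(rows[0])
```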

VOTable is designed as a flexible storage and exchange format for tabular data, with particular emphasis on astronomical tables. VOTable has built-in features for big-data and Grid computing. It allows metadata and data to be stored separately, with the remote data linked. Processes can then use metadata to "get ready" for their input data, or to organize third-party or parallel transfers of the data. Remote data allow the metadata to be sent in email and referenced in documents without pulling the whole dataset with it: just as we are used to the idea of sending a pointer to a document (URL) in place of the document, so we can now send metadata-rich pointers to data tables in place of the tables themselves. The remote data are referenced with the URL syntax protocol://location, meaning that arbitrarily complex protocols are allowed.

When we are working with very large tables in a distributed-computing environment (the Grid), the data stream between processors, with flows being filtered, joined, and cached in different geographic locations. It would be very difficult if the number of rows of the table were required in the header: we would need to stream the whole table into a cache, compute the number of rows, then stream it again for the computation. In the Grid-data environment, the component in short supply is not the computers, but rather these very large caches! Furthermore, these remote data streams may be created dynamically by another process or cached in temporary storage: for this reason VOTable can express that remote data may not be available after a certain time (expires). Data on the net may require authentication for access, so VOTable allows expression of password or other identity information (the "rights" attribute).

Other Formats

The IVOA architecture will also define some other data-structure formats. One example is the Array structure, representing a subscript-indexed set of voxels in n dimensions, where each voxel value has the same primitive type (e.g., array of float, array of double-complex). Curation metadata is associated with the array itself, as well as with each axis of the array.

The dictionary structure is in such common use that it can be invisible -- it is an unordered set of keyword-value pairs, each of which may also have a comment. A FITS header can be naturally written as a dictionary. The dictionary may contain naked keywords (no associated value), and there may be a list of values associated with a keyword.
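A minimal sketch of this structure follows. The representation (an entry holding a list of values and an optional comment, with an empty list standing for a naked keyword) is our own assumption, and the FILTERS and FLAGGED keywords are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Entry:
    """One dictionary entry: zero or more values plus an optional comment."""
    values: List[Any] = field(default_factory=list)  # empty list = naked keyword
    comment: Optional[str] = None


# A FITS-like header written as a dictionary.
header = {
    "SIMPLE": Entry([True], "conforms to FITS standard"),
    "NAXIS": Entry([2], "number of data axes"),
    "FILTERS": Entry(["g", "r"]),  # hypothetical keyword with a list of values
    "FLAGGED": Entry(),            # hypothetical naked keyword, no value
}

naked = [key for key, entry in header.items() if not entry.values]
multi = [key for key, entry in header.items() if len(entry.values) > 1]
print(naked, multi)
```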

Semantic Modeling

The Data Models Working Group of the IVOA is working on a number of these models. Schemas are under development for, among others:

These semantic models are then mapped to a data structure meta-model such as VOTable, FITS or a specialized XML schema, and thus to a rule for creating serialized instances which can be interoperably exchanged.

In addition to the schema for describing data objects, there are metadata schemas describing protocols of the VO itself. The VOResource data model is used to describe a generic resource, and includes curation information (title, authors, description, date, format, etc), and a unique identifier for the resource. The VOResource may be extended to a model for specific resource types, such as Services, Projects, Organizations, and Datasets.

5. Data Services

A cornerstone of the IVOA architecture is a collection of standard services with well-defined request and response formats, thereby providing a "standard spigot" for astronomical data. The objective is that any IVOA-compliant data consumer can work with any IVOA-compliant data provider.

The scope of the IVOA-compliant Data Services also includes reference software used to implement data access services. Ultimately this will include advanced capabilities such as data subsetting and filtering, data model mediation, and application of server-side analysis components, i.e., for grid computing. Most data access is to virtual data which is computed upon demand. We also include client-side software to demonstrate an end-user analysis capability and to perform end-to-end testing and integration. Ultimately most analysis software will come from the user community, not from IVOA member organizations. Data Services builds upon and integrates VO technology for metadata, data models, data formats, registries, and queries.

The types of science data dealt with by the Data Services potentially include all of the following:

All data services consist of both a query, used to discover and negotiate with a service to determine what data it can provide, and a data access method, used to access the actual data, which is often computed on the fly. Both simple parametric queries and complex query language (parsed) queries are possible: these two cases are discussed in the next two sections. Both types of queries return the same query response table and share the same data access methods.

Simple Access Services

The highest priority goes to object catalogs and 2D sky images, for which prototype data access services are already available. The simplest data spigot is the Cone Search; the request is a point and radius on the sky (a cone), and the response is a VOTable with certain attributes, to ensure that each returned record is associated with a position on the sky.
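A Cone Search request of this kind can be sketched as a simple URL builder. The service address below is a placeholder, and the parameter names RA, DEC, and SR (all in decimal degrees) are our assumption about the Cone Search convention; they are not defined in this note.

```python
from urllib.parse import urlencode


def cone_search_url(base_url, ra_deg, dec_deg, radius_deg):
    """Build a GET request for a cone on the sky (assumed RA/DEC/SR parameters)."""
    query = urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
    return f"{base_url}?{query}"


# http://example.org/cone is a placeholder, not a real service.
url = cone_search_url("http://example.org/cone", 180.0, -30.5, 0.25)
print(url)
```

Fetching this URL from a real service would return a VOTable, which the client then reads to obtain the positions of the matching records.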

The Simple Image Access Protocol (SIAP) is based on a rectangular region on the sky defining an "ideal image" which the service should try to return. Other specifications such as bandpass and scale may be used to refine the query. The response is a VOTable of image descriptors, each containing standard dataset and image metadata, including position of the image center, scale, image format, and an access reference URL that points to the image data itself.

1D spectra can be exposed in a similar way, through the Simple Spectral Access Protocol (SSAP), also based on a cone in the sky with other qualifiers, and returning a VOTable of spectrum descriptors and a URL pointing to the data itself. Variations on SSAP are used to provide access services for SEDs and time series data. The more sophisticated spectral data cubes are best treated as a general type of image. Further spigots can be defined to integrate event list data and visibility data into the VO, although our expectation is that most VO users will be interested in images produced from such data rather than the original data. Such image generation may need to be on-the-fly since there is in general no one best way to produce images from event data or UV data.

OpenSkyQuery and ADQL

The Astronomical Data Query Language (ADQL) allows queries that are roughly equivalent to SQL (Structured Query Language), and will evolve to a wider semantic range of queries (see section 9). ADQL has two equivalent expressions: an SQL-like string, and an XML rendition of the query that can be converted to any one of the vendor-specific dialects of SQL. The XML expression of ADQL (ADQL/x) is recommended in the Virtual Observatories for communications between portals and data servers. The string version of ADQL (ADQL/s) is more suitable for humans reading and writing queries. ADQL includes astronomy-specific extensions, e.g., for proximity queries and for sky regions. ADQL is designed to be the request format of the OpenSkyQuery catalog query protocol.

ADQL servers will be integrated into easy-to-use portals. Typically a Portal queries the registry to find SkyNodes, then interrogates their relational databases with the OpenSkyQuery protocol. A client can first request the table names and descriptions, then request the schema of any of the tables, then build a suitable query to select data. ADQL also allows joins across SkyNodes. Similar actions can be performed through, e.g., a GUI that creates equivalent ADQL scripts. The Portal will formulate a plan and create multiple queries, typically one per archive. The results are then collected, joined, and served to the users.
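The plan described above (one query per archive, then collect and join) can be sketched with in-memory stand-ins for the SkyNodes. Everything here is hypothetical: the node contents, the flux threshold, and in particular the join on a shared object name, which replaces the positional cross-match a real federation would need.

```python
def query_node(node, min_flux):
    # Stand-in for a query sent to a single SkyNode.
    return [row for row in node["rows"] if row["flux"] >= min_flux]


def portal_query(nodes, min_flux):
    """Send one query per archive, then inner-join the results on object name."""
    per_node = {node["name"]: query_node(node, min_flux) for node in nodes}
    joined = {}
    for name, rows in per_node.items():
        for row in rows:
            joined.setdefault(row["obj"], {})[name] = row["flux"]
    # Keep only the objects that every archive returned.
    return {obj: fluxes for obj, fluxes in joined.items()
            if len(fluxes) == len(nodes)}


# Two hypothetical archives with overlapping objects.
optical = {"name": "optical",
           "rows": [{"obj": "A", "flux": 5.0}, {"obj": "B", "flux": 1.0}]}
radio = {"name": "radio",
         "rows": [{"obj": "A", "flux": 7.0}, {"obj": "C", "flux": 9.0}]}

result = portal_query([optical, radio], min_flux=2.0)
print(result)
```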

ADQL will eventually be integrated into all the data services to provide an advanced query capability. Both simple parameter-based queries and query language-based (ADQL) queries will be provided.

As the VO software and infrastructure becomes more complicated it will become increasingly important to provide some reusable VO framework-level software to simplify the job of those putting up services or writing client-side applications. As we move to grid computing it will become necessary to dynamically deploy computational software on grid-enabled computational resources. For this to be feasible we will need some interoperability standards in the areas of computational frameworks and components.

6. Registry

An IVOA-compliant registry provides XML-formatted metadata in response to queries. The metadata may be in different formats for different audiences: for example a librarian may be interested in receiving metadata conforming to standards established by the library community, such as Dublin Core or METS (Metadata Encoding and Transmission Standard). The METS standard provides a framework for defining extension schema to characterize VO specific administrative, descriptive, structural, and behavioral metadata.

The IVOA has published a standard (VOResource) for metadata that is semantically meaningful to astronomers for use in Data Collections, Projects/Organizations, and Services. If a service is one of the IVOA standard types and is associated with a VOResource description, then it is a VO-compliant service.

The VOResource structure is based on Dublin Core, a metadata standard that has been standardized by the library community for describing in rough measure almost any human creation. Metadata elements include Title, Authors, Description, Format, Date, etc. For VOResource, the structure has been extended with data models and schema that describe regions of the sky, coordinate frames, and services.

Services can be of any of the three types listed above (GET/POST, SOAP, Grid). The second and third types of service must describe themselves with a WSDL file or equivalent. In the IVOA architecture, the WSDL file is considered as just a set of metadata, like the METS schema for the librarians. In the business world, tools and workflows are being set up on a foundation of SOAP services and their WSDL descriptors. IVOA-compliant registries are thus available to these more generic tools. An interesting challenge is the merger of the library METS standard with WSDL service description and Archival Information Packages from the preservation community. Through METS extension schema, it should be possible to use the same metadata framework to describe services, catalogs, and preservation environments.

The VOResource description of a VO-compliant service may be exposed through two methods: it can be produced on demand from the service itself, in the same way that the WSDL can be obtained from a SOAP-compliant service, or it may reside in a registry.

IVOA-compliant registries are able to exchange information through a mechanism called harvesting, so that resource metadata known to one may be replicated in others. In this way, a collection of independent registries can become a single virtual registry. The harvesting protocol is OAI-PMH (Open Archives Initiative -- Protocol for Metadata Harvesting). Registries can query each other for the most recently updated records, then copy them. Every IVOA-compliant registry must support OAI: this ensures:

7. Compute Services and Grid

Bulk Access: Parallelism, Aggregation, and Replication

Very high performance applications may wish to work more directly with the data, without being slowed by layers of software, so there is a possibility for direct, bulk access to data stores. The IVOA standard data services would provide the names of files that are to be downloaded, but then bulk access mechanisms would be used for actually working with these files.

The data engineering requirement can be expressed more generically as latency versus granularity management. Sending a million images over a network one at a time is prohibitively costly (takes a very long time). Aggregating images into an appropriately sized container before transmission can decrease the transmission time by orders of magnitude. On current networks, a similar analysis is needed for choosing between serial or parallel data transport. Granularity analysis is also needed to choose how to aggregate files before storage in an archive to minimize the impact on the archive name space. The tools that manage data granularity are embedded in data grid software and accessed through grid services. The conjunction of all of the granularity analyses occurs in the data flow pipelines that manage the processing of catalog records and archive images.
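The latency-versus-granularity trade-off can be made concrete with a simple cost model in which every request pays a fixed latency and the payload moves at a fixed bandwidth. The model and all the numbers below (file size, latency, bandwidth, container size) are illustrative assumptions, not measurements.

```python
def transfer_time_s(n_files, file_bytes, latency_s, bandwidth_bytes_per_s,
                    files_per_container=1):
    """Estimated transfer time: per-request latency plus payload over bandwidth."""
    n_requests = -(-n_files // files_per_container)  # ceiling division
    payload_s = n_files * file_bytes / bandwidth_bytes_per_s
    return n_requests * latency_s + payload_s


# Assumed workload: one million 10 kB images, 0.1 s latency, 100 MB/s link.
one_at_a_time = transfer_time_s(1_000_000, 10_000, 0.1, 100e6)
aggregated = transfer_time_s(1_000_000, 10_000, 0.1, 100e6,
                             files_per_container=10_000)
print(one_at_a_time, aggregated)
```

Under these assumptions the one-at-a-time transfer takes about 100,100 s, almost all of it latency, while 10,000-image containers bring it down to about 110 s.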

Efficiency of data access can be further optimized by replication of well-used datasets, in terms of both geography and protocol. Replication means that the IVOA architecture should support the idea that the same logical data object might have multiple physical copies, so that there is a layer of indirection between the logical name and the physical name. The reason may be to find the closest copy of a dataset to a given compute resource; alternatively, the requirement may be to find a copy of the dataset which is exposed through an optimal protocol.
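The indirection between logical and physical names can be sketched as a lookup in a replica catalogue. The catalogue contents, site names, and the preference rule (site first, then protocol) are hypothetical choices made for this illustration.

```python
# Hypothetical replica catalogue: logical name -> physical copies.
REPLICAS = {
    "survey/field-001": [
        {"url": "ftp://site-a.example.org/field-001.fits", "site": "site-a"},
        {"url": "http://site-b.example.org/field-001.fits", "site": "site-b"},
    ],
}


def resolve(logical_name, preferred_site=None, preferred_scheme=None):
    """Pick a physical copy, preferring a matching site, then a matching protocol."""
    copies = REPLICAS[logical_name]

    def score(copy):
        return (copy["site"] == preferred_site,
                copy["url"].startswith(f"{preferred_scheme}://"))

    # max() returns the first copy among those with the best score.
    return max(copies, key=score)["url"]


print(resolve("survey/field-001", preferred_site="site-b"))
print(resolve("survey/field-001", preferred_scheme="ftp"))
```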

Authentication, Authorization, and MySpace

Authentication is the process by which a client proves who they are, and authorization is the process by which that trusted client gains access. There are many reasons for restricting access to services: fresh data that is sequestered until the instrument team has analyzed it; intensive compute services that are not free; restrictions on storage of temporary data (the MySpace concept); the need to know who is publishing to a registry or service; or security concerns that require restrictions on service deployment.

In the IVOA architecture, there may be layers of indirection between clients and their authorization, through the concepts of communities and groups. Once a client is authenticated to a system, she may be a member of one or more communities, and within each community she may be in one or more groups. A community is the administrative unit for particular research, supported by a Principal Investigator, a description, web page, etc. Within the community, there may be groups that give different permissions to different classes of users.
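The community and group indirection can be sketched as a two-level lookup from client to groups to permissions. The community, group names, users, and permission strings below are invented for illustration; no particular authorization scheme is implied.

```python
# Hypothetical community with three groups and two registered members.
COMMUNITIES = {
    "survey-team": {
        "groups": {
            "anonymous": {"read-public"},
            "members": {"read-public", "read-proprietary"},
            "admins": {"read-public", "read-proprietary", "publish"},
        },
        "membership": {"alice": {"members"}, "bob": {"members", "admins"}},
    },
}


def permissions(user, community):
    """Union of the permissions of every group the user belongs to."""
    c = COMMUNITIES[community]
    # Every client, authenticated or not, is at least in the anonymous group.
    groups = c["membership"].get(user, set()) | {"anonymous"}
    granted = set()
    for group in groups:
        granted |= c["groups"][group]
    return granted


def is_authorized(user, community, action):
    return action in permissions(user, community)


print(is_authorized("bob", "survey-team", "publish"))
print(is_authorized("alice", "survey-team", "publish"))
```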

Formal authentication to a community may not be needed, in which case that client would be in the anonymous group: for example the public web pages of a community would be accessible to anonymous clients. Authentication may be simply a matter of filling in a form with name and address, giving the client a login/password. More secure authentication may be through passwords, public/private keys, X509 or other certificates, or through one-time password keychain devices.

Services: Persistency, Workflow, and Deployment

In previous sections, it has been assumed that services are stateless, so that a request elicits a response, and the service reverts to the same state. A persistent service, however, takes part in a stateful conversation with a client. On the server, the state may be kept in memory or in a database, and it may be accessed either through a session ID or through a customized service created by a service factory. In general, the state associated with persistent services will expire after a certain time.
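As a minimal sketch of the session-ID variant, the class below keeps per-client state in memory and discards it after a fixed lifetime. The class name, the method names, and the choice to pass the current time explicitly (which keeps the example deterministic) are assumptions of this sketch; a real service might equally keep the state in a database.

```python
import uuid


class SessionStore:
    """In-memory state for a persistent service; entries expire after ttl seconds."""

    def __init__(self, ttl):
        self.ttl = ttl
        self._sessions = {}  # session ID -> (expiry time, state dict)

    def create(self, now):
        """Start a conversation and return the session ID the client must present."""
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = (now + self.ttl, {})
        return session_id

    def get(self, session_id, now):
        """Return the mutable state for a session, or raise KeyError if it is gone."""
        expiry, state = self._sessions.get(session_id, (None, None))
        if expiry is None or now >= expiry:
            self._sessions.pop(session_id, None)  # drop expired state
            raise KeyError("unknown or expired session")
        return state
```

In use, a client would call create once, then present the returned ID with each later request until the state expires.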

Services may be connected, with the output of one being the input of another, creating a graph of services. The concept of workflow concerns the management and control of such a graph. Service deployment may mean sending code to a compute resource for execution, or it may mean sending a script that calls astronomical software pre-installed at the grid node, or it may mean temporarily installing an astronomical web service on the grid node.
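A graph of connected services can be sketched with ordinary functions standing in for remote services. The function run_workflow and its two arguments are hypothetical names; the sketch assumes the graph has no cycles and runs each service once its upstream services have produced their output, using the standard-library graphlib module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def run_workflow(services, depends_on):
    """Run every service after the services it depends on.

    services:   name -> callable taking a dict of upstream results
    depends_on: name -> set of upstream service names
    """
    results = {}
    for name in TopologicalSorter(depends_on).static_order():
        upstream = {dep: results[dep] for dep in depends_on.get(name, ())}
        results[name] = services[name](upstream)
    return results


# Hypothetical three-step workflow: search, then cutout, then coadd.
services = {
    "search": lambda up: ["img1", "img2"],
    "cutout": lambda up: [name + "_cut" for name in up["search"]],
    "coadd": lambda up: "+".join(up["cutout"]),
}
depends_on = {"cutout": {"search"}, "coadd": {"cutout"}}
results = run_workflow(services, depends_on)
```

Management and control of the graph, such as retries, monitoring, and choice of compute resource, are exactly what this sketch leaves out.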

The grid-service vision involves the creation and life-cycle of an ad hoc workflow of services by automatic selection of compute resources and deployment of relevant services to these.

The IVOA expects to work closely with the grid community to standardize service descriptions to allow composition and deployment.

Virtual Data

Virtual Data is data that may be computed dynamically from other (possibly virtual) data, or it may be drawn from a cache of previously instantiated data -- which may have been computed in batch fashion. By merging the batch and on-demand paradigms, we create a flexible and efficient scheme that can produce popular and static data products quickly through cached copies, yet also produces uncommon or highly volatile data products through the same user interface, but with computing on demand. The IVOA expects to support this paradigm through creation of standard registry semantics that can express virtual data products and how to build them.
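The merging of cached and on-demand products can be sketched as a small recipe table. The class name, the recipe format, and the decision to cache every product are assumptions of this sketch; a real system would need an eviction or invalidation policy for volatile products.

```python
class VirtualData:
    """Serve a data product from cache, or build it on demand from its recipe."""

    def __init__(self, recipes):
        # recipes: product name -> (function, list of input product names)
        self.recipes = recipes
        self.cache = {}
        self.computed = []  # names of products that actually had to be computed

    def get(self, name):
        if name in self.cache:
            return self.cache[name]  # previously instantiated product
        func, inputs = self.recipes[name]
        # Inputs may themselves be virtual, so they are resolved recursively.
        value = func(*[self.get(dep) for dep in inputs])
        self.cache[name] = value
        self.computed.append(name)
        return value
```

A batch job and an interactive user call the same get method; the only difference is whether the product is already in the cache when the call arrives.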

Preservation

Data engineering is also needed for preservation. Whatever technology is chosen today by the IVO will become obsolete within the next 5 years. This includes the choice of encoding format, the choice of service protocol, the data flow pipelines, and the underlying hardware systems. The IVO needs to be able to guarantee continued access to catalogs, image archives, and processing pipelines across multiple generations of technology. This is typically expressed as forms of infrastructure independence or virtualization mechanisms. Again the mechanisms that enable incorporation of new technology are currently incorporated in data grids.

8. Semantics

The data, compute, and registry services described above are defined formally, in terms of data structures and schemas. It is assumed that a human is the bridge between a semantic question (e.g. "where is there polarized submillimeter data of elliptical galaxies") and the corresponding service request, which may be in ADQL, XQuery, or something else. The human would do a simple keyword search in natural-language descriptions of resources to find the relevant datasets and the services that expose them.

The VO is supporting efforts to extend this semantic search capability in several areas. There is the standard vocabulary of "semantic data types" (UCD, below), as well as ontology-based efforts, collected under the umbrella of the third level of the VO Query Language.

Unified Content Descriptors

The Unified Content Descriptor (UCD) is a formal vocabulary for astronomical data that is controlled by the IVOA. The vocabulary is restricted in order to avoid proliferation of terms and synonyms, and controlled in order to reduce ambiguity as far as possible. It is intended to be flexible, so that it is understandable to both humans and computers. UCDs describe astronomical data quantities, and they are built by combining words from the formal vocabulary.

A UCD description of a quantity does not define the units or name of the quantity, but rather 'what sort of quantity is this?'; for example phys.temperature is a semantic class description of temperature, without implying a particular unit.
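As a sketch of the "simple software for machine use" that a controlled vocabulary permits, the functions below split a UCD into its parts. The sketch assumes the UCD1+ convention in which words are separated by semicolons and the atoms within a word by periods; the function names and the example UCD "phot.mag;em.opt.V" are illustrative, and no check against the official word list is attempted.

```python
def parse_ucd(ucd):
    """Split a UCD into words, and each word into its period-separated atoms."""
    words = [word.strip() for word in ucd.split(";") if word.strip()]
    return [word.split(".") for word in words]


def same_kind(ucd_a, ucd_b):
    """True when two quantities share a primary word, whatever their units or names."""
    return parse_ucd(ucd_a)[0] == parse_ucd(ucd_b)[0]
```

With these, two temperature columns that both carry phys.temperature compare as the same kind of quantity, whatever their units and whatever the columns are named.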

The UCD committee has tried to resist the temptation to allow the UCD syntax to be overly expressive. Every measurement in science has the possibility of essentially infinite description: the people, the instruments, the error analysis, the reasons, the funders, and so on. We have tried to find a way of organizing atomic specifiers (words) so that it is easy to write simple software for machine use, but also possible to write better, more sophisticated software. This organization, in terms of properties and concepts, maps well to knowledge representation methods outside astronomy. We hope to build more sophisticated "intelligent" systems in the future, a project that has come to be called "UCD3".

The major goal of UCD is to ensure interoperability between heterogeneous datasets. We hope that the use of a controlled vocabulary will allow a homogeneous, unambiguous description of concepts that will be shared between people and computers in the IVO. We hope in the future to put more semantic expressiveness into the UCD framework, while always keeping a pragmatic eye on those who would create and use the software that will parse the UCD vocabulary.

VO Query Language Level 3

The highest level of VOQL is a semantics-based language that allows astronomers to build queries in the language of astronomy rather than the language of databases. Efforts with an ontology of units allow queries expressed in one unit to engage resources expressed in another unit. Similarly, astronomical coordinates can be fungible, so that a query in equatorial coordinates can return a resource expressed in galactic coordinates -- but in the correct part of the sky. A similar approach allows federation of spectral data that uses different spectral coordinates.
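The spectral case can be made concrete with a small sketch in which a query posed in wavelength is matched against a resource described in frequency, using the relation frequency = c / wavelength. The function names and the range-overlap test are assumptions of this sketch and are not part of any VOQL specification.

```python
C = 299_792_458.0  # speed of light in m/s (exact by definition)


def wavelength_to_frequency(wavelength_m):
    """Convert a vacuum wavelength in metres to a frequency in hertz."""
    return C / wavelength_m


def query_matches(query_wavelength_range_m, resource_frequency_range_hz):
    """True if a wavelength-range query overlaps a frequency-range resource."""
    short_m, long_m = query_wavelength_range_m
    # A longer wavelength is a lower frequency, so the bounds swap.
    query_low = wavelength_to_frequency(long_m)
    query_high = wavelength_to_frequency(short_m)
    resource_low, resource_high = resource_frequency_range_hz
    return query_low <= resource_high and resource_low <= query_high
```

For example, a submillimeter query from 0.3 mm to 1 mm corresponds to roughly 300 GHz to 1000 GHz, so it would engage a resource registered as covering 300 GHz to 400 GHz.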

This level of semantics, describing the structure of astronomical datasets, interacts with the astronomical semantics provided by the UCD schema to quantify use of astronomical knowledge. For example, a data model to define spectra may specify that a spectrum has an array of data representing an observable quantity and an array of values representing the spectral coordinate. The UCDs associated with an instance of this data model will specify whether that particular spectrum has an observable of flux or surface brightness, and a spectral coordinate of frequency or wavelength. A data model may also represent a higher level resource such as a compute service, in which the input parameters required by a particular class of service such as source detection programs are defined. Again, the values of some data model metadata may be UCDs which describe what kind of parameters are to be returned by the source detection.
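The spectrum example above might be sketched as a pair of small data classes, where the data model fixes the structure and the UCDs say what each array means. The class and field names are hypothetical, and the UCD strings used in the usage note (em.wl, em.freq, phot.flux) are given only as illustrations of the UCD1+ style.

```python
from dataclasses import dataclass


@dataclass
class Axis:
    ucd: str      # what sort of quantity this is
    unit: str     # the unit, which the UCD deliberately does not imply
    values: list  # the array of numbers


@dataclass
class Spectrum:
    spectral: Axis    # spectral coordinate, e.g. frequency or wavelength
    observable: Axis  # observable, e.g. flux or surface brightness

    def __post_init__(self):
        if len(self.spectral.values) != len(self.observable.values):
            raise ValueError("spectral and observable arrays must have equal length")

    def describe(self):
        """Summarize this instance by the UCDs attached to its two arrays."""
        return f"{self.observable.ucd} versus {self.spectral.ucd}"
```

Two instances of the same data model, one built with a wavelength axis tagged em.wl and one with a frequency axis tagged em.freq, share their structure and differ only in the UCDs and units they carry.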