VOTable issues for possible discussion at the Inter-operability meeting.
Clive Davenhall
acd at roe.ac.uk
Wed Apr 30 09:54:12 PDT 2003
29/4/03.
Dear Colleague,
Next week I shall be presenting a paper about the VOTable at the `XML
Europe 2003' conference in London (http://www.xmleurope.com/2003/about.asp).
I believe that a version of the paper will be available from the
conference Web site (in XML!) and I also hope to make it available in a
more common format. In the meantime, the paper had a section on `Open
Issues' for the VOTable format, and I thought that these might make a
useful contribution to the VOTable discussions at the forthcoming
IVOA Inter-operability meeting in Cambridge. The `open issues' that I
came up with are:
* compatibility between VOTables and FITS,
* is the VOTable likely to be used as a storage format, as well as an
interchange format?
* the efficient representation of large or binary tables,
* should the <LINK> element be included in the formal standard?
* using the VOTable for tables other than catalogues of astronomical
objects; particularly handling the coordinate systems used in Solar
and Solar-Terrestrial Physics tables,
* including hyper-text and mark-up in the <DESCRIPTION> tag,
* defining semantic rather than syntactic standards.
These points are expanded on below. The text is largely taken from the
paper, so it might read a little oddly as it was originally intended for
a non-astronomical audience.
* The FITS format is widespread in astronomy and is likely to remain
so. Consequently, it is important to be able to continue to access
FITS files. The VOTable was deliberately designed so that, with a
few minor exceptions that are unlikely to be important in practice, any
FITS binary table can be converted to a VOTable without loss of
information. The converse, however, is not true.
One respect in which the two formats differ is that the VOTable is a
streaming format whereas FITS tables are not. The reason is that in
a FITS table a keyword specifying the number of rows in the table must
be included in header information at the front of the table.
Consequently a server transmitting a FITS table must have access to the
complete table, so it can insert the number of rows in the header
before it can start transmitting the table. In contrast, a VOTable
does not record the number of rows in the table.
Consequently a server can start transmitting a VOTable whilst the
query creating the selection from which the VOTable is generated is
still in progress. It remains to be seen whether this distinction
proves important in practice.
* The VOTable was invented as an interchange format for selections
extracted from a catalogue and returned to a remote client via the
Internet. However, experience with other formats, most notably
FITS but also TSV (Tab-Separated Value), indicates that VOTable will
probably also be used as a storage format (FITS began as an interchange
format; remember that the acronym stands for `Flexible Image Transport
System'). Many data archives now use FITS as their storage format.
* Many astronomical tables are large and the subsets extracted
from them as a result of selections can also be large. Representing a
table as a sequence of characters using the <TABLEDATA> element increases
its size enormously compared to a binary representation. This increase
in size is acceptable for small tables, but for large ones can be a
serious problem. Moreover, the problem appears irrespective of whether
the VOTable is being used as a transport format (there are more bytes to
move) or as a storage format (the files are larger). The mechanisms
invented in the VOTable to circumvent this problem, the <FITS> and
<BINARY> tags, are something of a compromise and not really in the
spirit of XML. Also, if the table is stored as a separate file in FITS
or binary format then there is always the possibility that the files
will become separated and one or the other will be lost. In the future
we are likely to develop XML-based formats for retrieving `bulk data'
such as images and spectra and in these cases having an efficient
representation of binary numeric data is even more important than it is
for catalogues.
* An appendix to the VOTable standard described a <LINK> element, which
was not part of version 1.0 but which might be included in future
revisions of the standard. This element allows, amongst other uses,
columns to be created (projected in the jargon of relational databases)
on-the-fly when a table is read. The <LINK> tag is potentially a
powerful feature and we need to decide whether it should be included in
future versions of the format.
* Although the VOTable is primarily intended for representing
catalogues of astronomical objects, it should also be capable of
representing other sorts of astronomical tables, such as atomic line
lists, X-ray event lists and the tables encountered in related
disciplines such as Solar Physics and Solar-Terrestrial Physics. The
VOTable is sufficiently flexible to handle these requirements, but for
Solar and Solar-Terrestrial work the additional coordinate systems used
in these disciplines will need to be supported. In version 1.0 the
coordinate systems allowed are specified in the DTD, so adding a new
coordinate system requires a major revision of the format.
* Currently the <DESCRIPTION> element can contain only plain text. Future
versions should be able to include HTML, or a subset thereof, with
hyper-links to external URLs.
* The VOTable, FITS and similar formats such as TSV, are basically
syntactic rather than semantic standards. That is, they are mostly
concerned with how to represent items of information and do not ascribe
meaning to particular items of information. As a result, columns and
parameters representing similar quantities will appear with different
names in different catalogues. For the level of inter-operability
envisaged for the VO, with automatic identification of catalogues
relevant to some query, it will be necessary to assign standard
quantities with agreed meanings to catalogues. The CDS's UCDs for
classifying columns are a very important step in this regard. At a
higher level, astronomy also has a thesaurus of agreed terms (the
`IAU Thesaurus'). There are similar problems with the units in which
quantities are stored in catalogues. Though there are recommended
standard units, a wider range of un-standardised units is encountered
in practice. (These deficiencies have ramifications in the wider VO,
beyond the VOTable.)
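Two of the points above, streaming and the size cost of character data,
can be illustrated with a short Python sketch. It is only an illustration:
the column names RA and DEC, the helper names (query_rows, stream_votable,
fits_row_count_card) and the choice of two double-precision columns are
assumptions made for the example, and the output has not been validated
against the VOTable DTD.

```python
import struct


def query_rows(n):
    """Hypothetical stand-in for a catalogue query that yields rows lazily."""
    for i in range(n):
        yield (i * 0.25, -i * 0.125)


def stream_votable(rows):
    """Yield a VOTable as text chunks; the row count is never needed."""
    yield ('<?xml version="1.0"?>\n'
           '<VOTABLE version="1.0">\n<RESOURCE>\n<TABLE>\n'
           '<FIELD name="RA" datatype="double" unit="deg"/>\n'
           '<FIELD name="DEC" datatype="double" unit="deg"/>\n'
           '<DATA>\n<TABLEDATA>\n')
    for ra, dec in rows:  # each row is sent as soon as the query produces it
        yield f"<TR><TD>{ra!r}</TD><TD>{dec!r}</TD></TR>\n"
    yield "</TABLEDATA>\n</DATA>\n</TABLE>\n</RESOURCE>\n</VOTABLE>\n"


def fits_row_count_card(rows):
    """A FITS binary table header needs NAXIS2 (the row count) up front,
    so the whole selection must be collected before anything is sent."""
    collected = list(rows)
    card = f"{'NAXIS2':<8}= {len(collected):>20}"
    return card, collected


if __name__ == "__main__":
    n = 1000
    text_bytes = sum(len(chunk.encode("utf-8"))
                     for chunk in stream_votable(query_rows(n)))
    binary_bytes = n * struct.calcsize(">dd")  # two big-endian doubles per row
    print("TABLEDATA text:", text_bytes, "bytes")
    print("packed binary: ", binary_bytes, "bytes")
    print(fits_row_count_card(query_rows(n))[0])
```

The generator can emit its header before the query has produced a single
row, whereas the FITS-style helper has to exhaust the query first. The
byte counts show the overhead of the character representation for even a
two-column table.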
An additional issue, which I didn't mention in the paper, is how WCS
information is to be represented in VOTables (though I suppose that this
is a special case of defining semantic standards). I think that there
are two possible approaches:
* simply adopt the FITS definitions `as is', but replacing the FITS
keywords with VOTable PARAMs,
* take an approach similar to Starlink's AST library, which is less
prescriptive than the FITS scheme and allows greater freedom and
flexibility in defining the transformation between pixel positions and
world coordinates. AST describes a WCS as an arbitrary collection of
coordinate frames with interconnecting mappings. These frame and
mapping objects can be imported and exported as a set of structured
FITS header cards, or as standard FITS-WCS header cards, or in other
forms such as XML. AST currently supports many different celestial and
spectral coordinate systems, including automatic conversion between
related systems. Similar support for a range of temporal coordinate
systems is currently being added. On the assumption that AST will be
relatively unfamiliar I include some more information on it below.
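The first approach can be sketched in Python as follows. This is a minimal
sketch rather than an agreed convention: carrying each FITS keyword over as
a PARAM whose name is the keyword itself is an assumption made for
illustration, as are the helper name wcs_cards_to_params and the sample
values.

```python
import xml.etree.ElementTree as ET


def wcs_cards_to_params(cards):
    """Turn FITS-WCS keyword/value pairs into VOTable PARAM elements.

    Assumption: one PARAM per keyword, named after the keyword.
    """
    params = []
    for keyword, value in cards.items():
        if isinstance(value, str):
            type_attrs = {"datatype": "char", "arraysize": "*"}
        elif isinstance(value, int):
            type_attrs = {"datatype": "int"}
        else:
            type_attrs = {"datatype": "double"}
        params.append(
            ET.Element("PARAM", name=keyword, value=str(value), **type_attrs))
    return params


# Sample FITS-WCS cards for one axis of a tangent-plane image (made-up values).
cards = {
    "CTYPE1": "RA---TAN",
    "CRPIX1": 512.0,
    "CRVAL1": 187.25,
    "CDELT1": -0.0005,
    "CUNIT1": "deg",
}

for param in wcs_cards_to_params(cards):
    print(ET.tostring(param, encoding="unicode"))
```

A reader of such a table would still need to know the FITS-WCS rules in
order to interpret the PARAMs, which is the sense in which this approach
adopts the FITS definitions unchanged.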
I hope that some of the above points can provide food for thought during
the VOTable discussions.
regards,
Clive.
-----------------------------------------------------------------------------
David Berry (dsb at ast.man.ac.uk), the programmer responsible for
supporting the AST library, supplied the following information about it
and the relative merits of the FITS and AST approaches to handling WCS
information:
The AST library provides a comprehensive range of facilities for
attaching world coordinate systems to astronomical data, for retrieving
and interpreting that information and for generating graphical output
based on it.
AST uses an object-oriented approach to the problem of describing WCS
information. It provides a "toolbox" containing a wide variety of simple
self-describing components (transformations and coordinate system
descriptions) which may be "connected together" in any way the developer
chooses to form a "network" of coordinate systems with inter-connecting
transformations. The toolbox can be extended easily to include new
transformations and coordinate systems. This approach gives developers
freedom to describe WCS information in the way most suited to their
instruments.
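The idea of a network of coordinate systems joined by transformations can
be shown with a toy Python sketch. It does not use the AST library or its
API: the class FrameNetwork, its methods and the frame names are all
hypothetical, and the "sky" mapping is a simple linear scaling rather than
a real projection.

```python
from collections import deque


class FrameNetwork:
    """Toy network of named coordinate frames joined by invertible mappings."""

    def __init__(self):
        self._links = {}  # frame name -> list of (neighbouring frame, function)

    def connect(self, src, dst, forward, inverse):
        """Join two frames with a mapping and its inverse."""
        self._links.setdefault(src, []).append((dst, forward))
        self._links.setdefault(dst, []).append((src, inverse))

    def transform(self, src, dst, point):
        """Convert a point between any two connected frames by breadth-first
        search, applying each mapping along the route."""
        queue = deque([(src, point)])
        seen = {src}
        while queue:
            frame, value = queue.popleft()
            if frame == dst:
                return value
            for neighbour, mapping in self._links.get(frame, []):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append((neighbour, mapping(value)))
        raise ValueError(f"no route from {src!r} to {dst!r}")


net = FrameNetwork()
# pixel -> offset from the reference pixel (512, 512)
net.connect("pixel", "offset",
            lambda p: (p[0] - 512.0, p[1] - 512.0),
            lambda p: (p[0] + 512.0, p[1] + 512.0))
# offset -> sky, using a linear scale of 0.001 deg/pixel (not a real projection)
net.connect("offset", "sky",
            lambda p: (187.25 + 0.001 * p[0], 2.0 + 0.001 * p[1]),
            lambda p: ((p[0] - 187.25) / 0.001, (p[1] - 2.0) / 0.001))

print(net.transform("pixel", "sky", (512.0, 512.0)))
print(net.transform("sky", "pixel", (187.25, 2.0)))
```

Adding a further frame needs only another call to connect; nothing about
the existing frames or the order of the steps has to change.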
By comparison, the system for managing WCS information described in the
recently published FITS-WCS papers is a prescriptive system which
specifies precisely the steps which must be taken in transforming from
pixel coordinates to world coordinates. Very little freedom is allowed in
the overall form of the transformation. This approach developed as a
natural extension to the older "AIPS conventions" for storing WCS within
FITS headers. Indeed, consistency with the older conventions was one of
the stated goals for the new system.
The prescriptive approach of the FITS-WCS papers probably explains why it
has taken over ten years to produce sufficient consensus to allow the
papers to be published. The chequered and sometimes turbulent history of
the development of these papers amply illustrates the problem that different
groups have different requirements for WCS handling.
The long awaited publication of the FITS-WCS papers will not end this
problem. Already we have the "WCSDEP" extension to the papers proposed by
Steve Allan & Doug Mink. This highlights the fact that the requirements
for WCS do not remain static. The huge effort involved in reaching the
partial consensus required to publish papers I and II could easily result
in a disinclination to seek community-wide acceptance of future
developments to the published standard.
The desire to base the new FITS-WCS model on the older "AIPS conventions"
has produced a rigid, inflexible system for describing WCS, which could
rapidly be adorned with a collection of non-standard extensions as people
attempt to work around the restrictions of the model.
The advent of the Virtual Observatory concept gives us the chance to
break out of this prescriptive mould - to develop a WCS system which
gives the freedom to describe WCS in the way which seems most natural to
the developers, whilst still allowing WCS information to be passed freely
between applications.
The Starlink AST library demonstrates that such an approach can be made
to work.
Some documentation about AST is available on-line. There are the
following three ADASS presentations:
- `World Coordinate Systems as Objects' - ADASS 1998:
http://www.stecf.org/adass/adassVII/warrensmithr.html
- `Recent Developments to the AST Astrometry Library' - ADASS 1999:
http://www.adass.org/adass/proceedings/adass99/P2-45/
- `Providing Improved WCS Facilities Through the Starlink AST and NDF
Libraries' - ADASS 2000:
http://www.adass.org/adass/proceedings/adass00/P2-06/
There are also the AST manuals:
- SUN/211 (C interface):
http://www.starlink.rl.ac.uk/star/docs/sun211.htx/sun211.html
- SUN/210 (Fortran interface):
http://www.starlink.rl.ac.uk/star/docs/sun210.htx/sun210.html
-----------------------------------------------------------------------------
Clive Davenhall
Institute for Astronomy, Royal Observatory Edinburgh,
Blackford Hill, Edinburgh, EH9 3HJ, Scotland.
e-mail (internet, JANET): acd @ roe.ac.uk
fax from within the UK: 0131-668-8416
fax from overseas: +44-131-668-8416