VOTable issues for possible discussion at the Inter-operability meeting.

Clive Davenhall acd at roe.ac.uk
Wed Apr 30 09:54:12 PDT 2003


29/4/03.

Dear Colleague,

Next week I shall be presenting a paper about the VOTable at the `XML
Europe 2003' conference in London (http://www.xmleurope.com/2003/about.asp).
I believe that a version of the paper will be available from the
conference Web site (in XML!) and I also hope to make it available in a
more common format.  In the meantime, the paper has a section on `Open
Issues' for the VOTable format, and I thought that these might make a
useful contribution to the VOTable discussions at the forthcoming
IVOA Inter-operability meeting in Cambridge.  The `open issues' that I
came up with are:

*  compatibility between VOTables and FITS,
*  is the VOTable likely to be used as a storage format, as well as an
   interchange format?
*  the efficient representation of large or binary tables,
*  should the <DEFANGED_LINK> element be included in the formal standard?
*  using the VOTable for tables other than catalogues of astronomical
   objects; particularly handling the coordinate systems used in Solar
   and Solar-Terrestrial Physics tables,
*  including hyper-text and mark-up in the <DESCRIPTION> tag,
*  defining semantic rather than syntactic standards.

These points are expanded on below.  The text is largely taken from the
paper, so it might read a little oddly as it was originally intended for
a non-astronomical audience.

*  The FITS format is widespread in astronomy and is likely to remain
   so.  Consequently, it is important to be able to continue to access
   FITS files.  The VOTable was deliberately designed so that, with a
   few minor exceptions that are unlikely to be important in practice, any
   FITS binary table can be converted to a VOTable without loss of
   information.  The converse, however, is not true.

   One respect in which the two formats differ is that the VOTable is a
   streaming format whereas FITS tables are not.  In a FITS table the
   number of rows (the NAXIS2 keyword) must be given in the header at the
   front of the table.  Consequently a server transmitting a FITS table
   must have the complete table available before it can write the header
   and start transmitting.  By contrast, the VOTable format does not
   record the number of rows in a table, so a server can start
   transmitting a VOTable whilst the query that generates the selection is
   still in progress.  It remains to be seen whether this distinction
   proves important in practice.

*  The VOTable was invented as an interchange format for selections
   extracted from a catalogue and returned to a remote client via the
   Internet.  However, experience with other formats, most notably
   FITS but also TSV (Tab-Separated Value), indicates that the VOTable will
   probably also be used as a storage format (FITS began as an interchange
   format; remember that the acronym stands for `Flexible Image Transport
   System').  Many data archives now use FITS as their storage format.

*  Many astronomical tables are large and the subsets extracted
   from them as a result of selections can also be large.  Representing a
   table as a sequence of characters using the <TABLEDATA> element
   increases its size considerably compared to a binary representation,
   because every value is written out as text and wrapped in <TD> tags.
   This increase in size is acceptable for small tables, but for large
   ones can be a serious problem.  Moreover, the problem arises
   irrespective of whether the VOTable is being used as a transport format
   (there are more bytes to move) or as a storage format (the files are
   larger).  The mechanisms invented in the VOTable to circumvent this
   problem, the <FITS> and <BINARY> tags, are something of a compromise
   and not really in the spirit of XML, because the data they carry are
   opaque to XML tools: they are either encoded within the document (for
   example in base64) or held in a separate file referenced from it.
   Also, if the table is stored as a separate file in FITS or binary
   format then there is always the possibility that the files will become
   separated and one or the other will be lost.  In the future we are
   likely to develop XML-based formats for retrieving `bulk data' such as
   images and spectra, and in these cases an efficient representation of
   binary numeric data is even more important than it is for catalogues.

*  An appendix to the VOTable standard described a <DEFANGED_LINK> element, which
   was not part of version 1.0 but which might be included in future
   revisions of the standard.  This element allows, amongst other uses,
   columns to be created (projected in the jargon of relational databases)
   on-the-fly when a table is read.  The <DEFANGED_LINK> tag is potentially a
   powerful feature and we need to decide whether it should be included in
   future versions of the format.

*  Although the VOTable is primarily intended for representing
   catalogues of astronomical objects, it should also be capable of
   representing other sorts of astronomical tables, such as atomic line
   lists, X-ray event lists and the tables encountered in related
   disciplines such as Solar Physics and Solar-Terrestrial Physics.  The
   VOTable is sufficiently flexible to handle these requirements, but for
   Solar and Solar-Terrestrial work the additional coordinate systems used
   in these disciplines will need to be supported.  In version 1.0 the
   allowed coordinate systems are enumerated in the DTD, so adding a new
   coordinate system requires a major revision of the format.

*  Currently the <DESCRIPTION> element can contain only plain text.  Future
   versions should be able to include HTML, or a subset thereof, with
   hyper-links to external URLs.

*  The VOTable, FITS and similar formats, such as TSV, are basically
   syntactic rather than semantic standards.  That is, they are mostly
   concerned with how to represent items of information and do not ascribe
   meaning to particular items of information.  Consequently, columns and
   parameters representing similar quantities will appear with different
   names in different catalogues.  For the level of inter-operability
   envisaged for the VO, with automatic identification of catalogues
   relevant to some query, it will be necessary to describe the columns of
   catalogues in terms of standard quantities with agreed meanings.  The
   CDS's UCDs (Unified Content Descriptors) for classifying columns are a
   very important step in this regard.  At a higher level, astronomy also
   has a thesaurus of agreed terms (the `IAU Thesaurus').  There are
   similar problems with the units in which quantities are stored in
   catalogues.  Though there are recommended standard units, a wider range
   of un-standardised units is encountered in practice.  (These deficiencies
   have ramifications in the wider VO,
   beyond the VOTable.)
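
The streaming distinction drawn under the first point above (FITS compatibility) can be illustrated with a minimal Python sketch.  It assumes rows arrive lazily from an iterator standing in for a query that is still running, and writes a simplified VOTable (no namespace, FIELDs without datatypes) incrementally.  The function names are hypothetical and not part of any library.  The FITS side is reduced to the single NAXIS2 header card, which cannot be written until every row has been counted.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple
from xml.sax.saxutils import escape, quoteattr

Row = Sequence[object]


def stream_votable(field_names: Sequence[str], rows: Iterable[Row]) -> Iterator[str]:
    """Yield a simplified VOTable document piece by piece.

    Nothing in the output depends on the number of rows, so each row can be
    sent as soon as the query produces it.  (Simplified: real FIELDs also
    need a datatype attribute.)
    """
    yield '<?xml version="1.0"?>\n<VOTABLE version="1.0">\n<RESOURCE>\n<TABLE>\n'
    for name in field_names:
        yield f'<FIELD name={quoteattr(name)}/>\n'
    yield '<DATA>\n<TABLEDATA>\n'
    for row in rows:  # may be pulled lazily from a query still in progress
        cells = ''.join(f'<TD>{escape(str(value))}</TD>' for value in row)
        yield f'<TR>{cells}</TR>\n'
    yield '</TABLEDATA>\n</DATA>\n</TABLE>\n</RESOURCE>\n</VOTABLE>\n'


def fits_naxis2_card(rows: Iterable[Row]) -> Tuple[str, List[Row]]:
    """Build the NAXIS2 header card a FITS writer needs before any data.

    The whole selection has to be materialised first, because the row count
    must appear in the header at the front of the table.
    """
    materialised = list(rows)
    card = f'NAXIS2  = {len(materialised):>20}'  # value right-justified in columns 11-30
    return card, materialised


if __name__ == '__main__':
    def query() -> Iterator[Row]:  # stands in for an in-progress database query
        yield (1, 'M31', 10.68)
        yield (2, 'M33', 23.46)

    for chunk in stream_votable(['id', 'name', 'ra'], query()):
        print(chunk, end='')
    print(fits_naxis2_card(query())[0])
```

The VOTable generator never needs to look ahead, whereas the FITS helper has to consume the entire iterator before it can return its first card.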
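
To give a feel for the size penalty described in the point on large tables, the following Python sketch encodes one column of single-precision values in three ways: as raw big-endian binary (the byte order used by FITS, and assumed here for the VOTable binary serialisation), as base64 text of those bytes (as would appear in an inline base64-encoded stream), and as TABLEDATA-style <TD> cells.  The `%.7g` text formatting and the sample values are arbitrary choices for illustration; real ratios depend on the data and on how many digits are written.

```python
import base64
import struct
from typing import Dict, Sequence


def encoded_sizes(values: Sequence[float]) -> Dict[str, int]:
    """Return byte counts for three encodings of one float column."""
    # Raw big-endian IEEE single precision: exactly 4 bytes per value.
    raw = struct.pack(f'>{len(values)}f', *values)
    # Base64 turns every 3 input bytes into 4 output characters.
    b64 = base64.b64encode(raw)
    # TABLEDATA-style text: each value formatted and wrapped in <TD> tags.
    text = ''.join(f'<TD>{v:.7g}</TD>' for v in values).encode('ascii')
    return {'binary': len(raw), 'base64': len(b64), 'tabledata': len(text)}


if __name__ == '__main__':
    sample = [12.3456789 + i * 0.001 for i in range(1000)]
    print(encoded_sizes(sample))
```

Even before any <TR> tags or whitespace are counted, the text form is several times larger than the binary one, while base64 costs a fixed factor of about 4/3.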
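
The idea of columns created on-the-fly when a table is read (the point above about the element described in the appendix to the VOTable standard) can be sketched generically.  The element's actual syntax is not reproduced here.  Instead, the hypothetical `DerivedColumn` simply pairs a new column name with a function of the stored columns, and the reader evaluates it row by row.  The column names B and V and the colour index B-V are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List, Mapping

Row = Dict[str, float]


@dataclass
class DerivedColumn:
    """Hypothetical description of a column computed at read time."""
    name: str
    compute: Callable[[Mapping[str, float]], float]


def read_with_derived(rows: Iterable[Row],
                      derived: List[DerivedColumn]) -> Iterator[Row]:
    """Yield each stored row extended with the derived (projected) columns.

    Nothing extra is stored: the new values exist only as the table is read.
    Each derived value is computed from the stored columns alone.
    """
    for row in rows:
        extended = dict(row)  # copy, so the stored row is left untouched
        for column in derived:
            extended[column.name] = column.compute(row)
        yield extended


if __name__ == '__main__':
    stored = [{'B': 12.5, 'V': 11.75}, {'B': 9.0, 'V': 9.5}]
    colour = DerivedColumn('B_minus_V', lambda r: r['B'] - r['V'])
    for r in read_with_derived(stored, [colour]):
        print(r)
```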
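
Why a DTD-level list of coordinate systems is hard to extend can be seen in a small Python sketch.  The set below is believed to match the values allowed for the `system` attribute of the COOSYS element in VOTable 1.0.  That is an assumption worth checking against the published DTD.  The heliographic name in the example is a hypothetical stand-in for the Solar coordinate systems mentioned above.  Because the list is fixed in the document definition, a validating reader must reject any new name until the DTD itself is revised.

```python
from typing import FrozenSet

# Assumed list of COOSYS/@system values enumerated in the VOTable 1.0 DTD
# (check against the published DTD before relying on it).
VOTABLE_10_COOSYS: FrozenSet[str] = frozenset({
    'eq_FK4', 'eq_FK5', 'ICRS', 'ecl_FK4', 'ecl_FK5',
    'galactic', 'supergalactic', 'xy', 'barycentric', 'geo_app',
})


def is_valid_coosys(system: str,
                    allowed: FrozenSet[str] = VOTABLE_10_COOSYS) -> bool:
    """Mimic a validating parser: only enumerated names are accepted."""
    return system in allowed


if __name__ == '__main__':
    print(is_valid_coosys('ICRS'))
    # A Solar coordinate system (hypothetical name) is rejected until the
    # enumeration, and hence the DTD, is revised.
    print(is_valid_coosys('heliographic_stonyhurst'))
```

Passing an extended set through the `allowed` parameter shows the alternative: if the list of systems lived outside the DTD, new systems could be added without revising the format itself.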
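
The role of UCDs in the point on semantic standards can be shown with a minimal Python sketch.  Two catalogues name the same quantities differently, but because each column carries a UCD, equivalent columns can be paired automatically.  The catalogue column names are hypothetical.  The UCD strings follow the style of the CDS UCD1 list and should be checked against that list before use.

```python
from typing import Dict, List, Tuple


def match_by_ucd(cat_a: Dict[str, str],
                 cat_b: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Pair columns of two catalogues that share a UCD.

    Each catalogue is a mapping from column name to UCD.  Returns
    (ucd, column_in_a, column_in_b) triples sorted by UCD.  If several
    columns of cat_b share a UCD, the last one listed is used.
    """
    by_ucd_b = {ucd: name for name, ucd in cat_b.items()}
    return sorted(
        (ucd, name, by_ucd_b[ucd])
        for name, ucd in cat_a.items()
        if ucd in by_ucd_b
    )


if __name__ == '__main__':
    catalogue_a = {'RAJ2000': 'POS_EQ_RA_MAIN', 'DEJ2000': 'POS_EQ_DEC_MAIN',
                   'Vmag': 'PHOT_JHN_V'}
    catalogue_b = {'ra': 'POS_EQ_RA_MAIN', 'dec': 'POS_EQ_DEC_MAIN'}
    for triple in match_by_ucd(catalogue_a, catalogue_b):
        print(triple)
```

Matching by UCD says nothing about units: a matched pair could still be stored in degrees in one catalogue and in hours in the other, which is the separate units problem noted above.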

An additional issue, which I didn't mention in the paper, is how WCS
information is to be represented in VOTables (though I suppose that this
is a special case of defining semantic standards).  I think that there
are two possible approaches:

* simply adopt the FITS definitions `as is', but replace the FITS
  keywords with VOTable PARAMs,

* take an approach similar to Starlink's AST library, which is less
  prescriptive than the FITS scheme and allows greater freedom and
  flexibility in defining the transformation between pixel positions and
  world coordinates.  AST describes a WCS as an arbitrary collection of
  coordinate frames with interconnecting mappings.  These frame and
  mapping objects can be imported and exported as a set of structured
  FITS header cards, or as standard FITS-WCS header cards, or in other
  forms such as XML.  AST currently supports many different celestial and
  spectral coordinate systems, including automatic conversion between
  related systems.  Similar support for a range of temporal coordinate
  systems is currently being added.  On the assumption that AST will be
  relatively unfamiliar, I include some more information about it below.
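
As an illustration of the first approach, the Python sketch below reads the FITS linear-axis keywords CRPIX1, CRVAL1 and CDELT1 from a dictionary standing in for VOTable PARAMs of the same names.  It then applies the linear relation world = CRVAL1 + CDELT1 * (pixel - CRPIX1).  Naming the PARAMs after the FITS keywords is an assumption about how such a mapping might be done.  Only a single linear axis is handled; celestial projections would need the full FITS-WCS machinery.

```python
from typing import Mapping


def linear_world(pixel: float, params: Mapping[str, float], axis: int = 1) -> float:
    """Pixel-to-world conversion for one linear FITS-WCS axis.

    `params` stands in for VOTable PARAMs named after the FITS keywords
    (an assumed naming convention).  CRPIX is a 1-based reference pixel,
    following the FITS convention.
    """
    crpix = params[f'CRPIX{axis}']
    crval = params[f'CRVAL{axis}']
    cdelt = params[f'CDELT{axis}']
    return crval + cdelt * (pixel - crpix)


if __name__ == '__main__':
    # A hypothetical spectral axis: 5000 Angstrom at pixel 1, 2 Angstrom/pixel.
    spectrum_params = {'CRPIX1': 1.0, 'CRVAL1': 5000.0, 'CDELT1': 2.0}
    print(linear_world(11.0, spectrum_params))  # -> 5020.0
```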

I hope that some of the above points can provide food for thought during
the VOTable discussions.

regards,
Clive.

-----------------------------------------------------------------------------

David Berry (dsb at ast.man.ac.uk), the programmer responsible for
supporting the AST library, supplied the following information about it
and the relative merits of the FITS and AST approaches to handling WCS
information:

  The AST library provides a comprehensive range of facilities for
  attaching world coordinate systems to astronomical data, for retrieving
  and interpreting that information and for generating graphical output
  based on it.

  AST uses an object-oriented approach to the problem of describing WCS
  information. It provides a "toolbox" containing a wide variety of simple
  self-describing components (transformations and coordinate system
  descriptions) which may be "connected together" in any way the developer
  chooses to form a "network" of coordinate systems with inter-connecting
  transformations. The toolbox can be extended easily to include new
  transformations and coordinate systems. This approach gives developers
  freedom to describe WCS information in the way most suited to their
  instruments.

  By comparison, the system for managing WCS information described in the
  recently published FITS-WCS papers is a prescriptive system which
  specifies precisely the steps which must be taken in transforming from
  pixel coordinates to world coordinates. Very little freedom is allowed in
  the overall form of the transformation. This approach developed as a
  natural extension to the older "AIPS conventions" for storing WCS within
  FITS headers. Indeed, consistency with the older conventions was one of
  the stated goals for the new system.

  The prescriptive approach of the FITS-WCS papers probably explains why it
  has taken over ten years to produce sufficient consensus to allow the
  papers to be published. The chequered and sometimes turbulent history of
  the development of these papers amply illustrates the problem that different
  groups have different requirements for WCS handling.

  The long awaited publication of the FITS-WCS papers will not end this
  problem. Already we have the "WCSDEP" extension to the papers proposed by
  Steve Allan & Doug Mink. This highlights the fact that the requirements
  for WCS do not remain static. The huge effort involved in reaching the
  partial consensus required to publish papers I and II could easily result
  in a disinclination to seek community-wide acceptance of future
  developments to the published standard.

  The desire to base the new FITS-WCS model on the older "AIPS conventions"
  has produced a rigid, inflexible system for describing WCS, which could
  rapidly be adorned with a collection of non-standard extensions as people
  attempt to work around the restrictions of the model.

  The advent of the Virtual Observatory concept gives us the chance to
  break out of this prescriptive mould - to develop a WCS system which
  gives the freedom to describe WCS in the way which seems most natural to
  the developers, whilst still allowing WCS information to be passed freely
  between applications.

  The Starlink AST library demonstrates that such an approach can be made
  to work.
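
The "network" of coordinate systems with inter-connecting transformations described above can be sketched in a few lines of Python.  This is not AST's API (AST is a C and Fortran library with its own object types); the class `FrameNetwork` and the frame names are hypothetical.  Frames are nodes and each mapping is an invertible pair of functions on an edge.  Converting between any two frames means finding a path and composing the mappings along it, which is the flexibility contrasted above with the fixed FITS-WCS pipeline.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

Transform = Callable[[float], float]


class FrameNetwork:
    """Hypothetical toy: frames linked by invertible 1-D mappings."""

    def __init__(self) -> None:
        # frame -> list of (neighbouring frame, transform into that neighbour)
        self._edges: Dict[str, List[Tuple[str, Transform]]] = {}

    def connect(self, a: str, b: str, forward: Transform, inverse: Transform) -> None:
        """Join two frames, storing both directions so paths can run either way."""
        self._edges.setdefault(a, []).append((b, forward))
        self._edges.setdefault(b, []).append((a, inverse))

    def convert(self, value: float, source: str, target: str) -> float:
        """Breadth-first search for a path, then apply the mappings along it."""
        previous: Dict[str, Tuple[str, Transform]] = {}
        seen = {source}
        queue = deque([source])
        while queue:
            frame = queue.popleft()
            if frame == target:
                break
            for neighbour, step in self._edges.get(frame, []):
                if neighbour not in seen:
                    seen.add(neighbour)
                    previous[neighbour] = (frame, step)
                    queue.append(neighbour)
        if target not in seen:
            raise ValueError(f'no path from {source!r} to {target!r}')
        # Walk back from the target to collect the steps, then apply them in order.
        steps: List[Transform] = []
        frame = target
        while frame != source:
            frame, step = previous[frame]
            steps.append(step)
        for step in reversed(steps):
            value = step(value)
        return value


if __name__ == '__main__':
    net = FrameNetwork()
    net.connect('PIXEL', 'ANGSTROM',
                lambda p: 5000.0 + 2.0 * (p - 1.0), lambda w: (w - 5000.0) / 2.0 + 1.0)
    net.connect('ANGSTROM', 'NANOMETRE', lambda w: w / 10.0, lambda nm: nm * 10.0)
    print(net.convert(11.0, 'PIXEL', 'NANOMETRE'))  # -> 502.0
    print(net.convert(502.0, 'NANOMETRE', 'PIXEL'))  # -> 11.0
```

Adding a new frame only requires connecting it to one existing frame; every other frame then becomes reachable from it without any change to a fixed pipeline.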

Some documentation about AST is available on-line.  There are the
following three ADASS presentations:

- `World Coordinate Systems as Objects' - ADASS 1998:
     http://www.stecf.org/adass/adassVII/warrensmithr.html

- `Recent Developments to the AST Astrometry Library' - ADASS 1999:
     http://www.adass.org/adass/proceedings/adass99/P2-45/ 

- `Providing Improved WCS Facilities Through the Starlink AST and NDF
   Libraries' - ADASS 2000:
     http://www.adass.org/adass/proceedings/adass00/P2-06/

There are also the AST manuals:

- SUN/211 (C interface):
     http://www.starlink.rl.ac.uk/star/docs/sun211.htx/sun211.html

- SUN/210 (Fortran interface):
     http://www.starlink.rl.ac.uk/star/docs/sun210.htx/sun210.html

-----------------------------------------------------------------------------
Clive Davenhall                                      Institute for Astronomy,
e-mail (internet, JANET): acd @ roe.ac.uk        Royal Observatory Edinburgh,
fax from within the UK:   0131-668-8416            Blackford Hill, Edinburgh,
fax from overseas:     +44-131-668-8416                    EH9 3HJ, Scotland.



