UCD Status and Perspectives
Anita Richards
amsr at jb.man.ac.uk
Mon Mar 17 08:03:55 PST 2003
Sebastien's 'UCD: Status and Perspectives' gives a very clear summary of
the recent discussions, responding on the basis of the considerable
experience of CDS. Thanks very much - I actually enjoyed reading it,
especially the ontology section. It suggests to me some issues which it
would help if we made decisions on, or at least reached a working
agreement for the immediate future. Maybe Sebastien (or anyone) could
suggest what is most urgently needed for discussion at the IVOA registry
meeting Wednesday 19 Mar - is anyone planning a presentation? This is what
occurs to me:
1) Registries need UCDs.
'Status and Perspectives'
discusses relevant issues in '3. UCDs and data models'. We need
general UCDs which describe an entire data set ("Header metadata"),
e.g. for which wavelength regime (Radio, IR, Optical etc) - where
such UCDs currently exist, they are often parts or parents, rather
than ones which are instantiated. See e.g.
[[http://www.stsci.edu/~hanisch/NVO/ResourceServiceMetadataV6.doc][Bob
Hanisch ResourceServiceMetadataV6.doc]]
or
[[http://www.stsci.edu/~hanisch/NVO/ResourceServiceMetadataV6.pdf][ResourceServiceMetadataV6.pdf]]
and
[[http://wiki.astrogrid.org/bin/view/Astrogrid/RegistrySchema][AstroGrid
draft registry schema]]
We also need a means to associate header data with column labels,
e.g. if all data in one catalogue are at 1.4 GHz and another at
1.6 GHz, this currently requires two separate column UCDs,
PHOT_FLUX_RADIO_1.4G and PHOT_FLUX_RADIO_1.6G - dozens of UCDs in
all. Using the atom idea across the header and the columns, this
could be achieved using one column UCD PHOT_FLUX, and header UCDs for
the wavelength regime (radio) and the nominal frequency. This does
require that we evaluate the data corresponding to UCDs, see below.
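As a rough illustration of the atom idea (a sketch only: the header keys
REGIME and OBS_FREQUENCY_GHZ, the Catalogue class and the describe_column
helper are hypothetical names of mine, not existing UCDs or any CDS
interface), both catalogues could carry the same column UCD and differ
only in their header metadata:

```python
from dataclasses import dataclass


@dataclass
class Catalogue:
    name: str
    header: dict       # header metadata keyed by (hypothetical) header UCD names
    column_ucds: list  # UCD attached to each column


def describe_column(cat, ucd):
    """Combine a column UCD with the header atoms to give its full meaning."""
    if ucd not in cat.column_ucds:
        return None
    return (ucd, cat.header.get("REGIME"), cat.header.get("OBS_FREQUENCY_GHZ"))


# Two catalogues sharing one column UCD; only the header differs.
cat_a = Catalogue("A", {"REGIME": "radio", "OBS_FREQUENCY_GHZ": 1.4}, ["PHOT_FLUX"])
cat_b = Catalogue("B", {"REGIME": "radio", "OBS_FREQUENCY_GHZ": 1.6}, ["PHOT_FLUX"])

for cat in (cat_a, cat_b):
    print(cat.name, describe_column(cat, "PHOT_FLUX"))
```

The point of the sketch is that the frequency becomes a value to be read
and compared, rather than part of the UCD name.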
2) Evaluating data identified by UCDs
Many people think of UCDs as primarily for selection of catalogues for
a human to then view the contents. This is their primary function as
far as the Registry is concerned, but for actually executing queries
we need to compare values within catalogues and possibly associate
the results with a new UCD (e.g. compare flux densities to derive a
colour).
Until recently, UCDs were only evaluated routinely (e.g. via Vizier)
in one of the most difficult cases - coordinate conversion.
Otherwise, the user said 'give me a list of catalogues containing
information about x' and the service converted x into UCDs and
returned a list of catalogues containing columns corresponding to x.
This is now extended to photometry but only for special formats (see
above or the AVO demo). We now need UCDs which enable selection by
evaluating the data they identify, e.g. if I want flux density measurements between
1.4 and 1.6 GHz I should be able to access data at 1.5 GHz too, via
the registry search seeing there is a catalogue containing radio flux
densities, and the query execution finding OBS_FREQUENCY 1.5 GHz - in
the header, or as a column heading for a collection of measurements at
a range of frequencies, as well as a column PHOT_FLUX.
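A minimal sketch of that two-step selection, assuming catalogues are
held as plain dictionaries (the layout, the function names and the flux
values are all invented for illustration; OBS_FREQUENCY is taken to be
in GHz):

```python
def row_frequencies(cat):
    """Frequency (GHz) for each row: from a column if present, else the header."""
    if "OBS_FREQUENCY" in cat["columns"]:
        return [row["OBS_FREQUENCY"] for row in cat["rows"]]
    return [cat["header"].get("OBS_FREQUENCY")] * len(cat["rows"])


def select_flux(catalogues, lo_ghz, hi_ghz):
    """Return (catalogue, frequency, flux) for rows with lo_ghz <= frequency <= hi_ghz."""
    hits = []
    for cat in catalogues:
        if "PHOT_FLUX" not in cat["columns"]:
            continue  # registry-style step: no flux column, so not relevant
        for row, freq in zip(cat["rows"], row_frequencies(cat)):
            if freq is not None and lo_ghz <= freq <= hi_ghz:
                hits.append((cat["name"], freq, row["PHOT_FLUX"]))
    return hits


# One catalogue with the frequency in its header, one with it as a column.
single = {
    "name": "single",
    "header": {"OBS_FREQUENCY": 1.5},
    "columns": ["PHOT_FLUX"],
    "rows": [{"PHOT_FLUX": 10.0}, {"PHOT_FLUX": 12.5}],
}
multi = {
    "name": "multi",
    "header": {},
    "columns": ["OBS_FREQUENCY", "PHOT_FLUX"],
    "rows": [
        {"OBS_FREQUENCY": 0.4, "PHOT_FLUX": 30.0},
        {"OBS_FREQUENCY": 1.5, "PHOT_FLUX": 11.0},
        {"OBS_FREQUENCY": 5.0, "PHOT_FLUX": 4.0},
    ],
}

print(select_flux([single, multi], 1.4, 1.6))
```

Both catalogues yield their 1.5 GHz measurements, whichever place the
frequency is stored in.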
This may be a problem where one UCD describes several columns
e.g. TIME_DATE for the start and stop of observations, but we want to
evaluate both e.g. for error bars on a proper motion measurement. Or
do we just need an algorithm which says 'if there are two TIME_DATEs
and they are different, take the difference'?
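Such a rule might look like the following sketch, assuming the two
TIME_DATE values have already been parsed into Python datetime objects
(the function name time_date_span is hypothetical):

```python
from datetime import datetime


def time_date_span(values):
    """If there are exactly two different TIME_DATE values, return their difference."""
    if len(values) == 2 and values[0] != values[1]:
        start, stop = sorted(values)
        return stop - start  # a datetime.timedelta
    return None  # one value, identical values, or more than two: rule does not apply


span = time_date_span([datetime(2003, 3, 1), datetime(2003, 3, 17)])
print(span)
```

It says nothing about which value is the start and which the stop, so it
only helps where the difference alone is wanted.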
3) Who reads UCDs?
The document asks 'how do I find the proper UCD to describe my data
set'. At present, this is done automatically if you are someone
contributing a 'simple' table e.g. out of a paper; the average
astronomer is not _forced_ to be exposed to UCDs, and a user doing a
search certainly isn't (although they might want to use them
directly). Data providers of major archives may need to investigate
them to check they are allocated correctly or suggest new ones, as do
VO workers. Thus they should be reasonably human-readable and
available for anyone, but not restricted to the lowest common
denominator of astronomical understanding.
We could increase the present number ten or even 100 fold (still
<10^5) - I am not saying we should, but we need not be afraid of it.
For example, if I am using the SED tool in Aladin, I really do not
want to know that an optical data set has UCDs for photometric zero
point and colour corrections (although an optical astronomer might and
I had to learn...) but fortunately the Aladin prototype SED tool knows
how to use these to plot magnitudes in Jy. Conversely an optical
astronomer does not want to know parameters associated with radio
visibility data but if s/he wants to extract an off-centre image at a
certain resolution from the MERLIN archive, they need the extraction
tool to know about baseline lengths, visibility integration times
etc. to choose a data set with an appropriate field of view and
resolution.
Thus, completeness of UCDs for their purpose is more important than
economy, and in fact the savings in going to an atomic structure would
probably mean we could add UCDs for specific errors, and remove the
degeneracy of things currently under 'TIME_DATE' or 'NUMBER' etc., without
making UCDs unmanageable. As the document says, the creation of new UCDs
should certainly be restricted to defined bodies (e.g. via a central
monitoring panel) to avoid duplication and maintain consistency at a
functional level. This probably means UCDs will proliferate slowly,
acquire some clumsy ones, and then be pruned or refactored over a cycle of
months or years.
Best wishes
Anita
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Dr. Anita M. S. Richards, AVO Astronomer
MERLIN/VLBI National Facility, University of Manchester,
Jodrell Bank Observatory, Macclesfield, Cheshire SK11 9DL, U.K.
tel +44 (0)1477 572683 (direct); 571321 (switchboard); 571618 (fax).