Column Groups in VOTable

Francois Ochsenbein francois at vizir.u-strasbg.fr
Mon Apr 28 11:51:09 PDT 2003


Dear All,

The forthcoming meeting in Cambridge could be a good opportunity
to discuss how "column groups" can be introduced in VOTable.
The need for such functionality has already been expressed (perhaps not
explicitly); I feel that it also has implications for UCDs, and I'm therefore
posting this message to both the VOTable and UCD groups. I apologize to
those who will receive this message twice.

-- Francois
================================================================================
Francois Ochsenbein       ------       Observatoire Astronomique de Strasbourg
   11, rue de l'Universite F-67000 STRASBOURG       Phone: +33-(0)390 24 24 29
Email: francois at astro.u-strasbg.fr   (France)         Fax: +33-(0)390 24 24 17
================================================================================

================================================================================

The  "column groups"  proposition tries to answer questions frequently 
asked about column associations, typically:
--> error (or standard deviation) associated with a column, e.g. a flux
    consists of two numbers: the measured value and its mean error
--> qualities or weights associated with values
--> source or origin (e.g. telescope, or bibliographical reference) of a value
--> individual components, e.g. the x,y components of a position on a CCD
--> etc...

This "column grouping" plays the same role as defining structures made of
columns; likewise, a structure made of structures can be viewed as a group
of groups of columns.

I see essentially two ways of defining such "column groups" in VOTable:

a) generalize the <COOSYS> method currently used to describe the coordinate 
   systems. This kind of "by reference" method defines a structure,
   and any <FIELD> can declare (via its "ref" attribute) that it is a
   member of that structure.
   As an illustration of a group of columns containing a flux value
   and its error, the XML code could look like:

   1. within the <DEFINITIONS> element, define a structure as e.g.:
      <STRUCTURE ID="Flux1" name="FluxParameters">
        <PARAMETER ID="Freq1" name="Frequency" value="8.4" datatype="float"
          unit="GHz" ucd="OBS_FREQUENCY" />
      </STRUCTURE>
   
   2. within the <TABLE> definition, columns belonging to this structure
      refer to it:
      <FIELD name="flux" datatype="float" ref="Flux1" unit="mJy" />
      <FIELD name="e_flux" datatype="float" ref="Flux1" unit="mJy" />

b) introduce a new element, e.g. <GROUP>, in the <TABLE> description, which
   would contain the fields. The same example of a flux and its associated
   error would be coded as:

   <GROUP name="Flux" ucd="PHOT_FLUX_RADIO_8.4G">
    <FIELD ID="Flux1" name="fluxValue" datatype="float" unit="mJy">
     <DESCRIPTION>Value of the flux at 8.4GHz</DESCRIPTION>
    </FIELD>
    <FIELD ID="e_Flux1" name="errFlux1" ucd="ERROR" datatype="float" unit="mJy">
     <DESCRIPTION>Error on flux value</DESCRIPTION>
    </FIELD>
    <PARAMETER ID="Freq1" name="Frequency" value="8.4" datatype="float"
      unit="GHz" ucd="OBS_FREQUENCY" />
   </GROUP>
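To make the difference between a) and b) concrete, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`. It recovers the members of each group from both encodings. The element and attribute names follow the proposal above. The `<VOTABLE>` wrapper, the omitted frequency parameters and the two function names are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Approach a): the structure is defined once; fields point to it via "ref".
# Hypothetical minimal wrapper; the frequency PARAMETER is omitted here.
DOC_A = """
<VOTABLE>
  <DEFINITIONS>
    <STRUCTURE ID="Flux1" name="FluxParameters" />
  </DEFINITIONS>
  <TABLE>
    <FIELD name="flux" datatype="float" ref="Flux1" unit="mJy" />
    <FIELD name="e_flux" datatype="float" ref="Flux1" unit="mJy" />
  </TABLE>
</VOTABLE>
"""

# Approach b): the group physically contains its fields.
DOC_B = """
<VOTABLE>
  <TABLE>
    <GROUP name="Flux" ucd="PHOT_FLUX_RADIO_8.4G">
      <FIELD ID="Flux1" name="fluxValue" datatype="float" unit="mJy" />
      <FIELD ID="e_Flux1" name="errFlux1" ucd="ERROR" datatype="float" unit="mJy" />
    </GROUP>
  </TABLE>
</VOTABLE>
"""

def members_by_reference(root):
    """a) Map each STRUCTURE ID to the names of the FIELDs whose ref points at it."""
    groups = {s.get("ID"): [] for s in root.iter("STRUCTURE")}
    for field in root.iter("FIELD"):
        ref = field.get("ref")
        if ref in groups:
            groups[ref].append(field.get("name"))
    return groups

def members_by_containment(root):
    """b) Map each GROUP name to the names of the FIELDs it directly contains."""
    return {g.get("name"): [f.get("name") for f in g.findall("FIELD")]
            for g in root.iter("GROUP")}

print(members_by_reference(ET.fromstring(DOC_A)))
print(members_by_containment(ET.fromstring(DOC_B)))
```

Both encodings yield the same information. In a) the reader must first collect the structures and then scan every field. In b) membership follows directly from nesting.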

There could be a third way, which would introduce new tags within each
table cell, e.g. <VAL> and <ERR>, to give
  <TD><VAL>11.35</VAL><ERR>1.12</ERR></TD>
but this would go against the current philosophy of VOTable, which places
all the metadata first, followed by the data alone, in order to preserve
efficiency and FITS compatibility; this third method would also require
frequent modifications of the schema (XMLSchema), which is generally
disruptive for working applications.
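The grouping is declared once in the metadata, so a reader can rebuild the (value, error) pair from a plain row of <TD> cells without any new tags inside the data. The following Python sketch illustrates this under stated assumptions. The field order follows the <GROUP> example in b). The <TABLE>/<DATA>/<TR> layout is simplified. The helper name `structure_row` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Metadata (declared once, as in proposal b)) followed by the data alone.
# Simplified, hypothetical layout for illustration.
DOC = """
<TABLE>
  <GROUP name="Flux" ucd="PHOT_FLUX_RADIO_8.4G">
    <FIELD ID="Flux1" name="fluxValue" datatype="float" unit="mJy" />
    <FIELD ID="e_Flux1" name="errFlux1" ucd="ERROR" datatype="float" unit="mJy" />
  </GROUP>
  <DATA>
    <TR><TD>11.35</TD><TD>1.12</TD></TR>
  </DATA>
</TABLE>
"""

def structure_row(table, tr):
    """Regroup the flat cells of one <TR> according to the GROUP metadata.

    Cells are matched to FIELDs by their order in the header, exactly as in
    an ungrouped table; the grouping only adds structure on top of that.
    All fields are assumed to be of datatype "float".
    """
    fields = list(table.iter("FIELD"))  # all FIELDs, in document order
    cells = [float(td.text) for td in tr.findall("TD")]
    value_of = {f.get("ID"): v for f, v in zip(fields, cells)}
    return {g.get("name"): {f.get("name"): value_of[f.get("ID")]
                            for f in g.findall("FIELD")}
            for g in table.iter("GROUP")}

table = ET.fromstring(DOC)
for tr in table.iter("TR"):
    print(structure_row(table, tr))
```

The data section stays a flat sequence of cells, as in a FITS binary table. All the structure comes from the header.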

The <GROUP> defined in b) above seems to me a good framework for
this definition. I see several advantages:
=> the basic tabular scheme remains: VOTable can still be viewed as a 
   relational database, and remains fully compatible with existing
   FITS binary tables;
=> groups of groups (i.e. recursive <GROUP> tags) enable the definition
   of arbitrarily complex structures;
=> the UCDs become more accurate when defined in a group:
   -- adding <PARAMETER> tags within a <GROUP> nicely introduces a way
      of parametrizing a UCD;
   -- FIELDs defined in a group can inherit the UCD of the group,
      e.g. the error part of the flux group only needs the
      "ERROR" UCD.
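The points above can be read as a simple scoping rule. As a reader walks nested <GROUP> elements, a FIELD's effective description is the chain of enclosing group UCDs plus its own, and the <PARAMETER>s of every enclosing group apply to it. The Python sketch below shows one possible interpretation. Listing UCDs outermost first, the outer "RadioPhotometry" group, and the function name are all assumptions, since the proposal does not fix how UCDs combine.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested example: a group of groups (recursive <GROUP> tags).
DOC = """
<GROUP name="RadioPhotometry">
  <GROUP name="Flux" ucd="PHOT_FLUX_RADIO_8.4G">
    <FIELD ID="Flux1" name="fluxValue" datatype="float" unit="mJy" />
    <FIELD ID="e_Flux1" name="errFlux1" ucd="ERROR" datatype="float" unit="mJy" />
    <PARAMETER ID="Freq1" name="Frequency" value="8.4" datatype="float"
      unit="GHz" ucd="OBS_FREQUENCY" />
  </GROUP>
</GROUP>
"""

def effective_metadata(group, ucds=(), params=None):
    """Yield (field ID, UCD chain, parameters in scope) for every FIELD.

    A FIELD inherits the UCDs of all enclosing GROUPs (outermost first)
    and sees the PARAMETERs declared in any enclosing GROUP, wherever
    they appear inside that GROUP. Parameter values are assumed numeric.
    """
    params = dict(params or {})  # copy, so sibling groups do not interfere
    if group.get("ucd"):
        ucds = ucds + (group.get("ucd"),)
    for p in group.findall("PARAMETER"):
        params[p.get("name")] = (float(p.get("value")), p.get("unit"))
    for child in group:
        if child.tag == "GROUP":
            yield from effective_metadata(child, ucds, params)
        elif child.tag == "FIELD":
            own = (child.get("ucd"),) if child.get("ucd") else ()
            yield child.get("ID"), ucds + own, params

for fid, ucd_chain, scope in effective_metadata(ET.fromstring(DOC)):
    print(fid, ucd_chain, scope)
```

Under this reading, "e_Flux1" is fully described as ERROR on PHOT_FLUX_RADIO_8.4G at Frequency = 8.4 GHz, although it only carries the "ERROR" UCD itself.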

Using the "ref" attribute in <FIELD> also permits one column to be a 
member of several groups: for example, an error common to two fluxes 
measured at different frequencies can be defined as:

   <GROUP name="Flux" ucd="PHOT_FLUX_RADIO_8.4G">
    <FIELD ID="Flux1" name="fluxValue" datatype="float" unit="mJy">
     <DESCRIPTION>Value of the flux at 8.4GHz</DESCRIPTION>
    </FIELD>
    <FIELD ID="e_Flux1" name="errFlux1" ucd="ERROR" datatype="float" unit="mJy">
     <DESCRIPTION>Error on flux values, both at 8.4 and 7.5GHz</DESCRIPTION>
    </FIELD>
    <PARAMETER ID="Freq1" name="Frequency" value="8.4" datatype="float"
      unit="GHz" />
   </GROUP>
   <GROUP name="Flux" ucd="PHOT_FLUX_RADIO_7.5G">
    <FIELD ID="Flux2" name="fluxValue" datatype="float" unit="mJy">
     <DESCRIPTION>Value of the flux at 7.5GHz</DESCRIPTION>
    </FIELD>
    <FIELD ref="e_Flux1" />
    <PARAMETER ID="Freq2" name="Frequency" value="7.5" datatype="float"
      unit="GHz" />
   </GROUP>
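To use the shared-error example, a reader resolves each `<FIELD ref="..."/>` against the FIELD carrying that ID. After resolution, the error column belongs to both groups while existing only once in the table. The following is a minimal Python sketch. The `<TABLE>` wrapper and the function name are assumptions. Groups are keyed by their UCD because both are named "Flux", and descriptions and parameters are omitted.

```python
import xml.etree.ElementTree as ET

DOC = """
<TABLE>
  <GROUP name="Flux" ucd="PHOT_FLUX_RADIO_8.4G">
    <FIELD ID="Flux1" name="fluxValue" datatype="float" unit="mJy" />
    <FIELD ID="e_Flux1" name="errFlux1" ucd="ERROR" datatype="float" unit="mJy" />
  </GROUP>
  <GROUP name="Flux" ucd="PHOT_FLUX_RADIO_7.5G">
    <FIELD ID="Flux2" name="fluxValue" datatype="float" unit="mJy" />
    <FIELD ref="e_Flux1" />
  </GROUP>
</TABLE>
"""

def resolve_groups(table):
    """Map each GROUP (keyed by its UCD) to the IDs of its member columns.

    A <FIELD ref="X"/> is a pointer to the column whose ID is X, not a new
    column, so it is replaced by that ID; dangling references are errors.
    """
    columns = {f.get("ID"): f for f in table.iter("FIELD") if f.get("ID")}
    groups = {}
    for g in table.iter("GROUP"):
        members = []
        for f in g.findall("FIELD"):
            target = f.get("ref") or f.get("ID")
            if target not in columns:
                raise KeyError(f"unresolved FIELD reference: {target}")
            members.append(target)
        groups[g.get("ucd")] = members
    return groups

table = ET.fromstring(DOC)
print(resolve_groups(table))
print(len([f for f in table.iter("FIELD") if f.get("ID")]), "physical columns")
```

Only FIELDs with an ID count as physical columns (three here), so the shared error is stored once while appearing in both groups.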

================================================================================


