Comments on Canadian VO data model
jcm at head-cfa.cfa.harvard.edu
Tue Apr 22 10:40:10 PDT 2003
Canadian VO Data Model Comments - Jonathan McDowell
The Canadian VO have published details of the data model used to
describe images in their archive.
The relevant documents are at
This data model is used to describe images and potentially
spectra and other data products returned from the CVO (the voObs
object), and also to describe entries in the derived source
catalogs (the voSrc object which I do not review here).
I'm sending my comments to the whole list in the hope of prompting
the rest of you to look at their documents too.
I think there are a few changes to the voObs model which could
make it more general. The major comments I have are:
A) lack of uniformity on axes
B) lack of information on observables.
(A) First, the axes: Spatial, Temporal and Spectral. Their attributes
overlap a lot, but not completely; this seems unfortunate because it
makes it hard to generalize if you want to add another axis.
Specifically, the relevant attributes are:
           Spatial               Temporal            Spectral
Shape      _bounds_eq [deg]      NONE                NONE
Bounds     NONE                  _bounds [s?]        _bounds [A]
Sample     _sample [deg/bin]     _sample [s?/bin]    _sample [A/bin]
Bins       NONE                  _bins [bin]         _bins [bin]
Fill       _fill                 _fill               NONE
Res.       _resolution [deg]     NONE                _resolution [A]
Nyquist    _Nyquist              NONE                _Nyquist
Span       _span [deg?]          _span [s?]          _span [A]
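To illustrate what a uniform treatment might look like, here is a minimal Python sketch in which every axis carries the same attribute set, so adding a new axis needs no new attribute names. The class name `Axis`, its field names and the `missing()` helper are hypothetical; they follow the rows of the table above rather than anything in the CVO documents, and the numbers are made up.

```python
from dataclasses import dataclass, fields
from typing import List, Optional, Tuple


@dataclass
class Axis:
    """One coordinate axis; the same attributes exist for every axis.

    Hypothetical sketch: field names mirror the table rows, not the CVO schema.
    """
    name: str
    unit: str
    bounds: Optional[Tuple[float, float]] = None  # extreme (min, max) values
    sample: Optional[float] = None                # unit per bin
    bins: Optional[int] = None                    # number of bins
    fill: Optional[float] = None                  # fill value, if known
    resolution: Optional[float] = None            # in the axis unit
    span: Optional[float] = None                  # in the axis unit

    def missing(self) -> List[str]:
        """Names of the attributes that have not been supplied."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# The same class describes a spectral axis and a hypothetical extra axis.
spectral = Axis("spectral", "A", bounds=(4000.0, 7000.0), sample=1.5,
                bins=2000, resolution=3.0, span=3000.0)
polarization = Axis("polarization", "stokes", bins=4)
```

Here `spectral.missing()` would report only `fill`, which corresponds to the NONE entry in the Spectral column.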
A.1 Spatial bounds are given as polygon nodes in J2000, and
repeated as galactic and ecliptic. See notes on regions and bounds below.
The choice of polygon nodes as the description of 2D
regions is a fair one for the application in question, but doesn't generalize
well to other VO uses. Eventually one should support a general VO region
(which can include a circle, for instance, not supported here).
I would argue that it would be nice to have 'bounds' mean the extreme
bounds of each coordinate, as it does for the other axes.
As described, the spatial bounds can be a complicated polygon giving
the exact shape of the detector, but the temporal bounds are a simple
range giving the outer hull of the temporal window function.
It is useful to have these outer bounds to answer the question 'might this
dataset contain stuff of interest'. The detailed shape (detector polygon,
temporal start and stop intervals) is needed when you get to actually
analysing the data; the next step up is the sensitivity map and effective
exposure depth versus time. The detailed information should
accompany the data when it is retrieved, but arguably may not be needed
at the index layer that this data model seems to represent.
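As a rough sketch of the distinction, the extreme bounds can be derived from the polygon nodes and used for the 'might this dataset contain stuff of interest' test. The function names are hypothetical, and the code assumes the polygon does not cross RA = 0/360 or enclose a pole, and it ignores the curvature of the polygon edges on the sky.

```python
from typing import List, Tuple

Node = Tuple[float, float]  # (ra_deg, dec_deg)


def outer_bounds(nodes: List[Node]) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Return ((ra_min, ra_max), (dec_min, dec_max)) for a list of polygon nodes.

    Assumption: no RA wrap-around and no pole inside the polygon.
    """
    if not nodes:
        raise ValueError("polygon needs at least one node")
    ras = [ra for ra, _ in nodes]
    decs = [dec for _, dec in nodes]
    return (min(ras), max(ras)), (min(decs), max(decs))


def might_contain(nodes: List[Node], ra: float, dec: float) -> bool:
    """Cheap index-level test: is the position inside the outer bounds?

    A True result only means the detailed polygon is worth checking.
    """
    (ra_lo, ra_hi), (dec_lo, dec_hi) = outer_bounds(nodes)
    return ra_lo <= ra <= ra_hi and dec_lo <= dec <= dec_hi


# Made-up detector footprint, in degrees.
footprint = [(10.0, -1.0), (10.5, -0.8), (10.4, 0.2), (9.9, 0.0)]
```

The detailed polygon would still be needed for the real analysis; the bounding range is only the index-level filter.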
A.2 Why no spatial_bins? This seems a critical piece of info
(e.g. 1024 x 1024 image, or 1x1-spatial-pixel spectrum...)
A.3 Why no spectral_fill? Not needed very often, but consistency is
I'm not fully convinced fill is that useful a value, since usually what
you want is really to take a variable QE across the detector axis
into account, rather than just an on/off - although I guess in the temporal
case a simple fill number is often useful.
A.4 Why no temporal resolution or Nyquist?
For old, historical observations the accuracy of the recorded
observing time may be poor (I've seen data in the literature,
which one could imagine scanning back in, where the observational
date is only known to a year or so. Bad, bad referee.)
A.5 It seems a bit labored to have Nyquist as a separate attribute
(rather than a method) since it is simply the ratio of two other attributes.
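A minimal sketch of the 'method rather than attribute' idea follows. The comment above does not say which two attributes form the ratio; this example assumes it is resolution divided by sample (bins per resolution element). The class name `SampledAxis` is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SampledAxis:
    resolution: float  # e.g. [A]
    sample: float      # e.g. [A/bin]

    def nyquist(self) -> float:
        """Bins per resolution element.

        Assumed definition: resolution / sample. Because it is computed on
        demand, it cannot get out of step with the two stored attributes.
        """
        if self.sample <= 0:
            raise ValueError("sample must be positive")
        return self.resolution / self.sample


axis = SampledAxis(resolution=3.0, sample=1.5)
ratio = axis.nyquist()  # 2.0 under the assumed definition
```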
(B) The "content properties" attributes give derived properties of an image
that are really the summary of a derived catalog for that image.
But the huge thing that seems to be missing here is a description of
what the pixel values in the data actually represent - I think the
implied assumption of your model is that they are flux values in Jansky
(or if you prefer, Janskys, but please, not "Jansky's" :-)), or something
that can be converted to that.
Even within this assumption, I think there's crucial information that
could be added:
- actual units of image
- is the photometry absolutely calibrated, or not?
- is it linear, or in magnitudes (instrumental or standard)?
- other indications of photometric quality
- saturation level
But I think one should allow for the possibility that what is in the
data is not sky intensity but some other quantity:
- spatial image of spectral index (or B-V color)
- spatial image of ISM extinction, or Faraday rotation measure
- spatial image of CMB dT/T anisotropy
- extinction versus wavelength
- integrated line flux versus time
- radial velocity versus time
- observatory humidity versus time
So I would propose
observable_quantity: String [REQUIRED] The quantity represented by the pixel values.
The usual value is "SKY FLUX DENSITY".
observable_unit: String [REQUIRED] The unit of the above, e.g. "Jy", "count",
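As a sketch of how the two proposed attributes might be carried and enforced as required, assuming a hypothetical `Observable` class (the class and its validation are illustrative, not part of the CVO model, and the "RADIAL VELOCITY" string and "km/s" unit are made-up example values):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observable:
    quantity: str  # observable_quantity, e.g. "SKY FLUX DENSITY"
    unit: str      # observable_unit, e.g. "Jy", "count"

    def __post_init__(self) -> None:
        # Both attributes are proposed as REQUIRED, so reject empty values.
        for field_name in ("quantity", "unit"):
            value = getattr(self, field_name)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"observable_{field_name} is required")


# The usual case, and one of the non-flux cases listed above.
sky = Observable("SKY FLUX DENSITY", "Jy")
velocity_curve = Observable("RADIAL VELOCITY", "km/s")
```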
As for the content properties:
I'm intrigued by the choice of S/N = 10 for your point source
reference. I would have thought that S/N = 3 might be more helpful
for people who are interested in 'is there a chance my source might be there?'
which I think is the most common question.
Again, one can generalize on axes. The number density things
are crying out for generalization: How about
Negative spatial features may also be worth counting since they
may indicate localized absorption or incorrect background subtraction.