String representations of numeric values
dtody at nrao.edu
Thu Apr 20 12:14:39 PDT 2006
This is my view as well - coordinates should be specified in VOTable
in decimal degrees. VOTable is NOT a user interface. Any formatting
information in VOTable should be hints or suggestions to the UI for
display, and should not apply to the numerical value used within
ISO dates are another matter. There are many ways to represent dates
and a lot more is involved than just the formatting. Probably the
unit specification will handle this in most cases, or the data model
specification, if any, which specifies the specific date field.
Of course, if a client application which uses VOTable (or FITS, etc.)
wants to be clever and relax the standard a bit, as Mark describes,
that is ok, but this issue should not be confused with what the standard
specifies. (Given string input, a good lexical analyzer can of course
deal with a range of numerical formats.)
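As a rough illustration of that last point, here is a minimal Python sketch of a lenient reader that turns either a decimal or a sexagesimal angle string into decimal degrees. The function name, the `hours` flag and the set of accepted separators are assumptions made for this example; none of it comes from the VOTable standard or from any existing library.

```python
import re

# Hypothetical helper for illustration only. Fields may be separated by
# colons, whitespace, or the letters h/d/m/s (e.g. "23h04m46.5s").
_SEP = re.compile(r"[:\shdms]+")

def angle_to_degrees(text, hours=False):
    """Convert a decimal or sexagesimal angle string to decimal degrees.

    With hours=True the leading field is read as hours (1 h = 15 deg).
    """
    s = text.strip().lower()
    if not s:
        raise ValueError("empty angle string")
    # Take the sign from the string itself so that "-0:30:00" stays negative.
    sign = -1.0 if s.startswith("-") else 1.0
    fields = [f for f in _SEP.split(s.lstrip("+-")) if f]
    if not 1 <= len(fields) <= 3:
        raise ValueError(f"unrecognised angle: {text!r}")
    values = [float(f) for f in fields]
    # Minutes and seconds, when present, must lie in [0, 60).
    if any(not 0 <= v < 60 for v in values[1:]):
        raise ValueError(f"minutes/seconds out of range: {text!r}")
    total = sum(v / 60 ** i for i, v in enumerate(values))
    return sign * total * (15.0 if hours else 1.0)
```

A client could call `angle_to_degrees("23:04:46.5", hours=True)` on user input and still write plain decimal degrees into the VOTable, which keeps the leniency in the application rather than in the standard.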
On Thu, 20 Apr 2006, Thomas McGlynn wrote:
> While Roy and Dave may have hinted at this, let me say very
> plainly that I think this is a bad idea. Coordinates in
> VOTables should be represented in decimal degrees and
> any other usage should be strongly discouraged. Allowing
> other formats makes the job of software that has to
> read and write VOTables much harder and more error-prone.
> User interfaces must support sexagesimal coordinates
> but they will need to be able to convert between
> decimal values and sexagesimal formats in any case so
> there are no savings there. The number of sexagesimal
> formats used in the community is large
> hh mm, hh mm.f, hh mm ss, hh:mm, hhHmm, hhHmm.f, hh:mm, ...
> and this brings up a whole issue of validation of the
> units. I.e., XML can validate that a number is a number
> but unless we do a lot of work it's going to be hard to
> validate XML documents that claim to have sexagesimal
> Bury it... Bury it deep!!
> P.S., for times there is an ISO standard and we should allow that.
> That may be supported in the XML standard but I cannot recall if that's
> Mark Taylor wrote:
>> Dear VOTablers,
>> I have an issue concerning the VOTable format. This has been touched on by
>> the group to some extent before, but no real consensus was reached
>> (at least, no action was taken).
>> - VOTable doesn't allow you to mark columns which are essentially
>> numeric but are formatted as strings (e.g. sexagesimal angles) as such
>> - For some purposes such a facility would be useful
>> - We should modify the VOTable standard accordingly
>> The Problem
>> It is common for VOTables to contain columns which represent numeric values
>> as strings (more precisely, as one-dimensional datatype="char" or
>> "unicodeChar" arrays). The most important cases in astronomy are the
>> 1. Sexagesimal angle as hours:minutes:seconds (e.g. "23:04:46.5")
>> 2. Sexagesimal angle as degrees:minutes:seconds (e.g. "+15:12:19")
>> 3. Epoch as ISO-8601 (e.g. "2001-08-16T21:16:51.5")
>> There are other examples; an exhaustive listing is probably neither
>> possible nor desirable.
>> There is currently no formal way for a VOTable to describe how to map
>> these strings to numbers, which means that software can't reliably do
>> anything with them apart from display their values as strings.
>> Software would often like to be able to do something with them which
>> requires their numerical values, for instance use them as coordinates in a
>> plot, define ranges, interpolate between them, order values etc.
>> Various software hacks are possible to work around the current
>> situation and determine the intended numeric values from such
>> encoded string-valued columns, for instance:
>> - If the "units" attribute looks like "hms"/"dms"/"iso-8601" then
>> the corresponding format is assumed.
>> This contravenes the mandated use of the "units" attribute,
>> which the VOTable standard states must be composed as described
>> at http://vizier.u-strasbg.fr/doc/catstd-3.2.htx.
>> - If the "ucd" attribute looks like an RA/Dec/epoch then
>> hh:mm:ss/dd:mm:ss/iso-8601 is assumed.
>> UCDs are really about semantics not form, which makes this a
>> philosophically unattractive solution. There are related
>> practical problems in that you might have multiple choices of
>> representation for a given quantity - e.g. it would prevent you
>> from saying that a right ascension is represented as
>> degrees:minutes:seconds. Also, it's not all that easy to determine
>> programmatically from a UCD whether it is likely to
>> be represented in one string form or another.
>> - You can trawl through some or all of the data in the column - if all
>> the string values you look at appear to be valid sexagesimal/ISO8601
>> strings, then assume that's what they are.
>> Depending on your processing model, this is likely to be
>> inefficient (if you examine all data) or error-prone (if you
>> examine only some). Or maybe both, in the case of string
>> representations which don't have very distinctive formats.
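The third workaround above (trawling through the column data) could be sketched roughly as follows in Python. The regular expressions, the function name and the `sample` parameter are assumptions made for illustration; real data would need more permissive or more careful patterns.

```python
import re
from itertools import islice

# Illustrative patterns only. A real column may use other separators.
PATTERNS = {
    "sexagesimal": re.compile(r"[+-]?\d{1,3}:[0-5]?\d(:[0-5]?\d(\.\d*)?)?"),
    "iso8601": re.compile(
        r"\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d*)?)?)?Z?"
    ),
}

def guess_representation(values, sample=None):
    """Guess how a string column encodes numbers, or return None.

    sample limits how many leading cells are inspected; None inspects all.
    """
    cells = [v.strip() for v in islice(values, sample)]
    cells = [c for c in cells if c]  # ignore blank cells
    if not cells:
        return None
    for name, pattern in PATTERNS.items():
        if all(pattern.fullmatch(c) for c in cells):
            return name
    return None
```

Both weaknesses show up directly. Looking at only the first two cells of `["12:30", "12:45", "lunch"]` reports "sexagesimal", while scanning the whole column correctly reports nothing. And the form alone cannot say whether "12:30:00" means hours or degrees.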
>> These hacks, and possibly others, may have a fair chance of working in
>> practice, but as well as the individual problems listed above they operate
>> outside of the VOTable standard and hence rely on an informal understanding
>> between the VOTable provider and consumer, and different data/software
>> might encode/decode this information differently, or not at all.
>> I therefore think that we need some way of expressing in a
>> datatype="char"/"unicodeChar" FIELD element that a certain value
>> representation format is being used. As usual, the same applies to
>> Proposed Solution
>> I suggest one of the following:
>> P1. Introduce a new attribute "representation" (or some other name)
>> for FIELD/PARAM to contain a special string indicating how values in
>> that column are to be interpreted. The special values
>> "hms", "dms" and "iso8601" (any more?) would be initially noted
>> along with rules for what counts as valid instances of those
>> representations. Such "noting" could be in the VOTable standard
>> itself or in some more dynamic form like a wiki page, or both.
>> VOTable producers would also be free to introduce other values
>> for private use. Such introduced values might possibly be noted in
>> the standard at a later date if it's agreed they are useful.
>> P2. Modify the definition of the "units" attribute so that it is
>> permitted to contain either a catstd-format unit string as at
>> present, or a special representation string as in P1. We could
>> possibly say that for numeric-valued columns it works as at present,
>> but for string-valued columns it has the new sense.
>> However one could conceive of a case where you want both
>> representation and units (though I can't think of any actual
>> examples), which would be problematic for this scheme.
>> P1 is the cleanest since it avoids overloading one variable with two
>> meanings and the associated problems of inadvertently trying to interpret a
>> catstd-format unit string as a representation type and
>> vice versa (though in practice such collisions are not very likely).
>> P2 is basically a fudge which has the advantage that it requires no change
>> (apart from comments) to the schema, and moreover that there are VOTables
>> out there which are already using this solution. I generally favour P1,
>> though could perhaps be swayed to P2 by arguments of pragmatism.
>> Comments about this on the list are welcome; perhaps we can also discuss
>> and hopefully reach a decision in Victoria.
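To make proposal P1 concrete, here is a minimal Python sketch of how a reader might dispatch on the proposed "representation" attribute. The attribute is only the proposal quoted above and is not part of the VOTable standard. The choice of numeric targets (decimal degrees for "hms"/"dms", seconds since 1970-01-01 for "iso8601", with the timestamp treated as UTC) and all function names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

def _sexagesimal(text, scale):
    """Colon-separated sexagesimal string to a number, times scale."""
    s = text.strip()
    sign = -1.0 if s.startswith("-") else 1.0
    parts = [float(p) for p in s.lstrip("+-").split(":")]
    return sign * scale * sum(p / 60 ** i for i, p in enumerate(parts))

def _iso8601(text):
    """ISO-8601 date-time (no zone suffix) to seconds since 1970-01-01."""
    s = text.strip()
    fmt = "%Y-%m-%dT%H:%M:%S.%f" if "." in s else "%Y-%m-%dT%H:%M:%S"
    return (datetime.strptime(s, fmt) - datetime(1970, 1, 1)).total_seconds()

# Values initially suggested in P1, mapped to assumed numeric targets.
DECODERS = {
    "hms": lambda s: _sexagesimal(s, 15.0),  # hours to degrees
    "dms": lambda s: _sexagesimal(s, 1.0),   # already degrees
    "iso8601": _iso8601,
}

def decoder_for(field_xml):
    """Return a string-to-number decoder for a FIELD element, or None.

    None means the attribute is absent or holds a private value this
    reader does not know, so the column stays a plain string.
    """
    field = ET.fromstring(field_xml)
    return DECODERS.get(field.get("representation"))

ra = decoder_for(
    '<FIELD name="RA" datatype="char" arraysize="*" representation="hms"/>'
)
ra_degrees = ra("23:04:46.5")
```

Because unknown values simply yield no decoder, privately introduced representations degrade to ordinary string columns instead of breaking the reader.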