is UCD out of control?

Roy Williams roy at cacr.caltech.edu
Wed Oct 22 18:02:10 PDT 2003


I must say that I find Tom's version of the UCD paper has a number of
definite improvements, such as the importance of Groups, with the child
inheriting UCD from its parent.

However, I find the suggested syntax confusing and muddying. It seems to be
going back to the old model of "base + other stuff" that we discussed in
Cambridge. What I do not understand is how a machine would parse the other
stuff, these modifiers and attribute properties and so on. I do not
understand which is a modifier and which is an attribute. The reason we went
with the new scheme was that we couldn't imagine writing code to disentangle
the Cambridge scheme.

In the 1.9.9 document, the first word of the UCD corresponds to the thing
that has the units. In "stat.variance; phys.length", we know that the unit
is L*L (it's a variance). The second word was the concept to which this
relates.
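
To make the units point concrete, here is a rough Python sketch. The two
lookup tables and the name unit_of are hypothetical, invented for
illustration; only the "stat.variance; phys.length" pair and its L*L result
come from the text above.

```python
# Hypothetical lookup tables, not taken from the 1.9.9 document.
CONCEPT_DIMENSION = {"phys.length": "L"}   # concept word -> base dimension
PROPERTY_POWER = {"stat.variance": 2}      # a variance squares the unit


def unit_of(ucd: str) -> str:
    """Derive the dimension of a two-word UCD from its property and concept."""
    prop, concept = [word.strip() for word in ucd.split(";")]
    dimension = CONCEPT_DIMENSION[concept]
    power = PROPERTY_POWER[prop]
    # Repeat the base dimension once per power, e.g. 2 gives "L*L".
    return "*".join([dimension] * power)
```

The first word alone decides how the unit is built, and the second word
alone supplies the base dimension, which is why the two-word form is easy to
reason about.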

Everything in UCD2 should be of the form "The <property> of the <concept>".

Forget the attempts to justify three words. Leave that for UCD3.

Every UCD has at most two words. Keep It Simple!

In the 1.9.9 document, we tried to keep as close as we can to the metadata
mines -- the 3000 tables of Vizier from which all this comes. We thought
that had more validity than somebody (anybody) sitting down and inventing
structure. Look at the problems we get when we move away from mining real
metadata: Tom thinks that "error" belongs in a tree called "measurement",
and the earlier version put it in a tree called "statistics". There is no
right or wrong here, just opinion. I pointed this out in the earlier
document concerning the "equinox" concept, but that has been deleted. We
must make every attempt to follow what 3000 published papers have done -- not
push our own opinions.

In Tom's paper, there seem to be lots of new attributes (value, vector,
multiplet, local, human, soft) that further stretch the scope of UCD. If there
are multiple values in a table cell, then the VOTable will indicate this in
other ways. Perhaps Tom can put in a few more attributes so we can find out
if the data quantity is a float or an integer? UCD is about *semantic
type*, not all this other stuff. What *real* tables use the "human" section?
Are humans base, attribute, or modifier?

I think we can all agree that UCD as currently formulated cannot express the
complexity inherent in its task. What is really needed is a well-thought-out
RDF vocabulary of predicates and objects, and that is the idea of UCD3. The
intention of UCD2 is to provide a stopgap that will be backward compatible
when UCD3 arrives. For now we use only one predicate, "propertyOf". But Tom
has chosen to remove all the discussion of why and what we are doing, where
we are going, and driven instead down a road that tries to put a lot of
complexity into this string representation. The result is something terribly
complicated and not very understandable.

Of course the proof is in the pudding. As usual in the VO, we are making a
language that is very expressive, and then hoping to eventually write the
code that understands it. So let's think it through now. How do I construct
code that "understands" something like "phot.flux; em.optical;
intent.calculated; value"? I want to know what kind of data structure can be
created from this, how to compare UCDs, and how to convert a UCD into a
human-readable description of what it represents. I know how to do these
things with the 2-word property/concept style, but not with this grab-bag of
attributes and modifiers.
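
As a rough sketch of why the two-word case is tractable (this is not code
from either document; the names UCD2, parse_ucd2 and describe are
hypothetical, and splitting on ";" is an assumption based on the examples
above), the three operations might look like this in Python:

```python
from typing import NamedTuple, Optional


class UCD2(NamedTuple):
    """Hypothetical data structure for 'the <property> of the <concept>'."""
    prop: str                # first word: the thing that has the units
    concept: Optional[str]   # second word: the concept it relates to


def parse_ucd2(text: str) -> UCD2:
    """Split on ';' and reject anything that is not one or two words."""
    words = [w.strip() for w in text.split(";") if w.strip()]
    if not 1 <= len(words) <= 2:
        raise ValueError(f"expected one or two words, got {len(words)}: {text!r}")
    return UCD2(words[0], words[1] if len(words) == 2 else None)


def describe(ucd: UCD2) -> str:
    """Turn a parsed UCD into a human-readable phrase."""
    if ucd.concept is None:
        return f"the {ucd.prop}"
    return f"the {ucd.prop} of the {ucd.concept}"
```

Comparison comes for free, since two parsed UCDs are equal exactly when
their property and concept match. The four-word example above is simply
rejected by this sketch, because nothing tells the code which of the extra
words is a modifier and which is an attribute.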

In conclusion, here is my IF ... ELSE clause:

IF {

we cannot find a killer app for UCD2, if we cannot write code to understand
it, we should stick with UCD1, which has been improved and groomed in the
last few months. Then next year we can make UCD3.

} ELSE {

I like simplicity. I want to turn every table cell into "<property> of the
<concept>" so that every UCD2 would have at most two words.

}



--------
Caltech Center for Advanced Computing Research
roy at cacr.caltech.edu
626 395 3670


