Vocabularies: next steps
Frederic V. Hessman
hessman at astro.physik.uni-goettingen.de
Tue Nov 27 03:04:50 PST 2007
>> Set: a-Z, 0-9
> You're quite right. I meant the concept URI: the concept fragment
> should I believe/agree, be drawn from [a-z0-9], though I wouldn't
> push very hard against [a-zA-Z0-9]. The prefLabel and altLabel
> fields should be Unicode.
> [AG] I would probably argue for [a-zA-Z0-9]
"a-Z,0-9" was meant to mean exactly this. By now, I think we can all
agree on this.
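A minimal sketch of how the agreed character set could be checked mechanically. The function name is_valid_token and the decision to reject empty tokens are my assumptions, not part of anything decided on the list.

```python
import re

# Assumed pattern: one or more ASCII letters or digits and nothing else.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+")


def is_valid_token(token: str) -> bool:
    """Return True if the concept token uses only a-z, A-Z and 0-9."""
    return TOKEN_PATTERN.fullmatch(token) is not None
```

With this check, "Apple" passes while "idea-label" and any token containing non-ASCII characters are rejected; the prefLabel and altLabel fields are not affected and can stay Unicode.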
>>> The number of top concepts in the IAU thesaurus
>> Huh? The IAU thesaurus is the IAU thesaurus. If "top concepts"
>> are defined either as 1) not having a BT or 2) having a NT, then
>> the number is already fixed. Basta.
> [AG] I still feel that for the IAU93 Thesaurus we should adopt the
> of tokens given in the web version. However, I agree with Norman that
> the top concepts are there to aid the navigation and for no other
> reason. When it comes to the IVOAT, I would think that the top
> concepts are those that do not have a BT.
For simplicity and consistency, I would argue that we define "top
concepts" as those not having a BT.
This should be part of the IVOA vocabulary guidelines, e.g. (here's
my first cut)
1. A single SKOS document defines the vocabulary and must be
publicly available at some URI, preferably
at the central IVOA vocabulary repository http://www.ivoa.net/?????
at least as a copy.
2. A concept token has the form
where the token should consist only of the letters a-z, A-Z, and the
numbers 0-9. The URI root and vocabulary
name should be set centrally and not in the definition of each
token. For example, if a nominal concept is
(root="http://www.ivoa.net/Thesauri/", name="Food", token="Apple"),
then the SKOS definition begins with
3. One is encouraged to use human-readable forms for the tokens with
some obvious connection to
the preferred labels, e.g. conversion from the label via dropping
characters not included in the
above list and sub-token separation via capitalization (e.g. "My
favorite idea-label #42" ->
4. Vocabulary entries should be singular unless based on previously
determined sources where the
conversion to singular forms would impair the usefulness of the
5. Thesaurus entries (BT/NT/RT) are encouraged but not required.
6. If thesaurus entries are included, they should be complete (all BT
links are reflected in corresponding
NT links in the referenced entries).
7. "TopConcept" entries should normally be those not having a BT
reference but the maintainers of
a vocabulary can decide to restrict the choice of TopConcepts if
8. Use of standard SKOS documentation is encouraged but not required:
scopeNote to clarify usage
historyNote to identify when the vocabulary entry was created
changeNote to identify changes in already created entries
9. The maintainers of a vocabulary should provide on-line
documentation permitting the easy perusal of labels
and any thesaurus and usage information. The IVOA will try to
maintain a list of links to known vocabularies
and may choose to provide its own consistent on-line documentation
based on the SKOS files alone.
10. The maintainers of a vocabulary should attempt to cross-reference
their vocabulary with one or more IVOA
supported vocabularies, e.g. UCD1 and/or IVOAT.
Anything else? Having just Ten Commandments would be nice.
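Rules 3, 6 and 7 are mechanical enough to sketch in code. What follows is only one possible reading of them, not an agreed implementation: the function names, the plain-dictionary representation of BT/NT links, and the tokens "Fruit", "Apple" and "Pear" are all hypothetical. In particular, the token it produces for the example label in rule 3 is what this reading yields, not necessarily the form originally intended.

```python
import re


def label_to_token(label: str) -> str:
    """Rule 3 (one reading): capitalize each sub-token, drop other characters."""
    # Any run of characters outside a-z, A-Z, 0-9 separates sub-tokens.
    parts = re.split(r"[^A-Za-z0-9]+", label)
    # Upper-case the first character of each part; leave the rest untouched.
    return "".join(p[:1].upper() + p[1:] for p in parts if p)


def top_concepts(broader: dict[str, set[str]]) -> set[str]:
    """Rule 7 (default case): concepts that have no BT reference.

    Every concept must appear as a key, with an empty set if it has no BT.
    """
    return {concept for concept, bts in broader.items() if not bts}


def missing_narrower_links(
    broader: dict[str, set[str]], narrower: dict[str, set[str]]
) -> set[tuple[str, str]]:
    """Rule 6: return (concept, BT) pairs lacking the corresponding NT link."""
    return {
        (concept, bt)
        for concept, bts in broader.items()
        for bt in bts
        if concept not in narrower.get(bt, set())
    }


# Hypothetical toy vocabulary: "Pear" names "Fruit" as its BT,
# but "Fruit" does not list "Pear" as an NT.
broader = {"Fruit": set(), "Apple": {"Fruit"}, "Pear": {"Fruit"}}
narrower = {"Fruit": {"Apple"}}

print(label_to_token("My favorite idea-label #42"))  # MyFavoriteIdeaLabel42
print(top_concepts(broader))                          # {'Fruit'}
print(missing_narrower_links(broader, narrower))      # {('Pear', 'Fruit')}
```

Note that label_to_token treats non-ASCII letters as separators, so labels with accented characters would need transliteration first if that is not what one wants.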
>>> The grammatical number of the concept names (singular or plural)
>> Singular, please! - it's a real pain to use the formal system of
>> singular concepts and plural countables and I agree that singular
>> should make the vocabulary simple to use
> I think this is also a non-issue. If a term is plural in the
> vocabulary we're adapting (IAU93 and A&A use this convention) then it
> should remain plural in the SKOS version, otherwise we're making
> gratuitous changes; if it's singular in the original vocabulary
> (AOIM) then it should remain singular in SKOS, for the same reason.
> [AG] The issue raises its head when it comes to the IVOAT. However,
> since this is based on the IAU93 thesaurus we could, as I believe
> is the
> case, just adopt the IAU93 practice.
No, in fact I want to remove the plural terms from IVOAT as soon as
possible (I finally got to this point in my list of things to do).
External vocabularies like IAU93, AOIM and A&A are pre-defined and so
are what they are. With IVOAT, we can choose to have what we want.
>>> I wouldn't want to bet which of the vocabularies will end up the
> most useful in the end...
Well, the whole purpose of IVOAT is to create something useful. If
we're already failing, please tell me so I can stop now...... :-(
>> Tricky question: we don't want to refer too much to IAU93,
>> because the suggestion will be that it's useful (which it really
>> isn't) and UCD1 really doesn't cover very many concepts contained
>> in the above vocabularies. Stationary targets like the first list
>> are admittedly much easier to do, but I've already started to
>> connect IVOAT and UCD1, which is a good exercise since they are
>> only partially matchable. IAU93 and IVOAT are so closely related -
>> even with the syntactic and content cleanups - that one could
>> automate that connection without too much trouble.
> I'm with you on the potential for trickiness. However, it might be
> simpler than this. Perhaps we should just declare as many
> correspondences as we can, and see if a reasoner agrees the result is
Sounds like a good idea to me: we stick in whatever we can manage and
see if anybody notices/benefits. This is why I would like to test
the UCD1<-->IVOAT connection so that one can ask questions like "I've
got a UCD1 label in my VOTable - is there an IVOAT entry which would
enable me to put it into a more general context?" or "I've got
something easily described by an IVOAT token - can I trivially put
this in a VOTable using some UCD1 label?". Andeas is interested in
getting the A&A vocabulary convertible to some other vocabulary to
show that, e.g., the MNRAS or ApJ vocabularies can be shown to be
equivalent at some stage - the question is only what intermediary
vocabularies are usable (we've been praying that IVOAT as the
replacement of the SV would be this medium, since the others are not