Format of tokens
agray at dcs.gla.ac.uk
Wed Nov 14 05:05:25 PST 2007
Hi Rick, All,
Again, comments preceded by an [AG].
From: owner-semantics at eso.org [mailto:owner-semantics at eso.org] On Behalf
Of Frederic V. Hessman
Sent: 14 November 2007 11:24
To: IVOA semantics
Subject: Re: Format of tokens
On 14 Nov 2007, at 11:24 am, Alasdair Gray wrote:
>> Number of TopConcepts: 1325
> I do not agree with this figure (see next comment).
>> Thus, you can't assume that the BT's and NT's are all present in
>> the original (trex.txt). Alasdair's figure of 512 top concepts
>> assumed that the IAU thesaurus was reasonably complete and self-
> I cannot claim to have looked closely at the BT/NT relationships in
> the original (trex.txt) file. However, the IAU thesaurus also
> issues a hierarchy file (hierlist.txt). This file gives the
> hierarchy of the original thesaurus and it is this that has 516 top
> level concepts. Rick has assumed that a top level concept is one
> that does not have a broader term. For the IVOAT I would agree with
> this as it should result in a less confusing hierarchy that matches
> users' expectations. However, for the IAU93 this is wrong as it
> results in a different number of top level concepts (although I
> would have thought that it would have been fewer than 516 since some
> of these terms appear as narrower terms of other concepts) and thus
> a different hierarchy from the original version of the thesaurus.
Aha! I'm sure I'll leave this fine point to the experts, but I would
have thought that a "TopConcept" is one which is at the top of a
connection-hierarchy (after being chastened, I won't say
"ontological"). If there is a concept "gummi bears" but no "BT
candy" then the authors of the vocabulary have obviously left "candy"
out for some reason, making "gummi bears" pretty top-level to me.
[AG] In principle I agree that top level concepts should not have a
broader term. However, we are trying to accurately model the IAU 1993
thesaurus, not correct it. Thus, the top level concepts should be those
that appear at the top level of their hierarchy list, not what we feel
should be a top level concept. We can correct this for the IVOAT :)
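To make the difference between the two definitions concrete, here is a minimal Python sketch. It assumes the BT links have already been parsed into a dict and that the top level of the hierarchy list is available as a set. The toy terms, the `broader` dict and the `hierarchy_top_level` set are hypothetical illustrations, not the actual contents or format of trex.txt or hierlist.txt.

```python
# Toy data only: the real trex.txt / hierlist.txt formats are not modelled here.
# Maps each term to the set of broader terms (BT) recorded for it.
broader = {
    "gummi bears": set(),      # no BT recorded, as in the example above
    "liquorice": {"candy"},
    "candy": set(),
}

# Hypothetical stand-in for the terms listed at the top level of a
# published hierarchy list.
hierarchy_top_level = {"candy"}


def top_concepts_by_missing_bt(broader):
    """Return the terms that have no broader term recorded."""
    return {term for term, bts in broader.items() if not bts}


inferred = top_concepts_by_missing_bt(broader)

# Terms that the "no BT" rule promotes but the hierarchy list does not.
only_inferred = inferred - hierarchy_top_level
# Terms that the hierarchy list treats as top level despite having a BT.
only_listed = hierarchy_top_level - inferred

print(sorted(inferred))
print(sorted(only_inferred))
print(sorted(only_listed))
```

In this toy case the "no BT" rule promotes "gummi bears" while the hierarchy list does not, which is the kind of mismatch that would make the two counts differ.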
Or is my naivety showing? I assumed that hierlist.txt was simply
their best attempt back when all of this was much more painful (yes,
this project has now forced me to learn lots of python, as I
intended, but at least I'm not doing this on paper or with an Intel
286 under DOS).
[AG] They used a thesaurus management system (LEXICON) to generate their
files. Maybe I'm showing my naivety in trusting the output of this
software. (Just because it is old doesn't mean that it is wrong.) We are
merely trying to produce a mapping from the output generated by that
tool into the new SKOS format.
Dropping the TopConcept links to entries with no NT's is trivial - is
this the general consensus?
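For illustration only, a minimal Python sketch of that filtering step, assuming the BT links are held in a dict keyed by term. The function names and toy data are hypothetical and are not taken from the actual conversion script.

```python
def narrower_index(broader):
    """Invert a term -> broader-terms mapping into term -> narrower-terms."""
    narrower = {term: set() for term in broader}
    for term, bts in broader.items():
        for bt in bts:
            # setdefault covers a BT that has no entry of its own.
            narrower.setdefault(bt, set()).add(term)
    return narrower


def top_concepts_with_narrower_terms(broader):
    """Keep only terms that have no BT and at least one NT."""
    narrower = narrower_index(broader)
    return {
        term
        for term, bts in broader.items()
        if not bts and narrower[term]
    }


# Toy data: "gummi bears" has neither BT nor NT, so it is dropped.
broader = {
    "gummi bears": set(),
    "liquorice": {"candy"},
    "candy": set(),
}

kept = top_concepts_with_narrower_terms(broader)
print(sorted(kept))
```

Isolated terms such as "gummi bears" remain in the vocabulary as concepts; under this rule they simply stop being flagged as top concepts.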
Alasdair J G Gray
Research Associate: Explicator Project
Computer Science, University of Glasgow
0141 330 6292