Beyond the draft proposal
agray at dcs.gla.ac.uk
Wed Feb 6 03:11:31 PST 2008
Brian Thomas wrote:
> Hi Alasdair, all,
> On Tuesday 05 February 2008, Alasdair Gray wrote:
>> Brian Thomas wrote:
>>> I'm aware they are different, however, I don't like messy things,
>>> and it's not clear to me that a vocabulary is purely for human-machine
>>> interaction, as seems to be implied above. Perhaps it is, but I still
>>> don't understand why that makes the vocabulary necessarily messy.
>>> Having a controlled, clean set of unique tokens, seems to me a very good
>> If the compound terms are going to be in common usage by the astronomers
>> who will ultimately be using the IVOA software that makes use of the
>> vocabularies then they need to be first class citizens in the vocabulary
>> and not derived from some grammar for combining terms.
> I have no argument with this, but are you sure that all compound terms
> are needed/useful? I suppose this is an eye of the beholder sort of thing,
> some of these compound terms look nasty to me.
The question then becomes one of where do you start drawing the lines,
which I guess is what this discussion is about.
>>> Do we really have to canvass every possible meaning, and way of
>>> expressing that meaning, into the vocabulary? Some terms seem to be
>>> of very limited utility. I point to the earlier example of having "volcano"
>>> included as a token/term.
>> It depends on how wide you want the coverage of your vocabulary to be.
>> If the idea of the IVOAT is to cover all terms then yes, they all need
>> to be in there. This does not preclude the setting up of smaller, more
>> focused vocabularies with clearly defined mappings to the IVOAT.
> Well, perhaps it is time to ask (and I suppose this is the sort of thing
> Frederick was getting at earlier), what is the purpose of the IVOAT?
> From my own point of view:
> As I have written earlier, we are in bad need of a list of standard tokens
> which identify astronomical objects, as well as instrumentation, and
> all the other concepts which are involved in doing Astronomy and online
> research. I don't know how to define the exact scope of the vocabulary better
> than that. Probably what I just listed could result in 60,000 terms if one
> is fairly pedantic, but I would hope it would be smaller than that..if only
> because it would take years to get a 60,000 word vocabulary assembled
> and agreed on...
I'm afraid this is one where I cannot be of assistance due to my lack of
domain knowledge. I am happy to help verify/validate the resulting
vocabulary in terms of SKOS compliance, etc.
>>>> And I think the result _should_ look much like the IAU original. My
>>>> impression of what was being aimed at in the IVOAT was a tidied up and
>>>> updated IAU93. Let's keep it simple and quick.
>>> Yes, well, we are beyond simple and quick now. To my mind that would
>>> have encompassed no more than technical editing (just enough to get
>>> the IVOAT into SKOS). But we have added terms and have (at last count)
>>> 4 vocabularies in total (are all of those going into the draft??). So it's a
>>> matter of opinion that the process has been sufficiently limited.
>> The SKOS version of the IAUT should not alter its content at all.
>> However, the IVOAT should contain the concepts that are in use now.
>>> Soo.. you are in favor of including something beyond repeating the token
>>> name under skos:description?
>> I would say that the IAUT, A&A keywords and AOIM vocabularies will
>> unfortunately not contain very good definitions as the original source
>> vocabularies are lacking in this area. However, the IVOAT *should*,
>> actually *must*, contain definitions of all of the concepts, otherwise
>> the whole exercise is wasted as no-one will know the true meaning of the
>> concepts. Whether taking these from on-line dictionaries is the best
>> approach is open for debate.
> Well, machine assignment, as a starter, is a good thing. I didn't
> say we just let the machine assign stuff and forget about it. But
> in my experience, definitions from WordNet usually give you the right
> definition with no trouble at all. Absolutely, a human needs to validate all
> the entries, but it's faster to have the machine generate most of the
> text and then have a human check (and edit as needed) than to have
> the human go it alone and type it all in.
That sounds like a reasonable approach to me, particularly since the
current version just repeats the preferred label which I realise is easy
but is not helpful for those who do not have any domain knowledge.
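
A minimal sketch of how such placeholder definitions might be flagged for human review, assuming each concept is available as a simple record with a preferred label and a definition. The record layout, the sample entries and the function name `flag_weak_definitions` are illustrative assumptions, not part of any IVOA tool or of the actual vocabulary files:

```python
def flag_weak_definitions(concepts):
    """Return preferred labels of concepts whose definition is missing
    or merely repeats the preferred label."""
    flagged = []
    for concept in concepts:
        label = concept.get("prefLabel", "").strip()
        definition = (concept.get("definition") or "").strip()
        # An empty definition, or one identical to the label (ignoring
        # case), tells a reader without domain knowledge nothing new.
        if not definition or definition.lower() == label.lower():
            flagged.append(label)
    return flagged


# Hypothetical sample records, made up for illustration only.
concepts = [
    {"prefLabel": "volcano", "definition": "volcano"},
    {"prefLabel": "spiral galaxy",
     "definition": "A galaxy with arms winding out from a central bulge."},
    {"prefLabel": "quasar", "definition": None},
]

print(flag_weak_definitions(concepts))
```

Entries flagged this way would be the natural candidates for a machine-suggested definition followed by a human check.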