Decomposition and the Searchability of Q and Accuracy is important (Was: Re: [QUANTITY] Data Model for Quantity v0.5)
brian.thomas at gsfc.nasa.gov
Tue May 11 21:50:28 PDT 2004
On Tuesday 11 May 2004 10:59 pm, Jonathan McDowell wrote:
> Brian continued the discussion on whether BasicQ should
> have Accuracy.
> bt: "This option doesn't work with the above needs 1 and 2"
> I think it works fine with 2, I think 1A is a red herring and
> I don't see why 1B is true (even if you want to do this
> decomposition, which I wouldn't, you can always represent an n=100
> CoreQ as 100 n=1 CoreQs, why is that worse than 100 BasicQs?).
CoreQ (CQ) doesn't do as well as BasicQ (BQ) for 2 reasons:
1. It's another level up from BasicQ that someone must implement.
Part of the rationale for decomposing all the way down to BQ is
that it allows those in the community who don't want anything
higher than that to just have a BQ.
2. I don't think that CoreQ is as searchable. Right now the "values"
may be delimited by strings, so you must engineer a different XPath statement in
order to extract the value you want. Even with tagged
values, you still have to write a rather complex XPath statement,
and it's going to be different from the one you would use on a BQ or SQ
(StandardQuantity). That last point may be the most important. Consider
that with no decomposition, any searcher is forced
to apply one of 3 different XPath statements in order to search (value-based)
information from Q's of interest, i.e. one XPath each for BQ's, CQ's and SQ's.
There may be further complications when the document contains a heterogeneous
mixture of each, which may require a tailored XPath to search the document
(haven't tried it, so that last point should be taken as a *possible* problem).
As searchability is a high level requirement for Q, I see this as a very important
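
A minimal Java sketch of the searchability point, using the standard javax.xml.xpath API. The XML element names (BasicQuantity, CoreQuantity, value, values) and the space delimiter are assumptions made up for illustration; they are not taken from the Quantity draft. It shows that the XPath for a single tagged value is a plain path, while pulling one value out of a delimited list needs string functions and a different expression.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class QSearchSketch {
    // Hypothetical serializations; the element names are illustrative only.
    static final String BASIC_Q =
        "<BasicQuantity><value>4.2</value></BasicQuantity>";
    static final String CORE_Q =
        "<CoreQuantity><values>1.0 4.2 7.5</values></CoreQuantity>";

    // Evaluate an XPath 1.0 expression against an XML string and
    // return the result as a string ("" when nothing matches).
    static String eval(String xml, String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expr, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath or XML: " + expr, e);
        }
    }

    // BQ case: the value is directly addressable by a simple path.
    static String basicValue(String xml) {
        return eval(xml, "/BasicQuantity/value");
    }

    // CQ case with space-delimited values: the second value has to be
    // cut out of the string with XPath string functions.
    static String secondCoreValue(String xml) {
        return eval(xml,
            "substring-before(substring-after(/CoreQuantity/values, ' '), ' ')");
    }
}
```

Under these assumptions, the BQ path applied to the CQ document matches nothing, so a searcher facing a mixed document would need to know which expression to apply to which kind of Q.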
> But perhaps the question is, as Pat and David suggest, whether we
> need to split BasicQ into 2 classes, one my way and one your way
> (AtomicQuantity, with no errors, and ScalarQuantity or ScienceQuantity,
> with errors?)
I'd say don't split them, as it will make decomposition without information loss
impossible (decomposing a higher dimensional Q into AtomicQuantity could
mean losing all accuracy information, for example).
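
As a rough illustration of lossless decomposition, here is a Java sketch in which an n-valued Q with one accuracy per value is split into n single-valued Q's that each keep their own accuracy. The class names, fields, and the one-accuracy-per-value layout are all assumptions for the example, not the draft model.

```java
import java.util.ArrayList;
import java.util.List;

final class DecomposeSketch {
    /** Hypothetical BasicQ: one value plus its accuracy. */
    static final class BasicQ {
        final double value;
        final double accuracy;

        BasicQ(double value, double accuracy) {
            this.value = value;
            this.accuracy = accuracy;
        }
    }

    /** Hypothetical higher-dimensional Q: n values, one accuracy each. */
    static final class CoreQ {
        final double[] values;
        final double[] accuracies;

        CoreQ(double[] values, double[] accuracies) {
            if (values.length != accuracies.length) {
                throw new IllegalArgumentException("need one accuracy per value");
            }
            this.values = values.clone();
            this.accuracies = accuracies.clone();
        }
    }

    // Split a CoreQ into BasicQs. Because BasicQ can carry an accuracy,
    // nothing is dropped; a BasicQ without that field would lose it here.
    static List<BasicQ> decompose(CoreQ q) {
        List<BasicQ> parts = new ArrayList<>();
        for (int i = 0; i < q.values.length; i++) {
            parts.add(new BasicQ(q.values[i], q.accuracies[i]));
        }
        return parts;
    }
}
```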
And while I'm on this point..
I'd like to hear *why* people don't want errors/accuracy in Q's (??). So far, what I
hear falls into 1 of 3 camps, which I just don't see as valid reasons:
1. It's hard to implement (!).
2. I don't want to write code that has to check for accuracies (and don't want to
get nulls or errors or empty values back).
3. Some data lack errors/accuracy.
I can't accept any of these as a reason to design accuracy out of Q's. The first
just isn't true. I've just finished writing most of the Java code to prototype the Q,
and accuracy was pretty trivial (since each accuracy object is itself a Q).
As for the second point: if you don't want to check for accuracies, then you don't
have to call the method, no? If you design a Q package, it's pretty
trivial to write code that returns nulls/NoAccuracyDefined/throws an error (the choice
of which is still TBD by the accuracy interface). Jeez, I can write the Java code to do this
in about 10 min. It's just not that hard.
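
To make the point concrete, here is a small Java sketch of a Q whose accuracy is itself a Q, with an accessor that callers can simply ignore. This is not the prototype code mentioned above; the class and method names are hypothetical, and java.util.Optional is used here as a stand-in for the null/NoAccuracyDefined/exception choice, which is still open.

```java
import java.util.Optional;

final class AccuracySketch {
    /** Hypothetical Q: a value plus an optional accuracy that is itself a Q. */
    static final class Quantity {
        private final double value;
        private final Quantity accuracy; // null when no accuracy was given

        Quantity(double value, Quantity accuracy) {
            this.value = value;
            this.accuracy = accuracy;
        }

        double getValue() {
            return value;
        }

        // Callers who do not care about accuracy never call this.
        // Callers who do care get an explicit "absent" instead of a bare null.
        Optional<Quantity> getAccuracy() {
            return Optional.ofNullable(accuracy);
        }
    }
}
```

Code that only wants values calls getValue() and is unaffected; code that wants errors checks getAccuracy().isPresent() first.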
For the last point: these data lack accuracy for one of the following reasons:
1. Laziness on the part of the data creator 2. they are constants/definitions, or ??
I say the *appropriate* thing to do is to specify which of these cases applies. The default
might be to return a "NoAccuracyBecauseNotSpecified" with the other option of
"NoAccuracyImConstantOrDefinitionValue". And there might be other situations beyond
those that need "no accuracy" definitions.
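
One possible way to record *why* a Q has no accuracy, sketched in Java. The enum constants loosely mirror the two "no accuracy" cases named above, with not-specified as the default; the type names and factory methods are assumptions for illustration, and further cases could be added to the enum.

```java
final class NoAccuracySketch {
    /** Why an accuracy is or is not attached to a Q (hypothetical names). */
    enum AccuracyStatus {
        SPECIFIED,              // an accuracy value is present
        NOT_SPECIFIED,          // the data creator gave none (the default)
        CONSTANT_OR_DEFINITION  // the value is exact by definition
    }

    static final class Quantity {
        final double value;
        final double accuracy;        // meaningful only when status == SPECIFIED
        final AccuracyStatus status;

        private Quantity(double value, double accuracy, AccuracyStatus status) {
            this.value = value;
            this.accuracy = accuracy;
            this.status = status;
        }

        // A measurement with a stated accuracy.
        static Quantity measured(double value, double accuracy) {
            return new Quantity(value, accuracy, AccuracyStatus.SPECIFIED);
        }

        // A value whose accuracy was never supplied.
        static Quantity unspecified(double value) {
            return new Quantity(value, Double.NaN, AccuracyStatus.NOT_SPECIFIED);
        }

        // A constant or defined value, which has no accuracy by nature.
        static Quantity constant(double value) {
            return new Quantity(value, Double.NaN, AccuracyStatus.CONSTANT_OR_DEFINITION);
        }
    }
}
```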
What is the cost of not having accuracy? Pretty steep by comparison : you don't have
scientific measurements. No errors, no science data. You also will lack the ability to
decompose the higher level Q's into basic Q's. The cost for this is that searchability
of the Q becomes more difficult (for reasons outlined above).
* Dr. Brian Thomas
* Dept of Astronomy/University of Maryland-College Park
* Code 695/Goddard Space Flight Center-NASA
* fax: (301) 286-1775
* phone: (301) 286-6128 [GSFC]
(301) 405-2312 [UMD]