From francois at cdsarc.u-strasbg.fr Mon Jul 4 05:38:23 2011
From: francois at cdsarc.u-strasbg.fr (Francois Ochsenbein (ext.52429))
Date: Mon, 04 Jul 2011 14:38:23 +0200
Subject: Nulls in VOTables in TAP
In-Reply-To: <4E0DD8D4.8090503@nasa.gov>
References: <4E0DD8D4.8090503@nasa.gov>
Message-ID: <20110704123823.20FF625EA3@cdsarc.u-strasbg.fr>

Hi Tom,

I basically agree with all of Mark Taylor's answers:

* yes, VOTable was designed on the basis of FITS, not as a DBMS subset --
  NaN and a database 'null' are considered the same thing, as they are
  in a FITS binary table; and in the case of an array of floats/doubles
  in the TABLEDATA serialization, a simple space can't work, hence the
  "NaN" alternative of the empty ...

* yes, there is some confusion for the boolean: the FITS document
  indicates only the possibilities T, F and hex 00 (but the hex 00
  can't be used for an array in the TABLEDATA serialization, a problem
  similar to the NaN for doubles)

* for integers, no bit pattern exists for an undefined value. It is just
  "suggested" in section 4.7 to use the value -32768 for short
  integers. In fact the lowest integer numbers are frequently used as
  the bit pattern for "null" integers (the lowest integer numbers are
  their own opposite); these numbers are:
      -32768 (0x8000) for short int,
      -2147483648 (0x80000000) for 32-bit integers,
      -9223372036854775808 (0x8000000000000000) for longs
  These values are those assigned by the GNU C compiler (and Fortran, as
  far as I know) in instructions like i = x if x is a double with NaN
  value and i is an integer. Unfortunately, it seems that the Java
  compiler does not use the same convention:
  Double.shortValue/intValue/longValue() returns zero as the integer
  corresponding to a NaN double...

Cheers, francois

>
>My recent security issues have caused me to relook at some of the
>formatting options for VOTables and in doing so I've become a bit
>confused about how database nulls should be handled properly.
>It doesn't look like any VOTable representation can do a proper job of
>handling nulls as they appear in databases consistently with the
>recommendations of the VOTable standard.
>
>The TABLEDATA representation could do pretty well. It could in
>principle represent nulls for most types by having empty text in the
>appropriate TD element. This could work for all types except that it
>cannot distinguish between 0 length arrays and null arrays. Most
>databases allow for 0 length strings distinct from null strings so
>that's a bit of an issue but we can probably live with it. However
>the VOTable standard seems to suggest that using empty string values
>is not supported for anything other than boolean and float/complex
>data types. [The text is actually a bit confused here. E.g., at one
>point (4.7) it suggests that booleans will require a value attribute
>to specify a null, but later on (6) it describes how nulls should be
>represented for that type and makes the empty cell the default way.]
>
>E.g., if I have an 'int' field and represent the value of this field
>in some row with just <TD></TD> the interpretation of that value seems
>to be undefined by the standard.
>
>The VOTable standard also suggests conflating the ideas of null and
>NaN for floating point values. If I have a 'double' field, then the
>standard suggests that <TD></TD> should be interpreted as identical to
>NaN. These are very distinct in the database world but it
>looks like this distinction may be lost when we return results using TAP.
>
>In the BINARY and FITS serializations there is no natural way to
>represent null values for any types. The only avenue is to use the
>value/null attribute. The conflation of null and NaN numbers is
>explicitly mandated.
>
>For all representations there is a significant penalty for the short
>integer types (bytes, shorts and ints), where collisions between null
>values and actual occurrences of any reserved value are likely.
>
>One solution for TAP services might be to promote integer types.
>E.g., if I have a short in the underlying database I could represent
>it as an int in TAP so that I can be assured of not having collisions
>in the VOTable response.
>
>However it's all pretty inelegant for me at least. Am I
>misunderstanding something here? As far as I can tell neither the
>ADQL nor TAP standards actually talk about null values (except that
>TAP notes in some cases that certain metadata values are null) so the
>VOTable standard is where the action is.
>
>    Regards,
>    Tom

=======================================================================
Francois Ochsenbein    ------   Observatoire Astronomique de Strasbourg
   11, rue de l'Universite 67000 STRASBOURG  Phone: +33-(0)368 85 24 29
Email: francois at astro.u-strasbg.fr (France) Fax: +33-(0)368 85 24 17
=======================================================================

From m.b.taylor at bristol.ac.uk Fri Jul 29 10:05:51 2011
From: m.b.taylor at bristol.ac.uk (Mark Taylor)
Date: Fri, 29 Jul 2011 18:05:51 +0100 (BST)
Subject: Further thoughts on nulls in VOTables.
In-Reply-To: <4E286295.50308@nasa.gov>
References: <4E286295.50308@nasa.gov>
Message-ID:

Readers of the dal mailing list may be interested to know that
Tom's comments are copied and are being discussed on a page on the wiki:

   http://www.ivoa.net/cgi-bin/twiki/bin/view/IVOA/VOTableIssues

I have CC'd this message to the votable list, which didn't get a
copy of the original message, as well.

Mark

On Thu, 21 Jul 2011, Tom McGlynn wrote:

> A brief discussion we had a few weeks ago with regard to nulls in VOTables and
> TAP noted the origins of the current conventions which I had been blissfully
> ignorant of. I remain concerned that there are real issues that may arise as
> we try to support general table ingest and queries using VOTables for the
> serialization of the data. The issues I see are:
>
>   1. Real fields in VOTables cannot distinguish nulls and NaNs
>   2. The specification of nulls for strings and the mechanisms
>      for distinguishing null strings and 0 length strings are unclear.
>   3. Integer columns may be required to confuse actual data and nulls
>   4. Robust serialization of query results involving integers and
>      strings is incompatible with streaming results.
>
> The attached document elaborates on these and I'd be interested in others'
> thoughts.
>
> Regards,
> Tom McGlynn
>

--
Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/
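
The integer-null points raised in this thread can be sketched in a few
lines of Python. This is only an illustration under stated assumptions,
not code from either message: the names NULL_SENTINELS, wrap_negate and
promote_column are hypothetical, and the promotion rule (short to int,
int to long, with null encoded as the lowest value of the wider type) is
just one possible reading of Tom's suggestion.

```python
# Lowest two's-complement value for each integer width, as listed in
# Francois's message. All names here are hypothetical, for illustration.
BITS = {"short": 16, "int": 32, "long": 64}
NULL_SENTINELS = {name: -(2 ** (bits - 1)) for name, bits in BITS.items()}

# Assumed promotion rule: store each type one width up.
PROMOTED = {"short": "int", "int": "long"}


def wrap_negate(value, type_name):
    """Negate value in fixed-width two's-complement arithmetic."""
    bits = BITS[type_name]
    half = 2 ** (bits - 1)
    return ((-value + half) % (2 ** bits)) - half


def promote_column(values, type_name):
    """Encode a nullable column (None = null) in the next wider type.

    Returns (promoted type name, null sentinel, encoded values).
    """
    promoted = PROMOTED[type_name]
    null = NULL_SENTINELS[promoted]
    return promoted, null, [null if v is None else v for v in values]


if __name__ == "__main__":
    for name, sentinel in NULL_SENTINELS.items():
        # Each sentinel is its own negation at its native width.
        print(name, sentinel, wrap_negate(sentinel, name) == sentinel)

    # A short column that legitimately contains -32768 next to a null.
    print(promote_column([-32768, 0, None, 32767], "short"))
```

With the short column promoted to int, a genuine -32768 stays distinct
from the null marker, at the cost of wider storage. A long column has
nothing wider to promote to, so this sketch does not cover it.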