From Thomas.A.McGlynn at nasa.gov Fri Jul 1 04:02:28 2011
From: Thomas.A.McGlynn at nasa.gov (Tom McGlynn)
Date: Fri, 1 Jul 2011 07:02:28 -0400
Subject: TAP, automated site monitoring, and gzip encoding.
In-Reply-To: <042A3CD3-5D1E-4860-A012-87F550BA2E7F@manchester.ac.uk>
References: <4E0CDDFC.7050404@nasa.gov>
<042A3CD3-5D1E-4860-A012-87F550BA2E7F@manchester.ac.uk>
Message-ID: <4E0DA944.1060101@nasa.gov>
[I hit reply rather than reply all, so originally this went only to
Paul. TMcG.]
In my discussions with the security people, they made it clear that
both the query and the results were part of what made the transaction
suspicious. They explicitly stated that were the results gzip-encoded
we would not have triggered their alarms. One alternative that Mark
Taylor suggested in another message was to use binary-encoded
VOTables. GAVO uses these, so I've just begun to support reading them
myself, but I'd not seen them elsewhere. Is support for reading
these widespread? TOPCAT handles them, of course.
Tom
Paul Harrison wrote:
> On 2011-06-30, at 21:35, Tom McGlynn wrote:
>
>> NASA sites are a prominent target for hackers, so Goddard uses automated tools that look for a variety of exploits, including SQL injection attacks. Currently TAP schema queries can trigger these. While our security folks don't want to be too specific as to what the triggers are, I believe that the combination of:
>>
>> Support of arbitrary SQL in the query
>> Lack of passwords
>> Results that look like table schemas (because they are)
>> Output in clear text
>>
>> play a major role in making things look suspicious. While they can turn off checking altogether, that would mean that any real, successful SQL injection attack could go undetected, and we have lots of attempts every day.
>>
>> One solution that I had hoped might work was to use a GZIP transfer encoding (or content encoding) for the query results. Unfortunately it doesn't look like clients currently note the HTTP encoding headers.
>>
>> NASA is probably a bit more paranoid about this than some, but I suspect that this will become a more common issue as time goes on.
>> Support for content or transfer encoding is an HTTP-level issue, so I don't think it requires any change to the TAP standard, just clients that look for the appropriate HTTP headers. Would it be reasonable to request that clients support gzip encoding? In addition to addressing this security issue, I suspect this would generally decrease the size of downloaded data substantially and make our queries more responsive.
>>
> Surely the appearance of SQL in the query is what triggers the anti-hack filter - the results cannot be the cause, as they are in VOTable and I would be very surprised if any anti-hacker tools know about VOTable....
> So I bet looking for some form of encoding for the query would be more effective in this case - however if it was any sort of standard encoding then the anti-hacker tool ought to be decoding it anyway if it is any good, so I think that would not work either...
>
> SQL injection attacks are a legitimate concern for the implementors of TAP servers too - don't pass the query in a raw unparsed state straight to your database in your TAP server...So I think that the TAP server implementations have to be the guardians in this case and the general anti-hack tool turned off for the TAP servers...
>
>
> Paul.
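The request quoted above amounts to clients honoring the standard HTTP Content-Encoding header. A minimal client-side sketch in Python, assuming the caller already holds the response headers as a plain dict and the raw body bytes; Python's own http.client and urllib.request hand back compressed bytes unchanged, so a step like this would be needed. The function name decode_http_body is hypothetical, and only gzip and identity are handled.

```python
import gzip


def decode_http_body(headers, body):
    """Undo a gzip Content-Encoding on a response body, if one was declared.

    `headers` is assumed to be a plain dict of response header names to values.
    """
    # HTTP header names are case-insensitive, so normalise them before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    coding = lowered.get("content-encoding", "identity").strip().lower()
    if coding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if coding == "identity":
        return body
    # Anything else (deflate, br, stacked codings, ...) is out of scope here.
    raise ValueError(f"unsupported Content-Encoding: {coding}")
```

A client that sends Accept-Encoding: gzip would pass every TAP response through a step like this before handing the bytes to its VOTable parser.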
From Thomas.A.McGlynn at nasa.gov Fri Jul 1 07:25:24 2011
From: Thomas.A.McGlynn at nasa.gov (Tom McGlynn)
Date: Fri, 1 Jul 2011 10:25:24 -0400
Subject: Nulls in VOTables in TAP
Message-ID: <4E0DD8D4.8090503@nasa.gov>
My recent security issues have caused me to take another look at some of the
formatting options for VOTables and in doing so I've become a bit
confused about how database nulls should be handled properly. It
doesn't look like any VOTable representation can do a proper job of
handling nulls as they appear in databases consistently with the
recommendations of the VOTable standard.
The TABLEDATA representation could do pretty well. It could in
principle represent nulls for most types by having empty text in the
appropriate TD element. This could work for all types except that it
cannot distinguish between zero-length arrays and null arrays. Most
databases allow for zero-length strings distinct from null strings, so
that's a bit of an issue, but we can probably live with it. However
the VOTable standard seems to suggest that using empty string values
is not supported for anything other than boolean and float/complex
data types. [The text is actually a bit confused here. E.g., at one
point (section 4.7) it suggests that booleans will require a value attribute
to specify a null, but later on (section 6) it describes how nulls should be
represented for that type and makes the empty cell the default way.]
E.g., if I have an 'int' field and represent the value of this field
in some row with just <TD></TD>, the interpretation of that value seems to
be undefined by the standard.
The VOTable standard also suggests conflating the ideas of null and
NaN for floating-point values. If I have a 'double' field, then the
standard suggests that <TD></TD> should be interpreted as identical to
<TD>NaN</TD>. These are very distinct in the database world, but it
looks like this distinction may be lost when we return results using TAP.
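To make the ambiguity concrete, here is a minimal Python sketch of how a reader might interpret the text of one TABLEDATA cell for a scalar field, under the reading of the standard described above: for float/double an empty cell and "NaN" both yield NaN, while integers rely on a VALUES/null sentinel. The function name parse_td and the choice to reject an empty integer cell are assumptions, not anything the standard prescribes; hexadecimal integers and array cells are ignored.

```python
import math


def parse_td(text, datatype, null_value=None):
    """Interpret the text content of one TABLEDATA <TD> for a scalar field.

    `null_value` is the integer from the FIELD's VALUES/null attribute, if any.
    """
    text = text.strip()
    if datatype in ("float", "double"):
        # An empty cell is read as NaN, which float("NaN") also yields, so a
        # database NULL and a genuine NaN end up indistinguishable.
        return math.nan if text == "" else float(text)
    if datatype in ("short", "int", "long"):
        if text == "":
            # The standard does not define this case; this sketch refuses it.
            raise ValueError("empty <TD> in an integer column")
        value = int(text)
        # The only way to mark an integer null is the declared sentinel.
        return None if value == null_value else value
    raise NotImplementedError(f"datatype {datatype!r} not handled in this sketch")
```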
In the BINARY and FITS serializations there is no natural way to
represent null values for any types. The only avenue is to use the
VALUES/null attribute. The conflation of null and NaN numbers is
explicitly mandated.
For all representations there is a significant penalty for the short
integer types (bytes, shorts and ints), where collisions between null
values and actual occurrences of any reserved value are likely.
One solution for TAP services might be to promote integer types.
E.g., if I have a short in the underlying database I could represent
it as an int in TAP so that I can be assured of not having collisions
in the VOTable response.
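The promotion idea above can be sketched as a schema-level rule: a nullable integer column is written one size up, and the most negative value of the wider type (which the narrower type can never produce) is declared in VALUES/null. A minimal Python sketch, assuming the database column type has already been mapped to a VOTable datatype name; the function and column names are hypothetical.

```python
# Nullable integer columns are widened one step so that the wider type's most
# negative value can never appear in real data and is free to serve as null.
PROMOTION = {"unsignedByte": "short", "short": "int", "int": "long"}
MIN_VALUE = {"short": -2**15, "int": -2**31, "long": -2**63}


def votable_int_field(name, db_type, nullable):
    """Return FIELD markup for an integer column, promoting it if it is nullable."""
    if not nullable:
        return f'<FIELD name="{name}" datatype="{db_type}"/>'
    if db_type not in PROMOTION:
        # A nullable 64-bit column has no wider VOTable integer type to move to.
        raise ValueError(f"cannot promote nullable {db_type!r} column")
    wider = PROMOTION[db_type]
    return (f'<FIELD name="{name}" datatype="{wider}">'
            f'<VALUES null="{MIN_VALUE[wider]}"/></FIELD>')
```

The price is wider cells (twice the bytes per value in BINARY), and a nullable long column still has nowhere to go.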
However it's all pretty inelegant for me at least. Am I
misunderstanding something here? As far as I can tell neither the
ADQL nor TAP standards actually talk about null values (except that
TAP notes in some cases that certain metadata values are null) so the
VOTable standard is where the action is.
Regards,
Tom
From abrazier at astro.cornell.edu Fri Jul 1 08:28:44 2011
From: abrazier at astro.cornell.edu (Adam Brazier)
Date: Fri, 01 Jul 2011 11:28:44 -0400
Subject: TAP, automated site monitoring, and gzip encoding.
In-Reply-To: <4E0DA944.1060101@nasa.gov>
References: <4E0CDDFC.7050404@nasa.gov> <042A3CD3-5D1E-4860-A012-87F550BA2E7F@manchester.ac.uk>
<4E0DA944.1060101@nasa.gov>
Message-ID: <4E0DE7AC.4050107@astro.cornell.edu>
Surely the input is the bigger concern, though? Schemas are useful
information for hackers, but if you're taking in SQL queries they must
already know some elements of the schema (although it could be
higher-level than base tables, like views). Even RESTful DB interfaces
require schema knowledge (although they shouldn't be vulnerable to SQL injection,
and they'd be unlikely to trip automated security audits).
Seems to me that the sensible security response is to be able to
demonstrate that, even with these practices and concerns, the database
itself is safe from injection (indeed, we ought to both know that's the
case and ideally have documented it anyhow, as a matter of good practice).
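One way to back up the claim that the database itself is safe, independent of any query parsing, is to run TAP queries over a connection that can only read. A minimal sketch using Python's sqlite3 module purely as a stand-in (a production TAP service would more likely rely on a read-only account in its own DBMS): an authorizer callback lets SELECT, column reads and function calls through and denies every other action, so DDL and DML are rejected when the statement is prepared. This is defense in depth, not a substitute for validating the ADQL; the function names are hypothetical.

```python
import sqlite3

# Authorizer actions a pure query needs; everything else (CREATE, DROP, INSERT,
# UPDATE, DELETE, ATTACH, PRAGMA, transactions, ...) is denied.
_QUERY_ACTIONS = {sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ, sqlite3.SQLITE_FUNCTION}


def _query_only(action, arg1, arg2, db_name, source):
    return sqlite3.SQLITE_OK if action in _QUERY_ACTIONS else sqlite3.SQLITE_DENY


def restrict_to_queries(conn):
    """Install the authorizer; denied statements raise sqlite3.DatabaseError."""
    conn.set_authorizer(_query_only)
    return conn
```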
Cheers
Adam
From m.b.taylor at bristol.ac.uk Fri Jul 1 08:40:10 2011
From: m.b.taylor at bristol.ac.uk (Mark Taylor)
Date: Fri, 1 Jul 2011 16:40:10 +0100 (BST)
Subject: Nulls in VOTables in TAP
In-Reply-To: <4E0DD8D4.8090503@nasa.gov>
References: <4E0DD8D4.8090503@nasa.gov>
Tom,
yes an empty TD for integer types is not permitted in VOTable;
a null in an integer column can only be represented by use of the
VALUES/null attribute. And yes NaN and null are not distinguished
for floating point types.
VOTable was designed (I believe) as FITS-with-metadata rather than
serialized-database, and from this point of view those decisions
look sensible. So, you can map a column of numeric data from a VOTable
(or FITS table) into a simple array of primitive integer/floating values,
which makes storage in C/Fortran-like programming languages,
translation between TABLEDATA/BINARY/FITS VOTable formats, or
translation between FITS and VOTable straightforward. With a more
database-like value space these things would be more problematic.
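The point about mapping a column onto a plain array of primitives can be seen in a few lines. A minimal Python sketch, assuming a table with a single double FIELD so that the BINARY stream is just consecutive 8-byte big-endian IEEE values: nulls have to be written as NaN, and after a round trip a database NULL and a stored NaN come back identical, which is the conflation discussed in this thread. The helper names are hypothetical.

```python
import math
import struct
from array import array


def pack_double_column(values):
    """Serialise a nullable double column as BINARY cells (big-endian, 8 bytes each)."""
    # There is no null flag for doubles, so None has to become NaN.
    return b"".join(struct.pack(">d", math.nan if v is None else v) for v in values)


def unpack_double_column(data):
    """Read the cells back into a primitive array('d'), as a FITS-style reader would."""
    return array("d", (cell for (cell,) in struct.iter_unpack(">d", data)))
```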
In my personal opinion the conflation of NaN and null is not a
serious issue - I can't think of many astronomical processing
situations where the distinction would make much practical difference
(though I'm willing to be corrected). I do agree that having to come
up with an out-of-band value for nulls in nullable integer typed
columns makes life difficult for TAP services (and other generators
of VOTable, or FITS, tables), but you haven't misunderstood,
that's the way the VOTable standard is.
Mark
--
Mark Taylor Astronomical Programmer Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/
From m.b.taylor at bristol.ac.uk Fri Jul 1 08:46:50 2011
From: m.b.taylor at bristol.ac.uk (Mark Taylor)
Date: Fri, 1 Jul 2011 16:46:50 +0100 (BST)
Subject: TAP, automated site monitoring, and gzip encoding.
In-Reply-To: <4E0DA944.1060101@nasa.gov>
References: <4E0CDDFC.7050404@nasa.gov>
<042A3CD3-5D1E-4860-A012-87F550BA2E7F@manchester.ac.uk>
<4E0DA944.1060101@nasa.gov>
Regarding support of BINARY-encoded VOTables: any code which uses
STIL will read them just as happily as TABLEDATA-encoded ones.
Although BINARY-encoding was a bit slow to appear in VOTable
parsers, I have the impression that it's fairly widespread these
days, though it would be interesting to hear one way or the other
from the relevant developers. I would certainly expect a VOTable-aware
application or toolkit to provide this support, since it's a mandatory
part of the VOTable standard. One exception however is (presumably)
processing done using XSLT - this is unlikely to make sense of VOTable
data which is not presented as TRs and TDs.
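For a reader that is not XSLT-based, decoding a BINARY-encoded table is mostly base64 plus fixed-width unpacking. A minimal Python sketch, assuming (hypothetically) a table whose FIELDs are an int followed by a double, no variable-length arrays (those carry a 4-byte length prefix per cell), and an int null sentinel taken from VALUES/null. A general reader would derive the row layout from the FIELD list rather than hard-code it as done here.

```python
import base64
import struct

# Row layout for the assumed FIELDs (int, double): big-endian, no padding, 12 bytes.
ROW = struct.Struct(">id")


def read_binary_stream(stream_text, int_null):
    """Decode the text of a <STREAM encoding="base64"> element into rows."""
    # b64decode discards characters outside the base64 alphabet, e.g. newlines.
    raw = base64.b64decode(stream_text)
    rows = []
    for ident, value in ROW.iter_unpack(raw):
        rows.append((None if ident == int_null else ident, value))
    return rows
```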
Mark
From m.b.taylor at bristol.ac.uk Mon Jul 4 02:32:31 2011
From: m.b.taylor at bristol.ac.uk (Mark Taylor)
Date: Mon, 4 Jul 2011 10:32:31 +0100 (BST)
Subject: TAP, automated site monitoring, and gzip encoding.
In-Reply-To: <4E0CDDFC.7050404@nasa.gov>
References: <4E0CDDFC.7050404@nasa.gov>
On Thu, 30 Jun 2011, Tom McGlynn wrote:
> One solution that I had hoped might work was to use a GZIP transfer encoding
> (or content encoding) for the query results. Unfortunately it doesn't look
> like clients currently note the HTTP encoding headers.
>
> NASA is probably a bit more paranoid about this than some, but I suspect that
> this will become a more common issue as time goes on.
> Support for content or transfer encoding is an HTTP level issue so I don't
> think it requires any change to the TAP standard, just clients that look for
> the appropriate HTTP headers. Would it be reasonable to request that clients
> support gzip encoding? In addition to addressing this security issue, I suspect
> this would generally substantially decrease the size of downloaded data and
> make our queries more responsive.
>
> Tom McGlynn
FWIW, although TAP does not address this, the SSA standard
(PR-SSA-1.1-20110417) does discuss compression in section 7.3:
7.3 Data Compression
If the query parameter COMPRESS is present then the service may return
a compressed dataset, using some standard compression technique such
as gzip, in place of a normal dataset, without indicating this in the
query response. Basically the client is indicating that it is prepared
to receive either compressed or uncompressed datasets and does not
care which is delivered (the service should pick whichever is more
efficient). This should be distinguished from protocol-level compression,
which is transparent to the client, and may occur at the level of the
HTTP protocol if both client and server support HTTP protocol compression.
In case of an HTTP GET the keyword Content-Encoding informs the receiver
about the encoding of the output data, and should have a value such as
gzip. Note that the encoding is distinct from the MIME-type (Content-Type)
of the returned data object.
The tone seems to suggest that Content-Encoding is something that
clients might (but not MUST) be expected to handle as a matter of course.
Probably DALI ought to say what the general assumption is for DAL
services about content- and/or transfer-encoding.
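Because SSA's COMPRESS option lets the service return a compressed dataset without saying so in the response, a client that sets it has to recognise compression from the bytes themselves. A minimal Python sketch that checks for the two-byte gzip signature (0x1f 0x8b, from RFC 1952); it assumes gzip is the only scheme in use, and the function name is hypothetical.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)


def maybe_gunzip(payload):
    """Return the dataset bytes, decompressing only if they look like gzip."""
    if payload[:2] == GZIP_MAGIC:
        return gzip.decompress(payload)
    return payload
```

This guesswork is what protocol-level Content-Encoding avoids: there the response header, not the payload, tells the client what to undo.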
Mark
From francois at cdsarc.u-strasbg.fr Mon Jul 4 05:38:23 2011
From: francois at cdsarc.u-strasbg.fr (Francois Ochsenbein (ext.52429))
Date: Mon, 04 Jul 2011 14:38:23 +0200
Subject: Nulls in VOTables in TAP
In-Reply-To: <4E0DD8D4.8090503@nasa.gov>
References: <4E0DD8D4.8090503@nasa.gov>
Message-ID: <20110704123823.20FF625EA3@cdsarc.u-strasbg.fr>
Hi Tom,
I basically agree with all of Mark Taylor's answers:
* yes, VOTable was designed on the basis of FITS, not as
a DBMS subset: NaN and a database 'null' are considered
the same thing, as they are in FITS binary tables; and
in the case of an array of floats/doubles in the TABLEDATA
serialization, a blank can't mark a null element, hence the "NaN"
alternative to the empty <TD></TD> ...
* yes, there is some confusion for booleans; the FITS
document indicates only the possibilities T, F and hex 00
(but hex 00 can't be used for an array in the XML (TABLEDATA)
serialization, a problem similar to the NaN for doubles)
* for integers, no bit pattern exists for an undefined value.
Section 4.7 merely "suggests" using the value
-32768 for short integers.
In fact the lowest integer numbers are frequently used as the
bit pattern for "null" integers (in two's complement the lowest
integer numbers are their own negation); these numbers are:
-32768 (0x8000) for short int,
-2147483648 (0x80000000) for 32-bit integers,
-9223372036854775808 (0x8000000000000000) for longs
These values are those assigned by the GNU C compiler
(and Fortran, as far as I know) in instructions like
i = x
if x is a double with NaN value and i is an integer.
Unfortunately, it seems that Java does not use
the same convention: Double.shortValue/intValue/longValue()
returns zero as the integer corresponding to a
NaN double...
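The integer properties described in the list above are easy to check. A minimal Python sketch: ctypes' fixed-width integers wrap silently like C integers, so negating the most negative value gives the same value back. Python itself, unlike C or Java, refuses to convert NaN to an integer (int(float("nan")) raises ValueError), so a writer has to map NaN or None to the sentinel explicitly. The function names are hypothetical.

```python
import ctypes
import math

# Most negative values of the 16-, 32- and 64-bit signed types listed above.
LOWEST = {
    ctypes.c_int16: -32768,
    ctypes.c_int32: -2147483648,
    ctypes.c_int64: -9223372036854775808,
}


def is_own_negation(ctype, value):
    """True if negating `value` in the fixed-width type yields `value` again."""
    # ctypes truncates out-of-range Python ints instead of raising, so +32768
    # stored in a c_int16 wraps around to -32768.
    return ctype(-value).value == value


def to_int_cell(x, sentinel):
    """Map a float that may be NaN (or None) to an integer cell value.

    Infinities are not handled here; int() would raise OverflowError for them.
    """
    if x is None or math.isnan(x):
        return sentinel
    return int(x)
```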
Cheers, francois
=======================================================================
Francois Ochsenbein ------ Observatoire Astronomique de Strasbourg
11, rue de l'Universite 67000 STRASBOURG Phone: +33-(0)368 85 24 29
Email: francois at astro.u-strasbg.fr (France) Fax: +33-(0)368 85 24 17
=======================================================================
From dtody at nrao.edu Mon Jul 4 12:58:57 2011
From: dtody at nrao.edu (Douglas Tody)
Date: Mon, 4 Jul 2011 13:58:57 -0600 (MDT)
Subject: TAP, automated site monitoring, and gzip encoding.
References: <4E0CDDFC.7050404@nasa.gov>
Right - we distinguished between compression of the dataset itself and
compression as used in the transport protocol. HTTP already supports
the latter and ideally the client and server would both support stream
compression. But of course it is optional (where we really need this is
to speed up feeding large text VOTables back to the client). If
security is the main issue it might be better to require an
authenticated (HTTPS) connection. Or just limit the TAP implementation
and client connection to data which could not be compromised by any
amount of SQL trickery.
- Doug
From Thomas.A.McGlynn at nasa.gov Tue Jul 5 08:00:06 2011
From: Thomas.A.McGlynn at nasa.gov (Tom McGlynn)
Date: Tue, 5 Jul 2011 11:00:06 -0400
Subject: TAP, automated site monitoring, and gzip encoding.
References: <4E0CDDFC.7050404@nasa.gov>
Message-ID: <4E1326F6.7080206@nasa.gov>
Just to bring people up to date...
The problem was that a standard TAP metadata query/response looked
like a SQL injection attack and triggered flags in GSFC security monitors.
We changed to using the STREAM/BINARY encoding in the VOTables used in
our TAP interface (suggested by Mark) and so far this seems to be
satisfying our security types. [I don't want to get into a discussion
of what should or should not trigger such flags.]
There was some discussion that we should simply assume that we are
going to have to be responsible for our own security and tell our
security monitors to ignore everything from our TAP server. Certainly
it's the case that we need to do our best to ensure that there are no
security holes in our TAP interface. If we had to do so, we could
have gone this way. However an independent layer of checking is
something I don't want to forgo if I don't have to. There are lots
of hackers out there and I daresay many are smarter and certainly more
versed in the holes in our database's security than I. So I'm hopeful
that our format change will enable our security scanners to continue
monitoring our services without burdening them with large numbers of
false intrusion detections.
With regard to encoding.... I'd originally thought to use
transfer-encoding rather than content-encoding since my rather vague
understanding is that transfer-encoding is something that clients
aren't supposed to see, while content-encoding is something they do see. However it's
not clear that gzip is really meant to be used as a transfer encoding
in any case. Transfer-encoding seems to be something envisaged for
chunked downloads.
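This reading matches HTTP/1.1 (RFC 2616): "chunked" is the transfer-coding every HTTP/1.1 client must be able to decode, and an HTTP library normally removes it before the application sees the body, whereas gzip is in practice negotiated as a content-coding. For illustration only, a minimal Python sketch of undoing chunked framing; it ignores trailer fields and assumes well-formed input. The function name is hypothetical.

```python
def decode_chunked(body):
    """Reassemble a body sent with Transfer-Encoding: chunked.

    Each chunk is a hex size line (optionally followed by ';' extensions),
    CRLF, the chunk data, CRLF; a zero-size chunk ends the body.
    """
    out = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # Drop any chunk extension after ';' and parse the size as hexadecimal.
        size = int(body[pos:line_end].split(b";", 1)[0], 16)
        pos = line_end + 2
        if size == 0:
            return bytes(out)  # trailer fields, if any, are ignored
        out += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and its closing CRLF
```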
I'm a little confused by Mark's quote from the SSA standard, since the
COMPRESS parameter seems to duplicate the role of the
Accept-Encoding header at the HTTP level. I'd agree that some overall
strategy that addresses all of the DAL interfaces would be desirable.
Personally I'd suggest that we recommend/require support for some
level of compression using the standard HTTP protocols and not add
anything to the DAL protocols themselves.
Tom
Douglas Tody wrote:
> Right - we distinguished between compression of the dataset itself and
> compression as used in the transport protocol. HTTP already supports
> the latter and ideally the client and server would both support stream
> compression. But of course it is optional (where we really need this is
> to speed up feeding large text VOTables back to the client). If
> security is the main issue it might be better to require an
> authenticated (HTTPS) connection. Or just limit the TAP implementation
> and client connection to data which could not be compromised by any
> amount of SQL trickery.
>
> - Doug
>
>
> On Mon, 4 Jul 2011, Mark Taylor wrote:
>
>> On Thu, 30 Jun 2011, Tom McGlynn wrote:
>>
>>> One solution that I had hoped might work was to use a GZIP transfer encoding
>>> (or content encoding) for the query results. Unfortunately it doesn't look
>>> like clients currently note the HTTP encoding headers.
>>>
>>> NASA is probably a bit more paranoid about this than some, but I suspect that
>>> this will become a more common issue as time goes on.
>>> Support for content or transfer encoding is an HTTP level issue so I don't
>>> think it requires any change to the TAP standard, just clients that look for
>>> the appropriate HTTP headers. Would it be reasonable to request that clients
>>> support gzip encoding? In addition to addressing this security issue I suspect
>>> this would generally substantially decrease the size of downloaded data and
>>> make our queries more responsive.
>>>
>>> Tom McGlynn
>>
>> FWIW, although TAP does not address this, the SSA standard
>> (PR-SSA-1.1-20110417) does discuss compression in section 7.3:
>>
>> 7.3 Data Compression
>>
>> If the query parameter COMPRESS is present then the service may return
>> a compressed dataset, using some standard compression technique such
>> as gzip, in place of a normal dataset, without indicating this in the
>> query response. Basically the client is indicating that it is prepared
>> to receive either compressed or uncompressed datasets and does not
>> care which is delivered (the service should pick whichever is more
>> efficient). This should be distinguished from protocol-level compression,
>> which is transparent to the client, and may occur at the level of the
>> HTTP protocol if both client and server support HTTP protocol compression.
>>
>> In case of an HTTP GET the keyword Content-Encoding informs the receiver
>> about the encoding of the output data, and should have a value such as
>> gzip. Note that the encoding is distinct from the MIME-type (Content-Type)
>> of the returned data object.
>>
>> The tone seems to suggest that Content-Encoding is something that
>> clients might (but not MUST) be expected to do as a matter of course.
>>
>> Probably DALI ought to say what the general assumption is for DAL
>> services about content- and/or transfer-encoding.
>>
>> Mark
>>
>> --
>> Mark Taylor Astronomical Programmer Physics, Bristol University, UK
>> m.b.taylor at bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/
>>
From abrazier at astro.cornell.edu Tue Jul 5 08:19:51 2011
From: abrazier at astro.cornell.edu (Adam Brazier)
Date: Tue, 05 Jul 2011 11:19:51 -0400
Subject: TAP, automated site monitoring, and gzip encoding.
In-Reply-To: <4E1326F6.7080206@nasa.gov>
References: <4E0CDDFC.7050404@nasa.gov>
<4E1326F6.7080206@nasa.gov>
Message-ID: <4E132B97.4010204@astro.cornell.edu>
> Just to bring people up to date...
>
> The problem was that a standard TAP metadata query/response looked
> like a SQL injection attack and triggered flags in GSFC security
> monitors.
>
> We changed to using the STREAM/BINARY encoding in the VOTables used in
> our TAP interface (suggested by Mark) and so far this seems to be
> satisfying our security types. [I don't want to get into a discussion
> of what should or should not trigger such flags.]
>
> There was some discussion that we should simply assume that we are
> going to have to be responsible for our own security and tell our
> security monitors to ignore everything from our TAP server. Certainly
> it's the case that we need to do our best to ensure that there are no
> security holes in our TAP interface. If we had to do so, we could
> have gone this way. However an independent layer of checking is
> something I don't want to forgo if I don't have to. There are lots
> of hackers out there and I daresay many are smarter and certainly more
> versed in the holes in our database's security than I. So I'm hopeful
> that our format change will enable our security scanners to continue
> monitoring our services without burdening them with large numbers of
> false intrusion detections.
>
I would stress that even if the alarms go away, the reason they're
happening now is that consuming raw SQL is generally a warning sign
(although not a disastrous one, particularly if permissions are properly
set; query checking is built into some frameworks too, although I've
never wanted to rely on it). Making the warnings go away by hiding the
SQL is therefore just about making the warnings go away (so that the
other benefits of monitoring can be retained).
Ideally, it seems to me, the local monitoring should be fine-grained
enough that we can tell it to stop protesting about SQL being ingested
while it continues its other monitoring activities. Since consuming SQL
is *required* of us, we can't really do anything about the inherent
risk that presents, so we should make sure our databases are secure. If
local IT security needs convincing that this is safe, all to the good,
I'd say, as additional eyes help in assessing security.
If it's just a case of getting all the monitoring or none, then I guess
work-arounds might be required, but it feels a bit icky to me.
Cheers
Adam
>
> Douglas Tody wrote:
>> Right - we distinguished between compression of the dataset itself and
>> compression as used in the transport protocol. HTTP already supports
>> the latter and ideally the client and server would both support stream
>> compression. But of course it is optional (where we really need this is
>> to speed up feeding large text VOTables back to the client). If
>> security is the main issue it might be better to require an
>> authenticated (HTTPS) connection. Or just limit the TAP implementation
>> and client connection to data which could not be compromised by any
>> amount of SQL trickery.
>>
>> - Doug
>>
>>
>> On Mon, 4 Jul 2011, Mark Taylor wrote:
>>
>>> On Thu, 30 Jun 2011, Tom McGlynn wrote:
>>>
>>>> One solution that I had hoped might work was to use a GZIP transfer encoding
>>>> (or content encoding) for the query results. Unfortunately it doesn't look
>>>> like clients currently note the HTTP encoding headers.
>>>>
>>>> NASA is probably a bit more paranoid about this than some, but I suspect that
>>>> this will become a more common issue as time goes on.
>>>> Support for content or transfer encoding is an HTTP level issue so I don't
>>>> think it requires any change to the TAP standard, just clients that look for
>>>> the appropriate HTTP headers. Would it be reasonable to request that clients
>>>> support gzip encoding? In addition to addressing this security issue I suspect
>>>> this would generally substantially decrease the size of downloaded data and
>>>> make our queries more responsive.
>>>>
>>>> Tom McGlynn
>>>
>>> FWIW, although TAP does not address this, the SSA standard
>>> (PR-SSA-1.1-20110417) does discuss compression in section 7.3:
>>>
>>> 7.3 Data Compression
>>>
>>> If the query parameter COMPRESS is present then the service may return
>>> a compressed dataset, using some standard compression technique such
>>> as gzip, in place of a normal dataset, without indicating this in the
>>> query response. Basically the client is indicating that it is prepared
>>> to receive either compressed or uncompressed datasets and does not
>>> care which is delivered (the service should pick whichever is more
>>> efficient). This should be distinguished from protocol-level compression,
>>> which is transparent to the client, and may occur at the level of the
>>> HTTP protocol if both client and server support HTTP protocol compression.
>>>
>>> In case of an HTTP GET the keyword Content-Encoding informs the receiver
>>> about the encoding of the output data, and should have a value such as
>>> gzip. Note that the encoding is distinct from the MIME-type (Content-Type)
>>> of the returned data object.
>>>
>>> The tone seems to suggest that Content-Encoding is something that
>>> clients might (but not MUST) be expected to do as a matter of course.
>>>
>>> Probably DALI ought to say what the general assumption is for DAL
>>> services about content- and/or transfer-encoding.
>>>
>>> Mark
>>>
>>> --
>>> Mark Taylor Astronomical Programmer Physics, Bristol University, UK
>>> m.b.taylor at bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/
>>>
>
From dtody at nrao.edu Tue Jul 5 14:39:01 2011
From: dtody at nrao.edu (Douglas Tody)
Date: Tue, 5 Jul 2011 15:39:01 -0600 (MDT)
Subject: TAP, automated site monitoring, and gzip encoding.
In-Reply-To: <4E1326F6.7080206@nasa.gov>
References: <4E0CDDFC.7050404@nasa.gov>
<4E1326F6.7080206@nasa.gov>
Message-ID:
On Tue, 5 Jul 2011, Tom McGlynn wrote:
> Just to bring people up to date...
>
> The problem was that a standard TAP metadata query/response looked like a SQL
> injection attack and triggered flags in GSFC security monitors.
>
> We changed to using the STREAM/BINARY encoding in the VOTables used in our
> TAP interface (suggested by Mark) and so far this seems to be satisfying our
> security types. [I don't want to get into a discussion of what should or
> should not trigger such flags.]
Since few clients are likely to be able to handle such a response, this
is probably not the best solution, although I guess it is technically legal.
It would be better to address the real problem (smarter security
checking), as someone else suggested.
> I'm a little confused by Mark's quote from the SSA standard, since the
> compress keyword seems to be duplicating the role of the Accept-encoding
> header at the HTTP level. I'd agree that some overall strategy that
> addresses all of the DAL interfaces would be desirable. Personally I'd
> suggest that we recommend/require support for some level of compression using
> the standard HTTP protocols and not add anything to the DAL protocols
> themselves.
Note that we are discussing this very same issue again right now on the
DM list, in connection with ObsTAP. ObsTAP (ObsCore) describes the file
formats of archive data products, independently of the particular
transport protocol used for any subsequent data access, e.g., HTTP, FTP,
whatever. Support is included to describe the compression type if used
(for Rob's benefit this can include the astronomy-specific compression
schemes). The query capabilities of the DAL protocols used for data
discovery and description are higher level and have nothing to do with
any possible stream compression or accept-type file format capabilities
of the particular lower level transport protocol used.
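To make the distinction concrete: once any transport-level Content-Encoding has been undone, a dataset may still be compressed in its own right. This is the SSA COMPRESS case, where the service does not indicate the compression in the query response. A client then has to inspect the payload itself. The sketch below is a minimal illustration under that assumption. It checks only for the two-byte gzip signature (0x1f 0x8b) and ignores the astronomy-specific schemes mentioned above. `unwrap_dataset` is a hypothetical helper name.

```python
import gzip

# Every gzip member starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def unwrap_dataset(payload: bytes) -> bytes:
    """Return the dataset, decompressing it if the file itself is gzipped.

    This operates on the payload after any HTTP Content-Encoding has been
    removed, so it concerns dataset compression, not transport compression.
    """
    if payload[:2] == GZIP_MAGIC:
        return gzip.decompress(payload)
    return payload
```

Because the check looks at the bytes rather than the headers, the same code works regardless of whether the dataset arrived over HTTP, FTP or any other transport.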
- Doug
> Douglas Tody wrote:
>> Right - we distinguished between compression of the dataset itself and
>> compression as used in the transport protocol. HTTP already supports
>> the latter and ideally the client and server would both support stream
>> compression. But of course it is optional (where we really need this is
>> to speed up feeding large text VOTables back to the client). If
>> security is the main issue it might be better to require an
>> authenticated (HTTPS) connection. Or just limit the TAP implementation
>> and client connection to data which could not be compromised by any
>> amount of SQL trickery.
>>
>> - Doug
>>
>>
>> On Mon, 4 Jul 2011, Mark Taylor wrote:
>>
>>> On Thu, 30 Jun 2011, Tom McGlynn wrote:
>>>
>>>> One solution that I had hoped might work was to use a GZIP transfer encoding
>>>> (or content encoding) for the query results. Unfortunately it doesn't look
>>>> like clients currently note the HTTP encoding headers.
>>>>
>>>> NASA is probably a bit more paranoid about this than some, but I suspect that
>>>> this will become a more common issue as time goes on.
>>>> Support for content or transfer encoding is an HTTP level issue so I don't
>>>> think it requires any change to the TAP standard, just clients that look for
>>>> the appropriate HTTP headers. Would it be reasonable to request that clients
>>>> support gzip encoding? In addition to addressing this security issue I suspect
>>>> this would generally substantially decrease the size of downloaded data and
>>>> make our queries more responsive.
>>>>
>>>> Tom McGlynn
>>>
>>> FWIW, although TAP does not address this, the SSA standard
>>> (PR-SSA-1.1-20110417) does discuss compression in section 7.3:
>>>
>>> 7.3 Data Compression
>>>
>>> If the query parameter COMPRESS is present then the service may return
>>> a compressed dataset, using some standard compression technique such
>>> as gzip, in place of a normal dataset, without indicating this in the
>>> query response. Basically the client is indicating that it is prepared
>>> to receive either compressed or uncompressed datasets and does not
>>> care which is delivered (the service should pick whichever is more
>>> efficient). This should be distinguished from protocol-level compression,
>>> which is transparent to the client, and may occur at the level of the
>>> HTTP protocol if both client and server support HTTP protocol compression.
>>>
>>> In case of an HTTP GET the keyword Content-Encoding informs the receiver
>>> about the encoding of the output data, and should have a value such as
>>> gzip. Note that the encoding is distinct from the MIME-type (Content-Type)
>>> of the returned data object.
>>> of the returned data object.
>>>
>>> The tone seems to suggest that Content-Encoding is something that
>>> clients might (but not MUST) be expected to do as a matter of course.
>>>
>>> Probably DALI ought to say what the general assumption is for DAL
>>> services about content- and/or transfer-encoding.
>>>
>>> Mark
>>>
>>> --
>>> Mark Taylor Astronomical Programmer Physics, Bristol University, UK
>>> m.b.taylor at bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/
>>>
>
From msdemlei at ari.uni-heidelberg.de Wed Jul 6 00:44:53 2011
From: msdemlei at ari.uni-heidelberg.de (Markus Demleitner)
Date: Wed, 6 Jul 2011 09:44:53 +0200
Subject: BINARY support (Was: site monitoring)
In-Reply-To:
References: <4E0CDDFC.7050404@nasa.gov>
<4E1326F6.7080206@nasa.gov>
Message-ID: <20110706074452.GB8569@ari.uni-heidelberg.de>
Hi,
On Tue, Jul 05, 2011 at 03:39:01PM -0600, Douglas Tody wrote:
> >We changed to using the STREAM/BINARY encoding in the VOTables used
> >in our TAP interface (suggested by Mark) and so far this seems to
> >be satisfying our security types. [I don't want to get into a
> >discussion of what should or should not trigger such flags.]
>
> Since probably few clients will be able to handle such a response this
> is probably not the best solution, although technically legal I guess.
> It would be better to address the real problem (smarter security
> checking) as someone else suggested.
Although I agree that all this has nothing to do with security in any
sense of the word: Doug, do you suggest that client support for
VOTables with binary streams is rare?
Since I'm delivering binary VOTables by default that would be
worrying for me (plus, I'm only aware of missing binary support in
old versions of VOPlot and now specview). Before I start checking
libraries and clients myself: Does anyone already have an idea what
clients or libraries don't support binary VOTables yet?
Cheers,
Markus
From m.b.taylor at bristol.ac.uk Wed Jul 6 10:24:22 2011
From: m.b.taylor at bristol.ac.uk (Mark Taylor)
Date: Wed, 6 Jul 2011 18:24:22 +0100 (BST)
Subject: taplint update
In-Reply-To:
References:
Message-ID:
Dear DAL,
Following my release of taplint (TAP test suite) last week, at least
one person has pointed out that there is an error in the logic of how
row overflows are checked. You can get a version that fixes that here:
ftp://andromeda.star.bris.ac.uk/pub/star/stilts/pre/
thanks
Mark
On Thu, 30 Jun 2011, Mark Taylor wrote:
> Dear all,
>
> There is a new release of STILTS, v2.3-1, available at the usual place:
>
> http://www.starlink.ac.uk/stilts/
>
> The main new item in this release is the taplint command, which
> is a test suite for TAP services. This is not expected to be of
> interest to most users, but those developing or running a TAP
> service might like to run it against their service to see what
> issues are identified (hence the DAL list crosspost).
>
> Documentation for taplint is at:
>
> http://www.starlink.ac.uk/stilts/sun256/taplint.html
>
> and invocation having downloaded the jar file is as straightforward as:
>
> java -jar stilts.jar taplint
>
> See the documentation for discussion of options to customise the
> output or tests performed.
>
> Taplint is not comprehensive and is somewhat experimental. It is
> possible that it reports some compliance errors for compliant services,
> and it certainly does not test everything. Although I don't have
> an unlimited amount of time to work on it, I would be happy to
> work with TAP developers to fix errors or possibly improve or add
> tests. All feedback is in any case welcome.
>
> A couple of bugfixes and minor functionality enhancements are also
> included in this release, and there is an accompanying release
> of STIL (v3.0-2).
>
> Mark
>
> --
> Mark Taylor Astronomical Programmer Physics, Bristol University, UK
> m.b.taylor at bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/
>
--
Mark Taylor Astronomical Programmer Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/
From dtody at nrao.edu Wed Jul 6 11:11:27 2011
From: dtody at nrao.edu (Douglas Tody)
Date: Wed, 6 Jul 2011 12:11:27 -0600 (MDT)
Subject: BINARY support (Was: site monitoring)
In-Reply-To: <20110706074452.GB8569@ari.uni-heidelberg.de>
References: <4E0CDDFC.7050404@nasa.gov>
<4E1326F6.7080206@nasa.gov>
<20110706074452.GB8569@ari.uni-heidelberg.de>
Message-ID:
Markus -
My impression has been that implementation of binary streams in VOTable
software is spotty and that the feature is rarely used. However I don't
have any hard data on the issue. Perhaps others can comment if they
have tried to use this feature or encountered such data.
- Doug
On Wed, 6 Jul 2011, Markus Demleitner wrote:
> Hi,
>
> On Tue, Jul 05, 2011 at 03:39:01PM -0600, Douglas Tody wrote:
>>> We changed to using the STREAM/BINARY encoding in the VOTables used
>>> in our TAP interface (suggested by Mark) and so far this seems to
>>> be satisfying our security types. [I don't want to get into a
>>> discussion of what should or should not trigger such flags.]
>>
>> Since probably few clients will be able to handle such a response this
>> is probably not the best solution, although technically legal I guess.
>> It would be better to address the real problem (smarter security
>> checking) as someone else suggested.
>
> Although I agree that all this has nothing to do with security in any
> sense of the word: Doug, do you suggest that client support for
> VOTables with binary streams is rare?
>
> Since I'm delivering binary VOTables by default that would be
> worrying for me (plus, I'm only aware of missing binary support in
> old versions of VOPlot and now specview). Before I start checking
> libraries and clients myself: Does anyone already have an idea what
> clients or libraries don't support binary VOTables yet?
>
> Cheers,
>
> Markus
>
From Thomas.A.McGlynn at nasa.gov Wed Jul 6 11:36:31 2011
From: Thomas.A.McGlynn at nasa.gov (Tom McGlynn)
Date: Wed, 6 Jul 2011 14:36:31 -0400
Subject: BINARY support (Was: site monitoring)
In-Reply-To:
References: <4E0CDDFC.7050404@nasa.gov> <4E1326F6.7080206@nasa.gov> <20110706074452.GB8569@ari.uni-heidelberg.de>
Message-ID: <4E14AB2F.9020605@nasa.gov>
The only place I have seen BINARY data used is GAVO. However,
since they have one of the few highly functional TAP services, there is
some incentive to be able to read it. XSLT transformations are
impossible, or at least very much harder, for this format than for
the TABLEDATA format, so VOView and its relatives will have difficulty
with these data.
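Part of what makes BINARY awkward for XSLT is that a reader cannot even find the row boundaries without first decoding base64 and then walking the bytes according to each FIELD's datatype. Both steps are trivial in a procedural language. The sketch below assumes the same hypothetical three-column layout (int, double, char with arraysize="*") and already-extracted STREAM text. It is an illustration, not a general VOTable parser. `decode_binary_rows` is a hypothetical name.

```python
import base64
import struct


def decode_binary_rows(b64: str):
    """Parse a base64 VOTable BINARY stream with FIELDs int, double, char[*]."""
    data = base64.b64decode(b64)
    pos = 0
    rows = []
    while pos < len(data):
        (ident,) = struct.unpack_from(">i", data, pos)   # 4-byte int
        pos += 4
        (value,) = struct.unpack_from(">d", data, pos)   # 8-byte double
        pos += 8
        (n,) = struct.unpack_from(">i", data, pos)       # element count
        pos += 4
        name = data[pos:pos + n].decode("ascii")
        pos += n
        rows.append((ident, value, name))
    return rows
```

Each row's length depends on the string counts it contains, so the cell positions are only known after sequential decoding. A generic XSLT stylesheet, which expects cells as separate text nodes, has no equivalent of this step.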
Tom
Douglas Tody wrote:
> Markus -
>
> My impression has been that implementation of binary streams in VOTable
> software is spotty and that the feature is rarely used. However I don't
> have any hard data on the issue. Perhaps others can comment if they
> have tried to use this feature or encountered such data.
>
> - Doug
>
>
> On Wed, 6 Jul 2011, Markus Demleitner wrote:
>
>> Hi,
>>
>> On Tue, Jul 05, 2011 at 03:39:01PM -0600, Douglas Tody wrote:
>>>> We changed to using the STREAM/BINARY encoding in the VOTables used
>>>> in our TAP interface (suggested by Mark) and so far this seems to
>>>> be satisfying our security types. [I don't want to get into a
>>>> discussion of what should or should not trigger such flags.]
>>>
>>> Since probably few clients will be able to handle such a response this
>>> is probably not the best solution, although technically legal I guess.
>>> It would be better to address the real problem (smarter security
>>> checking) as someone else suggested.
>>
>> Although I agree that all this has nothing to do with security in any
>> sense of the word: Doug, do you suggest that client support for
>> VOTables with binary streams is rare?
>>
>> Since I'm delivering binary VOTables by default that would be
>> worrying for me (plus, I'm only aware of missing binary support in
>> old versions of VOPlot and now specview). Before I start checking
>> libraries and clients myself: Does anyone already have an idea what
>> clients or libraries don't support binary VOTables yet?
>>
>> Cheers,
>>
>> Markus
>>
From m.b.taylor at bristol.ac.uk Mon Jul 11 10:12:25 2011
From: m.b.taylor at bristol.ac.uk (Mark Taylor)
Date: Mon, 11 Jul 2011 18:12:25 +0100 (BST)
Subject: vs:TAPType BOOLEAN?
Message-ID:
Dear Reg/DAL,
VODataService defines the TAPType type, which is an enumeration of
possible values for the content of a <dataType>
element, defining the data type of a table column (see the first
table in VODataService version 1.1 section 3.5.3).
One of the entries in this list is "BOOLEAN".
However, TAP version 1.0 section 2.5, which lists data type mappings
for uploaded TAP tables, has no corresponding entry - the space
opposite the VOTable "boolean" type is labelled "Not supported".
Since there appears(?) to be no BOOLEAN type in ADQL, I suspect that
the inclusion of BOOLEAN in the VODataService TAPType list is in
error, but perhaps there's some other explanation.
Can any expert on VODataService (Ray?) or DAL (Pat?) comment?
thanks
Mark
--
Mark Taylor Astronomical Programmer Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/
From Thomas.A.McGlynn at nasa.gov Thu Jul 21 10:32:05 2011
From: Thomas.A.McGlynn at nasa.gov (Tom McGlynn)
Date: Thu, 21 Jul 2011 13:32:05 -0400
Subject: Further thoughts on nulls in VOTables.
Message-ID: <4E286295.50308@nasa.gov>
A brief discussion we had a few weeks ago with regard to nulls in
VOTables and TAP noted the origins of the current conventions which I
had been blissfully ignorant of. I remain concerned that there are
real issues that may arise as we try to support general table ingest
and queries using VOTables for the serialization of the data. The
issues I see are:
1. Real fields in VOTables cannot distinguish nulls and NaNs
2. The specification of nulls for strings and the mechanisms
for distinguishing null strings from zero-length strings are unclear.
3. Integer columns may force actual data values and nulls to be conflated
4. Robust serialization of query results involving integers and
strings is incompatible with streaming results.
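Issues 1, 3 and 4 can be illustrated in a few lines. This is a minimal sketch under stated assumptions, not a proposal. It assumes that float nulls are written as NaN, and that integer nulls need a sentinel declared in the FIELD's <VALUES null="..."> element, which is emitted before any rows. `float_cell` and `choose_int_null` are hypothetical helper names.

```python
import math
from typing import Iterable, Optional


def float_cell(x: Optional[float]) -> float:
    """Issue 1: a float null is written as NaN, the same as a genuine NaN."""
    return math.nan if x is None else x


def choose_int_null(values: Iterable[Optional[int]], bits: int = 32) -> Optional[int]:
    """Pick a value for <VALUES null="..."> that no real datum uses.

    Issue 4: the whole column must be seen before the FIELD header is written.
    Issue 3: if every representable value is in use, no sentinel exists.
    Returns None when the column contains no nulls.
    """
    vals = list(values)  # materialises the column, which defeats streaming
    if all(v is not None for v in vals):
        return None
    used = {v for v in vals if v is not None}
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    # Try the conventional extreme values before searching the rest.
    for candidate in (lo, hi):
        if candidate not in used:
            return candidate
    for candidate in range(lo + 1, hi):
        if candidate not in used:
            return candidate
    raise ValueError("every value is in use; a null sentinel would collide with data")
```

For example, `choose_int_null([1, None, 3])` returns the 32-bit minimum. However, a service streaming rows cannot know that value is safe until the last row has been read.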
The attached document elaborates on these and I'd be interested in
others' thoughts.
Regards,
Tom McGlynn
-------------- next part --------------
A non-text attachment was scrubbed...
Name: NullsInVOTables.docx
Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document
Size: 18726 bytes
Desc: not available
URL:
From m.b.taylor at bristol.ac.uk Fri Jul 29 10:05:51 2011
From: m.b.taylor at bristol.ac.uk (Mark Taylor)
Date: Fri, 29 Jul 2011 18:05:51 +0100 (BST)
Subject: Further thoughts on nulls in VOTables.
In-Reply-To: <4E286295.50308@nasa.gov>
References: <4E286295.50308@nasa.gov>
Message-ID:
Readers of the dal mailing list may be interested to know that Tom's
comments are copied and are being discussed on a page on the wiki:
http://www.ivoa.net/cgi-bin/twiki/bin/view/IVOA/VOTableIssues
I have CC'd this message to the votable list, which didn't get a copy
of the original message, as well.
Mark
On Thu, 21 Jul 2011, Tom McGlynn wrote:
> A brief discussion we had a few weeks ago with regard to nulls in VOTables and
> TAP noted the origins of the current conventions which I had been blissfully
> ignorant of. I remain concerned that there are real issues that may arise as
> we try to support general table ingest and queries using VOTables for the
> serialization of the data. The issues I see are:
>
> 1. Real fields in VOTables cannot distinguish nulls and NaNs
> 2. The specification of nulls for strings and the mechanisms
> for distinguishing null strings and 0 length strings are unclear.
> 3. Integer columns may be required to confuse actual data and nulls
> 4. Robust serialization of query results involving integers and
> strings is incompatible with streaming results.
>
> The attached document elaborates on these and I'd be interested in others'
> thoughts.
>
> Regards,
> Tom McGlynn
>
--
Mark Taylor Astronomical Programmer Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/