[TAP] case sensitivity of query
gerard.lemson at mpe.mpg.de
Wed Mar 4 01:35:37 PST 2009
> The real issue here is that for a case-sensitive database to
> handle a case insensitive query in general, the SQL must be
> completely parsed and all references to columns and tables
> analyzed and compared so that the case of the input can be
> adjusted to whatever the native case is. Thus instead of
> being able to defer the parsing of the SQL largely to the
> database (perhaps with some filtering of the REGION
> constraint), the server code must completely analyze it. No
> thin layer on top of existing frameworks here. This is
> independent of whether there are multiple fields
> distinguished only by case. If the database happens to
> contain such fields, then any query which references them is
> ambiguous and probably would have to be rejected by a fully
> conformant implementation.
> Solution 1, basically says that case sensitive databases may
> reject queries where the 'wrong' case is used for inputs.
> The idea is that users will build the query based upon
> metadata supplied by the database which should supply values
> in the proper case. So long as users don't change the names
> they should be OK.
> Solution 2 is the fully conformant implementation discussed above.
> Speaking only for the HEASARC, we're not likely to implement
> anything like that on the soon. However we may move to a
> Postgres database sometime next year, so the point may become
> moot for us.
I agree with you here that 2 is cumbersome.
There is another solution which I'll number 0 (because this is what I
suggested in my comments):
Allow both case-sensitive and case-insensitive TAP services, but add
a metadata keyword (CASE_SENSITIVE=TRUE/FALSE) to the TAP capabilities that
indicates which choice is made.
Case sensitive services are allowed to throw an error if table/column names
do not use the case that is provided by the metadata queries.
The checking of this can in general be left to the database engine itself.
> D. Gerard brings up the case sensitivity of data within the database.
> Here I believe the SQL standard is that the '=' operator is
> case sensitive. It's also true in practice in the Postgres,
> MySQL and Sybase implementations I've tried. I believe that
> the 'LIKE' operator is supposed to be case-insensitive.
> While I agree with Gerard that it is often the case that the
> query we want to provide to the user is case insensitive, I
> don't think that making the '=' operator work that way is
> correct. E.g., joins based on exact match are much faster
> than those based upon a case-insensitive match. Maybe keys
> are build case-insensitively when '=' is not case sensitive.
In SQLServer one can define per string column the "collation" using the
COLLATE keyword. The choice influences alphabet, case sensitivity, ordering
Choosing a case-insensitive collation does not influence comparisons.
> In practice I don't think this will matter too much. Once
> users can get
> to the databases, they'll learn how to filter queries as they wish.
I don't think that is quite so easy.
Especially for "wordy" data models, where one wants to explore contents,
case insensitive queries are useful.
I am thinking of metadata models like SimDB.
> Perhaps there should be some metadata which indicates the
> case-sensitivity of the '=' operator.
This is comparable but not quite equivalent to what I proposed.
I suggested that per CHAR/VARCHAR(/text?)column we could define whether it
is case sensitive or not.
It would add a column to the TAP_SCHEMA.columns table.
More information about the dal