implementing REGION

Patrick Dowler patrick.dowler at nrc-cnrc.gc.ca
Thu Nov 23 12:40:25 PST 2006


Here is a brief description of how we implement geometry querying (REGION). We 
are using a clustered DB2 database for this (8 nodes, 16 partitions at the 
moment). 

Spatial querying is implemented using a feature of DB2 called Multi 
Dimensional Clustering (MDC), which is essentially a multi-dimensional 
clustered index. A simple example:

table positions (
-- the real columns:
	ra double precision not null,
	dec double precision not null,
-- in cartesian coordinates:
	x double precision not null,
	y double precision not null,
	z double precision not null,

-- generated columns: v+1 in [0,2] 
-- so iv =[0,2]*5 in [0,10] for ~1000 bins
        ix generated always as (smallint( ( x + 1 ) * 5 )),
        iy generated always as (smallint( ( y + 1 ) * 5 )),
        iz generated always as (smallint( ( z + 1 ) * 5 ))
)
organize by (ix, iy, iz)

Then we have to transform something like REGION(a, b, r, ICRS) via:

* transform a,b to tx,ty,tz (using same code/conventions that populated the 
table), and then put this into the SQL for fast, indexed bounding cube 
search:

WHERE x between tx-r and tx+r
   AND y between ty-r and ty+r
   AND z between tz-r and tz+r

DB2 knows that the binning columns are generated from x,y,z so it can use the 
binning columns to optimise the search. For points we can make refine the SQL 
to be a sphere intersecting the unit sphere (equivalent to a circle on the 
sky) by sticking some extra conditons into the SQL, such as:

  AND ((tx-x)*(tx-x) + (ty-y)*(ty-y)+ (tz-z)*(tz-z)) < r*r

or we can do some post-filtering in software to throw away some edge cases 
that get past the first cube conditions. In practice this is what we do for 
observations because instead of ra,dec we have a polygon, instead of x,y,z we 
have a bounding cube, and instead of the simple distance computation we have 
to do full geometric intersection, which I didn't want to put in the DB.

So, our approach is a combination of replacing REGION(...) with some other 
simple SQL conditions and some post-filtering of the results to remove 
(literally) edge cases. 

As a point of interest, this approach is taking advantage of the parallel 
query processing in our clustered DB2 system and the fact that using MDC 
(scan bins instead of whole table) makes the query performance dependent more 
on bandwidth (disk/memory/cpu throughout) rather than latency (disk seek 
time), which just doesn't scale all that well.... a high performance but 
totally brute force approach with minimal software complexity :-)

We also use MDC to "index" spectral and temporal intervals and the related 
intersection kind of conditions.

-- 

Patrick Dowler
Tel/Tél: (250) 363-6914                  | fax/télécopieur: (250) 363-0045
Canadian Astronomy Data Centre   | Centre canadien de donnees astronomiques
National Research Council Canada | Conseil national de recherches Canada
Government of Canada                  | Gouvernement du Canada
5071 West Saanich Road               | 5071, chemin West Saanich
Victoria, BC                                  | Victoria (C.-B.)



More information about the voql-teg mailing list