head 1.1; access; symbols; locks http:1.1; strict; comment @# @; expand @b@; 1.1 date 2004.05.24.20.58.43; author ClivePage; state Exp; branches; next ; desc @none @ 1.1 log @XMATCH function @ text @ࡱ> ;:&( n/ 0DArialNew 0B 0DTimes New Roman0B 0G0.  @@n?" dd@@  @@@@`` `+ .R!#%&'()+, 0AA@@3ʚ;ʚ;g4KdKd B 0ppp@@ <4ddddLpC 0 80___PPT10 pp?  %!Cross-matching tVery important functionality  by combining datasets we often get new scientific results. In DBMS terms it needs a spatial join  the join criterion is the overlap of the error-regions. Error-region always small patch of sky, never just a point because of errors of measurement, extended objects, proper motions, etc. Shapes of error-regions vary: often elliptical, sometimes circular, occasionally more complex. Size depends on confidence level of the match  often expressed as say x% confidence, or y-sigma (the latter assumes some error distribution, e.g. Gaussian).<;Z>: Other cross-match complications ~Difference in epochs of catalogues means some objects will have moved  apply proper motions? Need epoch metadata. Different users will want different confidence levels and hence sizes of error regions Large region may produce too many false positives. May need to adjust confidence level in the light of experience. ``Ws`Ws? Current ADQL Syntax J & WHERE XMATCH(x, y, !z) > 3 AND & Some problems: XMATCH is not a quite a function: if a confidence level or N-sigma value is needed it should be one of the arguments. Better to express matching probability in X% confidence than as N-sigma, as former makes no assumptions about the functional form of the error distribution. No syntax for LEFT OUTER JOIN (return unmatched sources as well as matched ones).2Ae1t.   Specifying a join in SQL JTwo methods in the standards: SELECT * FROM t1, t2 WHERE t1.x = t2.y & SELECT * FROM t1 [LEFT OUTER] JOIN t2 ON t1.x = t2.y WHERE & The latter form is more verbose but allows a number of different types of join, the join criterion is explicit. Propose that we use latter, e.g. ON XMATCH(& )xO%N.O% N . %  Some cross-match implementations  BCross-match of two catalogues of ~N sources is an O(N) operation unless some form of indexing/sorting is used, which can reduce it to O(N log N) or better. Known algorithms Join using spatial index such as R-tree (Oracle, Sybase, MySQL, Postgres& ) or Grid-file (DB2) Join using pixel code and B-tree (more complex and slower, but feasible with just about any DBMS) Sort/sweep algorithm of Dave Abel and colleagues. Note: all these use bounding boxes drawn around the error ellipses (or whatever). Refinement stage weeds out the false matches. LZZZ5y  \5 y9    + Algorithm limitations With most algorithms, if the size of the error region changes it is necessary to generate new indices - slow. But: the hard part is done by the DBMS in reducing an O(N) problem to one of O(N log N) or better. Proposed solution: Always carry out the cross-match using largest error region that is scientifically justifiable (99.9% or 3) The user can then refine the crude selection using the relatively small table of results, rejecting sources too far apart for the actual error-regions and confidence level. In this case: can omit confidence level (or N-) value in the XMATCH function  apply only in refine stage. 8ZZ>  b +}  = +Selecting sources which have no counterpart+ Syntax is: XMATCH(x, !z) Standard RDBMS can do this using LEFT OUTER JOIN of x and z INNER JOIN of x and z Take difference between results of the last two. Or is there a simpler way? I think that it is at least as important to have a defined syntax for LEFT OUTER JOIN as knowing which sources have no counterpart is often scientifically important. Propose plus symbol, e.g. XMATCH(x+, y)N?ZbZZ?bH  g   -Cross-match can also find clusters of objects- Scientific examples: find clusters of stars, galaxies, objects affected by gravitational lensing. Method: cross-match catalogue with itself but with a much larger maximum offset than the error-regions. Problem: needs index generated using bounding boxes much larger than error-regions used for finding counterparts. May need additional function like XMATCH but with additional parameters e.g. for maximum offset.Z.Y  ?   ` ` ̙33` 333MMM` ff3333f` f` f` 3>?" dd@@ v?" dd@@  " @@ ` n?" dd@@   @@@@``PR    @@ ` ` p>> @@%(    6 " )<  T Click to edit Master title style! !$  00  "v  RClick to edit Master text styles Second level Third level Fourth level Fifth level!     S  0 "`w  _ 2003 May 24 0     0| "`   v*Clive Page 0    0 "`   b*0H  0޽h ? 80___PPT10.?@@ edi200304 (    6,/  {!Implementation of XMATCH function""0"! H  0޽h ? 80___PPT10.=?`4fh$  0$(  r  S  )<   r  S v  H  0޽h ? 80___PPT10.=?'$  P$(  r  S % )<   r  S %v  H  0޽h ? 80___PPT10.??H $  `$(  r  S * )<   r  S t+v  H  0޽h ? 80___PPT10.@@?0.i$  $(  r  S H )<   r  S Xv  H  0޽h ? 80___PPT10.@@0 /$  p$(  r  S |Ĩ )<   r  S v  H  0޽h ? 80___PPT10.A?p*$  $(  r  S b )<   r  S cv  H  0޽h ? 80___PPT10.B? Q}  $(  r  S Ԇ )<   r  S \dv  H  0޽h ? ___PPT10i.B?pB;+D=' = @@B +$  $(  r  S  )<   r  S sv  H  0޽h ? 80___PPT10.@@^Nr8 &p/136r:<> F8OAe{COh+'0P hp    Simple ADQL Enhancementsi Page A edi200304L  Page3010gMicrosoft PowerPointnts@@PAc@@`G??@@0_AdG&g  9  Y--$XX--'@@Times New Roman-. 2 5 2003 May 24 ."System -@@Times New Roman-. 2  Clive Page.-@@Times New Roman-. 92 x!Implementation of XMATCH function        .-l՜.+,0    mOn-screen ShowUniversity of LeicesterC@@   ArialTimes New Roman edi200304Slide 1Cross-matching Other cross-match complicationsCurrent ADQL SyntaxSpecifying a join in SQL!Some cross-match implementationsAlgorithm limitations,Selecting sources which have no counterpart.Cross-match can also find clusters of objects  Fonts UsedDesign Template Slide Titles _C Page Page  !#$%&'()+,-./013456789<Root EntrydO)Current User2SummaryInformation("PowerPoint Document(CDocumentSummaryInformation8*@