International

    Virtual

    Observatory

Alliance

 

 

 

MOC – HEALPix Multi-Order Coverage map

 

Version 1.0

IVOA Note  12th April 2012

 

 

Previous version(s):

            None

           

Authors:

            Thomas Boch

Tom Donaldson
Pierre Fernique

Wil O'Mullane

Martin Reinecke

Mark Taylor

 

Editor:

            Pierre Fernique       

 

 

 

Abstract

 

This note describes a simple and powerful method to specify sky regions. The goal is to have a way  for providing very fast comparisons and data access methods. The principle is based on HEALPix sky tessellation. It boils down to defining a list of sky cells, grouped hierarchically. This method has been tested and validated in two tools: Aladin and Topcat, and in a front-end tool for VizieR catalogs.

 

 

 

Status of This Document

This is an IVOA Note. This is the first release of this document.

 

This is an IVOA Note expressing suggestions from and opinions of the authors. It

is intended to share best practices, possible approaches, or other perspectives

on interoperability with the Virtual Observatory. The document describes a standard created by IVOA partners and is not a formal IVOA standard.

A list of current IVOA Recommendations and other technical documents can be found at http://www.ivoa.net/Documents/.

Acknowledgements

The authors would like to thank all HEALPix developers for their great help in this study.

 

Contents

 


Table of Contents


 Acknowledgements.................................................................................................... 2

1 Introduction................................................................................................................ 3

1.1 The idea................................................................................................................. 3

1.2 Examples............................................................................................................... 4

1.3 HEALPix tessellation........................................................................................... 5

1.3.1 Equal areas....................................................................................................... 5

1.3.2 Computing time................................................................................................. 5

1.3.3 Accuracy........................................................................................................... 5

1.3.4 Standard............................................................................................................ 5

1.4 Examples of use cases....................................................................................... 5

1.4.1 Visualize the coverage...................................................................................... 6

1.4.2 Compare the coverage of multiple data sets.................................................... 6

1.4.3 Remote cross-match (TOPCat use case)........................................................ 6

1.4.4 Multi-service positional search.......................................................................... 6

1.5 Scope and performance..................................................................................... 7

1.5.1 Scope of usage................................................................................................. 7

1.5.2 MOC resolution................................................................................................. 7

1.5.3 Size of MOC..................................................................................................... 8

1.5.4 MOC performance............................................................................................ 8

2 MOC encoding........................................................................................................... 9

2.1 MOC specificity..................................................................................................... 9

2.1.1 Celestial reference............................................................................................ 9

2.1.2 NESTED numbering......................................................................................... 9

2.1.3 Well-formed  MOC encoding.......................................................................... 10

2.1.4 Sky coordinates and HEALPix cells............................................................... 10

2.2 Two MOC encoding formats............................................................................ 10

2.2.1 JSON MOC.................................................................................................... 11

2.2.2 FITS MOC....................................................................................................... 11

2.3 Existing implementations and libraries......................................................... 12

2.4 Links to VO registry............................................................................................ 12

 References................................................................................................................. 13

 Appendix A – MOC size vs MOC resolution......................................................... 15


 

1      Introduction

Many VO protocols are based on simple cone queries (CONE-SEARCH, SIA, SSA ...). This is mainly due to the fact that most astronomical databases have been built  for answering such requests efficiently. But when the queries concern more complex sky regions (rectangles, polygons, ...) the methods used are diverse, difficult to compare or not interoperable, often complex and usually slow. This fact inevitably hampers the development of VO tools for global comparisons, such as generic catalogs cross-match, computation of data set intersections, etc.. In fact, the VO has no efficient method to correlate two data sets as soon as they are not integrated in the same data base.


The method we describe in this paper attempts to influence this situation by providing  a basis for unique, simple and effective comparison of sky coverage data or “footprints”.
We called it ′Multi-Order Coverage Map′ or ′MOC′. This method has been tested and validated in two of the main tools of the VO: Aladin and Topcat. It has been used to describe all available data sets provided by CDS (all VizieR tables and Simbad). Such a system shared by several data centers, or even being accepted as a standard at the IVOA level, would significantly improve in the short term a lot of VO tools.

 

1.1    The idea

The goal is to get a spherical geometry system for sky regions providing very fast and accurate union, interection and equality operations. For this, the key idea is to base such a system on a regular and hierarchical partitioning of the sphere.


We opted for the HEALPix tessalisation [1] that divides the sphere into 12 diamonds, each of them sub-divided  into 4 diamonds recursively. Thus the sphere at level 1 will consist of 48 diamonds, 192 diamonds at level 2, 768 at level 3 and so on.


A region of the sky will be determined by the list of the diamonds involved. Union, intersection and equality on such lists are straighforward and particularly fast if this list has been previously sorted.


To reduce the volume of such lists, four consecutive diamonds will be replaced by their "parent", recursively. The deepest level will be defined by the resolution required for describing  the data set.

1.2    Examples


In order to illustrate the MOC usage, we represent in this illustration an excerpt of GLIMPSE survey and its corresponding MOC map

GALEX AIS MOC (HEALPix order 8 – 70,000 cells) and SDSS (HEALPix order 10,- 225,000 cells):


1.3    HEALPix tessellation

Defining a sky region by a subset of regular sky tessellation is not a new idea. The advantage of sharing a homogeneous tessellation method has already been described in several articles, notably in the VO context [2], [3].

There are several methods of sphere partitioning. The three most commonly used in astronomy are QBOX, HTM and HEALPix.


QBOX is projecting the sphere onto the 6 faces of a cube, each of them being recursively subdivided into 4 squares. This method is the basis of the indexing system of  VizieR [5]. HTM is splitting the sphere into 8 triangles (octahedron), each of them being recursively subdivided into 4 triangles [4]. This method is notably used by the SLOAN database.

 

A comparison between HTM and HEALPix has been already presented in [4] in 2001.

1.3.1    Equal areas

By construction, HEALPix consists of diamonds with equal spherical surfaces. Thus the area of a region is trivial to compute. This is not the case either for QBOX or HTM for which the surface computation is a complex and necessarily slow task compared to HEALPix.

1.3.2    Computing time

Unlike HTM, HEALPix has the peculiarity that the  computing time does not depend on the hierarchical level. Conversely HTM requires the knowledge of the “parent” cell to determine the four sub-cells. It means that in HEALPix the computation time is constant.

1.3.3    Accuracy

The best implementation is presently at 0.3 arcsec for HTM (level 20) [3], at 75 arcsec for QBOX [5]HEALPix provides a library which allows the calculation accuracy to 0.4 mas (level 29). [9]

1.3.4    Standard

 There exist many HEALPix libraries: C++, Java, Fortran, IDL ... Also, there is a standard for encoding the HEALPix maps into a simple FITS file. And HEALPix was selected for several missions such as WMAP, Planck and GAIA soon. The HEALPix site is hosted at JPL [6].

 

These four points provide good reasons to choose HEALPix.

1.4    Examples of use cases

Such a system has a lot of potential use cases. We describe here the four main use cases for which there are already prototype implementations.

1.4.1    Visualize the coverage

First of all, the MOC structure allows to visualize easily any data set coverage. The MOC meshing nature provides an easy interpretation.  With a simple 4 diamond corner drawing approximation such as  shown in illustration 1.2, the time required for drawing such coverage is typically a few milliseconds. If necessary, thanks to its hierarchical structure this drawing time can be limited by stopping the drawing process at a coarser level than the best resolution.

1.4.2    Compare the coverage of multiple data sets

The computation of data set intersections is trivial (simple list comparisons). The result of such operations can be used in further operations. For instance it is  straightforward to compute the sky region determined by the intersection of two MOC surveys, and after that, query a data base such as Simbad for retrieving objects inside this intersection. Further, we can imagine that the data base uses itself an internal spatial index based on HEALPix. In this case all intermediate sky computations will be removed and the response time will be the best that we can imagine.

1.4.3    Remote cross-match (TOPCat use case)

One popular feature in TOPCAT [7] is the multiple cone search, which allows one to perform a remote positional cross-match between a local table and a remote one (accessible through the CS protocol). If the remote table has a sparse coverage, TOPCAT will fire up unnecessary queries for positions not covered by the table. Performance can be improved by providing TOPCAT with a way to retrieve the global coverage of the queried table.

1.4.4    Multi-service positional search

An attractive if simple form of virtual observatory user query is to request all records from the whole VO at a given sky position. While this is in principle possible by dispatching one narrow positional query to every registered Cone Search, SSA or SIA service, in practice the number of queries required leads to an unacceptable load on both clients and services, and moreover most of these queries will deliver no results because most services lack coverage in the queried region. This "All-VO" search capability was provided in early versions of the AstroGrid AstroScope tool, and subsequently deprecated because of these resource usage issues. If footprint information was available for all registered services, only those services with coverage in the region of interest, and hence which might provide a non-empty result set, could be easily identified, providing a great reduction in the number of service queries needed for a comprehensive search. MOC offers the opportunity to provide this coverage information in a uniform way and to store it locally or centrally in a way which supports querying coverage at a given point for a large number of services.

1.5    Scope and performance

1.5.1    Scope of usage

MOC is typically designed for defining coverage regions of data sets, notably pixel surveys or source catalogs.

Even though MOC allows to describe regions on the sky, it is not designed to accurately define regions. For this use case, STC is better suited[1]. For instance, MOC is clearly not designed to draw instrument fields of view or such kind of overlays.

Conversely, the comparison of two regions will definitely be easier and faster using MOC rather than STC. The main STC issue for comparisons is due to the fact that there is no canonical way of expressing a region. Thus, a basic equality test with STC may raise a complex spherical geometry problem. And this kind of problem can be really difficult to implement in a software. Conversely, a MOC region is always described in one way – a list of cells – comparison algorithms are simple and fast.

1.5.2    MOC resolution

The  MOC resolution is determined by the greatest HEALPix level used for defining the region. It depends on the accuracy chosen for the region definition.

As data sets do not follow the HEALPix cell borders, a MOC is generally an upper-approximation of the data set coverage. The quality of this approximation will be directly dependent on the chosen MOC resolution.

Note: There is no problem to compare two MOCs with different resolutions.

This table provides the HEALPix cell angular resolution for each HEALPix level.


Level

Best cell resolution(*)

 

Level

Best cell resolution(*)

0

58.63°

 

15

6.442”

1

29.32°

 

16

3.221”

2

14.66°

 

17

1.61”

3

7.329°

 

18

805.2mas

4

3.665°

 

19

402.6mas

5

1.832°

 

20

201.3mas

6

54.97'

 

21

100.6mas

7

27.48'

 

22

50.32mas

8

13.74'

 

23

25.16mas

9

6.871'

 

24

12.58mas

10

3.435'

 

25

6.291mas

11

1.718'

 

26

3.145mas

12

51.53”

 

27

1.573mas

13

25.77”

 

28

786.3µas

14

12.88”

 

29

393.2µas

(*) mean value. The HEALPix cell has constant area, not constant linear dimensions.

1.5.3    Size of MOC

The MOC describes a sky region as an explicit list of cells. The coding size can vary a lot from a few bytes to several kilobytes and more if required. As MOC is hierarchical, its size mainly depends of two factors: 

1.    The chosen MOC resolution;

2.    The geometry of the region (“compactness”).

Concretely, in case of MOCs describing pixel survey coverages, the number of cells is mainly dependent on the size of the coverage perimeter (size of borders).

In the case of MOCs describing source catalog coverages, the number of cells is mainly dependent on the distribution and the density of the catalog (for instance, for a catalog with a low density and a high distribution, it is possible to choose a very good MOC resolution without increasing significantly the MOC size – see appendix A).

 

1.5.4    MOC performance

The required MOC operations are classical set operations: equality, union, intersection... The algorithms for implementing this kind of operation are based on list manipulations. Good performance can be obtained by dichotomic algorithms on sorted lists.

To verify the power of such a system, a Java/TOMCAT prototype accessible via a Web browser has been developed. Its role is to load the 7,000 MOCs (VizieR catalog coverages) and wait for user requests.This prototype loads the MOCs at maximum level 8 corresponding to cells of 14 arcmin. At this level, the size of the 7,000 Mocs takes approximatively 90MB of memory without compression. The user request takes the form of an URL providing an unique HEALPix number. The goal is to retrieve the list of catalogs which intersect this cell. Similarly, it is possible to provide as parameter query, not a unique HEALPix cell, but a full MOC. The performances of such a prototype are provided in the table below. For instance less than 100 ms are required for retrieving the list of catalogs intersecting GALEX MOC.

2.5s

for loading 7,000 VizieR MOCs (level 8)

7ms

for HEALPix query (amongst 7,000 VizieR tables, which tables are concerned by this cell?)

90ms

for MOC query (amongst 7,000 VizieR tables, which tables intersect this MOC (4000 cells)?

30ms

for generating MOC intersection (SDSS level 10 & GALEX level 8)

(realized on a basic laptop)

We can imagine that a data base using the same HEALPix tessellation for its spatial index  will be very efficiently interoperable with a such method.

2      MOC encoding

As HEALPix is already used for other purposes, it has its own vocabulary. A HEALPix level is called an ORDER and a cell number is called a NPIX.

2.1    MOC specificity

2.1.1    Celestial reference

HEALPix allows three celestial references: galactic, equatorial or ecliptic reference. Allowing various references would limit a lot the possibility to compare efficiently MOCs:there is no equivalence between HEALPix cell described in a reference and a cell, or a list of sub-cells expressed in another reference system. Consequently, the MOC definition is restricted to equatorial reference, and more precisely, ICRS reference. This choice has been motivated by the fact that most catalogs are natively provided with equatorial coordinates. And also by the fact that this restriction does not change anything for whole sky surveys (WMAP, PLANCK...)

2.1.2    NESTED numbering

Another restriction concerns the numbering scheme used in MOC for specifying the cell numbers. HEALPix supports both "NESTED" and "RING" numbering. It is possible to convert the cell numbers from one system to the other, but it takes time. For fast comparisons the MOC must use the NESTED numbering scheme only.

 

The NESTED HEALPIX numbering consists of enumerating all cells in a specific order. For instance, at level 1, there are 48 diamonds (12x4) enumerated from 0 to 47. In the NESTED numbering scheme, the 4 sub- diamonds of diamond M have the numbers:  (Mx4)+3,  (Mx4)+1, (Mx4)+2, (Mx4) in reading order. And reciprocally, the father number of diamond N is N/4.


Note: The level 0 is a special case, it contains only 12 diamonds enumerated from 0 to 11.

2.1.3    Well-formed  MOC encoding

To keep the canonical property of MOC, the redundant cells are not allowed and the hierarchical encoding principle must be  respected.  Concretely, it is not allowed to encode 4 sibling cells instead of their parent cell (only at level 0 for the 12 first diamonds). By this simple constraint, there is one and only one way for describing a same region.

2.1.4    Sky coordinates and HEALPix cells

The way to determine which HEALPix cell contains such or such sky coordinates is described in the reference HEALPix article [1]. Several libraries already implement these algorithms [6]. For all tests concerning this note we have used the most recent java HEALPix library [9] written in the framework of the Gaia project.

 

These libraries are only required to generate or draw MOCs, and also to compare them with sky coordinates . But there is no need of these libraries for comparing MOCs.

 

2.2    Two MOC encoding formats

In order to easily manipulate such HEALPix cell lists, ie list of NPIX, two complementary encoding formats are defined.

The first one is based on ASCII text string following the JSON convention [13] – easy to write, to read and to parse. Its main feature is its versatility rather than performance.

The second one is  designed to be processed as quick as possible by a computer (no parsing procedure required, loading or writing in one fast step,...). It is an adaptation of the HEALPix standard for storing a classical HEALPix map as a FITS binary table.

A MOC “provider” must be able to deliver MOCs in both formats, a MOC “consumer” can only support one format.

2.2.1    JSON MOC

A JSON MOC is written as a simple ASCII stream following this syntax:
{ “order”:[npix,npix,...], “order”:[npix, npix...], ... } .

The values must be fully sorted and the MOC must be well-formed.

Example of a JSON MOC

{1:[1,2,4], 2:[12,13,14,21,23,25]}

2.2.2    FITS MOC

The FITS MOC format consists of a single binary table in which each value merges both the level and the number of the cell according to the NUNIQ HEALPix  coding method[2]. Depending on the maximum level, this table should be coded in 32-bits (maximum level less than 14) or 64-bits signed integers.

uniq = 4 * (4^order) + npix

The inverse operation is:

order = log2(uniq/4) /2
npix = uniq – 4 * (4^order)

The FITS values must be fully sorted and the MOC must be well-formed.

As the FITS MOC is  derived from a more generic standard dedicated to HEALPix maps, the table FITS header must contains these following keywords/value.

PIXTYPE = 'HEALPIX '    / HEALPix magic code
ORDERING= 'NUNIQ    '   / Order and NpixNest coding method
COORDSYS= 'C       '   / ICRS reference frame

It must also contain the MOC dedicated keyword “HPXMOC” which provides the resolution of the MOC. This keyword can be used to distinguish a classical HEALPix map from a MOC.

Example of FITS MOC headers:

SIMPLE  =                    T

BITPIX  =                    8

NAXIS   =                    0

EXTEND  =                    T

END

 

XTENSION= 'BINTABLE'           / HEALPix Multi Order Coverage map

BITPIX  =                    8

NAXIS   =                    2

NAXIS1  =                    4

NAXIS2  =                16461

PCOUNT  =                    0

GCOUNT  =                    1

TFIELDS =                    1

TFORM1  = '1J      '

TTYPE1  = 'NPIX    '           / HEALPix UNIQ pixel number

PIXTYPE = 'HEALPIX '           / HEALPix magic code

ORDERING= 'NUNIQ    '          / Order and NpixNest coding method

COORDSYS= 'C       '           / ICRS reference frame

HPXMOC  =                   12 / MOC resolution (best order)

END

 

Note:The FITS MOC is derived from the standard dedicated to HEALPix maps (partial definition ie a list a couples Cell_number/Pixel_value). However there are two specificities: 1- Only the cell number is stored, there is no associated pixel value. 2 – The HEALPix map concerns only one specific level. In FITS MOC, the level and the cell number are coded together.

 

2.3    Existing implementations and libraries

Presently a Java library is available under GPL v3 license, supporting the two MOC encoding formats. It allows basic operations: equality test, union, intersection... [10].

1.    Aladin tool [11] has implemented this java library in order to display MOCs on a background sky.

2.    The TOPCAT and STILTS tools have implemented it for improving their multi-conesearch cross match facility.  The cone corresponding to each source query is compared against the MOC of the service in order to consider only the sources present in the intersection of the data sets.  This will reduce considerably the number of unnecessary queries.

3.    A front-end of VizieR and Simbad has been deployed for providing the MOC associated to all VizieR tables, and Simbad data base – see next section.

2.4    Links to VO registry

 

We developed a web service getMoc (accessible through http://alasky.u-strasbg.fr/footprints/getMoc ) which delivers a MOC for a given cone search service. The cone search is identified by its base URL.

 

Example:  http://alasky.u-strasbg.fr/footprints/getMoc?baseUrl=http%3A%2F%2Fvizier.u-strasbg.fr%2Fviz-bin%2Fvotable%2F-A%3F-source%3DIII%2F105%26  will provide the MOC file for the cone search service which base URL is http://vizier.u-strasbg.fr/viz-bin/votable/-A?-source=III/105

 

The getMoc service can deliver MOCs for all 7,000+ VizieR cone search services, and also for a couple of other services (SDSS DR8, UKIDSS DR7 provided by the Royal Observatory, Edinburgh)

This service is currently used by TOPCAT when performing multiple cone search, in order to reduce the number of remote queries.

This mechanism is not limited to cone search and could be extended for other kind of data (images or spectra data for instance).
In the mid-term, we advocate an extension of the registry schema which would allow, for a given registry resource, to point to a MOC file describing the resource coverage.

 

References

[1] Gorski K. et al ., HEALPix: A Framework for High-Resolution Discretization and Fast Analysis of Data Distributed on the Sphere, The Astrophysical Journal, Volume 622, Issue 2, pp. 759-771. 2005, 2005ApJ...622..759G

[2] O'Mullane, Gorski K., Sky, Indexation, Pixelization, and the VO – Astronomical Data Analysis Software and Systems XIV – 347, 3 – 2005 2005ASPC..347....3O

[3] Budavàri T. et al., Searchable sky coverage of astronomical observations: footprints and exposures - Astronomical Society of the Pacific, 122, 1375 – 2010  2010PASP..122.1375B

[4] O'Mullane et al., Splitting the Sky - HTM and HEALPix, Proceedings of the MPA/ESO/MPE Workshop, Garching, 638 – 2001 2001misk.conf..638O

[5] VizieR QBOX , Ochsenbein F. -  http://cdsarc.u-strasbg.fr/doc/man//qbox.htx

[6] HEALPix  Web site (JPL) – http://healpix.jpl.nasa.gov/

[7] Taylor M. B., "TOPCAT & STIL: Starlink Table/VOTable Processing Software", Astronomical Data Analysis Software and Systems XIV, 347, p29, 2005 2005ASPC..347...29T

[8] Space-Time Coordinate Metadata for the Virtual Observatory - http://www.ivoa.net/Documents/latest/STC.html

[9] Hivon E., Goski K., Reinecke M. et al. - HEALPix java library -  http://sourceforge.net/projects/healpix/

[10] Fernique P., Thomas B. - MOC java library -  http://cds.u-strasbg.fr/resources/lib/exe/fetch.php?media=mocsrc.jar

[11] Bonnarel F., Fernique P. et al., The ALADIN interactive sky atlas. A reference tool for identification of astronomical sources  - Astron. Astrophys., Suppl. Ser., 143, 33-40 (2000) - April(I) 2000.  2000A&AS..143...33B

[12] Boch T. - CDS MOC server –   http://alasky.u-strasbg.fr/footprints/list/

[13] JavaScript Object Notation   RFC 4627

 

Appendix A – MOC size vs MOC resolution

     

The MOC size of source catalogs depends of the density and the distribution of these sources over the sky. We provide in this appendix four illustrating examples (Bright Star, Tycho, GCP, WISE) for which the MOCs are provided at  resolution 6,7,8 and 9.

 

·         Bright Star Catalog [V/50]: 9110 sources

Order

Number of cells

Increasing factor


6

7939

 


7

8630

x 1.09


8

8842

x 1.02


9

8934

x 1.01


 

 

·         General Catalog of Photometric Data [II/167]:165 896 sources

Order

Number of cells

Increasing factor


6

19998

 


7

65616

x 3.28


8

99879

x 1.52


 

 

·         Tycho [I/239]:1 058 332 sources

Order

Number of cells

Increasing factor


6

783

 


7

32320

x 41.27


8

347064

x 10.74


9

790483

x 2.28


 

WISE preliminary data catalog [II/307]: 257 310 278 sources

 

Order

Number of cells

Increasing factor


6

1322

 


7

2845

x 2.15


8

5709

x 2


9

12231

x 2.32


 



[1]        STC is a recommended IVOA standard which allows to define a sky region by operations (union, intersection...) with simple shapes (circles, ellipses, polygons...)

[2]    Eric Hivon proposal (HEALPix co-author [1])