VOTable for simulations
c.gheller at cineca.it
Thu Aug 31 05:55:13 PDT 2006
in the meantime I had thought a little about possible formats for the
VOTable. In fact I have come to the conclusion that there is little new to
add to the already existing VOTable specification, both for grids and particles.
The only parameter that I think we have to add is the "rank" parameter
(it may already exist, but I could have missed it).
Rank is the only parameter that makes grids different from particles,
scalars from vectors. For the rest, particles are completely the same as
grids. NO different approaches are needed.
Rank = 1 --> scalar on particles (a sequence of scalar values
associated with the N particles, one value per particle, N values)
Rank = 2 --> vector on particles (sets of three values per particle, Nx3)
Rank = 3 --> scalar on grids (one value per grid point, NxNxN - assuming
a cubic grid for simplicity)
Rank = 4 --> vector on grids (set of three values per grid point, NxNxNx3)
At the moment let's consider only 3D simulations.
From the example below you can see that "rank" is redundant information
that can also be obtained directly from the "arraysize" parameter. But
that requires parsing the string, so it could be useful to keep
it highlighted in a specific parameter.
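To illustrate the parsing step, here is a minimal Python sketch. It assumes fixed-size "arraysize" strings in the "NxMx..." form used in this mail (no variable-length "*" dimensions), and the function names parse_arraysize and rank_of are only illustrative. With the rank convention listed above, the rank is simply the number of dimensions.

```python
def parse_arraysize(arraysize):
    """Split a fixed-size arraysize string such as "41x41x41x3" into integers."""
    return tuple(int(part) for part in arraysize.split("x"))


def rank_of(arraysize):
    """Rank as used in this mail: the number of dimensions in arraysize."""
    return len(parse_arraysize(arraysize))


# The four cases listed above: particles (scalar, vector), grids (scalar, vector).
for size in ("10000", "10000x3", "41x41x41", "41x41x41x3"):
    print(size, "-> rank", rank_of(size))
```

So the explicit "rank" parameter only saves this small amount of string handling; it carries no information that "arraysize" lacks.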
In this version, several variables of different sizes can be stored in
the SAME file. The file could have different formats (fits, hdf... that
must be specified properly). I assume, for the moment, a raw binary
file, where variables are written one after the other (the standard
table structure in rows and columns is not efficient or even possible).
The entry point for each variable can be easily calculated using the
"arraysize" and "datatype" parameters. Furthermore, the order in which
they are stored must be specified. This could be the order in which
the FIELDs are listed in the VOTable.
Example: our data file contains a scalar field on a mesh, a vector field
on a mesh, a scalar field and a vector field on particles:
<TABLE name="BmTemperature" ID="MyTestTable" >
<FIELD name="BmTemperature" ID="myTestObject1"
ucd="" datatype="float" rank="3" arraysize="41x41x41" unit="K" />
<FIELD name="BmVelocity" ID="myTestObject2"
ucd="" datatype="float" rank="4" arraysize="41x41x41x3" unit="km/sec" />
<FIELD name="ParticlPos" ID="myTestObject3"
ucd="" datatype="float" rank="2" arraysize="10000x3" unit="Mpc" />
<FIELD name="PartDens" ID="myTestObject4"
ucd="" datatype="float" rank="1" arraysize="10000" unit="g/cm3" />
</TABLE>
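As a rough sketch of how the entry points could be computed, the Python below walks the FIELDs in document order and accumulates byte offsets. It assumes that the variables are stored back to back with no padding or header, that the storage order is the FIELD order, and that element sizes follow the VOTable primitive types (float = 4 bytes, and so on). The function name variable_offsets and the trimmed copy of the example table are only illustrative.

```python
import xml.etree.ElementTree as ET

# Byte sizes of some VOTable primitive datatypes.
DATATYPE_BYTES = {"short": 2, "int": 4, "long": 8, "float": 4, "double": 8}

# Trimmed copy of the example table (only the attributes needed here).
EXAMPLE = """
<TABLE name="BmTemperature" ID="MyTestTable">
<FIELD name="BmTemperature" datatype="float" rank="3" arraysize="41x41x41"/>
<FIELD name="BmVelocity" datatype="float" rank="4" arraysize="41x41x41x3"/>
<FIELD name="ParticlPos" datatype="float" rank="2" arraysize="10000x3"/>
<FIELD name="PartDens" datatype="float" rank="1" arraysize="10000"/>
</TABLE>
"""


def variable_offsets(table_xml):
    """Return a list of (name, byte_offset, byte_length) in FIELD order."""
    offsets = []
    position = 0
    for field in ET.fromstring(table_xml).iter("FIELD"):
        # Number of elements is the product of all dimensions in arraysize.
        count = 1
        for dim in field.get("arraysize").split("x"):
            count *= int(dim)
        nbytes = count * DATATYPE_BYTES[field.get("datatype")]
        offsets.append((field.get("name"), position, nbytes))
        position += nbytes  # assumption: next variable starts right after
    return offsets


for name, start, length in variable_offsets(EXAMPLE):
    print(name, start, length)
```

For the example above this places BmTemperature at byte 0, BmVelocity at 275684, ParticlPos at 1102736 and PartDens at 1222736. A reader can then seek to the offset and read the given number of bytes, provided the byte order is agreed separately.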
Let me know your opinion.
>Sorry for the late reply to this email. I'm Cc-ing the theory group as well
>I gather you are thinking of grid simulation data here, so this mail does
>not apply to N-body. Anyway, for that I think we can use the VOTable spec as
>it stands, in particular section 5.3 dealing with binary serialisation (see
>In the case you address, would it make sense to try to mimic FITS in the
>naming of key words, so use NAXIS for rank, and NAXIS1 for size0, NAXIS2 for
>size1 etc for the dimensions ? If I am not mistaken VOTable itself is based
>on the FITS binary table spec, so your proposal might be seen as a
>translation of a FITS datacube (IMAGE). Did we actually not think about
>using FITS as is for (uniform) grid simulations ? In that case your proposal
>could also be used I guess, where instead of STREAM we'd have FITS as in standard
>VOTable usage (though I don't know whether votable presumes that the FITS
>file contains a table).
>I am not sure whether FITS images/datacubes allow multiple values per cell
>(i.e. have an array size), but I don't think so. Otherwise we could probably
>generalise in that direction.
>Do you propose to follow the VOTable/FITS directions on little-vs big-endian
>>From: Claudio Gheller [mailto:c.gheller at cineca.it]
>>Sent: Thursday, July 20, 2006 12:37 PM
>>To: Gerard Lemson; Ugo Becciani; Alessandro Costa; Marco Comparato; R.
>>Subject: VOTable for simulations
>>I have tried to figure out the structure of a VOTable for simulated
>>data. The result follows.
>>I made the following assumptions:
>>1. data are binary
>>2. the binary file is a raw stream of bytes, with no structure (no fits,
>>no hdf...). It is external to the VOTable (at the moment I've not
>>considered base64 conversion for performance reasons)
>>3. Each file has an associated XML descriptor. The descriptor at
>>present gives only the information necessary to deal with the file.
>>4. Each file contains ONE variable. This is suggested for the following reasons:
>>- data rank and size can change from variable to variable.
>>- complex description
>>- The direct association XML header file - bin file - variable is
>>easier to handle.
>>- smaller files
>>- files easier to handle by external applications (also not VO-compliant)
>>- drawback: proliferation in the number of files
>>However we can consider support for more complex files or even
>>formats, like FITS or HDF5. But let's start with something simple.
>>At this point I made the Snap program create binary files (at present
>>still HDF5, but just for backward compatibility) and associated XMLs.
>>test.h5 ----> snapped data
>>test.h5.xml ----> associated VOTable:
>> <RESOURCE name="myTestResource">
>> <TABLE name="BmTemperature" ID="MyTestTable" >
>> <FIELD name="BmTemperature" ID="myTestObject"
>>ucd="" datatype="float" arraysize="41x41x41" unit="Kelvin" />
>> <PARAM name="rank" datatype="int" value="3"/>
>> <PARAM name="size0" datatype="long" value="41"/>
>> <PARAM name="size1" datatype="long" value="41"/>
>> <PARAM name="size2" datatype="long" value="41"/>
>> <STREAM href="file:///scratch/myhome/test.h5"/>
>> </TABLE>
>> </RESOURCE>
>>Notice that the rank and size of the dataset are expressed in the
>>arraysize keyword of FIELD. They are also written in the 4 PARAM fields.
>>This is just to avoid the parsing of the string to get the basic info of
>>rank and size and to have them directly as numbers (with their precise
>>type). At present there are no UCD and no reference to the SNAP
>>protocol, since both are not yet defined. I'm working on the latter...
>>This is the very first attempt!!! Let me know all your comments.
>>Dr. Claudio Gheller, Ph.D.
>>High Performance System Division
>>CINECA - Bologna - Italy
Dr. Claudio Gheller, Ph.D.
High Performance System Division
CINECA - Bologna - Italy