VOTable for simulations

Fri Sep 1 02:43:16 PDT 2006

Ciao Gerard,

>
>From their example I gather that arraysize="41x41x41x3" means "three data
>cubes of dimensions 41x41x41", 
>not "one 3D-vector valued datacube of dimensions 41x41x41".
>"41x41x41" would mean "41 2D datafields of dimension 41x41". I think that
>therefore a 3D vector field 
>could/has to be encoded as (for example)
>  
>
OK, to stay aligned with the VOTable standard let's adopt this 
solution: EACH <FIELD ... > CONTAINS ONLY A SCALAR FIELD, which can be 
either a true scalar quantity or one component of a multidimensional 
array. In this way we lose the "nature" of the quantity (scalar or 
vector), but it can be recovered from the associated UCD.
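
A minimal sketch of how the scalar/vector nature could be recovered from the UCDs, assuming that components of one vector share the same primary UCD word (the part before the ";"). The table fragment is namespace-free and the "rho" field is made up for illustration; group_by_primary_ucd is a hypothetical helper, not part of any VOTable library.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, namespace-free fragment; real VOTable files declare a namespace.
TABLE_XML = """
<TABLE name="VelocityField">
  <FIELD name="vx" ucd="phys.veloc;pos.cartesian.x" datatype="float" arraysize="41x41x41x1"/>
  <FIELD name="vy" ucd="phys.veloc;pos.cartesian.y" datatype="float" arraysize="41x41x41x1"/>
  <FIELD name="vz" ucd="phys.veloc;pos.cartesian.z" datatype="float" arraysize="41x41x41x1"/>
  <FIELD name="rho" ucd="phys.density" datatype="float" arraysize="41x41x41x1"/>
</TABLE>
"""

def group_by_primary_ucd(table_xml):
    """Map each primary UCD word to the names of the FIELDs that carry it."""
    groups = defaultdict(list)
    for field in ET.fromstring(table_xml.strip()).iter("FIELD"):
        primary = field.get("ucd", "").split(";")[0]
        groups[primary].append(field.get("name"))
    return dict(groups)

groups = group_by_primary_ucd(TABLE_XML)
# Assumption: several FIELDs under one primary word form a vector,
# a single FIELD is a plain scalar.
vectors = {ucd: names for ucd, names in groups.items() if len(names) > 1}
```

With this convention the three velocity components end up in one group, while the density stays a scalar.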

><?xml version="1.0"?>
><VOTABLE xmlns:xsd="http://www.w3.org/2001/XMLSchema" 
>  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
>  xmlns="http://vizier.u-strasbg.fr/xml/VOTable-1.1.xsd">
> <RESOURCE name="myVectorField">
>   <TABLE name="VelocityField" ID="Vel">
>      <FIELD name="vx" ID="vx1" ucd="phys.veloc;pos.cartesian.x"
>datatype="float" 
>             arraysize="41x41x41x1"   unit="km/s" />
>      <FIELD name="vy" ID="vy1" ucd="phys.veloc;pos.cartesian.y"
>datatype="float" 
>             arraysize="41x41x41x1"   unit="km/s" />
>      <FIELD name="vz" ID="vz1" ucd="phys.veloc;pos.cartesian.z"
>datatype="float" 
>             arraysize="41x41x41x1"   unit="km/s" />
>      <DATA>
>        <BINARY>
>          <STREAM href="file:///scratch/myhome/test.bin"/>
>        </BINARY>
>      </DATA>
>    </TABLE>
>  </RESOURCE>
></VOTABLE>  
>
>This makes the content of the individual field components more explicit.
>Each gets its own UCD for example.
>  
>
>I have removed the rank attribute for the moment.
>  
>
Fine, we can get the associated information from the "arraysize" attribute.
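
As a rough sketch of what "getting the rank from arraysize" could look like: the helpers below are hypothetical, handle only fixed sizes (not the variable-length "*" form), and assume that trailing dimensions of length 1 carry no information.

```python
def parse_arraysize(arraysize):
    """Split a string such as "41x41x41x1" into a list of integer dimensions."""
    dims = [int(d) for d in arraysize.split("x")]
    # Assumption: a trailing dimension of length 1 is redundant, so drop it.
    while len(dims) > 1 and dims[-1] == 1:
        dims.pop()
    return dims

def rank_of(arraysize):
    """Number of meaningful dimensions, i.e. what the rank attribute used to say."""
    return len(parse_arraysize(arraysize))
```

Under these assumptions a 41x41x41x1 cube has rank 3 and a 100000x1 particle array has rank 1.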

>There is no way yet to specify the spatial coordinates of the grid cells.
>For a grid one can specify 
>the spatial coordinates in general in a shorthand way, for example using a
>set of standard parameters 
>as in the FITS array keywords (see
>http://fits.gsfc.nasa.gov/standard21b/fits_standard.pdf 5.4.2.5), 
>CRPIXn, CDELTn etc. I think we need to specify something like that here as
>well, it is definitely more 
>efficient than having separate cubes with the coordinates.
>Luckily our coordinate system will in general not require the full
>WCS-like formalism.
>  
>
I will take a look at these FITS standards. Obviously, for a regular 
grid we do not need to specify the grid points, only the number of 
points and the mesh resolution along each dimension. I guess this 
information can be specified at the level of the <RESOURCE> element 
(but I'm not sure it is the right place). If the data are mixed, with 
mesh and particles in the same file, the information refers only to the 
mesh components.
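
To illustrate why a few parameters per axis are enough for a regular grid, here is a small sketch using the linear FITS-style convention (CRPIXn, CRVALn, CDELTn). The function name and the numbers are made up; where these parameters would live in the VOTable is still open.

```python
def axis_coordinates(n_points, crpix, crval, cdelt):
    """Coordinates of the n_points cells along one axis of a regular grid.

    Linear FITS-style convention: the 1-based index crpix sits at crval
    and neighbouring cells are cdelt apart.
    """
    return [crval + cdelt * (i - crpix) for i in range(1, n_points + 1)]

# Hypothetical 41-point axis starting at 0 Mpc with 2.5 Mpc spacing.
x = axis_coordinates(41, crpix=1.0, crval=0.0, cdelt=2.5)
```

Three numbers plus the number of points replace a whole coordinate cube per axis.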

>Then, though it is possible to use this same formalism for particle data as
>well, I think there the tabular approach is more natural in many
>circumstances. In particular in the work that I have been doing with
>databases,
>the natural representation of more complex individual objects is as a table,
>with all the properties, including
>now the positions, in a row. The way to store such tabular datasets in
>binary form is specified exactly in the 
>existing VOTable spec, in section 5.3. An equivalent C-struct oriented
>format in binary files is what I have encountered consistently for more
>complex objects coming for example from the postprocessing of cosmological
>simulations at the MPA in Garching.
>
>But you're right that many people also store particle data in individual
>arrays for each particle property.
>  
>
I agree that we should support both approaches. The tabular approach 
is more natural, since all the information related to a particle is 
stored "close together". However, users often need only one (or a few) 
of the particle properties. For example, visualization may need only 
the positions; a mass function, the positions and the masses; the 
velocity dispersion, the three components of the velocity. In these 
cases it is much faster and more efficient to store each variable 
separately (each one is loaded in a single contiguous read operation). 
Furthermore, many codes use this kind of storage (e.g. Gadget and 
Enzo, among the most popular).
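
A toy comparison of the two layouts, only to make the access pattern concrete. Everything here is an assumption for illustration: three float32 properties (x, y, mass) per particle, big-endian byte order, and in-memory buffers standing in for the binary files.

```python
import io
import struct

N = 4  # hypothetical particle count
xs = [0.0, 1.0, 2.0, 3.0]
ys = [10.0, 11.0, 12.0, 13.0]
ms = [5.0, 5.0, 6.0, 6.0]

# Row-oriented ("tabular") layout: x, y, m of particle 0, then of particle 1, ...
rows = io.BytesIO(b"".join(struct.pack(">3f", *p) for p in zip(xs, ys, ms)))
# Column-oriented layout: all x values, then all y values, then all masses.
cols = io.BytesIO(struct.pack(">%df" % (3 * N), *(xs + ys + ms)))

def masses_from_columns(f, n):
    """One seek past the x and y blocks, then one contiguous read of n floats."""
    f.seek(2 * n * 4)
    return list(struct.unpack(">%df" % n, f.read(n * 4)))

def masses_from_rows(f, n):
    """Every 12-byte record is read in full just to pick out its third value."""
    f.seek(0)
    return [struct.unpack(">3f", f.read(12))[2] for _ in range(n)]
```

Both functions return the same masses; the column layout touches only the bytes it needs, while the row layout has to walk through every record.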

>That is more naturally mapped in the sense of your rank 1/2 examples. Making
>the same adjustment as above for
>the datacubes I would propose to allow also something as in the following
>example:
>
><?xml version="1.0"?>
><VOTABLE xmlns:xsd="http://www.w3.org/2001/XMLSchema" 
>  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
>  xmlns="http://vizier.u-strasbg.fr/xml/VOTable-1.1.xsd">
> <RESOURCE name="myParticles">
>   <TABLE name="Particles" ID="NBody">
>      <FIELD name="x" ID="x1" ucd="pos.cartesian;pos.cartesian.x"
>datatype="float" 
>             arraysize="100000x1"   unit="Mpc" />
>      <FIELD name="y" ID="y1" ucd="pos.cartesian;pos.cartesian.y"
>datatype="float" 
>             arraysize="100000x1"   unit="Mpc" />
>      <FIELD name="z" ID="z1" ucd="pos.cartesian;pos.cartesian.z"
>datatype="float" 
>             arraysize="100000x1"   unit="Mpc" />
>      <FIELD name="vx" ID="vx1" ucd="phys.veloc;pos.cartesian.x"
>datatype="float" 
>             arraysize="100000x1"   unit="km/s" />
>      <FIELD name="vy" ID="vy1" ucd="phys.veloc;pos.cartesian.y"
>datatype="float" 
>             arraysize="100000x1"   unit="km/s" />
>      <FIELD name="vz" ID="vz1" ucd="phys.veloc;pos.cartesian.z"
>datatype="float" 
>             arraysize="100000x1"   unit="km/s" />
>      <DATA>
>        <BINARY>
>          <STREAM href="file:///scratch/myhome/test.bin"/>
>        </BINARY>
>      </DATA>
>    </TABLE>
>  </RESOURCE>
></VOTABLE>  
>
>
>  
>
At this point, I would drop the "x1" in the "arraysize".

>In the latter case we still need something to distinguish between particle
>data and image data.
>Your rank basically does that, just the name might be unfortunate. We might
>want to be more explicit
>about the kind of data that is stored, an attribute with values MESH, N_BODY
>maybe ?
>  
>
ok... name of the attribute? maybe "geometry"...

>In your example you use an HDF5 binary file. VOTable does not support that,
>though it does support FITS,
>I suppose as BINARY table (see VOTable spec section 5.2). Is there a natural
>mapping from VOTable key words 
>to HDF metadata structures ? Or shall we first concentrate on the binary
>serialisations specified in VOTable ?
>  
>
At the moment I would focus on "pure" binaries. HDF5 was cited only as 
an example.

I'll carry on...
Thanks a lot for all the comments!!!

C.

-- 
------------------------------------
Dr. Claudio Gheller, Ph.D.
High Performance System Division
CINECA - Bologna - Italy
Tel. +39-051-6171560
Fax. +39-051-6137273
------------------------------------