VOTable for simulations
Claudio Gheller
c.gheller at cineca.it
Fri Sep 1 02:43:16 PDT 2006
Ciao Gerard,
>
>From their example I gather that arraysize="41x41x41x3" means "three data
>cubes of dimensions 41x41x41",
>not "one 3D-vector valued datacube of dimensions 41x41x41".
>"41x41x41" would mean "41 2D datafields of dimension 41x41". I think that
>therefore a 3D vector field
>could/has to be encoded as (for example)
>
>
OK, to stay aligned with the VOTable standard, let's adopt this
solution: EACH <FIELD ... > CONTAINS ONLY A SCALAR FIELD, which can be
either a true scalar quantity or one component of a multidimensional
(e.g. vector) quantity. This way we lose the "nature" of the quantity
(scalar or vector), but it can be recovered from the associated UCD.
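
As a rough sketch of how the scalar/vector "nature" might be recovered from the UCDs, the snippet below groups the FIELDs by their primary UCD word and reads the component from a trailing "pos.cartesian.<axis>" word. The helper name group_fields_by_ucd, the namespace-free VOTable fragment, and the extra "rho" field with its UCD are assumptions made only for illustration; this grouping rule is not part of any standard.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, namespace-free VOTable fragment used only for this sketch.
VOTABLE = """<VOTABLE>
  <RESOURCE name="myVectorField">
    <TABLE name="VelocityField">
      <FIELD name="vx" ucd="phys.veloc;pos.cartesian.x" datatype="float" arraysize="41x41x41"/>
      <FIELD name="vy" ucd="phys.veloc;pos.cartesian.y" datatype="float" arraysize="41x41x41"/>
      <FIELD name="vz" ucd="phys.veloc;pos.cartesian.z" datatype="float" arraysize="41x41x41"/>
      <FIELD name="rho" ucd="phys.density" datatype="float" arraysize="41x41x41"/>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""


def group_fields_by_ucd(xml_text):
    """Map each primary UCD word to {axis: field name}.

    The axis is "x", "y" or "z" when a secondary word of the form
    "pos.cartesian.<axis>" is present, and None for a plain scalar.
    """
    groups = defaultdict(dict)
    for field in ET.fromstring(xml_text).iter("FIELD"):
        words = field.get("ucd", "").split(";")
        axis = None
        for word in words[1:]:
            if word.startswith("pos.cartesian."):
                axis = word.rsplit(".", 1)[1]
        groups[words[0]][axis] = field.get("name")
    return dict(groups)


groups = group_fields_by_ucd(VOTABLE)
for primary, components in groups.items():
    kind = "vector" if len(components) > 1 else "scalar"
    print(primary, kind, components)
```

With this convention, a quantity whose primary UCD word is shared by several FIELDs is treated as a vector, and a FIELD with a unique primary word is treated as a scalar.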
><?xml version="1.0"?>
><VOTABLE xmlns:xsd="http://www.w3.org/2001/XMLSchema"
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xmlns="http://vizier.u-strasbg.fr/xml/VOTable-1.1.xsd">
> <RESOURCE name="myVectorField">
> <TABLE name="VelocityField" ID="Vel">
> <FIELD name="vx" ID="vx1" ucd="phys.veloc;pos.cartesian.x"
>datatype="float"
> arraysize="41x41x41x1" unit="km/s" />
> <FIELD name="vy" ID="vy1" ucd="phys.veloc;pos.cartesian.y"
>datatype="float"
> arraysize="41x41x41x1" unit="km/s" />
> <FIELD name="vz" ID="vz1" ucd="phys.veloc;pos.cartesian.z"
>datatype="float"
> arraysize="41x41x41x1" unit="km/s" />
> <DATA>
> <BINARY>
> <STREAM href="file:///scratch/myhome/test.bin"/>
> </BINARY>
> </DATA>
> </TABLE>
> </RESOURCE>
></VOTABLE>
>
>This makes the content of the individual field components more explicit.
>Each gets it own UCD for example.
>
>
>I have removed the rank attribute for the moment.
>
>
Fine, we can get the associated information from the "arraysize".
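
One possible way to derive that information from "arraysize" is sketched below. The function names parse_arraysize and rank are hypothetical, and the rule of ignoring axes of length 1 (so that a trailing "x1" does not count as a dimension) is an assumption, not something fixed by the VOTable spec. Variable-length sizes such as "*" are not handled here.

```python
def parse_arraysize(arraysize):
    """Split a fixed-size arraysize such as "41x41x41" into integer extents."""
    return tuple(int(n) for n in arraysize.split("x"))


def rank(arraysize):
    """Number of axes with more than one element (assumed convention)."""
    return sum(1 for n in parse_arraysize(arraysize) if n > 1)


for size in ("41x41x41x1", "41x41x41", "100000x1"):
    print(size, parse_arraysize(size), rank(size))
```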
>There is no way yet to specify the spatial coordinates of the grid cells.
>For a grid one can specify
>the spatial coordinates in general in a shorthand way, for example using a
>set of standard parameters
>as in the FITS array keywords (see
>http://fits.gsfc.nasa.gov/standard21b/fits_standard.pdf 5.4.2.5),
>CRPIXn, CDELTn etc. I think we need to specify something like that here as
>well, it is definitely more
>efficient than having separate cubes with the coordinates.
>Luckily our coordinate system will in general not require the full WCS-like
>formalism.
>
>
I will take a look at these FITS standards. Obviously, for a regular
grid we do not need to specify the grid points, only the number of
points and the mesh resolution along each dimension. I guess this
information can be specified at the level of the <RESOURCE> element
(but I'm not sure it is the right place). If mesh and particle data
are mixed in the same file, this information refers only to the
mesh components.
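
To illustrate why the number of points and the resolution are enough for a regular grid, here is a minimal sketch that rebuilds the coordinates along one axis with the linear FITS-style relation coord = CRVAL + CDELT * (i - CRPIX), with i counted from 1. The function name axis_coordinates and the numerical values (41 points, spacing 2.5) are made up for the example; how these parameters would actually be carried in the VOTable is still open.

```python
def axis_coordinates(n_points, crpix, crval, cdelt):
    """Coordinates of cells 1..n_points along one axis of a regular grid.

    crpix: reference cell index (1-based), crval: coordinate at that cell,
    cdelt: spacing between neighbouring cells.
    """
    return [crval + cdelt * (i - crpix) for i in range(1, n_points + 1)]


# Hypothetical axis: 41 cells, first cell at 0.0, spacing 2.5 (e.g. in Mpc).
xs = axis_coordinates(41, crpix=1.0, crval=0.0, cdelt=2.5)
print(xs[0], xs[1], xs[-1])
```

Three such parameter sets, one per axis, would describe the 41x41x41 mesh without storing any coordinate cube.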
>Then, though it is possible to use this same formalism for particle data as
>well, I think there the tabular approach is more natural in many
>circumstances. In particular in the work that I have been doing with
>databases,
>the natural representation of more complex individual objects is as a table,
>with all the properties, including
>now the positions, in a row. The way to store such tabular datasets in
>binary form is specified exactly in
>the existing VOTable spec, in section 5.3. An equivalent C-struct oriented
>format in binary files is what I have encountered consistently for more
>complex objects coming for example from the postprocessing of cosmological
>simulations at the MPA in Garching.
>
>But you're right that many people also store particle data in individual
>arrays for each particle property.
>
>
I agree that we should support both approaches. The tabular approach
is more natural, since all the information related to a particle is
stored "close to each other". However, the user often needs to
select only one (or a few) of the particle properties. For
example, for visualization you may need to load only the positions; for
the mass function, the positions and the masses; for the velocity
dispersion, the three components of the velocity. In these cases, it is
much faster and more efficient to store each variable separately (you
load all the data in a single contiguous read operation). Furthermore,
many codes use this kind of storage (e.g. Gadget and Enzo,
among the most popular).
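
The difference between the two layouts can be sketched as follows. The snippet writes the same made-up particle data once property by property and once record by record (C-struct style), then extracts the masses from each file. The property names, the number of particles, the file names and the big-endian float32 encoding are all assumptions for the example; it does not reproduce the actual format of any simulation code.

```python
import os
import struct
import tempfile

N = 1000
NAMES = ("x", "y", "z", "mass")  # hypothetical particle properties
# Made-up values: property k of particle i is i + 0.25 * k.
data = {name: [i + 0.25 * k for i in range(N)] for k, name in enumerate(NAMES)}


def write_by_column(path):
    """Store one contiguous block of N floats per property."""
    with open(path, "wb") as f:
        for name in NAMES:
            f.write(struct.pack(f">{N}f", *data[name]))


def write_by_row(path):
    """Store one (x, y, z, mass) record per particle, like a C struct."""
    with open(path, "wb") as f:
        for i in range(N):
            f.write(struct.pack(">4f", *(data[name][i] for name in NAMES)))


def read_column(path, name):
    """Seek to the block of one property and read it with a single call."""
    with open(path, "rb") as f:
        f.seek(NAMES.index(name) * N * 4)
        return list(struct.unpack(f">{N}f", f.read(N * 4)))


def read_from_rows(path, name):
    """Read every record in the file and keep one member of each."""
    k = NAMES.index(name)
    with open(path, "rb") as f:
        return [rec[k] for rec in struct.iter_unpack(">4f", f.read())]


with tempfile.TemporaryDirectory() as tmp:
    col_path = os.path.join(tmp, "columns.bin")
    row_path = os.path.join(tmp, "rows.bin")
    write_by_column(col_path)
    write_by_row(row_path)
    mass_a = read_column(col_path, "mass")     # reads N * 4 bytes
    mass_b = read_from_rows(row_path, "mass")  # reads the whole file

print(mass_a[:3], mass_a == mass_b)
```

In the column layout only the bytes of the requested property are read, while in the row layout the whole file has to be read to extract one member of each record.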
>That is more naturally mapped in the sense of your rank 1/2 examples. Making
>the same adjustment as above for
>the datacubes I would propose to allow also something as in the following
>example:
>
><?xml version="1.0"?>
><VOTABLE xmlns:xsd="http://www.w3.org/2001/XMLSchema"
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xmlns="http://vizier.u-strasbg.fr/xml/VOTable-1.1.xsd">
> <RESOURCE name="myParticles">
> <TABLE name="Particles" ID="NBody">
> <FIELD name="x" ID="x1" ucd="pos.cartesian;pos.cartesian.x"
>datatype="float"
> arraysize="100000x1" unit="Mpc" />
> <FIELD name="y" ID="y1" ucd="pos.cartesian;pos.cartesian.y"
>datatype="float"
> arraysize="100000x1" unit="Mpc" />
> <FIELD name="z" ID="z1" ucd="pos.cartesian;pos.cartesian.z"
>datatype="float"
> arraysize="100000x1" unit="Mpc" />
> <FIELD name="vx" ID="vx1" ucd="phys.veloc;pos.cartesian.x"
>datatype="float"
> arraysize="100000x1" unit="km/s" />
> <FIELD name="vy" ID="vy1" ucd="phys.veloc;pos.cartesian.y"
>datatype="float"
> arraysize="100000x1" unit="km/s" />
> <FIELD name="vz" ID="vz1" ucd="phys.veloc;pos.cartesian.z"
>datatype="float"
> arraysize="100000x1" unit="km/s" />
> <DATA>
> <BINARY>
> <STREAM href="file:///scratch/myhome/test.bin"/>
> </BINARY>
> </DATA>
> </TABLE>
> </RESOURCE>
></VOTABLE>
>
>
>
>
At this point, I would drop the "x1" in the "arraysize".
>In the latter case we still need something to distinguish between particle
>data and image data.
>Your rank basically does that, just the name might be unfortunate. We might
>want to be more explicit
>about the kind of data that is stored, an attribute with values MESH, N_BODY
>maybe ?
>
>
ok... name of the attribute? maybe "geometry"...
>In your example you use an HDF5 binary file. VOTable does not support that,
>though it does support FITS,
>I suppose as BINARY table (see VOTable spec section 5.2). Is there a natural
>mapping from VOTable key words
>to HDF metadata structures ? Or shall we first concentrate on the binary
>serialisations specified in VOTable ?
>
>
At the moment I would focus on "pure" binaries. HDF5 was cited only as
an example.
I'll carry on...
Thanks a lot for all the comments!!!
C.
--
------------------------------------
Dr. Claudio Gheller, Ph.D.
High Performance System Division
CINECA - Bologna - Italy
Tel. +39-051-6171560
Fax. +39-051-6137273
------------------------------------