Time Series Cube DM - IVOA Note
Markus Demleitner
msdemlei at ari.uni-heidelberg.de
Thu Mar 16 12:03:45 CET 2017
Hi Mark, Hi DM,
I've had a look at Jiri's Time Series DM, which I think is a good
test case for design patterns -- but before that...
On Mon, Feb 27, 2017 at 10:56:04AM -0500, CresitelloDittmar, Mark wrote:
> boundaries, instead of defining a relationship. This is a very different
> approach than the vo-dml and compliant model work has been taking over the
> past few years. I don't even know how to express that in the model. It
> seems a bit late in the game to be expressing this concern so urgently.
Well, we (or at least I) are only now starting to actually serialise
annotated data and then try to do something with these annotations.
I don't think it's surprising that that's the point when, well,
implementation feedback comes in. Also, I'm not sure there's
*terribly* much that needs changing; essentially, I suspect it is
almost enough to drop some inheritance relationships and to think
hard about whether complex objects need to be referenced/embedded, or
whether referencing atomic entities with additional annotations
(examples below) will do.
[I mention in passing that Arnold will certainly and eye-rollingly
testify that I've been clamouring for small, compartmentalised data
models for a long, long time]
I'll take the liberty of illustrating what I'm proposing by taking up
an example from Jiri's Time Series serialisation proposal. His basic
annotation looks like this:
<GROUP id="timeseries" vodml="ndcube:TimeSeriesCube">
<GROUP id="independent_axes" vodml="ndcube:CubeAxis">
<GROUP name="dateTimeAxis" vodml="ndcube:CubeAxis">
<FIELDref ref="HJD" id="field"/>
<GROUPref ref="datestc" id="model" vodml="VODML Model"/>
</GROUP>
...
<GROUP id="dependent_axes" vodml="ndcube:CubeAxis">
<GROUP name="fluxAxis" vodml="CubeAxis">
<FIELDref ref="FLX" id="field"/>
<FIELDref ref="FLXERR" id="error"/>
<GROUPref id="model" vodml="VODML Model"/>
</GROUP>
<GROUP name="magnitudeAxis" vodml="ndcube:CubeAxis">
<FIELDref ref="MAG" id="field"/>
<FIELDref ref="MAGERR" id="error"/>
<GROUPref id="model" vodml="VODML Model"/>
</GROUP>
</GROUP>
-- essentially, this says "there's a (sparse) data cube somewhere in
this data set that has a time as the independent axis and FLX and MAG
as observables". Plus, there are references to additional metadata,
and the thing groups values and errors together.
Leaving aside that this is invalid XML (you can't have multiple
elements with the same @id; these should have been role annotations),
I'm convinced it's wrong to model something like "value with error"
separately in each DM. I also don't think it's helpful to have a
reference to *the* DM annotation for an axis somewhere -- there can
always be multiple annotations (e.g., photometry-1.0, photometry-2.1,
and provenance) on a given thing in the VO-DML world. If I had to
name a single killer feature of VO-DML, that's what I'd name.
Now, here's what I'd like such an annotation to look like. I'm using
vodml-type and -role as attributes rather than elements here for
readability, and I'm interspersing the comments; please allow me to
indulge in improvising class and attribute names for the time being.
I hope that where they don't match current models, they at least
readily map to them:
================= Dataset annotation =======================
<GROUP vodml-type="ds:Dataset">
<PARAM vodml-role="dataproductType" value="Timeseries"/>
<PARAM vodml-role="publisherDID" value="ivo://example.org/prod?ts0000"/>
<GROUP vodml-type="ds:BaseTarget" vodml-role="target">
<GROUPref vodml-role="position" ref="targetPosition"/>
<!-- this is an example of a reference to a complex entity;
I believe we should reduce these as much as possible,
because they introduce hard dependencies of DMs and will
lead to a combinatorial catastrophe if used too much.
As long as any stc2 annotation will work here, though,
we might still pull it off -->
</GROUP>
...
</GROUP>
<!-- That's it - no embedding of this, no turning up of the
attribute names somewhere else. If it's a dataset, you have a
GROUP[@vodml-type='ds:Dataset'], and if there's a
*[@vodml-role='dataproductType'] in there, that's where you get the
dataproduct type from (could be a PARAM, PARAMref or even FIELDref if
you have a metadata table for lots of ds:Datasets)
-->
<GROUP vodml-type="stc:Position2D" id="targetPosition">
<!-- any STC client can now figure out there's a position here,
and it can be referenced from multiple annotations. It just
*happens* that this group works for dataset's target position -->
<PARAM vodml-role="c1" value="54.3"/>
<PARAM vodml-role="c2" value="-12"/>
<GROUP vodml-type="SpaceFrame" vodml-role="Frame">
...
</GROUP>
</GROUP>
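
As a rough illustration of how a client might consume the dataset annotation above, here is a minimal Python sketch. It assumes the improvised vodml-type and vodml-role attributes used in this mail, ignores XML namespaces, and works on a cut-down copy of the annotation; the helper names find_typed, child_with_role and resolve are hypothetical and not part of any existing library.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the annotation above; attribute names are the
# improvised ones from this mail, not an agreed standard.
ANNOTATION = """
<RESOURCE>
  <GROUP vodml-type="ds:Dataset">
    <PARAM vodml-role="dataproductType" value="Timeseries"/>
    <GROUP vodml-type="ds:BaseTarget" vodml-role="target">
      <GROUPref vodml-role="position" ref="targetPosition"/>
    </GROUP>
  </GROUP>
  <GROUP vodml-type="stc:Position2D" id="targetPosition">
    <PARAM vodml-role="c1" value="54.3"/>
    <PARAM vodml-role="c2" value="-12"/>
  </GROUP>
</RESOURCE>
"""


def find_typed(root, vodml_type):
    """Return the first GROUP carrying the given vodml-type, or None."""
    for group in root.iter("GROUP"):
        if group.get("vodml-type") == vodml_type:
            return group
    return None


def child_with_role(parent, role):
    """Return the first direct child with the given vodml-role, or None.

    The child may be a PARAM, a PARAMref, a FIELDref or a GROUP; the
    caller decides what to do with it.
    """
    for child in parent:
        if child.get("vodml-role") == role:
            return child
    return None


def resolve(root, ref_element):
    """Follow a *ref element to the element whose id matches its ref."""
    target_id = ref_element.get("ref")
    for element in root.iter():
        if element.get("id") == target_id:
            return element
    return None


root = ET.fromstring(ANNOTATION)
dataset = find_typed(root, "ds:Dataset")
product_type = child_with_role(dataset, "dataproductType").get("value")

# The target position is a reference to a complex entity: follow it.
target = child_with_role(dataset, "target")
position = resolve(root, child_with_role(target, "position"))
c1 = float(child_with_role(position, "c1").get("value"))
c2 = float(child_with_role(position, "c2").get("value"))

print(product_type, c1, c2)
```

The only place where this sketch has to know about two models at once is the resolve() call, which is the hard dependency between DMs that the comment in the annotation warns about.
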
================== Cube annotation =========================
<GROUP vodml-type="ndcube:Cube">
<!-- No reason to have an extra type for time series; that's
already defined in ds:Dataset.dataproductType and unlikely to
be of relevance to a cube-only client (e.g., a plot program)
anyway. -->
<FIELDref vodml-role="independent-axis" ref="obs_date"/>
<!-- that's it; a client just counts
*[@vodml-role="independent-axis"] and knows the number of dimensions
in the cube. All additional annotation is on the FIELD itself. -->
<FIELDref vodml-role="dependent-axis" ref="FLX"/>
<FIELDref vodml-role="dependent-axis" ref="MAG"/>
<!-- that's it; a single reference defines a "value" in this cube,
and any further annotation is on the field itself, where non-cube
clients can also use it. -->
</GROUP> <!-- I don't think much further metadata is needed here -->
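
To make the "a client just counts" remark concrete, here is a minimal Python sketch of what a cube-only client could do. It assumes the improvised vodml-type and vodml-role attributes from the cube annotation above and ignores XML namespaces; the function name cube_axes is made up for this example.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the cube annotation above (improvised attribute
# names, not an agreed standard).
ANNOTATION = """
<RESOURCE>
  <GROUP vodml-type="ndcube:Cube">
    <FIELDref vodml-role="independent-axis" ref="obs_date"/>
    <FIELDref vodml-role="dependent-axis" ref="FLX"/>
    <FIELDref vodml-role="dependent-axis" ref="MAG"/>
  </GROUP>
</RESOURCE>
"""


def cube_axes(root):
    """Return (independent, dependent) lists of referenced field ids.

    Only the first ndcube:Cube group under root is inspected; if there
    is none, both lists are empty.
    """
    for group in root.iter("GROUP"):
        if group.get("vodml-type") == "ndcube:Cube":
            independent = [child.get("ref") for child in group
                           if child.get("vodml-role") == "independent-axis"]
            dependent = [child.get("ref") for child in group
                         if child.get("vodml-role") == "dependent-axis"]
            return independent, dependent
    return [], []


independent, dependent = cube_axes(ET.fromstring(ANNOTATION))
n_dimensions = len(independent)  # one dimension per independent axis
print(n_dimensions, independent, dependent)
```

Nothing in this sketch knows about time series, photometry or errors; a plot program could stop here and label its axes from the referenced FIELDs.
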
============== STC+Quantity annotation =====================
<GROUP vodml-type="stc:Time">
<!-- the one place STC metadata is collected -->
<FIELDref vodml-role="value" ref="obs_date"/>
<!-- note how we "amend" metadata on obs_date here; by co-reference
with the ndcube:Cube annotation, obs_date is *both* an independent
axis *and* a time. -->
<PARAM vodml-role="timescale" value="TT"/>
<PARAM vodml-role="timeformat" value="MJD"/>
<!-- timeformat is an invention; STC 1.0 uses classes to
distinguish between JD, MJD, "ISO". -->
<PARAM vodml-role="referencePosition" value="BARYCENTER"/>
...
</GROUP>
<GROUP vodml-type="ivoa:Quantity">
<!-- all measurements (can) have errors, min/max vals, etc, so
there's no point separately modelling this in cube, stc,
photometry, etc.; let's have ivoa:Quantity for that. -->
<FIELDref vodml-role="value" ref="obs_date"/>
<FIELDref vodml-role="standard-deviation" ref="err_time"/>
<PARAM vodml-role="minimum" value="56493.339"/>
<PARAM vodml-role="maximum" value="56498.341"/>
<!-- which also does much of char:, without introducing 1000s of
utypes -->
</GROUP>
============ Photometry+Quantity annotation ==================
<GROUP vodml-type="phot:PhotometryPoint">
<FIELDref vodml-role="value" ref="FLX"/>
<GROUP vodml-type="phot:Filter">
<PARAM vodml-role="name" value="K_s"/>
<PARAMref vodml-role="spectralLocation" ref="spec_loc_K_s"/>
<!-- note how I'm referencing a param here that's part of an
annotation; this way, Photometry (in principle) still doesn't
have to know anything about Quantity, but we can still have full
Quantity info on the spectral location. -->
</GROUP>
<GROUP vodml-type="phot:PhotometricSystem">
<PARAM vodml-role="description" value="Sloan"/>
</GROUP>
</GROUP>
<GROUP vodml-type="ivoa:Quantity">
<!-- it's exactly the same thing as above for obs_date; if
a client understands Quantity, it'll need no extra code, no extra
utypes to interpret this as well as the obs_date annotation. -->
<FIELDref vodml-role="value" ref="FLX"/>
<FIELDref vodml-role="standard-deviation" ref="FLXERR"/>
<!-- here, and not in a custom thing within ndcube or somewhere
else, is the connection made between FLX and FLXERR; that way, a
Quantity-knowing client can figure out the error, not just one
for NDCube -->
<PARAM name="minimum"...
</GROUP>
<GROUP vodml-type="ivoa:Quantity">
<!-- By furnishing spec_loc_K_s with Quantity metadata, we can
communicate additional information if we have it, even for a PARAM.
-->
<PARAM id="spec_loc_K_s" vodml-role="value" value="2.2e-6"/>
<PARAM vodml-role="minimum" value="1.8e-6"/>
<PARAM vodml-role="maximum" value="2.5e-6"/>
</GROUP>
.... and the same for MAG ...
================= Field declarations =======================
<FIELD ID="obs_date" name="obs_date"/>
<FIELD ID="FLX" name="FLX"/>
<FIELD ID="FLXERR" name="FLXERR"/>
If the size of this puts you off: Well, compared to today's FITS
headers, that's still compact and eminently readable, so I'd not
worry about this here.
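
The co-reference idea (several independent annotations pointing at the same FIELD) can be sketched from the client side as follows. This is a minimal Python example under stated assumptions: it uses the improvised vodml-type and vodml-role attributes from the listing above, ignores XML namespaces, and the function names annotation_types and error_field are hypothetical.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the annotations above (improvised attribute names).
ANNOTATION = """
<RESOURCE>
  <GROUP vodml-type="ndcube:Cube">
    <FIELDref vodml-role="independent-axis" ref="obs_date"/>
    <FIELDref vodml-role="dependent-axis" ref="FLX"/>
  </GROUP>
  <GROUP vodml-type="ivoa:Quantity">
    <FIELDref vodml-role="value" ref="obs_date"/>
    <FIELDref vodml-role="standard-deviation" ref="err_time"/>
  </GROUP>
  <GROUP vodml-type="ivoa:Quantity">
    <FIELDref vodml-role="value" ref="FLX"/>
    <FIELDref vodml-role="standard-deviation" ref="FLXERR"/>
  </GROUP>
</RESOURCE>
"""


def annotation_types(root, field_id):
    """Return the vodml-types of all groups directly referencing field_id."""
    types = []
    for group in root.iter("GROUP"):
        if any(child.get("ref") == field_id for child in group):
            types.append(group.get("vodml-type"))
    return types


def error_field(root, field_id):
    """Return the id of the field holding the standard deviation of
    field_id according to an ivoa:Quantity annotation, or None."""
    for group in root.iter("GROUP"):
        if group.get("vodml-type") != "ivoa:Quantity":
            continue
        roles = {child.get("vodml-role"): child for child in group}
        value = roles.get("value")
        if value is not None and value.get("ref") == field_id:
            error = roles.get("standard-deviation")
            return None if error is None else error.get("ref")
    return None


root = ET.fromstring(ANNOTATION)
print(annotation_types(root, "FLX"))
print(error_field(root, "FLX"))
print(error_field(root, "obs_date"))
```

Note that error_field never looks at ndcube:Cube or phot:PhotometryPoint; the same few lines serve the time axis and the flux axis, which is the point of keeping value-with-error in a single Quantity annotation.
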
What's really missing at this point as far as standards are concerned
is, as far as I know, ivoa:Quantity (or perhaps it should have a DM
of its own? I suspect Quantity will go through several revisions as
it starts getting used).
Can anyone summarise the state of Quantity modelling? I always get a
bit lost in all the artefacts in volute's dm branch...
Cheers (and with apologies if this indeed comes a bit late),
Markus