Time Series Cube DM - IVOA Note

Mon Mar 20 16:22:16 CET 2017

Dear DM,

On Sun, Mar 19, 2017 at 09:38:50PM +0100, Jiří Nádvorník wrote:
> > <GROUP vodml-type="stc:Position2D" id="targetPosition">
> >   <!-- any STC client can now figure out there's a position here,
> >   and it can be referenced from multiple annotations.  It just
> >   *happens* that this group works for dataset's target position -->
> >   <PARAM vodml-role="c1" value="54.3"/>
> >   <PARAM vodml-role="c2" value="-12"/>
> >   <GROUP vodml-type="SpaceFrame" vodml-role="Frame">
> >     ...
> >   </GROUP>
> > </GROUP>
> >
> Hmm, no objections to this, but this should not be part of the Cube
> DM, but the Dataset DM right?

Yes, I guess it's the dataset DM that talks about target positions
for now.  This means it needs to know about position as such, which
means that it's very hard to keep knowledge of STC out of Dataset, so
yes, I think ds: will have references to stc:Position*-typed
elements.

As long as we're sure that minor updates of STC don't break Dataset
(or any other DM), I hope that still remains manageable from a
standard evolution perspective.  

But clearly, if we ever have a(nother) major update of STC, all DMs
referencing types from STC will need re-doing.  That's certainly not
ideal but clearly can't always be helped, at least not with something
as fundamental to astronomy as STC.

> 
> >
> > ================== Cube annotation =========================
> >
> > <GROUP vodml-type="ndcube:Cube">
[...]
> >   <FIELDref vodml-role="independent-axis" ref="obs_date"/>
> >   <!-- that's it; a client just counts
> >   *[@vodml-role="independent-axis"] and knows the number of dimensions
> >   in the cube.  All additional annotation is on the FIELD itself. -->
> >
> >   <FIELDref vodml-role="dependent-axis" ref="FLX"/>
> >   <FIELDref vodml-role="dependent-axis" ref="MAG"/>
> > </GROUP> 
> >
[...]
> > ============== STC+Quantity annotation =====================
> >
> > <GROUP vodml-type="stc:Time">
> >   <FIELDref vodml-role="value" ref="obs_date"/>
> >   <PARAM vodml-role="timescale" value="TT"/>
> >   <PARAM vodml-role="timeformat" value="MJD"/>
> >   <PARAM vodml-role="referencePosition" value="BARYCENTER"/>
> >   ...
> > </GROUP>
> >
> > <GROUP vodml-type="ivoa:Quantity">
> >   <FIELDref vodml-role="value" ref="obs_date"/>
> >   <FIELDref vodml-role="standard-deviation" ref="err_time"/>
> >   <PARAM name="minimum" value="56493.339"/>
> >   <PARAM name="maximum" value="56498.341"/>
> > </GROUP>
> >
> 
> I kind of like this. The analogy in the Time Series Cube DM as it is now
> is:
> 
>    - ivoa:Quantity == Cube Axis.
>    - stc:Time == Axis Domain Model
> 
> The only difference is that you are referencing from the cube only the
> data, losing the direct link between the cube and its metadata. The problem is
> that we have a Cube dataset, meaning we are storing only cubes in there.
> But I don't want to put a restriction on the whole VOTable that it mustn't
> contain a single PARAM or FIELD element that is not actually referenced
> from the Cube DM part.

That is not implied at all.  In this plan, you can add one or more
cube annotations to a VOTable (or, in some closely related way, a FITS
or HDF5 file) at will; cube clients can figure out what the cubes are
and will simply ignore whatever they don't need or understand.

> Given that, when I read the Cube DM part, I don't have a clue where its
> metadata lie. I can go to the *data*, but there I find only metadata about
> the data itself, not about the cube that is holding the data. To find it, I
> still need to scan the whole VOTable, find all Quantity models and check
> whether they don't by chance reference the same FIELDs as my Cube does.

Yes, that is, indeed, the downside of the plan -- if you want to
figure out the range or an error column of an axis, you will have to
parse the entire VOTable and all VO-DML annotations.  I don't think
that, in itself, is a *major* issue, as you cannot really parse XML
only partially, and once you can *parse* some VO-DML annotation you
can *parse* all (*understanding* is a different matter, of course).

I would therefore maintain that the loose coupling and the
possibility of having annotations for multiple DM versions at the
same time outweigh the additional complications of the co-reference
scheme in many typical cases.
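
To make the co-reference lookup a bit more concrete, here is a minimal
sketch in Python of how a client might index every annotation by the
FIELD it references in a single pass.  The fragment is a cut-down,
namespace-free copy of the annotation quoted above; the RESOURCE
wrapper and the function name index_annotations are assumptions made
for this illustration, not part of any proposal.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, namespace-free fragment modelled on the annotation
# quoted in this thread; a real VOTable would carry the VOTable
# namespace and actual FIELD declarations.
VOTABLE = """
<RESOURCE>
  <GROUP vodml-type="ndcube:Cube">
    <FIELDref vodml-role="independent-axis" ref="obs_date"/>
    <FIELDref vodml-role="dependent-axis" ref="FLX"/>
  </GROUP>
  <GROUP vodml-type="stc:Time">
    <FIELDref vodml-role="value" ref="obs_date"/>
    <PARAM vodml-role="timescale" value="TT"/>
  </GROUP>
  <GROUP vodml-type="ivoa:Quantity">
    <FIELDref vodml-role="value" ref="obs_date"/>
    <FIELDref vodml-role="standard-deviation" ref="err_time"/>
  </GROUP>
</RESOURCE>
"""


def index_annotations(root):
    """Map each referenced FIELD id to (vodml-type, vodml-role) pairs.

    One walk over all typed GROUPs is enough; afterwards any component
    can ask which annotations mention a FIELD it cares about.
    """
    index = defaultdict(list)
    for group in root.iter("GROUP"):
        vodml_type = group.get("vodml-type")
        if vodml_type is None:
            continue  # plain VOTable GROUP, not a DM annotation
        for fieldref in group.findall("FIELDref"):
            index[fieldref.get("ref")].append(
                (vodml_type, fieldref.get("vodml-role")))
    return index


index = index_annotations(ET.fromstring(VOTABLE))
```

With such an index, a client that only understands ivoa:Quantity just
filters the entries for obs_date by type and ignores the rest.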

For illustration, here's how I hope a client could work:

(1) A cube component locates ndcube:Cube-typed annotations.

(2) It lets the user choose axes of interest (i.e., just the FIELDs; one
shouldn't need much more than VOTable-style metadata for this).

(3) It passes the user selection to a plotting component.

(4) The plotting component now has a set of FIELDs it wants to plot
and needs, say, the error columns.  It locates the
ivoa:Quantity-typed annotations for the FIELDs it has and can thus
figure out the respective errors.  That way it can draw, e.g., error
bars.

(5) The controlling program now wants to overplot a second dataset.
It therefore needs to know the nature of the axes.  Say we're dealing
with a time series client, so it will look for stc:Time annotations
for the independent axis.  If these don't exist (in a version it
understands), it errors out; the datasets cannot be (automatically)
overplotted.  If it finds such an annotation, it can figure out what
transformations are necessary (e.g., TDB expressed in JD to TT
expressed in MJD) and hopefully even perform them.

(6) Similarly, it can try annotations it understands (e.g.,
phot:PhotometryPoint) for the dependent axis (let me indulge in that
terminology for today), in both datasets; again, with a bit of luck
it can perform transformations to bring the two datasets together
(say, by converting mag to flux units).

(7) If the controlling program has updated the quantity metadata
during the transformations, it can, as above, pass on the two datasets
to the plotting component, which, again, does not have to worry about
any annotation other than quantity (and will therefore work with a
different main program that, perhaps, is specialised in spectral data
and wouldn't have understood stc:Time).
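
Steps (4) and (5) above could look roughly like the following Python
sketch.  It is only an illustration under stated assumptions: the
fragment is a namespace-free excerpt of the annotation quoted earlier,
and the helpers quantity_role, time_metadata and to_mjd are names
invented here.  Only the JD/MJD format change is handled; conversion
between time scales (e.g., TDB to TT) is deliberately left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free excerpt of the annotation quoted above.
ANNOTATIONS = """
<RESOURCE>
  <GROUP vodml-type="stc:Time">
    <FIELDref vodml-role="value" ref="obs_date"/>
    <PARAM vodml-role="timescale" value="TT"/>
    <PARAM vodml-role="timeformat" value="MJD"/>
  </GROUP>
  <GROUP vodml-type="ivoa:Quantity">
    <FIELDref vodml-role="value" ref="obs_date"/>
    <FIELDref vodml-role="standard-deviation" ref="err_time"/>
  </GROUP>
</RESOURCE>
"""


def quantity_role(root, field_id, role):
    """Step (4): FIELD id filling `role` in the ivoa:Quantity whose
    value is `field_id`, or None if there is no such annotation."""
    for group in root.iter("GROUP"):
        if group.get("vodml-type") != "ivoa:Quantity":
            continue
        refs = {fr.get("vodml-role"): fr.get("ref")
                for fr in group.findall("FIELDref")}
        if refs.get("value") == field_id:
            return refs.get(role)
    return None


def time_metadata(root, field_id):
    """Step (5): role -> value dict of the stc:Time annotation whose
    value is `field_id`, or None if the client finds none."""
    for group in root.iter("GROUP"):
        if group.get("vodml-type") != "stc:Time":
            continue
        if any(fr.get("vodml-role") == "value" and fr.get("ref") == field_id
               for fr in group.findall("FIELDref")):
            return {p.get("vodml-role"): p.get("value")
                    for p in group.findall("PARAM")}
    return None


JD_MINUS_MJD = 2400000.5  # MJD is defined as JD - 2400000.5


def to_mjd(value, timeformat):
    """Bring a time value to MJD; only the format is changed here."""
    if timeformat == "MJD":
        return value
    if timeformat == "JD":
        return value - JD_MINUS_MJD
    raise ValueError("unsupported time format: %s" % timeformat)


root = ET.fromstring(ANNOTATIONS)
error_column = quantity_role(root, "obs_date", "standard-deviation")
time_meta = time_metadata(root, "obs_date")
```

Note that the plotting side only ever calls quantity_role, while the
time series logic only needs time_metadata; neither has to understand
the other's annotation.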

> So I propose to reference the Quantity directly from the independent_axis,
> instead of the FIELD. That way the relationship is more logical IMHO. I
> read information about the Cube - I know how many independent axes (coords)

As you say, that is entirely possible in this scheme too
(technically, it's a GROUPref; depending on what you're actually
modelling, there are various possibilities for the underlying
VO-DML).

I won't even dispute that it's more "logical" in the sense of easier
to work with when you want to process the annotations with XSLT or
similar.  All other things equal, I'd go for that as well.

But, when you're referencing ObjectType or DataType instances from
other data models rather than only VOTable atoms (FIELDs and PARAMs),
you introduce a tight coupling between two DMs (in this case, cube
and quantity), such that only one specific combination of (major)
versions of both DMs can work together.  From my experience with VO
standards evolution, that incurs a high risk that we can't move any
involved standard any more, and a major version change in one DM
would pull all other DMs after it -- which nobody would dare start.

The TargetPosition example above is one where we probably have to
reference complex objects.  For cube, I *think* it would just be a
convenience, and for a convenience I'd rather not pay the price of
tight coupling.

So much for today -- Mark, I've not missed your mail, but I've not
found time for a thorough response today.

Cheers,

           Markus