Time Series Cube DM - IVOA Note

Tue Mar 21 14:35:08 CET 2017

Dear DM,

On Mon, Mar 20, 2017 at 03:47:55PM -0400, CresitelloDittmar, Mark wrote:
> In the cube model, I want to say: "A DataProduct has one or more Coordinate
> system specifications, and the DataProduct owns its instances of CoordSys"

I think we're getting to the bottom of what we're trying to work
out here: *why* do you want to say this?  What I'm trying to argue in
my parallel mail
http://mail.ivoa.net/pipermail/dm/2017-March/005492.html (look for
"For illustration") is that an object about which you'd say such
things isn't what's actually useful for clients.  These, rather, need
annotation topical for what they're trying to do (data structure for
a cube plotter, axis/frame metadata for a data merging component,
dataset metadata for an ingestor or a bibliography component).

The only reason I can see to have a "God Object" that gobbles up all
these individual annotations could be some sort of validation
component, as you argue here:

> My impression is not that you object to the items per se, but rather that
> they are explicitly connected in the model.. that it would be sufficient to
> simply serialize a coordsys instance in my cube, and since CoordSys is a
> valid, modeled object, that is all I need to do.  If this is so.. what is
> lost is the ability to validate the data product.  How do I know if the
> instance has all the expected components?

First, for me, yes it's the coupling of the various models I'm
worried about.

On the validation: What's actually relevant to a given client is that
a given annotation is what it expects, e.g., frame metadata for the
merge component I have imagined in the use case in the cited mail.
For the merge component, an NDCube annotation is unimportant, as is
the Dataset annotation; when there's good STC annotation, it is good
to go.

Now, having one big data model you're validating against would mean
that a dataset can be invalid although the STC annotation is
perfectly good.  The hypothetical component merging time series with
different time scales would simply work on it although it's not a
valid "DataProduct" in your sense.  If it asked a validator, the
validator would say: "No, this dataset is broken, keep your fingers
off".  So, the validator isn't useful to the merge component, and
that would be a pity.

What I'm trying to sell is the concept that you validate *individual*
annotations.  Based on this, clients can fairly reliably figure out
whether or not they'll work.  For instance, something that has valid
NDCube annotation can be used by a cube plotter even if it has
missing or bad STC annotation.  Conversely, regardless of the status
of the Dataset annotation, a time series merge tool will work just as
long as at least one STC annotation it understands is valid.
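
To make the per-annotation idea concrete, here is a minimal sketch in
Python.  It is not any existing validator's API; the function name,
the client definitions and the validity flags are all made up for
illustration.  Each client lists groups of alternative annotations,
and it can work if at least one annotation per group is valid.

```python
# Hypothetical sketch: validity is recorded per annotation, never for the
# dataset as a whole.  All names and values below are assumptions.

def usable_by(client_needs, validation_results):
    """True if every group of alternatives has at least one valid annotation."""
    return all(
        any(validation_results.get(model, False) for model in alternatives)
        for alternatives in client_needs
    )

# Assumed per-annotation validation results for one dataset.
results = {
    "NDCube-1": True,
    "STC-1": False,      # missing or bad
    "STC-2": True,
    "Dataset-1": False,  # missing or bad
}

# Each client names what it needs; a tuple holds interchangeable alternatives.
cube_plotter = [("NDCube-1",)]
ts_merger = [("STC-1", "STC-2")]
ingestor = [("Dataset-1",)]

for name, needs in [("plotter", cube_plotter),
                    ("merger", ts_merger),
                    ("ingestor", ingestor)]:
    print(name, usable_by(needs, results))
```

With these assumed results the plotter and the merger can go ahead
and only the ingestor has to give up, although a whole-dataset
validator would have rejected the file for all three.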

In other words: I'm proposing to abandon the hope that "This dataset
is valid" will be a statement useful beyond management and
beancounting.  Instead, I hope we'll see "This dataset has valid
STC-1, STC-2, photometry-1, Dataset-1, and NDCube-1 annotations",
which tells concrete software if whatever annotation(s) it needs are
all right.

[Jiri's plan to reference "good enough" objects]
> To do what I think you are suggesting, would require a change to the VO-DML
> specification.

Well, it would if what we were really after is what Jiri may have
hinted at in his mail of Mon, 13 Mar 2017 11:14:13 +0100:

ji> model, that means the serialization of my data will change if that model
ji> changes. That doesn't mean, however, that I need to "embed" it into my data
ji> model, my data model is not changing if the on I am dependent on changes.

If this means "I reference an object in my DM, and if that object has
incompatible changes, all remains fine", then I agree VO-DML would
need to change; I don't think we have the equivalent of void* at this
point (I think we're all in agreement that minor changes to DMs will
by definition never break embedding data models, right?).

By just exploiting co-reference, we can, however, avoid these
potentially model-uprooting cross-model references *and*,
additionally, gain the flexibility to combine annotations from
various different data models.

Consider, for instance, a dataset that has an annotation

  NDCube-1
    independent_axes: dateObs
    dependent_axes: whatever

  STC-1
    Frame
      TT
      BARYCENTER
    value: dateObs

  STC-2
    CooClass 
      Time
    Frame
      timeScale TT
      IncompatibleNiftyThing HighMagic
    value: dateObs

With this annotation, all clients knowing NDCube-1 and *either* of
STC-1 and STC-2 have a complete annotation.
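
As a rough illustration of how a client could exploit this
co-reference, here is a sketch in Python.  The in-memory layout is an
assumption and not a real VO-DML serialization; in particular the
keys "timescale" and "refpos" for STC-1 and the helper names are
hypothetical.  The only link between the annotations is the name of
the native column, dateObs.

```python
# Hypothetical in-memory form of the annotation above.  Every annotation
# points at the native column by name; none points at another annotation.
annotations = [
    {"model": "NDCube-1",
     "independent_axes": ["dateObs"],
     "dependent_axes": ["whatever"]},
    {"model": "STC-1",
     "frame": {"timescale": "TT", "refpos": "BARYCENTER"},  # assumed keys
     "value": "dateObs"},
    {"model": "STC-2",
     "cooclass": "Time",
     "frame": {"timeScale": "TT", "IncompatibleNiftyThing": "HighMagic"},
     "value": "dateObs"},
]

def frames_for(column, annotations, understood):
    """Frames from understood STC annotations that annotate this column."""
    return [a["frame"] for a in annotations
            if a["model"] in understood and a.get("value") == column]

def time_axis(annotations, understood_stc):
    """Find the independent axis column, then whatever frames we can read."""
    cube = next(a for a in annotations if a["model"] == "NDCube-1")
    column = cube["independent_axes"][0]
    return column, frames_for(column, annotations, understood_stc)

old_client = time_axis(annotations, {"STC-1"})
new_client = time_axis(annotations, {"STC-2"})
print(old_client)
print(new_client)
```

Both clients resolve the same column through the same NDCube-1
annotation; which STC version they read is decided on the client
side only.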

Were independent_axes to reference either the STC-1 or the STC-2
annotation rather than directly dateObs, a client implementing
NDCube-1 would be tightly bound to know whatever STC version is
"baked into" NDCube.

If you've ever implemented against our current SCS standard and
cursed because you have to write ancient VOTable 1.1 you'll have an
idea why I'm howling when contemplating such a practice.

> It boils down to a collection of Coordinate-s, the Coordinate has reference
> back to the Frame/Axis metadata.

For the record, I believe the Frame metadata should be embedded and
not referenced, but that's mainly for ease of implementation.  

The central point where we appear to differ is that I am convinced we
should try hard to make it a collection of native entities (in VOTable:
FIELDs or PARAMs; FITS axes would be another example) that receive
the Axis annotations from other annotations.

> >> The premise is that a DataProduct should OWN all of its coordinates/data.
> >> The vo-dml rules for composition state that a class/object may not be in
> >> more than one composition relation.

-- which only applies to annotations, not to the annotated native
entities themselves.  A VOTable FIELD can certainly have multiple
annotations, and there's no concept of ownership there.
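
A small sketch of what "multiple annotations, no ownership" could
look like on the client side, again in Python and again with made-up
names (the "flux" column and the list-of-pairs layout are
assumptions): the native fields are just names, and any number of
annotations may point at the same one.

```python
from collections import defaultdict

def annotations_by_field(annotations):
    """Map each native column name to the models annotating it."""
    index = defaultdict(list)
    for model, columns in annotations:
        for column in columns:
            index[column].append(model)
    return dict(index)

# Assumed annotations; each one only lists the native columns it talks about.
annotations = [
    ("NDCube-1", ["dateObs", "flux"]),
    ("STC-1", ["dateObs"]),
    ("photometry-1", ["flux"]),
]

index = annotations_by_field(annotations)
print(index)
```

No annotation owns dateObs or flux here, so the composition rule
never comes into play for the columns themselves.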

> >> Since there are multiple types of Data Axis types, I modeled it this
> >> way.. where the DataProduct owns ALL its data (Observables), and the data
> >> axis types (DataAxis, DependentAxis) are organizational objects which refer
> >> to the instances of the same axis.
> >>
> >> This could be organized differently.. having the Observables owned by the
> >> DataAxis (which is directly or indirectly owned by the DataProduct), and
> >> extend that for various types of axis.. adding constraints as needed.  The

What I'm still unsure about: is there any reason besides the
"one-stop" validation for why DataProduct needs to worry about the
details of the axes (i.e., "physics" as covered by models like STC,
Photometry, and possibly many others) rather than just "This axis
value is in this column"?  If there is, what is it?  If there's not,
I think the whole complication of having to work out ownership
relationships would go away (and thus point 2 from the bottom of your
mail -- one less issue to solve is always a good thing, no?).

> >> I want to note one distinction.  The DataAxis here, is NOT the same as a
> >> coordinate space axis.
> >> If I have a 3D cartesian Space, with coordinate axes x,y,z.. there is 1
> >> DataAxis referring to a Position3D in that space.

Uh -- that sounds... dangerous.  In the spirit of my preference to
ideally reference native entities (i.e., FIELDs here): How does this
DataAxis grouping help a client?  What is it supposed to do with it?
How does the grouping help it over just having three axes (that, of
course, might still be related through one or more separate STC
annotations, but I'd like that to be uncorrelated if at all
possible)?

> >> So, I see we have 2 points of discussion for the cube model itself
> >>   1) relation between Dataset and DataProduct
> >>       Currently modeled as according to Section 3.. extend Dataset add
> >> reference to DataProduct == MyDataset
> >>
> >>       Alternates include:
> >>         a) loose coupling
> >>             verbal statement that MyDataset includes an instance of
> >> Dataset + instance of MyDataProduct
> >>         b) referenced coupling
> >>             MyDataSet == reference to Dataset + reference to MyDataProduct
> >>             (allows validators to know what is expected, but allows
> >> flexibility w.r.t. Dataset flavor )
> >>
> >>      I personally think a) is too loose, but b) might be a good way to
> >> go..

But why couple it at all?  There are perfectly valid use cases where
you want Dataset without NDCube and where you want NDCube without
Dataset; to me, that's a clear indication that they should live next
to each other, both being first class citizens that can be validated
independently of each other.

Cheers,

             Markus

[who's aware there's still another unanswered message -- sorry]