Time Series Cube DM - IVOA Note

Tue Mar 21 16:39:35 CET 2017

Markus,

On Tue, Mar 21, 2017 at 9:35 AM, Markus Demleitner <
msdemlei at ari.uni-heidelberg.de> wrote:

> Dear DM,
>
>
> On Mon, Mar 20, 2017 at 03:47:55PM -0400, CresitelloDittmar, Mark wrote:
> > In the cube model, I want to say: "A DataProduct has one or more Coordinate
> > system specifications, and the DataProduct owns its instances of CoordSys"
>
> I think here we're getting to the bottom of what we're trying to work
> out here: *why* do you want to say this?  What I'm trying to argue in
> my parallel mail
> http://mail.ivoa.net/pipermail/dm/2017-March/005492.html (look for
> "For illustration") is that an object about which you'd say such things
> isn't what's actually useful for clients.  These, rather, need
> annotation topical for what they're trying to do (data structure for
> a cube plotter, axis/frame metadata for data merging component,
> dataset metadata for an ingestor or a bibliography component).
>
> The only reason I can see to have a "God Object" that gobbles up all
> these individual annotations could be some sort of validation
> component, as you argue here:
>
> > My impression is not that you object to the items per se, but rather that
> > they are explicitly connected in the model.. that it would be sufficient to
> > simply serialize a coordsys instance in my cube, and since CoordSys is a
> > valid, modeled object, that is all I need to do.  If this is so.. what is
> > lost is the ability to validate the data product.  How do I know if the
> > instance has all the expected components?
>
> First, for me, yes it's the coupling of the various models I'm
> worried about.
>
> On the validation: What's actually relevant to a given client is that
> a given annotation is what it expects, e.g., frame metadata for the
> merge component I have imagined in the use case in the cited mail.
> For the merge component, an NDCube annotation is unimportant, as is
> the Dataset annotation; when there's good STC annotation, it is good
> to go.
>

> Now, having one big data model you're validating against would mean
> that a dataset can be invalid although the STC annotation is
> perfectly good.  The hypothetical component merging time series with
> different time scales would simply work although it's not a
> "DataProduct" in your sense.  If it asked a validator, the validator
> would say: "No, this dataset is broken, keep your fingers off".  So,
> the validator isn't useful to the merge component, and that would be
> a pity.
>

I consider the validation requirement a pretty important one.  It allows
  * an application like IRIS to verify that the product being read is
    compatible with the code expectations
  * folks like 'Operations' to check that a data provider is producing what
    they say they are

IMO, there should be a concept of 'this is a valid Spectrum instance'.

Your 'merge component' application would not be checking whether the product
is a valid NDCube; it would be validating against STC, which would presumably
validate all the STC instances.

>
> What I'm trying to sell is the concept that you validate *individual*
> annotations.  Based on this, clients can fairly reliably figure out
> whether or not they'll work.  For instance, something that has valid
> NDCube annotation can be used by a cube plotter even if it has
> missing or bad STC annotation.

I know this is just an example.. but how could a plotter work without valid
Coordinate (value+error) annotation, which is not in cube?

> Conversely, regardless of the status
> of the Dataset annotation, a time series merge tool will work just as
> long as at least one STC annotation it understands is valid.
>

> In other words: I'm proposing to abandon the hope that "This dataset
> is valid" will be a statement useful beyond management and
> beancounting.  Instead, I hope we'll see "This dataset has valid
> STC-1, STC-2, photometry-1, Dataset-1, and NDCube-1 annotations",
> which tells concrete software if whatever annotation(s) it needs are
> all right.
>

I think we need more input from the client/Applications side... to me, this
feels like an interoperability nightmare (though I think you argue the
opposite).  An application would need to check that the instance contains
valid annotation for every component that it uses, rather than just knowing
it is OK by seeing it is an NDCube-1 instance.
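
To make the two validation styles concrete, here is a minimal Python sketch.
The report format, the function names, and the sample values are all
assumptions made up for illustration; neither VO-DML nor any existing
validator defines them.

```python
from typing import Dict, Iterable, Set

# Hypothetical validation report: annotation name -> did it validate?
ValidationReport = Dict[str, bool]


def valid_annotations(report: ValidationReport) -> Set[str]:
    """Return the names of the annotations that validated."""
    return {name for name, ok in report.items() if ok}


def client_can_use(report: ValidationReport,
                   required: Iterable[str],
                   any_of: Iterable[str] = ()) -> bool:
    """True if every `required` annotation is valid and, when `any_of`
    is non-empty, at least one of those alternatives is valid as well."""
    valid = valid_annotations(report)
    alternatives = set(any_of)
    has_alternative = not alternatives or bool(alternatives & valid)
    return set(required) <= valid and has_alternative


# Made-up dataset: good NDCube and STC-2 annotation, broken Dataset metadata.
report = {"NDCube-1": True, "STC-1": False, "STC-2": True, "Dataset-1": False}

# Whole-product view: valid only if every annotation is valid.
whole_product_valid = all(report.values())

# Per-annotation view: each client asks only for what it uses.
merge_tool_ok = client_can_use(report, required=[], any_of=["STC-1", "STC-2"])
plotter_ok = client_can_use(report, required=["NDCube-1"])
ingestor_ok = client_can_use(report, required=["Dataset-1"])
```

In this sketch the whole-product check fails while the merge tool and the
plotter could still proceed; the price is that every client has to list the
annotations it depends on.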

> [Jiri's plan to reference "good enough" objects]
> > To do what I think you are suggesting, would require a change to the VO-DML
> > specification.
>
> Well, it would if what we were really after is what Jiri may have hinted
> at in his mail of Mon, 13 Mar 2017 11:14:13 +0100:
>
> ji> model, that means the serialization of my data will change if that model
> ji> changes. That doesn't mean, however, that I need to "embed" it into my data
> ji> model, my data model is not changing if the one I am dependent on changes.
>
> If this means "I reference an object in my DM, and if that object has
> incompatible changes, all remains fine", then I agree VO-DML would
> need to change; I don't think we have the equivalent of void* at this
> point (I think we're all in agreement that minor changes to DMs will
> by definition never break embedding data models, right?).
>
> By just exploiting co-reference, we can, however, avoid these
> potentially model-uprooting cross-model references *and*,
> additionally, gain the flexibility to combine annotations from
> various different annotations.
>
> Consider, for instance, a dataset that has an annotation
>
>   NDCube-1
>     independent_axes: dateObs
>     dependent_axes: whatever
>
>   STC-1
>     Frame
>       TT
>       BARYCENTER
>     value: dateObs
>
>   STC-2
>     CooClass
>       Time
>     Frame
>       timeScale TT
>       IncompatibleNiftyThing HighMagic
>     value: dateObs
>
> With this annotation, all clients knowing NDCube-1 and *either* of
> STC-1 and STC-2 have a complete annotation.
>
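
One possible reading of the co-reference idea in the example above, sketched
in Python.  The dictionary layout, the key names (timescale, refpos,
coo_class), and the lookup function are assumptions for illustration only;
they are not a VO-DML or VOTable serialization.

```python
# Hypothetical in-memory form of the quoted annotation.  Every annotation
# points at the native column by name ("dateObs"), never at another
# annotation.
annotations = [
    {"model": "NDCube-1",
     "independent_axes": ["dateObs"],
     "dependent_axes": ["whatever"]},
    {"model": "STC-1",
     "frame": {"timescale": "TT", "refpos": "BARYCENTER"},
     "value": "dateObs"},
    {"model": "STC-2",
     "coo_class": "Time",
     "frame": {"timeScale": "TT", "IncompatibleNiftyThing": "HighMagic"},
     "value": "dateObs"},
]


def frame_for_column(annotations, column, understood_models):
    """Return the first annotation from a model the client understands
    whose `value` points at `column`, or None if there is none."""
    for ann in annotations:
        if ann["model"] in understood_models and ann.get("value") == column:
            return ann
    return None


# A cube client first learns which column is the independent axis...
cube = next(a for a in annotations if a["model"] == "NDCube-1")
time_column = cube["independent_axes"][0]

# ...and then looks for frame metadata that refers to the same column.
old_client = frame_for_column(annotations, time_column, {"STC-1"})
new_client = frame_for_column(annotations, time_column, {"STC-2"})
no_stc_client = frame_for_column(annotations, time_column, set())
```

Here the shared column name is the only link between the NDCube and STC
annotations, so a client that knows either STC version finds its frame and a
client that knows neither simply gets nothing back.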

I can see that there would be value in being able to do this.
My objection is simply that to enable this means changing the vo-dml
standard, which would be a huge hit at this point.

Here, dateObs is presumably a set of Time Coordinates.  By vo-dml, the role
independent_axes must have a type.  If that type is not defined in the same
model itself, it is identified by the model which does define it
( coords:Coordinate as a generic base ).  That is a specific major version of
the coords model with vo-dml/XML documentation.
To be a valid vo-dml model, that linkage must exist.  This is at the model
level.. which then constrains the annotation.

An instance can have this annotation, but they would define independent
instances.  I don't understand how the 'co-reference' mechanism works.
How does an application know that they are the same thing? (question
repeated from earlier msg, so I'll pause there)

>
> Were dependent_axes to reference either the STC-1 or the STC-2
> annotation rather than directly dateObs, a client implementing
> NDCube-1 would be tightly bound to know whatever STC version is
> "baked into" NDCube.
>
> If you've ever implemented against our current SCS standard and
> cursed because you have to write ancient VOTable 1.1 you'll have an
> idea why I'm howling when contemplating such a practice.
>
>
> > It boils down to a collection of Coordinate-s, the Coordinate has reference
> > back to the Frame/Axis metadata.
>
> For the record, I believe the Frame metadata should be embedded and
> not referenced, but that's mainly for ease of implementation.
>
> The central point where we appear to differ is that I am convinced we
> should try hard to make it a collection of native entities (in VOTable:
> FIELDs or PARAMs; FITS axes would be another example) that receive
> the Axis annotations from other annotations.
>
>
> > >> The premise is that a DataProduct should OWN all of its coordinates/data.
> > >> The vo-dml rules for composition state that a class/object may not be in
> > >> more than one composition relation.
>
> -- which only applies to annotations, not to the annotated native
> entities themselves.  A VOTable FIELD can certainly have multiple
> annotations, and there's no concept of ownership there.
>
> > >> Since there are multiple types of Data Axis types, I modeled it this
> > >> way.. where the DataProduct owns ALL its data (Observables), and the data
> > >> axis types (DataAxis, DependentAxis) are organizational objects which refer
> > >> to the instances of the same axis.
> > >>
> > >> This could be organized differently.. having the Observables owned by the
> > >> DataAxis (which is directly or indirectly owned by the DataProduct), and
> > >> extend that for various types of axis.. adding constraints as needed.  The
>
> What I'm still unsure about: is there any reason beside the
> "one-stop" validation for why DataProduct needs to worry about the
> details of the axes (i.e., "physics" as covered by models like STC,
> Photometry, and possibly many others) rather than just "This axis
> value is in this column".  If there is, what is it?  If there's not,
> I think the whole complication of having to work out ownership
> relationships would go away (and this point 2 from the bottom of your
> mail -- one less issue to solve is always a good thing, no?).
>

It doesn't worry about them.  It points to a generic base for the detailed
types.  Any implementation of that type can be used.  By linking it to a base,
it lets applications know that there are certain elements one can always
expect to have available.  If I know the value is a coords:Coordinate, then I
can expect a certain structure and some content, even if I don't know the
specific coordinate flavor.  E.g., if I get a Flux Coordinate, which is not
one of the domains I understand, I can still use it to a high degree for
various applications.
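
A rough Python sketch of what the generic base buys an application.  The
class layout and attribute names are invented for illustration and are not
the actual coords model.

```python
from dataclasses import dataclass


@dataclass
class Coordinate:
    """Hypothetical stand-in for a generic coords:Coordinate base."""
    name: str
    value: float
    error: float
    unit: str

    def describe(self) -> str:
        """Format using only what the base type guarantees."""
        return f"{self.name} = {self.value} +/- {self.error} {self.unit}"


@dataclass
class FluxCoordinate(Coordinate):
    """A flavor the application does not specifically understand."""
    band: str = "unknown"


def summarize(axis: Coordinate) -> str:
    # The application relies on the base structure only, so any
    # Coordinate flavor can be passed in.
    return axis.describe()


flux = FluxCoordinate(name="flux", value=1.5, error=0.1, unit="Jy", band="V")
summary = summarize(flux)
```

The application never looks at band, yet it can still report value, error
and unit for a flavor it has not seen before.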

The ownership relations are there for various applications which implement
the model.  When implementing a library, I would want to know when it is safe
to free the memory space for particular elements.  I think this is most true
for database applications, but that is outside my wheelhouse.

>
> > >> I want to note one distinction.  The DataAxis here, is NOT the same as a
> > >> coordinate space axis.
> > >> If I have a 3D cartesian Space, with coordinate axes x,y,z.. there is 1
> > >> DataAxis referring to a Position3D in that space.
>
> Uh -- that sounds... dangerous.  In the spirit of my preference to
> ideally reference native entities (i.e., FIELDs here): How does this
> DataAxis grouping help a client?  What is it supposed to do with it?
> How does the grouping help it over just having three axes (that, of
> course, might still be related through one or more separate STC
> annotations, but I'd like that to be uncorrelated if at all
> possible).
>
>
1 FIELD -> 1 DataAxis (Coordinate) works fine only for the simplest case
(1D value with no errors).
For the 2D/3D cases, the errors may be correlated, so the bundle of FIELDs for
the value must be grouped above the errors.  And then there are the errors...
A '2D Coordinate' with 2 sources of error, both symmetric, would have 4 FIELDs
feeding the DataAxis/Coordinate content  ( x, y, xy_staterr, xy_ranerr ).
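
As a sketch of that grouping in Python: the class, its attribute names and
the sample row are hypothetical, and only the four column names come from the
example above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Coordinate2D:
    """Hypothetical grouping: the value columns are bundled first, then
    the error columns that apply to the bundle as a whole."""
    value_fields: Tuple[str, str]
    error_fields: List[str]

    def fields(self) -> List[str]:
        """All FIELD names feeding this one DataAxis/Coordinate."""
        return [*self.value_fields, *self.error_fields]

    def read(self, row: Dict[str, float]) -> Dict[str, object]:
        """Assemble one coordinate instance from a table row."""
        return {
            "value": tuple(row[f] for f in self.value_fields),
            "errors": {f: row[f] for f in self.error_fields},
        }


position = Coordinate2D(("x", "y"), ["xy_staterr", "xy_ranerr"])

# Made-up row values, for illustration only.
row = {"x": 10.0, "y": -2.5, "xy_staterr": 0.3, "xy_ranerr": 0.1}
point = position.read(row)
```

One DataAxis ends up fed by four FIELDs, with each error attached to the
(x, y) pair rather than to a single column.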

> > >> So, I see we have 2 points of discussion for the cube model itself
> > >>   1) relation between Dataset and DataProduct
> > >>       Currently modeled as according to Section 3.. extend Dataset add
> > >>       reference to DataProduct == MyDataset
> > >>
> > >>       Alternates include:
> > >>         a) loose coupling
> > >>             verbal statement that MyDataset includes an instance of
> > >>             Dataset + instance of MyDataProduct
> > >>         b) referenced coupling
> > >>             MyDataSet == reference to Dataset + reference to MyDataProduct
> > >>             (allows validators to know what is expected, but allows
> > >>             flexibility w.r.t. Dataset flavor )
> > >>
> > >>      I personally think a) is too loose, but b) might be a good way to
> > >>      go..
>
> But why couple it at all?  There are perfectly valid use cases where
> you want Dataset without NDCube and where you want NDCube without
> Dataset; to me, that's a clear indication that they should live next
> to each other, both being first class citizens that can be validated
> independently of each other.
>
> Cheers,
>
>              Markus
>
> [who's aware there's still another unanswered message -- sorry]
>

Looking forward to it.. Cheers,
mark