Time Series Cube DM - IVOA Note

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Wed Mar 22 14:20:16 CET 2017


Hi Mark, hi DM

On Tue, Mar 21, 2017 at 11:39:35AM -0400, CresitelloDittmar, Mark wrote:
> On Tue, Mar 21, 2017 at 9:35 AM, Markus Demleitner <
> msdemlei at ari.uni-heidelberg.de> wrote:
> > On the validation: What's actually relevant to a given client is that
> > a given annotation is what it expects, e.g., frame metadata for the
> > merge component I have imagined in the use case in the cited mail.
> > For the merge component, an NDCube annotation is unimportant, as is
> > the Dataset annotation; when there's good STC annotation, it is good
> > to go.
> >
> 
> > Now, having one big data model you're validating against would mean
> > that a dataset can be invalid although the STC annotation is
> > perfectly good.  The hypothetical component merging time series with
> > different time scales would simply work although it's not a
> > "DataProduct" in your sense.  If it asked a validator, the validator
> > would say: "No, this dataset is broken, keep your fingers off".  So,
> > the validator isn't useful to the merge component, and that would be
> > a pity.
> >
> 
> I consider the validation requirement a pretty important one..
>   * an application like IRIS to verify that the product being read is
> compatible with the code expectations
>   * folks like 'Operations' to check that a data provider is producing what
> they say they are

Both are possible, I would even say facilitated, with "non-God"
annotations. IRIS would say "for operation A, I need a valid
Dataset-1 annotation; for operation B, I need valid STC-1 or STC-2 as
well as Photometry". The cool thing about this is that even with a
partially broken (or, perhaps, incompatibly newer) annotation, the
client can still do whatever it can reliably do, perhaps greying out
menu entries for operations that require unavailable annotation.

With a "God annotation", it would have to, in such a situation,
entirely reject the dataset even though it could do quite a bit with
it based on the annotation it understands.  Clients tend to try to
avoid that situation and will try to hack around it, which kind of
defeats the purpose of validation in the first place.
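
To make that concrete, here is a minimal sketch of such capability
gating; the operation names, the REQUIRED table, and the helper are
all invented for illustration:

REQUIRED = {
    "plot_cube":     [{"NDCube-1"}],
    "frame_merge":   [{"STC-1"}, {"STC-2"}],  # alternatives: either works
    "metadata_view": [{"Dataset-1"}],
}

def usable_operations(valid_annotations):
    """Return the operations enabled by the set of annotations that
    validated; everything else gets greyed out."""
    return {op for op, alternatives in REQUIRED.items()
            if any(needed <= valid_annotations for needed in alternatives)}

# a dataset with good STC-1 and Dataset-1 but broken NDCube annotation
# still supports merging and metadata display:
print(usable_operations({"STC-1", "Dataset-1"}))
# -> {'frame_merge', 'metadata_view'}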

> IMO, there should be a concept of 'this is a valid Spectrum instance'.

I'm not strictly arguing against saying in an appropriate place
(e.g., a DAL protocol, an endorsed note, or perhaps even a short REC
containing mainly examples): "A fully-compliant spectrum-1 instance has to
have valid annotations for STC-2 (spectral axis; for the record, I still
think the spectral axis should have a separate DM), photometry-1 (flux
axis), Dataset-1, and NDCube (axis organisation).  For observed spectra,
additionally, the Dataset-1 target position annotation and an
Observation-1 annotation".

Note that the last clause also provides a nice solution to what we
were discussing back in the Spectrum DM 2.0 days, where SDM made the
target position required and it was unclear what to do with
theoretical spectra.
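
If it helps, that rule can be stated almost mechanically.  A little
Python sketch of the conjunction (validate_annotation is of course a
stand-in for the real per-model validators; model names as above):

def validate_annotation(dataset, model):
    # stand-in for the actual per-DM validator
    return model in dataset.get("valid_annotations", set())

SPECTRUM1_CORE = {"STC-2", "photometry-1", "Dataset-1", "NDCube"}
OBSERVED_EXTRA = {"Dataset-1 target position", "Observation-1"}

def is_valid_spectrum1(dataset, observed=True):
    """Full spectrum-1 compliance is just the conjunction of the
    individual validations; theoretical spectra drop the
    observation-related terms."""
    required = SPECTRUM1_CORE | (OBSERVED_EXTRA if observed else set())
    return all(validate_annotation(dataset, m) for m in required)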

Let me mention, though, that I have reservations against making such
requirements in DAL protocols; again, I'd like to point to the SCS
experience, where a similar requirement (responses have to be VOTable
1.1) has turned out to be a major implementation liability without
providing any actual benefit -- essentially all current SCS clients
would actually benefit if services were allowed to respond with less
outdated VOTables.

> > What I'm trying to sell is the concept that you validate *individual*
> > annotations.  Based on this, clients can fairly reliably figure out
> > whether or not they'll work.  For instance, something that has valid
> > NDCube annotation can be used by a cube plotter even if it has
> > missing or bad STC annotation.
> 
> 
> I know this is just an example.. but how could a plotter work without valid
> Coordinate (value+error) annotation, which is not in cube?

Well, think of a visualisation component that lets you do volume
plotting, perhaps fly through the data, and so forth.  Such a thing
doesn't need any axis labels or errors, or even an idea of what all
the numbers mean, as long as it can come up with the (x, y, z, value)
tuples (i.e., NDCube annotation as proposed a few mails up).
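
A sketch of all such a component needs, assuming the NDCube
annotation has been parsed into a simple mapping (names invented,
data random):

import numpy as np

annotation = {"independent_axes": ["x", "y", "z"],
              "dependent_axes": ["brightness"]}
table = {name: np.random.rand(1000)
         for name in ["x", "y", "z", "brightness"]}

# everything the volume plotter consumes: (x, y, z, value) tuples; no
# frames, units, or errors required -- that is separate annotation
points = np.column_stack([table[n] for n in annotation["independent_axes"]])
values = table[annotation["dependent_axes"][0]]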

> > Consider, for instance, a dataset that has an annotation
> >
> >   NDCube-1
> >     independent_axes: dateObs
> >     dependent_axes: whatever
> >
> >   STC-1
> >     Frame
> >       TT
> >       BARYCENTER
> >     value: dateObs
> >
> >   STC-2
> >     CooClass
> >       Time
> >     Frame
> >       timeScale TT
> >       IncompatibleNiftyThing HighMagic
> >     value: dateObs
> >
> > With this annotation, all clients knowing NDCube-1 and *either* of
> > STC-1 and STC-2 have a complete annotation.
> >
> 
> I can see that there would be value in being able to do this.
> My objection is simply that to enable this means changing the vo-dml
> standard, which would be a huge hit at this point.
> 
> Here, dateObs is, presumably a set of Time Coordinates..
>   by vo-dml, the role independent_axes must have a type.  If that type is
> not defined in the same model itself, it is

No, that is my main point:  dateObs is a reference to what I've
called a native entity a mail back.  In VOTable, that would typically
be a FIELD or PARAM; in FITS, perhaps an axis or possibly a header
keyword (for FITS, the ways to reference these things are clearly
still to be worked out -- a.k.a. the FITS mapping document).

To me, the annotation of these native entities is what this whole
effort is about.  Another way of putting that main point: if we can
directly annotate them in our models, that is highly preferable to
putting in intermediaries from other DMs (in this case, a
Coordinates annotation).
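
To spell out how the dateObs co-reference resolves in VOTable terms,
a toy sketch (type names abbreviated, namespaces omitted):

import xml.etree.ElementTree as ET

table = ET.fromstring("""
<TABLE>
  <FIELD id="dateObs" name="dateObs" datatype="double"/>
  <GROUP vodml-type="stc:TimeCoordinate">
    <FIELDref vodml-role="value" ref="dateObs"/>
  </GROUP>
  <GROUP vodml-type="stc2:TimeCoord">
    <FIELDref vodml-role="value" ref="dateObs"/>
  </GROUP>
</TABLE>""")

# both STC annotations point at the same native FIELD; a client picks
# whichever model version it understands and ignores the other
for group in table.iter("GROUP"):
    ref = group.find("FIELDref").get("ref")
    field = table.find(".//FIELD[@id='%s']" % ref)
    print(group.get("vodml-type"), "annotates", field.get("name"))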

I'll readily admit that that's not always possible.  I've shown
Dataset.targetPosition as a tentative example for that a couple of
mails back; it used a GROUPref to an stc:whatever-typed group.

Now that I re-think this case, new DALI facilities would even
alleviate that need.  Applying my own golden rule, that particular
annotation could be rewritten to use references to native entities
like this:

<GROUP vodml-type="ds:Dataset">
  ...
  <PARAMref vodml-role="targetPosition" ref="targetpos"/>
</GROUP>

<GROUP vodml-type="stc:Position">
  <PARAMref vodml-role="value" ref="targetpos"/>
  ...Frame annotation...
</GROUP>

<GROUP vodml-type="ivoa:Quantity">
  ...if desired, information on errors and such...
  <PARAMref vodml-role="value" ref="targetpos"/>
</GROUP>

<PARAM id="targetpos" arraysize="2" xtype="POINT" value="34.0 70.3"/>

I'm not saying I'd advocate this pattern at this point, but now that
we have half-native geometries in VOTable thanks to DALI 1.1, it
would definitely be great if these could be annotated with STC in
a natural way.
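
For illustration, getting at such a half-native value needs no DM
machinery at all; the STC annotation then only has to add frame
metadata on top (trivial sketch):

# the PARAM from above: the xtype alone says how to parse the value
param = {"xtype": "POINT", "arraysize": "2", "value": "34.0 70.3"}
if param["xtype"].upper() == "POINT":
    ra, dec = (float(v) for v in param["value"].split())  # 34.0, 70.3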


But again, no, I'm *not* advocating untyped references to VO-DML
objects, nor do I think "magically-typed" references ("use this
object as a stand-in for what you'd rather have, regardless of what
it is") are possible as long as computers are as dumb as they still
are.

So: No, I'm not advocating any change in VO-DML at this point (except
the stuff on registration and URL mapping over in some other thread,
that is).

> > What I'm still unsure about: is there any reason beside the
> > "one-stop" validation for why DataProduct needs to worry about the
> > details of the axes (i.e., "physics" as covered by models like STC,
> > Photometry, and possibly many others) rather than just "This axis
> > value is in this column".  If there is, what is it?  If there's not,
> > I think the whole complication of having to work out ownership
> > relationships would go away (and this point 2 from the bottom of your
> > mail -- one less issue to solve is always a good thing, no?).
> >
> 
> It doesn't worry about them.  It points to a generic base for the
> detailed types.  Any implementation of that type can be used.  By
> linking it to a base, it lets applications know that there are
> certain elements one can always expect to have available.  If I

But can't they find that out by simply checking for the presence of
the annotations they need, rather than tying that to the presence of
lots of other, possibly unrelated annotation?

> The ownership relations are there for various applications which
> implement the model.  When implementing a library, I would want to
> know when it is safe to free the memory space for particular
> elements.  I think this is most true for database applications, but
> that is outside my wheelhouse.

Hm -- I think memory management, whether reference counting as
proposed here, a conventional garbage collector, or whatever else, is
so far removed from data modelling that I doubt the concept of
ownership in data model elements will actually help implementations
with memory management.

I could see the automatic induction of foreign keys in relational
mappings as a good use case for where the notion of "ownership" would
be important.  That's pretty certainly somewhat harder in a world of
co-references rather than direct "table-typed" references.  I'd have
to think about this a bit more, ideally on the basis of a
sufficiently complex dataset.  There's a relational mapper in Gerard's
VO-DML tools IIRC -- perhaps one could use that to figure out whether
the co-reference model is a major problem for ORMs?
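
To sketch the difference I have in mind (table layouts entirely
invented): with ownership, a child row gets an unambiguous foreign
key and cascade rule; with co-references, several annotation tables
share one row and nothing obviously owns it:

# ownership/composition: a child belongs to exactly one parent
OWNED = """
CREATE TABLE data_axis (
    id       INTEGER PRIMARY KEY,
    cube_id  INTEGER NOT NULL
        REFERENCES nd_cube(id) ON DELETE CASCADE
);"""

# co-reference: independent annotations point at one shared entity,
# so no single ON DELETE rule suggests itself
SHARED = """
CREATE TABLE stc_annotation (
    id         INTEGER PRIMARY KEY,
    column_id  INTEGER REFERENCES native_column(id)
);"""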

> > > >> If I have a 3D cartesian Space, with coordinate axes x,y,z..
> > > >> there is 1 DataAxis referring to a Position3D in that space.
> >
> > Uh -- that sounds... dangerous.  In the spirit of my preference to
> > ideally reference native entities (i.e., FIELDs here): How does this
> > DataAxis grouping help a client?  What is it supposed to do with it?
> > How does the grouping help it over just having three axis (that, of
> > course, might still be related through one or more separate STC
> > annotations, but I'd like that to be uncorrelated if at all
> > possible).
> >
> >
> 1 FIELD -> 1 DataAxis (Coordinate) works fine only for the simplest
> case (1D value with no errors).  For the 2D/3D cases, the errors
> may be correlated, so the bundle of FIELDs for the value must be
> grouped above the errors.  And then there are the errors...  A '2D
> Coordinate' with 2 sources of error, both symmetric.. would have 4
> FIELDs feeding the DataAxis/Coordinate content ( x, y, xy_staterr,
> xy_ranerr ).

Yes, correlations are a bit of a challenge, but you'll have
correlated errors in all kinds of contexts, so I'd argue that's a
matter for quantity again, not for coordinates themselves.  Since
doing this kind of advanced annotation right is difficult, my
preference would be to postpone defining it until we have some
experience with the simple cases.  You can compatibly add things
later, but taking them away will probably mean a new major version,
which would be painful even without lots of links between DMs.

Having said that, I think these correlated errors are a case where DM
objects need to be referenced; but that would be references *within*
(hopefully) quantity, not between different DMs.
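
For what it's worth, the natural representation of such correlated
errors is a covariance matrix attached to the bundle of values --
precisely the kind of structure a quantity model could carry.  A
quick numpy sketch (all numbers invented):

import numpy as np

x_err, y_err, rho = 0.3, 0.5, 0.8    # per-axis errors plus correlation
cov = np.array([[x_err**2,            rho * x_err * y_err],
                [rho * x_err * y_err, y_err**2]])

# a quantity annotation would reference the two value columns and
# attach cov to the pair; the correlation survives resampling:
samples = np.random.multivariate_normal([34.0, 70.3], cov, size=10000)
print(np.corrcoef(samples.T)[0, 1])  # ~ 0.8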

I admit that I don't really look forward to defining quantity (i.e.,
modelling errors, ranges, and that kind of thing) either, but I
maintain it's much better to do this once properly (and then
carefully evolve it in this single place) than to solve these
analogous problems in each DM and/or domain separately.

      -- Markus

