Time Series Cube DM - IVOA Note

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Mon Feb 20 10:09:11 CET 2017

Hi Mark, dear DM community

While I don't care too much about the particulars of the time series,
I do care about consistency, so...

On Thu, Feb 16, 2017 at 01:16:58PM -0500, CresitelloDittmar, Mark wrote:
> science cases) and am pleased to
> see that there is a high degree of compatibility with the Cube
> representation.  So much so that I'm
> confused about comments/objections regarding the cube model dependencies.

... that's a relief.  The other thing, however, I really care about
is DM independence, and there I believe there's not much
compatibility yet.

My basic premise is that as soon as DMs import one another (excepting
small "utility DMs" like ivoa in the VO-DML proposal), we quickly
have a system in which you can't change a thing without bringing down
everything.  Much like four points connected to each other to form a
rigid pyramid in which you can't move a vertex without moving at
least two others.

And hence let me try once more to explain why I think we're now in a
position to stop "cross-importing" datasets (as NDCube still does)
and migrate to a nice model of smallish, specialised, independently
developable DMs.

>    + Section 3.1
>       "because we use them as black boxes, it does not matter if these data
> models change.."
>       I'm not sure this is true.. or if it is, it is due to glossing over
>       some details.
>         o it looks like you're saying that a Time Series instance has
>            ObsDataset metadata as
>            described in 'the Dataset model'.. presumably any version thereof

Exactly: any version thereof.  Or, if necessary, multiple versions
thereof.  Let's take DatasetDM as an example.  Schematically, the
annotation would be something like

  creatorDID: ivo://foo.bar/quux?1
  dataProductType: timeseries

Note that this works independently of all other annotation.  A
document with such a thing, by this annotation, *is an* instance of
Dataset.  That it's also an instance of NDCube (say) is irrelevant
for this fact (and conversely, NDCube, as a way to reference array
axes and/or table columns and label them as independent and dependent
axes, doesn't need to know anything about that).
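To make the independence argument concrete, here is a minimal Python sketch. It assumes a deliberately simplified, hypothetical representation (a list of dicts, each tagged with the model type it instantiates) rather than actual VO-DML annotation in a VOTable, and the type name "ndcube:Cube" is illustrative only. A client that knows just ds:Dataset finds its own group and skips everything else, so removing the cube annotation changes nothing for it.

```python
# Hypothetical, simplified stand-in for an annotated document: independent
# annotation groups, each tagged with the model type it instantiates.
# Real VO-DML annotation lives in VOTable, not in Python dicts.
document = {
    "columns": ["date_obs", "mag_g", "ra", "dec"],
    "annotations": [
        {"type": "ds:Dataset",
         "creatorDID": "ivo://foo.bar/quux?1",
         "dataProductType": "timeseries"},
        # "ndcube:Cube" is an illustrative type name, not a published one.
        {"type": "ndcube:Cube",
         "pixelAxis": ["date_obs"],
         "dependentAxes": ["mag_g", "ra", "dec"]},
    ],
}


def find_annotations(doc, model_type):
    """Return all annotation groups of one model type; everything else is skipped."""
    return [a for a in doc["annotations"] if a["type"] == model_type]


def dataset_client(doc):
    """A client that knows ds:Dataset and nothing about cubes."""
    groups = find_annotations(doc, "ds:Dataset")
    return groups[0]["dataProductType"] if groups else None


print(dataset_client(document))  # -> timeseries

# Removing the cube annotation does not change what this client sees.
no_cube = {**document,
           "annotations": [a for a in document["annotations"]
                           if a["type"] != "ndcube:Cube"]}
print(dataset_client(no_cube))  # -> timeseries
```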

The great thing is that, when someone would like to make it easier
for DataCite to process our datasets, they can easily drop in
*another* type/representation of dataset metadata, say

    creatorName: Smith, John
    nameIdentifier: 1000.123558
  ResourceType: Dataset/timeseries

This way *both* VO clients understanding ds:Dataset and whatever
software understands the DataCite annotation will be able to figure
out their particular dataset metadata, independent of whether they
have any business with NDCube or anything else.  If we were to phase
out ds:Dataset in favour of dcds:Dataset, NDCube could stay just what
it was.
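In the same hypothetical dict-based representation (again, not a real serialisation; "ndcube:Cube" is an illustrative name), dropping a DataCite-style group in next to ds:Dataset gives each client its own view, and phasing out ds:Dataset leaves the cube group as the very same, untouched object:

```python
# Hypothetical dict-based stand-ins for annotation groups.
cube = {"type": "ndcube:Cube",
        "pixelAxis": ["date_obs"],
        "dependentAxes": ["mag_g", "ra", "dec"]}

annotations = [
    {"type": "ds:Dataset",
     "creatorDID": "ivo://foo.bar/quux?1",
     "dataProductType": "timeseries"},
    # Dropped in alongside ds:Dataset; no other group has to change.
    {"type": "dcds:Dataset",
     "creatorName": "Smith, John",
     "nameIdentifier": "1000.123558",
     "ResourceType": "Dataset/timeseries"},
    cube,
]


def first_of(annots, model_type):
    """Return the first annotation group of the given type, or None."""
    return next((a for a in annots if a["type"] == model_type), None)


def vo_client(annots):
    """Understands only ds:Dataset."""
    ds = first_of(annots, "ds:Dataset")
    return ds["dataProductType"] if ds else None


def datacite_client(annots):
    """Understands only the DataCite-style group."""
    dc = first_of(annots, "dcds:Dataset")
    return dc["ResourceType"] if dc else None


print(vo_client(annotations))        # -> timeseries
print(datacite_client(annotations))  # -> Dataset/timeseries

# Phasing out ds:Dataset only removes that group; the cube group is untouched.
phased_out = [a for a in annotations if a["type"] != "ds:Dataset"]
print(first_of(phased_out, "ndcube:Cube") is cube)  # -> True
```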

>         o but you show the model import.. the vo-dml model import includes
>           the URL to a specific version

Yes, and as said above you could even have several versions of the
same DM; that, however, should only be necessary (or even allowed)
for major versions.  But then, this trick allows a smooth
transition between old and new clients by temporarily embedding
annotations for both versions.  Continuing the example from above,
you'd then also put in

  dataProductType: timeseries
  semanticVOURL: http://foo.bar/magic/understand-it-all/1238

-- again without having to change either annotation or clients as
regards NDCube.
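A sketch of how clients might cope with two embedded major versions during such a transition. It assumes, hypothetically, that ds:Dataset and ds2:Dataset name the old and new versions, and that each client lists the types it understands in order of preference; the representation is the same simplified dict stand-in as before.

```python
# Hypothetical: old and new major versions embedded side by side,
# plus a cube group that stays the same throughout.
annotations = [
    {"type": "ds:Dataset", "dataProductType": "timeseries"},
    {"type": "ds2:Dataset", "dataProductType": "timeseries",
     "semanticVOURL": "http://foo.bar/magic/understand-it-all/1238"},
    {"type": "ndcube:Cube", "pixelAxis": ["date_obs"],
     "dependentAxes": ["mag_g", "ra", "dec"]},
]


def pick(annots, understood):
    """Return the first group whose type the client understands,
    trying types in the client's order of preference."""
    for model_type in understood:
        for a in annots:
            if a["type"] == model_type:
                return a
    return None


old_client = ["ds:Dataset"]                  # predates ds2
new_client = ["ds2:Dataset", "ds:Dataset"]   # prefers ds2, falls back to ds

print(pick(annotations, old_client)["type"])  # -> ds:Dataset
print(pick(annotations, new_client)["type"])  # -> ds2:Dataset
```

Neither client needs to look at the cube group to make its choice, which is the point: the version switch happens entirely within the dataset annotations.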

>       It seems important for interoperability to have the explicit
>       relations so that V2.0 of models can go through a vetting
>       process before being acceptable/useable in subsequent models
>       which use the V1.0.

...but all this is ONLY possible if DMs don't directly reference one
another.  If, for instance, CubeDM had an explicit reference to some
variant of DatasetDM (or would even embed it), then a service will be
stuck with having to provide ds:Dataset annotation even if there's
been ds2:Dataset for ages ONLY to stuff it where NDCube wants to see it.

With physics (STC, photometry), this kind of thing is even more
important.  Here, I argue, NDCube should just say things like:

  pixelAxis: [<reference to date_obs>]
  dependentAxes: [<reference to mag_g> <reference to ra> 
    <reference to dec>]

That's it.  The actual metadata of date_obs, say, would be in a
different group, understandable by any STC client, whether or not it
understands anything about NDCube:

      value: <reference to date_obs>
          timeScale: TT
      c1: <reference to ra>
      c2: <reference to dec>

Note how the STC annotation isn't concerned about how NDCube sorts
axes into dependent and independent and how that's not even relevant
to a client that just needs the STC part; and note how a naive cube
client can disregard STC metadata and yet figure out how to make a
plot from the cube (say).
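Under the same simplified, hypothetical representation, a naive cube client might turn the cube annotation alone into plottable (x, y) series. The column values below are made up for illustration, and the annotation only refers to columns by name.

```python
# Hypothetical column data and a simplified NDCube-style annotation
# that only points at columns by name.
table = {
    "date_obs": [57800.1, 57800.2, 57800.3],
    "mag_g": [15.2, 15.4, 15.1],
    "ra": [10.0, 10.0, 10.0],
    "dec": [-5.0, -5.0, -5.0],
}
cube = {"pixelAxis": ["date_obs"], "dependentAxes": ["mag_g", "ra", "dec"]}


def plot_series(table, cube):
    """Pair each dependent column with the (single) independent axis.

    Only the cube annotation is used; no STC metadata is consulted, so the
    client neither knows nor needs to know that date_obs is a TT time.
    """
    xs = table[cube["pixelAxis"][0]]
    return {y: list(zip(xs, table[y])) for y in cube["dependentAxes"]}


series = plot_series(table, cube)
print(series["mag_g"][0])  # -> (57800.1, 15.2)
```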

Still, a client that knows both STC and NDCube will figure out that
the column date_obs is both the independentAxis of a cube, and that
it's a time with a TT timescale, etc, so by coreference it's able to
figure out the *actual* (not formal) relationships between the
various models' roles.
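Coreference of this kind can be sketched as a join on column references. In this hypothetical sketch, Ref stands in for a VO-DML reference to a column, and "stc:TimeCoordinate" and "stc:Position" are illustrative type names, not taken from an actual STC version. A client that walks all groups collects, per column, every role that column plays in any model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Hypothetical stand-in for a reference to a table column."""
    column: str


annotations = [
    {"type": "ndcube:Cube", "pixelAxis": [Ref("date_obs")],
     "dependentAxes": [Ref("mag_g"), Ref("ra"), Ref("dec")]},
    {"type": "stc:TimeCoordinate", "value": Ref("date_obs"), "timeScale": "TT"},
    {"type": "stc:Position", "c1": Ref("ra"), "c2": Ref("dec")},
]


def roles_by_column(annots):
    """Map each referenced column to the (model type, role) pairs pointing at it.

    Literal values such as "TT" are not Refs and are therefore skipped.
    """
    roles = {}
    for a in annots:
        for key, val in a.items():
            for v in (val if isinstance(val, list) else [val]):
                if isinstance(v, Ref):
                    roles.setdefault(v.column, []).append((a["type"], key))
    return roles


def time_scale_of(annots, column):
    """Find the time scale that any STC time group assigns to a column."""
    for a in annots:
        if a["type"] == "stc:TimeCoordinate" and a["value"] == Ref(column):
            return a["timeScale"]
    return None


roles = roles_by_column(annotations)
print(roles["date_obs"])
# -> [('ndcube:Cube', 'pixelAxis'), ('stc:TimeCoordinate', 'value')]
print(time_scale_of(annotations, "date_obs"))  # -> TT
```

Neither model refers to the other; the link between date_obs as the cube's pixel axis and as a TT time arises only because both groups point at the same column.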

As STC evolves and multiple clients understanding different versions
are around, there's again no problem keeping the same NDCube annotation
while providing potentially multiple STC annotations for the
hopefully increasingly sophisticated STC clients we might be seeing
in the future.

Sorry for this somewhat verbose marketing gig, but since I believe
this is the centerpiece of what I've been hoping for as the VO-DML
dividend I thought I might beg forgiveness for overstraining your time.

       -- Markus
