Time Series Cube DM - IVOA Note

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Mon Mar 27 19:55:57 CEST 2017


Dear DM,

I agree with Omar that concrete serialisations are what we should now
look at; let's do some!

Meanwhile, a couple of more abstract points might still merit a bit
of thought.

On Wed, Mar 22, 2017 at 10:49:51AM -0400, Laurino, Omar wrote:
> There is no God model in VO-DML. If anything, exactly the opposite, you
> have building blocks that interoperate nicely with each other, as long as

Not in VO-DML as such, but you can build large and complex data
models covering lots of different and not overly tightly correlated
parts of the world in VO-DML.  And though importing VO-DML is a lot
better and more maintainable than the opaque utype manipulations of
yonder day, I think we are basically in agreement that it's better
if we still keep any individual data model as small and tidy as we
possibly can.

> What you can't avoid in some cases is to tightly couple two models, the
> same way you tightly couple, just to make an example, an application to

Yes.  Quantity and STC are two that need to be referenced a lot.  But
that referencing is a liability, dictated by the fact that values and
errors come in all kinds of contexts, many far removed from anything
to do with space and time.  But even worse than references into data
models are (at some point certainly incompatible) parallel
implementations, so clearly these need referencing.

That is also why I think the equation coordinate = quantity that
Mark makes in a parallel branch of this thread means that this part
of the model needs to move "up"; we'll have this pattern for
temperatures and photometry as well (and whatever else), and their
errors will be correlated as well, and people will want to represent
their statistical properties as well, and perhaps correlate value and
derivative, and so on.  It would be a shame if STC and photometry,
say, had different models for their, well, quantities.
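To make the pattern concrete (purely a sketch; the class and
attribute names here are invented for this mail and come from no
existing VO model), one generic quantity structure could serve
coordinates, temperatures, and photometry alike:

```python
# Illustrative sketch only: the names (Quantity, value, error, unit,
# correlations) are invented here, not taken from any VO data model.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Quantity:
    """A value with an error and optional correlations to other quantities."""
    value: float
    error: Optional[float] = None
    unit: Optional[str] = None
    # correlation coefficients keyed by the name of the related quantity
    correlations: dict = field(default_factory=dict)


# The same structure works across very different domains:
ra = Quantity(23.3, error=0.01, unit="deg")
temperature = Quantity(5772.0, error=20.0, unit="K")

# ...and lets one express, e.g., a value correlated with its derivative:
flux = Quantity(1.2e-13, error=3e-15, unit="erg.s**-1.cm**-2")
flux.correlations["flux_derivative"] = 0.7
```

The point is only that nothing in the value/error/correlation part
is specific to space and time, which is why it arguably belongs in a
model "above" STC.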

> some version(s) of Python, working around any incompatible differences. Can
> you assure me that the current DaCHS release will be compatible with all
> the versions of Python from now to eternity? Did everybody's JAVA code
> evolve seamlessly when commons-lang3 was released as a major, incompatible
> replacement for commons-lang?

No, but things become horrible if I not only depend on a small subset
of python versions, but also on a small subset of versions of
openssl, node.js, systemd, a2ps, latex, GNU make, TOPCAT, apache,
gtk, and cairo *at the same time* and *altogether*.  This is when
people start to use single Docker containers containing different,
incompatibly broken, versions of all the various dependencies, which
in this case produces the unmaintainable and unpredictably exploitable
mess we're seeing out there.

Our standards live a lot longer than the last new-fangled
distributed peer-to-peer NoSQL social blockchain web glitz.  And
therefore for us dependencies are even more expensive.

> should be reduced as much as possible. Decoupling DatasetDM and CubeDM, for
> instance, might work pretty well. I am skeptical that you can do the same

Right, let's do that, then.

> the time. Major releases should be baked only when really necessary to add
> new value (e.g. STC2) and new models should be baked only to cover new
> domains (Source, Cube) or to provide site-specific extensions to baseline,

Well, yes.  But of course when someone does a major version of a
model you depend on, you'll have to issue a new major version of your
own data model (and break *your* clients in turn) where otherwise you
could probably have just left your DM alone.  So, my urging is
exactly to avoid having to build a lot of major DM versions 
just because a new major version was necessary for one DM (ivoa,
quantity, and STC excepted).

> But I don't think your point is valid to begin with: we now have a
> framework that allows us to swiftly and neatly produce new models, at least

I'm not talking about swiftly -- but the stuff we're writing now
should still work fine 10 years from now and should have a fair
chance of working 20 years from now.  That's why I think we always
have to keep evolvability in view.

> By the way, I agree we need a model for complex quantities. Inside or
> outside STC I don't think I care too much, as long as they are defined
> independently of specific domains for a broad range of usages.

Because it'll be used in essentially all data models (and, of course,
standalone quite a bit), and because I believe that one will see
quite a few minor versions (getting this right is, I think, the Holy
Grail of data modeling), I'd much rather see an extra DM just devoted
to it.

I think pulling out the generic stuff from STC will get us a long way
towards a good start for quantity, and I'm not averse to specifying
quantity in a REC together with STC.  But I shouldn't need to pull in
STC and its frames (not to mention transformations and geometry) just
to express that I have something with a value, an error, and whatever
else.

Oh, by the way, this *is* something I'd like to change in the VO-DML
spec: I think ivoa:quantity is not very helpful while hogging the term in
a way that might become confusing later.  Is anyone really attached
to it?  Using it already, perhaps?

> The references to your "native" elements don't strike me as particularly
> interoperable, unless they are formalized so they can work with different
> formats and proper mappings. One thing is to say that "the type dali:Point

...but we need to do that formalisation anyway to make VO-DML useful,
no?  And we're already doing it for VOTable, which isn't too shabby:
Once we've figured out how things work there, we'll have a much
easier time with other formats.

> is mapped to a such and such PARAM in VOTable", a different one (that I
> don't like) is to say "there is something over there that I call <point>,
> but you don't necessarily know what it is, except that it might be a DALI
> point, but again, not sure".

As usual, I see that from the perspective of a data provider.  If I
have points or polygons in my database tables, people can select
them.  If they do, I want to tell them what the reference frames of
these geometries are (at the very least).

So, we won't get around this piece of annotation.  If we can do it
without having extra magic in the DM, that'd be great.  I *think* the
VODML mapping is almost up to it.  In classic STC (the 2009 note),
for instance, one could have said (this is a cartoon using newish
serialisation):

  <GROUP vodml-type="stc:Coordinate">
    <PARAMref vodml-role="value" vodml-type="Coord2" ref="pt"/>
  </GROUP>
  <PARAM ID="pt" name="pt" xtype="point" datatype="double"
    arraysize="2" value="23.3 41"/>

  <GROUP vodml-type="stc:Coordinate">
    <GROUP vodml-role="value" vodml-type="Coord2">
      <PARAMref vodml-role="C1" ref="ra"/>
      <PARAMref vodml-role="C2" ref="dec"/>
    </GROUP>
  </GROUP>
  <PARAM ID="ra" name="ra" datatype="double" value="23.3"/>
  <PARAM ID="dec" name="dec" datatype="double" value="41"/>

and effect essentially the same thing in a client.  I don't think
there's much keeping us from doing the same thing with current VO-DML
mapping.
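For what it's worth, here is a toy client along those lines (again a
sketch: it uses the element and attribute names from the cartoon
above, not any finished mapping spec) that resolves both
serialisations to the same coordinate pair:

```python
# Toy resolver for the two cartoon serialisations of stc:Coordinate;
# the vodml-* attribute names follow the cartoon in this mail only.
import xml.etree.ElementTree as ET

DOC = """<RESOURCE>
  <GROUP vodml-type="stc:Coordinate">
    <PARAMref vodml-role="value" vodml-type="Coord2" ref="pt"/>
  </GROUP>
  <PARAM ID="pt" name="pt" xtype="point" datatype="double"
    arraysize="2" value="23.3 41"/>

  <GROUP vodml-type="stc:Coordinate">
    <GROUP vodml-role="value" vodml-type="Coord2">
      <PARAMref vodml-role="C1" ref="ra"/>
      <PARAMref vodml-role="C2" ref="dec"/>
    </GROUP>
  </GROUP>
  <PARAM ID="ra" name="ra" datatype="double" value="23.3"/>
  <PARAM ID="dec" name="dec" datatype="double" value="41"/>
</RESOURCE>"""


def get_coordinates(root):
    """Return (c1, c2) tuples for all stc:Coordinate groups, whether the
    value is one array-valued PARAM or two scalar PARAMs."""
    params = {p.get("ID"): p for p in root.iter("PARAM")}
    results = []
    for group in root.iter("GROUP"):
        if group.get("vodml-type") != "stc:Coordinate":
            continue
        # case 1: value is a PARAMref to an array-valued PARAM
        for pref in group.findall("PARAMref"):
            if pref.get("vodml-role") == "value":
                vals = params[pref.get("ref")].get("value").split()
                results.append(tuple(float(v) for v in vals))
        # case 2: value is a nested GROUP with per-component PARAMrefs
        for sub in group.findall("GROUP"):
            if sub.get("vodml-role") == "value":
                comps = {pr.get("vodml-role"):
                         float(params[pr.get("ref")].get("value"))
                         for pr in sub.findall("PARAMref")}
                results.append((comps["C1"], comps["C2"]))
    return results


print(get_coordinates(ET.fromstring(DOC)))
# both groups yield (23.3, 41.0)
```

A real client would of course have to follow the actual VO-DML
mapping document rather than this hand-rolled lookup, but the code
shows that nothing structural stands in the way.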

> I believe we agree on the general requirements, though, right? models and
> instances (annotations) should be as loosely coupled as possible.
> Applications that are aware of some model should find the relevant
> annotations in any file, while ignoring all the other models. We should
> have a framework that allows for agile definition of new models or
> revisions of old models, although we won't pop a new model every three
> weeks. And the goal is, above all, interoperability among models,
> applications, and services.

Absolutely.

         -- Markus

