Entangled Models [Was: MCT - model document delivery.]

Tue Sep 29 12:53:27 CEST 2020

Dear François,

On Wed, Sep 23, 2020 at 03:16:34PM +0200, François Bonnarel wrote:
> > I'd still say the options I showed the other day would let us
> > disentangle the data models even in this situation (and we can reflect
> > that on the model level, too),
> 
> As far as I understood, in your serialization, for a given FIELD the client
> has to check all the "annotation features" referring to it, which may become
> complex and does not provide a unified view of the succession of intervals
> (exempli gratia)

If you could make that example a bit more explicit, it would perhaps
help me understand your -- and Mark's, in his mail from Sep 22:

> This seems really inefficient to be doing this for every complex object in
> the serialization.

-- concerns.

Meanwhile, let me try to explain why I believe you needn't be
concerned.

You see, what I believe a DM API of a VOTable library would offer is
essentially two functions:

get_instances(type_name) -- iterating over all instances (i.e.,
  "templates", if you will) of type_name; this would give you objects
  representing phot:PhotPoints, or coord:Positions, or
  meas:Measurements, or whatever, in the VOTable, possibly to be
  expanded into sequences from table rows, if there are FIELD
  references in there.

get_annotations(vot_object) -- iterating over all instances that a
  FIELD, PARAM, or GROUP is part of.  So, if you called
  get_annotations on your ra column, you might see it is in
  a coord:Position (as the longitude), in a meas:Measurements (as the
  expectation in a distribution), and perhaps in a ds:Target (as one
  of the objects making up the location), and perhaps in a
  coord2:Position (as, say, the first element of the location vector).
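To make the shape of such an API a bit more tangible, here is a minimal Python sketch of these two functions.  It is purely illustrative: the classes DMInstance and AnnotationIndex, the role names, and the representation of VOTable objects as plain ID strings are all hypothetical, and the expansion of FIELD references into per-row sequences is left out.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DMInstance:
    """One annotated instance ("template"), e.g. a coord:Position.

    Hypothetical, simplified representation: roles maps attribute
    names (e.g. "longitude") to IDs of VOTable FIELDs/PARAMs/GROUPs.
    """
    type_name: str
    roles: dict = field(default_factory=dict)


class AnnotationIndex:
    """Hypothetical DM API over all annotations in one VOTable."""

    def __init__(self, instances):
        self._instances = list(instances)
        # Reverse index: VOTable object ID -> [(instance, role), ...]
        self._by_object = defaultdict(list)
        for inst in self._instances:
            for role, vot_id in inst.roles.items():
                self._by_object[vot_id].append((inst, role))

    def get_instances(self, type_name):
        """Iterate over all instances of type_name."""
        return (inst for inst in self._instances
                if inst.type_name == type_name)

    def get_annotations(self, vot_id):
        """Iterate over (instance, role) pairs vot_id is part of."""
        return iter(self._by_object.get(vot_id, []))


# Illustrative annotations of an ra column in several (disentangled) DMs.
index = AnnotationIndex([
    DMInstance("coord:Position", {"longitude": "ra", "latitude": "dec"}),
    DMInstance("meas:Measurements", {"expectation": "ra", "error": "ra_err"}),
    DMInstance("coord2:Position", {"location[0]": "ra", "location[1]": "dec"}),
])

for inst, role in index.get_annotations("ra"):
    print(inst.type_name, role)
# coord:Position longitude
# meas:Measurements expectation
# coord2:Position location[0]
```

Note that coord and coord2 annotations simply sit side by side here; neither the index nor the meas annotation needs to know the other exists.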

Based on what I've had to pull out of data models (or their
stand-ins), I believe this lets clients work fairly neatly. Use cases
I'm thinking of:

* Figure out value and error when plotting
* Find out the proper motions and epoch to use when
  epoch-transforming positions
* Find the photometry points in a time series
* Figure out the axis metadata on a cube.
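For the first of these, a client plotting a column would only need to ask that column for its annotations and pick out a measurement in which it plays the expectation.  Here is a sketch, again in Python and again assuming a hypothetical, flattened annotation structure (type name plus a role-to-column mapping), a made-up helper value_and_error, and made-up column names:

```python
# Hypothetical, simplified annotations: (type_name, {role: column_name})
ANNOTATIONS = [
    ("coord:Position", {"longitude": "ra", "latitude": "dec"}),
    ("meas:Measurements", {"expectation": "mag", "error": "mag_err"}),
]


def get_annotations(column):
    """Yield (type_name, roles, role) for each instance column is part of."""
    for type_name, roles in ANNOTATIONS:
        for role, col in roles.items():
            if col == column:
                yield type_name, roles, role


def value_and_error(column, row):
    """Return (value, error) for plotting; error is None if not annotated."""
    for type_name, roles, role in get_annotations(column):
        # Only meas annotations matter here; coord ones are simply skipped.
        if type_name.startswith("meas:") and role == "expectation":
            err_col = roles.get("error")
            return row[column], (row[err_col] if err_col else None)
    return row[column], None


row = {"ra": 10.68, "dec": 41.27, "mag": 12.3, "mag_err": 0.05}
print(value_and_error("mag", row))  # (12.3, 0.05)
print(value_and_error("ra", row))   # (10.68, None)
```

The client never has to understand coord here; it only looks for the meas annotation it cares about.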

In case you're curious how I think a client would use this API (and,
in particular, disentangled DMs) in any of these cases, feel free to
say so.

> > Does this mean you are arguing that we will just have to deal with
> > the fact that we can never have a new major version on the
> > fundamental data models without pushing everything else up another
> > major version, the way things ended up in Registry (see my other
> > mail)?
> 
> For the model itself and its VO-DML description: yes.

Let me ask back here (because I find that consequence rather
unwelcome): Say we have the following dependency graph on the DMs:

coord - meas - dataset - cube
           `---- phot ----'

(indulge me if this isn't quite what's in the pipeline right now):
are you really advocating that once we make coord2, we will have
to have meas2, dataset2, cube2, and phot2 as well, even if nothing in
them changes?
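To spell out the cascade I'm worried about: if every model references its dependencies' types directly, a new major version of one model forces new major versions of everything that transitively depends on it.  A small Python sketch, where the DEPENDS_ON table is just my reading of the graph above (an illustrative assumption, not the actual IVOA model set) and forced_major_bumps is a made-up name:

```python
from collections import deque

# Direct dependencies ("X references types from Y"), read off the graph
# above; illustrative only.
DEPENDS_ON = {
    "coord": set(),
    "meas": {"coord"},
    "phot": {"meas"},
    "dataset": {"meas"},
    "cube": {"dataset", "phot"},
}


def forced_major_bumps(changed, depends_on=DEPENDS_ON):
    """Models needing a new major version when `changed` gets one,
    assuming each model references its dependencies' types directly."""
    # Invert the graph: model -> models that depend on it.
    dependents = {model: set() for model in depends_on}
    for model, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(model)
    # Breadth-first walk over the reverse edges.
    bumped, queue = set(), deque([changed])
    while queue:
        for user in dependents[queue.popleft()]:
            if user not in bumped:
                bumped.add(user)
                queue.append(user)
    return bumped


print(sorted(forced_major_bumps("coord")))  # ['cube', 'dataset', 'meas', 'phot']
print(sorted(forced_major_bumps("phot")))   # ['cube']
```

If meas instead annotated VOTable columns rather than referencing coord types (i.e., DEPENDS_ON["meas"] is empty), forced_major_bumps("coord") comes back empty, which is the whole point of disentangling.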

And if we need phot2, we will have to have cube2, which then
references the existing coord, meas, and dataset instances directly?

Mind you, I grant that there are situations in which you have to
entangle DMs.  What I'm saying is that we should rather view such
situations as indicators that refactoring things with the goal of
disentangling the DMs might be advantageous.  And in any case, we
shouldn't view entanglement as the stated goal of our work.

> if we fully serialize instances from scratch and populate them from some
> data processing, each version of the model would have to be reflected in a
> new serialisation. But this would be the case for any kind of data model
> also outside IVOA or astronomy.

True, because this kind of thing cannot (or so I think) be fixed in
the serialisation.  It needs to be fixed in the DMs' architectures --
which is the point I've been trying to make with respect to
measurement.  Of course, having a serialisation defined would make
this discussion a lot more concrete, which would hopefully give us
more eyeballs -- and that's why I've been pushing for a serious
proposal for the mapping syntax so much.

But the fundamental question of DM design -- entangled or isolated --
is orthogonal.  And perhaps we can make this discussion a bit more
concrete again if we try to work out where entangling coords and
measurements gives a benefit proportional to the cost involved.

         -- Markus