Entangled Models [Was: MCT - model document delivery.]

François Bonnarel francois.bonnarel at astro.unistra.fr
Wed Oct 7 00:13:32 CEST 2020


Dear Markus,

Sorry for the late reply.

On 29/09/2020 at 12:53, Markus Demleitner wrote:
> Dear François,
>
> On Wed, Sep 23, 2020 at 03:16:34PM +0200, François Bonnarel wrote:
>>> I'd still say the options I showed the other day would let us
>>> disentangle the data models even in this situation (and we can reflect
>>> that on the model level, too),
>> As far as I understood, in your serialization, for a given FIELD the client
>> has to check all the "annotation features" referring to it, which may become
>> complex and does not provide a unified view of the succession of intervals
>> (exempli gratia)
> If you could make that example a bit more explicit, it would perhaps
> help me understand

My initial example, on September 22nd, was a hypothetical extension of
ObsCore with time and spectral support:

>      But to go a little further: if we want to introduce a time or
> spectral "support" made of a list of intervals besides the
> coarse-grained "bounds" (em_min/em_max, t_min/t_max), we will end up
> with a lot of spectral coordinates and time slots which only a
> structured data model can describe.
Here the modelling view (independent of serialization or format) is
definitely top-down. The spectral or time characterization is made of
location, bounds and support. Bounds and support are in turn made of
intervals of spectral (or time) coordinates.

Each of these high-level views is decomposed into smaller elements,
references, etc.

I don't think it's possible to build a consistent model by starting
from a thing at the bottom and addressing the various model elements
independently.

I cannot imagine how we could distinguish an interval of coordinates
which is part of the "bounds" from one which is part of the "support".

>   how we could distinguish your -- and Mark's, in his mail from Sep 22:
>
>> This seems really inefficient to be doing this for every complex object in
>> the serialization.
> -- concerns.
>
> Meanwhile, let me try to explain why I believe you needn't be
> concerned.
>
> You see, what I believe a DM API of a VOTable library would offer is
> essentially two functions:
>
> get_instances(type_name) -- iterating over all instances (i.e.,
>    "templates", if you will) of type_name; this would give you objects
>    representing phot:PhotPoints, or coord:Positions, or
>    meas:Measurements, or whatever, in the VOTable, possibly to be
>    expanded into sequences from table rows, if there are FIELD
>    references in there.
>
> get_annotations(vot_object) -- iterating over all instances that a
>    FIELD, PARAM, or GROUP is part of.  So, if you called
>    get_annotations on your ra column, you might see it is in
>    a coord:Position (as the longitude), in a meas:Measurements (as the
>    expectation in a distribution), and perhaps in a ds:Target (as one
>    of the objects making up the location), and perhaps in a
>    coord2:Position (as, say, the first element of the location vector).
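For concreteness, here is a self-contained toy sketch of what those two
functions could look like from the client side. Only the names
get_instances and get_annotations come from Markus's description above;
the ToyVOTable class, the Annotation structure and the column names are
invented purely for the illustration:

    from collections import defaultdict
    from dataclasses import dataclass
    from typing import Dict, Iterator, List, Tuple

    @dataclass
    class Annotation:
        dmtype: str             # e.g. "coord:Position"
        roles: Dict[str, str]   # role name -> referenced FIELD id

    class ToyVOTable:
        def __init__(self, annotations: List[Annotation]):
            self._annotations = annotations
            self._by_field = defaultdict(list)
            for ann in annotations:
                for role, field_id in ann.roles.items():
                    self._by_field[field_id].append((role, ann))

        def get_instances(self, type_name: str) -> Iterator[Annotation]:
            # All annotated instances of a given dmtype.
            return (a for a in self._annotations if a.dmtype == type_name)

        def get_annotations(self, field_id: str) -> Iterator[Tuple[str, Annotation]]:
            # All (role, instance) pairs a given FIELD takes part in.
            return iter(self._by_field[field_id])

    vot = ToyVOTable([
        Annotation("coord:Position", {"longitude": "ra", "latitude": "dec"}),
        Annotation("meas:Measurement", {"value": "ra", "error": "e_ra"}),
    ])

    for role, ann in vot.get_annotations("ra"):
        print("ra acts as the", role, "of a", ann.dmtype)

In this toy form a single FIELD ("ra") is indeed reachable from several
independent instances, which is the disentangled view argued for above.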

It may work. But if we do so, we don't give an accurate definition of
what a "Target" is in astronomy.

Does it have to contain a position? Does it have to contain a name? An
object type?

I'm confident that this would rapidly lead to inconsistencies.

What we really want to do with the modelling is provide an organized
view on top of our data.

Concepts exist and encompass other concepts in complex relationships.

That's what we want the model to tell us.

Cheers

François

>
> Based on what I've had to pull out of data models (or their
> stand-ins), I believe this lets clients work fairly neatly. Use cases
> I'm thinking of:
>
> * Figure out value and error when plotting
> * Find out the proper motions and epoch to use when
>    epoch-transforming positions
> * Find the photometry points in a time series
> * Figure out the axis metadata on a cube.
>
> In case you're curious how I think a client would use this API (and,
> in particular, disentangled DMs) in any of these cases, feel free to
> say so.
>
>>> Does this mean you are arguing that we will just have to deal with
>>> the fact that we can never have a new major version on the
>>> fundamental data models without pushing everything else up another
>>> major version, the way things ended up in Registry (see my other
>>> mail)?
>> For the model itself and its vodml description: yes.
> Let me ask back here (because I find that consequence rather
> unwelcome): Say we have the following dependency graph on the DMs:
>
> coord - meas - dataset - cube
>             `---- phot ----'
>
> (indulge me if this isn't quite what's in the pipeline right now),
> Are you really advocating that once we make coord2, we will have
> to have meas2, dataset2, cube2, and phot2 as well, even if nothing in
> them changes?
>
> And if we need phot2, we will have to have cube2, that then
> references the existing coord, meas, and dataset instances directly?
>
> Mind you, I give you there are situations in which you have to
> entangle DMs.  What I'm saying is that we should rather view such
> situations as indicators that refactoring things with a goal to
> disentangle the DMs might be advantageous.  And anyway, we shouldn't
> view that as the stated goal of our work.
>
>> if we fully serialize instances from scratch and populate them from some
>> data processing, each version of the model would have to be reflected in a
>> new serialisation. But this would be the case for any kind of data model
>> also outside IVOA or astronomy.
> True, because this kind of thing cannot (or so I think) be fixed in
> the serialisation.  It needs to be fixed in the DMs' architectures --
> which is the point I've been trying to make with respect to
> measurement.  Of course, having a serialisation defined would make
> this discussion a lot more concrete, which would hopefully give us
> more eyeballs -- and that's why I've been pushing for a serious
> proposal for the mapping syntax so much.
>
> But the fundamental question of DM design -- entangled or isolated --
> is orthogonal.  And perhaps we can make this discussion a bit more
> concrete again if we try to work out where entangling coords and
> measurements gives a benefit proportional to the cost involved.
>
>           -- Markus
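To make the versioning cost discussed above concrete, here is a small,
purely illustrative Python sketch that propagates a major-version bump
through the dependency graph Markus quotes (coord - meas - dataset -
cube, with phot depending on meas and feeding cube). It describes the
argument only, not any actual IVOA versioning rule:

    # Each model maps to the set of models it directly depends on.
    DEPENDS_ON = {
        "meas": {"coord"},
        "dataset": {"meas"},
        "phot": {"meas"},
        "cube": {"dataset", "phot"},
    }

    def models_to_bump(changed):
        """Every model that directly or transitively depends on `changed`."""
        bumped = set()
        frontier = {changed}
        while frontier:
            frontier = {m for m, deps in DEPENDS_ON.items()
                        if deps & (frontier | bumped)} - bumped
            bumped |= frontier
        return bumped

    print(sorted(models_to_bump("coord")))  # ['cube', 'dataset', 'meas', 'phot']
    print(sorted(models_to_bump("phot")))   # ['cube']

Under the entangled reading, a coord2 would thus drag meas, dataset,
phot and cube along with it even if nothing in them changes.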

