<div dir="ltr">Markus,<br><br><br><div><div class="gmail_extra"><div class="gmail_quote">On Tue, Mar 21, 2017 at 9:35 AM, Markus Demleitner <span dir="ltr">&lt;<a href="mailto:msdemlei@ari.uni-heidelberg.de" target="_blank">msdemlei@ari.uni-heidelberg.de</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Dear DM,<br>

<span class=""><br>

<br>

On Mon, Mar 20, 2017 at 03:47:55PM -0400, CresitelloDittmar, Mark wrote:<br>

&gt; In the cube model, I want to say: &quot;A DataProduct has one or more Coordinate<br>

&gt; system specifications, and the DataProduct owns its instances of CoordSys&quot;<br>

<br>

</span>I think we&#39;re getting to the bottom of what we&#39;re trying to work<br>

out here: *why* do you want to say this?  What I&#39;m trying to argue in<br>

my parallel mail<br>

<a href="http://mail.ivoa.net/pipermail/dm/2017-March/005492.html" rel="noreferrer" target="_blank">http://mail.ivoa.net/<wbr>pipermail/dm/2017-March/<wbr>005492.html</a> (look for<br>

&quot;For illustration&quot;) is that an object about which you&#39;d say such things<br>

isn&#39;t what&#39;s actually useful for clients.  These, rather, need<br>

annotation topical for what they&#39;re trying to do (data structure for<br>

a cube plotter, axis/frame metadata for data merging component,<br>

dataset metadata for an ingestor or a bibliography component).<br>

<br>

The only reason I can see to have a &quot;God Object&quot; that gobbles up all<br>

these individual annotations could be some sort of validation<br>

component, as you argue here:<br>

<span class=""><br>

&gt; My impression is not that you object to the items per se, but rather that<br>

&gt; they are explicitly connected in the model.. that it would be sufficient to<br>

&gt; simply serialize a coordsys instance in my cube, and since CoordSys is a<br>

&gt; valid, modeled object, that is all I need to do.  If this is so.. what is<br>

&gt; lost is the ability to validate the data product.  How do I know if the<br>

&gt; instance has all the expected components?<br>

<br>

</span>First, for me, yes it&#39;s the coupling of the various models I&#39;m<br>

worried about.<br>

<br>

On the validation: What&#39;s actually relevant to a given client is that<br>

a given annotation is what it expects, e.g., frame metadata for the<br>

merge component I have imagined in the use case in the cited mail.<br>

For the merge component, an NDCube annotation is unimportant, as is<br>

the Dataset annotation; when there&#39;s good STC annotation, it is good<br>

to go. <br></blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

Now, having one big data model you&#39;re validating against would mean<br>

that a dataset can be invalid although the STC annotation is<br>

perfectly good.  The hypothetical component merging time series with<br>

different time scales would simply work although it&#39;s not a<br>

&quot;DataProduct&quot; in your sense.  If it asked a validator, the validator<br>

would say: &quot;No, this dataset is broken, keep your fingers off&quot;.  So,<br>

the validator isn&#39;t useful to the merge component, and that would be<br>

a pity.<br></blockquote><div><br></div><div>I consider the validation requirement a pretty important one.. it allows<br></div><div>  * an application like IRIS to verify that the product being read is compatible with the code&#39;s expectations<br></div><div>  * folks like &#39;Operations&#39; to check that a data provider is producing what they say they are<br></div><div><br></div><div>IMO, there should be a concept of &#39;this is a valid Spectrum instance&#39;.<br><br></div><div>Your &#39;merge component&#39; application would not be checking whether the product is a valid NDCube; <br></div><div>it would be validating against STC.. which would presumably validate all the STC instances.<br> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

What I&#39;m trying to sell is the concept that you validate *individual*<br>

annotations.  Based on this, clients can fairly reliably figure out<br>

whether or not they&#39;ll work.  For instance, something that has valid<br>

NDCube annotation can be used by a cube plotter even if it has<br>

missing or bad STC annotation.  </blockquote><div><br></div><div>I know this is just an example.. but how could a plotter work without valid Coordinate (value+error) annotation, which is not in the cube model?<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Conversely, regardless of the status<br>

of the Dataset annotation, a time series merge tool will work just as<br>

long as at least one STC annotation it understands is valid. <br></blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

In other words: I&#39;m proposing to abandon the hope that &quot;This dataset<br>

is valid&quot; will be a statement useful beyond management and<br>

beancounting.  Instead, I hope we&#39;ll see &quot;This dataset has valid<br>

STC-1, STC-2, photometry-1, Dataset-1, and NDCube-1 annotations&quot;,<br>

which tells concrete software if whatever annotation(s) it needs are<br>

all right.<br></blockquote><div><br></div><div>I think we need more input from the client/Applications side... to me, this feels like an interoperability nightmare (though I think you argue the opposite).  An application would need to check that the instance contains valid annotation for every component that it uses, rather than just knowing it is OK by seeing it is an NDCube-1 instance.<br></div><div><br> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

[Jiri&#39;s plan to reference &quot;good enough&quot; objects]<br>

<span class="">&gt; To do what I think you are suggesting, would require a change to the VO-DML<br>

&gt; specification.<br>

<br>

</span>Well, it would if what we were really after is what Jiri may have hinted<br>

at in his mail of Mon, 13 Mar 2017 11:14:13 +0100:<br>

<br>

ji&gt; model, that means the serialization of my data will change if that model<br>

ji&gt; changes. That doesn&#39;t mean, however, that I need to &quot;embed&quot; it into my data<br>

ji&gt; model, my data model is not changing if the one I am dependent on changes.<br>

<br>

If this means &quot;I reference an object in my DM, and if that object has<br>

incompatible changes, all remains fine&quot;, then I agree VO-DML would<br>

need to change; I don&#39;t think we have the equivalent of void* at this<br>

point (I think we&#39;re all in agreement that minor changes to DMs will<br>

by definition never break embedding data models, right?).<br>

<br>

By just exploiting co-reference, we can, however, avoid these<br>

potentially model-uprooting cross-model references *and*,<br>

additionally, gain the flexibility to combine annotations from<br>

various different annotations.<br>

<br>

Consider, for instance, a dataset that has an annotation<br>

<br>

  NDCube-1<br>

    independent_axes: dateObs<br>

    dependent_axes: whatever<br>

<br>

  STC-1<br>

    Frame<br>

      TT<br>

      BARYCENTER<br>

    value: dateObs<br>

<br>

  STC-2<br>

    CooClass<br>

      Time<br>

    Frame<br>

      timeScale TT<br>

      IncompatibleNiftyThing HighMagic<br>

    value: dateObs<br>

<br>

With this annotation, all clients knowing NDCube-1 and *either* of<br>

STC-1 and STC-2 have a complete annotation.<br></blockquote><div><br></div>I can see that there would be value in being able to do this.<br></div><div class="gmail_quote">My objection is simply that enabling this means changing the vo-dml standard, which would be a huge hit at this point.<br><br></div><div class="gmail_quote">Here, dateObs is, presumably, a set of Time Coordinates.. <br></div><div class="gmail_quote">  by vo-dml, the role independent_axes must have a type.  If that type is not defined in the same model itself, it is <br></div><div class="gmail_quote">  identified by the model which does define it ( coords:Coordinate as a generic base ).  That is a specific major version of <br></div><div class="gmail_quote">  the coords model with vo-dml/XML documentation.<br></div><div class="gmail_quote">To be a valid vo-dml model, that linkage must exist.  This is at the model level.. which then constrains the annotation.<br></div><div class="gmail_quote"><br></div><div class="gmail_quote">An instance can have this annotation, but they would define independent instances.  I don&#39;t understand how the &#39;co-reference&#39; mechanism works.<br></div><div class="gmail_quote">How does an application know that they are the same thing? (question repeated from earlier msg, so I&#39;ll pause there)<br></div><div class="gmail_quote"> <br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

Were dependent_axes to reference either the STC-1 or the STC-2<br>

annotation rather than directly dateObs, a client implementing<br>

NDCube-1 would be tightly bound to know whatever STC version is<br>

&quot;baked into&quot; NDCube.<br>

<br>

If you&#39;ve ever implemented against our current SCS standard and<br>

cursed because you have to write ancient VOTable 1.1 you&#39;ll have an<br>

idea why I&#39;m howling when contemplating such a practice.<br>

<span class=""><br>

<br>

&gt; It boils down to a collection of Coordinate-s, the Coordinate has reference<br>

&gt; back to the Frame/Axis metadata.<br>

<br>

</span>For the record, I believe the Frame metadata should be embedded and<br>

not referenced, but that&#39;s mainly for ease of implementation.<br>

<br>

The central point where we appear to differ is that I am convinced we<br>

should try hard to make it a collection of native entities (in VOTable:<br>

FIELDs or PARAMs; FITS axes would be another example) that receive<br>

the Axis annotations from other annotations.<br>

<span class=""><br>

<br>

&gt; &gt;&gt; The premise is that a DataProduct should OWN all of its coordinates/data.<br>

&gt; &gt;&gt; The vo-dml rules for composition state that a class/object may not be in<br>

&gt; &gt;&gt; more than one composition relation.<br>

<br>

</span>-- which only applies to annotations, not to the annotated native<br>

entities themselves.  A VOTable FIELD can certainly have multiple<br>

annotations, and there&#39;s no concept of ownership there.<br>

<span class=""><br>

&gt; &gt;&gt; Since there are multiple types of Data Axis types, I modeled it this<br>

&gt; &gt;&gt; way.. where the DataProduct owns ALL its data (Observables), and the data<br>

&gt; &gt;&gt; axis types (DataAxis, DependentAxis) are organizational objects which refer<br>

&gt; &gt;&gt; to the instances of the same axis.<br>

&gt; &gt;&gt;<br>

&gt; &gt;&gt; This could be organized differently.. having the Observables owned by the<br>

&gt; &gt;&gt; DataAxis (which is directly or indirectly owned by the DataProduct), and<br>

&gt; &gt;&gt; extend that for various types of axis.. adding constraints as needed.  The<br>

<br>

</span>What I&#39;m still unsure about: is there any reason beside the<br>

&quot;one-stop&quot; validation for why DataProduct needs to worry about the<br>

details of the axes (i.e., &quot;physics&quot; as covered by models like STC,<br>

Photometry, and possibly many others) rather than just &quot;This axis<br>

value is in this column&quot;.  If there is, what is it?  If there&#39;s not,<br>

I think the whole complication of having to work out ownership<br>

relationships would go away (and this point 2 from the bottom of your<br>

mail -- one less issue to solve is always a good thing, no?).<br></blockquote><div><br></div><div>It doesn&#39;t worry about them.  It points to a generic base for the detailed types.<br></div><div>Any implementation of that type can be used.  By linking it to a base, it lets <br></div><div>applications know that there are certain elements one can always expect to have<br></div><div>available.  If I know the value is a coords:Coordinate, then I can expect a certain<br></div><div>structure and some content, even if I don&#39;t know the specific coordinate flavor.<br></div><div>E.g., if I get a Flux Coordinate, which is not one of the domains I understand, I can<br></div><div>still use it to a high degree for various applications.<br><br></div><div>The ownership relations are there for various applications which implement the model.<br></div><div>When implementing a library, I would want to know when it is safe to free the memory space<br></div><div>for particular elements.  I think this is most true for database applications, but that is <br></div><div>outside my wheelhouse.<br> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<span class=""><br>

&gt; &gt;&gt; I want to note one distinction.  The DataAxis here, is NOT the same as a<br>

&gt; &gt;&gt; coordinate space axis.<br>

&gt; &gt;&gt; If I have a 3D cartesian Space, with coordinate axes x,y,z.. there is 1<br>

&gt; &gt;&gt; DataAxis referring to a Position3D in that space.<br>

<br>

</span>Uh -- that sounds... dangerous.  In the spirit of my preference to<br>

ideally reference native entities (i.e., FIELDs here): How does this<br>

DataAxis grouping help a client?  What is it supposed to do with it?<br>

How does the grouping help it over just having three axes (that, of<br>

course, might still be related through one or more separate STC<br>

annotations, but I&#39;d like that to be uncorrelated if at all<br>

possible).<br>

<span class=""><br></span></blockquote><div><br></div><div>1 FIELD -&gt; 1 DataAxis (Coordinate) works fine only for the simplest case (1D value with no errors).<br></div><div>For the 2D/3D cases, the errors may be correlated, so the bundle of FIELDs for <br>the value must be grouped above the errors.  And then there are the errors...<br></div><div>A &#39;2D Coordinate&#39; with 2 sources of error, both symmetric.. would have 4 FIELDs<br></div><div>feeding the DataAxis/Coordinate content  ( x, y, xy_staterr, xy_ranerr ).<br></div><div><br> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">

&gt; &gt;&gt; So, I see we have 2 points of discussion for the cube model itself<br>

&gt; &gt;&gt;   1) relation between Dataset and DataProduct<br>

&gt; &gt;&gt;       Currently modeled as according to Section 3.. extend Dataset add<br>

&gt; &gt;&gt; reference to DataProduct == MyDataset<br>

&gt; &gt;&gt;<br>

&gt; &gt;&gt;       Alternates include:<br>

&gt; &gt;&gt;         a) loose coupling<br>

&gt; &gt;&gt;             verbal statement that MyDataset includes an instance of<br>

&gt; &gt;&gt; Dataset + instance of MyDataProduct<br>

&gt; &gt;&gt;         b) referenced coupling<br>

&gt; &gt;&gt;             MyDataSet == reference to Dataset + reference to MyDataProduct<br>

&gt; &gt;&gt;             (allows validators to know what is expected, but allows<br>

&gt; &gt;&gt; flexibility w.r.t. Dataset flavor )<br>

&gt; &gt;&gt;<br>

&gt; &gt;&gt;      I personally think a) is too loose, but b) might be a good way to<br>

&gt; &gt;&gt; go..<br>

<br>

</span>But why couple it at all?  There are perfectly valid use cases where<br>

you want Dataset without NDCube and where you want NDCube without<br>

Dataset; to me, that&#39;s a clear indication that they should live next<br>

to each other, both being first class citizens that can be validated<br>

independently of each other.<br>

<br>

Cheers,<br>

<br>

             Markus<br>

<br>

[who&#39;s aware there&#39;s still another unanswered message -- sorry]<br></blockquote><div><br>Looking forward to it.. Cheers,<br></div><div>mark<br><br></div></div><br></div></div></div>