<div dir="ltr">Dear DM,<div><br></div><div>I have gone through the ideas since my last response and would like to sum it up as follows. I think, generally, the discussion is going in the right direction - but sometimes I'm getting the impression we are talking about the same things and just calling them differently. An example of serialization would be exactly what we need now.</div><div><br></div><div>Most important points that I see in the discussion now:<br><br></div><div>1. Validation</div><div>I will not cite here as this one has multiple remarks in the conversation already. My opinion is that loosely coupled models are better - any client can then validate what he needs. It can validate STC, Photometry or Quantity separately and then if I want to validate TimeSeriesCube, I go into the TimeSeriesCube DM and see - OK for a valid TimeSeriesCube instance I need to have valid one STC and one Quantity (+ maybe Photometry) annotated in the dataset. So the TimeSeriesCube DM won't actually <b>contain</b> everything it requires for validation - it will just say what <b>other DMs</b> need to be valid as part of it.</div><div><br></div><div>This gives the clients a tool (not only a possibility) to validate only parts of the coming time series VOTable so Mark Taylor can now say - no this is not a valid Time Series data cube, but it has valid STC and Photometry annotations and I can work with those.</div><div><br></div><div><br></div><div>2. Quantity</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span style="font-size:12.8px">Agreed.. again, since your Quantity == my Coordinate, the 'coords' model would be that place.</span></blockquote><div><br></div><div>I must disagree with this, because in this particular case, I guess we are calling two different things one name. If I use your described objects:</div><blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" class="gmail_quote"><br> coordsys: == Coordinate Systems and Frame specification pattern + domain implementations<br> coords: == Coordinates specification ( value + errors ) <br> trans: == Transform model<br> cube: == NDCube model elements<br> ds: == DatasetMetadata elements.</blockquote><div><br></div><div>Then you have only value+error for coords, along with its coordsys (which can be delgated to STC, I will get to that later). For Quantity though, it's not only about annotating value and error, it's also about describing this quantity's distribution. So after put into your format:</div><div><br></div><div>coordsys: == ( value + errors ) - simple model of the data, not metadata</div><div>quantity: == ( value + errors ) + this quantity metadata ( mean + sigma + quartiles + ... ). The metadata is to help the users to decide what they want search for within these values, how to filter them. </div><div><br></div><blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" class="gmail_quote">What alternative would you suggest? 
<div><br></div><div><br></div><div>2. Quantity</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span style="font-size:12.8px">Agreed.. again, since your Quantity == my Coordinate, the 'coords' model would be that place.</span></blockquote><div><br></div><div>I must disagree with this, because in this particular case I think we are calling two different things by one name. If I use your described objects:</div><blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" class="gmail_quote"><br> coordsys: == Coordinate Systems and Frame specification pattern + domain implementations<br> coords: == Coordinates specification ( value + errors ) <br> trans: == Transform model<br> cube: == NDCube model elements<br> ds: == DatasetMetadata elements.</blockquote><div><br></div><div>Then coords carries only value + error, along with its coordsys (which can be delegated to STC; I will get to that later). Quantity, though, is not only about annotating value and error - it is also about describing the quantity's distribution. Put into your format:</div><div><br></div><div>coords: == ( value + errors ) - a simple model of the data, not metadata</div><div>quantity: == ( value + errors ) + this quantity's metadata ( mean + sigma + quartiles + ... ). The metadata helps users decide what they want to search for within these values and how to filter them.</div><div><br></div><blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" class="gmail_quote">What alternative would you suggest? <br>It is used in:<br> Dataset once<br> Coords many times<br><span style="font-size:12.8px"> CoordSys a few times</span></blockquote><div><br></div><div>This is actually not specific to the Cube, so I'd also vouch for putting it into a separate model, which for now would have only one page: a list of the statistical parameters we want to keep for describing a quantity, along with the value and error columns.</div>
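<div><br></div><div>As a first draft of that one page, a hand-written sketch (again, every identifier here is a placeholder for discussion, not a proposal):</div><div><br></div><div><GROUP vodml-type="quantity:Quantity"><br>  <FIELDref vodml-role="value" ref="mag"/><br>  <FIELDref vodml-role="error" ref="mag_err"/><br>  <!-- statistical metadata describing the distribution of the column --><br>  <PARAM vodml-role="stat.mean" name="mean" datatype="float" value="14.2"/><br>  <PARAM vodml-role="stat.stdev" name="sigma" datatype="float" value="0.8"/><br>  <PARAM vodml-role="stat.quartile" name="quartiles" datatype="float" arraysize="3" value="13.7 14.1 14.6"/><br></GROUP></div><div><br></div><div>A client filtering time series could then read just these PARAMs to decide whether a dataset is worth downloading at all, without touching the value columns.</div>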
<div><br></div><div>3. Extensibility, God-likeness, loose coupling</div><div><br></div><div>I'd pick out the following important statements.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span style="font-size:12.8px">Our standards live a lot longer than the last new-fangled<br></span><span style="font-size:12.8px">distributed peer-to-peer NoSQL social blockchain web glitz. And<br></span><span style="font-size:12.8px">therefore for us dependencies are even more expensive.</span></blockquote><div><br></div><div>The best way to make standards live longer is to keep them closed for modification (as few <b>major</b> versions as possible) but open for extension. If anybody realizes the data model does not provide enough for them, they can still extend it with custom attributes (staying compatible with the current version); once we have several of these extensions, we take their intersection, and what is common will form the baseline for a new <b>minor</b> version, bringing more standard tools to bear for the clients.</div>
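<div><br></div><div>For example (identifiers made up), a data provider who needs a skewness that a standard Quantity would not model could publish:</div><div><br></div><div><GROUP vodml-type="quantity:Quantity"><br>  <FIELDref vodml-role="value" ref="flux"/><br>  <FIELDref vodml-role="error" ref="flux_err"/><br>  <!-- custom attribute: standard clients ignore it; a candidate for the next minor version --><br>  <PARAM vodml-role="myext:skewness" name="skew" datatype="float" value="0.3"/><br></GROUP></div><div><br></div><div>Standard clients still see a valid Quantity; extension-aware clients get more.</div>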
<div><br></div><div>If we realize that we designed something incorrectly in a <b>major</b> version (made it too restrictive) and somebody needs to change attributes that are already part of the model, that is the expensive part, and we need to create a new <b>major</b> version. Still, with loose coupling we can keep annotations of both major versions in the same dataset to keep things backwards compatible - more on that in the serialization examples.</div><div><br></div><div>To sum it up, I would use the rule Petr mentioned:</div><div><br style="font-size:12.8px"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span style="font-size:12.8px">In using the 80/20 rule - we may not cover everything ...</span></blockquote><div><br></div><div>Exactly - we can cover 80 percent within the model, but to make our major versions stable, we need to predict where the remaining 20 percent might head and make the model easily extensible (non-restrictive) in that direction. Decoupling models into smaller parts helps a lot in this regard, because it is much easier to extend a small model than a God-like object.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br style="font-size:12.8px"><span style="font-size:12.8px">I think pulling out the generic stuff from STC will get us a long way<br></span><span style="font-size:12.8px">towards a good start for quantity, and I'm not adverse to specifying<br></span><span style="font-size:12.8px">quantity in a REC together with STC. But I shouldn't need to pull in<br></span><span style="font-size:12.8px">STC and its frames (not to mention transformations and geometry) just<br></span><span style="font-size:12.8px">to express that I have something with a value, an error, and whatever<br></span><span style="font-size:12.8px">else.</span></blockquote><div><br></div><div>Yes! We should not be afraid of pulling parts of models out into new stand-alone models. This is the most natural way of software engineering: you design a small object; you keep adding functionality to it; once you realize the object has become too big or has multiple responsibilities, you refactor part of it into a new object that is just referenced from the original.</div><div><br></div><div>"Is the object too big? Does it have multiple responsibilities?" - these questions need to be asked at every major version of a model, or we won't improve the quality of IVOA standards at all.</div></div><div><br></div><div>And yes, adding your own stuff to the model will be easier, as Omar wrote:</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span style="font-size:12.8px">At the same time, I think we should make sure that when such an update is really necessary, our framework makes it easy to update the downstream model. It's always a trade-off. The easier it is to adopt healthy patterns, the easier it is to fall into disruptive anti-pattern pits.</span></blockquote><div><br></div><div>But that's not a reason to make it harder. The task of the IVOA here should be to guide the people who are interested in using and extending our standards, and to tell them what is actually a healthy addition and what is an abuse of the model.</div><div><br></div><div>4. Examples of serialization</div><div>This is the most important part right now, because it will straighten out the vocabulary and uncover the ambiguities in what we are saying.</div><div><br></div><div>We are working on this - we have some hand-written sample XMLs for TimeSeriesCube DM 1.1, but I would like to postpone sharing them here until we have tried to implement them and seen how they work in practice - hopefully sometime next week.</div><div><br></div><div><br></div><div>Cheers,</div><div><br></div><div>Jiri</div><div><br></div><div><br></div><div class="gmail_extra"><br><div class="gmail_quote">2017-03-31 13:49 GMT+02:00 Markus Demleitner <span dir="ltr"><<a href="mailto:msdemlei@ari.uni-heidelberg.de" target="_blank">msdemlei@ari.uni-heidelberg.de</a>></span>:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi Omar,<br>
<br>
One short point, one longer one:<br>
<span class="gmail-"><br>
On Tue, Mar 28, 2017 at 10:21:46AM -0400, Laurino, Omar wrote:<br>
> In the time series example, more than a time series *data model* I think<br>
> time series can just be seen as *instances* of a common, more generic data<br>
> model, that is itself a lightweight one. A client could specialize into<br>
<br>
Absolutely -- at least my goal in this is to have time series just be<br>
an NDCube that happens to have just one non-degenerate independent<br>
axis that furthermore happens to have time-like STC annotation; I<br>
think our adopters would rightfully develop solid resentments against<br>
us if we did something very different.<br>
<span class="gmail-"><br>
<br>
> > <GROUP vodml-type="stc:Coordinate"><br>
> > <PARAMref vodml-role="value" vodml-type="Coord2" ref="pt"/><br>
> > </GROUP><br>
> > <PARAM ID="pt" xtype="POINT" datatype="real"<br>
> > arraysize="2" value="23.3 41"/><br>
> > <GROUP vodml-type="stc:Coordinate"><br>
> > <GROUP vodml-role="value" vodml-type="Coord2"><br>
> > <PARAMref vodml-role="C1" ref="ra"/><br>
> > <PARAMref vodml-role="C2" ref="dec"/><br>
> > </GROUP><br>
> > </GROUP><br>
> > <PARAM ID="ra" value="23.3"/><br>
> > <PARAM ID="dec" value="41"/><br>
><br>
><br>
> Would you have both annotations in the same file? How should a client<br>
> (unaware of the enclosing model) know this is two different representations<br>
> of the same coordinate rather than two distinct coordinates? I would rather<br>
> be in favor of specific mapping rules for certain types, if that makes<br>
> sense, which is what we already do for ivoa:Quantity. Coord2 would be<br>
> serialized as a DALI POINT, if that makes sense. Admittedly, I haven't<br>
> given this possibility enough thought, so I am not sure how convenient that<br>
> would be or what repercussions it might have down the road.<br>
<br>
I guess this is a good example for a distinction between two use<br>
cases that we perhaps haven't sufficiently made in past DM work to<br>
our detriment. I think issues become a lot clearer if we separate<br>
two related but actually distinct things:<br>
<br>
(1) We want to define standard serialisations; that's stuff like an<br>
obscore table, an SSA response, or whatever. Here, we have to be<br>
strict and precise on the serialisation details. I think obscore<br>
gets it right, simply saying "column/FIELD with name s_ra, floating<br>
point type, in unit foo, UCD such-and-such, preferred description<br>
this-and-that, reference frame ICRS". Note how, this way, further<br>
annotation is actually not necessary for anything in the core data<br>
content, and that's how things must be if one wants to write<br>
multi-service queries or join results from different services without<br>
a lot of logic. I'd say that's "baseline interoperability".<br>
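<br>
For s_ra, say, that boils down to something like (my paraphrase of<br>
the obscore definition, not normative text):<br>
<br>
  <FIELD name="s_ra" datatype="double" unit="deg" ucd="pos.eq.ra"/><br>
<br>
and a client can rely on exactly that in any obscore table it meets.<br>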
<br>
Personally, I'm not even sure the notion of a data model is<br>
terribly useful for these *as such*.  Grammars or, as in obscore, a<br>
simple database schema seem more appropriate to me. Be that as it<br>
may, by now I'm convinced that even with VO-DML and the mapping<br>
document, we'll still have to define concrete serialisation(s), for<br>
me preferably in documents of type (1) themselves. But that's, I'd<br>
say, tangential for now.<br>
<br>
Of course, once you add local extensions to such predefined<br>
serialisations (e.g., extra columns in obscore, custom fields in DAL<br>
responses), things are different, and then we have one example of<br>
(2).<br>
<br>
(2) We want complex metadata schemes for physical (or whatever)<br>
entities which generically work wherever these entities turn up;<br>
that could be filter names and zero points in photometry, time scales<br>
and reference positions for times, or statistical properties, error<br>
models, etc. for measurements of all kinds.  These *may* go on top of<br>
the well-defined serialisations, but where they really are needed is<br>
when you have "free" responses, e.g., in TAP, datalink/SODA parameter<br>
declarations, custom extensions, etc. I'd call this "spontaneous<br>
interoperability", because client and server don't need to pre-arrange<br>
anything above the transport and annotation layers.<br>
<br>
That's complex in the general case, but it's not black magic. Hence,<br>
(not only) I still think it's a disgrace that 15 years into the VO<br>
all we have is a deprecated (and fairly limited) way to say "this<br>
pair of columns [this POINT, POLYGON...] is a position in ICRS<br>
BARYCENTER for Epoch J2015.0".  At least this very basic annotation<br>
simply must work for essentially all representations that sensible<br>
and almost-sensible people (as well as data-writing astronomers) may<br>
choose.<br>
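<br>
Sketching freely (all identifiers invented for illustration, not a<br>
proposal):<br>
<br>
  <GROUP vodml-type="stc:SkyCoordinate"><br>
    <PARAM vodml-role="frame" name="frame" datatype="char"<br>
        arraysize="*" value="ICRS"/><br>
    <PARAM vodml-role="refPosition" name="refpos" datatype="char"<br>
        arraysize="*" value="BARYCENTER"/><br>
    <PARAM vodml-role="epoch" name="epoch" datatype="char"<br>
        arraysize="*" value="J2015.0"/><br>
    <FIELDref vodml-role="longitude" ref="ra"/><br>
    <FIELDref vodml-role="latitude" ref="dec"/><br>
  </GROUP><br>
<br>
The same GROUP, unchanged, should be attachable to a POINT column<br>
just as well as to an ra/dec column pair.<br>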
<br>
With that distinction: No, I do not believe we'll end up at a useful<br>
standard if we leave open in a given standard whether positions are<br>
given as RA/DEC or a POINT in cases like (1). They have to say that<br>
or you can never, say, write an obscore query that works on more than<br>
one service.<br>
<br>
But the data models and in particular annotation scheme (make that a<br>
plural once we tackle FITS or HDF5) must still be flexible enough to<br>
cover (2). Let's see that we can finally annotate VOTables in<br>
sufficient detail that a client can reliably bring a catalog in ICRS on<br>
Epoch J1992.25 to Galactic in J2015 (or notice when that's not possible<br>
for lack of proper motions). I'd say that's an achievable goal.<br>
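<br>
In terms of the sketch above, the annotation would then also have to<br>
carry the proper motions (identifiers again invented):<br>
<br>
    <FIELDref vodml-role="pmLongitude" ref="pmra"/><br>
    <FIELDref vodml-role="pmLatitude" ref="pmdec"/><br>
<br>
so a client has everything the Galactic/J2015 computation needs, or<br>
can tell the user that it doesn't.<br>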
<br>
And since I'd really like this to not share the fate of the<br>
STC-in-VOTable Note, my feeling at this point is that proper error<br>
treatment is for when we've gathered some experience; that would mean<br>
that we can for now delay modelling correlated, non-Gaussian or<br>
otherwise real-world errors (but keep Quantity open for adding that<br>
later).<br>
<br>
A bird in the hand is worth two in the bush.<br>
<span class="gmail-HOEnZb"><font color="#888888"><br>
<br>
-- Markus<br>
</blockquote></div><br></div></div>