Time Series Cube DM - IVOA Note

Laurino, Omar olaurino at cfa.harvard.edu
Tue Mar 21 15:08:51 CET 2017


Is anybody distilling all the requirements/test cases that are being
expressed in this thread?

To me, that's the most valuable information.

In the end we may prefer some solutions over others for different
reasons, but what really counts is what can be implemented with the
standards we provide, and what cannot.

If we formalized a list of requirements/use cases/test cases we could then
validate each proposed model against them and pick the simplest model that
covers them.
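To make this concrete, here is a minimal Python sketch of that procedure: check each candidate model against a shared list of use cases, report what each one fails to cover, and pick the least complex model that covers everything. The model names, the use-case labels (paraphrased from the thread), and the `complexity` scores are hypothetical placeholders. The thread does not define a complexity metric, so "number of classes" is only one possible assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class CandidateModel:
    """A proposed data model, reduced to what we need for the comparison."""
    name: str
    complexity: int  # hypothetical metric, e.g. number of classes in the model
    covers: frozenset[str] = field(default_factory=frozenset)  # use cases it can express


def missing_use_cases(model: CandidateModel, use_cases: set[str]) -> set[str]:
    """Use cases the model cannot express (an empty set means it passes)."""
    return set(use_cases) - model.covers


def simplest_covering_model(models, use_cases):
    """Least complex model that covers every use case, or None if none does."""
    adequate = [m for m in models if not missing_use_cases(m, use_cases)]
    return min(adequate, key=lambda m: m.complexity, default=None)


# Hypothetical use cases, paraphrased from the thread.
use_cases = {"merge time series", "plot cube", "ingest dataset metadata"}

# Hypothetical candidates with made-up complexity scores.
models = [
    CandidateModel("model-A", 12, frozenset(use_cases)),
    CandidateModel("model-B", 7, frozenset(use_cases)),
    CandidateModel("model-C", 3, frozenset({"plot cube"})),
]

for m in models:
    print(m.name, "misses:", sorted(missing_use_cases(m, use_cases)))
print("simplest adequate model:", simplest_covering_model(models, use_cases).name)
```

The per-model "misses" list matters as much as the final pick: it shows which requirement a rejected proposal fails, which is what the discussion needs to converge.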

Omar.

On Tue, Mar 21, 2017 at 9:35 AM, Markus Demleitner <msdemlei at ari.uni-heidelberg.de> wrote:

> Dear DM,
>
>
> On Mon, Mar 20, 2017 at 03:47:55PM -0400, CresitelloDittmar, Mark wrote:
> > In the cube model, I want to say: "A DataProduct has one or more Coordinate
> > system specifications, and the DataProduct owns its instances of CoordSys"
>
> I think we're getting to the bottom of what we're trying to work
> out here: *why* do you want to say this?  What I'm trying to argue in
> my parallel mail
> http://mail.ivoa.net/pipermail/dm/2017-March/005492.html (look for
> "For illustration") is that an object about which you'd say such things
> isn't what's actually useful for clients.  These, rather, need
> annotation relevant to what they're trying to do (data structure for
> a cube plotter, axis/frame metadata for a data merging component,
> dataset metadata for an ingestor or a bibliography component).
>
> The only reason I can see to have a "God Object" that gobbles up all
> these individual annotations could be some sort of validation
> component, as you argue here:
>
> > My impression is not that you object to the items per se, but rather that
> > they are explicitly connected in the model.. that it would be sufficient to
> > simply serialize a coordsys instance in my cube, and since CoordSys is a
> > valid, modeled object, that is all I need to do.  If this is so.. what is
> > lost is the ability to validate the data product.  How do I know if the
> > instance has all the expected components?
>
> First, for me, yes it's the coupling of the various models I'm
> worried about.
>
> On the validation: What's actually relevant to a given client is that
> a given annotation is what it expects, e.g., frame metadata for the
> merge component I have imagined in the use case in the cited mail.
> For the merge component, an NDCube annotation is unimportant, as is
> the Dataset annotation; when there's good STC annotation, it is good
> to go.
>
> Now, having one big data model you're validating against would mean
> that a dataset can be invalid although the STC annotation is
> perfectly good.  The hypothetical component merging time series with
> different time scales would simply work, although the dataset is not a
> "DataProduct" in your sense.  If it asked a validator, the validator
> would say: "No, this dataset is broken, keep your fingers off".  So,
> the validator isn't useful to the merge component, and that would be
> a pity.
>
> What I'm trying to sell is the concept that you validate *individual*
> annotations.  Based on this, clients can fairly reliably figure out
> whether or not they'll work.  For instance, something that has valid
> NDCube annotation can be used by a cube plotter even if it has
> missing or bad STC annotation.  Conversely, regardless of the status
> of the Dataset annotation, a time series merge tool will work just as
> long as at least one STC annotation it understands is valid.
>
> In other words: I'm proposing to abandon the hope that "This dataset
> is valid" will be a statement useful beyond management and
> beancounting.  Instead, I hope we'll see "This dataset has valid
> STC-1, STC-2, photometry-1, Dataset-1, and NDCube-1 annotations",
> which tells concrete software whether the annotation(s) it needs are
> all right.
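The contrast between a single "is this dataset valid?" verdict and per-annotation validity can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the validator output is modeled as a hypothetical dict from annotation name to pass/fail, and the two client checks are invented stand-ins for the cube plotter and time series merge tool mentioned above, not any existing VO software.

```python
from __future__ import annotations

# Hypothetical validator output for one dataset: annotation -> passed?
report = {
    "STC-1": True,
    "STC-2": False,
    "photometry-1": True,
    "Dataset-1": False,  # e.g. a broken Dataset annotation
    "NDCube-1": True,
}


def valid_annotations(report: dict[str, bool]) -> set[str]:
    """Names of the annotations that validated individually."""
    return {name for name, ok in report.items() if ok}


def cube_plotter_can_run(valid: set[str]) -> bool:
    # A cube plotter only needs a valid NDCube annotation.
    return "NDCube-1" in valid


def merge_tool_can_run(valid: set[str], understood=frozenset({"STC-1", "STC-2"})) -> bool:
    # A time series merger needs at least one valid STC annotation it
    # understands; the Dataset annotation does not matter to it.
    return bool(valid & understood)


valid = valid_annotations(report)
monolithic_verdict = all(report.values())  # "is the whole dataset valid?"

print("monolithic:", monolithic_verdict)             # False
print("cube plotter:", cube_plotter_can_run(valid))  # True
print("merge tool:", merge_tool_can_run(valid))      # True
```

The monolithic check rejects the dataset because Dataset-1 failed, while both clients can still proceed on the annotations they actually use.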
>
> [Jiri's plan to reference "good enough" objects]
> > To do what I think you are suggesting, would require a change to the VO-DML
> > specification.
>
> Well, it would if what we were really after is what Jiri may have hinted
> at in his mail of Mon, 13 Mar 2017 11:14:13 +0100:
>
> ji> model, that means the serialization of my data will change if that model
> ji> changes. That doesn't mean, however, that I need to "embed" it into my data
> ji> model, my data model is not changing if the one I am dependent on changes.
>
> If this means "I reference an object in my DM, and if that object has
> incompatible changes, all remains fine", then I agree VO-DML would
> need to change; I don't think we have the equivalent of void* at this
> point (I think we're all in agreement that minor changes to DMs will
> by definition never break embedding data models, right?).
>
> By just exploiting co-reference, we can, however, avoid these
> potentially model-uprooting cross-model references *and*,
> additionally, gain the flexibility to combine annotations following
> various different models.
>
> Consider, for instance, a dataset that has an annotation
>
>   NDCube-1
>     independent_axes: dateObs
>     dependent_axes: whatever
>
>   STC-1
>     Frame
>       TT
>       BARYCENTER
>     value: dateObs
>
>   STC-2
>     CooClass
>       Time
>     Frame
>       timeScale TT
>       IncompatibleNiftyThing HighMagic
>     value: dateObs
>
> With this annotation, all clients knowing NDCube-1 and *either* of
> STC-1 and STC-2 have a complete annotation.
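The co-reference idea can be sketched in Python: every annotation points at a native entity (here a column name such as `dateObs`), never at another annotation, so a client resolves axis metadata by matching column names across whichever STC versions it happens to understand. The dict layout, the key names (`refPosition`, `cooClass`, and so on), and the `STC-3` version are hypothetical illustrations, not taken from any VO-DML serialization.

```python
from __future__ import annotations

# Hypothetical rendering of the example annotation above. Every annotation
# points at native entities (VOTable FIELD names such as "dateObs"), never
# at another annotation. Key names are illustrative, not from a standard.
annotations = {
    "NDCube-1": {"independent_axes": ["dateObs"], "dependent_axes": ["whatever"]},
    "STC-1": {"frame": {"timeScale": "TT", "refPosition": "BARYCENTER"},
              "value": "dateObs"},
    "STC-2": {"cooClass": "Time",
              "frame": {"timeScale": "TT", "IncompatibleNiftyThing": "HighMagic"},
              "value": "dateObs"},
}


def frames_for_independent_axes(annotations: dict, known_stc: list[str]) -> dict:
    """Map each NDCube independent axis (a column name) to frame metadata
    taken from the first STC version this client understands that
    annotates the same column."""
    cube = annotations.get("NDCube-1")
    if cube is None:
        return {}
    result = {}
    for column in cube["independent_axes"]:
        for version in known_stc:
            stc = annotations.get(version)
            if stc is not None and stc["value"] == column:
                result[column] = (version, stc["frame"])
                break
    return result


# A client that only knows STC-2 still gets a complete annotation,
# because NDCube-1 never names an STC version.
print(frames_for_independent_axes(annotations, ["STC-2"]))
print(frames_for_independent_axes(annotations, ["STC-1", "STC-2"]))
print(frames_for_independent_axes(annotations, ["STC-3"]))  # {}
```

Because NDCube-1 only names the column, adding or replacing an STC version never requires changing the NDCube annotation or the client code that reads it.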
>
> Were independent_axes to reference either the STC-1 or the STC-2
> annotation rather than dateObs directly, a client implementing
> NDCube-1 would be tightly bound to know whatever STC version is
> "baked into" NDCube.
>
> If you've ever implemented against our current SCS standard and
> cursed because you have to write ancient VOTable 1.1, you'll have an
> idea why I'm howling when contemplating such a practice.
>
>
> > It boils down to a collection of Coordinate-s, the Coordinate has reference
> > back to the Frame/Axis metadata.
>
> For the record, I believe the Frame metadata should be embedded and
> not referenced, but that's mainly for ease of implementation.
>
> The central point where we appear to differ is that I am convinced we
> should try hard to make it a collection of native entities (in VOTable:
> FIELDs or PARAMs; FITS axes would be another example) that receive
> the Axis annotations from other annotations.
>
>
> > >> The premise is that a DataProduct should OWN all of its coordinates/data.
> > >> The vo-dml rules for composition state that a class/object may not be in
> > >> more than one composition relation.
>
> -- which only applies to annotations, not to the annotated native
> entities themselves.  A VOTable FIELD can certainly have multiple
> annotations, and there's no concept of ownership there.
>
> > >> Since there are multiple types of Data Axis types, I modeled it this
> > >> way.. where the DataProduct owns ALL its data (Observables), and the data
> > >> axis types (DataAxis, DependentAxis) are organizational objects which refer
> > >> to the instances of the same axis.
> > >>
> > >> This could be organized differently.. having the Observables owned by the
> > >> DataAxis (which is directly or indirectly owned by the DataProduct), and
> > >> extend that for various types of axis.. adding constraints as needed.  The
>
> What I'm still unsure about: is there any reason besides the
> "one-stop" validation why DataProduct needs to worry about the
> details of the axes (i.e., "physics" as covered by models like STC,
> Photometry, and possibly many others) rather than just "This axis
> value is in this column"?  If there is, what is it?  If there's not,
> I think the whole complication of having to work out ownership
> relationships would go away (and this is point 2 from the bottom of your
> mail -- one less issue to solve is always a good thing, no?).
>
> > >> I want to note one distinction.  The DataAxis here, is NOT the same as a
> > >> coordinate space axis.
> > >> If I have a 3D cartesian Space, with coordinate axes x,y,z.. there is 1
> > >> DataAxis referring to a Position3D in that space.
>
> Uh -- that sounds... dangerous.  In the spirit of my preference to
> ideally reference native entities (i.e., FIELDs here): How does this
> DataAxis grouping help a client?  What is it supposed to do with it?
> How does the grouping help it over just having three axes (that, of
> course, might still be related through one or more separate STC
> annotations, but I'd like that to be uncorrelated if at all
> possible)?
>
> > >> So, I see we have 2 points of discussion for the cube model itself
> > >>   1) relation between Dataset and DataProduct
> > >>       Currently modeled as according to Section 3.. extend Dataset add
> > >> reference to DataProduct == MyDataset
> > >>
> > >>       Alternates include:
> > >>         a) loose coupling
> > >>             verbal statement that MyDataset includes an instance of
> > >> Dataset + instance of MyDataProduct
> > >>         b) referenced coupling
> > >>             MyDataSet == reference to Dataset + reference to MyDataProduct
> > >>             (allows validators to know what is expected, but allows
> > >> flexibility w.r.t. Dataset flavor )
> > >>
> > >>      I personally think a) is too loose, but b) might be a good way to
> > >> go..
>
> But why couple it at all?  There are perfectly valid use cases where
> you want Dataset without NDCube and where you want NDCube without
> Dataset; to me, that's a clear indication that they should live next
> to each other, both being first class citizens that can be validated
> independently of each other.
>
> Cheers,
>
>              Markus
>
> [who's aware there's still another unanswered message -- sorry]
>



-- 
Omar Laurino
Smithsonian Astrophysical Observatory
Harvard-Smithsonian Center for Astrophysics
100 Acorn Park Dr. R-377 MS-81
02140 Cambridge, MA
(617) 495-7227

