Entangled Models [Was: MCT - model document delivery.]

Wed Sep 30 09:01:40 CEST 2020

Hi Mark,

On Tue, Sep 29, 2020 at 12:41:08PM -0400, CresitelloDittmar, Mark wrote:
> On Tue, Sep 29, 2020 at 7:58 AM Markus Demleitner <
> msdemlei at ari.uni-heidelberg.de> wrote:
> 
> >
> > Let me ask back here (because I find that consequence rather
> > unwelcome): Say we have the following dependency graph on the DMs:
> >
>                            |- Spectral
>                            |- TimeSeries
> coord - meas - dataset - cube
>           `---- phot ----'
> 
> I'm wondering how that works in the higher level products...
> I've added TimeSeries and Spectral to your hierarchy chart.

Hm -- isn't a time series just a 1D cube with a time axis, and a
spectrum just a 1D cube with a spectral axis (neglecting for the
moment multiple arrays in a single dataset, but again that's probably
a common problem)?  What are the use cases requiring specific models
for those?

>    * Having given this no thought.. If these are isolated models, I'd
> expect that Spectral and TimeSeries could not be expressed as a simple
> extension of Cube (as the current prototype model is).
> 
> If I'm a provider of Spectra, Photometry filters, TimeSeries, etc..
>   1) How do I know that I should provide annotations for the 'data' as
> Measurements and Coordinates?

First -- everyone should give Meas and Coord.  Marking up errors and
values is useful no matter what, and having RA and Dec without saying
they're actually ICRS and giving their Epoch will be a growing headache
the farther we go from the all-major-resources-are-ICRS Epoch J2000
world that the VO started in (for galactic astronomy, that is).

Second -- there's nothing wrong with defining "profiles", for
instance in access protocols, as in "Products delivered through SIAP
3.0 *must* have an annotation with version 1 of coord, measurement,
dataset, and cube (and can, of course, have additional annotations so
we can grow into the future)."

Before planning for that, I'd rather be sure that's actually useful
because I find...

>   2) From that perspective wouldn't I expect that annotating as a
> TimeSeries should be all I need to do?
>       o If I neglect/fail to provide them, it is a valid TimeSeries, but
> one with no Measurements.

...such questions become a lot simpler to reason about once one
starts saying what "is a time series" actually means operationally.
Here are the use cases I can think of for a time series, each with
the annotation a client would look at under what I think our current
factoring is:

* Recognise it's a time series: cube annotation, look at the UCDs of
  the independent axes (or perhaps at the dataproduct_type).
* Figure out the (time, val1, val2, ...) tuples: cube annotation
  (what are the independent and dependent axes?).
* Give values their errors: measurement annotation ("give me all
  instances of measurement that my flux axis is part of").
* Figure out time frames and such: coord annotation.
* Figure out observational metadata (where was the telescope
  pointed?): dataset annotation.
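
To make the list above a bit more concrete, here is a minimal Python
sketch of how a client might answer the first few questions from
isolated annotations. Everything in it is an assumption made for
illustration: the dictionary layout, the key names, the column names
and the helper functions are hypothetical and do not correspond to any
actual annotation syntax or library. Only the UCD words (time.epoch,
phot.flux) are taken from the UCD1+ vocabulary.

```python
# Hypothetical, simplified annotations, one entry per data model.
# The layout and all key names are invented for this sketch.
annotations = {
    "cube": {
        "independent_axes": [{"column": "obs_time", "ucd": "time.epoch"}],
        "dependent_axes": [{"column": "flux", "ucd": "phot.flux"}],
    },
    "meas": [{"value": "flux", "error": "flux_err"}],
    "coord": {"obs_time": {"timescale": "TT", "refposition": "BARYCENTER"}},
    "dataset": {"target_name": "example target"},
}


def is_time_series(ann):
    """True if the cube annotation has exactly one independent axis
    and that axis carries a time UCD."""
    axes = ann.get("cube", {}).get("independent_axes", [])
    return len(axes) == 1 and axes[0]["ucd"].startswith("time.")


def tuple_columns(ann):
    """Column names making up the (time, val1, val2, ...) tuples."""
    cube = ann.get("cube", {})
    return ([axis["column"] for axis in cube.get("independent_axes", [])]
            + [axis["column"] for axis in cube.get("dependent_axes", [])])


def error_column(ann, column):
    """Error column of the first measurement that has `column` as its
    value, or None if no measurement mentions the column."""
    for meas in ann.get("meas", []):
        if meas["value"] == column:
            return meas.get("error")
    return None


def time_frame(ann, column):
    """Frame metadata for a time column from the coord annotation,
    or None if there is none."""
    return ann.get("coord", {}).get(column)
```

Each helper reads exactly one annotation, which is the point of the
factoring: none of the questions needs a dedicated TimeSeries model to
be answered.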

I claim that's not much different from when you have entangled data
models -- there's very little in this that you could make mandatory
in an entangled model, because you'll always find a dataset that
could be annotated as a time series but simply doesn't provide a
particular piece of metadata.

Hence, clients will always have to be written such that they fail
gracefully when some metadata (or annotation) is missing.
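
As a rough illustration of failing gracefully, the following Python
sketch describes a time axis and degrades step by step as metadata
goes missing. The dictionary layout and the key names (coord,
timescale, refposition) are assumptions made up for this example, not
an actual annotation format.

```python
def describe_time_axis(annotations, column):
    """Describe the frame of a time column, falling back to "unknown"
    for whatever the (hypothetical) coord annotation does not say."""
    frame = annotations.get("coord", {}).get(column)
    if frame is None:
        # No coord annotation for this column at all: still usable,
        # the client just cannot say what the times refer to.
        return f"{column}: time frame unknown"
    timescale = frame.get("timescale", "unknown time scale")
    refposition = frame.get("refposition", "unknown reference position")
    return f"{column}: {timescale}, {refposition}"
```

For instance, describe_time_axis({}, "obs_time") reports an unknown
time frame instead of raising an exception, and a frame that only
gives a time scale still yields a partial description.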

Incidentally, another advantage of isolated DMs is that when a
certain annotation is "bad" (and don't get me started on existing
FITS headers), the others keep working.  Thus, even when the cube
annotation is botched, a client would still be able to figure out,
say, the reference position for the time.
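
A small Python sketch of that isolation, under the assumption that
each data model's annotation is parsed on its own: a parse failure is
recorded for the affected model only, and the remaining annotations
stay usable. The parsers, the validity rule for the cube annotation
and the dictionary layout are all hypothetical.

```python
def parse_cube(raw):
    # Hypothetical rule: a cube annotation needs an independent axis.
    if not raw.get("independent_axes"):
        raise ValueError("cube annotation lacks independent axes")
    return raw


def parse_coord(raw):
    # Hypothetical: accept the coord annotation as it is.
    return raw


PARSERS = {"cube": parse_cube, "coord": parse_coord}


def parse_annotations(raw_annotations):
    """Parse each annotation independently.

    Returns (parsed, problems): the annotations that could be parsed,
    and an error message for each one that could not."""
    parsed, problems = {}, {}
    for model, raw in raw_annotations.items():
        parser = PARSERS.get(model)
        if parser is None:
            continue  # annotation for a model we do not know: skip it
        try:
            parsed[model] = parser(raw)
        except (ValueError, KeyError, TypeError) as exc:
            problems[model] = str(exc)
    return parsed, problems


# A botched cube annotation next to a perfectly fine coord annotation.
raw = {
    "cube": {"independent_axes": []},
    "coord": {"obs_time": {"refposition": "BARYCENTER"}},
}
parsed, problems = parse_annotations(raw)
```

Here the cube annotation ends up in problems, while the reference
position for the time is still available from parsed["coord"].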

          -- Markus