[Observation] relation to Dataset

Fri Nov 22 09:36:01 PST 2013

Hi Doug and Arnold
> 
> Hi Arnold -
> 
> This is all true enough, although one could argue that some data products
> resulting from analysis combining multiple other data products could be
> considered a form of "software observation".  
I almost agree with Doug, but I think that "some data products ... MUST be
considered" the result of a completely different kind of experiment: not an
Observation but, for example, a "Stacking Operation". Such an experiment is
described in a very different way, with a different kind of provenance,
different parameters, etc., even though the dataset it produces can be
considered to be an image. All of these can be described without difficulty
using the simple, recursive pattern for a more comprehensive Provenance model
that I referred to in my earlier, somewhat ignored email. For details, look
under "domain model" and also under "simulation data model" on the DM page,
and see the sketch of the Provenance model from GAVO that follows this
pattern.
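
To make the idea concrete, here is one possible reading of such a recursive
pattern, as a rough Python sketch only. It is not the VO-DML domain model, the
simulation data model, or the GAVO sketch; the class names Experiment and
Dataset, their fields, and the helper source_observations are all hypothetical
and chosen just for illustration. An Experiment of any kind consumes zero or
more Datasets and produces Datasets, so a stacked image points back to a
"Stacking Operation", which in turn points back to the images it consumed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison: two experiments are never "equal" by value
class Experiment:
    """Hypothetical: anything that produces Datasets."""
    kind: str                                   # e.g. "Observation", "Stacking Operation"
    parameters: dict = field(default_factory=dict)
    inputs: List["Dataset"] = field(default_factory=list)


@dataclass(eq=False)
class Dataset:
    """Hypothetical: a data product plus a link to the experiment that made it."""
    name: str
    product_type: str                           # e.g. "image"
    produced_by: Optional[Experiment] = None


def source_observations(ds: Dataset) -> List[Experiment]:
    """Recursively follow provenance and collect the contributing Observations."""
    exp = ds.produced_by
    if exp is None:
        return []
    if exp.kind == "Observation":
        return [exp]
    found: List[Experiment] = []
    for parent in exp.inputs:
        for obs in source_observations(parent):
            if obs not in found:
                found.append(obs)
    return found


obs1 = Experiment("Observation", {"obs_id": "A"})
obs2 = Experiment("Observation", {"obs_id": "B"})
img1 = Dataset("img1", "image", obs1)
img2 = Dataset("img2", "image", obs2)
stack = Experiment("Stacking Operation", {"method": "median"}, [img1, img2])
stacked = Dataset("stacked", "image", stack)

print([o.parameters["obs_id"] for o in source_observations(stacked)])
```

In this sketch the stacked image is produced by exactly one experiment, the
stacking operation, and its relation to the two Observations is recovered by
walking the chain rather than by calling the stack itself an Observation.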

> But the real reason we stretched the concept a bit in ObsTAP was merely to
> be able to provide a single uniform index (the ObsTAP index) for science
> data products in an archive.
> 
> I agree that the observation-dataset modeling needs to be more
> comprehensive; my guess is that this can be done in a relational fashion
> by adding one or more additional models/tables to hold the additional
> metadata and relationships.  The relational model can easily represent the
> required many-to-many relationship.
> 
This is where one SHOULD (if not MUST) start from a proper VO-DML model.
VO-DML, itself developed in another IVOA effort, is the appropriate language
for expressing data models of the complexity that is required. Such a model
is easily mapped to, for example, a relational model, or can easily be used
to annotate such a database model (see another email I sent recently).
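
For illustration, the many-to-many relationship Doug mentions could look as
follows on the relational side. This is a minimal sketch using Python's
standard sqlite3 module; the table names observation, dataset and
dataset_observation, the identifiers, and the helper observations_for are
assumptions made up for this example and are not taken from ObsCore, ObsTAP,
or any VO-DML mapping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE observation (obs_id TEXT PRIMARY KEY);
CREATE TABLE dataset (
    dataset_id       TEXT PRIMARY KEY,
    dataproduct_type TEXT
);
-- Hypothetical link table: one row per (Dataset, Observation) association.
CREATE TABLE dataset_observation (
    dataset_id TEXT NOT NULL REFERENCES dataset(dataset_id),
    obs_id     TEXT NOT NULL REFERENCES observation(obs_id),
    PRIMARY KEY (dataset_id, obs_id)
);
""")

conn.executemany("INSERT INTO observation VALUES (?)",
                 [("obs-1",), ("obs-2",)])
conn.executemany("INSERT INTO dataset VALUES (?, ?)",
                 [("raw-1", "image"), ("raw-2", "image"),
                  ("stack-1", "image"), ("sim-1", "cube")])
conn.executemany("INSERT INTO dataset_observation VALUES (?, ?)",
                 [("raw-1", "obs-1"), ("raw-2", "obs-2"),
                  ("stack-1", "obs-1"), ("stack-1", "obs-2")])


def observations_for(dataset_id):
    """Return the obs_id values associated with one Dataset, sorted."""
    rows = conn.execute(
        "SELECT obs_id FROM dataset_observation "
        "WHERE dataset_id = ? ORDER BY obs_id",
        (dataset_id,),
    )
    return [row[0] for row in rows]


print(observations_for("stack-1"))  # a stacked image: two Observations
print(observations_for("sim-1"))    # a simulated cube: no Observation at all
```

With a separate link table a Dataset can be tied to zero, one, or several
Observations without having to invent a new "observation" for each combined
product.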

Cheers
Gerard

>  	- Doug
> 
> 
> 
> 
> On Fri, 22 Nov 2013, Arnold Rots wrote:
> 
> > I strongly object to this statement:
> >
> > "the data product may be the result of combining data from multiple
> > primary (physical) observations.  In this case the resulting data
> > product is a new processed "observation" to which a new unique
> > observation identifier should be assigned."
> >
> > We really need to distinguish clearly between Datasets and Observations.
> > An Observation represents an operation that is characterized by a
> > configuration: instrument characteristics, coordinate volume and
> > properties, calibration, etc.
> > A Dataset is a container of bytes that may have resulted from an
> > Observation (the byte stream that came out of the telescope or various
> > direct processing products of it), a simulation, or the processing and
> > analysis of (possibly a subset of) one or more parent Datasets.
> > Each Dataset also carries metadata detailing coordinate
> > characteristics, the nature of the Dataset and its components, and its
> > provenance regarding its parents.
> >
> > Blurring the line between Observations and Datasets and carelessly
> > forcing one to assume the characteristics of the other is going to get
> > us into major trouble.
> >
> > Cheers,
> >
> >   - Arnold
> >
> > ----------------------------------------------------------------------
> > Arnold H. Rots
> > Chandra X-ray Science Center
> > Smithsonian Astrophysical Observatory    tel:  +1 617 496 7701
> > 60 Garden Street, MS 67                  fax:  +1 617 495 7356
> > Cambridge, MA 02138
> > arots at cfa.harvard.edu
> > USA
> > http://hea-www.harvard.edu/~arots/
> > ----------------------------------------------------------------------
> >
> >
> >
> > On Thu, Nov 21, 2013 at 6:00 PM, CresitelloDittmar, Mark
> > <mdittmar at cfa.harvard.edu> wrote:
> >       All,
> >
> > I've been thinking about this and some comments Arnold made on the
> > Provenance thread which are closely related.
> >   1) there is general agreement that Observation *has* 0 or more
> > Datasets  (rather than *is* a Dataset)
> >
> >   2) Dataset can exist without an Observation (can be created by
> > something else).
> >
> >   3) The definition of Observation is pretty fuzzy, but let's assume
> > that there could be an "Analysis" or "Simulation" step which could
> > create a Dataset.  These may be parts of the larger domain that all
> > these objects live in, but are not modeled.  Currently, the ObsCore
> > model does say (pg 19) "the data product may be the result of
> > combining data from multiple primary (physical) observations.  In this
> > case the resulting data product is a new processed "observation" to
> > which a new unique observation identifier should be assigned."
> > So the relation of Dataset to 'the thing which created it', is not
> > clear to me yet.  I keep going back to the 'Experiment' concept in
> > Gerard's mail (provenance thread).
> >
> > I don't think that a Dataset should have a bi-directional relation to
> > the full Observation(s) as I noted at the head of this thread, but it
> > should
> >   a) have an association back to components of the Observation
> >      (ObsConfig, Proposal) which become part of the Dataset 'provenance'
> >      (which is what I think Arnold was saying in the other thread).
> >   b) have metadata identifying the relevant Observation(s) comprising
> >      the Dataset (DataID.ObservationID), as Francois notes, but this
> >      gets tricky because ObsCore expects a singular (well, unique)
> >      obs_id for each Dataset.
> >   c) if the Dataset were created by something else, add associations
> >      to components of those things holding the relevant information to
> >      fold into the 'provenance', such as the progenitor Datasets.
> >
> >
> >
> >
> > On Fri, Nov 15, 2013 at 9:59 AM, Arnold Rots <arots at cfa.harvard.edu>
> > wrote:
> >       If multiple observations have to be taken care of through
> >       provenance, then why should a single observation not be handled
> >       the same way?
> > Don't get me wrong: I think neither should be handled through
> > provenance.
> >
> > Examples are: VLA multi-configuration images; stacked images;
> > multi-observation event files.
> >
> > It is much clearer and more intuitive if we just simply allow a
> > Dataset to be associated with multiple Observations.
> > Actually, I think this is absolutely a requirement.
> >
> >   - Arnold
> >
> >
> >
> >
> > On Thu, Nov 14, 2013 at 6:29 PM, Douglas Tody <dtody at nrao.edu> wrote:
> >       On Thu, 14 Nov 2013, Arnold Rots wrote:
> >
> >             From this description I am beginning to suspect that a
> >             Dataset can be derived from (associated with) no more than
> >             one Observation.
> >             That seems utterly wrong; multiple Observations can be
> >             combined into a single Dataset.
> >             Or did I misunderstand?
> >
> >
> > Multiple Observations can be and often are combined to produce a new
> > Dataset; however, describing that history would likely be the
> > responsibility of the Provenance model.  At the level of Observation
> > it would probably be a new "Observation" (or at least Dataset).
> > It depends upon how strict we are with the concept of Observation.
> > The CreationType and calibration level say something about it being a
> > synthesized/derived data product.
> >
> >       I think it is OK to require that a Dataset is associated with at
> >       least one Observation, provided that a model or simulation can be
> >       described as an Observation.
> >
> >
> > In practice that is what we are doing, to keep things simple;
> > DataSource can be something like "theory".
> >
> >         - Doug
> >
> >       Cheers,
> >
> >        - Arnold
> >
> >
> >
> >
> >
> >       On Thu, Nov 14, 2013 at 12:08 PM, CresitelloDittmar, Mark
> >       <mdittmar at cfa.harvard.edu> wrote:
> >
> >             All,
> >               This thread is for discussion on the relation between
> >             Observation and Dataset.
> >
> >             ref: ObsCoreDM -
> >             http://www.ivoa.net/documents/ObsCore/20111028/index.html
> >             ref: diagram illustrating relation of Image/Spectral
> >             Observation to ObsCoreDM (draft)
> >             http://www.ivoa.net/pipermail/dm/attachments/20131113/c9ef7581/attachment-0001.png
> >
> >             motivation
> >               It is clear that there is a relationship between
> >             "Observation" and a more generic "Dataset".  This "Dataset"
> >             would contain elements such as the dataProductType, and
> >             dataProductSubtype, presumably others.  This object has not
> >             been formally defined.
> >
> >               In ObsCore, there is an implied relationship for
> >             Observation as an Extension of Dataset in the location of
> >             these attributes.  So, I have always interpreted that
> >             Observation "is" a Dataset.  This is reflected in my choice
> >             of the name "ObservationDataset" in the left hand package of
> >             my diagram.  It implies that it is a Dataset extended for
> >             Observation purposes.
> >
> >               Recent discussion brings this relationship into question,
> >             with assertions that an Observation can be associated with 0
> >             or more Datasets.
> >
> >               This has real ramifications for the Image and Spectral
> >             models.
> >
> >             Seed:
> >
> >             If the relation is Observation "has" 0..* Dataset, then all
> >             the diagrams to date are wrong.
> >             It feels like this would be a fundamental change to all
> >             these models.
> >
> >               - there would need to be a bi-directional relation between
> >                 Observation and Dataset
> >                 (Observation has 0..* Dataset; Dataset associated with 1
> >                 Observation).
> >                 Hmm.. since there can be Datasets not associated with
> >                 Observations, this would need to be a specialization of
> >                 Dataset (ObservationDataset, but not the one in my
> >                 diagram).
> >
> >               - the Char associated with Observation would characterize
> >                 the total space of all included Datasets, with a (0..1)
> >                 relation to Observation.  If no Datasets, no Char.
> >
> >               - each Dataset would require its own Characterisation,
> >                 specific to its space
> >                 (so there is another attribute for Dataset).
> >
> >               - we would need to specify which of the elements are
> >                 associated to the Dataset, and which to the Observation,
> >                 e.g. DataModel => Dataset;  Target => Observation
> >
> >             Thoughts?
> >             Mark
> >