[Observation] relation to Dataset

Fri Nov 22 22:50:52 PST 2013

Hi Doug
> 
> I agree that the generic Dataset is the more fundamental concept here:
> an Observation is not a Dataset but a data product derived from an
> Observation is a Dataset, and there can be Datasets that are not derived
> from a single observation or any observation.   - Doug
>
Actually I do *not* think Dataset is the more fundamental concept.
I guess it *does* represent the thing (file, files?, database) that users
ultimately want to access.
But from a scientific point of view one can consider it to be "merely" the
result of an experiment (the Observation), without which it would not have
been created.
You can have experiments without results (ones that have been proposed,
designed, just started, or that failed), but no results without experiments.
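
To make that asymmetry concrete, here is a minimal Python sketch. The class
and attribute names (Experiment, Dataset, produced_by, status) are purely
illustrative assumptions and are not taken from any existing IVOA model: an
Experiment may hold zero or more Datasets, while a Dataset cannot be created
without naming the Experiment that produced it.

```python
from dataclasses import dataclass, field
from typing import List


# eq=False keeps identity comparison, which avoids recursive equality checks
# on the Experiment <-> Dataset cycle.
@dataclass(eq=False)
class Experiment:
    """Hypothetical base concept: anything that can produce data."""
    name: str
    status: str = "proposed"  # e.g. proposed, designed, started, failed, completed
    datasets: List["Dataset"] = field(default_factory=list)  # 0..* results

    def produce(self, location: str) -> "Dataset":
        """Create a Dataset that records this experiment as its origin."""
        dataset = Dataset(location=location, produced_by=self)
        self.datasets.append(dataset)
        return dataset


@dataclass(eq=False)
class Dataset:
    """Hypothetical result: always points back to exactly one Experiment."""
    location: str
    produced_by: Experiment  # required; there is no default

    def __post_init__(self) -> None:
        if not isinstance(self.produced_by, Experiment):
            raise TypeError("a Dataset must name the Experiment that produced it")


# An experiment without results is perfectly valid ...
failed = Experiment(name="failed-run", status="failed")

# ... whereas every result is reachable only through its experiment.
survey = Experiment(name="survey-run", status="completed")
image = survey.produce("image_0001.fits")
```

Whether a Dataset should point to exactly one experiment or to several is
the very question under discussion in this thread; the sketch only fixes the
lower bound of one.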

This means that for each kind of dataset one SHOULD (MUST?) be able to
specify what (kind of) experiment produced it.
It may mean you have to be more open about the types of experiments one
includes in one's model, and/or be explicit about which experiments one
ignores (and whose data sets therefore should not be represented either,
even though they may look like images).
A project I was involved in produced lots of large images (as FITS files)
purely resulting from a rather long simulation+post-processing pipeline,
which included realistic telescope/instrument simulators (see
http://adsabs.harvard.edu/abs/2013MNRAS.428..778O if you're interested).
None of these images was the result of an Observation (when defined as an
experiment that includes using a real telescope etc.), and clearly users of
the images would like to know this.

I guess what I am saying is that I think it is dangerous to focus on
modeling some abstract Dataset concept without understanding which
experiments may have produced such datasets and which parameters may be
required to describe them properly.
Hence I have always believed that the modeling efforts could (should?) be
organised according to the type of experiment they describe, NOT the kind of
dataset they produce.
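
As a rough illustration of organising the description by experiment type,
here is a small Python sketch. Everything in it is an assumption made for
the example: the experiment type names, the parameter names and the helper
missing_parameters are hypothetical and do not come from ObsCore, VO-DML or
any other IVOA document. Each experiment type declares the parameters needed
to describe it, and a description is checked against the list for its type.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet

# Hypothetical experiment types and the parameters each one needs in order
# to be described properly. All names are illustrative only.
REQUIRED_PARAMETERS: Dict[str, FrozenSet[str]] = {
    "observation": frozenset({"telescope", "instrument", "target"}),
    "simulation": frozenset({"simulation_code", "instrument_simulator"}),
    "stacking": frozenset({"input_datasets", "stacking_method"}),
}


@dataclass(frozen=True)
class ExperimentDescription:
    """Metadata describing one experiment, keyed by parameter name."""
    experiment_type: str
    parameters: Dict[str, str]


def missing_parameters(description: ExperimentDescription) -> FrozenSet[str]:
    """Return the required parameters that the description does not supply.

    Raises ValueError for experiment types the model deliberately ignores.
    """
    try:
        required = REQUIRED_PARAMETERS[description.experiment_type]
    except KeyError:
        raise ValueError(
            f"experiment type not covered by the model: "
            f"{description.experiment_type!r}"
        ) from None
    return required.difference(description.parameters)


# A simulated image described without its instrument simulator is incomplete,
# even though the resulting file looks like any other image.
simulated = ExperimentDescription(
    experiment_type="simulation",
    parameters={"simulation_code": "some-code"},
)
gaps = missing_parameters(simulated)
```

The point of the design is that two datasets of the same kind (say, two
images) can require completely different descriptions depending on the
experiment behind them.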

Cheers
Gerard
> 
> On Fri, 22 Nov 2013, Gerard Lemson wrote:
> 
> > Hi Doug and Arnold
> >>
> >> Hi Arnold -
> >>
> >> This is all true enough, although one could argue that some data
> >> products resulting from analysis combining multiple other data
> >> products could be considered a form of "software observation".
> > I almost agree with Doug, but I think that "some data products ...
> > MUST be considered" to be the result of a completely different kind of
> > experiment, not an Observation, but for example a "Stacking
> > Operation". Very different way to describe such an experiment,
> > different kind of provenance, different parameters etc. Even though
> > the dataset it produces can be considered to be an image. No problem
> > to describe all of these using the simple, recursive pattern for a
> > more comprehensive Provenance model I referred to in my earlier,
> > somewhat ignored email. For details look under "domain model" and also
> > under "simulation data model" on the DM page.
> > And see the sketch for the Provenance model from GAVO that follows
> > this pattern.
> >
> >> But the real reason we stretched
> >> the concept a bit in ObsTAP was merely to be able to provide a single
> >> uniform index (the ObsTAP index) for science data products in an archive.
> >>
> >> I agree that the observation-dataset modeling needs to be more
> >> comprehensive; my guess is that this can be done in a relational
> >> fashion by adding one or more additional models/tables to hold the
> >> additional metadata and relationships.  The relational model can easily
> >> represent the required many-to-many relationship.
> >>
> > This is where one SHOULD (if not MUST) start from a proper VO-DML model.
> > It is the appropriate language, also developed in another IVOA effort,
> > for expressing data models of the complexity that is required. And it
> > is easily mapped to, for example, a relational model, or can easily be
> > used to annotate such a database model (see another email I sent
> > recently).
> >
> > Cheers
> > Gerard
> >
> >>  	- Doug
> >>
> >>
> >>
> >>
> >> On Fri, 22 Nov 2013, Arnold Rots wrote:
> >>
> >>> I strongly object to this statement:
> >>>
> >>> "the data product may be the result of combining data from multiple
> >>> primary (physical) observations.  In this case the resulting data
> >>> product is a new processed "observation" to which a new unique
> >>> observation identifier should be assigned."
> >>>
> >>> We really need to distinguish clearly between Datasets and
> >>> Observations.
> >>> An Observation represents an operation that is characterized by a
> >>> configuration: instrument characteristics, coordinate volume and
> >>> properties, calibration, etc.
> >>> A Dataset is a container of bytes that may have resulted from an
> >>> Observation (the byte stream that came out of the telescope or
> >>> various direct processing products of it), a simulation, or the
> >>> processing and analysis of (possibly a subset of) one or more parent
> >>> Datasets.
> >>> Each Dataset also carries metadata detailing coordinate
> >>> characteristics, the nature of the Dataset and its components, and
> >>> its provenance regarding its parents.
> >>>
> >>> Blurring the line between Observations and Datasets and carelessly
> >>> forcing one to assume the characteristics of the other is going to
> >>> get us into major trouble.
> >>>
> >>> Cheers,
> >>>
> >>>   - Arnold
> >>>
> >>> --------------------------------------------------------------------
> >>> Arnold H. Rots                          Chandra X-ray Science Center
> >>> Smithsonian Astrophysical Observatory   tel:  +1 617 496 7701
> >>> 60 Garden Street, MS 67                 fax:  +1 617 495 7356
> >>> Cambridge, MA 02138                     arots at cfa.harvard.edu
> >>> USA                                     http://hea-www.harvard.edu/~arots/
> >>> --------------------------------------------------------------------
> >>>
> >>>
> >>>
> >>> On Thu, Nov 21, 2013 at 6:00 PM, CresitelloDittmar, Mark
> >>> <mdittmar at cfa.harvard.edu> wrote:
> >>>       All,
> >>>
> >>> I've been thinking about this and some comments Arnold made on the
> >>> Provenance thread which are closely related.
> >>>   1) there is general agreement that Observation *has* 0 or more
> >>> Datasets  (rather than *is* a Dataset)
> >>>
> >>>   2) Dataset can exist without an Observation (can be created by
> >>> something else).
> >>>
> >>>   3) The definition of Observation is pretty fuzzy, but let's assume
> >>> that there could be an "Analysis" or "Simulation" step which could
> >>> create a Dataset.  These may be parts of the larger domain that all
> >>> these objects live in, but are not modeled.  Currently, the ObsCore
> >>> model does say (pg 19) "the data product may be the result of
> >>> combining data from multiple primary (physical) observations.  In
> >>> this case the resulting data product is a new processed
> >>> "observation" to which a new unique observation identifier should be
> >>> assigned."
> >>> So the relation of Dataset to 'the thing which created it' is not
> >>> clear to me yet.  I keep going back to the 'Experiment' concept in
> >>> Gerard's mail (provenance thread).
> >>>
> >>> I don't think that a Dataset should have a bi-directional relation
> >>> to the full Observation(s) as I noted at the head of this thread,
> >>> but should
> >>>   a) have an association back to components of the Observation
> >>> (ObsConfig, Proposal) which become part of the Dataset 'provenance'.
> >>>       (which is what I think Arnold was saying in the other thread).
> >>>   b) have metadata identifying the relevant Observation(s)
> >>> comprising the Dataset (DataID.ObservationID), as Francois notes.
> >>>       but this gets tricky because ObsCore expects a singular (well,
> >>> unique) obs_id for each Dataset.
> >>>   c) if the Dataset were created by something else, then it would
> >>> add associations to components of those things holding the relevant
> >>> information to fold into the 'provenance', like the progenitor
> >>> Datasets.
> >>>
> >>>
> >>>
> >>>
> >>> On Fri, Nov 15, 2013 at 9:59 AM, Arnold Rots <arots at cfa.harvard.edu>
> >>> wrote:
> >>> If multiple observations have to be taken care of through provenance,
> >>> then why should a single observation not be handled the same way?
> >>> Don't get me wrong: I think neither should be handled through
> >>> provenance.
> >>>
> >>> Examples are: VLA multi-configuration images; stacked images;
> >>> multi-observation event files.
> >>>
> >>> It is much clearer and more intuitive if we just simply allow a
> >>> Dataset to be associated with multiple Observations.
> >>> Actually, I think this is absolutely a requirement.
> >>>
> >>>   - Arnold
> >>>
> >>>
> >>>
> >>>
> >>> On Thu, Nov 14, 2013 at 6:29 PM, Douglas Tody <dtody at nrao.edu> wrote:
> >>>       On Thu, 14 Nov 2013, Arnold Rots wrote:
> >>>
> >>>       From this description I am beginning to suspect that a Dataset
> >>>       can be derived from (associated with) no more than one
> >>>       Observation.
> >>>       That seems utterly wrong; multiple Observations can be combined
> >>>       into a single Dataset.
> >>>       Or did I misunderstand?
> >>>
> >>>
> >>> Multiple Observations can be and often are combined to produce a new
> >>> Dataset; however, describing that history would likely be the
> >>> responsibility of the Provenance model.  At the level of Observation
> >>> it would probably be a new "Observation" (or at least Dataset).
> >>> Depends upon how strict we are with the concept of Observation.  The
> >>> CreationType and calibration level say something about it being a
> >>> synthesized/derived data product.
> >>>
> >>>       I think it is OK to require that a Dataset is associated with at
> >>>       least one Observation, provided that a model or simulation can be
> >>>       described as an Observation.
> >>>
> >>>
> >>> In practice that is what we are doing, to keep things simple;
> >>> DataSource can be something like "theory".
> >>>
> >>>         - Doug
> >>>
> >>>       Cheers,
> >>>
> >>>        - Arnold
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>       On Thu, Nov 14, 2013 at 12:08 PM, CresitelloDittmar, Mark
> >>>       <mdittmar at cfa.harvard.edu> wrote:
> >>>
> >>>             All,
> >>>               This thread is for discussion on the relation between
> >>>             Observation and Dataset.
> >>>
> >>>             ref: ObsCoreDM -
> >>>             http://www.ivoa.net/documents/ObsCore/20111028/index.html
> >>>             ref: diagram illustrating relation of Image/Spectral
> >>>             Observation to ObsCoreDM (draft)
> >>>             http://www.ivoa.net/pipermail/dm/attachments/20131113/c9ef7581/attachment-0001.png
> >>>
> >>>             motivation
> >>>               It is clear that there is a relationship between
> >>>             "Observation" and a more generic "Dataset".  This "Dataset"
> >>>             would contain elements such as the dataProductType, and
> >>>             dataProductSubtype, presumably others.  This object has not
> >>>             been formally defined.
> >>>
> >>>               In ObsCore, there is an implied relationship for
> >>>             Observation as an Extension of Dataset in the location of
> >>>             these attributes.  So, I have always interpreted that
> >>>             Observation "is" a Dataset.  This is reflected in my choice
> >>>             of the name "ObservationDataset" in the left hand package
> >>>             of my diagram.  It implies that it is a Dataset extended
> >>>             for Observation purposes.
> >>>
> >>>               Recent discussion brings this relationship into question,
> >>>             with assertions that an Observation can be associated with
> >>>             0 or more Datasets.
> >>>
> >>>               This has real ramifications for the Image and Spectral
> >>>             models..
> >>>
> >>>             Seed:
> >>>
> >>>             If the relation is Observation "has" 0..* Dataset, then all
> >>>             the diagrams to date are wrong.
> >>>             It feels like this would be a fundamental change to all
> >>>             these models.
> >>>
> >>>               - there would need to be a bi-directional relation
> >>>                 between Observation and Dataset
> >>>                 (Observation has 0..* Dataset; Dataset associated with
> >>>                 1 Observation)
> >>>                 Hmm.. since there can be Datasets not associated with
> >>>                 Observations, this would need to be a specialization
> >>>                 of Dataset.. (ObservationDataset.. but not the one in
> >>>                 my diag.)
> >>>
> >>>               - the Char associated with Observation would characterize
> >>>                 the total space of all included Datasets.  (0..1)
> >>>                 relation to Observation.  If no Datasets, no Char
> >>>
> >>>               - each Dataset would require its own Characterisation,
> >>>                 specific to its space.
> >>>                 (so there is another attribute for Dataset).
> >>>
> >>>               - we would need to specify which of the elements are
> >>>                 associated to the Dataset, and which to the
> >>>                 Observation.  e.g. DataModel => Dataset;
> >>>                 Target => Observation
> >>>
> >>>             Thoughts?
> >>>             Mark
> >>>