[Observation] relation to Dataset

Fri Nov 22 10:26:03 PST 2013

I agree that the generic Dataset is the more fundamental concept here:
an Observation is not a Dataset but a data product derived from an
Observation is a Dataset, and there can be Datasets that are not derived
from a single observation or any observation.   - Doug
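
The relationship summarized above can be sketched roughly as follows. This is only an illustration under stated assumptions, not any IVOA model: the class names follow the thread, but every field name (`obs_id`, `creation_type`, `observation_ids`, `parent_ids`), the helper `datasets_of`, and all identifier values are hypothetical. A Dataset refers to zero, one, or many Observations by identifier, and to its progenitor Datasets, without holding a bi-directional link.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Observation:
    """An operation characterized by a configuration (fields are illustrative)."""
    obs_id: str
    instrument: str


@dataclass
class Dataset:
    """A data product; it may relate to zero, one, or many Observations."""
    dataset_id: str
    creation_type: str  # hypothetical values: "archival", "stacked", "simulation"
    observation_ids: List[str] = field(default_factory=list)
    parent_ids: List[str] = field(default_factory=list)  # progenitor Datasets


def datasets_of(obs_id: str, datasets: List[Dataset]) -> List[str]:
    """Return the ids of all Datasets associated with the given Observation."""
    return [d.dataset_id for d in datasets if obs_id in d.observation_ids]


obs_a = Observation("obs-A", "hypothetical-imager")
obs_b = Observation("obs-B", "hypothetical-imager")

raw_a = Dataset("raw-A", "archival", ["obs-A"])
raw_b = Dataset("raw-B", "archival", ["obs-B"])
# A stacked image: one Dataset derived from two Observations.
stack = Dataset("stack-AB", "stacked", ["obs-A", "obs-B"], ["raw-A", "raw-B"])
# A simulated Dataset: not derived from any Observation.
sim = Dataset("sim-1", "simulation")

catalog = [raw_a, raw_b, stack, sim]
```

Here `datasets_of("obs-A", catalog)` lists both the raw product and the stack, while `sim` carries no Observation at all, which is the many-to-many case with optional participation discussed in the thread.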

On Fri, 22 Nov 2013, Gerard Lemson wrote:

> Hi Doug and Arnold
>>
>> Hi Arnold -
>>
>> This is all true enough, although one could argue that some data products
>> resulting from analysis combining multiple other data products could be
>> considered a form of "software observation".
> I almost agree with Doug, but I think that "some data products ... MUST be
> considered" to be the result of a completely different kind of experiment,
> not an Observation, but for example a "Stacking Operation". Very different
> way to describe such an experiment, different kind of provenance, different
> parameters etc. Even though the dataset it produces can be considered to be
> an image. No problem to describe all of these using the simple, recursive
> pattern for a more comprehensive Provenance model I referred to in my
> earlier, somewhat ignored email. For details look under "domain model" and
> also under "simulation data model" on the DM page.
> And see the sketch for the Provenance model from GAVO that follows this
> pattern.
>
>> But the real reason we stretched
>> the concept a bit in ObsTAP was merely to be able to provide a single uniform
>> index (the ObsTAP index) for science data products in an archive.
>>
>> I agree that the observation-dataset modeling needs to be more
>> comprehensive; my guess is that this can be done in a relational fashion by
>> adding one or more additional models/tables to hold the additional metadata
>> and relationships.  The relational model can easily represent the required
>> many-to-many relationship.
>>
> This is where one SHOULD (if not MUST) start from a proper VO-DML model.
> It is the appropriate language, developed in another IVOA effort also, for
> expressing data models of the complexity that is required. And it is easily
> mapped to for example a relational model or can be easily used to annotate
> such a database model (see another email I sent recently).
>
> Cheers
> Gerard
>
>>  	- Doug
>>
>>
>>
>>
>> On Fri, 22 Nov 2013, Arnold Rots wrote:
>>
>>> I strongly object to this statement:
>>>
>>> "the data product may be the result of combining data from multiple
>>> primary (physical) observations.  In this case the resulting data
>>> product is a new processed "observation" to which a new unique
>>> observation identifier should be assigned."
>>>
>>> We really need to distinguish clearly between Datasets and Observations.
>>> An Observation represents an operation that is characterized by a
>>> configuration - instrument characteristics, coordinate volume and
>>> properties, calibration, etc.
>>> A Dataset is a container of bytes that may have resulted from an
>>> Observation (the byte stream that came out of the telescope or various
>>> direct processing products of it), a simulation, or the processing and
>>> analysis of (possibly a subset of) one or more parent Datasets.
>>> Each Dataset also carries metadata detailing coordinate
>>> characteristics, the nature of the Dataset and its components, and its
>>> provenance regarding its parents.
>>>
>>> Blurring the line between Observations and Datasets and carelessly
>>> forcing one to assume the characteristics of the other is going to get
>>> us into major trouble.
>>>
>>> Cheers,
>>>
>>>   - Arnold
>>>
>>> --------------------------------------------------------------------------
>>> Arnold H. Rots                          Chandra X-ray Science Center
>>> Smithsonian Astrophysical Observatory   tel:  +1 617 496 7701
>>> 60 Garden Street, MS 67                 fax:  +1 617 495 7356
>>> Cambridge, MA 02138                     arots at cfa.harvard.edu
>>> USA                                     http://hea-www.harvard.edu/~arots/
>>> --------------------------------------------------------------------------
>>>
>>>
>>>
>>> On Thu, Nov 21, 2013 at 6:00 PM, CresitelloDittmar, Mark
>>> <mdittmar at cfa.harvard.edu> wrote:
>>>       All,
>>>
>>> I've been thinking about this and some comments Arnold made on the
>>> Provenance thread which are closely related.
>>>   1) there is general agreement that Observation *has* 0 or more
>>> Datasets  (rather than *is* a Dataset)
>>>
>>>   2) Dataset can exist without an Observation (can be created by
>>> something else).
>>>
>>>   3) The definition of Observation is pretty fuzzy, but let's assume
>>> that there could be an "Analysis" or "Simulation" step which could
>>> create a Dataset.  These may be parts of the larger domain that all
>>> these objects live in, but are not modeled.  Currently, the ObsCore
>>> model does say (pg 19) "the data product may be the result of
>>> combining data from multiple primary (physical) observations.  In this
>>> case the resulting data product is a new processed "observation" to
>>> which a new unique observation identifier should be assigned."
>>> So the relation of Dataset to 'the thing which created it', is not
>>> clear to me yet.  I keep going back to the 'Experiment' concept in
>>> Gerard's mail (provenance thread).
>>>
>>> I don't think that a Dataset should have a bi-directional relation to
>>> the full Observation(s) as I noted at the head of this thread, but should
>>>   a) have an association back to components of the Observation
>>> (ObsConfig, Proposal) which become part of the Dataset 'provenance'.
>>>       (which is what I think Arnold was saying in the other thread).
>>>   b) have metadata identifying the relevant Observation(s) comprising
>>> the Dataset (DataID.ObservationID), as Francois notes.
>>>       but this gets tricky because ObsCore expects a singular (well,
>>> unique) obs_id for each Dataset.
>>>   c) if the Dataset were created by something else, then it would add
>>> associations to components of those things holding the relevant
>>> information to fold into the 'provenance'.  Like the progenitor
>>> Datasets.
>>>
>>>
>>>
>>>
>>> On Fri, Nov 15, 2013 at 9:59 AM, Arnold Rots <arots at cfa.harvard.edu>
>>> wrote:
>>> If multiple observations have to be taken care of through provenance,
>>> then why should a single observation not be handled the same way?
>>> Don't get me wrong: I think neither should be handled through
>>> provenance.
>>>
>>> Examples are: VLA multi-configuration images; stacked images;
>>> multi-observation event files.
>>>
>>> It is much clearer and more intuitive if we just simply allow a
>>> Dataset to be associated with multiple Observations.
>>> Actually, I think this is absolutely a requirement.
>>>
>>>   - Arnold
>>>
>>>
>>>
>>>
>>> On Thu, Nov 14, 2013 at 6:29 PM, Douglas Tody <dtody at nrao.edu> wrote:
>>>       On Thu, 14 Nov 2013, Arnold Rots wrote:
>>>
>>>             From this description I am beginning to suspect that a
>>>             Dataset can be derived from (associated with) no more
>>>             than one Observation.
>>>             That seems utterly wrong; multiple Observations can be
>>>             combined into a single Dataset.
>>>             Or did I misunderstand?
>>>
>>>
>>> Multiple Observations can be and often are combined to produce a new
>>> Dataset, however describing that history would likely be the
>>> responsibility of the Provenance model.  At the level of Observation
>>> it would probably be a new "Observation" (or at least Dataset).
>>> Depends upon how strict we are with the concept of Observation.  The
>>> CreationType and calibration level say something about it being a
>>> synthesized/derived data product.
>>>
>>>       I think it is OK to require that a Dataset is associated with at
>>>       least one Observation, provided that a model or simulation can be
>>>       described as an Observation.
>>>
>>>
>>> In practice that is what we are doing, to keep things simple;
>>> DataSource can be something like "theory".
>>>
>>>         - Doug
>>>
>>>       Cheers,
>>>
>>>        - Arnold
>>>
>>>
>>>
>>>
>>>
>>>       On Thu, Nov 14, 2013 at 12:08 PM, CresitelloDittmar, Mark
>>>       <mdittmar at cfa.harvard.edu> wrote:
>>>
>>>             All,
>>>               This thread is for discussion on the relation between
>>>             Observation and Dataset.
>>>
>>>             ref: ObsCoreDM -
>>>             http://www.ivoa.net/documents/ObsCore/20111028/index.html
>>>             ref: diagram illustrating relation of Image/Spectral
>>>             Observation to ObsCoreDM (draft)
>>>             http://www.ivoa.net/pipermail/dm/attachments/20131113/c9ef7581/attachment-0001.png
>>>
>>>             motivation
>>>               It is clear that there is a relationship between
>>>             "Observation" and a more generic "Dataset".  This "Dataset"
>>>             would contain elements such as the dataProductType, and
>>>             dataProductSubtype, presumably others.  This object has not
>>>             been formally defined.
>>>
>>>               In ObsCore, there is an implied relationship for
>>>             Observation as an Extension of Dataset in the location of
>>>             these attributes.  So, I have always interpreted that
>>>             Observation "is" a Dataset.  This is reflected in my choice
>>>             of the name "ObservationDataset" in the left hand package of
>>>             my diagram.  It implies that it is a Dataset extended for
>>>             Observation purposes.
>>>
>>>               Recent discussion brings this relationship into question,
>>>             with assertions that an Observation can be associated with
>>>             0 or more Datasets.
>>>
>>>               This has real ramifications for the Image and Spectral
>>>             models..
>>>
>>>             Seed:
>>>
>>>             If the relation is Observation "has" 0..* Dataset, then all
>>>             the diagrams to date are wrong.
>>>             It feels like this would be a fundamental change to all these
>>>             models.
>>>
>>>               - there would need to be a bi-directional relation between
>>>                 Observation and Dataset
>>>                 (Observation has 0..* Dataset; Dataset associated with 1
>>>                 Observation)
>>>                 Hmm.. since there can be Datasets not associated with
>>>                 Observations, this would need to be a specialization of
>>>                 Dataset.. (ObservationDataset.. but not the one in my
>>>                 diag.)
>>>
>>>               - the Char associated with Observation would characterize
>>>                 the total space of all included Datasets.  (0..1) relation
>>>                 to Observation.  If no Datasets, no Char
>>>
>>>               - each Dataset would require its own Characterisation,
>>>                 specific to its space.
>>>                 (so there is another attribute for Dataset).
>>>
>>>               - we would need to specify which of the elements are
>>>                 associated to the Dataset, and which to the Observation.
>>>                 e.g. DataModel => Dataset;  Target => Observation
>>>
>>>             Thoughts?
>>>             Mark