[Observation] relation to Dataset

Fri Nov 22 07:04:54 PST 2013

I strongly object to this statement:

"the data product may be the result of combining data from multiple primary
(physical) observations.  In this case the resulting data product is a new
processed "observation" to which a new unique observation identifier should
be assigned."

We really need to distinguish clearly between Datasets and Observations.
An Observation represents an operation that is characterized by a
configuration: instrument characteristics, coordinate volume and properties,
calibration, etc.
A Dataset is a container of bytes that may have resulted from an Observation
(the byte stream that came out of the telescope, or various direct processing
products of it), from a simulation, or from the processing and analysis of
(possibly a subset of) one or more parent Datasets.
Each Dataset also carries metadata detailing its coordinate characteristics,
the nature of the Dataset and its components, and its provenance regarding
its parents.
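
As a minimal sketch of this distinction (the class and field names below are
hypothetical, chosen only for illustration, and are not taken from ObsCore or
any other IVOA model), the two concepts could be kept apart like this:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Observation:
    """An operation characterized by a configuration (hypothetical fields)."""
    obs_id: str
    instrument: str


@dataclass
class Dataset:
    """A container of bytes plus metadata on its nature and provenance."""
    name: str
    creation_type: str  # assumed labels: "observation", "simulation", "processing"
    observations: list[Observation] = field(default_factory=list)
    parents: list[Dataset] = field(default_factory=list)

    def contributing_observations(self) -> set[str]:
        """Collect obs_ids from this Dataset and, recursively, from its parents."""
        ids = {obs.obs_id for obs in self.observations}
        for parent in self.parents:
            ids |= parent.contributing_observations()
        return ids


# Made-up example values: two raw event files merged into one Dataset,
# plus a simulated Dataset that has no Observation at all.
obs_a = Observation("obs-001", "camera-1")
obs_b = Observation("obs-002", "camera-1")
raw_a = Dataset("events_a", "observation", observations=[obs_a])
raw_b = Dataset("events_b", "observation", observations=[obs_b])
merged = Dataset("merged_events", "processing", parents=[raw_a, raw_b])
simulated = Dataset("sim_events", "simulation")

print(sorted(merged.contributing_observations()))
print(sorted(simulated.contributing_observations()))
```

In this sketch the merged Dataset never becomes an Observation; it only
points back, through its parents, to the Observations that contributed to it.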

Blurring the line between Observations and Datasets and carelessly forcing
one to assume the characteristics of the other is going to get us into major
trouble.

Cheers,

  - Arnold

--------------------------------------------------------------------------
Arnold H. Rots                          Chandra X-ray Science Center
Smithsonian Astrophysical Observatory   tel:  +1 617 496 7701
60 Garden Street, MS 67                 fax:  +1 617 495 7356
Cambridge, MA 02138                     arots at cfa.harvard.edu
USA                                     http://hea-www.harvard.edu/~arots/
--------------------------------------------------------------------------

On Thu, Nov 21, 2013 at 6:00 PM, CresitelloDittmar, Mark
<mdittmar at cfa.harvard.edu> wrote:

> All,
>
> I've been thinking about this and some comments Arnold made on the
> Provenance thread which are closely related.
>   1) there is general agreement that Observation *has* 0 or more Datasets
> (rather than *is* a Dataset)
>
>   2) Dataset can exist without an Observation (can be created by something
> else).
>
>   3) The definition of Observation is pretty fuzzy, but let's assume that
> there could be an "Analysis" or "Simulation" step which could create a
> Dataset.  These may be parts of the larger domain that all these objects
> live in, but are not modeled.  Currently, the ObsCore model does say (pg
> 19) "the data product may be the result of combining data from multiple
> primary (physical) observations.  In this case the resulting data product
> is a new processed "observation" to which a new unique observation
> identifier should be assigned."
> So the relation of Dataset to 'the thing which created it', is not clear
> to me yet.  I keep going back to the 'Experiment' concept in Gerard's mail
> (provenance thread).
>
> I don't think that a Dataset should have a bi-directional relation to the
> full Observation(s) as I noted at the head of this thread, but should
>   a) have an association back to components of the Observation
>      (ObsConfig, Proposal) which become part of the Dataset 'provenance'
>      (which is what I think Arnold was saying in the other thread).
>   b) have metadata identifying the relevant Observation(s) comprising the
>      Dataset (DataID.ObservationID), as Francois notes.
>      But this gets tricky because ObsCore expects a singular (well,
>      unique) obs_id for each Dataset.
>   c) if the Dataset were created by something else, then it would add
>      associations to components of those things holding the relevant
>      information to fold into the 'provenance', like the progenitor
>      Datasets.
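
The difficulty in (b) can be sketched as follows. This is an illustration
only: the `Dataset` class, its `observation_ids` field (standing in for
something like DataID.ObservationID), and the helper `single_obs_id` are
hypothetical and do not describe what ObsCore itself specifies.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical Dataset carrying the IDs of its contributing Observations."""
    name: str
    observation_ids: list[str] = field(default_factory=list)


def single_obs_id(dataset: Dataset) -> str | None:
    """Return the one ID a single-valued obs_id field could hold, else None."""
    if len(dataset.observation_ids) == 1:
        return dataset.observation_ids[0]
    # Zero or several contributing Observations: no single existing ID fits.
    return None


# Made-up example values.
raw = Dataset("raw_image", ["obs-001"])
stacked = Dataset("stacked_image", ["obs-001", "obs-002"])
simulated = Dataset("sim_image")

print(single_obs_id(raw))
print(single_obs_id(stacked))
print(single_obs_id(simulated))
```

Only the first case maps cleanly onto a single-valued field; the other two
are where the choice between a list of IDs and a newly minted identifier has
to be made.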
>
>
>
>
> On Fri, Nov 15, 2013 at 9:59 AM, Arnold Rots <arots at cfa.harvard.edu> wrote:
>
>> If multiple observations have to be taken care of through provenance,
>> then why should a single observation not be handled the same way?
>> Don't get me wrong: I think neither should be handled through provenance.
>>
>> Examples are: VLA multi-configuration images; stacked images;
>> multi-observation event files.
>>
>> It is much clearer and more intuitive if we just simply allow a Dataset
>> to be associated with multiple Observations.
>> Actually, I think this is absolutely a requirement.
>>
>>   - Arnold
>>
>> On Thu, Nov 14, 2013 at 6:29 PM, Douglas Tody <dtody at nrao.edu> wrote:
>>
>>> On Thu, 14 Nov 2013, Arnold Rots wrote:
>>>
>>>> From this description I am beginning to suspect that a Dataset can be
>>>> derived from (associated with) no more than one Observation.
>>>> That seems utterly wrong; multiple Observations can be combined into a
>>>> single Dataset.
>>>> Or did I misunderstand?
>>>>
>>>
>>> Multiple Observations can be and often are combined to produce a new
>>> Dataset; however, describing that history would likely be the
>>> responsibility of the Provenance model.  At the level of Observation it
>>> would probably be a new "Observation" (or at least Dataset).  Depends
>>> upon how strict we are with the concept of Observation.  The
>>> CreationType and calibration level say something about it being a
>>> synthesized/derived data product.
>>>
>>>
>>>> I think it is OK to require that a Dataset is associated with at least
>>>> one Observation,
>>>> provided that a model or simulation can be described as an Observation.
>>>>
>>>
>>> In practice that is what we are doing, to keep things simple; DataSource
>>> can be something like "theory".
>>>
>>>         - Doug
>>>
>>>
>>>> Cheers,
>>>>
>>>>  - Arnold
>>>>
>>>>
>>>> On Thu, Nov 14, 2013 at 12:08 PM, CresitelloDittmar, Mark
>>>> <mdittmar at cfa.harvard.edu> wrote:
>>>>
>>>>> All,
>>>>>   This thread is for discussion on the relation between Observation and
>>>>> Dataset.
>>>>>
>>>>> ref: ObsCoreDM - http://www.ivoa.net/documents/ObsCore/20111028/index.html
>>>>> ref: diagram illustrating relation of Image/Spectral Observation to
>>>>> ObsCoreDM (draft)
>>>>>
>>>>> http://www.ivoa.net/pipermail/dm/attachments/20131113/c9ef7581/attachment-0001.png
>>>>>
>>>>> motivation
>>>>>   It is clear that there is a relationship between "Observation" and a
>>>>> more generic "Dataset".  This "Dataset" would contain elements such as
>>>>> the dataProductType and dataProductSubtype, and presumably others.  This
>>>>> object has not been formally defined.
>>>>>
>>>>>   In ObsCore, there is an implied relationship for Observation as an
>>>>> Extension of Dataset in the location of these attributes.  So, I have
>>>>> always interpreted that Observation "is" a Dataset.  This is reflected
>>>>> in my choice of the name "ObservationDataset" in the left hand package
>>>>> of my diagram.  It implies that it is a Dataset extended for Observation
>>>>> purposes.
>>>>>
>>>>>   Recent discussion brings this relationship into question, with
>>>>> assertions that an Observation can be associated with 0 or more
>>>>> Datasets.
>>>>>
>>>>>   This has real ramifications for the Image and Spectral models.
>>>>> Seed:
>>>>>
>>>>> If the relation is Observation "has" 0..* Dataset, then all the
>>>>> diagrams to date are wrong.
>>>>> It feels like this would be a fundamental change to all these models.
>>>>>
>>>>>   - there would need to be a bi-directional relation between
>>>>>     Observation and Dataset
>>>>>     (Observation has 0..* Dataset; Dataset associated with 1
>>>>>     Observation).
>>>>>     Hmm.. since there can be Datasets not associated with Observations,
>>>>>     this would need to be a specialization of Dataset
>>>>>     (ObservationDataset, but not the one in my diagram).
>>>>>
>>>>>   - the Char associated with Observation would characterize the total
>>>>>     space of all included Datasets, with a (0..1) relation to
>>>>>     Observation.  If no Datasets, no Char.
>>>>>
>>>>>   - each Dataset would require its own Characterisation, specific to
>>>>>     its space
>>>>>     (so there is another attribute for Dataset).
>>>>>
>>>>>   - we would need to specify which of the elements are associated to
>>>>>     the Dataset, and which to the Observation, e.g. DataModel => Dataset;
>>>>>     Target => Observation
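
The second bullet above (an Observation-level Char covering the total space
of all included Datasets, and no Char when there are no Datasets) can be
sketched along a single axis. This is an illustration only: `Interval` and
`total_coverage` are hypothetical names, and a real Characterisation spans
several axes rather than one numeric range.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Interval:
    """Closed coverage interval along one axis (hypothetical, e.g. time)."""
    lo: float
    hi: float


def total_coverage(dataset_coverages: list[Interval]) -> Interval | None:
    """Bounding interval over all Dataset coverages; None if there are no Datasets."""
    if not dataset_coverages:
        return None  # the 0..1 relation: no Datasets, no Char
    return Interval(
        lo=min(c.lo for c in dataset_coverages),
        hi=max(c.hi for c in dataset_coverages),
    )


# Made-up example values: two Datasets with overlapping coverage.
per_dataset = [Interval(1.0, 3.0), Interval(2.0, 5.0)]
print(total_coverage(per_dataset))
print(total_coverage([]))
```

Each Dataset keeps its own interval, while the Observation-level value is
derived from them, which matches the split between the second and third
bullets.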
>>>>>
>>>>> Thoughts?
>>>>> Mark
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>
>