[Observation] relation to Dataset
Douglas Tody
dtody at nrao.edu
Fri Nov 22 10:26:03 PST 2013
I agree that the generic Dataset is the more fundamental concept here:
an Observation is not a Dataset, but a data product derived from an
Observation is a Dataset, and there can be Datasets that are not derived
from a single observation, or from any observation at all.

- Doug
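A minimal Python sketch of the relationship summarized above. The class and field names (`Observation`, `Dataset`, `observation_ids`, `parents`) and the helper `root_observations` are hypothetical and not taken from any IVOA model. A Dataset may reference zero, one, or many Observations directly and may be derived from other Datasets; the helper walks the parent chain recursively to collect every Observation a Dataset ultimately depends on.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Observation:
    # Hypothetical: an Observation is an operation with a configuration,
    # not a container of bytes.
    obs_id: str
    instrument: str


@dataclass
class Dataset:
    # Hypothetical: a Dataset references 0..* Observations directly
    # and 0..* parent Datasets it was derived from.
    dataset_id: str
    observation_ids: list[str] = field(default_factory=list)
    parents: list[Dataset] = field(default_factory=list)


def root_observations(ds: Dataset) -> set[str]:
    """Collect every observation ID reachable from ds through its parents."""
    found = set(ds.observation_ids)
    for parent in ds.parents:
        found |= root_observations(parent)
    return found


obs_a = Observation("obs-A", "hypothetical-imager")
obs_b = Observation("obs-B", "hypothetical-imager")

# Direct products of single observations.
evt_a = Dataset("evt-A", observation_ids=[obs_a.obs_id])
evt_b = Dataset("evt-B", observation_ids=[obs_b.obs_id])

# A stacked product has no Observation of its own, only parent Datasets.
stack = Dataset("stack-1", parents=[evt_a, evt_b])

# A simulated Dataset derived from no Observation at all.
sim = Dataset("sim-1")

print(sorted(root_observations(stack)))  # ['obs-A', 'obs-B']
print(root_observations(sim))            # set()
```

In this sketch the Observation never "is" a Dataset; it is only referenced by ID, so Datasets without any Observation need no special case.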
On Fri, 22 Nov 2013, Gerard Lemson wrote:
> Hi Doug and Arnold
>>
>> Hi Arnold -
>>
>> This is all true enough, although one could argue that some data products
>> resulting from analysis combining multiple other data products could be
>> considered a form of "software observation".
> I almost agree with Doug, but I think that "some data products ... MUST be
> considered" to be the result of a completely different kind of experiment:
> not an Observation, but, for example, a "Stacking Operation". Such an
> experiment is described very differently, with a different kind of
> provenance, different parameters, etc., even though the dataset it produces
> can be considered to be an image. There is no problem describing all of
> these using the simple, recursive pattern for a more comprehensive
> Provenance model that I referred to in my earlier, somewhat ignored email.
> For details, look under "domain model" and also under "simulation data
> model" on the DM page. And see the sketch of the Provenance model from GAVO
> that follows this pattern.
>
>> But the real reason we stretched the concept a bit in ObsTAP was merely
>> to be able to provide a single uniform index (the ObsTAP index) for
>> science data products in an archive.
>>
>> I agree that the observation-dataset modeling needs to be more
>> comprehensive; my guess is that this can be done in a relational fashion
>> by adding one or more additional models/tables to hold the additional
>> metadata and relationships. The relational model can easily represent the
>> required many-to-many relationship.
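A minimal sketch of the many-to-many relationship described above, using Python's built-in `sqlite3`. The schema is hypothetical: the table and column names `observation`, `dataset`, and `observation_dataset` are illustrative only and are not taken from ObsCore or ObsTAP. A link table lets one Dataset reference several Observations and one Observation reference several Datasets.

```python
import sqlite3

# Hypothetical schema: one table per entity plus a link table that
# carries the many-to-many association between them.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE observation (
        obs_id TEXT PRIMARY KEY,
        target TEXT
    );
    CREATE TABLE dataset (
        dataset_id TEXT PRIMARY KEY,
        dataproduct_type TEXT
    );
    CREATE TABLE observation_dataset (
        obs_id TEXT REFERENCES observation(obs_id),
        dataset_id TEXT REFERENCES dataset(dataset_id),
        PRIMARY KEY (obs_id, dataset_id)
    );
    """
)

conn.executemany(
    "INSERT INTO observation VALUES (?, ?)",
    [("obs-1", "M31"), ("obs-2", "M31")],
)
conn.executemany(
    "INSERT INTO dataset VALUES (?, ?)",
    [("img-1", "image"), ("img-2", "image"), ("stack-1", "image")],
)
# img-1 and img-2 each come from one observation; stack-1 combines both.
conn.executemany(
    "INSERT INTO observation_dataset VALUES (?, ?)",
    [("obs-1", "img-1"), ("obs-2", "img-2"),
     ("obs-1", "stack-1"), ("obs-2", "stack-1")],
)


def observations_for(dataset_id: str) -> list[str]:
    """All observations linked to a dataset, sorted by ID."""
    rows = conn.execute(
        "SELECT obs_id FROM observation_dataset "
        "WHERE dataset_id = ? ORDER BY obs_id",
        (dataset_id,),
    )
    return [r[0] for r in rows]


def datasets_for(obs_id: str) -> list[str]:
    """All datasets linked to an observation, sorted by ID."""
    rows = conn.execute(
        "SELECT dataset_id FROM observation_dataset "
        "WHERE obs_id = ? ORDER BY dataset_id",
        (obs_id,),
    )
    return [r[0] for r in rows]


print(observations_for("stack-1"))  # ['obs-1', 'obs-2']
print(datasets_for("obs-1"))        # ['img-1', 'stack-1']
```

A flat one-row-per-product index could in principle be derived from such tables as a view, but how a multi-observation product like `stack-1` should fill a single `obs_id` column is the open question Mark raises about ObsCore.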
>>
> This is where one SHOULD (if not MUST) start from a proper VO-DML model.
> It is the appropriate language, also developed in another IVOA effort, for
> expressing data models of the complexity that is required. It can easily
> be mapped to, for example, a relational model, or be used to annotate such
> a database model (see another email I sent recently).
>
> Cheers
> Gerard
>
>> - Doug
>>
>>
>>
>>
>> On Fri, 22 Nov 2013, Arnold Rots wrote:
>>
>>> I strongly object to this statement:
>>>
>>> "the data product may be the result of combining data from multiple
>>> primary (physical) observations. In this case the resulting data
>>> product is a new processed "observation" to which a new unique
>>> observation identifier should be assigned."
>>>
>>> We really need to distinguish clearly between Datasets and Observations.
>>> An Observation represents an operation that is characterized by a
>>> configuration: instrument characteristics, coordinate volume and
>>> properties, calibration, etc.
>>> A Dataset is a container of bytes that may have resulted from an
>>> Observation (the byte stream that came out of the telescope or various
>>> direct processing products of it), a simulation, or the processing and
>>> analysis of (possibly a subset of) one or more parent Datasets.
>>> Each Dataset also carries metadata detailing coordinate
>>> characteristics, the nature of the Dataset and its components, and its
>>> provenance regarding its parents.
>>>
>>> Blurring the line between Observations and Datasets and carelessly
>>> forcing one to assume the characteristics of the other is going to get
>>> us into major trouble.
>>>
>>> Cheers,
>>>
>>> - Arnold
>>>
>>> ----------------------------------------------------------------------
>>> Arnold H. Rots                          Chandra X-ray Science Center
>>> Smithsonian Astrophysical Observatory   tel: +1 617 496 7701
>>> 60 Garden Street, MS 67                 fax: +1 617 495 7356
>>> Cambridge, MA 02138                     arots at cfa.harvard.edu
>>> USA                                     http://hea-www.harvard.edu/~arots/
>>> ----------------------------------------------------------------------
>>>
>>>
>>>
>>> On Thu, Nov 21, 2013 at 6:00 PM, CresitelloDittmar, Mark <mdittmar at cfa.harvard.edu> wrote:
>>> All,
>>>
>>> I've been thinking about this, and about some comments Arnold made on
>>> the Provenance thread, which are closely related.
>>> 1) there is general agreement that Observation *has* 0 or more
>>> Datasets (rather than *is* a Dataset)
>>>
>>> 2) Dataset can exist without an Observation (can be created by
>>> something else).
>>>
>>> 3) The definition of Observation is pretty fuzzy, but let's assume
>>> that there could be an "Analysis" or "Simulation" step which could
>>> create a Dataset. These may be parts of the larger domain that all
>>> these objects live in, but are not modeled. Currently, the ObsCore
>>> model does say (pg 19) "the data product may be the result of
>>> combining data from multiple primary (physical) observations. In this
>>> case the resulting data product is a new processed "observation" to
>>> which a new unique observation identifier should be assigned."
>>> So the relation of Dataset to 'the thing which created it' is not
>>> clear to me yet. I keep going back to the 'Experiment' concept in
>>> Gerard's mail (provenance thread).
>>>
>>> I don't think that a Dataset should have a bi-directional relation to
>>> the full Observation(s), as I noted at the head of this thread, but it
>>> should
>>> a) have an association back to components of the Observation
>>> (ObsConfig, Proposal), which become part of the Dataset 'provenance'
>>> (which is what I think Arnold was saying in the other thread).
>>> b) have metadata identifying the relevant Observation(s) comprising the
>>> Dataset (DataID.ObservationID), as Francois notes,
>>> but this gets tricky because ObsCore expects a singular (well,
>>> unique) obs_id for each Dataset.
>>> c) if the Dataset were created by something else, add associations to
>>> components of those things holding the relevant information to fold
>>> into the 'provenance' (e.g., the progenitor Datasets).
>>>
>>>
>>>
>>>
>>> On Fri, Nov 15, 2013 at 9:59 AM, Arnold Rots <arots at cfa.harvard.edu> wrote:
>>> If multiple observations have to be taken care of through provenance,
>>> then why should a single observation not be handled the same way?
>>> Don't get me wrong: I think neither should be handled through
>>> provenance.
>>>
>>> Examples are: VLA multi-configuration images; stacked images;
>>> multi-observation event files.
>>>
>>> It is much clearer and more intuitive if we just simply allow a
>>> Dataset to be associated with multiple Observations.
>>> Actually, I think this is absolutely a requirement.
>>>
>>> - Arnold
>>>
>>>
>>>
>>>
>>> On Thu, Nov 14, 2013 at 6:29 PM, Douglas Tody <dtody at nrao.edu> wrote:
>>> On Thu, 14 Nov 2013, Arnold Rots wrote:
>>>
>>> > From this description I am beginning to suspect that a Dataset can be
>>> > derived from (associated with) no more than one Observation.
>>> > That seems utterly wrong; multiple Observations can be combined into a
>>> > single Dataset.
>>> > Or did I misunderstand?
>>>
>>>
>>> Multiple Observations can be, and often are, combined to produce a new
>>> Dataset; however, describing that history would likely be the
>>> responsibility of the Provenance model. At the level of Observation
>>> it would probably be a new "Observation" (or at least Dataset).
>>> It depends upon how strict we are with the concept of Observation.
>>> The CreationType and calibration level say something about it being a
>>> synthesized/derived data product.
>>>
>>> > I think it is OK to require that a Dataset is associated with at least
>>> > one Observation, provided that a model or simulation can be described
>>> > as an Observation.
>>>
>>>
>>> In practice that is what we are doing, to keep things simple;
>>> DataSource can be something like "theory".
>>>
>>> - Doug
>>>
>>> > Cheers,
>>> >
>>> > - Arnold
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Nov 14, 2013 at 12:08 PM, CresitelloDittmar, Mark <mdittmar at cfa.harvard.edu> wrote:
>>>
>>> All,
>>> This thread is for discussion on the relation between Observation and
>>> Dataset.
>>>
>>> ref: ObsCoreDM -
>>> http://www.ivoa.net/documents/ObsCore/20111028/index.html
>>> ref: diagram illustrating the relation of Image/Spectral Observation to
>>> ObsCoreDM (draft)
>>> http://www.ivoa.net/pipermail/dm/attachments/20131113/c9ef7581/attachment-0001.png
>>>
>>> Motivation:
>>> It is clear that there is a relationship between "Observation" and a
>>> more generic "Dataset". This "Dataset" would contain elements such as
>>> the dataProductType and dataProductSubtype, and presumably others. This
>>> object has not been formally defined.
>>>
>>> In ObsCore, the location of these attributes implies a relationship in
>>> which Observation is an extension of Dataset. So I have always
>>> interpreted Observation as "being" a Dataset. This is reflected in my
>>> choice of the name "ObservationDataset" in the left-hand package of my
>>> diagram: it implies a Dataset extended for Observation purposes.
>>>
>>> Recent discussion brings this relationship into question, with
>>> assertions that an Observation can be associated with 0 or more
>>> Datasets.
>>>
>>> This has real ramifications for the Image and Spectral models.
>>>
>>> Seed:
>>>
>>> If the relation is Observation "has" 0..* Dataset, then all the
>>> diagrams to date are wrong. It feels like this would be a fundamental
>>> change to all these models.
>>>
>>> - there would need to be a bi-directional relation between Observation
>>>   and Dataset (Observation has 0..* Dataset; Dataset associated with 1
>>>   Observation).
>>>   Hmm... since there can be Datasets not associated with Observations,
>>>   this would need to be a specialization of Dataset
>>>   (ObservationDataset... but not the one in my diagram).
>>>
>>> - the Char associated with Observation would characterize the total
>>>   space of all included Datasets, with a (0..1) relation to
>>>   Observation: if there are no Datasets, there is no Char.
>>>
>>> - each Dataset would require its own Characterisation, specific to its
>>>   space (so there is another attribute for Dataset).
>>>
>>> - we would need to specify which of the elements are associated with
>>>   the Dataset and which with the Observation, e.g. DataModel =>
>>>   Dataset; Target => Observation.
>>>
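A minimal sketch of the aggregation implied by the Characterisation bullets above, under heavy simplifying assumptions: the `Coverage` class (one interval for time and one for the spectral axis) and the `observation_coverage` helper are hypothetical and far simpler than a real Characterisation. Each Dataset carries its own coverage, and the Observation-level coverage is derived as the bounding intervals over all included Datasets, or `None` when there are no Datasets.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Coverage:
    # Hypothetical, heavily simplified Characterisation:
    # one interval per axis, units left unspecified.
    t_min: float
    t_max: float
    em_min: float
    em_max: float


def observation_coverage(dataset_coverages: list[Coverage]) -> Optional[Coverage]:
    """Bounding coverage of all Datasets; None when there are no Datasets."""
    if not dataset_coverages:
        return None  # no Datasets, no Char (the 0..1 case)
    return Coverage(
        t_min=min(c.t_min for c in dataset_coverages),
        t_max=max(c.t_max for c in dataset_coverages),
        em_min=min(c.em_min for c in dataset_coverages),
        em_max=max(c.em_max for c in dataset_coverages),
    )


# Two hypothetical Datasets belonging to one Observation.
a = Coverage(0.0, 10.0, 1e-10, 2e-10)
b = Coverage(5.0, 20.0, 1.5e-10, 3e-10)

print(observation_coverage([a, b]))
print(observation_coverage([]))  # None
```

A bounding interval ignores any gaps between the Datasets' coverages, so an Observation-level Char built this way only states the outer extent of the total space, not its exact shape.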
>>> Thoughts?
>>> Mark
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>
>