Vocabularising dataproduct_type

Tue Mar 24 11:04:24 CET 2020

Hi all,

I have read again the various inputs along this thread.
I would like to summarize a bit and propose a way to harmonize the various
existing sets of terms into the new vocabulary while helping backward
compatibility.

Sorry if this is a bit long. Just to recap,
I understand there are various use cases:

*1- Data discovery*
The ObsCore specification defines terms for the type of data products
resulting from one or multiple observations.
Its main focus is data discovery across multiple archives and data centers.
It is mainly used with the TAP protocol.
Terms defined in ObsCore 1.1 are:
image, cube, spectrum, sed, timeseries, visibility, event or measurements.
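
As a minimal sketch of this discovery use case (an illustration only, not
part of ObsCore), the term list above can guard an ADQL query against the
standard ivoa.obscore table. The function name discovery_query and the
choice of selected columns are assumptions made for this example.

```python
# dataproduct_type terms listed for ObsCore 1.1 in the text above.
OBSCORE_1_1_TYPES = frozenset({
    "image", "cube", "spectrum", "sed",
    "timeseries", "visibility", "event", "measurements",
})


def discovery_query(product_type: str, limit: int = 10) -> str:
    """Build an ADQL query selecting datasets of one dataproduct_type.

    Hypothetical helper: it only builds the query string and does not
    contact any TAP service.
    """
    if product_type not in OBSCORE_1_1_TYPES:
        raise ValueError(
            f"not an ObsCore 1.1 dataproduct_type: {product_type!r}"
        )
    return (
        f"SELECT TOP {int(limit)} obs_id, dataproduct_type, access_url "
        f"FROM ivoa.obscore WHERE dataproduct_type = '{product_type}'"
    )


if __name__ == "__main__":
    print(discovery_query("cube"))
```

With this list, a request for 'catalog' is rejected, which mirrors the
situation described just below.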

The term 'catalog' was discarded because ObsCore was not aimed at
discovering all-sky catalogs.
Catalog services like VizieR, ESO, MAST, etc. serve source catalogs as
tables with a large variety of features that ObsCore does not describe in
its metadata profile.
The term 'measurements' was selected to represent any measurements derived
from an observed image or cube by some processing, and in particular a list
of extracted sources.
In a way, it corresponds to a source catalog restricted to the field of
view of some observed field on the sky.

Planetary data has a wider set of dataproducts to discover and, as far as I
understand, can gather different types of products within a 'granule';
therefore EPN-Core has defined a longer list of terms, together with some
concatenation mechanism.

*2- Specifying dataproduct types managed by services*
Registry entries for DAL services need to expose the type of data products
used or generated by a service.
One example of a service described in the Registry is SPLAT, a VO tool which
can visualize curves as one or several functions of a 1D vector holding
time or spectral coordinates: frequency, wavelength, energy.
For this, ObsCore dataproduct_type labels can be reused:
sed, spectrum, timeseries.

A registry entry (service) dealing with catalogs (all sky)
needs the "catalog" term.

A service generating the HiPS data structure for an all-sky catalog
should mention the output's dataproduct_type: catalog,
and the data structure: HiPS.

*3- Designate the type of dataset pointed to via a datalink*
The nature of the data associated with a datalink entry:
what is at the end of the link.
For instance:
MUSE IFU datacube --> datalink / derived --> source list
                                             dataproduct_type=measurements
                  --> datalink / derived --> detection probability map
                                             dataproduct_type=image

*4- Use the dataproduct type for building and sending a SAMP message to an*
*appropriate VO tool*
S. Erard's message mentions this for EPN-TAP.
When Aladin overplots sources from a TAP query on top of an image, the
query response comes back as a table with some columns as datalink items.
Here also the dataproduct type is needed to select the appropriate API and
send a SAMP message with the access URL for visualisation or further
analysis.
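
A minimal sketch of this dispatch, under stated assumptions: the MTypes
below are standard SAMP ones, but the mapping from dataproduct type to
MType, the helper name build_samp_message and the example URL are
illustrative choices, not something any client is known to implement.

```python
from typing import Optional

# Assumed mapping from dataproduct_type to a SAMP MType. Which message
# suits which type is an illustrative choice, not a normative one.
MTYPE_FOR_TYPE = {
    "image": "image.load.fits",
    "spectrum": "spectrum.load.ssa-generic",
    "measurements": "table.load.votable",
}


def build_samp_message(product_type: str, access_url: str) -> Optional[dict]:
    """Return a SAMP message map, or None if no MType is mapped."""
    mtype = MTYPE_FOR_TYPE.get(product_type)
    if mtype is None:
        return None
    params = {"url": access_url}
    if mtype == "spectrum.load.ssa-generic":
        # This MType also takes a 'meta' map describing the spectrum;
        # it is left empty here as a placeholder.
        params["meta"] = {}
    return {"samp.mtype": mtype, "samp.params": params}


if __name__ == "__main__":
    # Hypothetical URL, for illustration only.
    print(build_samp_message("image", "http://example.org/map.fits"))
```

Types without an entry return None, so a client can fall back to
offering a plain download instead of a tool-specific message.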

In order to fulfill the requirements derived from these 4 use cases and take
into account existing usage, I propose the following:
let's define
- /'source list'/ for gathering sources extracted from one or a small set of
  observations, like multiband images restricted to a region, a radio cube, an
  event list, etc.
- /'catalog'/ for all-sky or multi-region source lists.
This is already the term used at CADC, as Pat mentioned.
We can derive these terms from 'measurements' in ObsCore for compatibility.
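
To illustrate how such a derivation could keep ObsCore clients working,
here is a small sketch. The BROADER table and the function to_obscore are
hypothetical names; only the two proposed terms and their parent
'measurements' come from the proposal above.

```python
# dataproduct_type terms defined in ObsCore 1.1.
OBSCORE_1_1_TYPES = frozenset({
    "image", "cube", "spectrum", "sed",
    "timeseries", "visibility", "event", "measurements",
})

# Hypothetical broader-term relation for the two proposed terms.
BROADER = {
    "source list": "measurements",
    "catalog": "measurements",
}


def to_obscore(term: str) -> str:
    """Walk up the broader-term chain until an ObsCore 1.1 term is found."""
    seen = set()
    while term not in OBSCORE_1_1_TYPES:
        if term in seen or term not in BROADER:
            raise ValueError(f"no ObsCore 1.1 ancestor for {term!r}")
        seen.add(term)
        term = BROADER[term]
    return term


if __name__ == "__main__":
    for label in ("source list", "catalog", "image"):
        print(label, "->", to_obscore(label))
```

A service could then publish the narrower term in the new vocabulary
while still answering ObsCore 1.1 queries with the ancestor term.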

The various vocabularies can then be organised as:
- /IVOA dataproduct_type Vocabulary/ extends the /ObsCore dataproduct_type/
  definitions.
- /EPNCore/ reuses /IVOA dataproduct_type Vocabulary/ concepts with its own
  labels and its own concatenation rules.

I think it would help if the IVOA dataproduct_type Vocabulary had an extra
attribute stating whether a term is an original ObsCore term, a derived one,
or a new one.
This would make dependencies explicit when we consider adding new terms
later on.
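
One possible shape for such an attribute, sketched with assumed names
(Origin, Term, derived_from and terms_with_origin are all hypothetical and
not taken from any existing vocabulary tooling):

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Origin(Enum):
    OBSCORE = "obscore"  # term as originally defined in ObsCore
    DERIVED = "derived"  # narrower term derived from an ObsCore term
    NEW = "new"          # term with no ObsCore ancestor


@dataclass(frozen=True)
class Term:
    label: str
    origin: Origin
    derived_from: Optional[str] = None  # set only for derived terms


# Only the three terms discussed in this message are listed.
VOCABULARY = [
    Term("measurements", Origin.OBSCORE),
    Term("source list", Origin.DERIVED, derived_from="measurements"),
    Term("catalog", Origin.DERIVED, derived_from="measurements"),
]


def terms_with_origin(origin: Origin) -> List[str]:
    """List the labels of all terms carrying the given origin."""
    return [t.label for t in VOCABULARY if t.origin is origin]


if __name__ == "__main__":
    print(terms_with_origin(Origin.DERIVED))
```

Whether this is carried as one attribute or as two (origin plus parent
term) is a design choice left open here.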

Thanks for reading, and for your further comments.

Cheers, Mireille.

On 18/03/2020 at 08:42, Markus Demleitner wrote:
> Dear François,
>
> On Tue, Mar 17, 2020 at 09:41:42PM +0100, François Bonnarel wrote:
>>       ObsCore/ObsTAP is for discovery of datasets which have time, spatial,
>> spectral and polarisation axes. Selection on the ObsCore parameters is not
>> sufficient for catalogs with plenty of other parameters which are directly
>> queried via TAP (or even in the registry). So we had to find another word
>> for these specific tables extracted from the datasets in order to not let
>> people believe that ObsCore is a discovery model for any kind of catalog. Hence
>> "measurements"
> That reconfirms that the actual question is: What kind of catalog (or
> whatever) would you include into the concept "Measurement", and what
> would be out?
>
>>       So I think we should keep "measurements" but not with the negative
>> definition "Generic tabular data not fitting any of the other terms. Because
>> of its lack of specificity, this term should generally be avoided, and
>> new, more precise terms should be introduced instead" any of the others will
>> fit I think and yes we have to keep ascendant compatibility with obsCore.
> We'll still need to have a definition; while the term is just a label
> and doesn't really matter, the definition delineates the concept, and
> while it can later be adjusted to better fit the actual use, it needs
> to say what entity is and, in particular, is not part of the concept.
>
> Hence "Generic tabular data" is not a good definition, in particular
> because at least spectra, time series, and events arguably are
> tabular data and thus ought to be children of measurements.  But
> that, I'm sure, is not useful for what Measurements was introduced
> for.
>
> That's why I'm proposing this hedging language.  Once we see what
> people actually want to use this for (and why they want it), I'd
> expect we can come up with a useful concept to slap the term
> "Measurements" on.
>
> On the other hand, if someone proposes a description that
>
> * says something like "it's tabular data",
> * keeps spectra, time series, and events out of the concept
> * and still has a plausible, useful extension (i.e., there are
>    actually datasets that people will want to look for that are part
>    of the concept)
>
> (are we agreed on something like these criteria?) I'll happily put it
> in, of course.
>
>>          We may imagine having "spectrochronogram" and "sed" as children
>> elements of spectrum. "timecube" or "spectralcube" as children from cubes.
>>
>>          This will be clear in the vocabulary page but ObsCore will manage
>> that at the same level in dataproduct_type (exactly like we already have sed
>> and spectrum in parallel today)
>>
>>         Thoughts ?
> The nice thing is that, if Vocabularies 2 pans out the way I hope it
> will, we don't have to think about that now.  As clients come along
> that will want this kind of distinction, we can figure out what
> structure best covers their needs.
>
> Meanwhile, the vocabulary is clear that anything with 3 or more axes
> is a cube, and clients looking for data with special axes ("time
> cube", "spectral cube") can use the *xel columns.  Whether other uses
> will make further distinctions in the vocabulary desirable we can, I
> claim, confidently leave to the future.
>
> The problem at this point is non-image 2D data.  If that's itching
> someone *now*, let's have a separate thread.
>
>> C ) Miscellaneous.
>>
>>        If this vocabulary is to be used in various contexts (and indeed it
>> is) why do we link it to SimpleDALRegExt ? Vocabularies 2.0 is proposing to
>> manage vocabularies as endorsed notes. Why don't we do it this way and refer
>> it from  SimpleDAL, ObsCore, DataLink, etc ...
> The reason I'd like to link vocabularies to a concrete REC is that
> there should be some sort of RFC for them.  It's conceivable to have
> this RFC as part of an EN, and that might be what it takes for,
> perhaps, the UAT or the object types.
>
> But an extra document always is a liability (who maintains it?  who
> should read it?  what, indeed, would it say in this case?).  If we
> can avoid it, we should.
>
> As to citing a vocabulary later, I'm sure you only should say "The
> vocabulary \url{http://www.ivoa.net/rdf/product-type}" or, if you're
> against URLs in running text "The IVOA dataproduct type
> vocabulary\footnote{\url{http://www.ivoa.net/rdf/product-type}}".
> Which is a good point -- I'll add that to the Voc2 WD.
>
>
>           -- Markus
>
-- 
Mireille Louys,  MCF (Associate Professor)
CDS				IPSEO, Images, Laboratoire Icube
Observatoire de Strasbourg	Telecom Physique Strasbourg
11 rue de l'Université		300, Bd Sebastien Brandt CS 10413
F- 67000-STRASBOURG		F-67412 ILLKIRCH Cedex
Tel: +33 3 68 85 24 34
