Obscore dataproduct_type question

Douglas Tody dtody at nrao.edu
Thu May 12 00:29:23 CEST 2016


This looks very logical, however the values allowed for dataproduct_type 
form a controlled vocabulary - only the values specified in the spec or
"null" may be used.  This was done so that generic TAP queries can
reliably find all data of a specific generic class.  dataproduct_subtype
however is *not* a controlled vocabulary, and can be used to specify in
data provider terms what the data product actually is.  So for example
movie and volume are instances of the generic class cube, but subtype
could be specified as movie or volume.  A query looking for all data of
type "cube" would find all of these, and the subtype would provide more
detailed information on the type of data.  If the client knows the
specific archive being queried and wants only "movie" data, they could
query for only that subtype.
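
To make the type/subtype split concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for an ObsTAP service. The table name "obscore", the obs_id values, and the three rows are made up for illustration; a real service would be queried in ADQL against ivoa.obscore.

```python
import sqlite3

# Hypothetical rows: (obs_id, dataproduct_type, dataproduct_subtype).
ROWS = [
    ("obs-1", "cube", "movie"),
    ("obs-2", "cube", "volume"),
    ("obs-3", "image", None),
]


def make_db():
    """Build an in-memory table standing in for an ObsCore view."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE obscore ("
        "obs_id TEXT, dataproduct_type TEXT, dataproduct_subtype TEXT)"
    )
    conn.executemany("INSERT INTO obscore VALUES (?, ?, ?)", ROWS)
    return conn


def find_by_type(conn, dtype):
    """Generic query: matches every product of the generic class."""
    cur = conn.execute(
        "SELECT obs_id FROM obscore "
        "WHERE dataproduct_type = ? ORDER BY obs_id",
        (dtype,),
    )
    return [row[0] for row in cur]


def find_by_subtype(conn, dtype, subtype):
    """Archive-specific query: narrows the generic class by subtype."""
    cur = conn.execute(
        "SELECT obs_id FROM obscore "
        "WHERE dataproduct_type = ? AND dataproduct_subtype = ? "
        "ORDER BY obs_id",
        (dtype, subtype),
    )
    return [row[0] for row in cur]


conn = make_db()
print(find_by_type(conn, "cube"))              # movie and volume rows both match
print(find_by_subtype(conn, "cube", "movie"))  # only the movie row matches
```

The generic query needs no knowledge of the archive's subtype names, which is the point of keeping dataproduct_type controlled.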

From the spec (3.3.1): "One of the defined dataproduct_type values must
be used if appropriate for the data product in question, otherwise a
NULL value is permitted and a more precise definition of the data
product type should be given in dataproduct_subtype."  The text goes
on to note that both type and subtype may be given.
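
As a rough sketch of how a publisher might check rows against this rule, assuming the allowed values are the seven listed in the query quoted further down this thread (the function name and messages are my own invention, not anything from the spec):

```python
# Assumed vocabulary: the seven values used in the query quoted later
# in this thread. Check the spec version you target before relying on it.
ALLOWED_TYPES = {
    "image", "cube", "spectrum", "sed",
    "timeseries", "visibility", "event",
}


def check_product_type(dtype, subtype):
    """Return a list of problems; an empty list means the pair looks fine.

    dtype and subtype are strings, or None to represent SQL NULL.
    """
    problems = []
    if dtype is None:
        # NULL is permitted, but the subtype should then say what it is.
        if not subtype:
            problems.append(
                "dataproduct_type is NULL; dataproduct_subtype should "
                "describe the product"
            )
    elif dtype not in ALLOWED_TYPES:
        problems.append(
            "'%s' is not in the controlled vocabulary" % dtype
        )
    return problems


print(check_product_type("cube", "movie"))    # type and subtype both given
print(check_product_type(None, "catalog"))    # NULL type, subtype explains
print(check_product_type("catalog", None))    # value outside the vocabulary
```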

I don't want to get involved in re-litigating all of this (it was
discussed extensively back in the original author group); I'm just
clarifying the spec and the intended usage.  In the case under
discussion "table" is the generic class of data, but of course there are
many subtypes of tables, just as there are many subtypes of images,
spectra, cubes, etc.  The science type is really the subtype or maybe
UCD, whereas type is more a generic computer datatype.  - Doug



On Wed, 11 May 2016, Baptiste Cecconi wrote:

> The dataproduct_type values allowed in EPNcore are listed below (this is the current list; it may grow, as we are currently assessing additions):
> - image: object depending on 2 spatial dimensions
> - spectrum: object depending on 1 spectral dimension
> - timeseries: object depending on 1 temporal dimension
> - catalog: list of objects not depending on a specific dimension
> - spectral cube: object depending on 1 spectral and 2 spatial dimensions
> - dynamic spectrum: object depending on 1 spectral and 1 temporal
> - profile: object depending on 1 spatial dimension
> - cube: object depending on several dimensions (generic)
> - movie: object depending on 1 temporal and 2 spatial dimensions
> - volume: object depending on 3 spatial dimensions
> - spatial_vector: sparse object depending on 2 or 3 spatial dimensions (not sure now what we meant, for this one :-)
>
> The dimension dependences mentioned are those that the provider considers primary. They are usually the dimensions of the sampling axes, but not necessarily. This can be used to define how interoperable tools will treat the data.
>
> So to add my contribution to this discussion, I would vote for "catalog" rather than "sourcelist", which seems too specific to me. In our EPN-TAP services we already have a few with dataproduct_type = "catalog":
> - The exoplanet.eu <http://exoplanet.eu/> catalog
> - The HELIO Feature Catalog of Active Regions
> - The HELIO Feature Catalog of Coronal Holes
> - The HELIO Feature Catalog of Solar Radio Bursts
> - The NASA Dust Samples Catalog
> - The Main characteristics of solar system planets
>
> Cheers
> Baptiste
>
>
>> On 11 May 2016 at 16:02, Laurent MICHEL <laurent.michel at astro.unistra.fr> wrote:
>>
>> Hello Doug,
>>
>> On 09/05/2016 01:19, Douglas Tody wrote:
>>> These are good points.  Note, dataproduct_type is coarse grain, e.g.,
>>> at this level we say just "image" not what type of image.  "Catalog"
>>> as I suggested earlier is also too fine grained.  So if we add support
>>> for table data the value for dataproduct_type should be "table".
>>
>> To me, "table" is too coarse. Event lists or time series can be seen as tables as well. The word we are looking for should describe tables containing sky objects related to one observation. For this reason I continue to support the "sourcelist" option.
>>
>> Laurent
>>
>>> While not a mandatory parameter, dataproduct_subtype would be perfect
>>> for specifying the data-provider defined type of the table, e.g.,
>>> "sourcelist" or "catalog" (this is intentionally not a controlled
>>> vocabulary, rather it is site or domain specific, controlled by the
>>> user).  A UCD might be useful for specifying the table type as well,
>>> although this is perhaps beyond the scope of a UCD since it is not
>>> a quantity.  - Doug
>>>
>>>
>>> On Wed, 20 Apr 2016, Laurent Michel wrote:
>>>
>>>> Hello,
>>>>
>>>>
>>>> I'm likely guided by my experience with XMM data but this is a
>>>> valuable use case anyway.
>>>> The data bulk released with one XMM observation contains various data
>>>> types. To make it simple, we can find there event lists, images,
>>>> spectra, time series and source lists possibly in various formats
>>>> (FITS, PDF..)
>>>> All these data sets can take place in an Obscore table except the
>>>> source list. This exception has no justification since the source list
>>>> is a scientific product issued from the processing of a particular
>>>> observation. There is even one stronger reason for pushing the source
>>>> lists in Obscore which is that source level products (e.g. time
>>>> series, spectra) are attached to individual detections which are
>>>> nothing else than source list rows. So hiding the source list would
>>>> limit the possibilities of the data discovery.
>>>> I guess that this point of view can easily be extended to other
>>>> missions or instruments.
>>>>
>>>> Moreover, for catalogues which are table compilations (e.g. XMM
>>>> catalogue) or large sky coverage surveys (e.g. 2MASS), most of the
>>>> characterisation axes parameters are not relevant. So Obscore
>>>> description does not bring much. The Dataset DM or the source DM are
>>>> better suited for this kind of dataset.
>>>>
>>>> The right position of the cursor between publishing or not a table in
>>>> Obscore is left to the appreciation of the data publisher.
>>>> But an appropriate vocabulary may help for this.
>>>>
>>>> As far as vocabulary is concerned, I would prefer to speak about a
>>>> 'source list' instead of a 'catalogue' which is too generic for a list
>>>> of sources extracted from one specific observation.
>>>>
>>>> Assuming that most of the Obscore axis values can be set from the
>>>> parameters of the related observation, I suggest allowing
>>>> dataproduct_type='sourcelist' in Obscore1.1.
>>>>
>>>>
>>>> Laurent
>>>>
>>>> On 15/04/2016 19:45, Douglas Tody wrote:
>>>>> I have always thought that ObsCore should include "catalog" as a
>>>>> dataproduct type.  This was opposed in the first version due to a desire
>>>>> by the Exec to keep things as simple as possible, but I do not think it
>>>>> would have complicated things or delayed the release.
>>>>>
>>>>> A catalog is a valid data product with calib_level 3 or higher.  It is
>>>>> not that much different than some other high level, derived data
>>>>> products such as a dither stack.  These higher level data products are
>>>>> often the data products one most wants to find for analysis.
>>>>>
>>>>> While many catalogs are derived from multiple observations or even
>>>>> collections, it is possible for example to perform object detection on a
>>>>> single observation, and include the result in the set of data products
>>>>> published for the observation, all sharing the same obs_id.  Catalogs
>>>>> can even have valid metadata giving the spatial, spectral, and time
>>>>> coverage for the catalog.  So, +1 for me too.    - Doug
>>>>>
>>>>>
>>>>> On Fri, 15 Apr 2016, Patrick Dowler wrote:
>>>>>
>>>>>> Guilty as charged :-)
>>>>>>
>>>>>> Our underlying data model (CAOM2) has catalog as a valid type and I
>>>>>> recall some of us trying to get catalog into the vocabulary back
>>>>>> before 1.0.... I can easily fix this because ObsCore is a view on
>>>>>> CAOM2, with the cost being that people can't find all the products via
>>>>>> ObsCore. But for ASKAP, if they implement ObsCore directly then they
>>>>>> would be left with no way to provide discovery of catalog products.
>>>>>>
>>>>>> The DM rationale is that ObsCore is a list of products (it is a
>>>>>> flattened view of Observation+Product where there is a 1..* relation).
>>>>>> As such, there are certainly other kinds of things that can be created
>>>>>> from data (besides just more data). Do such things belong in ObsCore?
>>>>>> I obviously think they do so IMO we should add to the vocabulary and
>>>>>> I've either been meaning to request it or forgot about that little
>>>>>> non-compliance.
>>>>>>
>>>>>> The alternative that I see is that access to the catalog would be a
>>>>>> link in a DataLink response with semantics="derivation"... and then
>>>>>> probably augment the DataLink vocabulary to be able to say catalog...
>>>>>> and then still not be able to discover them via a data discovery
>>>>>> query. DataLink *is* awesome and all, but that doesn't seem so great.
>>>>>>
>>>>>> So: can we add "catalog" to the ObsCore-1.1 vocabulary for
>>>>>> dataproduct_type? +1 from me
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 15 April 2016 at 05:21, Mark Taylor <M.B.Taylor at bristol.ac.uk>
>>>>>> wrote:
>>>>>>> James,
>>>>>>>
>>>>>>> it looks like you're not the only one to hit this.  If I use the
>>>>>>> ObsTAP service at http://www.cadc.hia.nrc.gc.ca/tap to run:
>>>>>>>
>>>>>>>   select distinct top 100
>>>>>>>          dataproduct_type, obs_collection
>>>>>>>   from ivoa.obscore
>>>>>>>   where dataproduct_type not in
>>>>>>>       ('image', 'cube', 'spectrum', 'sed',
>>>>>>>        'timeseries', 'visibility', 'event')
>>>>>>>
>>>>>>> I get
>>>>>>>
>>>>>>>   +------------------+----------------+
>>>>>>>   | dataproduct_type | obs_collection |
>>>>>>>   +------------------+----------------+
>>>>>>>   | catalog          | APASS          |
>>>>>>>   | catalog          | CFHTTERAPIX    |
>>>>>>>   | catalog          | JCMT           |
>>>>>>>   +------------------+----------------+
>>>>>>>
>>>>>>> (note this is one of the tests run by taplint, currently failing at
>>>>>>> CADC).
>>>>>>>
>>>>>>> Mark
>>>>>>>
>>>>>>> On Fri, 15 Apr 2016, James.Dempsey at csiro.au wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> In ObsCore v1.0 and 1.1 the dataproduct_type field has a defined
>>>>>>>> list of values. We currently advertise our ASKAP catalogue
>>>>>>>> data products in our ObsCore table, but have to leave the
>>>>>>>> dataproduct_type field blank for these records as none of the
>>>>>>>> possible types match.
>>>>>>>>
>>>>>>>> Has the inclusion of a catalogue type been considered before? If
>>>>>>>> not, would it be possible to include in ObsCore v1.1?
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> James Dempsey
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Mark Taylor   Astronomical Programmer   Physics, Bristol
>>>>>>> University, UK
>>>>>>> m.b.taylor at bris.ac.uk +44-117-9288776
>>>>>>> http://www.star.bris.ac.uk/~mbt/
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Patrick Dowler
>>>>>> Canadian Astronomy Data Centre
>>>>>> Victoria, BC, Canada
>>>>>>
>>>>
>>>> --
>>>> jesuischarlie/Tunis/Paris/Bruxelles
>>>>
>>>> Laurent Michel
>>>> SSC XMM-Newton
>>>> Tél : +33 (0)3 68 85 24 37
>>>> Fax : +33 (0)3 68 85 24 32
>>>> laurent.michel at astro.unistra.fr <mailto:laurent.michel at astro.unistra.fr>
>>>> Université de Strasbourg <http://www.unistra.fr>
>>>> Observatoire Astronomique
>>>> 11 Rue de l'Université
>>>> F - 67200 Strasbourg
>>>> http://amwdb.u-strasbg.fr/HighEnergy/spip.php?rubrique34
>>>>
>>
>> --
>> ---- Laurent MICHEL              Tel  (33 0) 3 68 85 24 37
>>     Observatoire de Strasbourg  Fax  (33 0) 3 68 85 24 32
>>     11 Rue de l'Universite      Mail laurent.michel at astro.unistra.fr
>>     67000 Strasbourg (France)   Web  http://astro.u-strasbg.fr/~michel
>> ---
>
>

