Vocabularising dataproduct_type
François Bonnarel
francois.bonnarel at astro.unistra.fr
Tue Mar 17 19:29:02 CET 2020
Hi all,
Two words on this.
A ) Measurements
I remember the discussion among the authors of ObsCore 1.1 when we added
the "measurements" term to the initial 1.0 list. Indeed, I think the idea
of adding something like "catalog" came from Daniel Durand at CADC, because
they had the use case that Pat is describing.
There was however a reason why we chose not to add the
dataproduct_type "catalog".
ObsCore/ObsTAP is for discovery of datasets which have time,
spatial, spectral and polarisation axes. Selection on the ObsCore
parameters is not sufficient for catalogs with plenty of other
parameters which are directly queried via TAP (or even in the registry).
So we had to find another word for these specific tables extracted from
the datasets, so as not to suggest that ObsCore is a discovery
model for any kind of catalog. Hence "measurements".
HiPS was different because they are really creating HiPS for ANY
KIND of catalog !!!
So I think we should keep "measurements", but not with the negative
definition "Generic tabular data not fitting any of the other terms.
Because of its lack of specificity, this term should generally be avoided,
and new, more precise terms should be introduced instead". Any of the
other proposed definitions will fit, I think, and yes, we have to keep
backward compatibility with ObsCore. This is pretty important for
interoperability.
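To make the compatibility point concrete, here is a minimal Python sketch
that checks whether a candidate term list still contains every
dataproduct_type value listed in ObsCore 1.1. The function name and the
candidate list are hypothetical, chosen only for illustration; this is not
part of any existing tool.

```python
# dataproduct_type values listed in ObsCore 1.1.
OBSCORE_1_1_TERMS = {
    "image", "cube", "spectrum", "sed",
    "timeseries", "visibility", "event", "measurements",
}


def missing_obscore_terms(vocabulary_terms):
    """Return the ObsCore 1.1 terms absent from a candidate vocabulary."""
    return sorted(OBSCORE_1_1_TERMS - set(vocabulary_terms))


# Hypothetical candidate list: "measurements" dropped, "catalog" added.
candidate = [
    "image", "cube", "spectrum", "sed",
    "timeseries", "visibility", "event", "catalog",
]

print(missing_obscore_terms(candidate))
```

With this candidate the check reports "measurements" as missing, which is
the situation that keeping the term in the vocabulary avoids.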
B ) TimeSeries and SED hierarchies
This is maybe more critical. The definition in ObsCore, which has
been reproduced in Markus' list, seems to exclude time series of spectra
or images or anything else which is not a varying scalar. I don't
remember the reason, but where do we put a spectrochronogram then? Do we
create subtypes of spectrum or cube?
The question of subtypes and "parent" terms seems to be open. In
ObsCore there was no hierarchy in the terms. But in Markus' list,
sed is a child of spectrum. I think some people complained about using
dataproduct_subtype for such things and prefer to keep this ObsCore
subtype field available for free strings.
Imagine that in the future (next version of ObsCore) we decide to
have the ObsCore vocabulary for dataproduct_type defined externally, as the
vocabulary list we are discussing here. This could be a way to easily
extend the ObsCore vocabulary for this specific dataproduct_type attribute.
We may imagine having "spectrochronogram" and "sed" as child
terms of spectrum, and "timecube" or "spectralcube" as children of cube.
The hierarchy will be clear in the vocabulary page, but ObsCore will
manage these terms at the same level in dataproduct_type (exactly as we
already have sed and spectrum in parallel today).
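As a rough illustration of this idea, the Python sketch below keeps the
parent links on the vocabulary side while the values stored in
dataproduct_type stay flat. The parent table is hypothetical and only
reuses the example terms from this mail; it is not the actual vocabulary,
and the helper names are made up for the sketch.

```python
# Hypothetical parent links, following the example terms in this mail.
# None marks a top-level term. This is NOT the actual IVOA vocabulary.
PARENT = {
    "image": None,
    "spectrum": None,
    "cube": None,
    "timeseries": None,
    "sed": "spectrum",
    "spectrochronogram": "spectrum",
    "timecube": "cube",
    "spectralcube": "cube",
}


def ancestors(term):
    """Return the chain of parent terms, nearest first."""
    chain = []
    parent = PARENT[term]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def expand(term):
    """Return the term plus all of its narrower terms, sorted."""
    return sorted(t for t in PARENT if t == term or term in ancestors(t))


def adql_constraint(term):
    """Build an ADQL condition matching a term and its narrower terms."""
    values = ", ".join("'%s'" % t for t in expand(term))
    return "dataproduct_type IN (%s)" % values


print(adql_constraint("spectrum"))
```

A client that knows the vocabulary could expand "spectrum" this way before
querying, while the ObsCore table itself only ever holds one flat term per
row.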
Thoughts ?
C ) Miscellaneous
If this vocabulary is to be used in various contexts (and indeed
it is), why do we link it to SimpleDALRegExt? Vocabularies 2.0
proposes to manage vocabularies as endorsed notes. Why don't we do it
this way and refer to it from SimpleDAL, ObsCore, DataLink, etc.?
The MS discussion indeed shows we need to add new refinements in
the formats conveyed by the "media types". Does one exist for Measurement
Sets already?
Cheers
François
Le 12/03/2020 à 16:20, Patrick Dowler a écrit :
> note: cross-posting reduced to DAL and semantics
>
> In CAOM, I made an ObsCore.dataproduct_type base vocabulary and then
> added my own custom child of measurements named
> http://www.opencadc.org/caom/DataProductType#catalog (this is
> literally ObsCore + extensions so if this idea goes ahead we'll have
> to figure out how this vocab refers back to the parent vocab). If one
> queries caom2 in our tap service you can see that (fully qualified)
> value mixed in with ObsCore values. In the ObsCore view I am currently
> limiting the rows to exclude custom (fully qualified) terms for
> compliance with the current model (1.1). In the event that:
>
> * ObsCore (1.2) dataproduct_type values are defined by a vocabulary, and
> * providers can extend the vocabulary with custom terms (then fully
> qualified rather than just the short form when using base terms)
>
> I could remove the filtering and allow more entries to appear in
> ObsCore view. If that doesn't become possible, we would have to decide
> to (i) leave content as-is or (ii) change those to measurements.
>
> Currently CADC has:
>
> select dataProductType,count(*) from caom2.Plane group by dataProductType;
>                     dataproducttype                    |  count
> -------------------------------------------------------+----------
>                                                        | 15538972
>  timeseries                                            |   866405
>  spectrum                                              |  7911476
>  cube                                                  |   202980
>  http://www.opencadc.org/caom2/DataProductType#catalog |   160850
>  eventlist                                             |    88421
>  image                                                 | 16076616
>
> So, we are using "measurements" but it is not surprising that Markus
> didn't see it. We only have one child term in use right now.
>
> --
> Patrick Dowler
> Canadian Astronomy Data Centre
> Victoria, BC, Canada
>
>
> On Thu, 12 Mar 2020 at 00:01, Slava Kitaeff <slava.kitaeff at uwa.edu.au
> <mailto:slava.kitaeff at uwa.edu.au>> wrote:
>
> Hi James et al.,
>
> Just a quick note on that. It’s a bit of a guess but I’m
> suspecting that this might be referring to a Measurement Set (MS),
> which is a specific data format (and a model) used by CASA (and
> now ASKAPsoft) to store interferometric visibilities. So the data
> type must then be “visibilities”. To confuse things further,
> CASA can store continuum images and spectral-line image-cubes in
> the same MS format as CASA tables. In my view, it’d be ideal to
> have two things describing a) the nature of data and b) its
> specific format (including version).
>
> Cheers,
>
> Slava
>
> ________________________________
>
> *Dr Slava Kitaeff*
>
> Australian SKA Regional Centre Program Manager
>
> *International Centre for Radio Astronomy Research*
>
> The University of Western Australia
>
> *Astronomy and Space Science*
>
> CSIRO
>
> Email: slava.kitaeff at icrar.org <mailto:slava.kitaeff at icrar.org> or
> slava.kitaeff at csiro.au <mailto:slava.kitaeff at csiro.au>
>
> Tel.: (+61) (0) 8 6488 7744 (ICRAR) or (+61) (0) 8 6436 8865 (CSIRO)
>
> Mob.: +61 404 297 414
>
> //
>
> *Mailing addresses:*
>
> M468, University of Western Australia, 35 Stirling Highway,
> Crawley WA 6009, Australia, *or*
>
> CSIRO Astronomy and Space Science, PO Box 1130, Bentley WA 6102,
> Australia
>
> *From: *<dal-bounces at ivoa.net <mailto:dal-bounces at ivoa.net>> on
> behalf of "Dempsey, James (IM&T, Black Mountain)"
> <James.Dempsey at csiro.au>
> *Date: *Wednesday, 11 March 2020 at 5:43 pm
> *To: *Markus Demleitner <msdemlei at ari.uni-heidelberg.de
> <mailto:msdemlei at ari.uni-heidelberg.de>>, "dal at ivoa.net
> <mailto:dal at ivoa.net>" <dal at ivoa.net <mailto:dal at ivoa.net>>,
> "registry at ivoa.net <mailto:registry at ivoa.net>" <registry at ivoa.net
> <mailto:registry at ivoa.net>>, "semantics at ivoa.net
> <mailto:semantics at ivoa.net>" <semantics at ivoa.net
> <mailto:semantics at ivoa.net>>
> *Subject: *Re: Vocabularising dataproduct_type
>
> Hi Markus,
>
> We are dealing with a lot of catalogues as well as images, cubes
> etc. Currently these have a blank dataproduct_type in the CASDA
> obscore instance as there wasn't anything we could use in v1.0. We
> will probably start using measurements soon as it is better than
> nothing, but it isn't a term that our community uses. Their
> expectation is to see catalogue in the type, so I'd be keen to see
> that as a term available for use.
>
> Our catalogues are generally source lists produced from the
> associated image/cube.
>
> Cheers,
>
> James Dempsey
>
> Senior Developer
>
> Information Services Applications
>
> CSIRO Information Management & Technology (IM&T)
>
> ------------------------------------------------------------------------
>
> *From:*dal-bounces at ivoa.net <mailto:dal-bounces at ivoa.net>
> <dal-bounces at ivoa.net <mailto:dal-bounces at ivoa.net>> on behalf of
> Markus Demleitner <msdemlei at ari.uni-heidelberg.de
> <mailto:msdemlei at ari.uni-heidelberg.de>>
> *Sent:* Wednesday, 11 March 2020 7:48 PM
> *To:* dal at ivoa.net <mailto:dal at ivoa.net> <dal at ivoa.net
> <mailto:dal at ivoa.net>>; registry at ivoa.net
> <mailto:registry at ivoa.net> <registry at ivoa.net
> <mailto:registry at ivoa.net>>; semantics at ivoa.net
> <mailto:semantics at ivoa.net> <semantics at ivoa.net
> <mailto:semantics at ivoa.net>>
> *Subject:* Re: Vocabularising dataproduct_type
>
> Hi Sarah,
>
> On Tue, Mar 10, 2020 at 03:58:09PM +0000, Sarah Weissman wrote:
> > * Would too many things be mapped into "Measurement" to be useful
> > from a discovery perspective?
>
> Given my findings yesterday (no active obscore service right now has
> measurements rows), I think we shouldn't sweat this measurements
> thing too much at this point -- in Registry, this term would only be
> used for (SSAP, for now) services *returning* lists of such
> measurements (as they would otherwise return spectra), which seems a
> long shot.
>
> Normal discovery of catalogues and similar will of course proceed as
> usual.
>
> However, I largely agree with Laurent:
>
> On Tue, 10 Mar 2020 17:42:43 +0100, Laurent Michel wrote:
> > I would really prefer the list of dataproduct_type to ensure
> > ascending compatibility with Obscore. I believe it is worth keeping,
> > as far as possible, consistency between standards. This might help
> > in many cases to map things
>
> So, even though I think we've found that the actual definition of
> measurement is, well, hard, I'm now rather convinced we shouldn't
> drop it. On the other hand, when we follow Marco --
>
> On Tue, 10 Mar 2020 13:53:09 +0100, Molinaro, Marco wrote:
> > On Tue, 10 Mar 2020 at 13:14, Markus Demleitner <
> >> A set of derived measurements obtained from a particular original
> >> dataset. The prototypical example would be a list of sources
> >> extracted from an image.
> >
> > Does it have to be "original dataset" _singular_?
>
> and Laurent's previous proposal, we'd end up with:
>
> A set of derived measurements obtained from original datasets or a
> catalog of sources.
>
> I'd say the "obtained from original datasets" in there doesn't really
> mean much (what else could the measurements be derived from?), so
> reduced it would work out to
>
> A set of derived measurements or a catalog of sources.
>
> Which, as Sarah rightly says, is really vague and all-encompassing.
> As I said, I'd normally throw out such a term on grounds of it
> colliding with too many other terms without really fitting well into
> a hierarchy.
>
> But again, since it's in obscore, we shouldn't drop it. But since
> it, to me, seems ill-defined, perhaps we should be frank and just say:
>
> Generic tabular data not fitting any of the other terms. Because
> of its lack of specificity, this term should generally be avoided,
> and new, more precise terms should be introduced instead.
>
> Can people live with that? If we find good use cases for
> "measurements" later, we still can get more precise in this
> definition as long as we now say "don't use it, really".
>
> Back to Sarah:
>
> On Tue, Mar 10, 2020 at 03:58:09PM +0000, Sarah Weissman wrote:
> > * Do we need a separate category for "Model"?
>
> The future Vocabularies 2 standard is designed to make it easy to add
> new terms exactly when they're actually needed. So, when you
> have an SSA or Obscore service returning such Model instances -- or
> some future use of this vocabulary needs it --, we can include it.
> For now, let's see if we can just stick with what Obscore already
> has.
>
> > * Do you expect that more than one label would be applied to a data
> > product? For example (naively) could a "spectral image cube" be
> > labeled with "spectrum" and "image" and "cube"?
>
> This depends on where the terms are being used. In obscore, only one
> label per row is possible, and I don't think that can be changed in a
> backwards-compatible way; we could, however, in some future version
> of obscore, introduce a multi-valued "other_dataproduct_types" in the
> style of EPN-TAP -- but that's a different discussion.
>
> In what I'm drafting for SimpleDALRegExt, SSA services can be
> annotated with zero or more dataproductType elements (current
> internal draft: http://docs.g-vo.org/SimpleDALRegExt.pdf or on
> volute). More on this later on the Registry list.
>
> Thanks,
>
> Markus
>