Vocabularising dataproduct_type

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Wed Mar 11 09:48:47 CET 2020


Hi Sarah,

On Tue, Mar 10, 2020 at 03:58:09PM +0000, Sarah Weissman wrote:
> * Would too many things be mapped into "Measurement" to be useful
> from a discovery perspective?

Given my findings yesterday (no active obscore service right now has
measurements rows), I think we shouldn't sweat this measurements
thing too much at this point -- in Registry, this term would only be
used for (SSAP, for now) services *returning* lists of such
measurements (as they would otherwise return spectra), which seems a
long shot.

Normal discovery of catalogues and similar will of course proceed as
usual.

However, I largely agree with Laurent:

On Tue, 10 Mar 2020 17:42:43 +0100, Laurent Michel wrote:
> I would really prefer the list of dataproduct_type to ensure an ascending
> compatibility with Obscore. I believe it is worth keeping, as far as possible,
> consistency between standards. This might help in many cases to map things

So, even though I think we've found that the actual definition of
measurement is, well, hard, I'm now rather convinced we shouldn't
drop it.  On the other hand, when we follow Marco --

On Tue, 10 Mar 2020 13:53:09 +0100, Molinaro, Marco wrote:
> On Tue, 10 Mar 2020 at 13:14, Markus Demleitner <
>>   A set of derived measurements obtained from a particular original
>>   dataset.  The prototypical example would be a list of sources
>>   extracted from an image.
>
> Does it have to be "original dataset" _singular_?

and Laurent's previous proposal, we'd end up with:

  A set of derived measurements obtained from original datasets or a
  catalog of sources.

I'd say the "obtained from original datasets" in there doesn't really
mean much (what else could the measurements be derived from?), so,
reduced, it would work out to

  A set of derived measurements or a catalog of sources.

Which, as Sarah rightly says, is really vague and all-encompassing.
As I said, I'd normally throw out such a term on grounds of it
colliding with too many other terms without really fitting well into
a hierarchy.

But again, since it's in obscore, we shouldn't drop it.  But since
it, to me, seems ill-defined, perhaps we should be frank and just say:

  Generic tabular data not fitting any of the other terms.  Because
  of its lack of specificity, this term should generally be avoided,
  and new, more precise terms should be introduced instead.

Can people live with that?  If we find good use cases for
"measurements" later, we still can get more precise in this
definition as long as we now say "don't use it, really".

Back to Sarah:

On Tue, Mar 10, 2020 at 03:58:09PM +0000, Sarah Weissman wrote:
> * Do we need a separate category for "Model"?

The future Vocabularies 2 standard is designed to make it easy to add
new terms exactly when they're actually needed.  So, when you have
an SSA or Obscore service returning such Model instances -- or
some future use of this vocabulary needs it -- we can include it.
For now, let's see if we can just stick with what Obscore already
has.

> * Do you expect that more than one label would be applied to a data
> product? For example (naively) could a "spectral image cube" be
> labeled with "spectrum" and "image" and "cube"?

This depends on where the terms are being used.  In obscore, only one
label per row is possible, and I don't think that can be changed in a
backwards-compatible way; we could, however, in some future version
of obscore, introduce a multi-valued "other_dataproduct_types" in the
style of EPN-TAP -- but that's a different discussion.

In what I'm drafting for SimpleDALRegExt, SSA services can be
annotated with zero or more dataproductType elements (current
internal draft: http://docs.g-vo.org/SimpleDALRegExt.pdf or on
volute).  More on this later on the Registry list.

Thanks,

       Markus

