Product-type as a SKOS vocabulary

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Mon Jan 10 10:48:56 CET 2022


Dear François,

On Wed, Jan 05, 2022 at 04:50:59PM +0100, BONNAREL FRANCOIS wrote:
> If we want to organize dataproducts in a network we have to think what are
> the properties  able to characterize the dataproducts and see what are the
> most common cases

As usual, I'd like to come from use cases; it is certainly valuable
to have a formal and solid understanding of the full domain, but in
order to come up with vocabularies (or, prospectively, ontologies)
that people can work with, I am sure we will need to agree on which
parts of a full mapping ought to enter into a given semantic
resource -- and which might be dispensable *for that specific*
semantic resource.

For product-type, I think we have two basic use cases: 

(A) discovery of artefacts relevant to a defined research project;
that's the obscore use case where people would, for instance, look
for spectrally resolved data and want to filter out images and time
series that are not spectrally resolved.

(B) routing of some artefact to the proper (SAMP) client; that's the
datalink use case where a user might simply double-click a datalink
row, and it'll open in TOPCAT when it's a table, except if it's
really a spectrum, in which case it would preferably go to Splat, and
it would go to Aladin if it's an image (or perhaps ds9 or whatever,
depending on the user session).
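
To make use case (B) concrete, here is a minimal sketch of such routing
in Python.  It is not an existing SAMP API: the term names, the
preference table, and the function choose_client are hypothetical, and
the client preferences merely restate the example above.

```python
# Hypothetical mapping from product-type terms to preferred clients,
# in order of preference.  Terms and preferences are illustrative
# assumptions, not taken from any published vocabulary.
PREFERRED_CLIENTS = {
    "spectrum": ["Splat", "TOPCAT"],
    "image": ["Aladin", "ds9"],
    "table": ["TOPCAT"],
}


def choose_client(product_type, running_clients, fallback="TOPCAT"):
    """Pick a client for an artefact of the given product type.

    Returns the first preferred client that is running in the user
    session; otherwise the fallback if it is running; otherwise None.
    """
    for client in PREFERRED_CLIENTS.get(product_type, []):
        if client in running_clients:
            return client
    return fallback if fallback in running_clients else None
```

With this table, a spectrum goes to Splat when Splat is running and
falls back to TOPCAT otherwise, so the result depends on the user
session as well as on the product type.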

Taking François' schema:

>    I see at least 4 type of properties
>       1 )  What are the independent variables (in the context of  functional
> dependencies of variables with respect to others)? for example if time is
> independent we have a TimeSeries , if spectral coordinate is independent we
> have a spectrum
>       2 )  What are the dependent variables. In case of TimeSeries If it's a
> photometric quantity, it could be a lightcurve, if it's radial velocity it
> is a velocity curve.
>       3 ) Are the independent variable sparsed or regularly sampled ?
>       4 ) The organization : is this a table (where the different quantities
> of a given measurement are explicitly recorded) or a bitmap where the range
> of a dependent measurement  in the dependent measurement array is a function
> of the independent variables

I think (1) and (2) are obviously relevant to both cases.

For (3) I'm less certain.  Use case (A), I would claim, actually
requires making this ignorable.  If I'm looking for a spectrum of a
source, I'm happy to find anything spectrally resolved, and I'd
rather like to avoid having to remember to somehow include both
sparse and non-sparse datasets.  For use case (B), it is conceivable
that certain clients can only deal with regularly sampled data (e.g.,
an image, which I'd like to send to Aladin) and others only with
sparse data (e.g., spatially resolved events, which might better be
dealt with in TOPCAT).  Is this really something we'd want to (be
able to) automatically handle?  I'm currently leaning towards a
tentative "no", but I could certainly be convinced I'm wrong here.
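
One way to keep (3) ignorable for use case (A), should it ever enter
the vocabulary, would be to express sampling as narrower terms and let
discovery match through the broader relation.  The following Python
sketch illustrates that under stated assumptions: every term name and
the BROADER table are invented for illustration and are not proposed
vocabulary content.

```python
# Hypothetical vocabulary fragment: each term maps to its broader terms.
BROADER = {
    "sparse-spectrum": ["spectrum"],
    "regularly-sampled-spectrum": ["spectrum"],
    "spectrum": ["spectrally-resolved"],
    "spectral-cube": ["spectrally-resolved"],
    "image": [],
    "timeseries": [],
}


def matches(term, wanted):
    """True if term equals wanted or has wanted as a transitive broader term."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current == wanted:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(BROADER.get(current, []))
    return False


def discover(datasets, wanted):
    """Return names of (name, term) pairs whose term matches wanted."""
    return [name for name, term in datasets if matches(term, wanted)]
```

A query for the hypothetical term "spectrally-resolved" would then
return sparse and regularly sampled spectra alike, while a client
router could still inspect the narrower term.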

Item (4) is, I think, closely related to (3), though it's perhaps still
a bit more technical and farther removed from scientific content. My
example here would be IRAF-type spectra (which are 1D bitmaps) versus
IVOA SDM spectra (which are tabular).  And my conclusion would be
rather analogous to item (3), also because SAMP doesn't let us
tell one from the other at this point.

So, I think what we should come up with are (plausible) stories of
data usage.  These could guide us on whether (3) and (4) need to be
reflected in product-type -- and, if we want to have them for (B),
on how to avoid spoiling (A).

          -- Markus

