Fwd: Obscore: obs_id not null requirement

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Wed Mar 23 09:04:01 CET 2022


François,

On Fri, Mar 18, 2022 at 04:34:28PM +0100, BONNAREL FRANCOIS wrote:
> When an observation encompasses several datasets the obs_id is used to select
> all datasets belonging to the same observation as Pat and Mireille said.
> 
> When an observation encompasses one single dataset, obs_id and
> obs_publisher_did may seem redundant.
> 
> However looking for the number of datasets related to a single obs-id may
> help to distinguish simple and complex observations.

Just to be sure: Is this an argument for keeping the obs_id non-NULL
requirement or a general observation?

> I find this cleaner than doing the same using the obs_id = null criterion.
> How can we be sure that the "null" value is for the good reason (as stated by
> Mireille)?

As I've tried to argue in
<https://blog.g-vo.org/requirements-and-validators.html>,
requirements need to have operational reasons.  Using them in order
to somehow force people to do good curation in my experience just
won't work.  There are infinitely many ways in which you can get
metadata wrong, and you can only block very few of them with
requirements like this.  In particular, the way it's written now,
there is absolutely no guarantee that obs_id==obs_publisher_did in
the single-artefact case (as a matter of fact, this probably is never
true in GAVO's obscore table).

Making things "natural", on the other hand, increases the chances
that things work out well even when curation isn't all optimal.
Saying "Use obs_id for multi-artefact observations" will probably
lead to everyone with non-complex data just leaving the field alone
and hence NULL.  And that would give a reasonably robust and fast way
to identify such data if that is desired.
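
To make that concrete, here is a minimal sketch using an in-memory
SQLite table as a stand-in for an Obscore table.  Only obs_id and
obs_publisher_did are real Obscore column names; the table contents
and identifiers are made up, and an actual service would be queried
through ADQL/TAP rather than sqlite3.

```python
import sqlite3

# Toy stand-in for an Obscore table; only two of the real columns are
# modelled, and every identifier below is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE obscore (obs_publisher_did TEXT PRIMARY KEY, obs_id TEXT)"
)
conn.executemany(
    "INSERT INTO obscore VALUES (?, ?)",
    [
        ("ivo://example/a", None),      # simple observation, obs_id left NULL
        ("ivo://example/b", None),      # another simple observation
        ("ivo://example/c1", "obs-c"),  # complex observation, first dataset
        ("ivo://example/c2", "obs-c"),  # complex observation, second dataset
    ],
)

# Simple observations: a plain NULL test, no aggregation needed.
simple_dids = [
    row[0]
    for row in conn.execute(
        "SELECT obs_publisher_did FROM obscore "
        "WHERE obs_id IS NULL ORDER BY obs_publisher_did"
    )
]

# Complex observations: group on obs_id to count the member datasets.
members_per_obs = dict(
    conn.execute(
        "SELECT obs_id, COUNT(*) FROM obscore "
        "WHERE obs_id IS NOT NULL GROUP BY obs_id"
    ).fetchall()
)

print(simple_dids)
print(members_per_obs)
```

With a mandatory obs_id, the first query would instead need the same
kind of grouping as the second, plus a test for a count of one.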

> And as Pat states it is easier to use this obs_id field to join to other
> non-ObsCore tables where we could have other observation metadata

Data providers that have these external tables can use obs_id for
that whether or not Obscore requires it to be non-NULL.  So... I'd
say we'd still be all clear for lifting the requirement.
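
As a rough illustration of that point, the following sketch joins a
toy Obscore table to a hypothetical external table.  The table
obs_extra, its proposal column, and all values are assumptions made
up for this example; SQLite merely stands in for whatever database
backs the service.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE obscore (obs_publisher_did TEXT PRIMARY KEY, obs_id TEXT);
    -- Hypothetical provider-specific table keyed on obs_id.
    CREATE TABLE obs_extra (obs_id TEXT PRIMARY KEY, proposal TEXT);
    INSERT INTO obscore VALUES ('ivo://example/a', NULL);
    INSERT INTO obscore VALUES ('ivo://example/c1', 'obs-c');
    INSERT INTO obs_extra VALUES ('obs-c', 'P-001');
    """
)

# The LEFT JOIN keeps every Obscore row; rows whose obs_id is NULL
# simply find no partner in obs_extra and get NULL for its columns.
rows = conn.execute(
    """
    SELECT o.obs_publisher_did, e.proposal
    FROM obscore AS o
    LEFT JOIN obs_extra AS e ON o.obs_id = e.obs_id
    ORDER BY o.obs_publisher_did
    """
).fetchall()

print(rows)
```

Rows with a filled-in obs_id pick up their extra metadata, and rows
with a NULL obs_id pass through untouched, so the join works the same
way whether or not the column is required to be non-NULL.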

         -- Markus

