Obscore 1.1 Erratum 3: Drop obs_id non-NULL requirement

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Wed Jul 6 19:50:26 CEST 2022


Dear François,

On Wed, Jul 06, 2022 at 05:44:46PM +0200, BONNAREL FRANCOIS wrote:
> In the radio domain (see JIVE service for example) and also the High energy
> domain we often face the case where several dataproducts are produced from
> the same observation. But we can imagine services where some observations
> contain several dataproducts and some others only a single one (just by
> chance).

Right.  But even if such archives do not want to use datalink in such
a situation (which I suspect would almost always be the preferable
solution now that we have datalink), that in no way depends on having
obs_id mandatory, does it?

> If you want to aggregate all the obs_publisher_did, or (s_ra, s_dec) or
> whatever property of the products belonging to the same observations I think
> the GROUP BY will fail if we relax "obs_id = null".

Ummm... how so?  Of course, when a service that has this kind of
thing *also* has datasets with NULL obs_id, all of these will end up
in a single aggregate.  But that is, as far as I can see, as good or
as bad as any other arrangement in this situation.  And even when
data providers choose to do such a thing and users see unfavourable
consequences, it's easy to fix by adding an "obs_id IS NOT NULL"
condition to the WHERE clause; when people are savvy enough to
reconstruct observations using GROUP BY, that clause will be a breeze
for them.
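
A minimal sketch of the GROUP BY behaviour discussed above, assuming
Python's sqlite3 module as a stand-in for a real TAP service.  The
table name "obscore", the rows, and the helper
products_per_observation are all made up for illustration; an actual
service would run the equivalent ADQL against ivoa.obscore.

```python
import sqlite3

# Hypothetical toy stand-in for ivoa.obscore; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obscore (obs_id TEXT, obs_publisher_did TEXT)")
conn.executemany(
    "INSERT INTO obscore VALUES (?, ?)",
    [
        ("obs-A", "ivo://example/a1"),
        ("obs-A", "ivo://example/a2"),
        ("obs-B", "ivo://example/b1"),
        (None, "ivo://example/x1"),  # datasets without an obs_id
        (None, "ivo://example/x2"),
    ],
)


def products_per_observation(skip_null):
    """Count data products per obs_id, optionally ignoring NULL obs_id."""
    where = "WHERE obs_id IS NOT NULL" if skip_null else ""
    query = f"SELECT obs_id, COUNT(*) FROM obscore {where} GROUP BY obs_id"
    return dict(conn.execute(query).fetchall())


# Without the extra condition, all NULL rows form one group (key None).
all_groups = products_per_observation(skip_null=False)
# With it, only rows that actually carry an obs_id are aggregated.
real_groups = products_per_observation(skip_null=True)

print(all_groups)
print(real_groups)
```

In this toy setup the query does not fail when obs_id is NULL; the
NULL rows simply collect in one extra group, which the additional
condition removes.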

Perhaps it would help if you wrote down a concrete use case and a
query addressing it that has a less desirable outcome when we drop
the requirement on *all* obscore services to have obs_id non-NULL.
Note that of course individual data providers are still free to have
local non-NULL constraints if their actual data holdings require
that.

> And it's easy to create obs_id from obs_publisher_did in the case of unique
> dataproduct in an observation

The problem is not *filling* obs_id.  The problem is *validating* the
non-NULL requirement, which is fairly resource-intensive (a seqscan
of the entire ivoa.obscore table, or maintaining an appropriate index
on all tables contributing to ivoa.obscore).
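
As a rough illustration of what such a validation involves, here is a
sketch that again assumes Python's sqlite3 module rather than a real
service.  The table, the rows, the index name and the helpers
count_null_obs_ids and query_plan are hypothetical, and a single
table is a simplification: a real ivoa.obscore is often a view over
several tables, each of which would need its own index.

```python
import sqlite3

# Hypothetical toy stand-in for ivoa.obscore; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obscore (obs_id TEXT, obs_publisher_did TEXT)")
conn.executemany(
    "INSERT INTO obscore VALUES (?, ?)",
    [
        ("obs-A", "ivo://example/a1"),
        ("obs-B", "ivo://example/b1"),
        (None, "ivo://example/x1"),  # would violate a non-NULL requirement
    ],
)

# The check a validator would have to run to verify the requirement.
CHECK = "SELECT COUNT(*) FROM obscore WHERE obs_id IS NULL"


def count_null_obs_ids(connection):
    """Return the number of rows that have no obs_id."""
    return connection.execute(CHECK).fetchone()[0]


def query_plan(connection):
    """Return SQLite's plan descriptions for the validation query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + CHECK).fetchall()
    return [row[-1] for row in rows]  # last column holds the description


null_count = count_null_obs_ids(conn)
plan_without_index = query_plan(conn)

# An index on obs_id lets the database avoid reading the whole table.
conn.execute("CREATE INDEX obscore_obs_id ON obscore (obs_id)")
plan_with_index = query_plan(conn)

print(null_count)
print(plan_without_index)
print(plan_with_index)
```

The exact wording of the plans depends on the SQLite version, and
other databases will plan the query differently; the point is only
that the check either reads every row or relies on an index that
someone has to create and maintain.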

I'd still suggest we should only require this investment from our
adopters if we actually have a good reason to do so (as in: X breaks
if we don't).  And that I still can't see.

       -- Markus

