Obscore 1.1 Erratum 3

Thu Dec 8 12:02:43 CET 2022

From a Rubin perspective (as well as for SPHEREx, for which I don't have the final responsibility for data engineering but I have an opinion), we are going to use obs_id systematically as non-NULL and we do find it moderately useful as a tool for organizing related data (such as different data products from the same image).  With that in mind, we are likely to introduce behavior in our client-side tooling that takes advantage of that and allows users to group data accordingly.

However, like others, I don't see a _requirement_ to provide a non-NULL obs_id as essential to the ObsCore data model.  Projects and legacy datasets with less complex data organization might not gain anything from using it.

There is a small amount of extra work on the client side to make sure that datasets with NULL obs_ids are not inadvertently presented to the user as all being related to each other, but this would be prudent engineering in any event, given that a client has no way to enforce the existing standard's non-NULL constraint on external services.
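
The client-side precaution described above can be sketched in Python. This is a minimal illustration under stated assumptions, not Rubin's actual tooling: rows are assumed to arrive as plain dicts keyed by ObsCore column names, a NULL obs_id is represented as None, group_by_obs_id is a hypothetical helper name, and the identifier values are made up.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Mapping

Row = Mapping[str, Any]


def group_by_obs_id(rows: Iterable[Row]) -> List[List[Row]]:
    """Hypothetical helper: group ObsCore rows that share a non-NULL obs_id.

    Rows whose obs_id is NULL (None) are each returned as a group of one,
    so they are never presented as related to each other.
    """
    grouped: Dict[str, List[Row]] = defaultdict(list)
    ungrouped: List[List[Row]] = []
    for row in rows:
        obs_id = row.get("obs_id")
        if obs_id is None:
            # NULL carries no relatedness information: keep the row on its own.
            ungrouped.append([row])
        else:
            grouped[obs_id].append(row)
    # Named groups first (in order of first appearance), then the singletons.
    return list(grouped.values()) + ungrouped


# Made-up example rows: two products of one image, two unrelated datasets.
rows = [
    {"obs_id": "img-1", "obs_publisher_did": "did-a"},  # image
    {"obs_id": "img-1", "obs_publisher_did": "did-b"},  # measurements, same image
    {"obs_id": None, "obs_publisher_did": "did-c"},
    {"obs_id": None, "obs_publisher_did": "did-d"},
]
for group in group_by_obs_id(rows):
    # prints ['did-a', 'did-b'], then ['did-c'], then ['did-d']
    print([r["obs_publisher_did"] for r in group])
```

The design choice is simply that "no obs_id" is read as "no grouping information", rather than as a shared value.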

Gregory

________________________________________
From: dm <dm-bounces at ivoa.net> on behalf of Markus Demleitner <msdemlei at ari.uni-heidelberg.de>
Sent: Thursday, December 8, 2022 1:56 AM
To: dm at ivoa.net; tcg at ivoa.net
Subject: Obscore 1.1 Erratum 3

Dear DM, dear TCG,

at yesterday's TCG we remembered that Obscore 1.1 Erratum 3,
<https://wiki.ivoa.net/twiki/bin/view/IVOA/ObsCore-1_1-Erratum-3>, is
still open.  For context: This is about no longer requiring obs_id to
be non-NULL.  The benefit would be to lift a very noticeable load on
validators and implementations to ensure this, while no case has been
identified where this non-NULL requirement actually is necessary to
write obscore queries or properly interpret them.

There has been a bit of discussion on this in April,
<http://mail.ivoa.net/pipermail/dm/2022-April/006233.html> and then
again in July,
<http://mail.ivoa.net/pipermail/dm/2022-July/006251.html>.

The July discussion, as far as I can reconstruct it, ended with
François remarking (I'm not quite sure why I let it peter out back
then):

On Fri Jul 8 09:45:46 CEST 2022, BONNAREL FRANCOIS wrote:
>> in all services?  The worst that would happen with this query if we
>> don't is that all the "single" observations end up in one aggregate
>> with obs_id NULL, no?
> Probably yes.
>>    If you see a major problem in that, could you
>> elaborate?
>
> This splits all observation-related queries into two very different
> ones. I find this ugly, and it increases complexity for the users.

Do you still remember why you felt this way?  I cannot see how NULL
obs_ids would make a difference to consumers in any way.  Can you
perhaps discuss a query that you see as adversely affected by the
change?
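
To make the aggregation case from the quoted exchange concrete, here is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for a TAP service. This is an assumption for illustration only: real services run ADQL against ivoa.obscore, while the table below models just obs_id and obs_publisher_did with made-up values. It shows that a plain GROUP BY obs_id collects all NULL rows into one aggregate, and one possible workaround using standard SQL COALESCE (whether a given ADQL service accepts COALESCE is something to check, not something assumed here).

```python
import sqlite3

# In-memory stand-in for an ObsCore table; only two columns are modelled.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obscore (obs_id TEXT, obs_publisher_did TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO obscore VALUES (?, ?)",
    [
        ("img-1", "did-a"),  # image
        ("img-1", "did-b"),  # measurements from the same image
        (None, "did-c"),     # unrelated dataset without obs_id
        (None, "did-d"),     # another unrelated dataset without obs_id
    ],
)

# Plain GROUP BY: SQL puts all NULL obs_ids into a single group.
naive = conn.execute(
    "SELECT obs_id, COUNT(*) FROM obscore GROUP BY obs_id ORDER BY obs_id"
).fetchall()

# Falling back to the dataset identifier keeps each NULL-obs_id row on its own.
per_observation = conn.execute(
    "SELECT COALESCE(obs_id, obs_publisher_did) AS grp, COUNT(*) "
    "FROM obscore GROUP BY COALESCE(obs_id, obs_publisher_did) ORDER BY grp"
).fetchall()
conn.close()

print(naive)            # [(None, 2), ('img-1', 2)]
print(per_observation)  # [('did-c', 1), ('did-d', 1), ('img-1', 2)]
```

In SQLite, NULL sorts before any other value, which is why the NULL aggregate comes first in the plain query.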

> By the way, the ESO ObsTAP service has obs_id for all their datasets. For
> images, each observation has the image and the measurements datasets.
>
> CADC also has obs_id everywhere. Apparently all of them have a 1-to-1
> observation/dataset relationship. They use the free-syntax obs_id
> string to build the ivo identifier obs_publisher_did string

Of course they have non-NULL obs_ids everywhere -- it's required by the
current standard; the same is true for the obscore service(s) I
operate.

But the background of the erratum is not that having obs_id non-NULL
is hard to do; it is that it is hard to validate.  That would of course
be perfectly OK if the requirement served a purpose, but since we still
have not identified such a purpose, dropping it would make implementors'
lives substantially easier at no cost (well: that's admittedly my
claim).

So...  Are people still concerned about the change?

Thanks,

           Markus