Obscore 1.1 Erratum 3

Thu Dec 8 13:09:07 CET 2022

Hi,

The rationale for this erratum is to spare validators from having to scan entire Obscore tables to check the obs_id != NULL rule.
This is a good argument now that we have huge Obscore tables.
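To make that cost concrete, here is a minimal sketch of what the check amounts to. It assumes direct SQL access; an in-memory SQLite table stands in for ivoa.obscore and carries only two of its columns, and the helper names are hypothetical. Without an index on obs_id, answering the query means reading every row, which is what makes the rule expensive to validate on large tables.

```python
import sqlite3

# The validation query: any row with a NULL obs_id violates the current rule.
NULL_OBS_ID_QUERY = "SELECT COUNT(*) FROM obscore WHERE obs_id IS NULL"


def count_null_obs_ids(conn: sqlite3.Connection) -> int:
    """Count rows violating the obs_id NOT NULL rule (hypothetical helper)."""
    return conn.execute(NULL_OBS_ID_QUERY).fetchone()[0]


def query_plan(conn: sqlite3.Connection) -> str:
    """Return SQLite's plan for the check; 'SCAN' means every row is read."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + NULL_OBS_ID_QUERY).fetchall()
    # The human-readable plan text is in the last column of each row.
    return "; ".join(str(row[-1]) for row in rows)


# Hypothetical stand-in for ivoa.obscore with only two of its columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obscore (obs_id TEXT, obs_publisher_did TEXT)")
conn.executemany(
    "INSERT INTO obscore VALUES (?, ?)",
    [
        ("obs-1", "ivo://example/did?1"),
        ("obs-1", "ivo://example/did?2"),
        (None, "ivo://example/did?3"),
    ],
)

print(count_null_obs_ids(conn))  # → 1
print(query_plan(conn))
```

The plan reports a full table scan; its exact wording depends on the SQLite version. A real validator would more likely send the equivalent ADQL to the service's TAP endpoint, where the same scan happens server-side.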

However, I’m still wondering whether the price for this, exposing observation datasets that cannot be retrieved other than through their metadata, is too high.
The gain from this proposal is clear, but I’m not sure the benefits outweigh the drawbacks.
- What are clients supposed to do when grouping datasets by observation?
- Should this be specified by the standard?
- What will happen to Obscore tables joined with extensions?
- What will happen to legacy clients that assume obs_id is always non-NULL?

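One possible client-side answer to the first question, along the lines Gregory describes further down: group rows by obs_id, but give every row whose obs_id is NULL its own singleton group, keyed by obs_publisher_did (the kind of fallback field Jesús mentions). A plain SQL GROUP BY obs_id would instead collect all NULL-obs_id rows into a single group. This is a minimal sketch; the row format and the function name are hypothetical, and it assumes obs_publisher_did is present and unique per row.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical row format: a dict holding a subset of ObsCore columns.
Row = dict[str, Optional[str]]


def group_by_observation(rows: list[Row]) -> dict[tuple, list[Row]]:
    """Group ObsCore-like rows by obs_id without merging NULL obs_ids.

    Rows sharing a non-NULL obs_id end up together. A row with a NULL
    obs_id gets its own group keyed by its obs_publisher_did, so unrelated
    datasets are not presented to the user as one observation.
    """
    groups: dict[tuple, list[Row]] = defaultdict(list)
    for row in rows:
        if row["obs_id"] is not None:
            key = ("obs", row["obs_id"])
        else:
            # Fallback key; assumes obs_publisher_did is non-NULL and unique.
            key = ("dataset", row["obs_publisher_did"])
        groups[key].append(row)
    return dict(groups)


rows = [
    {"obs_id": "obs-1", "obs_publisher_did": "ivo://example/did?1"},
    {"obs_id": "obs-1", "obs_publisher_did": "ivo://example/did?2"},
    {"obs_id": None, "obs_publisher_did": "ivo://example/did?3"},
    {"obs_id": None, "obs_publisher_did": "ivo://example/did?4"},
]
for key, members in group_by_observation(rows).items():
    print(key, len(members))
# → ('obs', 'obs-1') 2
# → ('dataset', 'ivo://example/did?3') 1
# → ('dataset', 'ivo://example/did?4') 1
```

Tagging the key with "obs" or "dataset" keeps an obs_id that happens to equal some obs_publisher_did string from colliding with a singleton group.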
Laurent

> On 8 Dec 2022, at 12:22, Salgado, Jesus <jesus.salgado at skao.int> wrote:
> 
> Hi,
> 
> I am not really against the proposal, as there could be cases where a non-NULL obs_id is not really justified, but I have defined several DAL protocols and have always defined the id field as non-NULL. The reason is to allow clients that gather responses from various services to present a consistent view of the results (e.g. if the client shows the results as a tree, the branch names are the ids). Without a guaranteed non-NULL field, this branch identifier has to be constructed on the fly from other metadata or taken from a different field. Having a unique non-NULL field makes life easier for client developers, in my view.
> 
> So, the main reason for making one of the identifiers non-NULL in the protocols I have helped to develop was flexibility in how the results are viewed rather than any scientific problem. If the view is a table, you are less affected (though you might run into problems when using this field for SAMP connections or to generate row events on the application event bus).
> 
> As I said, I am not against the proposal; I just wanted to provide some historical insight into why this may originally have been defined as non-NULL.
> 
> Cheers,
> Jesús Salgado 
> SKA Regional Centre Architect
> jesus.salgado at skao.int
> www.skao.int
> SKA Observatory
> Jodrell Bank, Lower Withington,
> Macclesfield, SK11 9FT, UK
> 
> 
> On 08/12/2022, 11:03, "dm on behalf of Dubois-Felsmann, Gregory P." <dm-bounces at ivoa.net on behalf of gpdf at ipac.caltech.edu> wrote:
> 
> From a Rubin perspective (as well as for SPHEREx, for which I don't have final responsibility for data engineering but do have an opinion), we are going to populate obs_id systematically with non-NULL values, and we do find it moderately useful as a tool for organizing related data (such as different data products from the same image). With that in mind, we are likely to introduce behavior in our client-side tooling that takes advantage of that and allows users to group data accordingly.
> 
> However, like others, I don't see a _requirement_ to provide a non-NULL obs_id as essential to the ObsCore data model. Projects and legacy datasets with less complex data organization might not gain anything from using it.
> 
> There is a small amount of extra work on the client side to be sure that datasets with NULL obs_ids are not inadvertently all described to the user as being related to each other, but this would just be prudent engineering in any event given the client's lack of recourse to enforce the existing standard's constraint against external services.
> 
> Gregory
> 
> ________________________________________
> From: dm <dm-bounces at ivoa.net> on behalf of Markus Demleitner <msdemlei at ari.uni-heidelberg.de>
> Sent: Thursday, December 8, 2022 1:56 AM
> To: dm at ivoa.net; tcg at ivoa.net
> Subject: Obscore 1.1 Erratum 3
> 
> Dear DM, dear TCG,
> 
> at yesterday's TCG we remembered that Obscore 1.1 Erratum 3,
> <https://wiki.ivoa.net/twiki/bin/view/IVOA/ObsCore-1_1-Erratum-3>, is
> still open. For context: This is about no longer requiring obs_id to
> be non-NULL. The benefit would be to lift a very noticeable load on
> validators and implementations to ensure this, while no case has been
> identified where this non-NULL requirement actually is necessary to
> write obscore queries or properly interpret them.
> 
> There has been a bit of discussion on this in April,
> <http://mail.ivoa.net/pipermail/dm/2022-April/006233.html> and then
> again in July,
> <http://mail.ivoa.net/pipermail/dm/2022-July/006251.html>.
> 
> The July discussion, as far as I can reconstruct it, ended with
> François remarking (I'm not quite sure why I let it peter out back
> then):
> 
> On Fri Jul 8 09:45:46 CEST 2022, BONNAREL FRANCOIS wrote:
> >> in all services? The worst that would happen with this query if we
> >> don't is that all the "single" observations end up in one aggregate
> >> with obs_id NULL, no?
> > Probably yes.
> >> If you see a major problem in that, could you
> >> elaborate?
> >
> > This breaks all observation-related queries into two very different
> > ones. I find this ugly, and it increases complexity for the users.
> 
> Do you still remember why you felt this way? I cannot see how NULL
> obs_ids would make a difference to consumers in any way. Can you
> perhaps discuss a query that you see as adversely affected by the
> change?
> 
> > By the way, the ESO ObsTAP service has obs_id for all its datasets. For
> > images, each observation has the image and the measurement datasets.
> >
> > CADC also has obs_id everywhere. Apparently all of them have a one-to-one
> > observation/dataset relationship. They use the free-syntax obs_id
> > string to build the ivo identifier obs_publisher_did string.
> 
> Of course they have non-NULL everywhere -- it's required by the
> current standard; the same is true for the obscore service(s) I
> operate.
> 
> But the background of the erratum is not that keeping obs_id non-NULL
> is hard to do; it is that it is hard to validate. Which of course
> would be perfectly ok if the requirement served a purpose, but since
> we still have not identified such a purpose, it would make
> implementors' lives substantially easier at no cost (well: that's
> admittedly my claim) if we dropped it.
> 
> So... Are people still concerned about the change?
> 
> Thanks,
> 
> Markus
> 
