Obscore 1.1 Erratum 3

Fri Dec 9 10:41:18 CET 2022

Hi François,

On Thu, Dec 08, 2022 at 01:38:56PM +0100, BONNAREL FRANCOIS wrote:
>      To summarize my point : this discussion cannot be solved in an erratum.
> 
> *This is not an erratum; this is a change in the data model.*

Well, ok, I'll retract the Erratum then.  I've instead created the
Obscore 1.1-Next wiki page:
https://wiki.ivoa.net/twiki/bin/view/IVOA/ObsCore11Next

Now that I look at it again: Oh noes, the new twiki has botched my
topic name.  Marco, if you're reading this: Can you rename it to what
it should be, ObsCore-1_1-Next?  And by the way, the date-changed
item on it looks a bit off, too.

Once that's fixed: DM chairs, can you link this in some suitable way
from your WG page so people find it if necessary?

Having done that...

> Le 08/12/2022 à 10:56, Markus Demleitner a écrit :
> > On Fri Jul 8 09:45:46 CEST 2022, BONNAREL FRANCOIS wrote:
> > > > in all services?  The worst that would happen with this query if we
> > > > don't is that all the "single" observations end up in one aggregate
> > > > with obs_id NULL, no?
> > > Probably yes.
> > > >     If you see a major problem in that, could you
> > > > elaborate?
> > > This is breaking all observation related queries into two very different
> > > ones. I find this ugly, and increasing complexity for the users.
> > Do you still remember why you felt this way?  I cannot see how NULL
> > obs_ids would make a difference to consumers in any way.  Can you
> > perhaps discuss a query that you see as adversely affected by the
> > change?
> 
> Select the list of obs_publisher_did values related to the same obs_id. The
> interest of this one is so obvious that I cannot understand why you deny it.
> 
> Observations (or simulations) are not the same concept as datasets, and a
> dataset always comes from either an observation, a simulation, or an
> experiment.

That may be, but I still cannot see what adverse operational effect
it would have if there are NULL obs_ids, and I think it would help
this discussion (or the later one on the obscore change) if you could
make your concerns concrete by giving an actual query the results of
which might possibly confuse the consumers when they encounter a NULL
obs_id.
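
As a minimal sketch of the kind of query under discussion, the following
groups datasets by obs_id in a toy table.  It assumes SQLite rather than
a real TAP/ADQL service, and the table contents and identifiers are
made up for illustration; only the column names obs_id and
obs_publisher_did come from ObsCore.  In SQLite, GROUP BY puts all NULL
obs_ids into a single group, which matches the "one aggregate with
obs_id NULL" behaviour mentioned above; other engines may need checking.

```python
import sqlite3

# Toy stand-in for an ObsCore table; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obscore (obs_id TEXT, obs_publisher_did TEXT)")
conn.executemany(
    "INSERT INTO obscore VALUES (?, ?)",
    [
        ("obs-A", "ivo://example/a1"),  # two datasets from one observation
        ("obs-A", "ivo://example/a2"),
        (None, "ivo://example/s1"),     # "single" datasets with NULL obs_id
        (None, "ivo://example/s2"),
    ],
)

# Count the datasets belonging to each obs_id.
rows = conn.execute(
    "SELECT obs_id, COUNT(*) FROM obscore GROUP BY obs_id"
).fetchall()
groups = dict(rows)

# Datasets related to one specific observation are unaffected by NULLs.
related = conn.execute(
    "SELECT obs_publisher_did FROM obscore WHERE obs_id = ? "
    "ORDER BY obs_publisher_did",
    ("obs-A",),
).fetchall()

print(groups)
print(related)
```

Here the rows with NULL obs_id form one extra group, while the lookup
for a given non-NULL obs_id returns the same rows it would without them.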

> > But the background of the erratum is not that having obs_id non-NULL
> > is hard to do, it is that it is hard to validate.  Which of course
> > would be perfectly ok if the requirement served a purpose, but since
> > we still have not identified such a purpose, it would make
> > implementors' lives substantially easier at no cost (well: that's
> > admittedly my claim) if we dropped it.
> 
> Technically, as far as I understood, the problem occurs only with ObsCore
> tables implemented as "views", because otherwise you can always create an
> index on this obs_id column.
> 
> Apparently "materialized views" can have indexes. So why not use them when
> available in your DBMS?

Because creating such a materialised view takes several minutes when
you have ~1e8 rows -- and it's something that you do whenever you
import new images or spectra on any contributing data collections.
This is just not a sensible option.

It would be a sensible option to create obs_id indexes on all the
contributing tables, and that's fast enough, but: I'm not a big fan
of doing (substantial) work that serves no purpose I can figure out.
And hence I think I'll be happy for now to be invalid when I time out
the validator queries checking for obs_id IS NOT NULL...
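
For illustration, here is a minimal sketch of that alternative: an
obscore view defined as a union over contributing tables, each with its
own obs_id index, plus a validator-style NULL check.  It assumes SQLite,
and the table names (images, spectra), index names, rows, and the exact
form of the check are hypothetical.  Whether a planner actually uses
such indexes through a view depends on the DBMS, so this shows the setup
only and says nothing about timings at ~1e8 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical contributing tables, reduced to two ObsCore columns.
    CREATE TABLE images (obs_id TEXT, obs_publisher_did TEXT);
    CREATE TABLE spectra (obs_id TEXT, obs_publisher_did TEXT);

    -- Indexes live on the contributing tables, not on the view.
    CREATE INDEX images_obs_id ON images (obs_id);
    CREATE INDEX spectra_obs_id ON spectra (obs_id);

    -- A plain (non-materialised) view needs no rebuild after imports.
    CREATE VIEW obscore AS
        SELECT obs_id, obs_publisher_did FROM images
        UNION ALL
        SELECT obs_id, obs_publisher_did FROM spectra;
    """
)
conn.executemany(
    "INSERT INTO images VALUES (?, ?)",
    [("img-1", "ivo://example/i1"), (None, "ivo://example/i2")],
)
conn.executemany(
    "INSERT INTO spectra VALUES (?, ?)",
    [("sp-1", "ivo://example/s1")],
)

# Assumed form of a validator check: count rows violating "obs_id not NULL".
null_count = conn.execute(
    "SELECT COUNT(*) FROM obscore WHERE obs_id IS NULL"
).fetchone()[0]
print(null_count)
```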

> Changing the data model for validation reasons before looking to other
> technical solutions seems to me a wrong way to solve these issues.

Ah... you see, I've always viewed data models as tools that should
enable and/or facilitate things.  If they instead make things
unnecessarily hard, I think it's not unreasonable to either fix
them or use other methods to ensure interoperability.

But that's becoming a much larger discussion, which we shouldn't have
here.

Thanks,

            Markus