Obscore 1.1 Erratum 3: Drop obs_id non-NULL requirement
BONNAREL FRANCOIS
francois.bonnarel at astro.unistra.fr
Thu Jul 7 10:40:01 CEST 2022
Dear Markus,
On 06/07/2022 at 19:50, Markus Demleitner wrote:
> Dear François,
>
> On Wed, Jul 06, 2022 at 05:44:46PM +0200, BONNAREL FRANCOIS wrote:
>> In the radio domain (see the JIVE service, for example) and also the
>> high-energy domain, we often face the case where several data products are
>> produced from the same observation. But we can imagine services where some
>> observations contain several data products and others only a single one
>> (just by chance).
> Right. But even if such archives do not want to use datalink in such
> a situation (which I suspect would almost always be the preferable
> solution now that we have datalink),
These use cases don't fit with the "main item to linked resource"
DataLink scheme.
The basic item in ObsCore is a data product whose characterization is
reasonably consistent: s_fov, s_ra, s_dec, em_min, em_max, t_min, t_max,
etc. should make sense and be selective enough. In the most general case,
this is not true of the observation as a whole.
An obvious example is a radio interferometry observation where you get
several targets for the same "observation" and several spectral windows,
sometimes far apart.
> that in no way depends on having
> obs_id mandatory, does it?
An observation is a different concept from a data product/dataset, so in
the most general case obs_id really is additional information. That is
the theoretical aspect, but the issue is pragmatic too; see below.
>> If you want to aggregate all the obs_publisher_did values, or (s_ra, s_dec),
>> or whatever property of the products belonging to the same observation, I
>> think the GROUP BY will fail if we allow obs_id to be NULL.
> Ummm... how so? Of course, when a service that has this kind of
> thing *also* has datasets with obs_id NULL, all these will end up in
> a single aggregate, but that is, for all I can see, as good or as bad
> as any other arrangement in this situation; and even when data
> providers choose to do such a thing and users see unfavourable
> consequences, it's easy to fix by appending an "AND obs_id IS NOT
> NULL"; when people are savvy enough to reconstruct observations using
> GROUP BY, that clause will be a breeze for them.
I think I disagree there. The use case is to associate each observation
with all of its derived data products. Whether there happens to be a single
data product or several does not matter. And if there is a single one
today, there may be more tomorrow if you are continuously processing your
observations and producing new data products.
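To make the GROUP BY behaviour under discussion concrete, here is a minimal sketch using Python's built-in sqlite3 module on a hypothetical two-column stand-in for ivoa.obscore (the table and all identifiers are invented for illustration). It shows that GROUP BY collects every row with a NULL obs_id into one aggregate, and that the "obs_id IS NOT NULL" condition Markus suggests removes that bucket. It does not settle the disagreement; it only shows the SQL behaviour both sides are arguing about.

```python
import sqlite3

# Hypothetical toy table standing in for ivoa.obscore; only two columns
# are modelled and the identifiers are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obscore (obs_id TEXT, obs_publisher_did TEXT)")
conn.executemany(
    "INSERT INTO obscore VALUES (?, ?)",
    [
        ("obs1", "ivo://example/obs1/cube"),
        ("obs1", "ivo://example/obs1/image"),
        ("obs2", "ivo://example/obs2/cube"),
        (None, "ivo://example/a"),  # products without an observation ID
        (None, "ivo://example/b"),
    ],
)

# Plain GROUP BY: SQL puts all NULL keys into a single group.
# (SQLite sorts NULL first in ascending order; other DBMSs may differ.)
all_groups = conn.execute(
    "SELECT obs_id, COUNT(*) FROM obscore GROUP BY obs_id ORDER BY obs_id"
).fetchall()

# Excluding the NULL bucket explicitly.
real_obs = conn.execute(
    "SELECT obs_id, COUNT(*) FROM obscore WHERE obs_id IS NOT NULL "
    "GROUP BY obs_id ORDER BY obs_id"
).fetchall()

print(all_groups)  # [(None, 2), ('obs1', 2), ('obs2', 1)]
print(real_obs)    # [('obs1', 2), ('obs2', 1)]
```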
>
> Perhaps it would help if you wrote down a concrete use case and a
> query addressing it that has a less desirable outcome when we drop
> the requirement on *all* obscore services to have obs_id non-NULL.
> Note that of course individual data providers are still free to have
> local non-NULL constraints if their actual data holdings require
> that.
Use case: query the DataLink services associated with ObsCore services to
get all the links of an observation (meaning all the data products derived
from this observation). For this, we first need the list of
obs_publisher_did values for each observation, which we then use in a
multi-ID DataLink query. (This also requires knowing the DataLink root URL
of each service.)
Something like
"select obs_id, string_agg(obs_publisher_did, ',') as publish_did_list
from obscore group by obs_id"
Then parse publish_did_list to build the DataLink URL
https://Organisation/dl-root?ID=...&ID=...&ID=...
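The two steps above (aggregate the DIDs per observation, then build a multi-ID DataLink URL) can be sketched in Python as follows. This is a sketch under stated assumptions, not a reference client: the helper name datalink_urls and the toy table are hypothetical, SQLite's group_concat stands in for string_agg, and "https://Organisation/dl-root" is the placeholder root from the message, not a real endpoint.

```python
import sqlite3
from urllib.parse import urlencode

# Placeholder root taken from the message; a real client would need the
# actual DataLink endpoint of each service.
DATALINK_ROOT = "https://Organisation/dl-root"


def datalink_urls(conn, root=DATALINK_ROOT):
    """Map each non-NULL obs_id to a multi-ID DataLink URL (hypothetical helper)."""
    rows = conn.execute(
        # group_concat is SQLite's traditional name for string_agg.
        "SELECT obs_id, group_concat(obs_publisher_did, ',') AS publish_did_list "
        "FROM obscore WHERE obs_id IS NOT NULL GROUP BY obs_id"
    ).fetchall()
    urls = {}
    for obs_id, did_list in rows:
        dids = did_list.split(",")  # assumes no DID contains a comma
        # One ID parameter per product; urlencode escapes ':', '/' and '?'.
        urls[obs_id] = root + "?" + urlencode([("ID", did) for did in dids])
    return urls


# Toy stand-in for ivoa.obscore with invented identifiers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obscore (obs_id TEXT, obs_publisher_did TEXT)")
conn.executemany(
    "INSERT INTO obscore VALUES (?, ?)",
    [
        ("obs1", "ivo://example/obs1/cube"),
        ("obs1", "ivo://example/obs1/image"),
        ("obs2", "ivo://example/obs2/cube"),
    ],
)
urls = datalink_urls(conn)
print(urls["obs2"])
# https://Organisation/dl-root?ID=ivo%3A%2F%2Fexample%2Fobs2%2Fcube
```

Splitting a comma-joined string mirrors the query in the message; fetching (obs_id, obs_publisher_did) rows and grouping them client-side would avoid the assumption that DIDs never contain commas.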
>> And it's easy to create obs_id from obs_publisher_did when an observation
>> has a single data product.
> The problem is not *filling* obs_id. The problem is *validating* the
> non-NULL requirement, which is fairly resource-intensive (a seqscan
> of the entire ivoa.obscore table, or maintaining an appropriate index
> on all tables contributing to ivoa.obscore).
I must confess I'm not very familiar with validators, so I have to trust
you there.
But anything important will cost some resources, and I still think the
observation is an important concept and obs_id is very useful.
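The filling-versus-validating distinction can be illustrated with a small sqlite3 sketch. Everything here is hypothetical (table, view name obscore_filled, index name): COALESCE(obs_id, obs_publisher_did) fills a missing obs_id cheaply, while the check a validator has to run counts NULL obs_id values, which means reading every row unless an index on obs_id exists. With a real ivoa.obscore built as a view over several tables, that cost applies to each contributing table.

```python
import sqlite3

# Hypothetical toy table; identifiers are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obscore (obs_id TEXT, obs_publisher_did TEXT)")
conn.executemany(
    "INSERT INTO obscore VALUES (?, ?)",
    [("obs1", "ivo://example/obs1/cube"), (None, "ivo://example/single")],
)


def count_null_obs_id(conn, table="obscore"):
    # The check a validator must run; without an index on obs_id,
    # the database has to read every row to answer it.
    sql = f"SELECT COUNT(*) FROM {table} WHERE obs_id IS NULL"
    return conn.execute(sql).fetchone()[0]


# Filling is easy: fall back to obs_publisher_did when obs_id is missing.
conn.execute(
    "CREATE VIEW obscore_filled AS SELECT "
    "COALESCE(obs_id, obs_publisher_did) AS obs_id, obs_publisher_did "
    "FROM obscore"
)

# An index lets the planner answer the IS NULL check without a full scan,
# at the price of maintaining it on every contributing table.
conn.execute("CREATE INDEX obscore_obs_id ON obscore (obs_id)")

null_in_table = count_null_obs_id(conn)
null_in_view = count_null_obs_id(conn, "obscore_filled")
print(null_in_table, null_in_view)  # 1 0
```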
Could other people weigh in?
Cheers
François
>
> I'd still suggest we should only require this investment from our
> adopters if we actually have a good reason to do so (as in: X breaks
> if we don't). And that I still can't see.
>
> -- Markus