Facility and instrument terms in the different protocols

Fri Apr 29 14:05:27 CEST 2022

Hi,

On Thu, Apr 28, 2022 at 12:04:43PM +0100, Harrison Paul wrote:
> In ProposalDM, I have been trialing the idea of a 4th layer which I
> have named "backend" for now - which is roughly how the detected
> data are combined to produce a data product - So in radio astronomy

...which goes to show that a full specification of the "artefact of
origin" (for lack of a better term) for a data product, as
Paul said in yesterday's hackathon, probably is a tuple of *varying*
length.  Where ("icecube",) probably is enough to uniquely define the
whole thing going from collection (the ice) to detector (the
photomultiplier chains), a Schmidt telescope with an objective prism
that has been mounted at different places probably needs a few more
items; Paul has given other examples in the radio domain.

This is to say: the full specification thing is a hard problem.  It
is certain that we will not solve it with two columns alone (sure: we
could solve it with just a single string if that was an identifier of
the full tuple, but for that we'd have to have an exhaustive list of
such tuples, and as we heard in Semantics this morning, such an
exhaustive list is not going to come around very soon).
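
To illustrate why two columns fall short, here is a minimal sketch in
Python.  Everything in it is an assumption made up for illustration:
the component names, the helper to_two_columns, and in particular the
naive rule of taking the first tuple item as facility and the last as
instrument.

```python
# Hypothetical artefact-of-origin tuples; all component names are made up.
ICECUBE = ("icecube",)
SCHMIDT_AT_A = ("some-schmidt", "site-a", "objective-prism")
SCHMIDT_AT_B = ("some-schmidt", "site-b", "objective-prism")


def to_two_columns(origin):
    """Squeeze a varying-length tuple into a (facility, instrument) pair.

    Assumed rule: the first item is the facility, the last item is the
    instrument.  A one-item tuple fills both columns with the same name.
    """
    return origin[0], origin[-1]


# The one-item tuple survives the squeeze without loss...
icecube_columns = to_two_columns(ICECUBE)

# ...but the two different mountings of the Schmidt collapse into one pair.
collision = to_two_columns(SCHMIDT_AT_A) == to_two_columns(SCHMIDT_AT_B)
```

With this particular rule, the two Schmidt setups become
indistinguishable in the two columns although the full tuples differ.
Other squeezing rules would lose different information.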

So...

On Thu, Apr 28, 2022 at 03:28:59PM +0200, Mireille LOUYS wrote:
> I agree we should update the definition of these fields in the various specs
> , but I believe the problem is more complex than that.

If "the problem" is unique, interoperable, query-ready identification
of artefacts of origin, then yes, absolutely.

But that is not the problem Tamara wants to see solved.  That problem
is a lot less complicated.  It is: "Give the existing facility and
instrument columns in obscore and the registry a meaning that is
precise enough that people have solid guidelines on what to put
there".  And that's necessary to make them useful for *something*.

As usual, I'd propose to first agree on what that "something" is
(a.k.a. use cases).  As long as we don't have proper identifiers, I
basically see:

(1) A user who has retrieved a large list of data products in response
    to a query wants to throw out items that obviously are not useful
    because the instruments used couldn't possibly produce what
    they're looking for.  They do this by scrolling, eyeballing, and
    mass filtering by instrument or facility name.

(2) A user has located a dataset for an unknown object.  They now
    want to see if the instrumental setup that has produced that
    dataset has also obtained data for well-characterised objects so
    they get an idea of the properties of that setup.
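
A minimal sketch of both use cases in Python, operating on rows as
they might come back from an obscore query.  The rows, the helper
names drop_instruments and same_setup, and all telescope and
instrument names are made up; the column names facility_name and
instrument_name are assumed to follow obscore.

```python
# Made-up rows standing in for an obscore query result.
ROWS = [
    {"obs_id": "a1", "facility_name": "Telescope A",
     "instrument_name": "Spectrograph X", "target_name": "unknown-1"},
    {"obs_id": "a2", "facility_name": "Telescope A",
     "instrument_name": "Spectrograph X", "target_name": "Vega"},
    {"obs_id": "b1", "facility_name": "Telescope B",
     "instrument_name": "Imager Y", "target_name": "Vega"},
    {"obs_id": "c1", "facility_name": "Telescope A",
     "instrument_name": "Imager Z", "target_name": "unknown-1"},
]


def drop_instruments(rows, unwanted):
    """Use case (1): throw out rows whose instrument name is unwanted.

    The comparison is a case-insensitive string match, which only works
    if publishers spell instrument names consistently.
    """
    unwanted = {name.casefold() for name in unwanted}
    return [r for r in rows
            if r["instrument_name"].casefold() not in unwanted]


def same_setup(rows, reference):
    """Use case (2): other rows with the same (facility, instrument) pair."""
    key = (reference["facility_name"], reference["instrument_name"])
    return [r for r in rows
            if (r["facility_name"], r["instrument_name"]) == key
            and r["obs_id"] != reference["obs_id"]]


kept = drop_instruments(ROWS, ["Imager Y", "Imager Z"])
others = same_setup(ROWS, ROWS[0])
```

Both helpers depend on plain string equality, so they are only as
good as the consistency of the names publishers put into the two
columns.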

Feel free to contribute more of those, but I think that's about what
we can reasonably do as long as there's no way to predict what name
some instrument will have.

Keeping these two in mind, I think the two most important things to
characterise are (roughly) the collection device -- which would be
covered by Tamara's facility -- and the analysis/detection device --
which would be Tamara's instrument.

I'd say that's reasonable given the use cases and the constraints
we're working under (there are just these two columns).

Let me remark that SSAP is a bit of a pain in this respect: It has a
facility column (but no instrument).  But I'd say the spectrograph
("instrument" in our proposed obscore meaning) would be a lot more
relevant than the telescope.  I almost suspect in a clarifying erratum
we should say something like "SSAP services are encouraged to also
give a column with the utype ssa:instrument.name [or whatever]
containing the name of the spectrograph where facility does not
uniquely identify that".  But that's for another day.

> In a recent discussion we wanted also to clarify better by adding an
> Organization column, which designates the organisation in charge of the
> telescope.

Let's keep these extra questions out of the effort of clarifying
facility and instrument_name for now -- it's hard enough as is.

We can tackle that later, as use cases for having extra metadata in,
for instance, obscore are identified.  Meanwhile, note that the
Registry already has the notion of organisations, but for all I can
tell that metadata is not heavily used, neither by the publishers nor by
queries (which of course may be because the publishers generally
don't put such information in).

Be that as it may: +1 on a running meeting deciding on how to go on
with this.  Are there volunteers for setting one up?

Thanks,

         Markus