Doubts with the facility name term

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Mon Feb 21 11:17:05 CET 2022


Dear Semantics WG,

Thanks for your contributions to this discussion.  Let me try to sum it
up so far.

So,

On Thu, Feb 10, 2022 at 01:18:17PM +0100, Stéphane Erard wrote:
> As you know, in EPNCore we use a simple system to facilitate data search:
> 
> Instrument_host_name : is an observatory, spacecraft, or facility ( = lab) -
> by observatory here, we really mean: telescope
> Instrument_name : is an instrument, either spatial, accommodated on the
> telescope, or in the lab

This looks pretty much like a generalisation of the split between
collector/detector that I had suggested further up the thread, except of
course that on spacecraft hosting multiple collectors it's more
complicated, and all that only applies with Stéphane's clarification of
what an "observatory" is.  And laboratory astrophysics adds an extra
twist, as there, the notion of "collector" becomes a lot more colourful.

EPNCore has the luxury of allowing multiple values here:

> E.g.:
> I find HST data with Instrument_host_name LIKE '%HST%'
> 	or ivo_hashlist_has(Instrument_host_name, 'HST') = 1

Could we consider a similar hack for the other uses of, at least,
facility?  As long as we at least plan to control the vocabulary, we
could probably do that without loss of generality, as we'd just not
allow hashes in our identifiers.  I also don't think we'd break a lot of
existing practice if we did that since, as Paul said, what we have so
far is not terribly useful to begin with.
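
To make the difference between the two query styles concrete, here is a
minimal Python sketch of hash-list matching.  It only mimics what
ivo_hashlist_has is meant to do; the case-insensitive comparison, the
whitespace stripping and the sample values are assumptions made for
illustration, not a statement about any particular implementation.

```python
def hashlist_has(hashlist: str, item: str) -> int:
    """Return 1 if item is one of the '#'-separated words in hashlist, else 0.

    Assumption: matching is case-insensitive and ignores surrounding
    whitespace; this mimics, but is not, the ivo_hashlist_has UDF.
    """
    if "#" in item:
        raise ValueError("identifiers must not contain '#'")
    wanted = item.strip().lower()
    words = (word.strip().lower() for word in hashlist.split("#"))
    return int(wanted in words)


# Hypothetical facility values as they might appear in a table column.
rows = ["HST", "HST#JWST", "Chandra#XMM-Newton", "hst-like-sim"]

# Substring matching (the LIKE '%HST%' style) also catches "hst-like-sim".
substring_hits = [r for r in rows if "HST" in r.upper()]

# Hash-list matching only accepts whole identifiers.
exact_hits = [r for r in rows if hashlist_has(r, "HST") == 1]
```

The point of the sketch is that whole-word matching only works if no
identifier ever contains a hash, which is exactly the restriction a
controlled vocabulary would let us enforce.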

And I suspect we will want the controlled vocabulary anyway, since, as
Stéphane says,

> I must add that what makes the system work is really to use limited
> lists of possible values, simple names with no fancy variations
> (lesson learnt from PDS3). 

But that's a bit of a chicken-and-egg problem: As long as this
controlled list of possible values doesn't exist, it's hard to make
instrument/facility useful, and as long as there's not much use of that
combo in queries, there's little to be gained by building such a list.
Well, let's see what the ongoing efforts in this area in Paris yield.

Then,

On Wed, Feb 09, 2022 at 12:58:16PM +0000, Paul Harrison wrote:
> I would like to point out that this area is also a central issue for
> the ProposalDM (https://github.com/ivoa/ProposalDM) that is currently in
> development. It is wrestling with the distinctions between telescopes,
> instruments and backends and what combinations of them constitute and
> “observing system”  (which might be the same thing as a Facility if
> that is to be a useful separate concept…). 

Interesting.  Given that, could I charm you into running a breakout (or
perhaps even a proper session) on this at the next Interop?  I believe
ProposalDM might be a good place to properly clarify the concepts that
we could then attach to the concepts from ObsCore, VODataService, SSAP
and perhaps other places too.  ProposalDM at least is new and can
build on the previous efforts' experiences.

> I am not too convinced that a vocabulary is necessarily the best way
> to solve this - my feeling that some sort of dynamic service that
> allows observatories to curate their offerings might be better. It
> might be that this is doable within the registry, but it does depend
> on the complexity of the model that is required.

Well -- a simple hierarchical list of identifiers is the simplest
possible data structure that, I think, has any hope of covering the
discovery use case.  If we want to do more, we might need something more
complex, and we will have to employ different tech.  Carlo (below) has
shown that other disciplines struggle with similar problems, and their
plans (in the end based on something like handles) are certainly
something we should keep in mind.
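
As a rough illustration of what a hierarchical list of identifiers buys
for discovery, here is a minimal Python sketch.  The terms, the
wider/narrower relations and the product labels are all invented for
illustration; nothing here is a proposed vocabulary.

```python
# Hypothetical vocabulary: each term maps to its wider term (None at the top).
# The identifiers are invented for illustration only.
WIDER = {
    "lco": None,
    "lco-ftn": "lco",
    "lco-fts": "lco",
    "hst": None,
}


def with_narrower(term: str) -> set[str]:
    """Return term plus every term below it in the hierarchy."""
    found = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in WIDER.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


# Hypothetical data products, each labelled with a single facility term.
products = {"img-1": "lco-ftn", "img-2": "lco-fts", "img-3": "hst"}

# Searching for the wider term also finds products labelled with narrower ones.
lco_products = sorted(
    name for name, facility in products.items()
    if facility in with_narrower("lco")
)
```

A query for the wider term picks up everything labelled with one of its
narrower terms, which is about all the structure the discovery use case
seems to need.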

Having said that: I'm 100% certain that whatever we do, it will have to
be curated, at least until it is so established that instrument
builders' second thought will be to register their gear with the chosen
infrastructure.  And my suspicion is that our best chance to make that
happen would be a cooperation with the astro library community.

> > [Me:]
> > After this, I'm frankly tempted to try and clearly state that
> > "facility" should be, to first order, the telescope, and we ought to
> > explicitly link it to FITS' TELESCOP.  Which, by the way, is defined
> > as
> > 
> I think that this would be a good pragmatic step

Ok... if nobody seriously objects, I plan to push this after the
Interop.

Also,

On Wed, Feb 09, 2022 at 09:52:09AM -0500, Anne Catherine Raugh wrote:
> First, the word "facility" has an implication of a physical location where
> something happens. So when the PDS Standards refer to a "facility" the
> expectation is that it is in reference to a building, or set of buildings,
> that are in a specific, permanent, geographic location. This is why PDS
> uses it to refer to things like laboratories and "observatories" - whatever
> that means.

Yeah... this has been my intuitive understanding, too.  But if I look
around, that is not what the term has been used for, and I had tried to
argue further up the thread that using it as TELESCOP is more useful,
which perhaps is why so many people have used it that way.

As to calling TELESCOP "facility", I'd again try to argue that as long
as our definitions are clear, the concepts' concrete labels are perhaps
less relevant.  But of course, we could just adopt:

> For PDS, the basic ground-based observing system is composed of a
> facility (the "observatory"), the telescope, and the instrument that
> recorded the data. Each of these elements gets its own identification,

-- as it seems to make a lot of sense to me.  That *would* mean changes
to ObsCore and VODataService, but I'd argue they'd be minor version
changes.  If people spoke up loudly for that approach, I'd try to push
these forward, at least for VODataService as a first step (and as an
alternative plan to declaring that facility is TELESCOP).

But then, given...

> A general solution is difficult - to put it mildly - because what you
> consider to be the correct answer for "Observatory?" depends on your goals:

...perhaps the plan with facility/detector is good enough and we
shouldn't even bother with "observatories"?  Are there use cases strong
enough to warrant having to think about Anne's question?  You see, I am
somewhat disheartened by:

> But no one else has solved the problem either. The general approach seems
> to be to either leave it up to the data suppliers, which leads to confusion
> and incomplete search results; or define a code system that is managed by
> an authority for the project and expanded as needed for the specific task
> in hand. The MPC, for example, is primarily concerned with astrometry, and

Anne finally suggests:

> I suspect the ultimate answer lies in being able to define atomic
> identities like "The Mauna Kea Observatories", "Las Cumbres Observatory",
> "Faulkes Telescope North", etc., and then defining the aggregate
> associations between them. You could do this with a SKOS sort of vocabulary
> if you can define a loose hierarchy that can be expanded to encompass all
> the various interpretations of "observatory" that might be applicable in

Hmha... one *might* hope the "softness" of SKOS would let one get away
with not answering the question what *exactly* observatory, collector,
and detector are.  And perhaps a bit of wider and narrower and related
is all that is needed to answer (some) useful questions.  However, my
gut feeling is that, given the complex relationships *and* their time
dependence, it's possible that blurring things down to what SKOS can do
won't let anyone clearly see anything any more.  If that were the case,
we might be putting a lot of effort into a thesaurus that in the end
won't work for anyone.

I'd say the way to make that gut feeling a bit more reasoned would be to
write the use cases: Who will do what with this kind of metadata?

This perhaps somewhat reflects Carlo's skeptical remarks:

On Wed, Feb 09, 2022 at 02:32:15PM +0100, Carlo Maria Zwölf wrote:
> I am not sure we may define a rigorous hierarchy between the terms
> “Telescope”, “Facility”, “Instrument” and “detector”. Depending on the
> context and on people, some terms may overlap. I’m not sure we can
> reach easily a consensus vocabulary.

After what I've written above, I'm particularly intrigued by:

> Some recent output from the Research Data Alliance are in line with
> the Paul’s suggestion that the observatory choses to label its data
> products and the distinctions they make, rather than some globally
> agreed definition: the RDA produced a recommendation on the persistent
> identification of Instrument
> https://rda-pidinst.readthedocs.io/en/latest/white-paper/index.html#
> . 
> They are agnostic about what an instrument is, but they provide rules
> & mechanisms to identify it with all the required granularity and
> metadata. Those metadata values (cf. Par. 5.1 of the linked doc) may
> be free texts or coming from controlled vocabularies. 

So, this would essentially boil down to saying "there's things related
to taking data, and you register them.  Data products have a set (i.e.,
unstructured) of identifiers of such things.  When looking for products
from an observatory, you look for the observatory's id, when looking for
something measured with hypercam-23, you look for hypercam-23's id",
right?

That's an interesting idea enabled by having these sorts of ids, putting
the type information into what they reference rather than the place
they're referenced from.  I think I'd like it enough to adopt it for
something we'd start now.  Given there already is quite a bit of stuff
that's standardised and that would become obsolete if we adopted this
sort of, excuse me, dynamic typing, I'm less sure.
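
To spell out what this sort of dynamic typing would look like, here is a
minimal Python sketch.  The registry layout, the "pid:" identifiers and
the record fields are all hypothetical; PIDINST does not prescribe any of
them in this form.

```python
# Hypothetical registry: the record behind each identifier carries the type.
# All identifiers and record fields here are invented for illustration.
REGISTRY = {
    "pid:obs-1": {"type": "observatory", "name": "Example Observatory"},
    "pid:tel-7": {"type": "telescope", "name": "Example 2m Telescope"},
    "pid:hypercam-23": {"type": "detector", "name": "hypercam-23"},
}

# Each data product just has an unstructured set of identifiers.
PRODUCTS = {
    "spec-001": {"pid:obs-1", "pid:tel-7", "pid:hypercam-23"},
    "img-002": {"pid:obs-1", "pid:tel-7"},
}


def products_for(pid: str) -> list[str]:
    """Return the products that reference pid, whatever kind of thing it is."""
    return sorted(name for name, pids in PRODUCTS.items() if pid in pids)


def things_of_type(product: str, wanted_type: str) -> list[str]:
    """Resolve a product's identifiers and keep those of the given type."""
    return sorted(
        pid for pid in PRODUCTS[product]
        if REGISTRY[pid]["type"] == wanted_type
    )
```

Here products_for("pid:obs-1") and products_for("pid:hypercam-23") run
the same lookup; whether the thing is an observatory or a detector only
matters once a client resolves the identifier, as things_of_type does.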

> In addition, should we think at adding a field like “Instrument PID”
> to achieve the unique identification? 

I believe that, to follow the PIDINST plan, it would have to be
"Instrument PID*s*" (or perhaps something else to avoid the term
"instrument" that we've already burned): This would be a set-valued
piece of metadata.

As I said, I'd like this, but I'd like it a lot better if this
contributed to unraveling the facility/instrument thing rather than
adding more colours and shapes to a picture that's already a bit too
crowded for my taste.

Thanks,

           Markus

