Draft note on STC in the Registry

Wed Jan 31 13:06:55 CET 2018

[Apologies for the wide cross-post; I'd invite follow-ups to the
Registry list.  For those who don't know what this is about: the
discussion started at
http://mail.ivoa.net/pipermail/registry/2018-January/005226.html]

Hi Arnold,

On Tue, Jan 30, 2018 at 02:13:36PM -0500, Arnold Rots wrote:
> First, I think we need to define the role of the coverage information
> in the registry, and specify the requirements.

Exactly.  Now, since the way from the use cases in the Note's
introduction to the actual features isn't that long, I've skipped
writing out requirements explicitly.

If we wanted to have more wide-reaching requirements, I'd say we
first need credible use cases from which to derive them.  If you have
some, this would be the perfect time to bring them up and add them to
the Note's introduction (which, of course, applies to everyone).

> It seems to me that any resource should be able to provide an
> authoritative answer, as a follow-up to the information provided
> by the registry. And, frankly, and STC-S (not STC-X) string is, so

Well, that's the actual data query, via SIAP, TAP, or whatever, no?
Or do you see a need for a third layer between service discovery and
dataset discovery?

> Using a MOC for high precision coverage applications is problematic
> because it means that the server has to anticipate the client's

The question of general spatial representation(s) in VOTables or
other places is distinct from what we're dealing with here.  I'd like
to avoid burdening the question of STC-capable registries with any
wider-reaching STC discussion (except we need to make sure we're
roughly in line).  If we agree that MOCs in VOTable cells are a good
idea, let's move along with them. Of course, if people see problems
with *that* particular aspect, now is when we should discuss it.

> One more item about spatial coverage: we need to make sure
> that we are not painting ourselves into a corner, restricting the
> coordinates to ICRS. Galactic will come and if we want to have

As I'm writing in the note:

  On the side of registries trying to process VODataService~1.1
  coverage information, dealing with the STC-X embedded within the
  resource records was made difficult by the large feature set of
  STC-X, where coverages could be provided in a myriad of reference
  frames and shapes that needed to be unified to standard systems
  before they could meaningfully be searched.

-- now, true, it's trivial to convert Galactic to ICRS, but then
there's really no reason to put that burden on the Registries.  For
almost all other frames, the authors of the registry record are in a
much better position to convert whatever celestial reference frames
they use to ICRS than a harvesting registry is.
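
To illustrate how little work the Galactic case is on the record
author's side, here is a minimal sketch in plain Python.  It assumes
the rotation matrix published with the Hipparcos catalogue; the
function name galactic_to_icrs is made up for this mail and is not
part of the Note.  Anyone doing this for real would probably use an
established library instead.

```python
import math

# Rotation matrix taking ICRS unit vectors to Galactic unit vectors
# (values as published with the Hipparcos catalogue).  Its transpose
# performs the inverse rotation, Galactic to ICRS.
ICRS_TO_GAL = (
    (-0.0548755604, -0.8734370902, -0.4838350155),
    (+0.4941094279, -0.4448296300, +0.7469822445),
    (-0.8676661490, -0.1980763734, +0.4559837762),
)


def galactic_to_icrs(l_deg, b_deg):
    """Return (ra, dec) in degrees for Galactic (l, b) in degrees."""
    l, b = math.radians(l_deg), math.radians(b_deg)
    # Unit vector in the Galactic frame.
    g = (math.cos(b) * math.cos(l), math.cos(b) * math.sin(l), math.sin(b))
    # Multiply by the transposed matrix to get the ICRS unit vector.
    x, y, z = (
        sum(ICRS_TO_GAL[i][j] * g[i] for i in range(3)) for j in range(3)
    )
    ra = math.degrees(math.atan2(y, x)) % 360.0
    # Clamp against rounding before taking the arcsine.
    dec = math.degrees(math.asin(max(-1.0, min(1.0, z))))
    return ra, dec
```

For coverage purposes one would apply this to the vertices or centers
of whatever shapes the record author has, and then build the ICRS
coverage from the result.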

In actual usage, we'll need a uniform frame anyway -- have a look at
the example queries and try to imagine what they'd look like if we
let in other celestial frames (even if it were just Galactic and
ICRS).  Note that the "magic" conforming (as in
CONTAINS(POINT('ICRS'), CIRCLE('GALACTIC'))) foreseen in ADQL 2.0 was
identified as a pain in implementation and a trap for users, so it's
no longer part of ADQL 2.1.

> appeal to the solar system community, that will require more
> flexibility - which means that a reference position is needed, too,
> but possibly only for solar system frames.

That's a different issue -- as planned in the parent message, I'm now
proposing to reserve an extra column for such non-celestial use
(Volute rev. 4729).  The corresponding open question is now:

  \paragraph{Other reference systems?}  We currently require all
  spatial coordinates to be in the ICRS.  This obviously is not enough
  whenever objects move fast against the ICRS, as for instance for solar
  system objects and, in particular, their surface features and the like.  To
  enable future extensions to cover these domains, a column
  \verb|ref_system_name| must currently always be filled with
  \texttt{NULL} on the service side, and clients must always constrain
  coverage queries with a \verb|ref_system_name IS NULL| condition.

  Is this enough to cover foreseeable and plausible use cases?  Should we
  write \verb|'ICRS'| rather than \texttt{NULL} already, and then perhaps
  define some system names we already have resources for?  Given
  it will be present in almost all STC queries, should we have a less
  verbose name than \verb|ref_system_name|?

Comments on this are highly welcome.
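
To make the verbosity concern concrete, here is a rough sketch of how
a client might assemble such a query.  Only ref_system_name and the
IS NULL constraint come from the text above; the table name
rr.stc_spatial, the column names ivoid and coverage, and the helper
coverage_query are placeholders I'm assuming for illustration, and
the Note's actual example queries may look different.

```python
def coverage_query(ra_deg, dec_deg,
                   table="rr.stc_spatial", column="coverage"):
    """Build an ADQL string asking which resources cover an ICRS position.

    The table and column defaults are hypothetical placeholders; only
    the ref_system_name IS NULL condition is taken from the draft text.
    """
    ra, dec = float(ra_deg), float(dec_deg)
    return (
        f"SELECT ivoid FROM {table}"
        f" WHERE 1 = CONTAINS(POINT({ra!r}, {dec!r}), {column})"
        # Required for now so that future non-celestial entries
        # do not pollute celestial coverage searches.
        " AND ref_system_name IS NULL"
    )
```

Every spatial query ends up carrying that last condition, which is
why a shorter column name might be worth considering.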

> As to time, MJD is fine, but the combination of BARYCENTER and
> TT is an oxymoron. I would advocate BARYCENTER-TDB,

Well, BARYCENTER-TT is straightforward to define and compute, which is
why it was my first choice.  But I won't quarrel about split seconds,
and I see that we might be setting an unfortunate precedent here, so
I've changed TT to TDB in the document.
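
For a feeling of what "split seconds" means here, the following
sketch evaluates the common low-precision approximation for TDB - TT
(two periodic terms in the Earth's mean anomaly, as given in almanac
style references).  The function name is mine, and this is not meant
for precision timing.  It also only covers the difference between the
two time scales, not the light travel time to the barycenter.

```python
import math


def tdb_minus_tt_seconds(mjd_tt):
    """Approximate TDB - TT in seconds for a TT epoch given as MJD.

    Low-precision formula; only the two largest periodic terms are
    kept.
    """
    jd = mjd_tt + 2400000.5
    # Mean anomaly of the Earth's orbit, in radians.
    g = math.radians(357.53 + 0.98560028 * (jd - 2451545.0))
    return 0.001657 * math.sin(g) + 0.000014 * math.sin(2.0 * g)
```

Under this approximation the difference stays below two milliseconds
throughout the year, which is far below anything resource-level
coverage in the Registry will resolve.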

> GEOCENTER-TT and TOPOCENTER-TT. I will grant you that

Again, we have to use a uniform system if we want to enable global
discovery, and again the harvesting registries are in a much worse
position than the record authors to perform the
transformation(s) -- actually, they can't do them at all most of the
time.

So, for Registry purposes we'll have to choose one.  From what I've
seen, people who care about reference positions in their temporal
metadata at all tend to choose BARYCENTER and then whatever time
scale their instruments happen to use.  Making things easy for them
will -- I hope -- increase the chances we'll actually see such
metadata in the Registry (which we currently don't).

> I still regret the use of wavelength as the spectral coordinate.
> Either frequency or energy have solid physical meaning and
> do not require the extra caveat "in vacuo".

So do I.  But unless many more people speak up that it's time to fix
that wart in the VO and promise to help out, switching to energy
would be little more than the first step in an XKCD 927 process:
https://xkcd.com/927/

> the resolution certainly is: a segment of the user community
> has a very good reason wanting to know whether Doppler
> velocity information is available in a resource and at what
> resolution.

Could you provide a use case for that?

Is this comparable to the role resolution plays in spectra discovery?
There, we've so far relied on the dataset discovery (i.e., SSAP and
Obscore) protocols to let people constrain resolutions.  I *suspect*
that's right at least in this case, as the resolutions of spectra
within a single service (which for now can stand for "resource") can
be so dramatically different that "spectral resolution" isn't a
meaningful concept on the resource level, and "range of spectral
resolutions" is too large to be useful.

On the other hand, there aren't many discovery cases that do not, in
some sense, involve quality measures (limiting magnitude, SNR,
resolution).  The Registry so far doesn't really talk about these,
even where (as for limiting magnitude) they are clearly properties
on the resource level at least for some resources (e.g., many
catalogs).

I can't honestly say I'll try to do something about such quality
measures, but I'd consider them to be roughly in scope for this note.
If anyone wanted to contribute text for them: The SVN is at
https://volute.g-vo.org/svn/trunk/projects/registry/regstcnote
(feel free to commit to trunk, but if you prefer to branch, do that).

Meanwhile, an updated build of the draft Note with the changes coming
out of the recent discussions (thanks to all participants!) is at
http://docs.g-vo.org/regstcnote.pdf.  I'll postpone publishing it to
the document repo as long as there's still discussion going on.

        -- Markus