STC-S in DataLink

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Thu Jun 20 07:46:48 PDT 2013


Dear DAL group,

Since this is going to be a long mail, I feel obliged to start with an

Abstract

  In the context of DataLink, there is renewed interest in STC-S on a
  protocol level, i.e., for passing shape descriptions into services.
  I believe we should not do this. 

  In a first part of this (longish) mail, I try to make this point.
  There's a second part below in which I try to outline how STC-S *has*
  worked for me, and under which conditions; that part is basically
  something like "If you agree with me, encourage me, and I'll work on an
  STC-S working draft with actual EBNF."

Sorry for this long soliloquy, but I am very sure this is an
important point for the later interoperability of DataLink-based
services.


Part I: STC-S in Protocols

I'm arguing against usages like "CUTOUT=Ellipse ICRS 33 45 4 5 unit mas
SpectralInterval TOPOCENTER 55 65 unit MeV pixelSize 1" in protocol
parameters, and actually against abusing STC-S in cases where you just
want to define some shape in spherical geometry.  Here are my reasons:

(1) Mashing data and metadata together is like denormalizing databases:
You may get away with it and save some work, but if you've not
understood exactly what you're doing, you'll almost certainly regret it.

(2) STC-S covers an enormous wealth of features.  Even suggesting that
all services should be able to transform, e.g., wavelength into the rest
system is not going to be a high incentive to make people take up a
standard.  Don't wave at "there's going to be a library" -- there's not.
Many years after STC-S was published and the STC DM passed, all we have
are some libraries that can -- more or less -- parse STC-S and spit the
stuff out in some other form.

Actually *doing* something with what's parsed is something completely
different.  In my STC library I'm allowing some "conforming" (making one
STC spec use the reference frame, units, and such of another STC spec),
but exactly because STC is a complex beast, that's no fun at all.  That
was part of the pain I alluded to above, and I'm still ignoring most
things that actually are complicated (like transforming spatial
coordinates from the EMBARYCENTER reference position to the PLUTO
reference position).

But worse (just for this example) -- if you want to transform positions
for differing reference positions, you need to know the source's
distance, and it's completely unclear how to do that for images.
Transforming spectral coordinates to, say, the observer's frame, you
need to know the source's redshift -- which is something I don't know
for the majority of the spectra in my database.  And what should happen
for, say, the Lyman forests in quasars, where there are sources at lots
of different redshifts?
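To make that dependency concrete, here is a minimal Python sketch, assuming wavelengths in metres and the standard relation lambda_rest = lambda_obs / (1 + z); the function name `to_rest_wavelength` is hypothetical. The point is the failure branch: for most archival spectra there is simply no `redshift` to pass in, and a service can only refuse.

```python
from typing import Optional


def to_rest_wavelength(observed_m: float, redshift: Optional[float]) -> float:
    """Convert an observed wavelength (in metres) to the source rest frame.

    Uses lambda_rest = lambda_obs / (1 + z).  Without a redshift there is
    nothing sensible a service can do, so this refuses rather than guesses.
    """
    if redshift is None:
        raise ValueError("rest-frame transformation needs the source redshift")
    if redshift <= -1.0:
        raise ValueError("redshift must be greater than -1")
    return observed_m / (1.0 + redshift)


# Works when z is known (here z = 3)...
print(to_rest_wavelength(4.8e-7, 3.0))
# ...but a spectrum without a catalogued z cannot be moved at all.
try:
    to_rest_wavelength(4.8e-7, None)
except ValueError as exc:
    print("refused:", exc)
```

For a Lyman forest there is not even a single `redshift` argument to supply: every absorber has its own, which is exactly the decision a generic service would be forced to make on the user's behalf.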

And now start to imagine the wealth of decisions facing your code when
people come in with CART3 coordinates, some of which are perfectly good
to define regions in SPHER2 or SPHER3.  Reject them?  Process them?
Only when you're dead sure you're not misunderstanding what people pass
your service?

This kind of thing goes on and on and on, just because there are so many
features in STC, and, to make things worse, most of them are optional
(for which there are good reasons, but again you have a combinatorial
explosion of what data you actually have available for your transform).

(3) After that, it's clear that no given service will support all of
STC-S.  To reliably operate such a service, a client would have to
discover the extent of that support (can it do coordinate
transformations?  which frames? which reference positions?  can it apply
proper motions?  will it include errors?  those I specify or those in
the data?  does it care about timeframes?  will it transform my spectral
intervals?  etc. pp).  I've thought a bit about what such an "STC-S
capabilities" record could look like, and I've come to the conclusion
that drawing up such a thing requires a greater mind than mine if the
result is supposed to work reasonably simply.
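As an illustration of how quickly such a record grows, here is a deliberately naive sketch of what a client would have to check; the `StcCapabilities` class, its fields, and the `unsupported` helper are all hypothetical and cover only a handful of the questions listed above (errors, time scales, and spectral handling are not modelled at all).

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class StcCapabilities:
    """A deliberately naive, hypothetical 'STC-S capabilities' record."""
    frames: FrozenSet[str] = frozenset({"ICRS"})
    refpos: FrozenSet[str] = frozenset({"UNKNOWNRefPos"})
    flavors: FrozenSet[str] = frozenset({"SPHER2"})
    transforms_frames: bool = False
    applies_proper_motions: bool = False
    transforms_spectral: bool = False
    # Still missing: errors (whose?), time scales, spectral reference
    # positions, redshift handling, ... each one another axis to declare.


def unsupported(caps: StcCapabilities, frame: str, refpos: str,
                flavor: str, has_proper_motion: bool = False) -> List[str]:
    """Return the reasons a service with `caps` could not honour a request."""
    problems = []
    if frame not in caps.frames and not caps.transforms_frames:
        problems.append(f"frame {frame}")
    if refpos not in caps.refpos:
        problems.append(f"reference position {refpos}")
    if flavor not in caps.flavors:
        problems.append(f"flavor {flavor}")
    if has_proper_motion and not caps.applies_proper_motions:
        problems.append("proper motions")
    return problems


print(unsupported(StcCapabilities(), "FK5", "EMBARYCENTER", "CART3"))
# ['frame FK5', 'reference position EMBARYCENTER', 'flavor CART3']
```

Every new STC feature a service might or might not support adds another field here, and clients have to handle every combination.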

So: To use STC-S we need an STC capabilities record, and defining such a
record in a way that is at once comprehensive, useful, and usable
appears, to me, hair-raisingly close to impossible.


Part Ia: What I suggest instead

This requires a short excursion: I strongly believe we should stop
lying.  We're currently lying when we, as in current SSAP, say something
like <PARAM name="INPUT:BAND" datatype="double" unit="m"...> in the
service metadata.  That's a lie because if you actually pass in a double
("1e-7"), you'll likely get back an empty result.

What clients are expected to pass in is (for most services) something
like "1e-7/", which clearly is *not* a double literal.  The SSAP spec
even suggests something like "1e-7/2e-7,5e-7/6e-7;REST" could work --
now feed that to your favourite programming system's float parser (of
course, there aren't terribly many servers that actually support this
kind of thing, either).
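The mismatch is easy to demonstrate: a float parser rejects "1e-7/" outright, so a client that believed the declared datatype would never produce what services actually expect. The sketch below is a simplified reading of the range-list forms quoted above (open interval ends, comma-separated intervals, a ";QUALIFIER" suffix); the function name `parse_ssap_range_list` is hypothetical and this is not a full SSAP validator.

```python
from typing import List, Optional, Tuple

Interval = Tuple[Optional[float], Optional[float]]


def parse_ssap_range_list(literal: str) -> Tuple[List[Interval], Optional[str]]:
    """Parse an SSAP-style range list such as '1e-7/2e-7,5e-7/6e-7;REST'.

    Returns (intervals, qualifier); an open interval end becomes None and a
    single value v becomes (v, v).  A simplified reading, not a validator.
    """
    body, _, qualifier = literal.partition(";")
    intervals: List[Interval] = []
    for item in body.split(","):
        if "/" in item:
            low, _, high = item.partition("/")
            intervals.append((float(low) if low else None,
                              float(high) if high else None))
        else:
            value = float(item)
            intervals.append((value, value))
    return intervals, (qualifier or None)


# What the metadata promises: a double.  What clients actually send:
try:
    float("1e-7/")
except ValueError:
    print("'1e-7/' is not a double")
print(parse_ssap_range_list("1e-7/"))
print(parse_ssap_range_list("1e-7/2e-7,5e-7/6e-7;REST"))
```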

There's the old saying: "If you lie to a computer, it will catch you".
Case in point: An SSA client effectively has no idea what syntaxes and
features a given service will support, which makes non-trivial all-VO SSA
queries pretty much a gamble.  This is even worse when it comes to
custom parameters; check out LOG_G support in theoretical spectral
services for a taste of why I am ranting here.

It turns out that most implementors in the real VO (not me, though, so
far, but I'll change that), when they had custom float parameters,
chose to define pairs of LOG_G_MIN and LOG_G_MAX.  Looks a bit evil at
first glance, but it's actually close to perfect -- except you can
only specify one range, but I claim that's a good deal for no longer
having to lie, and whoever needs multiple intervals and similarly
complex stuff should be using ObsTAP anyway.  Future specifications, I
maintain, should follow suit: There are only "atomic" parameters, using
"structured" names (I'm open to discussion on whether machines should be
allowed to parse the names to figure out that LOG_G_MIN and
LOG_G_MAX have a certain relationship: I think yes, but I also think
metadata responses should group them).
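If machines are allowed to infer the relationship from the names, the client side is short. This is a sketch under the assumption that the convention is exactly a `_MIN`/`_MAX` suffix; `pair_range_parameters` and the `T_EFF_*` parameter are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable

# A name is a range bound if it ends in _MIN or _MAX after a non-empty stem.
_BOUND = re.compile(r"^(?P<stem>.+)_(?P<bound>MIN|MAX)$")


def pair_range_parameters(names: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Group atomic parameter names like LOG_G_MIN/LOG_G_MAX by their stem.

    Returns {stem: {"MIN": name, "MAX": name}}; names without a bound suffix
    are ignored, and a stem may end up with only one of the two bounds.
    """
    pairs: Dict[str, Dict[str, str]] = defaultdict(dict)
    for name in names:
        match = _BOUND.match(name)
        if match:
            pairs[match["stem"]][match["bound"]] = name
    return dict(pairs)


print(pair_range_parameters(["LOG_G_MIN", "LOG_G_MAX", "T_EFF_MIN", "MAXREC"]))
```

A client can then render one "from/to" widget per stem, with no string parsing of values at all.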

For what we've seen as STC-S usages, I therefore suggest getting the
cutout region into the service using parameters like POS_RA_MIN,
POS_RA_MAX, POS_DEC_MIN, POS_DEC_MAX.  If a service insists, it can have
POS_FRAME and must then, in a metadata PARAM VALUES child (or equivalent
if you insist on not using VOTable), let the client know which frames it
understands (but ICRS is always a must outside of solar system studies).
If people really insist on oddly-shaped regions (I don't think that's a
good idea, incidentally), you could still say CIRCLE_CENTER_RA and
friends, and by writing things out like that, you at least get a feeling
for the amount of implementation work.  Again, clients get easy discovery
of the supported features for free.
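A client-side sketch of that discovery, assuming a service declares the proposed atomic parameters as VOTable-style PARAMs and enumerates POS_FRAME with OPTION children; the metadata fragment is invented for illustration (and omits the VOTable namespace), and `advertised_params`/`cutout_query` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple
from urllib.parse import urlencode

# Hypothetical service metadata (VOTable namespace omitted for brevity).
METADATA = """
<RESOURCE>
  <PARAM name="POS_RA_MIN" datatype="double" unit="deg" value=""/>
  <PARAM name="POS_RA_MAX" datatype="double" unit="deg" value=""/>
  <PARAM name="POS_DEC_MIN" datatype="double" unit="deg" value=""/>
  <PARAM name="POS_DEC_MAX" datatype="double" unit="deg" value=""/>
  <PARAM name="POS_FRAME" datatype="char" arraysize="*" value="ICRS">
    <VALUES>
      <OPTION value="ICRS"/>
      <OPTION value="GALACTIC"/>
    </VALUES>
  </PARAM>
</RESOURCE>
"""


def advertised_params(metadata_xml: str) -> Dict[str, List[str]]:
    """Map each PARAM name to its OPTION values (empty list: no enumeration)."""
    root = ET.fromstring(metadata_xml.strip())
    return {p.get("name"): [o.get("value") for o in p.iter("OPTION")]
            for p in root.iter("PARAM")}


def cutout_query(params: Dict[str, List[str]], ra: Tuple[float, float],
                 dec: Tuple[float, float], frame: str = "ICRS") -> str:
    """Build a cutout query string using only parameters the service declares."""
    query: Dict[str, object] = {"POS_RA_MIN": ra[0], "POS_RA_MAX": ra[1],
                                "POS_DEC_MIN": dec[0], "POS_DEC_MAX": dec[1]}
    missing = [name for name in query if name not in params]
    if missing:
        raise ValueError(f"service lacks {', '.join(missing)}")
    if "POS_FRAME" in params:
        if frame not in params["POS_FRAME"]:
            raise ValueError(f"frame {frame} not among {params['POS_FRAME']}")
        query["POS_FRAME"] = frame
    elif frame != "ICRS":
        raise ValueError("service only accepts ICRS")
    return urlencode(query)


params = advertised_params(METADATA)
print(cutout_query(params, (163.0, 163.5), (57.0, 57.5), frame="GALACTIC"))
```

The client knows before sending anything whether its request can be honoured; there is no need to guess at which STC features the service silently ignores.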

Several such parameter names should probably be predefined in DataLink,
to the extent of the subset of STC-S we'd be willing to support *in all
services*.  A funky service that can, say, apply proper motions, could
still add POS_EPOCH and give a sensible description in its metadata
response (or datalink document), and a client can at least validate user
input against that (and maybe even make out what that is from its UCD).
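On the service side, such an extra parameter is just one more PARAM with a UCD and, where sensible, a declared range that clients can validate against. A minimal sketch using the standard library's ElementTree; `make_param`, `check_value`, the 1900 to 2100 range, and the choice of `time.epoch` as the UCD are illustrative assumptions, not anything a spec prescribes.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def make_param(name: str, datatype: str, unit: str, ucd: str, description: str,
               minimum: Optional[float] = None,
               maximum: Optional[float] = None) -> ET.Element:
    """Build a VOTable-style PARAM element describing one atomic input."""
    param = ET.Element("PARAM", name=name, datatype=datatype, unit=unit,
                       ucd=ucd, value="")
    ET.SubElement(param, "DESCRIPTION").text = description
    if minimum is not None or maximum is not None:
        values = ET.SubElement(param, "VALUES")
        if minimum is not None:
            ET.SubElement(values, "MIN", value=str(minimum))
        if maximum is not None:
            ET.SubElement(values, "MAX", value=str(maximum))
    return param


def check_value(param: ET.Element, text: str) -> float:
    """Client-side check of user input against the PARAM's declared range."""
    value = float(text)
    low = param.find("VALUES/MIN")
    high = param.find("VALUES/MAX")
    if low is not None and value < float(low.get("value")):
        raise ValueError(f"{param.get('name')} must be >= {low.get('value')}")
    if high is not None and value > float(high.get("value")):
        raise ValueError(f"{param.get('name')} must be <= {high.get('value')}")
    return value


# Hypothetical POS_EPOCH for a service that can apply proper motions.
epoch = make_param("POS_EPOCH", "double", "yr", "time.epoch",
                   "Epoch (Julian years) to move positions to", 1900, 2100)
print(ET.tostring(epoch, encoding="unicode"))
print(check_value(epoch, "2000.0"))
```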

The data model of those input parameters is fairly simple, so UCDs (and
possibly grouping) should do as metadata to allow clients semantically
sane and helpful user interfaces.  Or do the even righter thing and
write VO-DML, which would let you mark up where all your parameters are
in a data model (my take: overkill for this purpose, mainly because most
of the stuff that's actually requiring proper descriptions will probably
happen outside of the data model).
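The "possibly grouping" part could look like this in VOTable terms: two PARAMs with IDs and a GROUP that points at both through PARAMref, so clients need not rely on name parsing at all. A sketch only; `range_group` is hypothetical, and whether `phys.gravity` combined with `stat.min`/`stat.max` is the right UCD set for log g is itself an assumption.

```python
import xml.etree.ElementTree as ET


def range_group(stem: str, ucd: str) -> ET.Element:
    """Declare STEM_MIN/STEM_MAX PARAMs and a GROUP referencing both."""
    resource = ET.Element("RESOURCE")
    for bound in ("MIN", "MAX"):
        name = f"{stem}_{bound}"
        # Each bound is an ordinary atomic PARAM, tagged with a stat.* UCD word.
        ET.SubElement(resource, "PARAM", ID=name, name=name, datatype="double",
                      ucd=f"{ucd};stat.{bound.lower()}", value="")
    # The GROUP makes the relationship explicit metadata rather than a naming rule.
    group = ET.SubElement(resource, "GROUP", name=stem)
    for bound in ("MIN", "MAX"):
        ET.SubElement(group, "PARAMref", ref=f"{stem}_{bound}")
    return resource


print(ET.tostring(range_group("LOG_G", "phys.gravity"), encoding="unicode"))
```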



Part II: What about STC-S then?

There are two uses of STC-S in DaCHS (GAVO's data center software,
http://soft.g-vo.org) I actually like:

(1) STC coverage (resource profile) for registry purposes.  A resource
description could thus say something like:

  <meta name="coverage.profile">
    TimeInterval TT 1995-06-03T10:30:48 1998-01-12T01:41:56
    Circle ICRS 163 57.5 1
    SpectralInterval TOPOCENTER 1.318 1.446 unit MHz
  </meta>

This stuff is then turned into STC-X when resource records are
requested, which works fairly well.  Even there, STC-S is, really, much
too powerful, though, since the registries at the other end (would) have
to do something with this metadata.  Let's ignore for a second all the
strife about spatial specifications: If you're a registry and you harvest
the STC-X equivalent of "SpectralInterval TOPOCENTER 1.318 1.446 unit
MHz" -- what do you do with it?  To make this kind of thing useful,
you'd need to put it into a table next to, maybe, "SpectralInterval
PLUTO 1 2 unit m".  Requiring the registries to perform the magic
required to bring the two specifications to a common reference position
(which, incidentally, is advanced divination in this case since the
registry has no way of knowing what TOPOCENTER really refers to) is an
invitation to continue the current state (almost all searchable
registries have no STC support apart from waveband).

Still, the registry could define a subset of "permitted features" of STC
(only ICRS, only BARYCENTER refpos if people care about Refposses at
all, only Union and PositionInterval allowed, etc), and STC-S would still
be useful to input the data.
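A registry enforcing such a subset would not need a full STC-S implementation; a token-level check of each coverage sub-phrase is enough to reject what it cannot index. The sketch below assumes one sub-phrase per line and a hypothetical profile (ICRS only, BARYCENTER only for spectral phrases, a small shape whitelist); it deliberately does not check spatial reference positions. Applied to the coverage example above, it flags exactly the TOPOCENTER spectral interval.

```python
from typing import List

# Hypothetical "permitted features" profile for a registry.
SPATIAL_SHAPES = {"Circle", "Union", "PositionInterval"}
FRAMES = {"ICRS"}
REFPOS = {"BARYCENTER"}

COVERAGE = """
    TimeInterval TT 1995-06-03T10:30:48 1998-01-12T01:41:56
    Circle ICRS 163 57.5 1
    SpectralInterval TOPOCENTER 1.318 1.446 unit MHz
"""


def profile_violations(coverage: str) -> List[str]:
    """Report STC-S coverage sub-phrases outside the restricted profile.

    Token-level check, one sub-phrase per line; not a full STC-S parser.
    """
    problems = []
    for line in coverage.strip().splitlines():
        tokens = line.split()
        if not tokens:
            continue
        keyword = tokens[0]
        second = tokens[1] if len(tokens) > 1 else ""
        if keyword == "TimeInterval":
            continue  # this profile leaves time scales alone
        if keyword == "SpectralInterval":
            if second not in REFPOS:
                problems.append(f"{keyword}: reference position {second} not permitted")
        elif keyword in SPATIAL_SHAPES:
            if second not in FRAMES:
                problems.append(f"{keyword}: frame {second} not permitted")
        else:
            problems.append(f"{keyword}: sub-phrase not permitted")
    return problems


print(profile_violations(COVERAGE))
```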

(2) Defining STC metadata

For this, I've made an extension to STC-S that allows column references.
Then, in the metadata declaration, you say something like

    <stc>
      Time TT "Date"
      Position ICRS CART3 Epoch J2010 "alpha" "delta" "distance"
      Velocity "mualpha" "mudelta" "radialvelocity"
      Redshift OPTICAL "z"
    </stc>

or (this is for SSAP):

    <stc>
      Time TT "ssa_dateObs" Size "ssa_timeExt" 
      Position ICRS [ssa_location] Size "ssa_aperture" "ssa_aperture"
      SpectralInterval "ssa_specstart" "ssa_specend"
        Spectral "ssa_specmid" Size "ssa_specext"
    </stc>
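The extension only needs a slightly richer tokenizer than plain STC-S. A sketch, assuming that a double-quoted token names a column and that a bracketed token such as [ssa_location] names a single column holding a whole position (that reading of the brackets is an assumption here); `tokenize` and `column_references` are hypothetical names.

```python
import re
from typing import List, Tuple

# "name" is a column reference, [name] is treated as a reference to a column
# holding a whole position (assumption), anything else is an ordinary word.
_TOKEN = re.compile(r'"(?P<column>[^"]*)"|\[(?P<vector>[^\]]*)\]|(?P<word>\S+)')

SSAP_STC = """
Time TT "ssa_dateObs" Size "ssa_timeExt"
Position ICRS [ssa_location] Size "ssa_aperture" "ssa_aperture"
SpectralInterval "ssa_specstart" "ssa_specend"
  Spectral "ssa_specmid" Size "ssa_specext"
"""


def tokenize(source: str) -> List[Tuple[str, str]]:
    """Split extended STC-S into (kind, text) pairs: word, column, or vector."""
    tokens = []
    for match in _TOKEN.finditer(source):
        kind = match.lastgroup  # name of the alternative that matched
        tokens.append((kind, match.group(kind)))
    return tokens


def column_references(source: str) -> List[str]:
    """Return all referenced column names, in order of appearance."""
    return [text for kind, text in tokenize(source) if kind != "word"]


print(column_references(SSAP_STC))
```

Knowing the referenced columns up front lets the software check that every one of them exists in the table before it emits any VOTable STC annotation.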


These then get translated into VOTable STC declarations
(http://www.ivoa.net/Documents/Notes/VOTableSTC/) -- and here, I'd say
we can be generous with the features.  On the client side, it's much
easier to communicate "I don't understand that particular feature of the
metadata description" or just "Here's what STC metadata I have -- now,
dear astronomer, make sense of that yourself".  Indeed, I had to
extend my "private" STC-S with the concepts of epoch and planetary
ephemeris, and to allow automatic error estimates I'd yet need the
concept of the mean epoch.

So -- when all you want is a structured description that is basically
directed at a scientist, STC's wealth of features is just fine (I'd even
advocate some additions).  But note again that the recipient here is not
(really) a program, it's a human that can decide what to do and how much
effort should go into bringing some data together.


My conclusion: Whenever you actually deal with STC instances, and you're
ready to do so (taking into account that nobody so far can do fancy
computations with a significant subset of it), STC-S has a place as a
convenient language to input them (as opposed to, e.g., STC-X or their
VOTable serialization, both of which you *really* don't want to type).
This -- and not the use in protocols, for which full STC is far too
heavyweight and prescribing systems, units, and such makes much more
sense -- is the niche I see for STC-S.


While I'm at it, I've not been too happy with the combination of
positional and keyword+positional elements in current STC-S.  Quick:
which of the following two STC-S specifications is valid (only one
answer possible)?

(1) Position ICRS unit m pixsize 1 2
(2) Position ICRS pixsize 1 2 unit m

I'd therefore like to suggest that we should relax some of those
constraints and probably move everything to keyword/value except what's
already been used in actual protocols and clients; I'd expect that's
only the reference system, so we'd be fine as long as stuff like

Box ICRS 1 2 3 4

would remain valid STC-S.
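A sketch of what the relaxed rule could mean for a parser: only the phrase type and the frame stay positional, the coordinates follow, and every keyword may then appear in any order, so (1) and (2) parse to the same thing while "Box ICRS 1 2 3 4" keeps working. The keyword set and `parse_phrase` are assumptions for illustration, and reference positions or flavors after the frame are not handled.

```python
from typing import Dict, List

# Keywords this sketch accepts in any order (an assumed, incomplete set).
KEYWORDS = {"unit", "error", "size", "resolution", "pixsize", "epoch"}


def parse_phrase(text: str) -> Dict[str, object]:
    """Parse '<Phrase> <frame> <numbers...> [keyword values...]'.

    Only the phrase type and the frame are positional; every keyword may
    appear in any order after the coordinates.
    """
    tokens = text.split()
    if len(tokens) < 2:
        raise ValueError("need at least a phrase type and a frame")
    phrase, frame, rest = tokens[0], tokens[1], tokens[2:]
    coords: List[float] = []
    while rest and rest[0].lower() not in KEYWORDS:
        coords.append(float(rest.pop(0)))
    options: Dict[str, List[str]] = {}
    current = ""
    for token in rest:
        if token.lower() in KEYWORDS:
            current = token.lower()
            if current in options:
                raise ValueError(f"keyword {current} given twice")
            options[current] = []
        else:
            # rest starts with a keyword here, so current is always set.
            options[current].append(token)
    return {"phrase": phrase, "frame": frame, "coords": coords, "options": options}


one = parse_phrase("Position ICRS unit m pixsize 1 2")
two = parse_phrase("Position ICRS pixsize 1 2 unit m")
print(one == two)  # True
print(parse_phrase("Box ICRS 1 2 3 4"))
```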


And here's now my offer: I'd write up EBNF and accompanying prose for
something that's "pretty much" like STC-S according to the current note,
leaving existing usages of STC-S intact and simplifying the remaining
rules to, e.g., allow both (1) and (2).  I'd have it ready for Hawaii,
complete with an implementation that at least can move the stuff to
STC-X and VOTable utypes.

Both encouragement and, erm, well, let's say discouragement is welcome
(since this definitely would not be a standard I'd enjoy writing, I'd
actually appreciate the latter a bit more...).

Cheers,

      Markus


