STC-S in DataLink

Thu Jun 20 14:40:36 PDT 2013

Markus,

A nice soliloquy :-) Thank you.

I'll make some marginal comments, to provide a little more perspective (I
hope).

The main issue seems to be that the number of metadata permutations makes
STC-S too complicated to be practical.

[As an aside, the irony is that STC-S started out as covering a limited
subset of STC-X, to keep it simple. But at the urging of a small number of
people (among whom Markus featured prominently ;-) ) more and more got
added.]

I don't think saying that it can be kept simple is a lie.
If we allow STC-S strings to be used to provide the coordinate metadata in
a DAL protocol, that protocol's standard can very well state that only ICRS
and GALACTIC are allowed for the spatial reference system and that these
are required to be 2-D spherical. Planetary systems are already excluded.
Other items can be subject to similar restrictions.

Another important question in this context is how one expects DAL servers
to be configured.
I may be wrong, but to me it makes most sense if a generic thin server is
built for general use that accepts the query and translates it into
something that makes sense to the server's local repository. It would not
be hard to include equatorial-to-Galactic (and vice versa) transformation
in that module; the same goes for handling units.
The point is that this only needs to be invented once and can be completely
transparent to repository resources.

A related point is the reference position. Sure, TOPOCENTER does not mean
a whole lot if the observer's location is unknown; that is provided for in
STC-X, but left out of STC-S. But it does tell the client that it was
within 1.5 * 10^11 m of the barycenter, and then the client can decide
whether that is good enough. Labeling it as BARYCENTER would be OK for Time
(if properly transformed), but a lie for the spatial coordinates.
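As an illustration of how little code the equatorial-to-Galactic step in such a generic server module might need, here is a minimal Python sketch. It assumes ICRS input and uses the ICRS-to-Galactic rotation matrix tabulated in the Hipparcos catalogue documentation; the function names are hypothetical, not part of any existing DAL library.

```python
import math

# ICRS -> Galactic rotation matrix (as tabulated for Hipparcos). Each row is
# a Galactic axis in ICRS Cartesian components: the Galactic centre, the
# direction l=90 deg b=0, and the north Galactic pole.
ICRS_TO_GAL = (
    (-0.0548755604, -0.8734370902, -0.4838350155),
    (0.4941094279, -0.4448296300, 0.7469822445),
    (-0.8676661490, -0.1980763734, 0.4559837762),
)


def _to_vector(lon_deg, lat_deg):
    """Unit vector for a (longitude, latitude) pair given in degrees."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def _to_angles(vec):
    """(longitude in [0, 360), latitude) in degrees for a unit vector."""
    x, y, z = vec
    lon = math.degrees(math.atan2(y, x)) % 360.0
    lat = math.degrees(math.asin(max(-1.0, min(1.0, z))))  # clamp rounding noise
    return lon, lat


def icrs_to_galactic(ra_deg, dec_deg):
    """Hypothetical helper: ICRS (ra, dec) -> Galactic (l, b), in degrees."""
    v = _to_vector(ra_deg, dec_deg)
    return _to_angles(tuple(sum(m * c for m, c in zip(row, v))
                            for row in ICRS_TO_GAL))


def galactic_to_icrs(l_deg, b_deg):
    """Inverse transform: the inverse of a rotation matrix is its transpose."""
    v = _to_vector(l_deg, b_deg)
    return _to_angles(tuple(sum(ICRS_TO_GAL[j][i] * v[j] for j in range(3))
                            for i in range(3)))
```

Unit handling (for instance accepting arcsec or radians) would be a similarly thin layer in front of these functions.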

I hope this helps to give people a feeling that the situation is not as
dire as Markus paints it, and that STC-S is actually usable, provided that
the protocols using it impose appropriate restrictions and that generic
client-interface server modules are available.

One more aside: I noticed that Markus had an example which listed three
velocity components (mu_ra, mu_dec, and radial_velocity) and a redshift.
However, if that radial velocity is actually a Doppler velocity (and I may
be wrong on this), then it is in the wrong place and should be on the
Redshift coordinate axis (which then should be called Doppler velocity, to
be more helpful).
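For reference, the usual Doppler-velocity conventions (those STC-S labels OPTICAL, RADIO and RELATIVISTIC) are all functions of the observed spectral coordinate and the rest wavelength or frequency, which is why such a velocity naturally sits on the spectral/redshift axis rather than among the kinematic velocity components:

```latex
v_{\mathrm{opt}} = c\,\frac{\lambda - \lambda_0}{\lambda_0} = c\,z,
\qquad
v_{\mathrm{rad}} = c\,\frac{\nu_0 - \nu}{\nu_0},
\qquad
v_{\mathrm{rel}} = c\,\frac{\nu_0^2 - \nu^2}{\nu_0^2 + \nu^2}
```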

Cheers,

  - Arnold

-------------------------------------------------------------------------------------------------------------
Arnold H. Rots                                          Chandra X-ray Science Center
Smithsonian Astrophysical Observatory                   tel:  +1 617 496 7701
60 Garden Street, MS 67                                 fax:  +1 617 495 7356
Cambridge, MA 02138                                     arots at cfa.harvard.edu
USA                                                     http://hea-www.harvard.edu/~arots/
--------------------------------------------------------------------------------------------------------------

On Thu, Jun 20, 2013 at 10:46 AM, Markus Demleitner
<msdemlei at ari.uni-heidelberg.de> wrote:

> Dear DAL group,
>
> Since this is going to be a long mail, I feel obliged to start with
> an
>
> Abstract
>
>   In the context of DataLink, there is renewed interest in STC-S on a
>   protocol level, i.e., for passing shape descriptions into services.
>   I believe we should not do this.
>
>   In a first part of this (longish) mail, I try to make this point.
>   There's a second part below in which I try to outline how STC-S *has*
>   worked for me, and under which conditions; that part is basically
>   something like "If you agree with me, encourage me, and I'll work on an
>   STC-S working draft with actual EBNF."
>
> Sorry for this long soliloquy, but I am very sure this is an
> important point for the later interoperability of DataLink-based
> services.
>
>
> Part I: STC-S in Protocols
>
> I'm arguing against usages like "CUTOUT=Ellipse ICRS 33 45 4 5 unit mas
> SpectralInterval TOPOCENTER 55 65 unit MeV pixelSize 1" in protocol
> parameters, and actually against abusing STC-S in cases where you just
> want to define some shape in spherical geometry.  Here are my reasons:
>
> (1) Mashing data and metadata is like denormalizing databases:
> You may get away with it and save some work, but if you've not
> understood exactly what you're doing, you'll almost certainly regret it.
>
> (2) STC-S covers an enormous wealth of features.  Even suggesting that
> all services should be able to transform, e.g., wavelength into the rest
> system is not going to be a high incentive to make people take up a
> standard.  Don't wave at "there's going to be a library" -- there's not.
> Many years after STC-S was published and the STC DM passed, all we have
> are some libraries that can -- more or less -- parse STC-S and spit the
> stuff out in some other form.
>
> Actually *doing* something with what's parsed is something completely
> different.  In my STC library I'm allowing some "conforming" (making one
> STC spec use the reference frame, units, and such of another STC spec),
> but exactly because STC is a complex beast, that's no fun at all.  That
> was part of the pain I alluded to above, and I'm still ignoring most
> things that actually are complicated (like transforming spatial
> coordinates from the EMBARYCENTER reference position to the PLUTO
> reference position).
>
> But worse (just for this example) -- if you want to transform positions
> for differing reference positions, you need to know the source's
> distance, and it's completely unclear how to do that for images.
> Transforming spectral coordinates to, say, the observer's frame, you
> need to know the source's redshift -- which is something I don't know
> for the majority of the spectra in my database.  And what should happen
> for, say, the Lyman-alpha forests in quasars, where there are sources at
> many different redshifts?
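To make the redshift dependence concrete: moving an observed spectral coordinate into the source rest frame uses the textbook relations below, so a service without a per-spectrum z (or with several, as in the quasar absorption case) has nothing well-defined to plug in.

```latex
\lambda_{\mathrm{rest}} = \frac{\lambda_{\mathrm{obs}}}{1 + z},
\qquad
\nu_{\mathrm{rest}} = (1 + z)\,\nu_{\mathrm{obs}}
```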
>
> And now start to imagine the wealth of decisions facing your code when
> people come in with CART3 coordinates, some of which are perfectly good
> to define regions in SPHER2 or SPHER3.  Reject them?  Process them?
> Only when you're dead sure you're not misunderstanding what people pass
> your service?
>
> This kind of thing goes on and on and on, just because there are so many
> features in STC, and, to make things worse, most of them are optional
> (for which there are good reasons, but again you have a combinatorial
> explosion of what data you actually have available for your transform).
>
> (3) Given all that, it's clear that no given service will support all of
> STC-S.  To reliably operate such a service, a client would have to
> discover the extent of that support (can it do coordinate
> transformations?  which frames? which reference positions?  can it apply
> proper motions?  will it include errors?  those I specify or those in
> the data?  does it care about timeframes?  will it transform my spectral
> intervals?  etc. pp).  I've thought a bit about what such an "STC-S
> capabilities" record could look like, and I've come to the conclusion
> that drawing up such a thing requires a greater mind than mine if the
> result is supposed to work reasonably simply.
>
> So: To use STC-S we need an STC capabilities record, and defining such a
> record in a way that makes it comprehensive, useful, and usable
> appears, to me, hair-raisingly close to impossible.
>
>
> Part Ia: What I suggest instead
>
> This requires a short excursion: I strongly believe we should stop
> lying.  We're currently lying when we, as in current SSAP, say something
> like <PARAM name="INPUT:BAND" datatype="double" unit="m"...> in the
> service metadata.  That's a lie because if you actually pass in a double
> ("1e-7"), you'll likely get back an empty result.
>
> What clients are expected to pass in is (for most services) something
> like "1e-7/", which clearly is *not* a double literal.  The SSAP spec
> even suggests something like "1e-7/2e-7,5e-7/6e-7;REST" could work --
> now feed that to your favourite programming system's float parser (of
> course, there aren't terribly many servers that actually support this
> kind of thing, either).
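A minimal Python sketch of the mismatch: a float parser rejects the range syntax outright, so a client has to know about (and implement) a small SSAP-specific grammar instead. `parse_range_list` is a hypothetical helper covering only the forms quoted above, not code from any SSAP library.

```python
import math


def parse_range_list(text):
    """Parse an SSAP-style range list like '1e-7/2e-7,5e-7/6e-7;REST'.

    Returns (intervals, qualifier). Open ends become -inf/+inf and a bare
    value becomes a degenerate interval. Sketch only: it covers the forms
    quoted in the mail, not the full SSAP grammar.
    """
    body, _, qualifier = text.partition(";")
    intervals = []
    for item in body.split(","):
        if "/" in item:
            lo, hi = (part.strip() for part in item.split("/", 1))
            intervals.append((float(lo) if lo else -math.inf,
                              float(hi) if hi else math.inf))
        else:
            value = float(item)
            intervals.append((value, value))
    return intervals, (qualifier.strip() or None)


# What a client that trusted datatype="double" would try, and why it fails:
try:
    float("1e-7/")
except ValueError as exc:
    print("not a double:", exc)
```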
>
> There's the old saying: "If you lie to a computer, it will catch you".
> Case in point: An SSA client effectively has no idea what syntaxes and
> features a given service will support, which makes non-trivial all-VO SSA
> queries pretty much a gamble.  This is even worse when it comes to
> custom parameters; check out LOG_G support in theoretical spectral
> services for a taste of why I am ranting here.
>
> It turns out that most implementors in the real VO (not me, though, so
> far, but I'll change that), when they had custom float parameters,
> chose to define pairs of LOG_G_MIN and LOG_G_MAX.  Looks a bit evil on
> the first glance, but it's actually close to perfect -- except you can
> only specify one range, but I claim that's a good deal for no longer
> having to lie, and whoever needs multiple intervals and similarly
> complex stuff should be using ObsTAP anyway.  Future specifications, I
> maintain, should follow suit: There are only "atomic" parameters, using
> "structured" names (I'm open to discussion on whether machines should be
> allowed to parse the names to figure out that LOG_G_MIN and
> LOG_G_MAX have a certain relationship: I think yes, but I also think
> metadata responses should group them).
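A minimal Python sketch of what "machines parsing the names" could amount to, assuming the _MIN/_MAX suffix convention is made normative (it is a proposal here, not an existing standard). Both function names are hypothetical.

```python
def group_range_params(names):
    """Pair <BASE>_MIN / <BASE>_MAX parameter names into ranges.

    Returns {base: (min_name, max_name)} for complete pairs only.
    """
    mins = {n[:-4]: n for n in names if n.endswith("_MIN")}
    maxs = {n[:-4]: n for n in names if n.endswith("_MAX")}
    return {base: (mins[base], maxs[base])
            for base in sorted(mins.keys() & maxs.keys())}


def matches(record, params, groups):
    """True if each constrained column of record lies in its [MIN, MAX] range.

    A missing MIN or MAX simply leaves that side of the range open, so a
    client can send just LOG_G_MIN for "at least".
    """
    for base, (lo_name, hi_name) in groups.items():
        lo = params.get(lo_name, float("-inf"))
        hi = params.get(hi_name, float("inf"))
        if not lo <= record[base] <= hi:
            return False
    return True
```

Each parameter stays a plain double, so the declared metadata is no longer a lie; the price is one interval per quantity.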
>
> For what we've seen as STC-S usages, I therefore suggest getting the
> cutout region into the service using parameters like POS_RA_MIN,
> POS_RA_MAX, POS_DEC_MIN, POS_DEC_MAX.  If a service insists, it can have
> POS_FRAME and must then, in a metadata PARAM VALUES child (or equivalent
> if you insist on not using VOTable), let the client know which frames it
> understands (but ICRS always is a must outside of solar system studies).
> If people really insist on oddly-shaped regions (I don't think that's a
> good idea, incidentally), you could still say CIRCLE_CENTER_RA and
> friends, and by writing things out like that, you at least get a feeling
> for the amount of implementation work.  Again, there's easy discovery of
> features supported for clients for free.
>
> Several such parameter names should probably be predefined in DataLink,
> to the extent of the subset of STC-S we'd be willing to support *in all
> services*.  A funky service that can, say, apply proper motions, could
> still add POS_EPOCH and give a sensible description in its metadata
> response (or datalink document), and a client can at least validate user
> input against that (and maybe even make out what that is from its UCD).
>
> The data model of those input parameters is fairly simple, so UCDs (and
> possibly grouping) should do as metadata to allow clients semantically
> sane and helpful user interfaces.  Or do the even righter thing and
> write VO-DML, which would let you mark up where all your parameters are
> in a data model (my take: overkill for this purpose, mainly because most
> of the stuff that's actually requiring proper descriptions will probably
> happen outside of the data model).
>
>
>
> Part II: What about STC-S then?
>
> There are two uses of STC-S in DaCHS (GAVO's data center software,
> http://soft.g-vo.org) I actually like:
>
> (1) STC coverage (resource profile) for registry purposes.  A resource
> description could thus say something like:
>
>   <meta name="coverage.profile">
>     TimeInterval TT 1995-06-03T10:30:48 1998-01-12T01:41:56
>     Circle ICRS 163 57.5 1
>     SpectralInterval TOPOCENTER 1.318 1.446 unit MHz
>   </meta>
>
> This stuff is then turned into STC-X when resource records are
> requested, which works fairly well.  Even there, STC-S is, really, much
> too powerful, though, since the registries at the other end (would) have
> to do something with this metadata.  Let's ignore for a second all the
> strife about spatial specifications: If you're a registry and you harvest
> the STC-X equivalent of "SpectralInterval TOPOCENTER 1.318 1.446 unit
> MHz" -- what do you do with it?  To make this kind of thing useful,
> you'd need to put it into a table next to, maybe, "SpectralInterval
> PLUTO 1 2 unit m".  Requiring the registries to perform the magic
> required to bring the two specifications to a common reference position
> (which, incidentally, is advanced divination in this case since the
> registry has no way of knowing what TOPOCENTER really refers to) is an
> invitation to continue the current state (almost all searchable
> registries have no STC support apart from waveband).
>
> Still, the registry could define a subset of "permitted features" of STC
> (only ICRS, only BARYCENTER refpos if people care about Refposses at
> all, only Union and PositionInterval allowed, etc), and STC-S would still
> be useful to input the data.
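A minimal Python sketch of such a registry-side subset check, using the permitted features listed above as defaults. It works on whitespace-separated tokens, one sub-phrase per line as in the coverage example; the phrase and reference-position lists are partial and illustrative, and this is not a real STC-S parser.

```python
# Spatial sub-phrases and reference positions this sketch recognizes; both
# lists are partial and purely illustrative.
SPATIAL_PHRASES = {"Position", "PositionInterval", "Circle", "Ellipse", "Box",
                   "Polygon", "Convex", "Union", "Intersection", "Not"}
KNOWN_REFPOS = {"GEOCENTER", "BARYCENTER", "HELIOCENTER", "TOPOCENTER",
                "EMBARYCENTER", "GALACTIC_CENTER", "LSR", "LSRK", "LSRD",
                "PLUTO"}


def check_coverage(profile,
                   allowed_spatial=frozenset({"Union", "PositionInterval"}),
                   allowed_frames=frozenset({"ICRS"}),
                   allowed_refpos=frozenset({"BARYCENTER"})):
    """List the features of a coverage profile outside the permitted subset."""
    problems = []
    for line in profile.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        phrase = tokens[0]
        # Shapes, including those nested inside a Union.
        for token in tokens:
            if token in SPATIAL_PHRASES and token not in allowed_spatial:
                problems.append(f"{token}: shape not permitted")
        # The frame directly follows a leading spatial sub-phrase.
        if phrase in SPATIAL_PHRASES:
            rest = tokens[1:]
            if rest[:1] == ["fillfactor"]:
                rest = rest[2:]  # skip 'fillfactor <value>'
            if not rest or rest[0] not in allowed_frames:
                problems.append(f"{phrase}: frame not permitted or missing")
        for token in tokens[1:]:
            if token in KNOWN_REFPOS and token not in allowed_refpos:
                problems.append(
                    f"{phrase}: reference position {token} not permitted")
    return problems
```

Run on the coverage profile quoted above, this flags the Circle and the TOPOCENTER spectral interval, which is the point: the registry rejects what it cannot use instead of guessing.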
>
> (2) Defining STC metadata
>
> For this, I've made an extension to STC-S that allows column references.
> Then, in the metadata declaration, you say something like
>
>     <stc>
>       Time TT "Date"
>       Position ICRS CART3 Epoch J2010 "alpha" "delta" "distance"
>       Velocity "mualpha" "mudelta" "radialvelocity"
>       Redshift OPTICAL "z"
>     </stc>
>
> or (this is for SSAP):
>
>     <stc>
>       Time TT "ssa_dateObs" Size "ssa_timeExt"
>       Position ICRS [ssa_location] Size "ssa_aperture" "ssa_aperture"
>       SpectralInterval "ssa_specstart" "ssa_specend"
>         Spectral "ssa_specmid" Size "ssa_specext"
>     </stc>
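A minimal Python sketch of the first step such a translator needs: pulling the column references out of the declaration and checking them against the table. Treating a quoted name as a plain column and a bracketed name as a column holding a complete position is an assumption about the extension, and the function names are hypothetical.

```python
import re

# A quoted name ("alpha") or a bracketed name ([ssa_location]) refers to a
# table column; the scalar/composite reading of the two forms is assumed.
COLUMN_REF = re.compile(r'"([^"]*)"|\[([^\]]*)\]')


def column_references(stc_source):
    """Return (column_name, kind) pairs in order of appearance."""
    refs = []
    for match in COLUMN_REF.finditer(stc_source):
        if match.group(1) is not None:
            refs.append((match.group(1), "scalar"))
        else:
            refs.append((match.group(2), "composite"))
    return refs


def missing_columns(stc_source, table_columns):
    """Column references that the table does not actually define."""
    known = set(table_columns)
    return [name for name, _ in column_references(stc_source)
            if name not in known]
```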
>
>
> These then get translated into VOTable STC declarations
> (http://www.ivoa.net/Documents/Notes/VOTableSTC/) -- and here, I'd say
> we can be generous with the features.  On the client side, it's much
> easier to communicate "I don't understand that particular feature of the
> metadata description" or just "Here's what STC metadata I have -- now,
> dear astronomer, make sense of that yourself".  Indeed, I had to
> extend my "private" STC-S with the concepts of epoch and planetary
> ephemeris, and to allow automatic error estimates I'd yet need the
> concept of the mean epoch.
>
> So -- when all you want is a structured description that is basically
> directed at a scientist, STC's wealth of features is just fine (I'd even
> advocate some additions).  But note again that the recipient here is not
> (really) a program, it's a human that can decide what to do and how much
> effort should go into bringing some data together.
>
>
> My conclusion: Whenever you actually deal with STC instances, and you're
> ready to do so (taking into account that nobody so far can do fancy
> computations with a significant subset of it), STC-S has a place as a
> convenient language to input them (as opposed to, e.g., STC-X or their
> VOTable serialization, both of which you *really* don't want to type).
> This -- and not the use in protocols, for which full STC is far too
> heavyweight and prescribing systems, units, and such makes much more
> sense -- is the niche I see for STC-S.
>
>
> While I'm at it: I've not been too happy with the combination of
> positional and keyword+positional elements in current STC-S (Quick:
> which of the following two STC-S specifications is valid (only one
> answer possible):
>
> (1) Position ICRS unit m pixsize 1 2
> (2) Position ICRS pixsize 1 2 unit m
> )
>
> I'd therefore like to suggest that we should relax some of those
> constraints and probably move everything to keyword/value except what's
> already been used in actual protocols and clients; I'd expect that's
> only the reference system, so we'd be fine as long as stuff like
>
> Box ICRS 1 2 3 4
>
> would remain valid STC-S.
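A minimal Python sketch of the relaxed rule proposed here, assuming that only the sub-phrase name, the frame, and any coordinate values stay positional, and that everything else becomes keyword/value groups in arbitrary order. The keyword set is hypothetical, and real STC-S tokens such as reference positions or flavors are not handled.

```python
# Hypothetical keyword set for the relaxed syntax; a real draft would fix it.
KEYWORDS = {"unit", "pixsize", "size", "error", "resolution", "epoch"}


def _is_number(token):
    """True if token parses as a float literal."""
    try:
        float(token)
    except ValueError:
        return False
    return True


def parse_phrase(text):
    """Parse '<Phrase> [frame] [numbers...] {keyword values...}'.

    Only the phrase name, the frame and the coordinate values are
    positional; keyword groups may come in any order.
    """
    tokens = text.split()
    if not tokens:
        raise ValueError("empty phrase")
    result = {"phrase": tokens[0], "frame": None, "coords": [], "keywords": {}}
    i = 1
    # The token right after the phrase name is the frame, unless it is a
    # number or a keyword.
    if i < len(tokens) and not _is_number(tokens[i]) and tokens[i] not in KEYWORDS:
        result["frame"] = tokens[i]
        i += 1
    # Positional coordinate values run up to the first keyword.
    while i < len(tokens) and tokens[i] not in KEYWORDS:
        result["coords"].append(float(tokens[i]))
        i += 1
    # The rest: each keyword collects the tokens up to the next keyword.
    current = None
    for token in tokens[i:]:
        if token in KEYWORDS:
            if token in result["keywords"]:
                raise ValueError(f"duplicate keyword {token!r}")
            current = token
            result["keywords"][current] = []
        else:
            result["keywords"][current].append(token)
    return result
```

With this rule both (1) and (2) above parse to the same structure, while "Box ICRS 1 2 3 4" keeps its current meaning.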
>
>
> And here's now my offer: I'd write up EBNF and accompanying prose for
> something that's "pretty much" like STC-S according to the current note,
> leaving existing usages of STC-S intact and simplifying the remaining
> rules to, e.g., allow both (1) and (2).  I'd have it ready for Hawaii,
> complete with an implementation that at least can move the stuff to
> STC-X and VOTable utypes.
>
> Both encouragement and, erm, well, let's say discouragement is welcome
> (since this definitely would not be a standard I'd enjoy writing, I'd
> actually appreciate the latter a bit more...).
>
> Cheers,
>
>       Markus
>
>