STC-S in DataLink

Thu Jun 20 14:45:12 PDT 2013

On 20/06/2013 20:28, François Bonnarel wrote:
> Dear Markus,
> Dear all,
>     I don't want to respond to the whole of the interesting discussion
> you are opening just yet, only to raise a small caveat about your
> title/subject.
> I would say it's not "STC-S in Datalink" but something like
> "STC-S in cutout services and access data methods".
> Services and methods of this kind will indeed be among the resources
> Datalink attaches to data products,
> but according to the discussion at the Heidelberg interop last month,
> the Datalink protocol itself only describes
> the nature, format, type, and semantics or descriptions of the links
> and will say nothing about the resources' parameters themselves.
>
> Tomorrow I will concentrate on the actual matter of your mail.
> Best regards
> François
>
>
> On 20/06/2013 16:46, Markus Demleitner wrote:
>> Dear DAL group,
>>
>> Since this is going to be a long mail, I feel obliged to start with
>> an
>>
>> Abstract
>>
>>    In the context of DataLink, there is renewed interest in STC-S on a
>>    protocol level, i.e., for passing shape descriptions into services.
>>    I believe we should not do this.
>>
>>    In a first part of this (longish) mail, I try to make this point.
>>    There's a second part below in which I try to outline how STC-S *has*
>>    worked for me, and under which conditions; that part is basically
>>    something like "If you agree with me, encourage me, and I'll work on
>>    an STC-S working draft with actual EBNF."
>>
>> Sorry for this long soliloquy, but I am very sure this is an
>> important point for the later interoperability of DataLink-based
>> services.
>>
>>
>> Part I: STC-S in Protocols
>>
>> I'm arguing against usages like "CUTOUT=Ellipse ICRS 33 45 4 5 unit mas
>> SpectralInterval TOPOCENTER 55 65 unit MeV pixelSize 1" in protocol
>> parameters, and actually against abusing STC-S in cases where you just
>> want to define some shape in spherical geometry.  Here are my reasons:
>>
>> (1) Mashing data and metadata is like denormalizing databases:
>> You may get away with it and save some work, but if you've not
>> understood exactly what you're doing, you'll almost certainly regret it.
>>
>> (2) STC-S covers an enormous wealth of features.  Even suggesting that
>> all services should be able to transform, e.g., wavelength into the rest
>> system is not going to be a high incentive to make people take up a
>> standard.  Don't wave at "there's going to be a library" -- there's not.
>> Many years after STC-S was published and the STC DM passed, all we have
>> are some libraries that can -- more or less -- parse STC-S and spit the
>> stuff out in some other form.
>>
>> Actually *doing* something with what's parsed is something completely
>> different.  In my STC library I'm allowing some "conforming" (making one
>> STC spec use the reference frame, units, and such of another STC spec),
>> but exactly because STC is a complex beast, that's no fun at all.  That
>> was part of the pain I alluded to above, and I'm still ignoring most
>> things that actually are complicated (like transforming spatial
>> coordinates from the EMBARYCENTER reference position to the PLUTO
>> reference position).
>>
>> But worse (just for this example) -- if you want to transform positions
>> for differing reference positions, you need to know the source's
>> distance, and it's completely unclear how to do that for images.
>> To transform spectral coordinates to, say, the observer's frame, you
>> need to know the source's redshift -- which is something I don't know
>> for the majority of the spectra in my database.  And what should happen
>> for, say, the Lyman forests in quasars, where there are sources with
>> lots of redshifts?
>>
>> And now start to imagine the wealth of decisions facing your code when
>> people come in with CART3 coordinates, some of which are perfectly good
>> to define regions in SPHER2 or SPHER3.  Reject them?  Process them?
>> Only when you're dead sure you're not misunderstanding what people pass
>> your service?
>>
>> This kind of thing goes on and on and on, just because there's so many
>> features in STC, and, to make things worse, most of them are optional
>> (for which there are good reasons, but again you have a combinatorial
>> explosion of what data you actually have available for your transform).
>>
>> (3) After that, it's clear that no given service will support all of
>> STC-S.  To reliably operate such a service, a client would have to
>> discover the extent of that support (can it do coordinate
>> transformations?  which frames? which reference positions?  can it apply
>> proper motions?  will it include errors?  those I specify or those in
>> the data?  does it care about timeframes?  will it transform my spectral
>> intervals?  etc.).  I've thought a bit about what such an "STC-S
>> capabilities" record could look like, and I've come to the conclusion
>> that drawing up such a thing requires a greater mind than mine if the
>> result is supposed to work reasonably simply.
>>
>> So: To use STC-S we need an STC capabilities record, and defining such a
>> record in a way that is at once comprehensive, useful, and usable
>> appears, to me, hair-raisingly close to impossible.
>>
>>
>> Part Ia: What I suggest instead
>>
>> This requires a short excursion: I strongly believe we should stop
>> lying.  We're currently lying when we, as in current SSAP, say something
>> like <PARAM name="INPUT:BAND" datatype="double" unit="m"...> in the
>> service metadata.  That's a lie because if you actually pass in a double
>> ("1e-7"), you'll likely get back an empty result.
>>
>> What clients are expected to pass in is (for most services) something
>> like "1e-7/", which clearly is *not* a double literal.  The SSAP spec
>> even suggests something like "1e-7/2e-7,5e-7/6e-7;REST" could work --
>> now feed that to your favourite programming system's float parser (of
>> course, there aren't terribly many servers that actually support this
>> kind of thing, either).
>>
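The point about BAND can be checked directly. Below is a minimal Python sketch, assuming nothing beyond the standard float parser; the helper name `parses_as_double` is made up for illustration. It feeds the three BAND values quoted above to `float()`: only the first is a double literal, and that is the one the mail says will likely return an empty result.

```python
def parses_as_double(literal: str) -> bool:
    """Return True if Python's float parser accepts the literal."""
    try:
        float(literal)
        return True
    except ValueError:
        return False


# The three BAND values mentioned in the mail.
for literal in ["1e-7", "1e-7/", "1e-7/2e-7,5e-7/6e-7;REST"]:
    print(f"{literal!r}: {parses_as_double(literal)}")
```

A client that trusts the declared datatype="double" has no way to produce or validate the second and third forms.
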
>> There's the old saying: "If you lie to a computer, it will catch you".
>> Case in point: An SSA client effectively has no idea what syntaxes and
>> features a given service will support, which makes non-trivial all-VO
>> SSA queries pretty much a gamble.  This is even worse when it comes to
>> custom parameters; check out LOG_G support in theoretical spectral
>> services for a taste of why I am ranting here.
>>
>> It turns out that most implementors in the real VO (not me, though, so
>> far, but I'll change that), when they had custom float parameters,
>> chose to define pairs of LOG_G_MIN and LOG_G_MAX.  Looks a bit evil at
>> first glance, but it's actually close to perfect -- except you can
>> only specify one range, but I claim that's a good deal for no longer
>> having to lie, and whoever needs multiple intervals and similarly
>> complex stuff should be using ObsTAP anyway.  Future specifications, I
>> maintain, should follow suit: There are only "atomic" parameters, using
>> "structured" names (I'm open to discussion on whether machines should be
>> allowed to parse the names to figure out that LOG_G_MIN and
>> LOG_G_MAX have a certain relationship: I think yes, but I also think
>> metadata responses should group them).
>>
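As a rough illustration of "atomic" parameters with "structured" names, here is a minimal Python sketch of how a machine could pair up *_MIN and *_MAX parameters purely from their names. The function name `group_ranges` and the sample query are assumptions made for this example; no protocol prescribes them.

```python
from collections import defaultdict


def group_ranges(params: dict[str, str]) -> dict[str, dict[str, float]]:
    """Group atomic NAME_MIN / NAME_MAX parameters into per-quantity ranges."""
    ranges: dict[str, dict[str, float]] = defaultdict(dict)
    for name, value in params.items():
        base, _, suffix = name.rpartition("_")
        if base and suffix in ("MIN", "MAX"):
            # Each value is a plain float literal, as the declared datatype says.
            ranges[base][suffix.lower()] = float(value)
    return dict(ranges)


# Hypothetical query: one closed range and one half-open range.
query = {"LOG_G_MIN": "3.5", "LOG_G_MAX": "4.5", "POS_RA_MIN": "10"}
print(group_ranges(query))
```

Every value here goes through the ordinary float parser, so the declared datatype is honest; a half-open range is simply a missing MIN or MAX.
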
>> For what we've seen as STC-S usages, I therefore suggest getting the
>> cutout region into the service using parameters like POS_RA_MIN,
>> POS_RA_MAX, POS_DEC_MIN, POS_DEC_MAX.  If a service insists, it can have
>> POS_FRAME and must then, in a metadata PARAM VALUES child (or equivalent
>> if you insist not to use VOTable), let the client know which frames it
>> understands (but ICRS always is a must outside of solar system studies).
>> If people really insist on oddly-shaped regions (I don't think that's a
>> good idea, incidentally), you could still say CIRCLE_CENTER_RA and
>> friends, and by writing things out like that, you at least get a feeling
>> for the amount of implementation work.  Again, clients get easy
>> discovery of the supported features for free.
>>
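A minimal Python sketch of what server-side handling of such a box cutout might look like, using the parameter names suggested above. `SUPPORTED_FRAMES` stands in for whatever the service declares in its metadata, and `parse_cutout` is a hypothetical helper; both are assumptions for illustration, not part of any specification.

```python
SUPPORTED_FRAMES = {"ICRS"}  # hypothetical: the frames the service declares in its metadata
REQUIRED = ("POS_RA_MIN", "POS_RA_MAX", "POS_DEC_MIN", "POS_DEC_MAX")


def parse_cutout(params: dict[str, str]) -> dict[str, float]:
    """Validate a box cutout request made of atomic float parameters."""
    frame = params.get("POS_FRAME", "ICRS")
    if frame not in SUPPORTED_FRAMES:
        raise ValueError(f"unsupported frame {frame!r}")
    missing = [name for name in REQUIRED if name not in params]
    if missing:
        raise ValueError(f"missing parameters: {missing}")
    box = {name: float(params[name]) for name in REQUIRED}
    if not -90 <= box["POS_DEC_MIN"] <= box["POS_DEC_MAX"] <= 90:
        raise ValueError("declination limits must satisfy -90 <= min <= max <= 90")
    # No min <= max check on RA: a box may wrap across RA = 0 (an assumption here).
    return box


print(parse_cutout({"POS_RA_MIN": "162", "POS_RA_MAX": "164",
                    "POS_DEC_MIN": "56.5", "POS_DEC_MAX": "58.5"}))
```

Since the accepted frames are an enumerated list, a client can reject an unsupported POS_FRAME before sending the request.
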
>> Several such parameter names should probably be predefined in DataLink,
>> to the extent of the subset of STC-S we'd be willing to support *in all
>> services*.  A funky service that can, say, apply proper motions, could
>> still add POS_EPOCH and give a sensible description in its metadata
>> response (or datalink document), and a client can at least validate user
>> input against that (and maybe even make out what that is from its UCD).
>>
>> The data model of those input parameters is fairly simple, so UCDs (and
>> possibly grouping) should do as metadata to allow clients semantically
>> sane and helpful user interfaces.  Or do the even righter thing and
>> write VO-DML, which would let you mark up where all your parameters are
>> in a data model (my take: overkill for this purpose, mainly because most
>> of the stuff that's actually requiring proper descriptions will probably
>> happen outside of the data model).
>>
>>
>>
>> Part II: What about STC-S then?
>>
>> There are two uses of STC-S in DaCHS (GAVO's data center software,
>> http://soft.g-vo.org) I actually like:
>>
>> (1) STC coverage (resource profile) for registry purposes.  A resource
>> description could thus say something like:
>>
>> <meta name="coverage.profile">
>>      TimeInterval TT 1995-06-03T10:30:48 1998-01-12T01:41:56
>>      Circle ICRS 163 57.5 1
>>      SpectralInterval TOPOCENTER 1.318 1.446 unit MHz
>> </meta>
>>
>> This stuff is then turned into STC-X when resource records are
>> requested, which works fairly well.  Even there, STC-S is, really, much
>> too powerful, though, since the registries at the other end (would) have
>> to do something with this metadata.  Let's ignore for a second all the
>> strife about spatial specifications: If you're a registry and you harvest
>> the STC-X equivalent of "SpectralInterval TOPOCENTER 1.318 1.446 unit
>> MHz" -- what do you do with it?  To make this kind of thing useful,
>> you'd need to put it into a table next to, maybe, "SpectralInterval
>> PLUTO 1 2 unit m".  Requiring the registries to perform the magic
>> required to bring the two specifications to a common reference position
>> (which, incidentally, is advanced divination in this case since the
>> registry has no way of knowing what TOPOCENTER really refers to) is an
>> invitation to continue the current state (almost all searchable
>> registries have no STC support apart from waveband).
>>
>> Still, the registry could define a subset of "permitted features" of STC
>> (only ICRS, only BARYCENTER refpos if people care about reference
>> positions at all, only Union and PositionInterval allowed, etc.), and
>> STC-S would still be useful to input the data.
>>
>> (2) Defining STC metadata
>>
>> For this, I've made an extension to STC-S that allows column references.
>> Then, in the metadata declaration, you say something like
>>
>> <stc>
>>        Time TT "Date"
>>        Position ICRS CART3 Epoch J2010 "alpha" "delta" "distance"
>>        Velocity "mualpha" "mudelta" "radialvelocity"
>>        Redshift OPTICAL "z"
>> </stc>
>>
>> or (this is for SSAP):
>>
>> <stc>
>>        Time TT "ssa_dateObs" Size "ssa_timeExt"
>>        Position ICRS [ssa_location] Size "ssa_aperture" "ssa_aperture"
>>        SpectralInterval "ssa_specstart" "ssa_specend"
>>          Spectral "ssa_specmid" Size "ssa_specext"
>> </stc>
>>
>>
>> These then get translated into VOTable STC declarations
>> (http://www.ivoa.net/Documents/Notes/VOTableSTC/) -- and here, I'd say
>> we can be generous with the features.  On the client side, it's much
>> easier to communicate "I don't understand that particular feature of the
>> metadata description" or just "Here's what STC metadata I have -- now,
>> dear astronomer, make sense of that yourself".  Indeed, I had to
>> extend my "private" STC-S with the concepts of epoch and planetary
>> ephemeris, and to allow automatic error estimates I'd yet need the
>> concept of the mean epoch.
>>
>> So -- when all you want is a structured description that is basically
>> directed at a scientist, STC's wealth of features is just fine (I'd even
>> advocate some additions).  But note again that the recipient here is not
>> (really) a program, it's a human that can decide what to do and how much
>> effort should go into bringing some data together.
>>
>>
>> My conclusion: Whenever you actually deal with STC instances, and you're
>> ready to do so (taking into account that nobody so far can do fancy
>> computations with a significant subset of it), STC-S has a place as a
>> convenient language to input them (as opposed to, e.g., STC-X or their
>> VOTable serialization, both of which you *really* don't want to type).
>> This -- and not the use in protocols, for which full STC is far too
>> heavyweight and prescribing systems, units, and such makes much more
>> sense -- is the niche I see for STC-S.
>>
>>
>> While I'm speaking, I've not been too happy with the combination of
>> positional and keyword+positional elements in current STC-S (Quick:
>> which of the following two STC-S specifications is valid (only one
>> answer possible):
>>
>> (1) Position ICRS unit m pixsize 1 2
>> (2) Position ICRS pixsize 1 2 unit m
>> )
>>
>> I'd therefore like to suggest that we should relax some of those
>> constraints and probably move everything to keyword/value except what's
>> already been used in actual protocols and clients; I'd expect that's
>> only the reference system, so we'd be fine as long as stuff like
>>
>> Box ICRS 1 2 3 4
>>
>> would remain valid STC-S.
>>
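To make the relaxed keyword/value idea concrete, here is a minimal Python sketch of a parser that accepts both orderings from the quiz above and still reads "Box ICRS 1 2 3 4" positionally. It is not the STC-S grammar: the `KEYWORDS` set and the function `parse_phrase` are assumptions made purely for illustration.

```python
KEYWORDS = {"unit", "pixsize"}  # hypothetical keyword set, for illustration only


def parse_phrase(text: str) -> dict:
    """Parse '<type> <frame> [coords...] [keyword values...]...' in any keyword order."""
    tokens = text.split()
    if len(tokens) < 2:
        raise ValueError("need at least a phrase type and a frame")
    result = {"type": tokens[0], "frame": tokens[1], "coords": []}
    current = "coords"  # bare values before the first keyword are positional
    for token in tokens[2:]:
        if token in KEYWORDS:
            if token in result:
                raise ValueError(f"duplicate keyword {token!r}")
            current = token
            result[current] = []
        else:
            result[current].append(token)
    return result


a = parse_phrase("Position ICRS unit m pixsize 1 2")
b = parse_phrase("Position ICRS pixsize 1 2 unit m")
print(a == b)
print(parse_phrase("Box ICRS 1 2 3 4")["coords"])
```

With only the phrase type and the reference system kept positional, both quiz variants parse to the same structure.
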
>>
>> And here's now my offer: I'd write up EBNF and accompanying prose for
>> something that's "pretty much" like STC-S according to the current note,
>> leaving existing usages of STC-S intact and simplifying the remaining
>> rules to, e.g., allow both (1) and (2).  I'd have it ready for Hawaii,
>> complete with an implementation that at least can move the stuff to
>> STC-X and VOTable utypes.
>>
>> Both encouragement and, erm, well, let's say discouragement are welcome
>> (since this definitely would not be a standard I'd enjoy writing, I'd
>> actually appreciate the latter a bit more...).
>>
>> Cheers,
>>
>>        Markus
>>
>