STC-S (with a view to DataLink)

Mon Jun 24 08:53:50 PDT 2013

Hi Markus et al.,

On Mon, 24 Jun 2013, Markus Demleitner wrote:

> Dear DAL list,
> 
> For those just coming in or wondering what the fuss is about, see a
> little example close to the bottom of this mail.
> 
> The fact that nobody spoke out in favour of atomic parameters so far
> is quite a heavy downpour on my parade, not to speak of my thunder
> the unexplained disappearance of which I regret.
> 
> Still, since I believe this is an important choice and I'm really
> worried by the SSAP precedent, I'll try again once more, and again
> with a diatribe bespeaking my secret love for the humanities, at
> least through its length.  If then, still, nobody shows signs of
> starting to agree with me, I'll shut up, ok?

I have not really been following DataLink, but since Markus has
asked so nicely for opinions on this I will give mine, to the
extent that I have some.

1. I agree that, as far as I can see, STC-S makes more sense as a
write-only format than as a language for communicating coordinate
information between machines that want to do something non-trivial
with it.  If what you want to do is to record some coordinate
metadata in a well-defined way, I'm prepared to believe that STC
does a good job (I don't put it any more strongly than that, since
I don't have a good understanding of the underlying science).
But as Markus says, it's clearly not possible to write a component
which will take an arbitrary STC specification and turn it
into some usable coordinates (PLUTO reference positions etc).
So if STC-S text is going to be used as a value passed as a
protocol parameter, either there has to be careful thought given
to how it's going to be restricted, or it's going to be guesswork
for a given client/user whether a particular STC-S string will
make any sense to the service.  It's not a good starting point.
If people are keen to do it this way, I think the following
questions have to be considered explicitly:

   - What subset of STC-S would be permitted?
   - Does such an STC-S subset do a better job (in terms of how easy
     it is to handle by clients/servers, and how expressive it is)
     than much simpler options like _MIN/_MAX?

2. I strongly believe that keeping protocols simple is a very good
thing.  Paying for less complexity in the protocol by having to do
more work in the software using it (on the client side, the server
side, or both) is nearly always a bargain worth making.  My main
justification for this is that client and server implementations
can be, and often are, changed when it becomes clear they are
not working well, but standard protocols stick around for years
to plague you; I don't think I need to provide examples.
If the protocol really needs expressiveness that a simple design
can't provide, and that can be justified, well, so be it,
but don't put stuff in just because it looks neat.  Of course that
in itself leaves open the question of what counts as a simple
protocol - but to me something with a bunch of _MAX/_MINs, while
messy, is more comprehensible than essentially pulling in
a standard as involved (and, indeed, currently non-existent)
as STC-S with some as yet unspecified list of appropriate
restrictions.

To be a bit more concrete, here is a way you might trade complexity
in the protocol for complexity in the implementation.  Suppose you
care about enabling the capability for a software tool to acquire a
weird-shaped cutout of something or other.  Here are a couple of
ways you might go about that:

   A: Define the protocol to accept STC-S-with-weird-shapes capability.
      When the client wants a weird shape, it assembles the appropriate
      STC-S string, passes it to the service, and gets the cutout back.
      Job done.  There needs to be thought given up front to what
      subset of STC-S is permitted (how weird is allowed).
      Implementing the services is then pretty hard because they all
      have to understand STC weird shapes.  However the client's job
      is easy (well, maybe; is that STC-S string generated by input
      from a well-educated user, or from elsewhere?  Are the services
      actually all going to implement it right or will some just fail?)

   B: Define the protocol to take a bunch of dumb MIN/MAX pairs.
      When the client wants a weird shape it acquires some understanding
      of the weird shape, translates it into an N-dimensional
      rectangular bounding box, sends the corresponding query to the
      service, gets the result and then manipulates it locally to
      turn it into the requested shape for presentation to the user.
      Service implementation is easy (hence probably done right)
      but the client may have to do some work and probably ends
      up throwing away some of the data it gets from the service.

I vote B.  Reasons include (i) clients don't have a language imposed on
them for thinking about weird shape geometry (and services don't have
to think about it at all) and (ii) I bet most clients don't even want
to do weird shapes so the hard work of A is wasted most of the time.
I would only vote A if I were persuaded that weird shapes are a common
use case *AND* that you need to ship a lot more bytes over the wire
to transfer a rectangular bounding box than for common weird shapes.
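
As a rough illustration of what option B asks of the client, here is a
minimal Python sketch.  The parameter names RA_MIN/RA_MAX/DEC_MIN/DEC_MAX
and the service URL are hypothetical, not taken from any standard, and
the geometry is deliberately naive (flat, with no RA wrap-around or
spherical effects).  The client turns its weird shape into a bounding
box for the query and then clips the returned positions locally.

```python
from urllib.parse import urlencode


def bounding_box(polygon):
    """Return (ra_min, ra_max, dec_min, dec_max) for (ra, dec) vertices."""
    ras = [ra for ra, _ in polygon]
    decs = [dec for _, dec in polygon]
    return min(ras), max(ras), min(decs), max(decs)


def box_query(base_url, polygon):
    """Build a query URL using hypothetical MIN/MAX parameter names."""
    ra_min, ra_max, dec_min, dec_max = bounding_box(polygon)
    params = {
        "RA_MIN": ra_min,
        "RA_MAX": ra_max,
        "DEC_MIN": dec_min,
        "DEC_MAX": dec_max,
    }
    return base_url + "?" + urlencode(params)


def inside(polygon, ra, dec):
    """Even-odd ray-casting test, treating (ra, dec) as flat coordinates."""
    hit = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > dec) != (y2 > dec):
            x_cross = x1 + (dec - y1) * (x2 - x1) / (y2 - y1)
            if ra < x_cross:
                hit = not hit
    return hit


def clip(rows, polygon):
    """Keep only the (ra, dec) rows that fall inside the requested shape."""
    return [(ra, dec) for ra, dec in rows if inside(polygon, ra, dec)]
```

The rows discarded by clip() are exactly the data that B "throws away";
whether that cost matters depends on how much bigger the box is than
the shape.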

You could also do:

   C: Define the protocol to take a very restricted STC-S that
      just lets you specify RA and Dec (or something a bit more
      complicated) so it's easy for services to implement.

But is there really a point to that - does it buy you something that
you can't get from MIN/MAX pairs without (even restricted) reference
to a big complicated standard?  Well, maybe, but I'd like to see
use cases.

The two points above I consider to be broadly in support of Markus's
original message.

3. Markus said this:

> On Declaring Protocol Parameters
> ================================
> 
> Knowing full well I'm sounding like a broken record: This is what all
> this is really about.  We *must* define our services such that the
> knowledge of the protocol together with whatever service metadata we
> specify lets a (machine) client discover how valid requests to the
> service are constructed (i.e., in particular what parameters are
> supported and what literals are expected in each parameter).  Bonus
> points if the client can suggest values that actually return values
> to the user to alleviate the horror vacui in front of an interface

I'll go with the bonus points part, but I actually disagree with 
the main assertion.  I am not persuaded that it's necessary to
define a language as part of DataLink or other similar protocols
to describe to machines what counts as a valid query as regards
custom parameters.  For custom parameters (ones for which the
semantics are not specified in the standard) it will usually be
a human entering the value, so attaching human-readable metadata
that conveys this information would do the job at least as well.
That is, I would favour

   <PARAM name="INPUT:REGION">
     <DESCRIPTION>
       Region of query; use STC-S v1.33 as described in 
       http://www.ivoa.net/documents/Notes/STC-S/
       but it's only for regions on the sky, Convexes are not supported,
       and for goodness sake don't specify a reference position
       on one of the outer planets.
     </DESCRIPTION>
   </PARAM>

over

   <PARAM name="INPUT:REGION" datatype="char">
     <PARAM name="param-type" value="STC-S"/>
     <PARAM name="stc:version" value="1.33"/>
     <PARAM name="stc:shapes" value="PositionInterval,AllSky,Circle,Ellipse,Box,Polygon"/>
     <PARAM name="stc:axes" value="space"/>
     <PARAM name="stc:refpos" value="GEOCENTER,BARYCENTER,HELIOCENTER,TOPOCENTER,GALACTIC_CENTER,EMBARYCENTER"/>
     ...
   </PARAM>

Yes, there are all sorts of things wrong with the way I've written that
example, but you get the idea.  The more complicated the
syntax-specification language is, the harder it is for the machine
client to (a) understand it and (b) exploit it to inform the user
in a comprehensible fashion.  They probably won't bother.
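
For comparison, a client that relies on the human-readable route has
very little to do.  The following is only a sketch, assuming an
un-namespaced PARAM fragment like the first example above (real VOTable
documents carry an XML namespace, and the description text here is
abbreviated); the function name prompt_for is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only; not taken from any real service.
REGION_PARAM_XML = """
<PARAM name="INPUT:REGION" datatype="char" arraysize="*" value="">
  <DESCRIPTION>
    Region of query; sky regions only,
    Convexes are not supported.
  </DESCRIPTION>
</PARAM>
"""


def prompt_for(param_xml):
    """Turn a PARAM's DESCRIPTION into a one-line prompt for the user."""
    param = ET.fromstring(param_xml)
    desc = param.findtext("DESCRIPTION", default="")
    # Collapse the layout whitespace of the XML into single spaces.
    text = " ".join(desc.split())
    return f"{param.get('name')}: {text}"
```

The client never has to interpret the restrictions itself; it just
passes them on to the person typing the value.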

For cases where it's straightforward to specify in the protocol
how to give the user hints about filling in a parameter value
(a list of options is a good example), sure, let's do it, but I don't
think it's incumbent on a standard, or even desirable, to provide a
language which can describe everything about the allowable values
in a machine-readable form.  It won't be powerful enough to do
it properly in any case (e.g. to disallow combinations of parameters
that don't make sense, cf. over-reliance on XML Schema).
So, I'm not even sure that the VALUES/{MIN,MAX} elements are
that much use.
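
The list-of-options case is the kind of hint that is cheap for a client
to use.  A minimal sketch, assuming the options are declared with
VOTable's VALUES/OPTION elements on an un-namespaced PARAM; the
parameter name INPUT:BAND and its values are invented for this example.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only; the parameter and its options are made up.
BAND_PARAM_XML = """
<PARAM name="INPUT:BAND" datatype="char" arraysize="*" value="">
  <VALUES>
    <OPTION value="J"/>
    <OPTION value="H"/>
    <OPTION value="K"/>
  </VALUES>
</PARAM>
"""


def options_for(param_xml):
    """Return the declared option values, in document order."""
    param = ET.fromstring(param_xml)
    return [opt.get("value") for opt in param.findall("VALUES/OPTION")]


def is_allowed(param_xml, value):
    """Accept any value when no options are declared, else only listed ones."""
    opts = options_for(param_xml)
    return not opts or value in opts
```

A GUI client could feed options_for() straight into a drop-down list,
which is about as much machine-readable hinting as I'd expect most
clients to bother with.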

Markus also said:

> My take on this: if there's a strong use case requiring those complex
> types, then it should carry for adding them to VOTable, too; if it's too
> weak for that, then maybe they shouldn't be in the protocols in the
> first place.

Erk!  I'm hoping that this is just by way of reductio ad absurdum
rather than a genuine suggestion.  If it begins to look otherwise
I will certainly have things to say about it.

> Note, however, that all the proposed new types (intervals, geometries)
> would also require extensions to the VOTable VALUES element, e.g.,
> because, being isomorphic to the R^n, none of them is (meaningfully)
> orderable, and hence MIN and MAX aren't terribly useful.  Of course,
> the original sin has been committed there already since we have
> arrays and complex numbers, for which MIN and MAX aren't well-defined
> either.

Related to this, I didn't realise until skimming the SSA document 
just now that VOTable PARAM elements are used to specify non-standard
parameters in SSA, and I don't know whether it's been decided to
do that in DataLink or whether that's still up for grabs.

For my money that looks pretty questionable, and possibly based on
a misunderstanding of the element name "PARAM".  I understood
PARAM in VOTable to be a contraction of the term "table parameter",
i.e. an item of per-table metadata, rather than anything to do
with the parameters in the sense of RPC which tell a service what
it's supposed to be doing.  Of course the usages are not completely
unconnected, but the VOTable PARAM element certainly wasn't designed
for specifying service parameters, and if its capabilities don't
match the requirements for doing that I'm not very surprised.

Yours not necessarily committing to further deep engagement in DataLink,

Mark

--
Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/