STC-S (with a view to DataLink)

François Bonnarel francois.bonnarel at astro.unistra.fr
Wed Jul 3 08:49:18 PDT 2013


Hi all,
     Here are my thoughts on this topic.

A) First of all, and again, we are not speaking of DataLink. DataLink 
is a set of rules to describe and access resources associated with 
"already discovered" datasets. Some of these resources may be other DAL 
services requiring coordinate or region constraints, hence the 
confusion. But it is the job of these services to define how these 
constraints have to be expressed.

B) Scope of the STC-S versus MIN/MAX parameters discussion:
    Where: DAL services using some kind of PQL.
    Which ones: the new ones, because the old ones (SIA1, SSA) already 
have their own parameters defined.
    Which use cases:
        Data discovery: what we have to describe is a ROI.
        Data access (selection / cutout ...): it is no longer a ROI but 
something like "coverage forcing", by copying the subset of pixels in 
some region.
        Regridding: forcing the coverage as well as the orientation and 
sampling. Data are transformed. Maybe not for today!

For ROI, what we have now is what Markus has described:
    1) POS =    SIZE =    , BAND =       , TIME = .... etc ..., but with 
the datatype issue he highlighted.
Markus proposes (and Mark agrees, if I understand correctly):
    2) RA_MIN = ... RA_MAX = ... DEC_MIN = ... DEC_MAX = ... WL_MIN = 
WL_MAX = TIME_MIN = TIME_MAX = (polarimetry and flux could be added). 
What we have is intervals on all the axes.
    3) And the STC-S solution would be ROI = "STC-S string" or 
CUTOUT-coverage = "STC-S string".
    Suppose we limit the scope of these services to a small subset of 
predefined coordinate systems.
In that case we can avoid most of the difficulties Markus is highlighting.
    The main difference is that, whatever the number of axes we want to 
use for a selection, we may have a unified syntax with the STC-S 
solution. But as long as we restrict ourselves to intervals, these are 
just two equivalent ways of writing the same thing. In one case the 
semantics are in the parameter name, in the other case they are included 
in the parameter value.
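
To make the equivalence concrete, here is a minimal Python sketch that 
encodes the same spatial intervals in style 2) and in style 3). The 
values, the helper names and the ROI parameter are only illustrative. 
The layout of the PositionInterval string (lower limits first, then 
upper limits) is my assumption about what a restricted STC-S subset 
would look like and should be checked against the STC-S note.

```python
from urllib.parse import urlencode

# Hypothetical region of interest: one (min, max) interval per axis, in degrees.
roi = {"RA": (10.0, 11.0), "DEC": (20.0, 21.0)}


def minmax_params(intervals):
    """Style 2: the semantics are carried by the parameter names."""
    params = {}
    for axis, (lo, hi) in intervals.items():
        params[f"{axis}_MIN"] = lo
        params[f"{axis}_MAX"] = hi
    return params


def stcs_params(intervals, frame="ICRS"):
    """Style 3: the semantics are carried by the parameter value.

    Assumed layout: PositionInterval <frame> <lo1> <lo2> <hi1> <hi2>.
    """
    (ra_lo, ra_hi), (dec_lo, dec_hi) = intervals["RA"], intervals["DEC"]
    region = f"PositionInterval {frame} {ra_lo} {dec_lo} {ra_hi} {dec_hi}"
    return {"ROI": region}


minmax_query = urlencode(minmax_params(roi))
stcs_query = urlencode(stcs_params(roi))
print(minmax_query)
print(stcs_query)
```

Both query strings carry the same four numbers; only the place where 
the axis meaning is stored differs.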

     The real use case where there is a difference (as asked by Mark) is 
when the ROI (for data discovery) is retrieved from somewhere else, as 
the coverage of .... so and so !!
Then it may be something other than intervals (polygons, coupled axes, 
sets of intervals etc...) and then it makes sense to use STC-S. That 
makes a genuinely different query.
      Mark's objection in that case would be: take the smallest 
interval including this ROI, perform the discovery query in model 1) 
or 2), and do the selection at the client level. The STC-S solution, 
by contrast, allows reusing the previously retrieved region directly in 
the discovery query.
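
Mark's client-side alternative can be sketched as follows in Python. 
This is only an illustration under strong assumptions: the polygon is 
treated as planar (no spherical geometry, no RA wrap-around at 0/360), 
the positions "returned by the service" are invented, and the helper 
names are mine.

```python
def bounding_intervals(polygon):
    """Smallest RA/Dec intervals enclosing the polygon vertices."""
    ras = [ra for ra, _ in polygon]
    decs = [dec for _, dec in polygon]
    return {
        "RA_MIN": min(ras), "RA_MAX": max(ras),
        "DEC_MIN": min(decs), "DEC_MAX": max(decs),
    }


def inside(point, polygon):
    """Planar even-odd (ray casting) point-in-polygon test."""
    x, y = point
    hit = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                hit = not hit
    return hit


# ROI retrieved from elsewhere: a triangle given as (RA, Dec) vertices.
roi = [(10.0, 20.0), (12.0, 20.0), (10.0, 22.0)]

# Step 1: the client sends only the bounding intervals to the service.
box = bounding_intervals(roi)

# Step 2: hypothetical positions returned by the service for that box.
returned = [(10.5, 20.5), (11.8, 21.8), (11.0, 20.2)]

# Step 3: the client throws away what falls outside the real ROI.
kept = [p for p in returned if inside(p, roi)]
print(box)
print(kept)
```

The second returned position lies inside the box but outside the 
triangle, so it is transferred and then discarded. With the STC-S 
solution that filtering would happen on the service side.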

      For cutouts, the CUTOUT-coverage presents the same alternative. 
If we have, let's say, the coverage of a catalog as a polygon plus time 
and spectral intervals, we may want to ask a cube accessdata service for 
a subcube matching this region. Either the client computes the regular 
bounding box itself, or we just transfer the STC-S string to the 
accessdata service, which will do it.

    In my opinion the STC-S solution is a little richer and may be easy 
to implement as long as we limit it reasonably.

Best regards
François

PS: I am aware that this discussion omits the fact that coverages will 
in many cases be expressed as MOCs, more and more so in the future. But 
that's another story.

On 24/06/2013 17:53, Mark Taylor wrote:
> Hi Markus et al.,
>
> On Mon, 24 Jun 2013, Markus Demleitner wrote:
>
>> Dear DAL list,
>>
>> For those just coming in or wondering what the fuss is about, see a
>> little example close to the bottom of this mail.
>>
>> The fact that nobody spoke out in favour of atomic parameters so far
>> is quite a heavy downpour on my parade, not to speak of my thunder
> >> the unexplained disappearance of which I regret.
>>
>> Still, since I believe this is an important choice and I'm really
>> worried by the SSAP precedent, I'll try again once more, and again
>> with a diatribe bespeaking my secret love for the  humanities, at
>> least through its length.  If then, still, nobody shows signs of
>> starting to agree with me, I'll shut up, ok?
> I have not really been following DataLink, but since Markus has
> asked so nicely for opinions on this I will give mine, to the
> extent that I have some.
>
>
> 1. I agree that as far as I can see STC-S makes more sense as a
> write-only format than as a language for communicating coordinate
> information between machines who want to do something non-trivial
> with it.  If what you want to do is to record
> some coordinate metadata in a well-defined way, I'm prepared to
> believe that STC does a good job (I don't put it any stronger than that
> since I don't have a good understanding of the underlying science).
> But as Markus says, it's clearly not possible to write a component
> which will take an arbitrary STC specification and turn it
> into some usable coordinates (PLUTO reference positions etc).
> So if STC-S text is going to be used as a value passed as a
> protocol parameter, either there has to be careful thought given
> to how it's going to be restricted, or it's going to be guesswork
> for a given client/user whether a particular STC-S string will
> make any sense to the service.  It's not a good starting point.
> If people are keen to do it this way, I think the following
> questions have to be considered explicitly:
>
>     - What subset of STC-S would be permitted?
>     - Does such an STC-S subset do a better job (in terms of how easy
>       it is to handle by clients/servers, and how expressive it is)
>       than much simpler options like _MIN/_MAX?
>
>
> 2. I strongly believe that keeping protocols simple is a very good
> thing.  Paying for less complexity in the protocol by having to do
> more work in the software using it (on either or both client or
> server side) is nearly always a bargain worth making (my main
> justification for this is that client and server implementations
> can be, and often are, changed when it becomes clear they are
> not working well, but standard protocols stick around for years
> to plague you; I don't think I need to provide examples).
> If the protocol really needs expressiveness which can't be obtained
> by making it simple, and that can be justified, well so be it,
> but don't put stuff in just because it looks neat.  Of course that
> in itself leaves open the question of what counts as a simple
> protocol - but to me something with a bunch of _MAX/_MINs, while
> messy, is more comprehensible than essentially pulling in
> a standard as involved (and, indeed, currently non-existent)
> as STC-S with some as yet unspecified list of appropriate
> restrictions.
>
> To be a bit more concrete, here is a way you might trade complexity
> in the protocol for complexity in the implementation.  Suppose you
> care about enabling the capability for a software tool to acquire a
> weird-shaped cutout of something or other.  Here are a couple of
> ways you might go about that:
>
>     A: Define the protocol to accept STC-S-with-weird-shapes capability.
>        When the client wants a weird shape, it assembles the appropriate
>        STC-S string, passes it to the service, and gets the cutout back.
>        Job done.  There needs to be thought given up front to what
>        subset of STC-S is permitted (how weird is allowed).
>        Implementing the services is then pretty hard because they all
>        have to understand STC weird shapes.  However the client's job
>        is easy (well, maybe; is that STC-S string generated by input
>        from a well-educated user, or from elsewhere?  Are the services
>        actually all going to implement it right or will some just fail?)
>
>     B: Define the protocol to take a bunch of dumb MIN/MAX pairs.
>        When the client wants a weird shape it acquires some understanding
>        of the weird shape, translates it into an N-dimensional
>        rectangular bounding box, sends the corresponding query to the
>        service, gets the result and then manipulates it locally to
>        turn it into the requested shape for presentation to the user.
>        Service implementation is easy (hence probably done right)
>        but the client may have to do some work and probably ends
>        up throwing away some of the data it gets from the service.
>
> I vote B.  Reasons include (i) clients don't have a language imposed on
> them for thinking about weird shape geometry (and services don't have
> to think about it at all) and (ii) I bet most clients don't even want
> to do weird shapes so the hard work of A is wasted most of the time.
> I would only vote A if I was persuaded that weird shapes are a common
> use case *AND* that you need to ship a lot more bytes over the wire
> to transfer a rectangular box than for common weird shapes.
>
> You could also do:
>
>     C: Define the protocol to take a very restricted STC-S that
>        just lets you specify RA and Dec (or something a bit more
>        complicated) so it's easy for services to implement.
>
> But is there really a point to that - does it buy you something that
> you can't get from MIN/MAX pairs without (even restricted) reference
> to a big complicated standard?  Well, maybe, but I'd like to see
> use cases.
>
>
> The two points above I consider to be broadly in support of Markus's
> original message.
>
>
> 3. Markus said this:
>
>> On Declaring Protocol Parameters
>> ================================
>>
>> Knowing full well I'm sounding like a broken record: This is what all
>> this is really about.  We *must* define our services such that the
>> knowledge of the protocol together with whatever service metadata we
>> specify lets a (machine) client discover how valid requests to the
>> service are constructed (i.e., in particular what parameters are
>> supported and what literals are expected in each parameter).  Bonus
>> points if the client can suggest values that actually return values
>> to the user to alleviate the horror vacui in front of an interface
> I'll go with the bonus points part, but I actually disagree with
> the main assertion.  I am not persuaded that it's necessary to
> define a language as part of DataLink or other similar protocols
> to describe to machines what counts as a valid query as regards
> custom parameters.  For custom parameters (ones for which the
> semantics are not specified in the standard) it will usually be
> a human entering the value, so attaching human-readable metadata
> that conveys this information would do the job at least as well.
> That is, I would favour
>
>     <PARAM name="INPUT:REGION">
>       <DESCRIPTION>
>         Region of query; use STC-S v1.33 as described in
>         http://www.ivoa.net/documents/Notes/STC-S/
>         but it's only for regions on the sky, Convexes are not supported,
>         and for goodness sake don't specify a reference position
>         on one of the outer planets.
>       </DESCRIPTION>
>     </PARAM>
>
> over
>
>     <PARAM name="INPUT:REGION" datatype="char">
>       <PARAM name="param-type" value="STC-S"/>
>       <PARAM name="stc:version" value="1.33"/>
>       <PARAM name="stc:shapes" value="PositionInterval,AllSky,Circle,Ellipse,Box,Polygon"/>
>       <PARAM name="stc:axes" value="space"/>
>       <PARAM name="stc:refpos" value="GEOCENTER,BARYCENTER,HELIOCENTER,TOPOCENTER,GALACTIC_CENTER,EMBARYCENTER"/>
>       ...
>     </PARAM>
>
> yes there's all sorts of things wrong with the way I've written that
> example but you get the idea.  The more complicated the
> syntax-specification language is, the harder it is for the machine
> client to (a) understand it and (b) exploit it to inform the user
> in a comprehensible fashion.  They probably won't bother.
>
> For cases where it's straightforward to specify in the protocol
> how to give the user hints about filling in parameter value
> (a list of options is a good example) sure let's do it, but I don't
> think it's incumbent on a standard, or even desirable, to provide a
> language which can describe everything about the allowable values
> in a machine-readable form.  It won't be powerful enough to do
> it properly in any case (e.g. to disallow combinations of parameters
> that don't make sense, cf. over-reliance on XML Schema).
> So, I'm not even sure that the VALUES/{MIN,MAX} elements are
> that much use.
>
>
> Markus also said:
>
>> My take on this: if there's a strong use case requiring those complex
>> types, then it should carry for adding them to VOTable, too; if it's too
>> weak for that, then maybe they shouldn't be in the protocols in the
>> first place.
> Erk!  I'm hoping that this is just by way of reductio ad absurdum
> rather than a genuine suggestion.  If it begins to look otherwise
> I will certainly have things to say about it.
>
> >> Note, however, that all the proposed new types (intervals, geometries)
>> would also require extensions to the VOTable VALUES element, e.g.,
>> because, being isomorphic to the R^n, none of them is (meaningfully)
>> orderable, and hence MIN and MAX aren't terribly useful.  Of course,
>> the original sin has been committed there already since we have
>> arrays and complex numbers, for which MIN and MAX aren't well-defined
>> either.
> Related to this, I didn't realise until skimming the SSA document
> just now that VOTable PARAM elements are used to specify non-standard
> parameters in SSA, and I don't know whether it's been decided to
> do that in DataLink or whether that's still up for grabs.
>
> For my money that looks pretty questionable, and possibly based on
> a misunderstanding of the element name "PARAM".  I understood
> PARAM in VOTable to be a contraction of the term "table parameter",
> i.e. an item of per-table metadata, rather than anything to do
> with the parameters in the sense of RPC which tell a service what
> it's supposed to be doing.  Of course the usages are not completely
> unconnected, but the VOTable PARAM element certainly wasn't designed
> for specifying service parameters, and if its capabilities don't
> match the requirements for doing that I'm not very surprised.
>
> Yours not necessarily committing to further deep engagement in DataLink,
>
> Mark
>
> --
> Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
> m.b.taylor at bris.ac.uk  +44-117-9288776  http://www.star.bris.ac.uk/~mbt/



