STC-S (with a view to DataLink)

François Bonnarel francois.bonnarel at astro.unistra.fr
Wed Jun 26 01:15:17 PDT 2013


Hi Markus, all
On 24/06/2013 11:57, Markus Demleitner wrote:
> Dear DAL list,
>
> For those just coming in or wondering what the fuss is about, see a
> little example close to the bottom of this mail.
>
> The fact that nobody spoke out in favour of atomic parameters so far
> is quite a heavy downpour on my parade, not to speak of my thunder,
> the unexplained disappearance of which I regret.
>
> Still, since I believe this is an important choice and I'm really
> worried by the SSAP precedent, I'll try again once more, and again
> with a diatribe bespeaking my secret love for the humanities, at
> least through its length.  If then, still, nobody shows signs of
> starting to agree with me, I'll shut up, ok?
>
> (All quotes from mails that went over the DAL list in the last few
> days)
>
> So:
>
> What's this about?
> ==================
>
> (those wanting to look at a concrete example, see below)
>
> François said:
>
>> I would say it's not "STC-S in Datalink" but something like "STC-S
>> in cutout services and access data methods".  These kinds of services
>> and methods will be part of the resources Datalink will attach to
>> Dataproducts indeed, but according to the discussion during the
>> Heidelberg interop last month, the Datalink protocol in itself only
>> describes the nature, format, type and semantics or descriptions
>> of the links and will say nothing about the resource parameters
>> themselves
> Hm -- was that the agreement (I had to be largely in another session,
> sorry)?  If so, I'd find that regrettable, since if I don't know the
> parameters a service takes the link to it isn't terribly useful, is
> it?
Clearly the agreement was that DataLink will describe three types of
links, with a couple of FIELDs including an acref one:
     - Fixed URLs. No STC-S syntax has to be defined for these, though
STC-S (or atomic parameters) may happen to be embedded in the URL.
DataLink itself says nothing about that.
     - Freely defined services. This is where a self-description of the
parameters has to be given. Laurent proposed a mechanism in the note he
wrote with Mireille and me. It was presented in the DAL session,
but there was no time to really discuss it. Because these
services are free-form, you can have atomic parameters there, but that
is not defined by the IVOA.
     - IVOA services. This is where the IVOA has something to say, and
this will include how to define Positions and Regions and how to refer
to CoordSystems in the query parameters. But it is the job of these
protocols to define this, not the job of DataLink itself.
> Anyway, I seem to remember some session that did contain talk about
> transmitting parameter metadata, and half the point of this whole
> thing is pointing out that we don't know how to do that for
> STC-S-valued parameters.  And we'll need to do that, whether in the
> immediate DataLink response or in a secondary service response; my
> worries aren't really affected by that location.
OK, the location would be the cutout service, an SIA2 service, maybe a
new SSA, PQL-OBSCore, etc.
Best regards
François
>
> Why people don't like structured parameters
> ===========================================
>
> Well, from the answers, frankly, I've not been able to pull many
> arguments against what I called "structured parameters", i.e., X_MIN
> and X_MAX for intervals (and possibly more of this type).
>
> In terms of concrete criticism, Pat offered:
>
>> features)!! I do not agree at all with the idea of trying to do everything
>> with primitive datatypes and hordes of parameters.
> Well, at some level you'll have to have those parameters anyway --
> somewhere in your code there'll be "5th component of center of
> sphere".  The question is: Do you, for your protocol, define a
> special serialization (on top of what HTTP/VOTable already give you)
> for combinations/groups of those parameters or don't you?
>
> Of course, there's some value in abstraction, and being able to say
> "This is a 5-Sphere, where this is the center and that is the radius"
> *may* make things simpler -- though I have to say I doubt it's a big
> advantage on a protocol level, and I'd really like to see convincing
> use cases.
>
> By the way, it would of course be a sane way to *implement* a
> DataLink endpoint to de/serialize, say, rectangle objects directly
> from and to collections of HTTP parameters -- I just maintain that
> the serialization doesn't need to know about what a given software
> does with the message, and making it aware of it is a complication
> rather than a simplification.
>
> *If* we decide to define such types, let's not take that lightly lest
> we end up with ambiguous serializations and no ways to define
> capabilities, the domain of the parameter, etc.
>
> I have to confess I was alarmed when I read Pat saying:
>
>> beyond primitive integers and floats and strings. At CADC we have been
>> treating shapes (circles, polygons, etc) and intervals as real datatypes**
>> [...]
>> ** not advocating that we open VOTable up again and add data types
> *That* is exactly the problem.  Take a look at VODataService 1.1.
> There are already three type systems defined in there -- actually,
> it's even a hierarchy with, in total, six members:
>
> (DataType) -- SimpleDataType
>     `-- (TableDataType) --  VOTableType
>              `-- (TAPDataType) -- TAPType
>
> -- and that still doesn't reflect the duct-tape that is xtype.
>
> What Pat is suggesting here is, in essence, to add a fourth type,
> DataLinkParameterType, say.  After all, you'll have to declare the
> type of your parameters somewhere (and preferably somewhere in the
> Registry, too), so if we don't extend what we have, we'd have to
> invent something else, hence another type system.
>
> Please don't!
>
> My take on this: if there's a strong use case requiring those complex
> types, then it should carry for adding them to VOTable, too; if it's too
> weak for that, then maybe they shouldn't be in the protocols in the
> first place.
>
> Note, however, that all the proposed new types (intervals, geometries)
> would also require extensions to the VOTable VALUES element, e.g.,
> because, being isomorphic to the R^n, none of them is (meaningfully)
> orderable, and hence MIN and MAX aren't terribly useful.  Of course,
> the original sin has been committed there already since we have
> arrays and complex numbers, for which MIN and MAX aren't well-defined
> either.
>
>
> I'm sorry if I missed other counter-arguments -- if so, would you
> raise them again?
>
>
> Why People want STC-S
> =====================
>
> I *thought* the reason to want STC-S was to allow non-rectangular
> cutouts.  But both François and Doug appeared to imply they didn't
> actually want this; Doug wrote:
>
>> I do think STC-S is a viable way to express a multi-dimensional bounding
>> box or region, for discovery queries and simple cutouts expressed in
>> world coordinates, so long as we limit the complexity.  Just expressing
>> a range of values (or possibly simple region) in each coordinate axis is
>> simple enough.  This much would not be that hard to parse, and could
> Well, for *that* task we don't need to invent serialization/metadata
> declaration on top of what we already have -- _MIN/_MAX is enough and
> more generic since it easily allows axes that in STC-S would be hard
> to describe.
>
> Another argument might have been to allow more or less arbitrary
> reference frames (even positions?) in server input; I've always
> maintained coordinate transformation is either trivial (in which case
> it doesn't merit protocol support) or too hard to perform without
> knowing the science use case (in which case the server can't do it
> anyway).  And indeed, Arnold suggested:
>
>> If we allow STC-S strings to be used to provide the coordinate
>> metadata in a DAL protocol, that protocol's standard can very well
>> state that only ICRS and GALACTIC are allowed for the spatial
>> reference system and that these are required to be 2-D spherical.
> Is *that* worth the complexities of introducing a special
> serialization format?  *That* transformation can be written in two
> lines of awk.  And restricting to 2D spherical seems to severely
> limit what that can be used for anyway -- what would our theory
> people have to say about such a limitation?
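
For readers who want to see how small the ICRS-to-Galactic case
really is, here is a minimal Python sketch. It assumes the rotation
matrix published with the Hipparcos catalogue; the function name
icrs_to_galactic is only an illustration and not part of any IVOA
protocol. It covers just the 2-D spherical case discussed above.

```python
import math

# Rotation matrix equatorial (ICRS) -> Galactic, as published with the
# Hipparcos catalogue (assumed here); each row is one Galactic axis
# (x, y, z) expressed in ICRS Cartesian coordinates.
A_G = (
    (-0.0548755604, -0.8734370902, -0.4838350155),
    (+0.4941094279, -0.4448296300, +0.7469822445),
    (-0.8676661490, -0.1980763734, +0.4559837762),
)


def icrs_to_galactic(ra_deg, dec_deg):
    """Return Galactic (l, b) in degrees for an ICRS (ra, dec) in degrees."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    # Unit vector of the input direction in ICRS.
    v = (math.cos(dec) * math.cos(ra),
         math.cos(dec) * math.sin(ra),
         math.sin(dec))
    # Rotate into the Galactic frame.
    x, y, z = (sum(row[i] * v[i] for i in range(3)) for row in A_G)
    lon = math.degrees(math.atan2(y, x)) % 360.0
    # Clamp to guard against rounding slightly outside [-1, 1].
    lat = math.degrees(math.asin(max(-1.0, min(1.0, z))))
    return lon, lat
```

Anything beyond such a fixed rotation (proper motions, epochs,
non-spherical frames) is outside what this sketch attempts.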
>
> As to Pat's (admittedly valid) argument:
>
>> beyond primitive integers and floats and strings. At CADC we have been
>> treating shapes (circles, polygons, etc) and intervals as real datatypes**
>> for a long time now and once you do that all the confusion goes away -- and
> I've already said that there are cases where you want them (ADQL, say)
> -- but that I can't see how DataLink is one of them. Given how hard
> it is to define type systems (including their valid values) sensibly
> and robustly, there should be a really strong reason to expose them
> on a protocol level (as opposed to just using them internally or
> within custom interfaces).
>
> Doug has, in addition:
>
>> back-end processing.  It has the advantage of allowing simple
>> multi-dimensional regions to be specified with a single parameter.
> Is the specification with a single parameter actually a measurable
> advantage?  In what use case does it make a difference if the
> parameter set has a custom serialization (i.e., STC-S) or just the
> normal HTTP www-form-urlencoded serialization?
>
> Again, if I've neglected some argument, please do tell me off and
> maybe try making your point again.
>
>
> On Declaring Protocol Parameters
> ================================
>
> Knowing full well I'm sounding like a broken record: This is what all
> this is really about.  We *must* define our services such that the
> knowledge of the protocol together with whatever service metadata we
> specify lets a (machine) client discover how valid requests to the
> service are constructed (i.e., in particular what parameters are
> supported and what literals are expected in each parameter).  Bonus
> points if the client can suggest values that actually return values
> to the user to alleviate the horror vacui in front of an interface
> like this:
>
>
>      Enter parameters:
>
>      _________________________________
>
>
>                      [Cancel]   [Send]
>
>
> Since that point is so dear to my heart after the SSAP experience,
> let me briefly reply to Doug:
>
>>> This requires a short excursion: I strongly believe we should stop
>>> lying.  We're currently lying when we, as in current SSAP, say something
>>> like<PARAM name="INPUT:BAND" datatype="double" unit="m"...>  in the
>>> service metadata.
>>> What clients are expected to pass in is (for most services) something
>>> like "1e-7/", which clearly is *not* a double literal.  The SSAP spec
>> BAND is an example of a custom datatype much as Pat suggested.  The
>> actual datatype is not double, but ordered rangelist.  List is obviously
> Well, that's the issue.  If you look at the sample metadata response
> from the SSAP standard, it says:
>
>    <PARAM name="INPUT:BAND" value="ALL" datatype="char" arraysize="*">
>        <DESCRIPTION>
>            Spectral coverage: Several values can be combined in a
>            comma separated list. Below values are treated case insensitive.
>            All spectra returned by this service belong mainly to the optical
>            reaching to the infrared regime. Therefore, the other values
>            won't yield any matching records in the query response.
>            Alternatively the wavenlength can be given in meters or as a
>            range thereof.
>        </DESCRIPTION>
>        <VALUES>
>            <OPTION value="ALL"/>
>            <OPTION value="radio"/>
>            <OPTION value="millimeter"/>
>            <OPTION value="infrared"/>
>            <OPTION value="optical"/>
>            <OPTION value="ultraviolet"/>
>            <OPTION value="x-ray"/>
>            <OPTION value="gamma-ray"/>
>        </VALUES>
>    </PARAM>
>
> (p. 61).  That, fortunately, is not quite a lie (we're not saying:
> this is a float), but it's not the whole truth either.  As you can
> see, a client would assume it can use "infrared" and a few others to
> fill that *string* that BAND is and that's it.  There's no way it
> could figure out this is an ordered rangelist.  And is it?  The
> comment appears to suggest otherwise even to humans.  Incidentally,
> the attempt to save parameters in this case opens up new questions --
> should
>
> radio,5e-7/7e-7
>
> be an allowed literal here?  Let's not do things like that again.
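
The mismatch between the declared type and the accepted literals can
be made concrete with a short Python sketch. The range-list grammar
below is a hypothetical one written for illustration only (numbers,
open or closed ranges separated by "/", joined by commas); it is not
taken from the SSAP text, and the helper names are made up.

```python
import re

# Hypothetical grammar, for illustration only: a number, an open or
# closed range "a/b", "a/" or "/b", and comma-separated lists of these.
NUMBER = r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?"
RANGE = rf"(?:{NUMBER}/{NUMBER}|{NUMBER}/|/{NUMBER}|{NUMBER})"
RANGELIST = re.compile(rf"{RANGE}(?:,{RANGE})*")


def is_double_literal(text):
    """True if Python's float() accepts the text as a number."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def is_rangelist(text):
    """True if the whole text matches the illustrative grammar above."""
    return RANGELIST.fullmatch(text) is not None
```

With these definitions, "1e-7/" is a range list but not a double
literal, and "radio,5e-7/7e-7" is neither, so a client that only sees
datatype="double" or a list of OPTIONs cannot know what to send.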
>
> And I cannot resist commenting on Doug's observations that lists are
>
>> a very common and indispensable datatype in most high level languages; a
>> range or rangelist is also quite common, and indispensable for many use
>> cases.  In the case of parameters like BAND and TIME, rangelists are
>> required for many use cases as we need to include or exclude selected
> Well, if the part about "indispensable" is true (and I doubt it given
> how few services actually understand and correctly implement the
> syntax, and the fact that of ~4000 "user" SSA queries with BAND I've
> seen here only one contained a comma and none a semicolon), then we
> need to figure out how to declare the syntax and semantics supported
> by a parameter in the metadata response.  That, or we keep clients in
> the dark about what they can and cannot pass to a given service.
>
> But there's a deeper issue here that goes to the fundamentals of
> protocol design: Programming languages are (usually) equivalent to
> Turing machines, and there's a good reason for that (most interesting
> problems need a Turing machine to solve).  Protocols usually are not,
> and there's a host of even better reasons for that (e.g., even
> deciding whether such protocol messages are valid might take an
> arbitrary amount of time and space).
>
> Now, admittedly lists don't shove us across any line here (typically,
> they'd still be in the regular domain), but in principle the argument
> "this is practice in programming language X" is not a good one when
> we're talking about protocols.  To conclude this digression, I
> recommend the (for me) eye-opening talk "The Science of Insecurity",
>
> http://mirror.fem-net.de/CCC/28C3/mp4-h264-LQ/28c3-4763-en-the_science_of_insecurity_h264-iprod.mp4
>
>
>> spectral regions (or time regions) when filtering data.  Range certainly
>> is a mandatory basic construct, and a rangelist is a trivial extension
>> of the concept.
> Range is easily covered by _MIN and _MAX (as even the SSAP spec
> itself showcases, p. 53).  Rangelist is *not* a trivial extension of the
> concept, though, as evidenced by the fact that while ranges
> themselves work marvellously with VOTable data types and
> www-form-urlencoded serialization, rangelists do not (without ugly
> hacks).
>
>
> What's this about, part II
> ==========================
>
> In conclusion, to maybe pull in some passive listeners -- this
> discussion is about declaring protocol parameters.  I'm using the SSA
> way of declaring those; the problem is the same for datalink
> services, so at least in principle the arguments apply.
>
> What I propose is that if you offer cutouts within a spectral data
> cube, you'd say (roughly)
>
> <PARAM name="INPUT:RA_MIN" datatype="double" ucd="pos.eq.ra"
>    unit="deg">
>    <VALUES><MIN>2.3</MIN><MAX>4.2</MAX></VALUES>
> </PARAM>
> <PARAM name="INPUT:RA_MAX" datatype="double" ucd="pos.eq.ra"
>    unit="deg">
>    <VALUES><MIN>2.3</MIN><MAX>4.2</MAX></VALUES>
> </PARAM>
>
> <PARAM name="INPUT:DEC_MIN" datatype="double" ucd="pos.eq.dec"
>    unit="deg">
>    <VALUES><MIN>-78</MIN><MAX>-77</MAX></VALUES>
> </PARAM>
> <PARAM name="INPUT:DEC_MAX" datatype="double" ucd="pos.eq.dec"
>    unit="deg">
>    <VALUES><MIN>-78</MIN><MAX>-77</MAX></VALUES>
> </PARAM>
>
> <PARAM name="INPUT:SPEC_MIN" datatype="double" ucd="em.wl"
>    unit="m">
>    <VALUES><MIN>4e-7</MIN><MAX>7e-7</MAX></VALUES>
> </PARAM>
> <PARAM name="INPUT:SPEC_MAX" datatype="double" ucd="em.wl"
>    unit="m">
>    <VALUES><MIN>4e-7</MIN><MAX>7e-7</MAX></VALUES>
> </PARAM>
>
> -- no magic, all of the stuff exists and can be readily used in
> VOTable, the clients can figure out the complete physics, and the
> standard could still say "If you have declinations, your parameter
> must be called DEC" if we want.  This XML is simple to generate, the
> messages described are parsed by your HTTP library.
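
To illustrate the point that standard tooling is enough here, the
following Python sketch reads declarations in the rough notation used
above and checks a www-form-urlencoded request against them. It is a
sketch under assumptions: the RESOURCE wrapper, the function names and
the stripping of the "INPUT:" prefix are made up for the example, and
the limit is read from either the element text (as written in this
mail) or a value attribute.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs

# Declarations in the rough notation of this mail (hypothetical wrapper).
DECLARATION = """
<RESOURCE>
  <PARAM name="INPUT:RA_MIN" datatype="double" ucd="pos.eq.ra" unit="deg">
    <VALUES><MIN>2.3</MIN><MAX>4.2</MAX></VALUES>
  </PARAM>
  <PARAM name="INPUT:RA_MAX" datatype="double" ucd="pos.eq.ra" unit="deg">
    <VALUES><MIN>2.3</MIN><MAX>4.2</MAX></VALUES>
  </PARAM>
  <PARAM name="INPUT:SPEC_MIN" datatype="double" ucd="em.wl" unit="m">
    <VALUES><MIN>4e-7</MIN><MAX>7e-7</MAX></VALUES>
  </PARAM>
</RESOURCE>
"""


def declared_ranges(xml_text):
    """Map parameter names (without 'INPUT:') to their (min, max) limits."""
    ranges = {}
    for param in ET.fromstring(xml_text).iter("PARAM"):
        name = param.get("name").split(":", 1)[-1]
        values = param.find("VALUES")
        limits = []
        for tag in ("MIN", "MAX"):
            element = values.find(tag)
            # Accept the limit as a value attribute or as element text.
            limits.append(float(element.get("value") or element.text))
        ranges[name] = tuple(limits)
    return ranges


def check_request(query_string, ranges):
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []
    for name, values in parse_qs(query_string).items():
        if name not in ranges:
            problems.append(f"unknown parameter {name}")
            continue
        low, high = ranges[name]
        try:
            value = float(values[0])
        except ValueError:
            problems.append(f"{name} is not a double literal")
            continue
        if not low <= value <= high:
            problems.append(f"{name}={value} outside [{low}, {high}]")
    return problems
```

A client could call check_request("RA_MIN=2.5&RA_MAX=3.0",
declared_ranges(DECLARATION)) before sending a request; no
protocol-specific parser is involved, only the XML and URL libraries.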
>
> If you actually want to tell min and max apart without looking at the
> names (which I think would be perfectly all right), it would be
> trivial to add, say, meta.max and meta.min as UCD words (we shouldn't
> use stat.min and stat.max here).
>
> If you want full STC metadata on this, there's the STC-in-VOTable
> note immediately applicable here.
>
> If you want more structure, you could still have (say)
>
> <GROUP name="INPUT:RA_INTERVAL">
>    <PARAMRef name="INPUT:RA_MIN"/>
>    <PARAMRef name="INPUT:RA_MAX"/>
> </GROUP>
>
>
> The STC-S solution would, as far as my imagination goes, look
> something like this:
>
> <PARAM name="INPUT:stcregion" datatype="char" arraysize="*">
>    ???<some magic>  ???
> </PARAM>
>
> <some magic>  would then tell a client that, in this case, a string
> like
>
> Union ICRS Circle 2.2 3.4 4.5 PositionInterval 1.2 2.2 2.3 4.5
>    Position 3.4 4.5 unit deg size 0.001 0.001
> SpectralInterval TOPOCENTER 4000 6000 unit Angstrom PixSize 2
>
> would probably be meaningful to the server, whereas
>
> TimeInterval MJD 4567 6384
> Box CART3D ICRS 0 1 2 3 4 5 unit km
>
> probably would not.
>
>
> And yes, we might cut and crop STC-S to make this a feasible problem.
> But then STC-S would no longer work as a fairly human-graspable way
> to input STC specifications, which would be grave collateral damage
> at least for me (see also the original mail of this thread).
>
>
> Conclusion: Please, everyone involved in the DataLink effort, think
> hard about whether you need STC-S or any complex geometry at all.  And if you
> find you have to, think hard on how to declare valid literals,
> ranges, and all that.
>
> Cheers,
>
>          Markus
>
>
>


