SODA gripes (4): Mandatory multiplicities considered harmful

Tue Feb 9 14:33:10 CET 2016

Dear DAL folks,

While I'd still appreciate comments regarding the proposed
explanatory chapter -- see Gripe (3), 
http://mail.ivoa.net/pipermail/dal/2016-January/007281.html --
(and I still suspect it's useful to skim over that stuff to
understand what's being discussed here all the time), here's my next
gripe (there's not much time left until Cape Town).

There's a TL;DR below.

This is about mandating parameter multiplicities.  In case you were
wondering, this means text like this:

  In general, filtering parameters are single-valued in \{sync\}
  requests and multi-valued in \{async\} requests (exceptions noted
  below). When multiple values of filtering parameters are used in an
  \{async\} job, each combination of values produces zero or one
  result.

and then, nevertheless, for every parameter, stuff like:

  The POS parameter is single-valued for \{sync\} requests and
  multi-valued for \{async\} jobs.

I propose to strike all such language.  In a section on general rules
for parameters, there could be text like:

  This specification does not constrain the behaviour of services in
  the presence of repeated parameters.  For enumerated parameters
  (i.e., those with \xmlel{OPTIONS} in \xmlel{VALUES}), clients should
  display widgets allowing the selection of zero or more of the
  options available.  Services must therefore not fail when receiving
  multiple values even for single-valued enumerated parameters; they
  must instead discard all but one of the values passed.

Yes, it's suboptimal (but see below), but I think we can't really do
better at this point.
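
To make the proposed rule a bit more concrete, here is a minimal Python
sketch of a service that tolerates repeated values.  The SINGLE_VALUED
set and the normalise() helper are hypothetical, and keeping the first
value is an arbitrary choice; the proposed text only says that all but
one value are discarded, not which one survives.

```python
from urllib.parse import parse_qs

# Hypothetical: the parameters this particular service treats as
# single-valued.  Every service would decide this for itself.
SINGLE_VALUED = {"FORMAT"}


def normalise(query):
    """Return {name: [values]} without failing on repeated parameters.

    For single-valued parameters, all values but the first are dropped
    (assumption: "first wins"; the proposal leaves this open).
    """
    params = parse_qs(query)
    return {
        name: values[:1] if name in SINGLE_VALUED else values
        for name, values in params.items()
    }


result = normalise("FORMAT=fits&FORMAT=png&OBJECT=alp%20Cyg&OBJECT=bet%20Cyg")
```

Here, the second FORMAT is silently ignored rather than triggering an
error, while both OBJECT values are kept.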

Rationale:

Whether or not it makes sense for a service to accept repeated
parameters (i.e., stuff like OBJECT=alp%20Cyg&OBJECT=bet%20Cyg) is
highly dependent on the service and on the nature of the parameter.
If we try to mandate behaviour in the standards text, we'll only
generate lots of non-compliant services.

Also, the implementation effort typically increases fairly
significantly when handling sequences (for my datalink
implementation, it grew by a factor of about 1.5 when allowing
multiple values of ID; in the datalink XSLT client, dealing with the
results of multi-ID queries is still unsolved; the multiple-ID rule in
Datalink precludes using pre-generated files to serve responses[1]).

So, we should have a very good idea why we want this, and I don't
think we have that.  Indeed, given the wide range of SODA
applications (whether already operational, or specified, or
envisaged), I think we cannot.

As the existing language (see above) on what to do in the presence of
several multi-valued parameters -- e.g.,

  POS=CIRCLE 1 3 3&BAND=3e-7 4e-7&POS=CIRCLE 4 5 3&BAND=1e-7 2e-7

-- shows, not even the semantics are straightforward (guess what this
does, then try to figure out what really should happen according to
the standard (I don't believe there's a service doing this right now,
though)).  In that respect, I think *if* we really want "batch
processing" in SODA, we should go for a much more straightforward
way: upload a VOTable with one set of parameters per line.  No
combinatorial explosion, minimal specification effort.
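
A rough Python sketch of the difference, using the example query from
above.  The "every combination" reading follows the quoted draft text;
the row-wise variant only stands in for the uploaded table (a list of
dicts instead of an actual VOTable), and both function names are made
up for illustration.

```python
from itertools import product
from urllib.parse import parse_qs

QUERY = "POS=CIRCLE 1 3 3&BAND=3e-7 4e-7&POS=CIRCLE 4 5 3&BAND=1e-7 2e-7"


def combinations(query):
    """Expand repeated parameters into every combination of values."""
    params = parse_qs(query)
    names = list(params)
    return [dict(zip(names, values)) for values in product(*params.values())]


def rows(query):
    """Pair the n-th value of each parameter: one parameter set per row."""
    params = parse_qs(query)
    names = list(params)
    return [dict(zip(names, values)) for values in zip(*params.values())]


combos = combinations(QUERY)  # 2 POS values x 2 BAND values
table = rows(QUERY)           # one entry per intended cutout
```

With two values each for POS and BAND, the combination reading yields
four parameter sets, whereas the row-wise reading yields the two that
whoever wrote the query probably had in mind.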

But I doubt all this is even very useful as specified now -- the
plan, if I understand correctly, is that the results of such batch
operations would appear as separate results in a UWS document (this
would need much more explanation if we really go there).  That,
however, means that there's still one request per processed document,
so the actual savings in overhead or whatever are probably fairly
small.

So: It's not evidently useful, it's certainly not necessary to cover
the CSP requirements, I don't think anyone has implemented it, and
it's hard for the clients.  Let's simply not say anything (except the very
general language proposed above) without serious prototyping.[2] 

However (additional proposal):

While I think mandatory multiplicities are a pain that would lead to
massive non-interoperability if they were ever taken up, I think it'd
be really useful if services announced which of their parameters can
actually be repeated.  That's important for clients to really produce
widgets properly guiding the user (e.g., only allowing one selection
for FORMAT but allowing multiple selections for OBJECT).  This could
also be a basis to allow multi-cutouts where they can usefully be
implemented (perhaps turning a long spectrum into a short SED, or
something producing an archive of little things).

In an ideal world, we'd have PDL with sufficient capabilities
formulated in VO-DML ready now.  That would be enough to have an
expressive and (for machines) easily interpretable annotation and
would solve several other problems with annotating parameter sets
(e.g., "if you give a range for PIXEL_3, you cannot give a range for
LAMBDA").

With a bleeding heart I'll concede that's something we'll have to
postpone to version 1.1.

While I'm sure a proper parameter DM is where we need to go, even now
we could, as a stopgap measure for this relatively important use
case, prescribe some ad-hoc annotation for repeatable params.
Looking at the VOTable spec, I'd say there are four relatively
non-destructive ways we could do this:

(1) hog the utype attribute of the param
  <PARAM name="OBJECT" ... utype="temporary:repeatable"/>
  (this would be my favourite; I don't think PARAM/@utype will be
  used for anything else in future versions of SODA; even when VO-DML
  still used @utype, "legacy" utype attributes were left alone)
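
  On the client side, picking this up could look roughly like the
  following Python sketch.  The fragment is an un-namespaced,
  stripped-down stand-in for a real service descriptor, and
  repeatable_params() is a made-up helper; only the
  "temporary:repeatable" utype is taken from the proposal above.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only; a real VOTable would carry a namespace
# and many more attributes.
FRAGMENT = """
<GROUP name="inputParams">
  <PARAM name="ID" datatype="char" arraysize="*" value=""/>
  <PARAM name="OBJECT" datatype="char" arraysize="*" value=""
         utype="temporary:repeatable"/>
  <PARAM name="FORMAT" datatype="char" arraysize="*" value=""/>
</GROUP>
"""


def repeatable_params(xml_text):
    """Return the names of all PARAMs flagged as repeatable via utype."""
    root = ET.fromstring(xml_text.strip())
    return [
        param.get("name")
        for param in root.iter("PARAM")
        if param.get("utype") == "temporary:repeatable"
    ]


names = repeatable_params(FRAGMENT)
```

  A client would then offer a multi-selection widget for OBJECT and a
  single-selection widget for everything else.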

(2) use an immediate group:

  <PARAM...>
  <PARAM...>
  <PARAM...>
  <GROUP utype="temporary:repeatable">
    <PARAM...>
    <PARAM...>
  </GROUP>

 (that's a bit of a pain for the service)

(3) use group referencing:

  <GROUP utype="temporary:repeatable">
    <PARAMref ref="a"/>
    <PARAMref ref="b"/>
  </GROUP>
  <PARAM...>
  <PARAM ID="a"...>
  <PARAM...>
  <PARAM ID="b"...>
  <PARAM...>

 (that's a bit of a pain for the service)
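
 For comparison, resolving the references takes an extra lookup step.
 A Python sketch under the same caveats as before (un-namespaced toy
 fragment, hypothetical helper name, VOTable's ID attribute as the
 target of PARAMref/@ref):

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only; parameter names are examples.
FRAGMENT = """
<GROUP name="inputParams">
  <GROUP utype="temporary:repeatable">
    <PARAMref ref="a"/>
    <PARAMref ref="b"/>
  </GROUP>
  <PARAM name="ID" datatype="char" arraysize="*" value=""/>
  <PARAM ID="a" name="OBJECT" datatype="char" arraysize="*" value=""/>
  <PARAM name="FORMAT" datatype="char" arraysize="*" value=""/>
  <PARAM ID="b" name="BAND" datatype="char" arraysize="*" value=""/>
</GROUP>
"""


def repeatable_by_reference(xml_text):
    """Return names of PARAMs referenced from a 'repeatable' GROUP."""
    root = ET.fromstring(xml_text.strip())
    # Index all PARAMs that carry an ID so the references can be resolved.
    by_id = {
        param.get("ID"): param.get("name")
        for param in root.iter("PARAM")
        if param.get("ID")
    }
    names = []
    for group in root.iter("GROUP"):
        if group.get("utype") != "temporary:repeatable":
            continue
        for ref in group.findall("PARAMref"):
            # Dangling references are skipped rather than treated as errors.
            if ref.get("ref") in by_id:
                names.append(by_id[ref.get("ref")])
    return names


names = repeatable_by_reference(FRAGMENT)
```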

(4) use LINK

  <PARAM ...>
    <LINK content-role="adhoc-annotation"
      >ivo://ivoa.net/std/SODA#repeatable-param</LINK>
  </PARAM>

I'm not terribly smitten with any of this.  

So, my preference remains for someone to fix up VO-DML and PDL for
version 1.1.  If there's no solution for the multiplicities problem
in 1.0, perhaps there's more pressure to actually make PDL-in-VO-DML
happen.

TL;DR: Services should have the right to decide on multiplicities
themselves.  It'd be nice if we gave clients some way to figure out a
given service's decision reliably, but I suspect we've been too lazy
these recent years in VO-DML and PDL to make it happen properly for
1.0.

Cheers,

         Markus

[1] You may guess that I'd much rather get rid of the multiple-ID
thing in datalink services.  That's true.  I'll shout when 1.1 comes
around.

[2] As far as I am concerned, we could simply strike async completely
and be done with it.  I don't think anyone could implement async
based on what's in the spec.  But that's another thing, and I don't
have it on my agenda right now.  Has anyone really tried async
SODA?  I'd be curious to compare if we came out with the same
choices...

PS: Preview on future gripes (sequence TBD):

() Spatial coverage discovery and the RA and DEC parameters
() Pixel cutouts: PIXEL_n
() Behaviour for no-ID queries?  For queries with only ID?
() POS doesn't have an xtype
() Examples stuff: example example, and perhaps a dl-id term?