SODA, half-client and remaining gripe collection

Tue Mar 29 16:21:28 CEST 2016

Dear Colleagues,

Sorry that this is again a rather long mail, but it should be the last
of these monsters on SODA for a while.  Also, it's split into two
sections, the first of which I'd ask everyone with some interest in SODA
to at least have a glance at.  The rest is for the nerds.

I. SODA Prototype
-----------------

So, building on the XSLT hack for datalink I reported on in Sydney (see
below for details) I've now done a makeshift SODA client so you can try
out how these things could work (and see where we should work on the
client side).  

To follow these instructions, start TOPCAT and Aladin.

(1) Discovery

We're using obscore; so, in TOPCAT open the VO/TAP dialog, double click
the "GAVO DC TAP" entry you're seeing.

Use

  select * from ivoa.obscore 
  where 
  dataproduct_type='cube'
  and obs_collection='CALIFA'

as your query.  Or, really, anything that'll return one of these cubes
(also see below, at (4) Other data).
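
If you would rather script the discovery step than click through TOPCAT,
the same query can go to the service's synchronous TAP endpoint.  What
follows is only a minimal sketch: the base URL is an assumption (this
mail does not give the address behind the "GAVO DC TAP" entry), and the
function name is made up for illustration.

```python
from urllib.parse import urlencode

# Assumption: base URL of the "GAVO DC TAP" service; not stated in this mail.
TAP_BASE = "http://dc.g-vo.org/tap"

ADQL = """select * from ivoa.obscore
where dataproduct_type='cube'
and obs_collection='CALIFA'"""


def tap_sync_url(base, adql, maxrec=None):
    """Return a GET URL for a synchronous TAP query."""
    params = {"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql}
    if maxrec is not None:
        params["MAXREC"] = str(maxrec)
    return base.rstrip("/") + "/sync?" + urlencode(params)


url = tap_sync_url(TAP_BASE, ADQL, maxrec=10)
print(url)
```

Fetching that URL should give you a VOTable whose access_url column
holds the datalink URLs used in the next step.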

(2) SODA

There's not yet a cube-enabled SODA client, so I'm using a bit of XSLT
and javascript that lets you use SODA results in the browser.  To use
it, in TOPCAT's main window click "Activation action" and check "View
URL as a Web Page".  Select "access_url" as "Web Page Location column"
and "system browser" (or anything but "basic browser") as the Browser
type.  "Ok" the dialog.

Now open a plot or table display for your obscore result.  When you
click on a row or point, a web browser page opens with a SODA dialog.
If you have javascript disabled, you can use a conventional form
interface, where the thing at least tells you what you can enter.  This
particular service supports BAND and POS from the current SODA draft
parameters, and in addition RA and DEC (which I claim we can't really do
without, see the nerd section).

There are also some additional parameters in there that might be up for
later standardisation -- don't worry about them now.  Up to now, that's
all datalink.

If you enable Javascript, you'll get some SODA magic: 

(a) there's a spatial cutout overlaid on a sky preview courtesy of the
Aladin image server.  Use a click-and-drag rubberband to determine your
spatial cutout.

(b) You'll get a custom BAND widget that lets you do the cutouts in your
chosen units.  For instance, to get the vicinity of H alpha, switch to
Ångström and enter 6560 and 6564 there (or something else if you got a
cube that doesn't have Halpha -- you'll see that readily from the
limits).  Yes, the formatting of the limits in that widget is suboptimal
at this point.  Visual improvements forthcoming.

Notice how changing these "custom" widgets updates the SODA parameters
(for the spatial cutout I could update POS instead, but since I have
RA and DEC anyway, I use those).
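
For the curious, the unit conversion behind such a BAND widget is small.
Here is a minimal Python sketch, assuming (as I read the current SODA
draft) that BAND takes two wavelengths in metres.  The function name is
made up for illustration, and this is not the widget's actual javascript.

```python
ANGSTROM_IN_M = 1e-10  # 1 Ångström = 1e-10 m


def band_from_angstrom(lo, hi):
    """Return a BAND value (two wavelengths in metres) for limits in Ångström."""
    if lo > hi:
        lo, hi = hi, lo  # accept the limits in either order
    return "%.10g %.10g" % (lo * ANGSTROM_IN_M, hi * ANGSTROM_IN_M)


# The H alpha vicinity from the text above
print("BAND=" + band_from_angstrom(6560, 6564))
```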

(3) Retrieval

Hit "broadcast dataset via SAMP" and inspect your cutout in Aladin (or
DS9, for that matter).  Or hit "Retrieve data" to download the cutout.

That's about it; no, I'm not claiming the XSLT-thing is more than a
proof-of-concept client.  But still, I think you can get an idea how the
pieces can fit together.

(4) Other data

This happens to be useful for other data, too.  For instance, I have
this collection of plate scans that are a Gigabyte a pop.  With SIAP, I
only handed out cutouts, but that's not an option with obscore, so I've
always been a bit worried.  Now, I'm handing out datalinks for those.
To see how things work out for these images (in effect, degenerate
cubes), try a discovery query like

  select * from ivoa.obscore 
  where 
  t_min<gavo_to_mjd('1925-01-01')
  and target_name='M42'
  order by t_min desc

If you don't want to configure the activation action, you can simply cut
the access URL in TOPCAT with a quadruple click and paste it somewhere
else.

And that concludes the part for the general public.  Nerds, please read
on.

II. Technics
------------

(1) The stylesheet

First off, the stylesheet I'm using is public.  I've put it on github
and I'd like it very much if we could share development there; even if I
don't expect datalink in the browser to become a big thing, it's
certainly a good thing to have.  So, please go ahead and 

  git clone https://github.com/msdemlei/datalink-xslt

It should already work for other datalink services.  In principle,
prepending something like 
<?xml-stylesheet href='/static/xsl/datalink-to-html.xsl' type='text/xsl'?>
should be enough; but there's a same-origin policy for XSLT, too, so
you'll have to have a local checkout.
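
If your datalink service cannot easily be taught to emit the processing
instruction itself, a small filter can add it on the way out.  This is
a sketch only; the function name is hypothetical, and the href is the
one quoted above, so it assumes the stylesheet is served from your own
site under that path.

```python
PI = "<?xml-stylesheet href='/static/xsl/datalink-to-html.xsl' type='text/xsl'?>"


def add_stylesheet(votable_text, pi=PI):
    """Insert the xml-stylesheet PI after the XML declaration, if there is one."""
    if "<?xml-stylesheet" in votable_text:
        return votable_text  # already has a stylesheet; leave untouched
    stripped = votable_text.lstrip()
    if stripped.startswith("<?xml "):
        # keep the XML declaration first, as XML requires
        end = stripped.index("?>") + 2
        return stripped[:end] + "\n" + pi + stripped[end:]
    return pi + "\n" + stripped


print(add_stylesheet('<?xml version="1.0"?>\n<VOTABLE/>'))
```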

For the javascript-based SODA, some javascript is pulled in that ideally
should come from your site, too.  How to parameterise all that and how
to pack up a distribution are things I'd love to work out with you.  As
is, the javascript retrieval will fail, but that's mainly a matter of
figuring out where to sensibly put all these resources.

Also, you'll need to arrange that browsers see the text/xml media type
for them to apply the stylesheet (see
http://wiki.ivoa.net/internal/IVOA/InteropOct2015DAL/datalink-xslt.pdf)[1].

(2) Standards issues

I guess that until Cape Town I won't be able to spend as much time on
SODA as I did in January and February, so contrary to my original plan
I'll dump the remaining major gripes here all in one go.  So, the big
one really is:

(3) RA and DEC

The stylesheet really needs to know where it is looking in order to
generate a UI.  The easiest way to achieve that is by adding RA and DEC
parameters.  True, this is a bit tricky at the stitching line (easily
solved by allowing RA<0) and at the pole, which is ugly, but unavoidable
with POS, too, since RA and DEC are essentially just a rationalisation of
POS' RANGE.

The alternative would be Pat's CIRC and POLY from his Feb 29th mail 
http://mail.ivoa.net/pipermail/dal/2016-March/007370.html, except that 

> Pat:
> For CIRC and POLY the service includes a "maximum sensible extent" with
> which to perform cutouts.

I'm not sure I care a lot about the maximum sensible extents.  The
circle, however, would let you say where there is data, just as for
RA/DEC.  So, if there's really a good reason against RA and DEC and they
don't make it to the standard, I could see an escape by giving actual
per-element limits in CIRCLE, perhaps like this:

    <PARAM name="CIRC" datatype="double" ucd="obs.field" 
      unit="deg" xtype="circle" arraysize="3" value="">
      <VALUES>
        <MIN value="230.4 32.1 0.1" />
        <MAX value="236.4 34.7 3.5" />
      </VALUES>
    </PARAM>

-- which would say "this is a dataset covering RA 230.4 .. 236.4, DEC
32.1 .. 34.7, and you can choose cutout radii between 0.1 and 3.5".
Which would be ok with me, except we're probably violating VOTable.
I think all existing practice is that MIN and MAX are not array-valued but
give the values for the individual items of an array (not sure if the
standard itself is precise enough there).

Be that as it may, I'd still much prefer

      <PARAM arraysize="2" datatype="double" name="DEC" ucd="pos.eq.dec" 
        unit="deg" value="" xtype="interval">
        <DESCRIPTION>The latitude coordinate</DESCRIPTION>
        <VALUES>
          <MIN value="32.4"/>
          <MAX value="34.7"/>
        </VALUES>
      </PARAM>
      <PARAM arraysize="2" datatype="double" name="RA" ucd="pos.eq.ra" 
        unit="deg" value="" xtype="interval">
        <DESCRIPTION>The longitude coordinate</DESCRIPTION>
        <VALUES>
          <MIN value="32.1"/>
          <MAX value="34.7"/>
        </VALUES>
      </PARAM>

-- which uses VOTable as in common practice and otherwise requires zero
additional definitions.  Also, it's probably what 95% of users and
implementors would ask for anyway.
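
To illustrate how little a client needs to do with such declarations,
here is a minimal Python sketch that pulls the limits out of interval
PARAMs.  It assumes a namespace-free snippet wrapped in a GROUP for
brevity (real VOTables carry the VOTable namespace), and the function
name is made up.

```python
import xml.etree.ElementTree as ET

SNIPPET = """
<GROUP name="inputParams">
  <PARAM arraysize="2" datatype="double" name="DEC" ucd="pos.eq.dec"
    unit="deg" value="" xtype="interval">
    <VALUES><MIN value="32.4"/><MAX value="34.7"/></VALUES>
  </PARAM>
  <PARAM arraysize="2" datatype="double" name="RA" ucd="pos.eq.ra"
    unit="deg" value="" xtype="interval">
    <VALUES><MIN value="32.1"/><MAX value="34.7"/></VALUES>
  </PARAM>
</GROUP>
"""


def interval_limits(group):
    """Map the name of each interval PARAM to its (min, max) limits."""
    limits = {}
    for param in group.iter("PARAM"):
        if param.get("xtype") != "interval":
            continue
        lo = param.find("VALUES/MIN")
        hi = param.find("VALUES/MAX")
        if lo is not None and hi is not None:
            limits[param.get("name")] = (
                float(lo.get("value")), float(hi.get("value")))
    return limits


limits = interval_limits(ET.fromstring(SNIPPET))
print(limits)
```

A UI can then place its spatial widget and clamp user input using
nothing but these two pairs of numbers.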

Any elucidation as to why we shouldn't just do it would be appreciated.

(4) the async/multiple param thing

I've already said something about having multiple values for a single
param before, but as Pat mentioned it in his mail
http://mail.ivoa.net/pipermail/dal/2016-March/007372.html let me just
briefly comment on one thing I couldn't help thinking:

> Pat wrote:
>
> input file I can use with curl to post multiple positional cutouts (note
> the mix of POS, CIRC, and POLY):
> 
> ===multicut.txt===
> POS=circle 140 0 0.1&
> POS=circle 66 10 20&
[...]

Hm... If you found yourself writing your parameters to a file anyway --
why not then do the multi-cutout by just writing one set of parameters
per VOTable line -- it'd be much simpler and more predictable all around
-- and really not hard to implement (with some trivial conventions).
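
To make the one-set-of-parameters-per-row idea a bit more concrete, here
is a rough Python sketch.  The rows are hard-coded; in practice they
would be read from the rows of a VOTable.  The function name and the
convention that shared parameters such as ID are given only once are
assumptions made for illustration, not something any draft defines.

```python
from urllib.parse import urlencode

# Hypothetical input: one cutout per row, as it might be read from a VOTable.
ROWS = [
    {"POS": "circle 140 0 0.1"},
    {"POS": "circle 66 10 20"},
]

# Assumed convention: parameters shared by all cutouts are given once.
COMMON = {"ID": "ivo://org.gavo.dc/~?lswscans/data/part2/Walz/FITS/D774.fits"}


def queries_from_rows(rows, common):
    """Return one URL-encoded query string per row."""
    queries = []
    for row in rows:
        params = dict(common)
        params.update(row)  # values from the row win over shared ones
        queries.append(urlencode(params))
    return queries


for query in queries_from_rows(ROWS, COMMON):
    print(query)
```

Each row maps to exactly one ordinary single-valued request, so nobody
has to define how repeated parameters combine.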

But anyway: Perhaps we should defer the whole multiple param thing until
the next version?  I see some urgency for enumerated params, but perhaps
even that can still wait.

(5) @value and @ref semantics documented for PARAMs

I've found the inclusion of "constant" params makes my life a bit easier.
So, in my service there's

  <PARAM name="ID" [...]

       value="ivo://org.gavo.dc/~?lswscans/data/part2/Walz/FITS/D774.fits">

       <DESCRIPTION>The publisher DID of the dataset of interest</DESCRIPTION>
 </PARAM>

with the understanding that clients should just propagate on that PARAM
(the XSLT does, in input type="hidden").

One could do this by manipulating the accessURL, but I'd feel a bit
better if we were explicit about these things being params -- e.g., it
might be a nice thing to show informationally in UIs.

Also, I believe the PARAM/@ref semantics that's introduced in datalink
somewhat off-handish should be defined a bit more precisely.  Perhaps
this would be an erratum to Datalink.  I guess we would need to say
something like

  If PARAM/@ref points to a FIELD, the parameter's values must be taken
  from the corresponding column of the table embedding the FIELD.  An
  appropriate widget type in a UI could be a select box.

  [<Do we want this?> If PARAM/@ref points to another PARAM, that param's
  value is simply copied literally]

  A PARAM/@ref pointing to any other VOTable element is an error.
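
As a sanity check that these three rules are implementable in a few
lines, here is a Python sketch.  The document is a made-up,
namespace-free miniature of a datalink response, the PARAM named
FORMAT_COPY exists only to exercise the second (still open) rule, and
the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """
<VOTABLE>
  <RESOURCE type="results">
    <TABLE>
      <FIELD ID="col_id" name="ID" datatype="char" arraysize="*"/>
      <DATA><TABLEDATA>
        <TR><TD>ivo://example/a</TD></TR>
        <TR><TD>ivo://example/b</TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
  <RESOURCE type="meta" utype="adhoc:service">
    <GROUP name="inputParams">
      <PARAM ID="fmt" name="FORMAT" datatype="char" arraysize="*"
        value="fits"/>
      <PARAM name="ID" datatype="char" arraysize="*" value=""
        ref="col_id"/>
      <PARAM name="FORMAT_COPY" datatype="char" arraysize="*" value=""
        ref="fmt"/>
    </GROUP>
  </RESOURCE>
</VOTABLE>
"""


def resolve_ref(root, param):
    """Return the admissible values for a PARAM carrying @ref."""
    target_id = param.get("ref")
    # Rule 1: @ref to a FIELD means the values of that column.
    for table in root.iter("TABLE"):
        for index, field in enumerate(table.findall("FIELD")):
            if field.get("ID") == target_id:
                return [row.findall("TD")[index].text or ""
                        for row in table.iter("TR")]
    # Rule 2: @ref to another PARAM means its value, copied literally.
    for other in root.iter("PARAM"):
        if other.get("ID") == target_id:
            return [other.get("value")]
    # Rule 3: anything else is an error.
    raise ValueError("PARAM/@ref %r is neither a FIELD nor a PARAM" % target_id)


root = ET.fromstring(DOC)
for param in root.iter("PARAM"):
    if param.get("ref"):
        print(param.get("name"), resolve_ref(root, param))
```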

(6) @value="" universally valid

VOTable 1.3 says "If the TD element is empty (<TD/> or <TD></TD>) the
cell is considered to contain no data, i.e. to be null."  Since we're
saying "parse PARAM/@value like you parse TABLEDATA cells" (I really
think VOTable itself should say this in sect. 4.1, last bullet point), I
*think* we're fine saying 'Generate your UI from all the PARAMs that
have value=""' (which, in effect, we're doing right now, and we'd
certainly be doing if we adopted (5)).  I'd much prefer if that were made
explicit somewhere.  Bug me for a doc patch if everyone interested agrees.
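
In client terms, the rule boils down to a single test per PARAM.  A
minimal Python sketch, again on a made-up, namespace-free snippet and
with a hypothetical function name: PARAMs with a non-empty value are
passed on unchanged (the XSLT's hidden inputs), the others become
widgets.

```python
import xml.etree.ElementTree as ET

GROUP = """
<GROUP name="inputParams">
  <PARAM name="ID" datatype="char" arraysize="*"
    value="ivo://org.gavo.dc/~?lswscans/data/part2/Walz/FITS/D774.fits"/>
  <PARAM name="RA" datatype="double" arraysize="2" xtype="interval"
    unit="deg" value=""/>
  <PARAM name="DEC" datatype="double" arraysize="2" xtype="interval"
    unit="deg" value=""/>
</GROUP>
"""


def split_params(group):
    """Return (constants, editable): a name->value dict and a list of names."""
    constants, editable = {}, []
    for param in group.iter("PARAM"):
        if param.get("value", "") == "":
            editable.append(param.get("name"))  # null value: needs a widget
        else:
            constants[param.get("name")] = param.get("value")
    return constants, editable


constants, editable = split_params(ET.fromstring(GROUP))
print("pass through:", constants)
print("build widgets for:", editable)
```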

(7) Behaviour for queries with only ID given

I think we should leave open at this point what should happen if a SODA
service is called with an ID only.  While people have suggested it
should return some sort of "self-description", I'd say it doesn't make
sense to replicate what datalink already does (to at least my
close-to-perfect satisfaction), and I don't think it makes much
sense to run both SODA and datalink on the same endpoint (in which case
you'd have that behaviour already).  

After these considerations, my services just return the full dataset
when you just pass in ID, which I think is the behaviour one would
naively expect ("no constraints").  But we might still find that's
actually not such a good idea.

Hence, I'd say we explicitly say "This is undefined behaviour at this
point.  Clients must not have any expectations as to what is returned."

Same thing for requests missing all parameters.  I doubt there's
something smart we can do with that, but who knows?  Let's just keep it
undefined for now.

So -- that's it from me.  There's nothing else really relevant up my
sleeve on SODA.

Ain't that nice?

       -- Markus

[1] There's another technicality: client-side XSLT plus DOM operations
plus javascript is a mixture which current browsers tend to be a bit
buggy with.  It's a bit like in the 90s with Javascript alone.  I
therefore do server-side XSLT right now if I guess a Datalink request
comes from a browser.  I think with a bit of experimentation and
restriction to known-good parts we can push XSLT to the client again,
but I'll invest that work only if I know others will re-use the result.