SODA, remaining gripe collection

Fri Apr 8 10:48:04 CEST 2016

Hi Markus,

So this second reply focuses on the 'nerd' portion of your email :) 

> -----Original Message-----
> From: dal-bounces at ivoa.net [mailto:dal-bounces at ivoa.net] On Behalf Of
> Markus Demleitner
> Sent: Wednesday, 30 March 2016 1:21 AM
> To: dal at ivoa.net
> Subject: SODA, half-client and remaining gripe collection
> 
> Dear Colleagues,
> 
[snip]

> II. Technics
> ------------
> 
> 
> (3) RA and DEC
> 
> The stylesheet really needs to know where it is looking in order to generate a
> UI.  The easiest way to achieve that is by adding RA and DEC parameters.  True,
> this is a bit tricky at the stitching line (easily solved by allowing RA<0) and at
> the pole, which is ugly, but unavoidable with POS, too, since RA and DEC are
> essentially just a rationalisation of POS' RANGE.
> 
> The alternative would be Pat's CIRC and POLY from his Feb 29th mail
> http://mail.ivoa.net/pipermail/dal/2016-March/007370.html, except that
> 
> > Pat:
> > For CIRC and POLY the service includes a "maximum sensible extent"
> > with which to perform cutouts.
> 
> I'm not sure I care a lot about the maximum sensible extents.  The circle,
> however, would let you say where there is data, just as for RA/DEC.  So, if
> there's really a good reason against RA and DEC and they don't make it to the
> standard, I could see an escape by giving actual per-element limits in CIRCLE,
> perhaps like this:
> 
>     <PARAM name="CIRC" datatype="double" ucd="obs.field"
>       unit="deg" xtype="circle" arraysize="3" value="">
>       <VALUES>
>         <MIN value="230.4 32.1 0.1" />
>         <MAX value="236.4 34.7 3.5" />
>       </VALUES>
>     </PARAM>
> 
> -- which would say "this is a dataset covering RA 230.4 .. 236.4, DEC
> 32.1 .. 34.7, and you can choose cutout radii between 0.1 and 3.5".
> Which would be ok with me, except we're probably violating VOTable.
> I think all existing practice is that MIN and MAX are not array-valued but give
> the values for the individual items of an array (not sure if the standard itself is
> precise enough there).
> 
> Be that as it may, I'd still much prefer
> 
>       <PARAM arraysize="2" datatype="double" name="DEC" ucd="pos.eq.dec"
>         unit="deg" value="" xtype="interval">
>         <DESCRIPTION>The latitude coordinate</DESCRIPTION>
>         <VALUES>
>           <MIN value="32.1"/>
>           <MAX value="34.7"/>
>         </VALUES>
>       </PARAM>
>       <PARAM arraysize="2" datatype="double" name="RA" ucd="pos.eq.ra"
>         unit="deg" value="" xtype="interval">
>         <DESCRIPTION>The longitude coordinate</DESCRIPTION>
>         <VALUES>
>           <MIN value="230.4"/>
>           <MAX value="236.4"/>
>         </VALUES>
>       </PARAM>
> 
> -- which uses VOTable as is common practice and otherwise requires zero
> additional definitions.  Also, it's probably what 95% of users and implementors
> would ask for anyway.
> 
> Any elucidation as to why we shouldn't just do it would be appreciated.
> 

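This approach would certainly work in many cases, but for an image that covers either of the celestial poles I don't think it could accurately describe the extent: the RA interval degenerates to the full 0..360 range, so the two intervals describe a ring or cap around the pole rather than the image itself.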

The simple outer-bounds approach also has problems away from the celestial equator, because an image that is square on the sky will not be a rectangle when projected into equatorial coordinates. See https://confluence.csiro.au/display/~dem040/Equatorial+Coordinates for an example of the effect. For larger images this means you could specify a cutout that lies within the listed bounds but does not intersect the image.
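
A minimal sketch to make both effects concrete, assuming an image that is a square of half-width `half` (radians) in the tangent plane of a gnomonic (TAN) projection; the helper names and the image geometry are hypothetical. It derives the RA/DEC outer bounds by sampling the image edges, then shows (a) a point inside those bounds that is off the image at DEC 60, and (b) that for an image containing the north pole RA spans almost the full circle while the edge-derived DEC maximum stays below 90.

```python
import math


def tan_to_sky(xi, eta, ra0, dec0):
    """Inverse gnomonic (TAN) projection: tangent-plane (xi, eta) in radians
    around the tangent point (ra0, dec0) in degrees -> (ra, dec) in degrees."""
    d0 = math.radians(dec0)
    rho = math.sqrt(1.0 + xi * xi + eta * eta)
    dec = math.asin((math.sin(d0) + eta * math.cos(d0)) / rho)
    dra = math.atan2(xi, math.cos(d0) - eta * math.sin(d0))
    return (ra0 + math.degrees(dra)) % 360.0, math.degrees(dec)


def sky_to_tan(ra, dec, ra0, dec0):
    """Forward gnomonic projection; None for points 90 degrees or more
    from the tangent point (they have no projection)."""
    a, d = math.radians(ra - ra0), math.radians(dec)
    d0 = math.radians(dec0)
    cos_c = math.sin(d0) * math.sin(d) + math.cos(d0) * math.cos(d) * math.cos(a)
    if cos_c <= 0:
        return None
    xi = math.cos(d) * math.sin(a) / cos_c
    eta = (math.cos(d0) * math.sin(d) - math.sin(d0) * math.cos(d) * math.cos(a)) / cos_c
    return xi, eta


def outer_bounds(ra0, dec0, half, steps=400):
    """RA/DEC min/max of a square image |xi|, |eta| <= half, from its edges."""
    ras, decs = [], []
    for i in range(steps + 1):
        t = -half + 2.0 * half * i / steps
        for xi, eta in ((t, half), (t, -half), (half, t), (-half, t)):
            ra, dec = tan_to_sky(xi, eta, ra0, dec0)
            ras.append(ra)
            decs.append(dec)
    return min(ras), max(ras), min(decs), max(decs)


def on_image(ra, dec, ra0, dec0, half):
    """True if (ra, dec) falls inside the square image footprint."""
    proj = sky_to_tan(ra, dec, ra0, dec0)
    return proj is not None and abs(proj[0]) <= half and abs(proj[1]) <= half


# (a) A hypothetical ~11 degree wide image centred on RA 180, DEC 60.
ra_min, ra_max, dec_min, dec_max = outer_bounds(180.0, 60.0, 0.1)
probe = (ra_max - 0.5, dec_min + 0.5)  # just inside one corner of the bounds
print(f"bounds: RA {ra_min:.1f}..{ra_max:.1f}, DEC {dec_min:.1f}..{dec_max:.1f}")
print("probe on image:", on_image(*probe, 180.0, 60.0, 0.1))

# (b) The same image size centred on DEC 88, so it contains the pole.
p_ra_min, p_ra_max, p_dec_min, p_dec_max = outer_bounds(180.0, 88.0, 0.1)
print(f"polar bounds: RA {p_ra_min:.1f}..{p_ra_max:.1f}, DEC up to {p_dec_max:.1f}")
print("pole on image:", on_image(0.0, 90.0, 180.0, 88.0, 0.1))
```

In case (a) the probe is inside the listed bounds but off the image; in case (b) the pole is on the image although the edge-derived DEC bounds do not reach 90.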

> 
> (4) the async/multiple param thing
> 
> I've already said something about having multiple values for a single param
> before, but as Pat mentioned it in his mail
> http://mail.ivoa.net/pipermail/dal/2016-March/007372.html let me just briefly
> comment on one thing I couldn't help thinking:
> 
> > Pat wrote:
> >
> > input file I can use with curl to post multiple positional cutouts
> > (note the mix of POS, CIRC, and POLY):
> >
> > ===multicut.txt===
> > POS=circle 140 0 0.1&
> > POS=circle 66 10 20&
> [...]
> 
> Hm... If you found yourself writing your parameters to a file anyway -- why not
> then do the multi-cutout by just writing one set of parameters per VOTable line
> -- it'd be much simpler and more predictable all around
> -- and really not hard to implement (with some trivial conventions).
> 
> But anyway: Perhaps we should defer the whole multiple param thing until the
> next version?  I see some urgency for enumerated params, but perhaps even
> that can still wait.
> 
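
For comparison, the one-row-per-cutout alternative could look roughly like the sketch below. It assumes a hypothetical convention in which each FIELD name is a SODA parameter name and each TABLEDATA row is one complete request; the dataset IDs are placeholders, the POS values are copied from Pat's file, and the VOTable namespace is omitted.

```python
import xml.etree.ElementTree as ET

# Hypothetical convention: FIELD names are parameter names, one row per cutout.
requests = [
    {"ID": "ivo://example.org/data?img1", "POS": "circle 140 0 0.1"},
    {"ID": "ivo://example.org/data?img2", "POS": "circle 66 10 20"},
]


def cutout_table(rows, columns=("ID", "POS")):
    """Serialise a list of parameter dicts as a minimal TABLEDATA VOTable."""
    votable = ET.Element("VOTABLE", version="1.3")
    table = ET.SubElement(ET.SubElement(votable, "RESOURCE"), "TABLE")
    for name in columns:
        ET.SubElement(table, "FIELD", name=name, datatype="char", arraysize="*")
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")
        for name in columns:
            # A missing parameter becomes an empty TD, i.e. null in VOTable.
            ET.SubElement(tr, "TD").text = row.get(name, "")
    return ET.tostring(votable, encoding="unicode")


print(cutout_table(requests))
```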

We’ve found multiple-valued parameters extremely useful for scripting. Working within the UWS standard, we provide a set of parameter endpoints that allow the parameters to be built up and then evaluated as a set at runtime. The client can specify a set of IDs and a set of cutout criteria and get back every cutout that satisfies the criteria. I’ll share some of those scripts next week. All of our cutouts are currently provided only through async services, as we may need to pull the files in from tape if they haven’t been used in a while.
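
A minimal client-side sketch of the repeated-parameter approach, assuming the service accepts repeated ID and POS keys in one form-encoded POST body and pairs every ID with every cutout criterion. The IDs are placeholders, `expand` is a hypothetical stand-in for the service side, and the step that drops pairs whose cutout misses the dataset is not shown.

```python
from itertools import product
from urllib.parse import parse_qsl, urlencode

ids = ["ivo://example.org/data?img1", "ivo://example.org/data?img2"]
cutouts = ["circle 140 0 0.1", "circle 66 10 20"]

# One form-encoded body with repeated keys, suitable for POSTing to a UWS job.
body = urlencode([("ID", i) for i in ids] + [("POS", p) for p in cutouts])


def expand(form_body):
    """Hypothetical server-side view: every ID paired with every POS."""
    params = parse_qsl(form_body)
    dataset_ids = [v for k, v in params if k == "ID"]
    positions = [v for k, v in params if k == "POS"]
    return list(product(dataset_ids, positions))


for dataset, pos in expand(body):
    print(dataset, "->", pos)
```

`urlencode` keeps repeated keys in order and `parse_qsl` returns them in the same order, so no extra convention is needed to send several values for one parameter.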

> 
> (5) @value and @ref semantics documented for PARAMs
> 
> I've found the inclusion of "constant" params makes my life a bit easier.
> So, in my service there's
> 
>   <PARAM name="ID" [...]
> 
>        value="ivo://org.gavo.dc/~?lswscans/data/part2/Walz/FITS/D774.fits">
> 
>        <DESCRIPTION>The publisher DID of the dataset of interest</DESCRIPTION>
>   </PARAM>
> 
> with the understanding that clients should just propagate on that PARAM (the
> XSLT does, in input type="hidden").
> 
> One could do this by manipulating the accessURL, but I'd feel a bit better if we
> were explicit about these things being params -- e.g., it might be a nice thing to
> show informationally in UIs.
> 
> Also, I believe the PARAM/@ref semantics that's introduced in datalink
> somewhat off-handish should be defined a bit more precisely.  Perhaps this
> would be an erratum to Datalink.  I guess we would need to say something like
> 
>   If PARAM/@ref points to a FIELD, the parameter's values must be taken
>   from the corresponding column of the table embedding the FIELD.  An
>   appropriate widget type in a UI could be a select box.
> 
>   [<Do we want this?> If PARAM/@ref points to another PARAM, that param's
>   value is simply copied literally]
> 
>   A PARAM/@ref pointing to any other VOTable element is an error.
> 

We've used something similar, but a constant PARAM seems a good approach, as long as datalink responses are restricted to a single ID, since such a PARAM can only carry one value.
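
To make the @ref rules proposed above concrete, here is a minimal sketch of how a client might turn service-descriptor PARAMs into UI widgets: @ref to a FIELD yields a select box over that column, @ref to a PARAM copies its value, any other target is an error, and a non-empty @value is propagated as a hidden constant. The document, element IDs, and dataset IDs are made up, and VOTable namespaces are omitted.

```python
import xml.etree.ElementTree as ET

DOC = """<VOTABLE>
 <RESOURCE type="results">
  <TABLE>
   <FIELD ID="pubdid" name="ID" datatype="char" arraysize="*"/>
   <DATA><TABLEDATA>
    <TR><TD>ivo://example.org/data?img1</TD></TR>
    <TR><TD>ivo://example.org/data?img2</TD></TR>
   </TABLEDATA></DATA>
  </TABLE>
 </RESOURCE>
 <RESOURCE type="meta" utype="adhoc:service">
  <PARAM name="ID" datatype="char" arraysize="*" value="" ref="pubdid"/>
 </RESOURCE>
</VOTABLE>"""


def resolve_ref(root, ref):
    """Admissible values for a PARAM whose @ref is `ref` (rules as proposed)."""
    for table in root.iter("TABLE"):
        for index, field in enumerate(table.findall("FIELD")):
            if field.get("ID") == ref:
                # @ref -> FIELD: take the values from that column.
                return [tr[index].text for tr in table.iter("TR")]
    for param in root.iter("PARAM"):
        if param.get("ID") == ref:
            # @ref -> PARAM (still open for discussion): copy its value literally.
            return [param.get("value", "")]
    # @ref to any other element, or to nothing at all, is an error.
    raise ValueError(f"PARAM/@ref {ref!r} does not point to a FIELD or PARAM")


def widget_for(root, param):
    """Pick a (hypothetical) UI widget type and its values for one PARAM."""
    if param.get("ref"):
        return "select", resolve_ref(root, param.get("ref"))
    if param.get("value", ""):
        # Constant PARAM: propagate unchanged, e.g. as <input type="hidden">.
        return "hidden", [param.get("value")]
    return "input", []


root = ET.fromstring(DOC)
for param in root.iter("PARAM"):
    print(param.get("name"), widget_for(root, param))
```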

> 
> (6) @value="" universally valid
> 
> VOTable 1.3 says "If the TD element is empty (<TD/> or <TD></TD>) the cell is
> considered to contain no data, i.e. to be null."  Since we're saying "parse
> PARAM/@value like you parse TABLEDATA cells" (I really think VOTable itself
> should say this in sect. 4.1, last bullet point), I
> *think* we're fine saying: generate your UI from all the PARAMs that have
> value="" (which, in effect, we're doing right now, and we'd certainly be doing if
> we adopt (5)).  I'd much prefer if that were made explicit somewhere.  Bug me
> for a doc patch if everyone interested agrees.
> 

Yes that sounds good to me.
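
A minimal sketch of that rule, assuming PARAM/@value is parsed exactly like a TABLEDATA cell so that an empty value is null. Only char and a few numeric datatypes are handled, and the service descriptor is made up (CIRC follows the example in (3)).

```python
import xml.etree.ElementTree as ET

NUMERIC = {"short": int, "int": int, "long": int, "float": float, "double": float}


def parse_cell(text, datatype, arraysize=None):
    """Parse a value the way a TABLEDATA cell is parsed; empty means null (None).
    Only char and a few numeric datatypes are handled in this sketch."""
    if text is None or text.strip() == "":
        return None
    if datatype == "char":
        return text
    items = [NUMERIC[datatype](token) for token in text.split()]
    return items if arraysize else items[0]


SERVICE = """<RESOURCE>
  <PARAM name="ID" datatype="char" arraysize="*" value="ivo://example.org/data?img1"/>
  <PARAM name="CIRC" datatype="double" arraysize="3" xtype="circle" value=""/>
  <PARAM name="RA" datatype="double" arraysize="2" xtype="interval" value=""/>
</RESOURCE>"""


def ui_params(resource_xml):
    """Names of the PARAMs a UI should offer: those whose @value parses to null."""
    resource = ET.fromstring(resource_xml)
    return [p.get("name") for p in resource.iter("PARAM")
            if parse_cell(p.get("value"), p.get("datatype"), p.get("arraysize")) is None]


print(ui_params(SERVICE))
```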

> 
> (7) Behaviour for queries with only ID given
> 
> I think we should leave open at this point what should happen if a SODA service
> is called with an ID only.  While people have suggested it should return some
> sort of "self-description", I'd say it doesn't make sense to replicate what
> datalink already does (to at least my close-to-perfect satisfaction), and I don't
> think it makes much sense to run both SODA and datalink on the same endpoint
> (in which case you'd have that behaviour already).
> 
> After these considerations, my services just return the full dataset when you
> just pass in ID, which I think is the behaviour one would naively expect ("no
> constraints").  But we might still find that's actually not such a good idea.
> 
> Hence, I'd say we explicitly say "This is undefined behaviour at this point.
> Clients must not have any expectations as to what is returned."
> 

Our service does the same: no cutout params means the whole image/cube is returned. This could easily be moved to a different endpoint if the standard disallowed it, though.

> Same thing for requests missing all parameters.  I doubt there's something
> smart we can do with that, but who knows?  Let's just keep it undefined for
> now.

With async we do allow a request to be built up over time, so you could conceivably create a job without an ID and add the ID later. If the job is started without an ID, it goes straight to the ERROR phase.
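
A toy model of this behaviour (not the actual service code): parameters accumulate while the job is PENDING, a job run without an ID ends in ERROR, and a job with IDs but no cutout parameters returns the whole dataset. The phase names are the UWS ones, but the class, the set of cutout parameter names, and the result format are hypothetical, and the QUEUED/EXECUTING phases are skipped.

```python
from dataclasses import dataclass, field

# Hypothetical subset of cutout parameter names (taken from this thread).
CUTOUT_PARAMS = {"POS", "CIRC", "POLY"}


@dataclass
class CutoutJob:
    """Toy UWS-style job: parameters accumulate while PENDING, checked on run.
    Real UWS also passes through QUEUED and EXECUTING; that is skipped here."""
    params: list = field(default_factory=list)  # (name, value) pairs, repeats allowed
    phase: str = "PENDING"
    error: str = ""

    def add(self, name, value):
        if self.phase != "PENDING":
            raise RuntimeError("parameters can only be added while the job is PENDING")
        self.params.append((name, value))

    def run(self):
        ids = [v for k, v in self.params if k == "ID"]
        if not ids:
            # Started without an ID: go straight to ERROR.
            self.phase, self.error = "ERROR", "no ID given"
            return []
        cuts = [(k, v) for k, v in self.params if k in CUTOUT_PARAMS]
        self.phase = "COMPLETED"
        if not cuts:
            # No cutout parameters: whole image/cube (None means "no cutout").
            return [(i, None) for i in ids]
        return [(i, cut) for i in ids for cut in cuts]


job = CutoutJob()
job.add("POS", "circle 140 0 0.1")            # cutout criterion added first ...
job.add("ID", "ivo://example.org/data?img1")  # ... and the ID added later
print(job.run(), job.phase)
```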

Cheers,
James Dempsey.