SODA gripes (1): The Big One

Tue Jan 5 18:47:33 CET 2016

Dear all,

     This is a long email and only a partial answer to what Markus is 
saying. The SODA service specification largely shapes how we conceive 
the future landscape of DAL services. So sorry about the length, but I 
think it is necessary, and other emails will follow.

    Happy new year anyway. I wish you every success in your VO work (and 
any other personal or business activities) :-) :-)

A )  To summarize my concerns with Markus' approach below, I could say 
the following.
       a ) It is true that the main point of discussion is the 
description of the PARAMETER domains, mainly when it is not directly 
available in the client (for example via the metadata provided by the 
discovery phase). It is also true that for custom parameters (as for 
custom service parameters) nothing is discoverable.
       b )  My point is that we can postpone solving that use case FOR 
NOW, for three reasons:
             1 ) The current draft fulfils the basic requirements of the 
CSP in 95% of the cases. We can wait for the next versions of 
ObsTAP/DALI/SIAV2 and SODA to solve the remaining 5%. This is the point 
I develop below (B and C).
             2 ) The discussion on the solution proposed in Markus' 
version of the WD promises to be long. I am strongly against some of the 
features involved. These include proposing a competing technology for 
describing the domains when we already have the description in the 
ObsCore table. They also include describing the data content in the 
DataLink response, which should be agnostic to the content of the 
dataset whose resources it links. I will develop this in another email.
             3 ) The current draft is totally open to future evolution 
on this point. It may be consistent with the solution proposed by Markus 
and with the one I have in mind. It may also be consistent with any 
other solution which could emerge from the WG discussion. Rapidly 
adopting the current draft (with minor updates) will not close off 
anything for the future. It allows feedback from implementers and seems 
to be a reasonable incremental development process.

B ) This is now a reminder of the CSP priorities. Remember that data 
discovery is done via ObsTAP 1.0 (1.1 soon) or SIAV2.0. Both are IVOA 
recommendations now. Data access and cutouts are done via the acref 
field in the query response (full download) or via a SODA service. The 
SODA service is referenced from the discovery response using DataLink 
technology. DataLink 1.0 is also an IVOA recommendation. SODA 1.0 is the 
missing piece of the puzzle.

CSP said:
a ) Data Discovery (Query)
A service shall return to the client a list of observations, and
the corresponding metadata for each observation, meeting the
user imposed constraints. In the event that the user places no
constraints, the entire list of observations, and the corresponding
metadata for each data set, shall be returned. In the event that no
data meet the user's constraints, the service shall indicate the
absence of any matches.

b ) Data Access
* Once a user has the list of observations that satisfy the
constraints, they select all or a subset of the observations and:
* Download the complete science data for each of the
selected observations (the service shall return the complete
  multidimensional science data and metadata for each selected
observation) or;
* Download simple cutouts of the science data for each of the
selected observations (the service shall be able to extract and
return a user specified subset of the complete multidimensional
science data and metadata for each selected observation).

Simple Cutout
* For a simple cutout, the user specified subset is restricted to
be a contiguous interval within each dimension of the multidimensional 
science data.
The user should *not* be allowed to specify subsets with "gaps" or 
resampling or anything like that.
* Spatial: a circle (a coordinate and a radius)
* Energy: one interval (from energy1 to energy2)
* Time: one interval (from time1 to time2)
* Polarization: a list

C ) With the current recommendations and the SODA WD as proposed by the 
WD editor, what can data services implement? How can IVOA applications 
(service clients) work with that and serve the end-user's needs?

     a ) You MUST build a SIAV2.0 service or an ObsTAP service dedicated 
to your data cubes. Or both.
     b ) You MAY build a DataLink service providing resources attached 
to the data cubes.
     c ) You MUST build a SODA service providing cutout facilities for 
your data cubes.
     d ) The SODA service SHOULD be referenced from the SIAV2.0 or ObsTAP 
response via a service descriptor (with an appropriate reference to the 
publisher DID column) (case d1). Or it SHOULD be referenced in the 
DataLink resource response (if it exists) with an appropriate reference 
to the ID column in this response (case d2).

     Possible scenario for an end-user (with variations)

      The end user queries the SIAV2.0 service or the ObsTAP service via 
a client. (If he/she uses an ObsTAP service, he/she should retrieve 
enough fields to describe the dataset boundaries.)

       The end user can read the response within the client interface 
and select a dataset of interest (or several). In some cases he/she can 
download the full dataset by using the acref field in the response. (By 
the way, the acref field may sometimes contain the URL of a DataLink 
resource response instead, see below; the format field makes the 
distinction.)
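
       As a minimal sketch of how a client might make that distinction 
(Python; the function name and the sample rows are hypothetical, and the 
test assumes the service labels DataLink responses with the 
content=datalink media type parameter as described in DataLink 1.0):

```python
def is_datalink(access_format: str) -> bool:
    """Return True if the format string announces a DataLink document."""
    # Split "type/subtype;param=value" into the media type and its parameters.
    media_type, *params = [p.strip().lower() for p in access_format.split(";")]
    return media_type == "application/x-votable+xml" and "content=datalink" in params


# Hypothetical rows of a discovery response: (access URL, format).
rows = [
    ("http://example.org/cubes/1.fits", "application/fits"),
    ("http://example.org/links?ID=obs1",
     "application/x-votable+xml;content=datalink"),
]
for url, fmt in rows:
    action = "fetch links from" if is_datalink(fmt) else "download"
    print(action, url)
```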

     In case d1, for a cutout, the client SHOULD be able to read the 
SODA service descriptor in the query response and open an appropriate 
interface window with empty SODA standard parameters and the ID 
parameter filled with the publisher DID of the dataset chosen by the 
end-user. The user can fill in POS, BAND, TIME and POL according to the 
limits given by the query response.
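
     When the discovery row is still at hand, a client could compare 
what the user typed with those limits. A minimal sketch (Python; the 
record layout and the numbers are made up, em_min/em_max and t_min/t_max 
are the ObsCore column names, and clip_interval is a hypothetical 
helper, not something the draft defines):

```python
def clip_interval(requested, limits):
    """Intersect two closed intervals; return None if they do not overlap."""
    lo = max(requested[0], limits[0])
    hi = min(requested[1], limits[1])
    return (lo, hi) if lo <= hi else None


# Hypothetical discovery row for the dataset the user selected.
row = {"em_min": 4.0e-7, "em_max": 7.0e-7, "t_min": 55000.0, "t_max": 55010.0}

band = (5.0e-7, 5.5e-7)     # wavelength interval in metres, inside the dataset
time = (54990.0, 55005.0)   # MJD interval, starts before the dataset does
late = (56000.0, 56001.0)   # MJD interval, entirely outside the dataset

print(clip_interval(band, (row["em_min"], row["em_max"])))  # unchanged
print(clip_interval(time, (row["t_min"], row["t_max"])))    # clipped
print(clip_interval(late, (row["t_min"], row["t_max"])))    # None: warn the user
```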

       In the DataLink response (if a DataLink service is available) the 
SODA descriptor could also be found (case d2). A client able to read a 
DataLink response will discover it and should be able to open an 
interface window as above.
       Alternatively (or in addition) the DataLink response could 
provide a fixed link with a pre-computed SODA URL for each dataset. In 
that case the limits of the cutout on each axis will probably be the 
same as those used for the ROI in the discovery phase. This assumes that 
when you query a discovery service, the ROI you use also defines the 
limits you want to set on the cutouts you will retrieve from the 
discovered datasets. This sounds reasonable in many cases.
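
       A minimal sketch of how such a pre-computed URL might be built 
(Python). The base URL, the identifier and the ROI numbers are 
hypothetical, and the value syntax used here (POS as "CIRCLE ra dec 
radius", BAND and TIME as "lo hi", POL repeated once per state) is my 
assumption about the draft, to be checked against the WD:

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def soda_cutout_url(base_url, publisher_did, circle=None, band=None,
                    time=None, pol=()):
    """Build a cutout URL for one dataset from the discovery ROI."""
    params = [("ID", publisher_did)]
    if circle is not None:
        ra, dec, radius = circle
        params.append(("POS", f"CIRCLE {ra} {dec} {radius}"))
    if band is not None:
        params.append(("BAND", f"{band[0]} {band[1]}"))
    if time is not None:
        params.append(("TIME", f"{time[0]} {time[1]}"))
    for state in pol:
        params.append(("POL", state))
    # urlencode escapes the identifier and the spaces inside the values.
    return base_url + "?" + urlencode(params)


url = soda_cutout_url(
    "http://example.org/soda/sync",        # hypothetical service endpoint
    "ivo://example.org/cubes?obs1",        # hypothetical publisher DID
    circle=(10.0, -20.0, 0.5),
    band=(5.0e-7, 5.5e-7),
    pol=("I", "Q"),
)
print(url)
print(parse_qs(urlsplit(url).query))       # decoded parameters, for checking
```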

       If a client is not smart enough to manage discovery service 
querying, the SODA service interface, DataLink response display and 
interpretation, and possibly data cube visualization, the end-user may 
use several applications combined via SAMP. This makes no difference as 
long as all applications run on the same desktop.

        Of course there is no way here to check automatically and a 
priori that SODA parameter values entered by hand are correct, or to 
manage free PARAMETERS (except blindly, which may be reasonable as a 
first step). But I think the basic CSP requirements are fulfilled by the 
current draft. Refinement and sophistication will come in the next 
version and could address Markus' concerns.

Cheers
François

On 05/01/2016 08:51, Markus Demleitner wrote:
> Dear Colleagues,
>
> On Thu, Dec 24, 2015 at 03:40:21PM +0100, François Bonnarel wrote:
>>       This close-to-Christmas email is to announce that the SODA1.0
>> (previously known as AccessData1.0) WD was released last Monday. See:
>> http://www.ivoa.net/documents/SODA/20151221/index.html
>>        There have been very long discussions among the authors and we made
>> some progress towards convergence. However, there are still points that are
>> hotly debated. It is my responsibility as editor and DAL chair to now
>> provide a version which some of us regard as insufficient but which, in my
>> view, nonetheless fulfils the CSP and basic community requirements.
>> Probably the discussion will start very soon on the DAL mailing list.
> Indeed -- there are of order 10 topics on which I'd like to see
> discussion on this draft (there's a list of them at the end of this
> mail).
>
> I'd like to start with the Big One (this is probably also going to be
> the longest mail in this series; please indulge me).  In one
> sentence, it's
>
>    The protocol must be written such that clients can work out what
>    parameter values will probably yield useful results.
>
> This, in my opinion, is really the make-or-break thing, i.e., what
> decides whether what we write will actually be useful as a generic
> access protocol, or whether it will be a source of constant annoyance
> all around[1].
>
> So -- even if you have only marginal interest in SODA, and even if this
> is a long mail, please take a few 10s of minutes to try and make up your
> mind based on the two drafts mentioned below.  You'd have my blessing to
> ignore the remaining SODA discussions if you are so inclined.
>
> The premise above applied to SODA becomes: All parameters (except for
> the oddball POS, which really has a special position; but I'll revisit
> that in a later mail) must be fully declared by the service (including
> VALUES and OPTION elements as appropriate) and be systematically
> discovered for UI/API generation by the client in a SODA exchange.
>
> I've written standards prose for that already that I think is about what
> a standards document can do to mandate such practices (of course, this
> is largely a matter of implementation style, which is hard to regulate).
> It's been in the text in volute rev. 3192 [2]; for your convenience, I
> have built the document as of that revision and put it on
> http://docs.g-vo.org/SODA-r3192.pdf.  The contentious prose starts at
> page 8 -- if you'd be so kind as to read sect. 2.6 ("three-factor
> semantics", 4 pages).
>
> You can compare with sect. 2.6 as published (the published version is
> in effect volute rev. 3200, in case you'd like to see a diff).  Let me
> again bambi-eye all around and ask everyone with even a remote
> involvement in the cube thing to try and make up their minds and speak
> up, even if this thing appears a bit complicated at first, in particular
> because, in a way, it's really part of datalink and cannot be understood
> without it (I've argued it should really have been part of datalink in
> the first place).
>
> If there's anything we can do to help comprehensibility, let us know,
> too.
>
>
> Meanwhile, allow me to once more try to argue why it is so important to
> urge services to provide consistent, dataset-specific metadata and the
> clients to use it in SODA.
>
> SODA is designed to operate on concrete datasets -- you've discovered
> something that looks like it might be interesting, but you're only
> interested in a small part or a particular mogrification of the dataset,
> so your client gets information on the dataset and then figures out what
> to do to retrieve the information relevant to you.  This means that you
> cannot just put in some value into a service parameter and watch what's
> coming out -- you'll almost always get nothing back because the coverage
> of a typical dataset is small and not easily predictable.
>
> The "horror vacui", the dreaded moment in GUIs when an input field is
> displayed and users have no idea what to put there, with SODA therefore
> isn't a minor usability issue, it's a protocol killer.
>
> It has been put forward that clients could infer the domains of the
> parameters (the "good" values) from a previous discovery query (e.g.,
> from SIAv2, they'd know the spatial and spectral coverage).
> Unfortunately, this line of reasoning is flawed in at least two respects:
>
> (1) The results of the discovery query might not be available to the
> client dealing with the SODA descriptor
>
> (2) This technique breaks down with the first custom parameter (is the
> corresponding item in the discovered metadata?  And what does the
> parameter correspond to in the first place?), and that would, again, be
> a killer for SODA's usefulness.
>
> Let me dwell on both points for a little while.
>
> Ad (1).   I expect the most common source for SODA descriptors will be
> Obscore (and it's a CSP-official usecase in case you don't agree).
> There, the access URL for cubes and other large datasets won't be the
> dataset itself, because you don't want people blindly pulling several
> 100s of gigabytes (or just one gigabyte, really).  Instead, you return a
> datalink document, which contains the SODA descriptor.  We at Heidelberg
> already occasionally do that, the CADC has datalink documents throughout
> IIRC (although I think they don't have custom SODA descriptors yet).
>
> To query Obscore, people typically use TAP, and their queries  will
> fairly typically not be just "select * from" but very possibly rather
> something like "select access_url, target_name from ivoa.obscore
> join...."  Hence, a client doesn't have access to the obscore metadata,
> and even if it had, it might have a hard time recognising it in the
> possibly wide result tuples coming back from the database.
>
> Another scenario in which dataset metadata possibly obtained during
> discovery would get lost is when sending the datalink document (URL)
> through SAMP.  Whether we like it or not, our users love SAMP more than
> anything else we've come up with so far, and telling them SODA doesn't
> play with SAMP isn't going to make SODA popular.
>
> Ad (2).  The dataset operations that data providers will want to enable
> through SODA are essentially endless -- rebinning, renormalisation,
> format conversion, "logical" cutouts (e.g., on selected extensions
> only), etc.  Making SODA something that (to some extent) works with a
> select set of standard parameters but fails (in the sense of: client
> behaviour is unpredictable) as soon as a service needs a bit more is
> going to render it almost useless, and data providers will keep doing
> things through custom web pages.  It's the situation we have with SSAP;
> although that, as a discovery protocol, at least can limp along to some
> extent.  SODA, as an access protocol, wouldn't even limp.
>
> So, we need to say: "A well-behaved SODA client will do X and Y and
> *not* ignore Z" to give data providers the confidence that independent
> of the client their users choose they still see whatever operations they
> consider important.  That's what I've tried in rev. 3192 section 2.6.
>
> As an additional indication that full metadata in the SODA descriptor is
> a very good idea, let me mention in passing that
>
> (3) it would enable usable interfaces in stop-gap XSLT-based datalink
> interfaces (as discussed in Sydney,
> http://wiki.ivoa.net/internal/IVOA/InteropOct2015DAL/datalink-xslt.pdf)
>
>
> Just so nobody can say later I didn't warn them: Yes, this means that
> the datalink document that contains the SODA descriptor has to be
> tailored for each dataset.  But that's really not a big deal, because
> the datalink documents themselves vary with dataset (well, typically) --
> previews, plots, provenance, whatever all depend on the dataset.
> Dropping in the limits into the SODA descriptor in addition at least for
> me hasn't been a major additional implementation burden.
>
>
> That's it for my first SODA gripe, and thanks for making it here.  I
> plan to have, roughly weekly, additional SODA gripes, one after the
> other to allow productive discussions on each point.  To give you an
> idea what I have up my sleeve here's a tentative programme:
>
> (2) Spatial coverage discovery and the RA and DEC parameters
> (3) Pixel cutouts: PIXEL_n
> (4) Mandated multiplicities considered harmful
> (5) Behaviour for no-ID queries?  For queries with only ID?
> (6) No gratuitous xtypes
> (7) POS doesn't have an xtype
> (8) Examples stuff: example example, and perhaps a dl-id term?
>
> If this sounds scary, don't worry -- this kind of thing has IMHO worked
> great for datalink.
>
> Cheers,
>
>              Markus
>
>
> [1] Incidentally, it also coincides with my conviction that in protocol
> development in the VO, we should be thinking much more than in the past
> from the client perspective, even if most of the protocol developers sit
> on the server side.
>
> [2] To get the source from the repository, use something like
>
> svn co -r 3192 https://volute.g-vo.org/svn/trunk/projects/dal/SODA
>