dal Digest, Vol 76, Issue 10

Wed Jan 13 15:08:09 CET 2016

Concerning the OFFSET option in ADQL:
The option is user-friendly in ADQL, but it will be harder to
implement, in particular for big tables (as Francois Xavier said, the
result has to be reordered).

The option is clearly useful for pagination.
Could we consider pushing this pagination capability into UWS?
UWS may not be dedicated to tables; however, a UWS capability that
returns a "cut" of a result (delimited by a row offset, or through an
option to split the result) could be another solution.
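
To make the "cut" idea concrete, here is a minimal Python sketch. It
assumes the job result has been computed and stored once on the server
and is then addressed by row index; the function name `result_cut` and
the `offset`/`maxrec` arguments are purely illustrative and are not part
of UWS.

```python
from itertools import islice


def result_cut(rows, offset, maxrec):
    """Return at most `maxrec` rows of a stored result, starting at `offset`.

    Hypothetical helper: the result is computed once, so every cut sees
    the same row order and no re-sorting is needed per page.
    """
    if offset < 0 or maxrec < 0:
        raise ValueError("offset and maxrec must be non-negative")
    return list(islice(rows, offset, offset + maxrec))


# A made-up stored result of ten rows.
stored_result = [("src%03d" % i, 10.0 + i) for i in range(10)]

# Three successive cuts of four rows each; the last one is shorter.
page1 = result_cut(stored_result, 0, 4)
page2 = result_cut(stored_result, 4, 4)
page3 = result_cut(stored_result, 8, 4)
```

Because the cuts are taken from one fixed result, concatenating them
gives back the full table without duplicates or gaps.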

Gilles Landais (CDS)

On 13/01/2016 12:00, dal-request at ivoa.net wrote:
>
> Today's Topics:
>
>     1. Re: ADQL evolution: OFFSET? (Mark Taylor)
>     2. Re: SODA gripes (1): The Big One (Markus Demleitner)
>     3. RE: ADQL evolution: OFFSET? (gerard.lemson at gmail.com)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 12 Jan 2016 11:17:16 +0000 (GMT)
> From: Mark Taylor <M.B.Taylor at bristol.ac.uk>
> To: dal at ivoa.net
> Subject: Re: ADQL evolution: OFFSET?
> Message-ID:
> 	<alpine.LRH.2.20.1601121113400.24326 at andromeda.star.bris.ac.uk>
> Content-Type: text/plain; charset=US-ASCII
>
> On Tue, 12 Jan 2016, Markus Demleitner wrote:
>
>> In terms of whether OFFSET makes sense without ORDER BY: I think
>> that's really a moot point since ADQL has allowed TOP without ORDER BY
>> since the beginning, which is exactly as questionable.  So, I'd claim
> I don't think that's really true.  TOP is just if you want a result
> but you don't want it too big.  Non-ORDERed OFFSET would on the other
> hand presumably be used to provide pagination.  The problem there is
> that you might get different orderings for different queries, so
> end up with duplicate or missing records.
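
The failure mode described above can be sketched in a few lines of
Python. This is a toy model, not an ADQL engine: `run_query` and its
`scan_order` argument are hypothetical, and `scan_order` merely stands
in for whatever row order the database happens to produce on a given
execution.

```python
def run_query(rows, offset, limit, order_key=None, scan_order=None):
    """Toy model of one paginated query execution.

    scan_order: indices giving the arbitrary order in which this
    particular execution produces the rows (an assumption of the sketch).
    order_key: if given, plays the role of ORDER BY.
    """
    if scan_order is not None:
        produced = [rows[i] for i in scan_order]
    else:
        produced = list(rows)
    if order_key is not None:
        produced = sorted(produced, key=order_key)
    return produced[offset:offset + limit]


rows = [1, 2, 3, 4, 5, 6]
forward = [0, 1, 2, 3, 4, 5]    # order produced by the first execution
backward = [5, 4, 3, 2, 1, 0]   # order produced by the second execution

# Two pages of three rows without ORDER BY: each page sees its own order.
unordered = (run_query(rows, 0, 3, scan_order=forward)
             + run_query(rows, 3, 3, scan_order=backward))

# The same two pages with ORDER BY: the order is fixed across executions.
ordered = (run_query(rows, 0, 3, order_key=lambda r: r, scan_order=forward)
           + run_query(rows, 3, 3, order_key=lambda r: r, scan_order=backward))
```

In this contrived case the unordered pages return rows 1 to 3 twice and
never return rows 4 to 6, while the ordered pages cover every row once.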
>
> --
> Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
> m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/
>
>
> ------------------------------
>
> Message: 2
> Date: Tue, 12 Jan 2016 15:25:12 +0100
> From: Markus Demleitner <msdemlei at ari.uni-heidelberg.de>
> To: dal at ivoa.net
> Subject: Re: SODA gripes (1): The Big One
> Message-ID: <20160112142512.GB4381 at victor>
> Content-Type: text/plain; charset=utf-8
>
> Dear DAL,
>
> I'll try to reply to several of the contributions of the last week at
> once; I think the threads are close enough to merit that, although it
> means that this mail, again, is a bit on the long side; but having it
> all in one narrative perhaps saves you time, and it's again
> essentially all on: Domain metadata or no domain metadata?
>
> First,
>
> On Tue, Jan 12, 2016 at 06:57:12AM +0000, James.Dempsey at csiro.au wrote:
>> Parameter ranges are really useful, and one of our early testing tools
>> was a page which has RA/Dec entry fields that default to the centre of
>> the image cube to be processed. However to me aggregate ranges seem a
>> lot less useful, e.g. a range covering three cubes with narrow
>> spectral ranges that are widely spaced from each other will leave
>> plenty of room for empty result sets. The reference values for a data
>> product are in the ObsTAP/SIA2 response and I'd not like to
>> duplicate them elsewhere. Thus I'm in favour of the current draft
>> text over Markus' suggestion.
>>
>> Note: This is based on the assumption that a client app would have to
>> be ObsTAP/SIA2 aware to use SODA.
> Right -- and as I pointed out, even ObsTAP doesn't necessarily help you
> because there's no guarantee that the evaluating application has access
> to all Obscore columns.
>
> So, I keep maintaining it would be an error to restrict SODA to a "full
> metadata known" scenario, in particular because I expect it will not be
> unusual that the link between parameters and the relevant pieces of
> metadata is not known to the client (and as I said, custom parameters
> will be all over the place, as they are for SSA today.  Only more so).
>
> As to duplication -- note that even in the SIAv2 case with full metadata
> availability, there are two use cases.
>
> (1) the user selects a single dataset.  In that case, a model-aware
> client would need to fill parameters in the DAL-embedded service
> descriptor from dataset metadata as well as it can (i.e., for those
> that it really knows).
>
> I'd maintain that's not a good practice, as that is error-prone, and the
> client should rather retrieve a datalink document.  The datalink
> descriptors embedded into DAL responses aren't really suited for
> single-dataset access, exactly because the client has a hard time
> figuring out what custom parameters correspond to which pieces of
> metadata, if there's such a correspondence in the first place.
>
> (2) the user wants to do multiple cutouts.  This is where the aggregate
> limits become important.  If you want, you can already try this with
> recent versions of splat (even if the UI to SODA on published versions
> admittedly is ugly) -- on SODA-enabled services, you can, for all
> spectra, say you'd like a certain spectral region and a special format
> (due to a bug in the published versions, you'll have to use that latter
> feature to request FITS results if you check it out).  With that, you
> can retrieve *multiple* spectra processed in the same way.  The ranges
> (which published versions of splat show when mousing over the input
> fields) in these cases again have to come from the service, as again the
> relationship between result columns and parameters is hard to declare.
>
> Even if some of the results will be empty because of the original
> dataset's coverage, this possibility to process multiple datasets in the
> same way is eminently useful, e.g., if you only want to retrieve the
> immediate vicinity of H alpha (or whatever) -- and that is what the
> in-DAL service definitions were really intended for in my early
> proposals.  But it's something completely different from exploring,
> slicing and dicing an individual dataset.  In particular, it
> presupposes a fairly intimate knowledge of the data collection you're
> working on.
>
> So, I think we should keep domain definitions even in in-DAL service
> descriptors (but it might be wise to add prose explaining what they're
> intended for: they're shortcuts to mass processing).
>
> In the datalink-embedded service descriptors, I still think there's no
> actual alternative.
>
>> Perhaps table 2 could be expanded to list the ObsCore fields that
>> define the range for the parameter, or those could be included in the
>> parameter's subsection?
> Again, that's only helping if we restrict SODA to operating when there's
> an ObsCore definition present and only on concepts present in ObsCore.
> I'd claim that's unnecessary, and it's actually much easier for the
> client (because otherwise it has to gather together limits from wherever
> some metadata may be located) and not noticeably harder for the server if
> we're explicit about the domains.
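
As an illustration of what explicit domains would buy a client, here is
a hedged Python sketch. It assumes the descriptor declares a parameter's
domain through VOTable VALUES/MIN/MAX children of the PARAM; the XML
fragment (written without namespaces), the dataset identifier, the
numbers, and the helper names are all made up for the example.

```python
import xml.etree.ElementTree as ET

# Made-up descriptor fragment; a real one would carry the VOTable namespace.
DESCRIPTOR = """
<GROUP name="inputParams">
  <PARAM name="ID" datatype="char" arraysize="*" value="ivo://example/cube1"/>
  <PARAM name="BAND" datatype="double" arraysize="2" unit="m" ucd="em.wl" value="">
    <VALUES>
      <MIN value="2.1e-7"/>
      <MAX value="2.3e-7"/>
    </VALUES>
  </PARAM>
</GROUP>
"""


def declared_domains(xml_text):
    """Map parameter name -> (min, max) for every PARAM declaring both."""
    domains = {}
    for param in ET.fromstring(xml_text).iter("PARAM"):
        lo = param.find("VALUES/MIN")
        hi = param.find("VALUES/MAX")
        if lo is not None and hi is not None:
            domains[param.get("name")] = (float(lo.get("value")),
                                          float(hi.get("value")))
    return domains


def clamp_interval(requested, domain):
    """Intersect a requested interval with the declared domain.

    Returns None when the two do not overlap, i.e. when the client can
    tell in advance that the result would be empty.
    """
    lo = max(requested[0], domain[0])
    hi = min(requested[1], domain[1])
    return (lo, hi) if lo <= hi else None


domains = declared_domains(DESCRIPTOR)
inside = clamp_interval((2.0e-7, 2.2e-7), domains["BAND"])
outside = clamp_interval((5.0e-7, 6.0e-7), domains["BAND"])
```

The client needs nothing but the descriptor here: no Obscore columns and
no mapping between result columns and parameters.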
>
> Because it fits here, let me drop in my PARTISAN CONCLUSION here
> already: I see a choice between a very specialised protocol that's hard
> to use and a general protocol that's easy to use, all hinging on the
> proper declaration of metadata, in particular the domains.
>
> Of course, I may be missing something that's not to like about proper domain
> metadata -- if so, someone get a cluestick.
>
>> One related observation: in sections 2.6.1 and 3.2.2, BAND has a
>> UCD of "em". Should this instead be "em.wl" to provide an
> That's already fixed in SVN (rev. 3203) -- I just didn't get around to
> repairing it before the Dec 24 release.
>
>> exact match with the ObsCore em_min and em_max fields and be clear
>> that it is a wavelength? This will help client apps to make the link
>> and will guide users such as radio astronomers who work more often in
>> frequency terms.
> ...where of course clients should allow users to use their
> domain-specific units, so hopefully this won't be that much of an issue.
>
> Then, on to Mark's mail:
>
> On Fri, Jan 08, 2016 at 10:39:58PM +0000, Mark Taylor wrote:
>> Sec 1:
>>     Most of the use cases in sec 1 are labelled "will be developed
>>     and supported in [a later SODA version]".  Does this mean that
>>     this version of SODA is only targeted at simple (POS/BAND/TIME/POL)
>>     cutouts?  That's fine if so, but it would be helpful to note that
> Hm, ah well, I'd claim it's not fine if so, because that'd lead client
> development into a harmful direction where they ignore the service
> descriptors and just run based on Obscore results.  Which would put SODA
> to where SSA is today: barely working for the simple cases, a matter of
> finger-crossing everywhere else.
>
> It's (almost; I'm not a big fan of announcements in standards) fine to
> say "standard parameters to do these other things will be defined
> later", but I'm sure we can write the standard now in a way that clients
> written to 1.0 will work fine with more capable services, possibly
> adhering to later standards -- essentially by three-factor-semantics and
> proper metadata generation and usage.
>
>> Finally (at least for now), it's not obvious to me from this document
>> how to actually use a SODA service.  Possibly that's because I'm
>> not familiar enough with Datalink or other associated standards,
>> but I may not be the only one...  Presumably (in view of the
> I agree this needs better explanation -- have you had a look at my rev.
> 3192 build at http://docs.g-vo.org/SODA-r3192.pdf, section 2.6?  I make
> an effort to explain the information flow there, as that is really
> important to understand why the protocol really hemorrhages usefulness
> when we don't mandate parameter domain definitions.
>
> On to François' mails.
>
> On Tue, Jan 05, 2016 at 06:47:33PM +0100, François Bonnarel wrote:
>>        a ) It is true that the main point of discussion is about the
>> descriptions of the PARAMETER domains mainly when it is not directly
>> available in the client (for example via the metadata provided by the
>> discovery phase). And also that in the case of custom parameters (as well as
>> it would be for custom services parameters) there is nothing that could be
>> discoverable.
>>        b )  My point is that it is possible to postpone the solution of that
>> use case FOR NOW for three reasons:
>>              1 ) The current draft fulfils the basic requirements
>> of the CSP in 95% of the cases. We can wait for the next version of
>> ObsTAP/DALI/SIAV2 and SODA to solve the remaining 5%. This is the point I
> The 95% are conjecture, and I dispute them.  On my end, 100% of the cases
> require full domain definition (spectral cutouts from splat, and
> XSLT-processed datalink in the browser).  Which future SODA clients will
> have what discovery metadata available nobody can start to predict.  But
> I think it's easy to agree that either way they'll have a much easier
> life if we're explicit from the start, as they won't have to have
> complicated metadata mapping schemes just to discover what the service
> can simply tell them from the start.
>
>> features. This includes proposing a concurrent technology for describing the
>> domains, as we already have the description in the Obscore table. This also
> But it doesn't, and there may be no relation of parameters to the
> obscore items in the first place, in particular not for custom or future
> parameters.  Even if there were, there is no way to declare that right
> now, and inventing one is much more complicated than doing the right
> thing (proper metadata declaration) from the start.
>
> So, there is no duplication of information in reality.
>
> And again (just to be on the safe side, although François stressed so
> himself): This doesn't help *at all* in the, IMHO typical, use case
> where a client looks at a datalink document.  The current WD simply
> completely breaks that use-case, and I'd argue needlessly.
>
>>              3 ) the current draft is totally open on future evolution on
>> this point. It may be consistent with the solution proposed by Markus and
> Unfortunately, it's not.  Once the first clients are out and it becomes
> clear that they're not useful for what people want to do with their
> services, they'll keep developing web interfaces, and SODA will go the
> way of <insert your favourite non-taken-up IVOA standard>, and people
> will screen scrape and type into web forms for the next five years at
> least.
>
>> B ) This is now a reminder of the CSP priorities. Remember Data discovery is
>> done via ObsTAP 1.0 (1.1 soon) or SIAV2.0. Both are IVOA recommendations
>> now. DataAccess and cutout is done via acref field in query response (full
>> download) or SODA service. SODA service is referred from the Discovery
> No, as I pointed  out above, the typical way should be to first retrieve
> a datalink document for a discovered dataset, and work on this -- and
> actually, this is what the CADC does in its obscore service throughout
> already, and everything else (i.e., working from a service block) is
> shortcuts for special situations (e.g., mass cutout of the vicinity of a
> spectral line).
>
>> C ) With the current recommendations and the  SODA WD as it has been
>> proposed by the WD editor what can be implemented by data services. How IVOA
>> applications ( service clients) can manage with that and serve the end-user
>> needs ?
>>
>>      a ) You MUST build a SIAV2.0 service or an ObsTaP service dedicated to
>> your data cubes. Or both.
>>      b ) You MAY build a DataLink service providing resources attached to the
>> data cubes
>>      c ) you MUST build a SODA service providing cutout facilities for your
>> data cubes
>>      d ) the SODA service SHOULD be referred to from the SIAV2.0 or ObsTAP
>> response via a service descriptor (with appropriate reference to the
>> publisher DID column) (case d1). Or it SHOULD be referred to in the DataLink
>> resource response (if it exists) with appropriate reference to the ID column
>> in this response (case d2).
> Uh -- this looks scary.  Before anyone panics, can't we simply say:
>
> (1) You build an Obscore service; if you want, add SIAv2 glued on top.
> There's several usable TAP engines out there that you can use, so that's
> relatively easy to do.
>
> (2) For each dataset, you generate (either pre-generate or generate on
> the fly, which would be a datalink service) a little VOTable that
> describes access options.  This is what you let your Obscore table point
> to.
>
> (3) If you run SODA (e.g., for cutouts), this little VOTable also
> contains a description of how to operate it for the dataset in question.
>
> Much less scary, straightforward in implementation, regardless if you're
> a large or a small provider.
>
>>        If a client is not smart enough to manage Discovery service querying,
>> SODA service interface, DataLink response display and interpretation and
>> eventually data cube visualization, the end-user may use several combined
>> applications communicating via SAMP. This point doesn't make any difference
>> as long as all applications are run on the same Desktop
> It does make a huge difference (works so-so vs. doesn't work at all)
> unless we have full parameter metadata, because otherwise you'll have to
> transmit the discovered metadata *together with* the datalink document.
> We don't have a technique for that, and developing one is a much larger
> pain than simply doing the right thing in parameter definitions.
>
>> PARAMETERS (except blindly which may be reasonable as a first step). But I
>> think the basic CSP requirements are fulfilled using the current draft.
>> Refinement and sophistication will come in the next version and could address
> Again: We can do something complicated that may just barely fulfil the
> basic CSP requirements in some special scenarios, or we can do something
> simple that fulfills the CSP requirements even in the presumably common
> case of a datalink transmitted over SAMP.  Hence, I don't think the CSP
> requirements can serve as a guideline to choose here.
>
> And then on to François' last mail:
>
> On Thu, Jan 07, 2016 at 04:26:18PM +0100, François Bonnarel wrote:
>> C ) the {link} resource of the DataLink spec is working like a glue between
>> datasets and additional resources such as fixed links or services applied on
>> a given dataset. It contains external descriptions of the links and
> True.
>
>> resources, and of services input PARAMETERS. It should not contain
>> description of the dataset themselves which is the work of discovery
>> services or accessData or server side processing WEB services ( as SODA is
>> intended to be), in order to avoid confusion between the role of each module
>> in the whole DAL scheme.
> I don't think I understand this argument.  Whose confusion are you
> worried about?  Why should the description of the dataset be the job of
> discovery services?   Of course the dataset itself contains its
> metadata, and I don't think anyone was ever confused by encountering WCS
> information or the image size in a FITS header.
>
> On the contrary: As you say, the datalink document for a dataset
> accessible through one or more SODA services will contain their
> parameter metadata.  An important part of this is the domains these
> parameters admit.  That these may or may not correspond to properties
> of the dataset in question goes without saying -- how could that ever be
> confusing?
>
>> with Markus approach of the input PARAMETER domain metadata issue (see
>> http://docs.g-vo.org/SODA-r3192.pdf  ,section 6 for his views and compare
>> with the same section in the editor WD).
> Just to be on the safe side: François of course is talking about 2.6
> (fortunately, there's no section 6, so hopefully there's been no
> additional confusion).
>
>> I propose a mechanism which I think is more consistent with what we already
>> have and the general DAL architecture. However I don't want to push it now
>> in the WD and in the spec, because I think we have time to discuss these
>> matters until the next version of SODA, SIAV2 and DataLink. In my first
> On that I think we should really try hard to avoid putting forward a
> standard with the express plan to, in all likelihood incompatibly,
> invalidate it right away.  If I were an outside implementor I'd say that
> VO standards authors must have lost their minds when confronted with
> such a proposition.  Planning for growth and custom features is good,
> announcing immediate obsolescence is not.
>
>> email I tried to convince you that we already have, without that "domain
>> metadata" feature a workable spec to fulfill the basic CSP spec.
>>     The solution is based on the inclusion of "ref" attributes in the service
>> descriptor PARAMETER elements for all the standard input PARAMETERS. ref to
>> the appropriate Obscore FIELD/PARAMETER or GROUP of FIELDS/PARAMETERS. This
>> can be done in the discovery service response, or in the response given by
>> the SODA service queried with the unique ID="dataset_id" constraint. Let's
>> see how it can work with examples in E and F.
> I'm tempted to remark that this kind of double referencing is pretty
> heavy stuff just to avoid what I've argued above isn't actually
> repetition in the first place, and that of course it doesn't solve
> operating SODA from datalink (or, equivalently, via SAMP).
>
> But the bigger problem with the proposed mechanism is that it breaks (or
> incompatibly overrides) what we have in datalink:
>
>    Although this version of DataLink only has one parameter (ID), using a
>    GROUP and providing the service parameter name allows this recipe
>    [parameters being filled from what its definition @ref's] to be used
>    with any service and (with the GROUP) with multi-parameter services.
>
> (http://ivoa.net/documents/DataLink/20150617/REC-DataLink-1.0-20150617.pdf,
> PDF page 21).  So, if SODA now proposed using @ref for pointing to
> relevant pieces of metadata, we'd have to explain when to immediately
> fill the parameter as per Datalink and when to use the ref'ed value as a
> hint as per SODA.  I don't like it.  At all.
>
> Well, thanks for making it here.  Sorry for being so verbose, but I'm
> really, really worried we're messing up our one chance to have a widely
> adopted *and* implemented (on the client side, primarily) protocol for
> SODA: server-side operations on data.
>
> Cheers,
>
>                 Markus
>
>
> ------------------------------
>
> Message: 3
> Date: Tue, 12 Jan 2016 10:43:19 -0500
> From: <gerard.lemson at gmail.com>
> To: <dal at ivoa.net>
> Subject: RE: ADQL evolution: OFFSET?
> Message-ID: <000001d14d4f$f64cabb0$e2e60310$@gmail.com>
> Content-Type: text/plain;	charset="us-ascii"
>
>
> Hi list
> Just wanted to voice my support for Mark's use case, which is exactly the
> way I use TOP: to see an example of the data, generally used without
> ordering.
>
>>> In terms of whether OFFSET makes sense without ORDER BY: I think
>>> that's really a moot point since ADQL has allowed TOP without ORDER BY
>>> since the beginning, which is exactly as questionable.  So, I'd claim
>> I don't think that's really true.  TOP is just if you want a result but
>> you don't want it too big.  Non-ORDERed OFFSET would on the other hand
>> presumably be used to provide pagination.  The problem there is that you
>> might get different orderings for different queries, so end up with
>> duplicate or missing records.
> Gerard
>
>
>> --
>> Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
>> m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/
>
>
> ------------------------------
>