Auxiliary answer to Re: SODA gripes (1): The Big One

James.Dempsey at csiro.au
Tue Jan 12 07:57:12 CET 2016


Hi,

We are in the process of testing out our new cut-out functionality for CASDA at the moment, so we have been looking closely at interacting with this service. Our cut-out service is intended to be SODA compliant (it is currently based on the old AccessData spec).

Parameter ranges are really useful, and one of our early testing tools was a page which has RA/Dec entry fields that default to the centre of the image cube to be processed. However, to me aggregate ranges seem a lot less useful: for example, a range covering three cubes whose narrow spectral ranges are widely spaced from each other will leave plenty of room for empty result sets. The reference values for a data product are in the ObsTAP/SIA2 response and I’d rather not duplicate them elsewhere. Thus I’m in favour of the current draft text over Markus’ suggestion.

Note: This is based on the assumption that a client app would have to be ObsTAP/SIA2 aware to use SODA.

Perhaps table 2 could be expanded to list the ObsCore fields that define the range for the parameter, or those could be included in the parameter’s subsection?

One related observation – in sections 2.6.1 and 3.2.2, BAND has a UCD of “em”. Should this instead be “em.wl” to provide an exact match with the ObsCore em_min and em_max fields and be clear that it is a wavelength? This will help client apps to make the link and will guide users such as radio astronomers who work more often in frequency terms.
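
To illustrate the frequency versus wavelength point, here is a minimal Python sketch of how a client might turn a frequency range into a wavelength interval for BAND. The function name and the space-separated "lo hi" serialisation are assumptions made for illustration and are not taken from the draft; only the metre unit comes from the em_min and em_max fields.

```python
# Speed of light in vacuum in m/s (exact SI value).
C_M_PER_S = 299_792_458.0


def freq_range_to_band(f_lo_hz, f_hi_hz):
    """Convert a frequency interval in Hz to a wavelength interval in metres.

    Hypothetical helper: the result is ordered (shortest, longest) wavelength,
    which reverses the order of the input frequencies.
    """
    if f_lo_hz <= 0 or f_hi_hz <= 0:
        raise ValueError("frequencies must be positive")
    wl_a = C_M_PER_S / f_lo_hz
    wl_b = C_M_PER_S / f_hi_hz
    return (min(wl_a, wl_b), max(wl_a, wl_b))


# A radio astronomer thinking in terms of 1.40 to 1.42 GHz:
lo, hi = freq_range_to_band(1.40e9, 1.42e9)
# Assumed serialisation of the interval, for display only.
print(f"BAND={lo:.6f} {hi:.6f}")
```

The highest frequency maps to the shortest wavelength, so the bounds swap places; this is the kind of detail an explicit "em.wl" UCD would make visible to client authors.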

Cheers,
James Dempsey


From: dal-bounces at ivoa.net [mailto:dal-bounces at ivoa.net] On Behalf Of François Bonnarel
Sent: Friday, 8 January 2016 2:26 AM
To: dal at ivoa.net
Subject: Auxiliary answer to Re: SODA gripes (1): The Big One

Dear all,
       In my main answer to Markus last Tuesday I wrote:

            2 ) the discussion on the solution proposed in Markus' version of the WD promises to be long. I am strongly against some of the features involved. This includes proposing a competing technology for describing the domains when we already have that description in the Obscore table. It also includes describing the data content in the DataLink response, which should be agnostic to the content of the dataset to which it relates resources. I will develop this in another email
I develop this point here. Sorry to be long, but this is a major discussion.
Cheers
François

A ) The upcoming new protocols (DataLink as well as ObsTAP or SIAV2 and SODA) are independent, but they are part of the same scheme. They could have been merged into a single protocol. Modularity allows some flexibility but requires some care, so that the protocols do not diverge and create inconsistencies.

B ) SODA standard PARAMETERS (as well as SIAV2.0 PARAMETERS, of which they are a subset) are consistent with Obscore model fields.

  *   ID is related to obs_publisher_did
  *   POS is related to s_region
  *   BAND is related to em_min and em_max
  *   TIME is related to t_min and t_max
  *   and POL is related to pol_states
    The input PARAMETER domain valid for each dataset is given exactly by a combination of these Obscore table FIELDS. In detail:

  *   s_region gives the spatial support of the dataset as an STC AstroCoord Area (could be a POLYGON, a CIRCLE, a 2D-INTERVAL (or RANGE), etc.). Any valid value of the SODA POS parameter for a given dataset should be a region included in the region specified by the s_region value.
  *   em_min and em_max give the bounds of the dataset on the spectral axis. This is exactly the valid domain of the SODA BAND parameter for a given dataset.
  *   t_min and t_max give the bounds of the dataset on the time axis. This is exactly the valid domain of the SODA TIME parameter for a given dataset.
  *   pol_states gives the list of polarization states present in a dataset. This is exactly the list of polarization states from which the SODA POL parameter can select.
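
The mapping above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ObsCoreRow class and check_request function are hypothetical names, "valid" is read as "the requested interval lies inside the dataset bounds", and pol_states is split on commas as in the example values used later in this mail. POS is left out because region inclusion needs real geometry code.

```python
from dataclasses import dataclass


@dataclass
class ObsCoreRow:
    """The Obscore fields that bound the SODA parameters (hypothetical container)."""
    em_min: float      # metres
    em_max: float      # metres
    t_min: float
    t_max: float
    pol_states: str    # comma-separated, as in the examples in this mail


def interval_in_domain(lo, hi, dom_lo, dom_hi):
    """True if [lo, hi] is a well-ordered interval inside [dom_lo, dom_hi]."""
    return dom_lo <= lo <= hi <= dom_hi


def check_request(row, band=None, time=None, pol=None):
    """Return the names of the parameters whose values fall outside the dataset domain."""
    problems = []
    if band is not None and not interval_in_domain(*band, row.em_min, row.em_max):
        problems.append("BAND")
    if time is not None and not interval_in_domain(*time, row.t_min, row.t_max):
        problems.append("TIME")
    if pol is not None and pol not in row.pol_states.split(","):
        problems.append("POL")
    return problems


row = ObsCoreRow(em_min=0.20, em_max=0.22, t_min=52300.5, t_max=52300.6,
                 pol_states="StokesQ,StokesU,StokesV")
print(check_request(row, band=(0.205, 0.21), pol="StokesQ"))
print(check_request(row, band=(0.10, 0.21), pol="StokesI"))
```

A service that prefers to accept any interval overlapping the dataset would replace the containment test with an overlap test; nothing in the text above settles that choice.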
C ) The {link} resource of the DataLink spec works like a glue between datasets and additional resources such as fixed links or services applied to a given dataset. It contains external descriptions of the links and resources, and of the services' input PARAMETERS. It should not contain descriptions of the datasets themselves, which is the work of discovery services, AccessData or server-side processing web services (as SODA is intended to be), in order to avoid confusion between the roles of each module in the whole DAL scheme.

D ) As a consequence of the opinions set out above, I have severe concerns with Markus' approach to the input PARAMETER domain metadata issue (see http://docs.g-vo.org/SODA-r3192.pdf, section 6, for his views and compare with the same section in the editor WD).
I propose a mechanism which I think is more consistent with what we already have and with the general DAL architecture. However, I don't want to push it into the WD and the spec now, because I think we have time to discuss these matters before the next version of SODA, SIAV2 and DataLink. In my first email I tried to convince you that, even without that "domain metadata" feature, we already have a workable spec that fulfils the basic CSP spec.
   The solution is based on including "ref" attributes in the service descriptor PARAMETER elements for all the standard input PARAMETERS. Each ref points to the appropriate Obscore FIELD/PARAMETER or GROUP of FIELDS/PARAMETERS. This can be done in the discovery service response, or in the response given by the SODA service when queried with only the ID="dataset_id" constraint. Examples E and F show how it can work.

E ) Example with the SODA service descriptor in the sia2 discovery response

Query example
http://dalservices.ivoa.net/sia2/query?POS=CIRCLE 2.8425 74.4846 0.1 &BAND=0.0002&BAND=0.00006&COLLECTION=IRAS-IRIS
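
The query above is written with literal spaces in the POS value. A client would normally percent-encode it; the following Python sketch shows one way to do that with the standard library. The build_query helper is a hypothetical name, and the parameter values are simply copied from the example.

```python
from urllib.parse import quote, urlencode


def build_query(base_url, params):
    """Append percent-encoded parameters to base_url.

    params is a list of (name, value) pairs so that a parameter such as
    BAND can be repeated, as in the example query.
    """
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return base_url + "?" + urlencode(params, quote_via=quote)


url = build_query(
    "http://dalservices.ivoa.net/sia2/query",
    [
        ("POS", "CIRCLE 2.8425 74.4846 0.1"),
        ("BAND", "0.0002"),
        ("BAND", "0.00006"),
        ("COLLECTION", "IRAS-IRIS"),
    ],
)
print(url)
```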

excerpt of query example response

<?xml version="1.0" encoding="UTF-8" ?>
<VOTABLE version="1.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="xmlns:http://www.ivoa.net/xml/VOTable-1.2.xsd">
      <RESOURCE type="meta" utype="adhoc:service" name="this">
          <PARAM name="standardID" datatype="char" arraysize="*" value="ivo://ivoa.net/std/SODA#sync-1.0" />
          <PARAM name="accessURL" datatype="char" arraysize="*" value="http://example.com/SODA/sync" />
          <GROUP name="inputParams">
               <PARAM name="ID" ucd="meta.id" datatype="char" arraysize="*" xtype="ivoident" ref="pdid" />
               <PARAM name="POS" ucd="pos" unit="deg" datatype="char" arraysize="*" xtype="polygon" ref="sreg" />
               <PARAM name="BAND" ucd="em" unit="m" datatype="double" arraysize="*" xtype="interval" ref="sbound" />
               <PARAM name="TIME" ucd="time" unit="d" datatype="double" arraysize="*" xtype="interval" ref="tbound"/>
               <PARAM name="POL" ucd="pol" datatype="char" arraysize="*" xtype="Stokes" ref="pstates" />
          </GROUP>
      </RESOURCE>
      <RESOURCE type="results">
           <INFO name="QUERY_STATUS" value="OK"/>
          <TABLE>
              <GROUP ID="tbound">
                  <FIELDref ref="tmin"/>
                  <FIELDref ref="tmax"/>
             </GROUP>
            <GROUP ID="sbound">
                 <FIELDref ref="smin"/>
                <FIELDref ref="smax"/>
            </GROUP>
           <FIELD name="dataproduct_type" ucd="meta.id" datatype="char" utype="obscore:ObsDataSet.dataProductType" arraysize="*" />
           <FIELD name="calib_level" ucd="meta.code;obs.calib" datatype="int" utype="obscore:ObsDataSet.calibLevel" />
           <FIELD name="obs_collection" datatype="char" ucd="meta.id" utype="obscore:DataID.Collection" arraysize="*" />
           <FIELD name="obs_id" ucd="meta.id" datatype="char" utype="obscore:DataID.observationID" arraysize="*" />
           <FIELD ID ="pdid" name="obs_publisher_did" ucd="meta.ref.url;meta.curation" datatype="char" utype="obscore:Curation.PublisherDID" arraysize="*" />
            .......................... (removed fields)
          <FIELD ID="sreg" name="s_region" datatype="char" ucd="phys.angArea;obs" utype="obscore:Char.SpatialAxis.Coverage.Support.Area" arraysize="*" unit="deg" />
            ..... (removed fields)
          <FIELD ID="tmin" name="t_min" datatype="double" ucd="time.start;obs.exposure" utype="obscore:Char.TimeAxis.Coverage.Bounds.Limits.StartTime" unit="s" />
          <FIELD ID="tmax" name="t_max" datatype="double" ucd="time.end;obs.exposure" utype="obscore:Char.TimeAxis.Coverage.Bounds.Limits.StopTime" unit="s" />
            ......... (removed fields)
          <FIELD ID="smin" name="em_min" datatype="double" ucd="em.wl;stat.min" utype="obscore:Char.SpectralAxis.Coverage.Bounds.Limits.LoLimit" unit="m" />
           <FIELD ID="smax" name="em_max" datatype="double" ucd="em.wl;stat.max" utype="obscore:Char.SpectralAxis.Coverage.Bounds.Limits.HiLimit"  unit="m" />
           ..................(removed fields)
           <FIELD name="instrument_name" ucd="meta.id;instr" datatype="char" arraysize="*" utype="obscore:Provenance.ObsConfig.instrument.name" />

           <DATA>
             <TABLEDATA>
              <TR>
                  <TD>cube</TD>
                  <TD>1</TD>
                  <TD>IRAS-IRIS</TD>
                  <TD>I422B2H0</TD>
                  <TD>ivo://cds.u-strasbg.fr/IRAS-IRIS/25MU/I422B2H0</TD>
                  <TD><![CDATA[http://aladix.u-strasbg.fr/cgi-bin/nph-Aladin++dev.cgi?out=image&position=0.000000+80.000000&field=I422B2H0&survey=IRAS-IRIS&color=25MU&mode=view]]></TD>
                  <TD>image/fits</TD>
                  <TD>1600</TD>
                  <TD>I422B2H0</TD>
                  <TD>0.000000 </TD>
                  <TD>80.000000 </TD>
                  <TD>0.5</TD>
                  <TD>POLYGON 30.0 200.0 32.0 200.0 32.0 198.0 30.0 198.0</TD>
                  <TD></TD>
                  <TD></TD>
                  <TD></TD>
                  <TD>1000</TD>
                  <TD>1.0</TD>
                  <TD>0.20</TD>
                  <TD>0.22</TD>
                  <TD>5.0</TD>
                  <TD></TD>
                  <TD>StokesQ,StokesU,StokesV</TD>
                  <TD>IRAS-IRIS</TD>
                  <TD></TD>
              </TR>
 ...........................


F ) Example with the SODA service descriptor in the SODA  response

Query example to the soda service (with unique  ID=.... filled parameter)
http://dalservices.ivoa.net/soda?ID="ivo://cds/IRAS-IRIS/25MU?...."

The response will be the PARAM description with Domain metadata "à la" Obscore
<?xml version="1.0" encoding="UTF-8" ?>
<VOTABLE version="1.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="xmlns:http://www.ivoa.net/xml/VOTable-1.2.xsd">
      <RESOURCE type="meta" utype="adhoc:service" name="this">
          <PARAM name="standardID" datatype="char" arraysize="*" value="ivo://ivoa.net/std/SODA#sync-1.0" />
          <PARAM name="accessURL" datatype="char" arraysize="*" value="http://example.com/SODA/sync" />
          <GROUP name="inputParams">
             <PARAM name="ID" ucd="meta.id" datatype="char" arraysize="*" xtype="ivoident" value="ivo://cds/IRAS-IRIS/25MU?...." />
             <PARAM name="POS" ucd="pos" unit="deg" datatype="char" arraysize="*" xtype="polygon" ref="sreg" />
             <PARAM name="BAND" ucd="em" unit="m" datatype="double" arraysize="*" xtype="interval" ref="sbound" />
             <PARAM name="TIME" ucd="time" unit="d" datatype="double" arraysize="*" xtype="interval" ref="tbound"/>
             <PARAM name="POL" ucd="pol" datatype="char" arraysize="*" xtype="Stokes" ref="pstates" />
         </GROUP>
        <GROUP name="DomainMetadata">
          <GROUP ID="tbound">
             <PARAMref ref="tmin"/>
             <PARAMref ref="tmax"/>
          </GROUP>
         <GROUP ID="sbound">
            <PARAMref ref="smin"/>
            <PARAMref ref="smax"/>
          </GROUP>
         <PARAM ID="sreg" name="s_region" datatype="char" ucd="phys.angArea;obs" utype="obscore:Char.SpatialAxis.Coverage.Support.Area" arraysize="*" unit="deg" value="POLYGON 30.0 200.0 32.0 200.0 32.0 198.0 30.0 198.0"/>
         <PARAM ID="tmin" name="t_min" datatype="double" ucd="time.start;obs.exposure" utype="obscore:Char.TimeAxis.Coverage.Bounds.Limits.StartTime" unit="s" value="52300.5"/>
          <PARAM ID="tmax" name="t_max" datatype="double" ucd="time.end;obs.exposure" utype="obscore:Char.TimeAxis.Coverage.Bounds.Limits.StopTime" unit="s" value="52300.6"/>
           <PARAM ID="smin" name="em_min" datatype="double" ucd="em.wl;stat.min" utype="obscore:Char.SpectralAxis.Coverage.Bounds.Limits.LoLimit" unit="m" value="0.20"/>
          <PARAM ID="smax" name="em_max" datatype="double" ucd="em.wl;stat.max" utype="obscore:Char.SpectralAxis.Coverage.Bounds.Limits.HiLimit"  unit="m" value="0.22"/>
          <PARAM ID="pstates" name="pol_states" datatype="char" ucd="meta.code;phys.polarization" utype="obscore:Char.PolarizationAxis.stateList" arraysize="*" value="StokesQ,StokesU,StokesV"/>
       </GROUP>
    </RESOURCE>
</VOTABLE>
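
How a client could follow the "ref" attributes in such a descriptor can be sketched in Python as below. This is a sketch under assumptions, not part of the proposal: the embedded XML is a trimmed variant of example F with only BAND and POL, resolve_domains is a hypothetical name, and the code accepts both FIELDref and PARAMref members inside a referenced GROUP.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative variant of the descriptor in example F.
DESCRIPTOR = """<RESOURCE type="meta" utype="adhoc:service" name="this">
  <GROUP name="inputParams">
    <PARAM name="BAND" ucd="em" unit="m" datatype="double" arraysize="*" xtype="interval" ref="sbound"/>
    <PARAM name="POL" ucd="pol" datatype="char" arraysize="*" ref="pstates"/>
  </GROUP>
  <GROUP name="DomainMetadata">
    <GROUP ID="sbound">
      <PARAMref ref="smin"/>
      <PARAMref ref="smax"/>
    </GROUP>
    <PARAM ID="smin" name="em_min" datatype="double" unit="m" value="0.20"/>
    <PARAM ID="smax" name="em_max" datatype="double" unit="m" value="0.22"/>
    <PARAM ID="pstates" name="pol_states" datatype="char" arraysize="*" value="StokesQ,StokesU,StokesV"/>
  </GROUP>
</RESOURCE>"""


def resolve_domains(xml_text):
    """Map each input PARAM name to the Obscore values its ref points at."""
    root = ET.fromstring(xml_text)
    # Index every element that carries an ID attribute.
    by_id = {el.get("ID"): el for el in root.iter() if el.get("ID")}
    domains = {}
    for param in root.iterfind("GROUP[@name='inputParams']/PARAM"):
        target = by_id.get(param.get("ref"))
        if target is None:
            continue  # no ref, or a ref that points nowhere
        if target.tag == "GROUP":
            # A GROUP collects several referenced elements (e.g. min and max).
            members = [by_id[r.get("ref")] for r in target
                       if r.tag in ("PARAMref", "FIELDref") and r.get("ref") in by_id]
        else:
            members = [target]
        domains[param.get("name")] = {m.get("name"): m.get("value") for m in members}
    return domains


print(resolve_domains(DESCRIPTOR))
```

In the discovery-response variant (example E) the referenced elements are FIELDs, so the values would have to be read from the matching table row rather than from a value attribute.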



On 05/01/2016 08:51, Markus Demleitner wrote:

Dear Colleagues,

On Thu, Dec 24, 2015 at 03:40:21PM +0100, François Bonnarel wrote:

     This close-to-Christmas email is to announce that the SODA1.0 (previously
known as AccessData1.0) WD was released last Monday. See:
http://www.ivoa.net/documents/SODA/20151221/index.html
      There have been very long discussions among the authors and we made some
progress towards convergence. However, there are still points under heavy debate. It
is my responsibility as editor as well as DAL chair to provide now a version
which some of us regard as insufficient but which nonetheless
fulfils the CSP and basic community requirements, in my view.
The discussion will probably start very soon on the DAL mailing list.

Indeed -- there are of order 10 topics on which I'd like to see
discussion on this draft (there's a list of them at the end of this
mail).

I'd like to start with the Big One (this is probably also going to be
the longest mail in this series; please indulge me).  In one
sentence, it's

  The protocol must be written such that clients can work out what
  parameter values will probably yield useful results.

This, in my opinion, is really the make-or-break thing, i.e., what
decides whether what we write will actually be useful as a generic
access protocol, or whether it will be a source of constant annoyance
all around[1].

So -- even if you have only marginal interest in SODA, and even if this
is a long mail, please take a few 10s of minutes to try and make up your
mind based on the two drafts mentioned below.  You'd have my blessing to
ignore the remaining SODA discussions if you are so inclined.

The premise above applied to SODA becomes: All parameters (except for
the oddball POS, which really has a special position; but I'll revisit
that in a later mail) must be fully declared by the service (including
VALUES and OPTION elements as appropriate) and be systematically
discovered for UI/API generation by the client in a SODA exchange.
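
What "fully declared" could look like from the client side is sketched below in Python. The PARAM declarations are invented for illustration (they reuse the spectral limits and polarization states from the examples earlier in this thread, not anything from rev. 3192), and declared_domains is a hypothetical helper; only the VALUES, MIN, MAX and OPTION element names come from VOTable.

```python
import xml.etree.ElementTree as ET

# Illustrative input PARAMs carrying their own domain metadata.
PARAMS = """<GROUP name="inputParams">
  <PARAM name="BAND" unit="m" datatype="double" arraysize="2" value="">
    <VALUES><MIN value="0.20"/><MAX value="0.22"/></VALUES>
  </PARAM>
  <PARAM name="POL" datatype="char" arraysize="*" value="">
    <VALUES>
      <OPTION value="StokesQ"/><OPTION value="StokesU"/><OPTION value="StokesV"/>
    </VALUES>
  </PARAM>
  <PARAM name="ID" datatype="char" arraysize="*" value=""/>
</GROUP>"""


def declared_domains(xml_text):
    """Collect limits and options from every PARAM that has a VALUES child."""
    group = ET.fromstring(xml_text)
    out = {}
    for param in group.iter("PARAM"):
        values = param.find("VALUES")
        if values is None:
            continue  # nothing declared, so the client has nothing to offer
        lo, hi = values.find("MIN"), values.find("MAX")
        out[param.get("name")] = {
            "min": float(lo.get("value")) if lo is not None else None,
            "max": float(hi.get("value")) if hi is not None else None,
            "options": [o.get("value") for o in values.findall("OPTION")],
        }
    return out


for name, domain in declared_domains(PARAMS).items():
    print(name, domain)
```

A generic client could use the limits to pre-fill or bound an input field and the options to build a selection list, without knowing anything else about the dataset.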



I've written standards prose for that already that I think is about what
a standards document can do to mandate such practices (of course, this
is largely a matter of implementation style, which is hard to regulate).
It's been in the text in volute rev. 3192 [2]; for your convenience, I
have built the document as of that revision and put it on
http://docs.g-vo.org/SODA-r3192.pdf.  The contentious prose starts at
page 8 -- if you'd be so kind as to read sect. 2.6 ("three-factor
semantics", 4 pages).

You can compare with sect. 2.6 as published (the published version is
in effect volute rev. 3200, in case you'd like to see a diff).  Let me
again bambi-eye all around and ask everyone with even a remote
involvement in the cube thing to try and make up their minds and speak
up, even if this thing appears a bit complicated at first, in particular
because, in a way, it's really part of datalink and cannot be understood
without it (I've argued it should really have been part of datalink in
the first place).

If there's anything we can do to help comprehensibility, let us know,
too.

Meanwhile, allow me to once more try to argue why it is so important to
urge services to provide consistent, dataset-specific metadata and the
clients to use it in SODA.

SODA is designed to operate on concrete datasets -- you've discovered
something that looks like it might be interesting, but you're only
interested in a small part or a particular mogrification of the dataset,
so your client gets information on the dataset and then figures out what
to do to retrieve the information relevant to you.  This means that you
cannot just put in some value into a service parameter and watch what's
coming out -- you'll almost always get nothing back because the coverage
of a typical dataset is small and not easily predictable.

The "horror vacui", the dreaded moment in GUIs when an input field is
displayed and users have no idea what to put there, with SODA therefore
isn't a minor usability issue, it's a protocol killer.

It has been put forward that clients could infer the domains of the
parameters (the "good" values) from a previous discovery query (e.g.,
from SIAv2, they'd know the spatial and spectral coverage).
Unfortunately, this line of reasoning is flawed in at least two respects:

(1) The results of the discovery query might not be available to the
client dealing with the SODA descriptor

(2) This technique breaks down with the first custom parameter (is the
corresponding item in the discovered metadata?  And what does the
parameter correspond to in the first place?), and that would, again, be
a killer for SODA's usefulness.

Let me dwell on both points for a little while.

Ad (1).   I expect the most common source for SODA descriptors will be
Obscore (and it's a CSP-official usecase in case you don't agree).
There, the access URL for cubes and other large datasets won't be the
dataset itself, because you don't want people blindly pulling several
100s of gigabytes (or just one gigabyte, really).  Instead, you return a
datalink document, which contains the SODA descriptor.  We at Heidelberg
already occasionally do that, the CADC has datalink documents throughout
IIRC (although I think they don't have custom SODA descriptors yet).

To query Obscore, people typically use TAP, and their queries will
fairly typically not be just "select * from" but very possibly rather
something like "select access_url, target_name from ivoa.obscore
join...."  Hence, a client doesn't have access to the obscore metadata,
and even if it had, it might have a hard time recognising it in the
possibly wide result tuples coming back from the database.

Another scenario in which dataset metadata possibly obtained during
discovery would get lost is when sending the datalink document (URL)
through SAMP.  Whether we like it or not, our users love SAMP more than
anything else we've come up with so far, and telling them SODA doesn't
play with SAMP isn't going to make SODA popular.

Ad (2).  The dataset operations that data providers will want to enable
through SODA are essentially endless -- rebinning, renormalisation,
format conversion, "logical" cutouts (e.g., on selected extensions
only), etc.  Making SODA something that (to some extent) works with a
select set of standard parameters but fails (in the sense of: client
behaviour is unpredictable) as soon as a service needs a bit more is
going to render it almost useless, and data providers will keep doing
things through custom web pages.  It's the situation we have with SSAP;
although that, as a discovery protocol, at least can limp along to some
extent.  SODA, as an access protocol, wouldn't even limp.

So, we need to say: "A well-behaved SODA client will do X and Y and
*not* ignore Z" to give data providers the confidence that independent
of the client their users choose they still see whatever operations they
consider important.  That's what I've tried in rev. 3192 section 2.6.

As an additional indication that full metadata in the SODA descriptor is
a very good idea, let me mention in passing that

(3) it would enable usable interfaces in stop-gap XSLT-based datalink
interfaces (as discussed in Sydney,
http://wiki.ivoa.net/internal/IVOA/InteropOct2015DAL/datalink-xslt.pdf)

Just so nobody can say later I didn't warn them: Yes, this means that
the datalink document that contains the SODA descriptor has to be
tailored for each dataset.  But that's really not a big deal, because
the datalink documents themselves vary with dataset (well, typically) --
previews, plots, provenance, whatever all depend on the dataset.
Dropping in the limits into the SODA descriptor in addition at least for
me hasn't been a major additional implementation burden.

That's it for my first SODA gripe, and thanks for making it here.  I
plan to have, roughly weekly, additional SODA gripes, one after the
other to allow productive discussions on each point.  To give you an
idea what I have up my sleeve here's a tentative programme:

(2) Spatial coverage discovery and the RA and DEC parameters
(3) Pixel cutouts: PIXEL_n
(4) Mandated multiplicities considered harmful
(5) Behaviour for no-ID queries?  For queries with only ID?
(6) No gratuitous xtypes
(7) POS doesn't have an xtype
(8) Examples stuff: example example, and perhaps a dl-id term?

If this sounds scary, don't worry -- this kind of thing has IMHO worked
great for datalink.

Cheers,

            Markus

[1] Incidentally, it also coincides with my conviction that in protocol
development in the VO, we should be thinking much more than in the past
from the client perspective, even if most of the protocol developers sit
on the server side.

[2] To get the source from the repository, use something like

svn co -r 3192 https://volute.g-vo.org/svn/trunk/projects/dal/SODA
