Auxiliary answer to Re: SODA gripes (1): The Big One

François Bonnarel francois.bonnarel at astro.unistra.fr
Thu Jan 7 16:26:18 CET 2016


Dear all,
        In my main answer to Markus last Tuesday I wrote:
>             2 ) The discussion of the solution proposed in Markus'
> version of the WD promises to be long. I am strongly against some of
> the features involved. These include proposing a competing technology
> for describing the domains, as we already have that description in the
> Obscore table. They also include describing the data content in the
> DataLink response, which should be agnostic to the content of the
> dataset to which it relates resources. I will develop this in
> another email
I develop these points here. Sorry for the length, but this is a
major discussion.
Cheers
François

A ) The upcoming new protocols (DataLink as well as ObsTAP or SIAV2 and
SODA) are independent, but they are part of the same scheme. They could
have been merged into a single protocol. Modularity allows some
flexibility, but it requires some care so that the protocols do not
diverge and create inconsistencies.

B ) SODA standard PARAMETERS (as well as the SIAV2.0 PARAMETERS, of which
they are a subset) are consistent with Obscore model fields.

  * ID is related to obs_publisher_did
  * POS is related to s_region
  * BAND is related to em_min and em_max
  * TIME is related to t_min and t_max
  * POL is related to pol_states

     The valid input PARAMETER domain for each dataset is given exactly
by a combination of these Obscore table FIELDS. In detail:

  * s_region gives the spatial support of the dataset as an STC
    AstroCoordArea (it could be a POLYGON, a CIRCLE, a 2D-INTERVAL (or
    RANGE), etc.). Any valid value of the SODA POS parameter for a
    given dataset should be a region included in the region specified
    by the s_region value.
  * em_min and em_max give the bounds of the dataset on the spectral
    axis. This is exactly the valid domain of the SODA BAND parameter
    for a given dataset.
  * t_min and t_max give the bounds of the dataset on the time axis.
    This is exactly the valid domain of the SODA TIME parameter for a
    given dataset.
  * pol_states gives the list of polarization states present in a
    dataset. This is exactly the list of polarization states from
    which the SODA POL parameter can select.
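
To make the correspondence concrete, here is a minimal Python sketch of
the check a client or service could run for BAND, TIME and POL. The
helper names and the dictionary layout are invented for this
illustration; only the Obscore column names come from the list above.
POS is left out because testing region inclusion needs real spherical
geometry, and the comma-separated pol_states value follows the example
row used later in this mail rather than any normative format.

```python
# Hypothetical helpers; only the Obscore column names come from the mail.

def interval_in_domain(lo, hi, dom_lo, dom_hi):
    """True if [lo, hi] is a proper interval lying inside [dom_lo, dom_hi]."""
    return dom_lo <= lo <= hi <= dom_hi


def check_soda_params(obscore_row, band=None, time=None, pol=None):
    """Return the names of the parameters whose value is outside the dataset domain."""
    bad = []
    if band is not None and not interval_in_domain(
            band[0], band[1], obscore_row["em_min"], obscore_row["em_max"]):
        bad.append("BAND")
    if time is not None and not interval_in_domain(
            time[0], time[1], obscore_row["t_min"], obscore_row["t_max"]):
        bad.append("TIME")
    if pol is not None:
        # Assumption: pol_states is a comma-separated list of state names.
        states = {s.strip() for s in obscore_row["pol_states"].split(",")}
        if pol not in states:
            bad.append("POL")
    return bad


# Values taken from the example dataset used later in this mail.
row = {
    "em_min": 0.20, "em_max": 0.22,
    "t_min": 52300.5, "t_max": 52300.6,
    "pol_states": "StokesQ,StokesU,StokesV",
}

print(check_soda_params(row, band=(0.205, 0.215), pol="StokesQ"))
print(check_soda_params(row, band=(0.10, 0.21),
                        time=(52300.55, 52300.58), pol="StokesI"))
```

The first call returns an empty list (all values are inside the
domain); the second reports BAND and POL. Requiring full containment is
a choice made in this sketch.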

C ) The {link} resource of the DataLink spec works like a glue
between datasets and additional resources, such as fixed links or
services applied to a given dataset. It contains external descriptions
of the links and resources, and of the services' input PARAMETERS. It
should not contain descriptions of the datasets themselves. That is the
job of discovery services, or of accessData or server-side processing
web services (as SODA is intended to be). Keeping this separation
avoids confusion between the roles of the modules in the whole DAL
scheme.

D ) As a consequence of the opinions stated above, I have severe
concerns about Markus' approach to the input PARAMETER domain metadata
issue (see http://docs.g-vo.org/SODA-r3192.pdf, section 6, for his
views and compare with the same section in the editor WD).
I propose a mechanism which I think is more consistent with what we
already have and with the general DAL architecture. However, I don't
want to push it into the WD and the spec now, because I think we have
time to discuss these matters before the next version of SODA, SIAV2 and
DataLink. In my first email I tried to convince you that, even without
that "domain metadata" feature, we already have a workable spec that
fulfills the basic CSP requirements.
    The solution is based on including "ref" attributes in the
service descriptor PARAMETER elements for all the standard input
PARAMETERS. Each ref points to the appropriate Obscore FIELD/PARAMETER
or GROUP of FIELDS/PARAMETERS. This can be done in the discovery service
response, or in the response given by the SODA service when it is
queried with only the ID="dataset_id" constraint. Examples E and F show
how it can work.

E ) Example with the SODA service descriptor in the sia2 discovery response

Query example
http://dalservices.ivoa.net/sia2/query?POS=CIRCLE 2.8425 74.4846 0.1 
&BAND=0.0002&BAND=0.00006&COLLECTION=IRAS-IRIS

Excerpt of the response to the query example

<?xml version="1.0" encoding="UTF-8" ?>
<VOTABLE version="1.2" xmlns:xsi = 
"http://www.w3.org/2001/XMLSchema-instance" 
xsi:noNamespaceSchemaLocation = 
"xmlns:http://www.ivoa.net/xml/VOTable-1.2.xsd" >
       <RESOURCE type="meta" utype="adhoc:service" name="this">
           <PARAM name="standardID" datatype="char" arraysize="*" 
value="ivo://ivoa.net/std/SODA#sync-1.0" />
           <PARAM name="accessURL" datatype="char" arraysize="*"  
value="http://example.com/SODA/sync" />
           <GROUP name="inputParams">
                <PARAM name="ID" ucd="meta.id" datatype="char" 
arraysize="*" xtype="ivoident" ref="pdid" />
                <PARAM name="POS" ucd="pos" unit="deg" datatype="char" 
arraysize="*" xtype="polygon" ref="sreg" />
                <PARAM name="BAND" ucd="em" unit="m" datatype="double" 
arraysize="*" xtype="interval" ref="sbounds" />
                <PARAM name="TIME" ucd="time" unit="d" datatype="double" 
arraysize="*" xtype="interval" ref="tbound"/>
                <PARAM name="POL" ucd="pol" datatype="char" 
arraysize="*" xtype="Stokes" ref="pstates" />
           </GROUP>
       </RESOURCE>
       <RESOURCE type="results">
            <INFO name="QUERY_STATUS" value="OK"/>
           <TABLE>
               <GROUP ID="tbound">
                   <FIELDref ref="tmin"/>
                   <FIELDref ref="tmax"/>
              </GROUP>
             <GROUP ID="sbounds">
                  <FIELDref ref="smin"/>
                 <FIELDref ref="smax"/>
             </GROUP>
            <FIELD name="dataproduct_type" ucd="meta.id" 
datatype="char" utype="obscore:ObsDataSet.dataProductType" arraysize="*" />
            <FIELD name="calib_level" ucd="meta.code;obs.calib" 
datatype="int" utype="obscore:ObsDataSet.calibLevel" />
            <FIELD name="obs_collection" datatype="char" ucd="meta.id" 
utype="obscore:DataID.Collection" arraysize="*" />
            <FIELD name="obs_id" ucd="meta.id" datatype="char" 
utype="obscore:DataID.observationID" arraysize="*" />
            <FIELD ID ="pdid" name="obs_publisher_did" 
ucd="meta.ref.url;meta.curation" datatype="char" 
utype="obscore:Curation.PublisherDID" arraysize="*" />
             .......................... (removed fields)
           <FIELD ID="sreg" name="s_region" datatype="char" 
ucd="phys.angArea;obs" 
utype="obscore:Char.SpatialAxis.Coverage.Support.Area" arraysize="*" 
unit="deg" />
             ..... (removed fields)
           <FIELD ID="tmin" name="t_min" datatype="double" 
ucd="time.start;obs.exposure" 
utype="obscore:Char.TimeAxis.Coverage.Bounds.Limits.StartTime" unit="s" />
           <FIELD ID="tmax" name="t_max" datatype="double" 
ucd="time.end;obs.exposure" 
utype="obscore:Char.TimeAxis.Coverage.Bounds.Limits.StopTime" unit="s" />
             ......... (removed fields)
           <FIELD ID="smin" name="em_min" datatype="double" 
ucd="em.wl;stat.min"
utype="obscore:Char.SpectralAxis.Coverage.Bounds.Limits.LoLimit" unit="m" />
            <FIELD ID="smax" name="em_max" datatype="double" 
ucd="em.wl;stat.max" 
utype="obscore:Char.SpectralAxis.Coverage.Bounds.Limits.HiLimit" unit="m" />
            ..................(removed fields)
            <FIELD name="instrument_name" ucd="meta.id;instr" 
datatype="char" arraysize="*" 
utype="obscore:Provenance.ObsConfig.instrument.name" />

            <DATA>
              <TABLEDATA>
               <TR>
                   <TD>cube</TD>
                   <TD>1</TD>
                   <TD>IRAS-IRIS</TD>
                   <TD>I422B2H0</TD>
<TD>ivo://cds.u-strasbg.fr/IRAS-IRIS/25MU/I422B2H0</TD>
<TD><![CDATA[http://aladix.u-strasbg.fr/cgi-bin/nph-Aladin++dev.cgi?out=image&position=0.000000+80.000000&field=I422B2H0&survey=IRAS-IRIS&color=25MU&mode=view]]></TD>
                   <TD>image/fits</TD>
                   <TD>1600</TD>
                   <TD>I422B2H0</TD>
                   <TD>0.000000 </TD>
                   <TD>80.000000 </TD>
                   <TD>0.5</TD>
                   <TD>POLYGON 30.0 200.0 32.0 200.0 32.0 198.0 30.0 
198.0</TD>
                   <TD></TD>
                   <TD></TD>
                   <TD></TD>
                   <TD>1000</TD>
                   <TD>1.0</TD>
                   <TD>0.20</TD>
                   <TD>0.22</TD>
                   <TD>5.0</TD>
                   <TD></TD>
                   <TD>StokesQ,StokesU,StokesV</TD>
                   <TD>IRAS-IRIS</TD>
                   <TD></TD>
               </TR>
  ...........................
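
A client receiving such a response could resolve the ref attributes
roughly as follows. This is only a sketch in Python: the function name
is hypothetical, the embedded document is a cut-down stand-in for the
excerpt above (two input PARAMs, three columns, no XML namespace), and a
real client would use a proper VOTable parser.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for the discovery response; not a complete VOTable.
DISCOVERY_RESPONSE = """<VOTABLE>
  <RESOURCE type="meta" utype="adhoc:service" name="this">
    <GROUP name="inputParams">
      <PARAM name="ID" datatype="char" arraysize="*" ref="pdid"/>
      <PARAM name="BAND" datatype="double" arraysize="*" ref="sbounds"/>
    </GROUP>
  </RESOURCE>
  <RESOURCE type="results">
    <TABLE>
      <GROUP ID="sbounds">
        <FIELDref ref="smin"/>
        <FIELDref ref="smax"/>
      </GROUP>
      <FIELD ID="pdid" name="obs_publisher_did" datatype="char" arraysize="*"/>
      <FIELD ID="smin" name="em_min" datatype="double"/>
      <FIELD ID="smax" name="em_max" datatype="double"/>
      <DATA><TABLEDATA>
        <TR>
          <TD>ivo://cds.u-strasbg.fr/IRAS-IRIS/25MU/I422B2H0</TD>
          <TD>0.20</TD>
          <TD>0.22</TD>
        </TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""


def domains_per_row(votable_text):
    """For each table row, map every input PARAM name to the cell values it refers to."""
    root = ET.fromstring(votable_text)
    table = root.find(".//TABLE")
    # Column index of each FIELD that carries an ID.
    fields = table.findall("FIELD")
    column = {f.get("ID"): i for i, f in enumerate(fields) if f.get("ID")}
    # FIELD IDs gathered by each GROUP that carries an ID.
    groups = {g.get("ID"): [r.get("ref") for r in g.findall("FIELDref")]
              for g in table.findall("GROUP") if g.get("ID")}
    params = root.findall(".//GROUP[@name='inputParams']/PARAM[@ref]")
    result = []
    for tr in table.iter("TR"):
        cells = [td.text or "" for td in tr.findall("TD")]
        row = {}
        for p in params:
            ref = p.get("ref")
            # A ref names either a GROUP of FIELDs or a single FIELD.
            ids = groups.get(ref, [ref])
            row[p.get("name")] = [cells[column[i]] for i in ids]
        result.append(row)
    return result


print(domains_per_row(DISCOVERY_RESPONSE))
```

For the single row above this gives the dataset identifier for ID and
the pair em_min, em_max for BAND. A ref that matches no FIELD or GROUP
raises a KeyError here; a real client would have to decide how to
handle that.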


F ) Example with the SODA service descriptor in the SODA  response

Query example to the SODA service (with only the ID=.... parameter filled)
http://dalservices.ivoa.net/soda?ID="ivo://cds/IRAS-IRIS/25MU?...."

The response will be the PARAM description with Domain metadata "à la" 
Obscore
<?xml version="1.0" encoding="UTF-8" ?>
<VOTABLE version="1.2" xmlns:xsi = 
"http://www.w3.org/2001/XMLSchema-instance" 
xsi:noNamespaceSchemaLocation = 
"xmlns:http://www.ivoa.net/xml/VOTable-1.2.xsd" >
       <RESOURCE type="meta" utype="adhoc:service" name="this">
           <PARAM name="standardID" datatype="char" arraysize="*" 
value="ivo://ivoa.net/std/SODA#sync-1.0" />
           <PARAM name="accessURL" datatype="char" arraysize="*" 
value="http://example.com/SODA/sync" />
           <GROUP name="inputParams">
              <PARAM name="ID" ucd="meta.id" datatype="char" 
arraysize="*" xtype="ivoident" value="ivo://cds/IRAS-IRIS/25MU?...." />
              <PARAM name="POS" ucd="pos" unit="deg" datatype="char" 
arraysize="*" xtype="polygon" ref="sreg" />
              <PARAM name="BAND" ucd="em" unit="m" datatype="double"
arraysize="*" xtype="interval" ref="sbounds" />
              <PARAM name="TIME" ucd="time" unit="d" datatype="double"
arraysize="*" xtype="interval" ref="tbounds"/>
              <PARAM name="POL" ucd="pol" datatype="char" arraysize="*" 
xtype="Stokes" ref="pstates" />
          </GROUP>
         <GROUP name="DomainMetadata">
           <GROUP ID="tbounds">
              <PARAMref ref="tmin"/>
              <PARAMref ref="tmax"/>
           </GROUP>
          <GROUP ID="sbounds">
             <PARAMref ref="smin"/>
             <PARAMref ref="smax"/>
           </GROUP>
          <PARAM ID="sreg" name="s_region" datatype="char" 
ucd="phys.angArea;obs" 
utype="obscore:Char.SpatialAxis.Coverage.Support.Area" arraysize="*" 
unit="deg" value="POLYGON 30.0 200.0 32.0 200.0 32.0 198.0 30.0 198.0"/>
          <PARAM ID="tmin" name="t_min" datatype="double" 
ucd="time.start;obs.exposure" 
utype="obscore:Char.TimeAxis.Coverage.Bounds.Limits.StartTime" unit="s" 
value="52300.5"/>
           <PARAM ID="tmax" name="t_max" datatype="double" 
ucd="time.end;obs.exposure" 
utype="obscore:Char.TimeAxis.Coverage.Bounds.Limits.StopTime" unit="s" 
value="52300.6"/>
            <PARAM ID="smin" name="em_min" datatype="double" 
ucd="em.wl;stat.min"
utype="obscore:Char.SpectralAxis.Coverage.Bounds.Limits.LoLimit" unit="m"
value="0.20"/>
           <PARAM ID="smax" name="em_max" datatype="double" 
ucd="em.wl;stat.max" 
utype="obscore:Char.SpectralAxis.Coverage.Bounds.Limits.HiLimit" 
unit="m" value="0.22"/>
           <PARAM ID="pstates" name="pol_states" datatype="char" 
ucd="meta.code;phys.polarization" 
utype="obscore:Char.PolarizationAxis.stateList" arraysize="*" 
value="StokesQ,StokesU,StokesV"/>
        </GROUP>
     </RESOURCE>
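
Reading the domains out of such a SODA response could look like the
following Python sketch. The function name is hypothetical and the
embedded document is a reduced version of the example above, without
XML namespace. The group members are written as PARAMref because they
point at PARAM elements; the code accepts FIELDref as well.

```python
import xml.etree.ElementTree as ET

# Reduced stand-in for the SODA response; not a complete VOTable.
SODA_RESPONSE = """<VOTABLE>
  <RESOURCE type="meta" utype="adhoc:service" name="this">
    <GROUP name="inputParams">
      <PARAM name="BAND" unit="m" datatype="double" arraysize="*"
             xtype="interval" ref="sbounds"/>
      <PARAM name="POL" datatype="char" arraysize="*" ref="pstates"/>
    </GROUP>
    <GROUP name="DomainMetadata">
      <GROUP ID="sbounds">
        <PARAMref ref="smin"/>
        <PARAMref ref="smax"/>
      </GROUP>
      <PARAM ID="smin" name="em_min" datatype="double" unit="m" value="0.20"/>
      <PARAM ID="smax" name="em_max" datatype="double" unit="m" value="0.22"/>
      <PARAM ID="pstates" name="pol_states" datatype="char" arraysize="*"
             value="StokesQ,StokesU,StokesV"/>
    </GROUP>
  </RESOURCE>
</VOTABLE>"""


def input_param_domains(votable_text):
    """Map each input PARAM name to the named domain values its ref points to."""
    root = ET.fromstring(votable_text)
    # Index every element that carries an ID attribute.
    by_id = {el.get("ID"): el for el in root.iter() if el.get("ID")}
    domains = {}
    for p in root.findall(".//GROUP[@name='inputParams']/PARAM[@ref]"):
        target = by_id[p.get("ref")]
        if target.tag == "GROUP":
            # A GROUP collects references to the PARAMs holding the bounds.
            members = [by_id[r.get("ref")] for r in target
                       if r.tag in ("PARAMref", "FIELDref")]
        else:
            members = [target]
        domains[p.get("name")] = {m.get("name"): m.get("value") for m in members}
    return domains


print(input_param_domains(SODA_RESPONSE))
```

Here BAND resolves to em_min and em_max, and POL resolves to pol_states,
all as the string values found in the document.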



On 05/01/2016 08:51, Markus Demleitner wrote:
> Dear Colleagues,
>
> On Thu, Dec 24, 2015 at 03:40:21PM +0100, François Bonnarel wrote:
>>       This close-to-Christmas email is to announce that the SODA1.0 (previously
>> known as AccessData1.0) WD was released last Monday. See:
>> http://www.ivoa.net/documents/SODA/20151221/index.html
>>        There have been very long discussions among authors and we made some
>> progress in convergence. However, there are still points that are hotly
>> debated. It is my responsibility as editor as well as DAL chair to provide now
>> a version which is regarded as insufficient by some of us but nonetheless
>> fulfills the CSP and community basic requirements in my view.
>> Probably the discussion will start very soon on the DAL mailing list.
> Indeed -- there are of order 10 topics on which I'd like to see
> discussion on this draft (there's a list of them at the end of this
> mail).
>
> I'd like to start with the Big One (this is probably also going to be
> the longest mail in this series; please indulge me).  In one
> sentence, it's
>
>    The protocol must be written such that clients can work out what
>    parameter values will probably yield useful results.
>
> This, in my opinion, is really the make-or-break thing, i.e., what
> decides whether what we write will actually be useful as a generic
> access protocol, or whether it will be a source of constant annoyance
> all around[1].
>
> So -- even if you have only marginal interest in SODA, and even if this
> is a long mail, please take a few 10s of minutes to try and make up your
> mind based on the two drafts mentioned below.  You'd have my blessing to
> ignore the remaining SODA discussions if you are so inclined.
>
> The premise above applied to SODA becomes: All parameters (except for
> the oddball POS, which really has a special position; but I'll revisit
> that in a later mail) must be fully declared by the service (including
> VALUES and OPTION elements as appropriate) and be systematically
> discovered for UI/API generation by the client in a SODA exchange.
>
> I've written standards prose for that already that I think is about what
> a standards document can do to mandate such practices (of course, this
> is largely a matter of implementation style, which is hard to regulate).
> It's been in the text in volute rev. 3192 [2]; for your convenience, I
> have built the document as of that revision and put it on
> http://docs.g-vo.org/SODA-r3192.pdf.  The contentious prose starts at
> page 8 -- if you'd be so kind as to read sect. 2.6 ("three-factor
> semantics", 4 pages).
>
> You can compare with sect. 2.6 as published (the published version is
> in effect volute rev. 3200, in case you'd like to see a diff).  Let me
> again bambi-eye all around and ask everyone with even a remote
> involvement in the cube thing to try and make up their minds and speak
> up, even if this thing appears a bit complicated at first, in particular
> because, in a way, it's really part of datalink and cannot be understood
> without it (I've argued it should really have been part of datalink in
> the first place).
>
> If there's anything we can do to help comprehensibility, let us know,
> too.
>
>
> Meanwhile, allow me to once more try to argue why it is so important to
> urge services to provide consistent, dataset-specific metadata and the
> clients to use it in SODA.
>
> SODA is designed to operate on concrete datasets -- you've discovered
> something that looks like it might be interesting, but you're only
> interested in a small part or a particular mogrification of the dataset,
> so your client gets information on the dataset and then figures out what
> to do to retrieve the information relevant to you.  This means that you
> cannot just put in some value into a service parameter and watch what's
> coming out -- you'll almost always get nothing back because the coverage
> of a typical dataset is small and not easily predictable.
>
> The "horror vacui", the dreaded moment in GUIs when an input field is
> displayed and users have no idea what to put there, with SODA therefore
> isn't a minor usability issue, it's a protocol killer.
>
> It has been put forward that clients could infer the domains of the
> parameters (the "good" values) from a previous discovery query (e.g.,
> from SIAv2, they'd know the spatial and spectral coverage).
> Unfortunately, this line of reasoning is flawed in at least two respects:
>
> (1) The results of the discovery query might not be available to the
> client dealing with the SODA descriptor
>
> (2) This technique breaks down with the first custom parameter (is the
> corresponding item in the discovered metadata?  And what does the
> parameter correspond to in the first place?), and that would, again, be
> a killer for SODA's usefulness.
>
> Let me dwell on both points for a little while.
>
> Ad (1).   I expect the most common source for SODA descriptors will be
> Obscore (and it's a CSP-official usecase in case you don't agree).
> There, the access URL for cubes and other large datasets won't be the
> dataset itself, because you don't want people blindly pulling several
> 100s of gigabytes (or just one gigabyte, really).  Instead, you return a
> datalink document, which contains the SODA descriptor.  We at Heidelberg
> already occasionally do that, the CADC has datalink documents throughout
> IIRC (although I think they don't have custom SODA descriptors yet).
>
> To query Obscore, people typically use TAP, and their queries  will
> fairly typically not be just "select * from" but very possibly rather
> something like "select access_url, target_name from ivoa.obscore
> join...."  Hence, a client doesn't have access to the obscore metadata,
> and even if it had, it might have a hard time recognising it in the
> possibly wide result tuples coming back from the database.
>
> Another scenario in which dataset metadata possibly obtained during
> discovery would get lost is when sending the datalink document (URL)
> through SAMP.  Whether we like it or not, our users love SAMP more than
> anything else we've come up with so far, and telling them SODA doesn't
> play with SAMP isn't going to make SODA popular.
>
> Ad (2).  The dataset operations that data providers will want to enable
> through SODA are essentially endless -- rebinning, renormalisation,
> format conversion, "logical" cutouts (e.g., on selected extensions
> only), etc.  Making SODA something that (to some extent) works with a
> select set of standard parameters but fails (in the sense of: client
> behaviour is unpredictable) as soon as a service needs a bit more is
> going to render it almost useless, and data providers will keep doing
> things through custom web pages.  It's the situation we have with SSAP;
> although that, as a discovery protocol, at least can limp along to some
> extent.  SODA, as an access protocol, wouldn't even limp.
>
> So, we need to say: "A well-behaved SODA client will do X and Y and
> *not* ignore Z" to give data providers the confidence that independent
> of the client their users choose they still see whatever operations they
> consider important.  That's what I've tried in rev. 3192 section 2.6.
>
> As an additional indication that full metadata in the SODA descriptor is
> a very good idea, let me mention in passing that
>
> (3) it would enable usable interfaces in stop-gap XSLT-based datalink
> interfaces (as discussed in Sydney,
> http://wiki.ivoa.net/internal/IVOA/InteropOct2015DAL/datalink-xslt.pdf)
>
>
> Just so nobody can say later I didn't warn them: Yes, this means that
> the datalink document that contains the SODA descriptor has to be
> tailored for each dataset.  But that's really not a big deal, because
> the datalink documents themselves vary with dataset (well, typically) --
> previews, plots, provenance, whatever all depend on the dataset.
> Dropping in the limits into the SODA descriptor in addition at least for
> me hasn't been a major additional implementation burden.
>
>
> That's it for my first SODA gripe, and thanks for making it here.  I
> plan to have, roughly weekly, additional SODA gripes, one after the
> other to allow productive discussions on each point.  To give you an
> idea what I have up my sleeve here's a tentative programme:
>
> (2) Spatial coverage discovery and the RA and DEC parameters
> (3) Pixel cutouts: PIXEL_n
> (4) Mandated multiplicities considered harmful
> (5) Behaviour for no-ID queries?  For queries with only ID?
> (6) No gratuitous xtypes
> (7) POS doesn't have an xtype
> (8) Examples stuff: example example, and perhaps a dl-id term?
>
> If this sounds scary, don't worry -- this kind of thing has IMHO worked
> great for datalink.
>
> Cheers,
>
>              Markus
>
>
> [1] Incidentally, it also coincides with my conviction that in protocol
> development in the VO, we should be thinking much more than in the past
> from the client perspective, even if most of the protocol developers sit
> on the server side.
>
> [2] To get the source from the repository, use something like
>
> svn co -r 3192 https://volute.g-vo.org/svn/trunk/projects/dal/SODA
>
