SODA gripes (1): The Big One

Laurent Michel laurent.michel at astro.unistra.fr
Fri Jan 22 15:19:54 CET 2016


Hello,

As I understand it, the problem is how to get the metadata of a given dataset. This metadata is needed to obtain the value
ranges of the SODA query parameters.

The case mostly discussed here is the one where a SODA service descriptor is embedded within a DataLink response as a custom service.
One point everyone agrees on is that the parameter ranges must be given for all parameters of all datasets contained in
that DataLink response.
That is unfortunately not possible with the current DataLink service descriptors, where only one range can be given per parameter (see the sketch just below).
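
To make the limitation concrete, here is a minimal sketch of how a SODA descriptor sits in a DataLink response today (the standardID, URL, values and the ref target are all invented for illustration). A PARAM can carry only a single VALUES/MIN/MAX block, so one descriptor cannot express different BAND ranges for the different datasets listed in the same response:

    <RESOURCE type="meta" utype="adhoc:service" name="soda">
      <PARAM name="standardID" datatype="char" arraysize="*"
             value="ivo://ivoa.net/std/SODA#sync-1.0"/>
      <PARAM name="accessURL" datatype="char" arraysize="*"
             value="http://example.org/soda/sync"/>
      <GROUP name="inputParams">
        <!-- ID takes its value per row from the FIELD referenced as "primaryID" -->
        <PARAM name="ID" datatype="char" arraysize="*" ref="primaryID" value=""/>
        <!-- only one MIN/MAX can be declared here, whatever the number of datasets -->
        <PARAM name="BAND" datatype="double" arraysize="2" unit="m" value="">
          <VALUES>
            <MIN value="3.5e-7"/>
            <MAX value="7.0e-7"/>
          </VALUES>
        </PARAM>
      </GROUP>
    </RESOURCE>
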
There are three possibilities to sort this out:
A) Restricting the DataLink response to a single dataset:
     too late.
B) Changing the schema of the service descriptors to support multiple ranges per parameter with a ref mechanism:
     that means moving toward DataLink 1.x; SODA's trouble is enough for today.
C) Duplicating the service descriptors, declaring one resource per dataset (sketched just below):
     could work, but gives messy VOTables.
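
For completeness, option C would look roughly like this (same caveat: identifiers and values are invented). One descriptor resource per dataset, each with its own fixed ID and its own range; it works, but the VOTable grows quickly:

    <RESOURCE type="meta" utype="adhoc:service" name="soda-ds1">
      <PARAM name="accessURL" datatype="char" arraysize="*"
             value="http://example.org/soda/sync"/>
      <GROUP name="inputParams">
        <PARAM name="ID" datatype="char" arraysize="*" value="ivo://example/ds1"/>
        <PARAM name="BAND" datatype="double" arraysize="2" unit="m" value="">
          <VALUES><MIN value="3.5e-7"/><MAX value="7.0e-7"/></VALUES>
        </PARAM>
      </GROUP>
    </RESOURCE>
    <RESOURCE type="meta" utype="adhoc:service" name="soda-ds2">
      <PARAM name="accessURL" datatype="char" arraysize="*"
             value="http://example.org/soda/sync"/>
      <GROUP name="inputParams">
        <PARAM name="ID" datatype="char" arraysize="*" value="ivo://example/ds2"/>
        <PARAM name="BAND" datatype="double" arraysize="2" unit="m" value="">
          <VALUES><MIN value="1.0e-6"/><MAX value="2.5e-6"/></VALUES>
        </PARAM>
      </GROUP>
    </RESOURCE>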

None of these solutions (especially C) looks good to me, for three reasons:
- Copying dataset metadata into a DataLink response is very confusing. That should be specified at the DataLink level, not for one
specific custom service at least (see B).
- We cannot say that two services having the same IVOID/name/URL are different just because their parameter spaces differ.
Duplicating the service descriptor resources is therefore only a workaround.
- What about a SODA service invoked from a context other than DataLink (e.g. standalone)?

These thoughts lead me to support James's proposal: adding to SODA an endpoint returning the metadata
of a given dataset.
That makes SODA work in any context as a self-described service.
One can object that this requires a two-step query, but that does not matter: it looks natural to learn about a dataset first before
running some processing on it, and it stays within reach of clients that are not that smart.
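
Just as an illustration (neither the endpoint name nor the response layout is defined anywhere yet; both are my invention), such a metadata query could be as simple as

    GET http://example.org/soda/metadata?ID=ivo://example/ds1

returning, say, a small VOTable listing the parameters the service accepts for that dataset, each with its valid domain:

    <VOTABLE version="1.3" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
      <RESOURCE type="meta" name="inputParams">
        <!-- per-dataset range for a cutout parameter -->
        <PARAM name="BAND" datatype="double" arraysize="2" unit="m" value="">
          <VALUES><MIN value="3.5e-7"/><MAX value="7.0e-7"/></VALUES>
        </PARAM>
        <!-- enumerated domain for a parameter not present in the dataset metadata -->
        <PARAM name="POL" datatype="char" arraysize="*" value="">
          <VALUES><OPTION value="I"/><OPTION value="Q"/><OPTION value="U"/></VALUES>
        </PARAM>
      </RESOURCE>
    </VOTABLE>

With that in hand, a client builds the actual SODA request in a second step.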

Cheers
LM

On 19/01/2016 11:35, James.Dempsey at csiro.au wrote:
> Hi Markus,
>
> I have a suggestion, driven by the idea that SODA params may include some not represented in the obscore data model.
>
> Could a ranges endpoint be added to the SODA interface (perhaps within an {async} resource) which could provide the valid ranges for each param for the ids provided so far?
>
> This would likely work better if the service provides the optional UWS ability to add parameters after the initial POST call to construct an async job.
>
> An alternative would be a separate ranges resource which takes one or more IDs and returns the parameter ranges.
>
> The advantage I see with this is that it doesn't push knowledge of how SODA works with specific data products outside the SODA service itself, but rather encapsulates that knowledge in the service.
>
> Cheers,
> James Dempsey
>
> ________________________________________
> From: dal-bounces at ivoa.net [dal-bounces at ivoa.net] on behalf of Markus Demleitner [msdemlei at ari.uni-heidelberg.de]
> Sent: Tuesday, 19 January 2016 9:09 PM
> To: dal at ivoa.net
> Subject: Re: SODA gripes (1): The Big One
>
> Dear Colleagues,
>
> Given this has been a discussion exclusively between the current
> authors so far, I'd propose to delay some definite decision on "The
> Big One" until a few more people had a chance to wrap their heads
> around how SODA is intended to work.
>
> Let me nevertheless respond to some of the new points that Pat and
> François have made -- there's a TL;DR at the foot of this mail.
>
> On Fri, Jan 15, 2016 at 08:48:54AM -0800, Patrick Dowler wrote:
>> I would like to add (remind) that the evolution plan includes a {metadata}
>> capability that we nominally said would be part of SIA-2.1 but since it is
>> another capability it could be defined there or in a new spec or in another
>> spec (eg SODA-1.1). The {metadata} capability is intended to allow clients
>> to get the necessary metadata for a single dataset (ID=...) so they can
>> figure out how to call the SODA service and take advantage of all the
>> features offered.
>
> Having this additional endpoint would indeed solve some of the
> problems I see with the current draft.  However, I can only see
> disadvantages wrt simply giving proper parameter metadata:
>
> * much more complicated (e.g., linking params with pieces of
>    metadata, parsing and  representing the metadata...)
> * requires an extra request per dataset
> * doesn't help with parameters not covered by the data model in
>    question
> * [the big one here]: Only works if there is an appropriate data
>    model in the first place.  Experience in the VO tells me that that
>    is a very big if.
>
> And I cannot see a single advantage over proper parameter metadata
> generation, except perhaps:
>
>> Now, that general usage pattern (make a remote call to get metadata) is
>> nice and clean but it isn't necessarily optimal if you want to process many
>
> We-eell, I could claim that proper definition of the parameters in an
> RPC is nice and clean, too (and not doing it is mean and dirty), so
> I'm not sure I'd count that as an advantage of your scheme.
>
>> things the same way. I can understand Markus' idea to define domain
>> metadata inline in a SODA service descriptor but it looks a lot like an
>> optimisation to me. I'm not saying it isn't useful/necessary to optimise,
>> but I do not think we should try to do that without having tackled the
>> general problem.
>
> Again, rev 3192 is not (really) trying to define the dataset itself.  I
> am convinced the latter is a very hard problem, and one we won't
> solve in full for a long time to come.
>
> It is about defining *parameter metadata*, which has some
> relationship to dataset metadata in general, but that relationship is
> neither trivial nor easily expressible.
>
> But even if we had some way to define that relationship: relying on
> a full description of datasets would mean SODA wouldn't work outside
> of a small niche for a long, long time.
>
> So even if you think proper parameter metadata is a (premature?)
> optimisation (I don't), I claim it's unavoidable, and it's certainly
> "good enough".  Plus, I've still not heard an actual argument against
> it that's actually rooted in technology (rather than philosophy):
> What becomes more difficult, less robust, less desirable with domain
> declarations on the parameters compared  to a solution where you get
> the metadata from somewhere else and then do some magic inference of
> the domain?
>
>
> While I'm writing, and to avoid another deluge of mails, I'll briefly
> comment on François' mails:
>
> Let me start with the question of using PARAM/@ref to link params and
> metadata items:
>
> On Fri, Jan 15, 2016 at 05:42:40PM +0100, François Bonnarel wrote:
>>       - However, the feature you point out in DataLink is not yet used by
>> current version of protocols except for ID, which is fully consistent with
>> the solution I have drawn. So we could imagine slightly modifying the DataLink
>> text in the next version, if we admit the "ref" mechanism.
>
> That is not true -- clients are expected to collect the values of all
> PARAMs with @ref.  I didn't like that requirement myself, and I'm not
> sure client authors have picked it up, but it's there, and changing
> it after 1.0 would IMHO need a very strong case.
>
> Which this is not, for instance, because it still doesn't solve
> parameter metadata for parameters not in the dataset metadata (e.g.,
> picking FITS extensions, rebinning, ...). In general:
>
>> What I really want to avoid is having the dataset limits with the ObsCore
>> structure in one context and the same dataset limits with the <MIN><MAX>
>> structure and absolutely no linkage between these two ways of providing the
>> same concept. And what will happen if people would occur to provide
>
> Again, they *do not* provide the same concepts.  One is dataset
> properties, the other properties of the pair of (service, dataset).
> There's any number of things that can happen to the dataset
> properties, even plain ones like a wavelength range -- perhaps I
> won't let you cut out near the ends of my spectrum?  Perhaps there's
> additional pixels for calibration that I can show you in a datalink
> parameter?  And again, there's a wealth of parameters not even
> represented in dataset metadata.
>
> These are two different things.  Conceptually.  Therefore, there is
> no repetition, and trying to make the different things look the same
> because there are a few cases in which it *seems* they are the same is
> going to make the protocol cumbersome, complicated and inflexible --
> something rooted in a faulty theory will in general be painful.
>
> Then, on whether proper parameter metadata is required in version
> 1.0:
>
> On Fri, Jan 15, 2016 at 05:07:12PM +0100, François Bonnarel wrote:
>> If after discussion and implementation people want the <MIN><MAX> (or the
>> alternative solution) it would be possible to add them without discarding
>> old services which will only MISS something useful (and the same for a
>
> EXACTLY my point: This is not about services, this is about clients.
> The clients written against the editor draft won't do anything useful
> with the services that would let them do useful things.  I know I'm
> sounding like a broken record, but we simply MUST design our
> standards much more from the client perspective; client uptake is
> what makes or breaks standards, what makes or breaks the VO.
>
> So, we have to make our design such that 1.0 clients will be able to
> usefully work with all 1.x services.  As they should.
>
> Then, when I was talking about retrieving SODA descriptors from
> datalink documents:
>
> On Fri, Jan 15, 2016 at 04:56:45PM +0100, François Bonnarel wrote:
>> I don't understand this. From the DataLink and SIAV2 specs there are really
>> two different ways you can be driven from a discovery response to a SODA
>> interface.
>>       One is the one you describe and which CADC is indeed using. The acref
>> field in the ObsCore table contains the URL of the {link} table. In that case
>> the format field is marked as "DataLink". But it's not "typical". It is just
>> one of the two ways.
>
> ...and both need to work.  Which means you need to be able to derive
> parameter domains from the datalink document.  If you grant that, the
> question is: do we want to invent a way to embed dataset metadata
> into datalink or perhaps wait until {metadata} comes around and then
> *still* hope someone finds a reliable way to do that derivation (as I
> said, I'm convinced there is none)?  Or do we just do the simple
> thing, which is provide useful parameter metadata in the first place?
>
> Incidentally, as the guy that did the design, I still feel entitled to
> say the DAL-attached descriptor was designed for a few special
> applications, and the general case is a per-dataset descriptor.  But
> ok, that's personal feeling, and has no impact on whether or not
> per-dataset needs to work.
>
> Then, on us trying to understand each other's confusion:
>
> On Thu, Jan 14, 2016 at 10:17:10AM +0100, François Bonnarel wrote:
>> On the other side I think it would be an error to put this domain metadata
>> in the {link} resource response. (what you call the "Datalink document"). It
>> will require several "SODA service descriptor" sections if we have several
>> datasets and could be much more complex if we add other kinds of services
>> (future standards or custom services). It could even become a mess if we have
>> several services on several datasets
>
> I've said before that I was skeptical about allowing several datasets
> per datalink document from the start, and since my XSLT-over-datalink
> experiments I'm now convinced we shouldn't have done it, but be that
> as it may, yes, you will have several descriptors per response
> document.  I see no problem with this.
>
> Conceptually, the tuple (SODA service, dataset) is quite similar to
> the tuple (SSA standard, data collection): Since a SODA service's
> parameters can change with the data set (e.g., POL might be supported
> only for a few datasets served through a given service) much like an
> SSA service will have different parameters depending on what spectra
> are in there, these *are* different services, and you're doing the
> clients a big favour if you don't try to hide this.
>
> Try drawing up the logic a client would have to go through if you
> were to make your worst-case scenario (multiple services per dataset,
> multiple datasets per service) implicit (leaving aside the question of
> just how that would look).  Uh...
>
>
>   _____ _       ____  ____
> |_   _| |    _|  _ \|  _ \
>    | | | |   (_) | | | |_) |
>    | | | |___ _| |_| |  _ <
>    |_| |_____( )____/|_| \_\
>              |/
>
> The way I see things, I claim the editor draft cannot work for many
> important use cases because it relies on some implicit relationship
> between service parameters and dataset properties, and there's no
> realistic hope to make this relationship, or even the dataset
> properties themselves, explicit in an interoperable fashion in the
> next couple of years.  Hence, we should simply do the straightforward
> and easy thing: proper parameter metadata generation.
>
> My colleagues believe some of these unfulfilled use cases are not
> important or not within our remit, and anyway the relationship between
> dataset and parameter metadata is either trivial or will at least be
> interoperably expressible in the near future.
>
> Since I don't see how to reach a compromise here, I propose to
> revisit the question later, when there's a wider community
> understanding of the issues involved, and perhaps someone else has
> started a SODA client, too.  And meanwhile to turn to some other
> warts of the current draft, which is what I'll do tomorrow.  Ok?
>
>     -- Markus
>
> PS: Hints on how to engage the wider community, in particular from
> that wider community, are welcome.
>

-- 
jesuischarlie

Laurent Michel
SSC XMM-Newton
Tél : +33 (0)3 68 85 24 37
Fax : +33 (0)3 68 85 24 32
laurent.michel at astro.unistra.fr
Université de Strasbourg <http://www.unistra.fr>
Observatoire Astronomique
11 Rue de l'Université
F - 67200 Strasbourg
http://amwdb.u-strasbg.fr/HighEnergy/spip.php?rubrique34

