On the relationship of DAL standard input PARAMETERS and IVOA Data models

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Mon Feb 8 10:43:09 CET 2016


Dear lists,

I'm a bit reluctant to drag DM into this SODA thing, but there's some
DM material below, so perhaps it's a reasonable thing.  I'd still
suggest to continue discussion, if it turns out to be necessary, on
DAL exclusively.

On Tue, Feb 02, 2016 at 07:09:37PM +0100, François Bonnarel wrote:
>     This email (with some ideas in it partially discussed with Laurent
> Michel last week) is motivated by the discussion on the relationship between
> SODA input parameter Domain metadata and some of the Obscore attributes of
> course.  But the scope is neither limited to SODA nor to Obscore Data Model.
> SSA implemented a view of the Spectrum data model while SIAV2.1 is supposed
> to implement the Cube and dataset models in a near future.

This is more of an aside, but since this went via DM: your
proposal is built on utypes, which, if they are anything, denote a
flattening of the data model; it is therefore highly unlikely to
work for Cubes, whose metadata almost certainly cannot be sensibly
flattened out.

Note, however, that that is not the main reason why I think the
proposal will not work.

>      In the case  of Data discovery services like SIAV2.0, the Obscore data
> model is not only present in the query response as a description of the
> available dataset, it is also underlying the discovery process itself. The
> process of finding out datasets included in a given region of the parameter
> space is actually the process of looking for datasets for which the support
> or bounds are included in the ROI on all considered axes.
> 
>      Cuting-out or selecting some values on some axes, like SODA is supposed
> to do, is actually forcing the generation of sub-datasets whose bounds or
> support are matching the INput parameter values as much as possible.

For a very straightforward cutout, this might be a reasonable model,
but it is patently clear that SODA will need to do much more -- with
our spectra, for instance, we're offering changes in FLUXCALIB and
let clients select output formats (which is really much more sensible
than using SSA's FORMAT parameter, in particular if you're supporting
five or six output formats).  These are not cutouts, and there is no
indication of possible values for these parameters in the discovered
dataset metadata.  Nor, actually, anywhere else but in the metadata
produced by the SODA service *on a particular dataset* -- to see why
I've made that last qualification, consider the case of a service
that offers spectra calibrated to the continuum.  There might be
datasets for which no solution for the continuum has been found --
and for those, it'd be highly preferable if the service would make
clear that it cannot do FLUXCALIB=NORMALIZED.
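
To illustrate that point: under the approach I'm advocating, the
descriptor for one particular dataset can simply enumerate only the
flux calibrations actually available for it.  The fragment below is
illustrative only (not from a real service):

```xml
<!-- Illustrative SODA descriptor fragment for one particular dataset:
     NORMALIZED is simply absent because no continuum solution exists
     for this dataset. -->
<PARAM name="FLUXCALIB" datatype="char" arraysize="*" value="">
  <VALUES>
    <OPTION value="UNCALIBRATED"/>
    <OPTION value="RELATIVE"/>
  </VALUES>
</PARAM>
```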

>      An important point is that the direct relationship between the SODA
> INPUT parameters and the Obscore model attributes is not for the archived
> discovered dataset it is with virtual dataset. We are actually describing
> the dataset which will come out of the SODA service.

Let me again stress that in general there is no direct relationship
between SODA parameters and Obscore attributes, least of all in the
sense of subsetting a range.  In addition to the parameters mentioned
in my last paragraph, you're actually including some of those in
your use cases: rebinning, for instance, or summation along an axis.
So, perhaps unfortunately, this premise is wrong.

>      The relationship between standard parameters and the underlying model
> also explain why it was important to keep consistency between SIAV2 and SODA
> (previously AccessData) INPUT parameters. Same parameter, same model
> concept. But operated in a different way!

It follows from what I just wrote that it is nowhere near that easy.
The relationship between a dataset property and the properties of the
SODA parameters is IMHO far too complex to be modelled explicitly.
Try it, as a proof of concept, for the use cases you have in the SODA
document itself.  You'll see it's a nightmare, and whoever complained
about PDL being too complex for them will be in for quite a few
sleepless nights.

> If we were operating only a SODA service, while another institution will
> operate the Data discovery service, we are able to provide the correct bound
> values for the whole archived dataset and can inform this other institution
> to adopt these correct values in their discovery service. If we have SODA,
> whe also have the dataset home and we are the source of "true" dataset
> metadata.

Well, I don't doubt that dataset metadata propagation is feasible on
the server side.  What I'm worrying about is the client side.  Even
if dataset metadata were useful for the operation of a SODA service
(and I still maintain it's not in general), there'd still be the
problem of how to transport that metadata along in SAMP or even
within an application.

> In the case of resampling or regriding of the dataset or other Server-side
> accessdata recomputing (some functionalities which will come in SODA1.1)
> some other standard parameters will be in charge : spatial, spectral or time
> resolution, WCS-Mapping. All will be relying on other Obscore or

How does knowing the spectral resolution of a dataset help figuring
out what kind of regridding the SODA service actually supports?

>       For the current standard parameters (POS, BAND,TIME, POL), it would be
> usefull to relate in some way the input PARAMETERS   to the corresponding
> Obscore model attributes. A month ago, I have proposed a "ref/ID" mechanism,
> which has been considered by Markus on this list as too heavy. Maybe it is.

No, I wasn't worried about it being too heavy.  I said it

(a) didn't solve the problem (essentially with the same arguments as
here, which I think stand undisputed)

(b) overrides specified behaviour (the meaning of PARAM/@ref is
already defined by datalink in a different manner)

(c) cannot work in what I maintain is the primary way of deployment
(SODA descriptor in a datalink document) because there's nothing you
can reference.

> So let's try another solution.
> One could also imagine to give the Obscore model attribute utype as a the
> [...]
> utype PARAM attribute in the SODA service descriptor ,like this
>                <PARAM name="POS" ucd="pos" unit="deg" datatype="char"
> arraysize="*" utype="obscore:Char.SpatialAxis.Coverage.Support.Area"   />

On utype see above; of my concerns with the @ref proposal only (b) is
addressed, so (a) and (c) still stand.

> CUSTOM parameters :
>     In this general  view, CUSTOM Parameters are Service parameters which
> are not relying on an IVOA datamodel such as Obscore, spectrum or soon
> Cube/ Dataset. In that case they will have no corresponding utype and will
> not be associated  with dataset metadata.
>      Actually if there is no utype in the PARAM definition, client software
> should discover the domain metadata using <VALUES> <MIN/> <MAX/> </VALUES> .
> So the solution Markus is proposing in its version of the draft
> (http://docs.g-vo.org/SODA-r3192.pdf).

So, you're multiplying the implementation effort for the client in
that it has to support two separate ways of figuring out parameter
metadata, one of them involving reference resolution with something
other than ID/ref, while the service now has to bother not only with
MIN/MAX and ID references but additionally with keeping utype
references and consequently utype uniqueness (for extra fun, imagine
this in a TAP response)?  For what advantage, preferably technical
advantage?
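
To make that added client burden concrete, here is a sketch of the
two resolution paths the utype proposal forces on a client (names and
data structures are hypothetical; this is an argument, not an API):

```python
# Hypothetical sketch: under the utype proposal, a client needs TWO
# strategies to work out a parameter's domain -- resolve the utype
# against discovery metadata it must have kept around, OR fall back
# to the ordinary VALUES/MIN/MAX route.
def domain_for(param, obscore_row):
    """Return the domain for one input parameter (sketch only)."""
    utype = param.get("utype")
    if utype and utype.startswith("obscore:"):
        # Path 1: resolve the utype against the obscore record the
        # client kept from discovery -- and hope the utype is unique.
        return obscore_row.get(utype.split(":", 1)[1])
    # Path 2: the plain VALUES/MIN/MAX route.
    values = param.get("values")
    if values:
        return (values.get("min"), values.get("max"))
    return None
```

With the VALUES-only approach, path 1 (and the need to carry discovery
metadata along, e.g. through SAMP) disappears entirely.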



I may have sounded a little exasperated because all this complication
is being introduced while there's a perfectly feasible, proven (it's
been used for four years or so between Splat and DaCHS instances at
several places) and, I claim, theoretically sound way:

   Have a SODA service properly declare its parameter metadata

[as laid out in
http://mail.ivoa.net/pipermail/dal/2016-January/007232.html]

This works, doesn't depend on defining data models -- and frankly,
given my experience with STC[1], I feel that's a *big* advantage --,
is theoretically simple (a service is a function of the dataset
returning a function with 0..n arguments that returns a bytestream)
and I've not yet heard a technical argument against it.  The
philosophical argument ("it's repeating stuff") I've disputed saying
that dataset metadata and service metadata are two different things
that only in very few and simple cases happen to look similar, and so
far nobody has contradicted that.
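
To show how little a client has to do under this scheme, here is a
minimal sketch that extracts parameter domains from a descriptor's
PARAM/VALUES elements.  The descriptor snippet is illustrative (a real
VOTable additionally carries a namespace):

```python
# Sketch: all a SODA client needs under the "declare parameter
# metadata in the descriptor" approach.  Descriptor content invented
# for illustration.
import xml.etree.ElementTree as ET

DESCRIPTOR = """
<GROUP name="inputParams">
  <PARAM name="BAND" unit="m" datatype="double" arraysize="2" value="">
    <VALUES><MIN value="3.6e-7"/><MAX value="5.4e-7"/></VALUES>
  </PARAM>
  <PARAM name="FLUXCALIB" datatype="char" arraysize="*" value="">
    <VALUES><OPTION value="UNCALIBRATED"/><OPTION value="RELATIVE"/></VALUES>
  </PARAM>
</GROUP>
"""

def parameter_domains(descriptor_xml):
    """Map each input PARAM name to its declared domain."""
    domains = {}
    for param in ET.fromstring(descriptor_xml).iter("PARAM"):
        values = param.find("VALUES")
        if values is None:
            domains[param.get("name")] = None  # unconstrained
            continue
        vmin = values.find("MIN")
        vmax = values.find("MAX")
        domains[param.get("name")] = {
            "min": vmin.get("value") if vmin is not None else None,
            "max": vmax.get("value") if vmax is not None else None,
            "options": [o.get("value")
                        for o in values.findall("OPTION")],
        }
    return domains

print(parameter_domains(DESCRIPTOR))
```

No utype resolution, no carried-along discovery metadata: the
descriptor is self-contained per dataset.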

If all this doesn't convince you, I can only repeat: Write a client
against it (and then a validator).  Clients are the scarce commodity
in the VO.  Let's all make sure it's a joy to write them and never
pass another standard that's not been proven in some way in a client.

Cheers,

          Markus

[1] I've put out http://www.ivoa.net/documents/Notes/VOTableSTC/ in
2010, and there've been previous attempts at doing proper, DM-based
STC annotation.  Net effect 6 years later: you be the judge.
