WD-SIA-2.0, going forward

Douglas Tody dtody at nrao.edu
Tue Jan 7 12:18:05 PST 2014


On Tue, 7 Jan 2014, Patrick Dowler wrote:
>
> On 07/01/14 10:21, Douglas Tody wrote:
>> My summary at this point would be that we have agreement on adding
>> additional query parameters, and on an integrated SIA service, but one
>> which is composed of separate capabilities described in more than one
>> document for logistical reasons
>
> If by "integrated SIA service" you mean that the implementer supports 
> multiple capabilities in a single service, then yes we agree: an implementer 
> can do that. The multiple capabilities could include:
>
> * query
> * metadata
> * datalink
> * access data
>
> for which the current plan is 3 documents. I expect the AccessData document 
> to undergo several revisions as we include more access features and it may 
> also be the right place to concentrate on self-describing custom services, in 
> which case it could be somewhat heavy/abstract reading (we have to keep that 
> reasonable to support take-up).
>
> And by integrated, I also include the flexibility feature that an implementer 
> can implement a single web service resource and just use different REQUEST 
> values *or* they can implement different services (as described in a previous 
> message).

Ok, good.  Of these, only query, metadata, and accessData are
image-specific.  Datalink is a generic mechanism, although it could be
used to invoke image-specific methods.  So I think Datalink is just
something that SIA (and other data access services) uses, but this is
not an issue since we can still integrate the image-specific
functionality in a service implementation.

>> The main issue still uncertain is what I call AVDG in the email below
>> (automated virtual data generation via queryData).  To me this is an
>> essential capability for a range of image analysis use cases, to be able
>> to easily get image data optimized for analysis.  It can however be an
>> optional advanced capability.  Without this, SIA queryData is just a
>> static thing, offering little value over just using ObsTAP.  Thoughts?
>
> AVDG = automatic virtual data generation (I removed some of the original 
> message)
>
> The current minimal draft(s) support discovery via SIAv2 query or ObsTAP, 
> both of which can provide direct download or send the caller to a datalink 
> service to do multiple downloads and/or service invocations.
> For a service invocation to an AccessData capability, the caller could 
> perform the ROI cutouts from a discovered dataset. So some of that generates 
> new data (maybe on-the-fly or async).
>
> I don't see anything obvious that prevents a provider from returning records 
> (discovered datasets) for things they can create rather than have in hand, so 
> nothing so far stops a provider from including records for virtual data in 
> the results of a query, with URLs to "execute" the generation (details TBD).
>
> So let's say one performs SIAv2 queries and finds a mix of archival data that 
> overlaps the ROI and virtual data that would exactly match the ROI (from one 
> or more services). For the archival data, one could use datalink and find out 
> that an accessdata service is available that can extract the exact ROI -- so 
> that data is usable. For some other archival data, maybe the only available 
> option is download of the entire dataset so that data is not usable for their 
> purpose. This can be determined with the existing features; it would 
> admittedly not be scalable if too much data had to be discarded at such a 
> later stage.
>
> So, the question is this: Given that records for virtual data can be included 
> already, is AVDG also just an optimisation of this decision making and 
> filtering of discovered records? By just, I don't imply anything about the 
> importance -- only that the same result could be achieved. We definitely need 
> to make usage practical and scalable once we define the actual functionality, 
> but optimisations tend to couple things more tightly. You will recall that 
> one of the diagrams from the last closing plenary (and the WD) shows some 
> possible "optimisation" links just like these...

The main thing missing at this point is a "mode" parameter of some sort,
to give the client control over automated virtual data generation.  Also
possibly some clarity on the use of POS to define the ROI for image
generation, and a description of the AVDG capability at the registry /
VOSI level so that a client can determine if a service has this
capability.

For mode, what I had before was "archival" (whole images), "cutout" (do
not interpolate or otherwise recompute pixels), and "match" (do
everything the service can do to match the ideal image, e.g.,
reprojection, mosaicing, etc.).
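
As a rough illustration of how a client might select one of these three
modes, here is a minimal Python sketch.  The parameter name MODE, the
POS value format, and the base URL are assumptions made for the example;
none of them is defined in the working draft.

```python
from enum import Enum
from urllib.parse import urlencode


class Mode(Enum):
    """The three modes described above (the names are assumptions)."""
    ARCHIVAL = "archival"  # whole images, returned as stored
    CUTOUT = "cutout"      # subset only; pixels are not recomputed
    MATCH = "match"        # reproject, mosaic, etc. to match the ideal image


def build_query_url(base_url, pos, mode=Mode.ARCHIVAL):
    """Build a queryData URL carrying a hypothetical MODE parameter."""
    params = {"REQUEST": "queryData", "POS": pos, "MODE": mode.value}
    return base_url + "?" + urlencode(params)


# Hypothetical service URL and POS value, for illustration only.
url = build_query_url("http://example.org/sia", "180.0,-30.0", Mode.CUTOUT)
```

Defaulting to the archival mode in this sketch reflects the concern that
virtual data generation should be something the client asks for, not
something every query pays for.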

We don't want to just always do AVDG as it is expensive and will slow
down the query, so a mode parameter or some such is needed to enable the
feature.  Whether or not new pixels are computed is critical or
irrelevant depending upon the client application, so control is needed.

Other than that we appear to have the critical elements, since virtual
images can be described in the query response and the provided acref can
point to AccessData to generate the image.  DataID.CreationType, which
we already have in the data model, can tell whether the described image
is archival, a cutout, etc.
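
A client that cares whether pixels were recomputed could filter the
query response on this field.  The sketch below assumes each record has
already been parsed into a dict with a "creation_type" key; the key name
and the label values are placeholders, not the actual
DataID.CreationType vocabulary.

```python
# Hypothetical labels for creation types that leave pixel values untouched.
PIXEL_PRESERVING = {"archival", "cutout"}


def usable_records(records, allow_recomputed_pixels):
    """Keep the records whose creation type suits the client application."""
    if allow_recomputed_pixels:
        return list(records)
    return [r for r in records if r["creation_type"] in PIXEL_PRESERVING]
```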

Most of this logic applies to Spectra and TimeSeries as well, and
should be included in a future version of those services.

 	- Doug

