WD-SIA-2.0, going forward

Tue Jan 7 12:06:53 PST 2014

This should really be on the mailing list, so I am posting there as well.

On 07/01/14 10:21, Douglas Tody wrote:
> My summary at this point would be that we have agreement on adding
> additional query parameters, and on an integrated SIA service, but one
> which is composed of separate capabilities described in more than one
> document for logistical reasons

If by "integrated SIA service" you mean that the implementer supports 
multiple capabilities in a single service, then yes we agree: an 
implementer can do that. The multiple capabilities could include:

* query
* metadata
* datalink
* access data

for which the current plan is 3 documents. I expect the AccessData 
document to undergo several revisions as we include more access features 
and it may also be the right place to concentrate on self-describing 
custom services, in which case it could be somewhat heavy/abstract 
reading (we have to keep that reasonable to support take-up).

By integrated, I also mean the flexibility for an implementer to either
implement a single web service resource and just use different REQUEST
values *or* implement different services (as described in a previous
message).
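
To make the two deployment styles concrete, here is a minimal sketch in
Python. It is not taken from any draft: only queryData appears in this
thread, and the other REQUEST values, the example.org URLs, and the
handler names are all hypothetical. The point is only that both styles
can end up at the same capability code.

```python
from urllib.parse import urlparse, parse_qs


def make_handler(name):
    """Build a stand-in capability handler (hypothetical, for illustration)."""
    def handler(params):
        # A real handler would run the capability; this one just echoes.
        return (name, params)
    return handler


# Hypothetical capability names; only "queryData" is mentioned in the thread.
CAPABILITIES = {
    name: make_handler(name)
    for name in ("queryData", "getMetadata", "getLinks", "accessData")
}


def route(url):
    """Dispatch a request URL to a capability handler.

    Style 1: one resource, capability chosen by the REQUEST parameter.
    Style 2: separate resources, capability chosen by the last path segment.
    """
    parts = urlparse(url)
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    name = params.pop("REQUEST", None)
    if name is None:
        name = parts.path.rstrip("/").rsplit("/", 1)[-1]
    handler = CAPABILITIES.get(name)
    if handler is None:
        raise ValueError("unsupported capability: %r" % name)
    return handler(params)
```

With this sketch, route("http://example.org/sia?REQUEST=queryData&BAND=K")
and route("http://example.org/sia/queryData?BAND=K") reach the same
handler with the same parameters (BAND is also just a placeholder).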

> The main issue still uncertain is what I call AVDG in the email below
> (automated virtual data generation via queryData).  To me this is an
> essential capability for a range of image analysis use cases, to be able
> to easily get image data optimized for analysis.  It can however be an
> optional advanced capability.  Without this, SIA queryData is just a
> static thing, offering little value over just using ObsTAP.  Thoughts?

AVDG = automated virtual data generation (I removed some of the original
message)

The current minimal draft(s) support discovery via SIAv2 query or
ObsTAP, both of which can provide direct download or send the caller to
a datalink service to do multiple downloads and/or service invocations.
For a service invocation to an AccessData capability, the caller could
perform ROI (region of interest) cutouts from a discovered dataset. So
some of that generates new data (maybe on-the-fly or async).

I don't see anything obvious that prevents a provider from returning
records (discovered datasets) for things they can create rather than
things they have in hand. So nothing so far stops a provider from
including records for virtual data in the results of a query, with URLs
to "execute" the generation (details TBD).

So let's say one performs SIAv2 queries and finds a mix of archival data
that overlaps the ROI and virtual data that would exactly match the ROI
(from one or more services). For some of the archival data, one could
use datalink and find out that an AccessData service is available that
can extract the exact ROI -- so that data is usable. For other archival
data, maybe the only available option is download of the entire dataset,
so that data is not usable for one's purpose. This can be determined
with the existing features; it would admittedly not be scalable if too
much data had to be discarded at such a late stage.
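
The client-side decision described above can be sketched as follows.
This is only an illustration with in-memory records: the field names
(id, virtual, links) and the link labels ("download", "accessdata") are
made up for the example and do not come from any draft. In practice the
links would come from a datalink call per record, which is where the
scalability concern shows up.

```python
def is_usable(record):
    """Decide whether a discovered record can yield exactly the ROI.

    Assumption: a virtual record is generated to match the ROI, and an
    archival record is usable only if its datalink response lists an
    AccessData-style service that can extract the ROI.
    """
    if record["virtual"]:
        return True
    return "accessdata" in record["links"]


def partition(records):
    """Split discovered records into (usable, discarded) lists."""
    usable, discarded = [], []
    for record in records:
        (usable if is_usable(record) else discarded).append(record)
    return usable, discarded


# Hypothetical query results mixing archival and virtual data.
EXAMPLE_RECORDS = [
    {"id": "archival-1", "virtual": False, "links": ["download", "accessdata"]},
    {"id": "archival-2", "virtual": False, "links": ["download"]},
    {"id": "virtual-1", "virtual": True, "links": []},
]

usable, discarded = partition(EXAMPLE_RECORDS)
```

Here archival-2 is discarded only after its links have been inspected,
which is the late-stage filtering in question.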

So, the question is this: Given that records for virtual data can be 
included already, is AVDG also just an optimisation of this decision
making and filtering of discovered records? By "just", I don't imply
anything about the importance -- only that the same result could be 
achieved. We definitely need to make usage practical and scalable once 
we define the actual functionality, but optimisations tend to couple 
things more tightly. You will recall that one of the diagrams from the 
last closing plenary (and the WD) shows some possible "optimisation" 
links just like these...

-- 

Patrick Dowler
Canadian Astronomy Data Centre
National Research Council Canada
5071 West Saanich Road
Victoria, BC V9E 2M7

250-363-0044 (office) 250-363-0045 (fax)