[data-cube] Comparing protocols

Douglas Tody dtody at nrao.edu
Mon Oct 28 19:52:09 PDT 2013


There are two main aspects to cube data access and analysis: simple data
discovery and retrieval, and interactive analysis of large cubes.  Both
are important; however, the discussion below addresses only simple
discovery and retrieval.

The TAP/ObsTAP and SIAV2 approaches have different strengths for simple
discovery and retrieval of whole images/cubes.  TAP is data type
agnostic and better for general (non image-specific) archive data
browsing and discovery, which is adequate for simple discovery and
retrieval of whole image datasets.  SIAV2 provides a richer Image data
model and more powerful discovery for image datasets.  The SIAV2 parameter
interface is simpler and more powerful for simple discovery queries, but
ADQL is more powerful for complex ad hoc queries.  As Bob notes, a small
data provider that only needs to put up an image data collection or two
is more easily served via SIAV2, which does it all with a single service
optimized for image data.  Larger providers can more easily deal with
the complexities of TAP/ObsTAP - and should probably support both TAP
and SIAV2 if they have the resources to do so.
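
To make the contrast concrete, here is a minimal sketch of the same
cone search expressed both ways.  The endpoint URLs are hypothetical,
and the POS=CIRCLE parameter form on the SIAV2 side is an assumption
for illustration only; the protocol documents themselves should be
checked for the actual parameter names.  The ObsTAP side uses the
standard TAP request parameters and ObsCore column names.

```python
import urllib.parse

# Hypothetical service endpoints; neither URL comes from this thread.
TAP_SYNC_URL = "http://example.org/tap/sync"
SIA2_QUERY_URL = "http://example.org/sia2/query"


def obstap_cone_query(ra_deg, dec_deg, radius_deg):
    """Build a TAP sync URL that searches ivoa.ObsCore for images and cubes."""
    adql = (
        "SELECT obs_publisher_did, dataproduct_type, access_url, access_estsize "
        "FROM ivoa.ObsCore "
        "WHERE dataproduct_type IN ('image', 'cube') "
        "AND CONTAINS(POINT('ICRS', s_ra, s_dec), "
        f"CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg})) = 1"
    )
    params = {"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql}
    return TAP_SYNC_URL + "?" + urllib.parse.urlencode(params)


def sia2_cone_query(ra_deg, dec_deg, radius_deg):
    """Build a parameter-style image query (parameter form is assumed)."""
    params = {"POS": f"CIRCLE {ra_deg} {dec_deg} {radius_deg}"}
    return SIA2_QUERY_URL + "?" + urllib.parse.urlencode(params)


if __name__ == "__main__":
    print(obstap_cone_query(83.63, 22.01, 0.1))
    print(sia2_cone_query(83.63, 22.01, 0.1))
```

The parameter form is a single key/value pair, while the ADQL form
requires knowing the table and column names but can be extended with
arbitrary constraints.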

However, what is being missed here is the requirement for direct access
to large data cubes for interactive analysis.  Very large cubes, i.e.,
tens of GB, or up to the Terabyte scale or larger, are impractical to
download, or even to handle locally at all unless the client has
unusual local compute capability.  Practical analysis requires
remote access to the dataset.  Only SIAV2 is capable of providing this
capability.  It also provides an enhanced, image-specific data discovery
capability, hence covering the entire range of capabilities required for
image data.  Simply stated, SIAV2 (or some comparable image-specific
interface) is required to support the potentially very large cubes
provided by modern instruments, particularly the very large cubes coming
soon from radio instruments.
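
As a rough illustration of the scale involved, the following sketch
compares a full cube with a small cutout.  The cube dimensions, the 4
bytes per pixel, and the 100 Mbit/s link speed are all assumptions
chosen for the example, not figures for any particular instrument.

```python
def cube_size_bytes(nx, ny, nchan, bytes_per_pixel=4):
    """Uncompressed size of an nx * ny * nchan cube."""
    return nx * ny * nchan * bytes_per_pixel


def transfer_hours(size_bytes, megabits_per_second):
    """Idealized transfer time in hours, ignoring protocol overhead."""
    bytes_per_second = megabits_per_second * 1e6 / 8
    return size_bytes / bytes_per_second / 3600


# Hypothetical full cube: 4096 x 4096 pixels, 16384 channels, 32-bit pixels.
full = cube_size_bytes(4096, 4096, 16384)   # 2**40 bytes, i.e. 1 TiB

# Hypothetical cutout requested from the remote service instead.
cutout = cube_size_bytes(256, 256, 512)     # 128 * 2**20 bytes, i.e. 128 MiB

if __name__ == "__main__":
    link_mbps = 100  # assumed link speed
    print(f"full cube: {full / 2**40:.1f} TiB, "
          f"{transfer_hours(full, link_mbps):.1f} h at {link_mbps} Mbit/s")
    print(f"cutout:    {cutout / 2**20:.0f} MiB, "
          f"{transfer_hours(cutout, link_mbps) * 3600:.0f} s at {link_mbps} Mbit/s")
```

Under these assumptions the full cube takes on the order of a day to
transfer, while the cutout takes seconds, which is the practical
argument for accessing the dataset remotely.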

We really need both: TAP for general archive browsing and data queries,
possibly augmented by a generic AccessData / cutout capability for
simple cutouts, and SIAV2 for enhanced discovery for image-only data
collections, but most notably for direct access to remote image datasets
for distributed/scalable image analysis.  That said, 2D images of modest
size still dominate, and a simple image discovery/access protocol
building upon the very successful SIAV1 will enhance community take-up.
SIAV2 provides both a simple protocol for basic discovery and retrieval
and capabilities for advanced distributed data access to arbitrarily
large image datasets.

 	- Doug



On Tue, 29 Oct 2013, Robert J. Hanisch wrote:

> Thus far it appears to be equally easy to build GUIs for either of the
> protocols being discussed for SIA V2.  CADC and JVO have done it using
> the ObsTAP/Datalink approach, VAO has done it with the SIAP V2 approach.
> Arnold and Jonathan's points are certainly relevant, but in the case of
> SIA V2, the bigger impact is on data providers.  Do they have SIA V1
> services that can be fairly easily upgraded to V2?  Do they implement
> ObsCore and ObsTAP?
> 
> For these protocols to be successful they need significant take-up on the
> data provider side.  Otherwise there is little motivation to implement
> clients, and the ease of use for building clients becomes a red herring.
> In any case, it seems to be a wash, client-side.
> 
> Bob
> 
> From: "Tedds, Jonathan A. (Dr.)" <jat26 at leicester.ac.uk>
> Date: Sunday, 29 September 2013 4:40 AM
> To: Arnold Rots <arots at cfa.harvard.edu>
> Cc: data-cube <data-cube at usvao.org>, "dm at ivoa.net" <dm at ivoa.net>, DAL
> mailing list <dal at ivoa.net>
> Subject: Re: [data-cube] Comparing protocols
>
>       Anyone working as and with end users would have to second
>       these excellent points made by Arnold. Rather like the
>       initial Research Data Alliance Working Groups, which I have
>       more involvement with than IVOA these days, it is being
>       pointed out that an emphasis on technical solutions alone and
>       in isolation will not have the desired effect. The difficult
>       balance is between catering for the diversity of end user
>       requirements while at the same time actually getting
>       something done. The RDA will tend to emphasise the latter.
>       IVOA has been successful at doing likewise, albeit it's never
>       a quick process! Bioscientists appear to be presiding over a
>       Darwinian evolution of overlapping standard schemes through
>       their much higher numbers. RDA certainly presents an
>       opportunity for IVOA to look at other disciplines and compare
>       approaches so it was good to see it represented at the 2nd
>       RDA Plenary a couple of weeks ago. A little more involvement
>       in Interest and Working Groups would be of mutual benefit.
> 
> Cheers,
> Jonathan
> 
> On 28 Sep 2013, at 20:41, "Arnold Rots" <arots at cfa.harvard.edu>
> wrote:
>
>       With apologies if you receive multiple copies of this
>       message.
> 
> It occurred to me that the discussion we had yesterday on
> the relative merits of SIAP, ObsTAP, and DataLink only had
> moderate relevancy and lost sight of the bigger picture.
> 
> The problem is that within the IVOA people and groups have
> been designing protocols that make sense within their own
> context, but very little attention has been paid to the
> end-to-end use case scenarios - with the emphasis on "end-to-end."
> 
> The question of how flexible or easy to use a particular interface
> protocol is really needs to be assessed in the context of the full
> scenario that real-life users follow.
> I must admit that it is not clear to me how either ObsTAP/DataLink
> or SIAP fit into the various scenarios and what their effect would be
> on the total number of steps that users have to go through in
> order to get their data.
> And the issue is, of course, that there is no single use case scenario.
> 
> There are users who will simply be interested in retrieving their, let's
> say, ALMA observations. How easy is it, and how many steps does it take,
> to get where they want to get, using the different protocols?
> Then there are users who will get there through the VAO Portal.
> And those who enter through Aladin, and so on.
> In how many scenarios do we envision users starting to query and
> retrieve data through IVOA protocols, and how well or how poorly
> does that work depending on which protocol is chosen?
> And how does that depend on the users' objectives?
> I would like to see flow diagrams for the different cases to get a better
> sense of the ramifications of choosing one protocol over another
> in the context of the larger picture of the full end-to-end scenario.
> 
> Just quibbling over the relative merits of protocols in the limited
> context of their own characteristics does not address the real issues.
> We really need to focus on the users' perspective, minimizing
> steps and increasing protocols' ability to support intuitive use.
> If we don't do that, we relegate ourselves to irrelevancy.
> To complicate the issue further, it is, of course, not the user-friendliness
> of the protocol per se that matters. What really counts is the interface
> through which the users use the protocols.
> Which protocols make it easiest to develop user-friendly GUIs while
> at the same time supporting those who swear by the Command Line?
> 
> 
> Finally a comment on one of my favorite subjects: distinguishing
> between the spectral and redshift/Doppler velocity axes.
> None of the protocols currently supports this and that is a problem.
> It means that users in their queries cannot indicate whether they
> are interested in multi-band image cubes or in cubes where the
> third axis is Doppler velocity; they cannot express whether they
> want spectra for, say, SED or line equivalent width analysis, or
> Doppler profiles.
> It is going to annoy users no end if they get offered large numbers
> of datasets that they are not interested in and thought they didn't
> ask for.
> And note that making this distinction allows one to
> construct hypercubes that contain Doppler velocity profiles in multiple
> spectral lines.
> 
> Cheers,
>
>   - Arnold
> 
> --------------------------------------------------------------------------------
> Arnold H. Rots
> Chandra X-ray Science Center
> Smithsonian Astrophysical Observatory    tel:  +1 617 496 7701
> 60 Garden Street, MS 67                  fax:  +1 617 495 7356
> Cambridge, MA 02138                      arots at cfa.harvard.edu
> USA                                      http://hea-www.harvard.edu/~arots/
> --------------------------------------------------------------------------------
> 
> 
>

