WD-AccessData-1.0-20140312

Thu Mar 20 18:08:59 PDT 2014

On Fri, 21 Mar 2014, Mark Allen wrote:

> An important part of the effort to make multi-dimensional data 
> accessible and useable within the VO is the engagement with projects 
> that produce (or will produce) data cubes. To name some of these 
> projects that participated in the focus sessions and following 
> discussions:
>
> ALMA
> LOFAR
> ASKAP
> (SKA)
> JVLA
> CGPS (survey)
> CALIFA
> MUSE
> (ESO IFUs)
> (JWST IFUs)
>
> Data from some of these have been shown in demonstrations and are being 
> used to guide developments within VO projects. These are real data 
> cubes. Further examples welcome.

OK. I was a little concerned about whether the actual data formats of the
above projects had been examined when judging the suitability of the
proposed trio of standards (SIAP2, AccessData, DataLink) with the current
parameter restrictions.

As I said in the special session, what was shown was more nice images and
wishes than real data. The only real data presented were the ALMA cubes in
JVO's Vissage, but (as I also saw in Hawaii) there, at least, the "energy"
axis was still represented by the individual channel ID rather than by a
real "energy" span in barycentric wavelength in metres.

And to be honest, IFU data in the VO so far (I mean Euro3D and the tools of
Igor and Ivan) have rather been presented as individual spectra referenced
by fiber ID (or slit ID).

So such a cube looks rather sparse in some axes compared with, e.g., the
more densely sampled "energy" axis.

In fact, CALIFA, too, is denser in the "energy" axis than in the spatial
axes. But I can confirm (after downloading it) that the reduced product
really is a data cube with NAXIS3.
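
For illustration, the "is it really a cube?" check can be sketched in plain
Python. This is a minimal sketch assuming the raw primary header is
available as bytes in the standard 80-character FITS card layout; the helper
names are hypothetical, and in practice one would simply open the file with
a FITS library.

```python
def header_cards(raw: bytes):
    """Yield the 80-character card images of a raw FITS header, up to END."""
    text = raw.decode("ascii")
    for i in range(0, len(text), 80):
        card = text[i:i + 80]
        if card[:8].rstrip() == "END":
            return
        yield card


def axis_lengths(raw: bytes) -> list[int]:
    """Return [NAXIS1, NAXIS2, ...] read from a raw FITS header."""
    values = {}
    for card in header_cards(raw):
        # Value cards carry "= " in columns 9-10; for numeric values any
        # comment follows a "/", so splitting on it isolates the value.
        if card[8:10] == "= ":
            values[card[:8].rstrip()] = card[10:].split("/")[0].strip()
    naxis = int(values.get("NAXIS", "0"))
    return [int(values[f"NAXIS{i}"]) for i in range(1, naxis + 1)]


def is_cube(raw: bytes) -> bool:
    """True if the header describes a 3-D array, i.e. it has an NAXIS3."""
    return len(axis_lengths(raw)) == 3
```

A header with NAXIS = 3 gives is_cube(raw) == True; an ordinary 2-D image
header gives False.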

So I apologize for my somewhat simplified view.

> The current efforts on the standards are to satisfy minimal requirements 
> on discovery and access. Of course, doing science with data cubes 
> requires much more, and we will need to consider the roles of services, 
> user interfaces and tools. I think that some of your comments are mixing 
> these things together, in particular for BAND.

My objection is not about data cubes (e.g. from IFUs) but about the fact
that SIAP2 is declared an image access protocol, and AccessData in some
sense an image extraction protocol.

BANDNAME is crucial for science discovery in multicolour photometric
surveys, just as the fiber ID is crucial for multi-object spectra (look at
SDSS: the plate-fiber pair is the crucial primary metadata, not the
position, which is derived).

If you want to insist on a simple numerical energy band as the primary
information, do not call it an "image protocol" but SDAP (Simple Datacube
Access Protocol); then everyone would know not to expect a server of images
(in different filters) here.

> The standard needs to be 
> uniform and simple, but tools or services could offer any number of ways 
> for an astronomer to specify a wavelength/frequency/energy range. For 
> example, I think that a user interface could use a look-up of the SVO 
> filter service to do for BAND what the NED/CDS sesame name resolver does 
> for coordinates, presenting a way for the astronomer to deal with filter 
> names, but using the standard that speaks only in metres (or Hz).
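
A minimal sketch of the filter-name lookup proposed in the quoted
paragraph, assuming a hypothetical local table in place of the SVO filter
service and a "min/max" range string in metres modelled on the
490e-9/700e-9 form used later in this thread. The bounds are placeholders,
not real filter definitions.

```python
# Hypothetical filter table: name -> (lambda_min, lambda_max) in metres.
# The bounds are placeholders, NOT authoritative filter definitions.
FILTERS = {
    "V": (500e-9, 600e-9),
    "Halpha_narrow": (654e-9, 659e-9),
}


def resolve_band(name):
    """Turn a filter name into a 'min/max' range string in metres, the way
    a name resolver turns an object name into coordinates. Only the
    numbers are passed on; the filter name itself is dropped."""
    lo, hi = FILTERS[name]
    return f"{lo:.4g}/{hi:.4g}"


print(resolve_band("V"))  # prints 5e-07/6e-07
```

Note that only the interval reaches the service; the name used to obtain it
does not.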

As I said, it is not uniquely mappable, and it will require much more
effort to investigate the data before publishing. Moreover, it will not let
you select exactly the item you know you want.

The Sesame example is a perfect illustration of how a generalized view,
without deeper investigation, causes trouble in achieving scientific goals.

With a little simplification, suppose you want to search for the spectrum
of a double star. You have (at least) two spectra on one chip. Both may be
extracted and stored as separate 1D spectra. But the RA and DEC in the FITS
header refer to only one position, and DATE-OBS is the same for both.

Obviously, during the reduction you know which spectrum belongs to which
star, and you set OBJECT accordingly.

In SSA you are obliged to use a POS query, but it obviously returns the
spectra of both stars, even with a very small SIZE circle.
When you have 100+ spectra of both stars, how do you isolate only those of
(e.g.) the secondary component, which has Balmer-line emission and is the
only interesting one?
This is our case with HR1847A and B.
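
To make the POS/SIZE problem concrete, here is a small sketch. It assumes
that both extracted spectra inherit the single RA/DEC of the frame header
(the coordinates below are made up) and that SIZE is the diameter of the
search cone in degrees. However small SIZE is, both spectra fall inside.

```python
import math

# Made-up coordinates: both spectra were extracted from one frame, so both
# inherit the single RA/DEC of the frame header (degrees).
SPECTRA = [
    {"target": "HR1847A", "ra": 100.0, "dec": 20.0},
    {"target": "HR1847B", "ra": 100.0, "dec": 20.0},
]


def ang_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def pos_query(spectra, ra, dec, size):
    """Mimic an SSA POS/SIZE cone search, taking SIZE as the diameter in degrees."""
    return [s for s in spectra
            if ang_sep_deg(ra, dec, s["ra"], s["dec"]) <= size / 2]


# Even a 1-arcsec cone returns both components.
hits = pos_query(SPECTRA, 100.0, 20.0, 1 / 3600)
print([s["target"] for s in hits])  # prints ['HR1847A', 'HR1847B']
```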

Fortunately, SSA has the (optional) TARGETNAME parameter, so I can query by
TARGETNAME and ignore POS. And although there may be some ambiguity in
names (different spacing etc.), I say TARGETNAME=HR1847B in SSA and I
fulfill my science goal.
Of course, if I do not know the TARGETNAME in advance, I first perform a
discovery query in large circles, probably using Sesame names etc.
But once I know it, I can get precisely what I want.
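
A sketch of the TARGETNAME route: building the SSA request URL and, since
names may differ in spacing or case, a client-side name comparison. The
endpoint is hypothetical, and the normalisation rule (drop whitespace,
ignore case) is an assumption, not something SSA prescribes.

```python
from urllib.parse import urlencode

SSA_BASE = "https://example.org/ssa"  # hypothetical service endpoint


def ssa_url(base, **params):
    """Build an SSA queryData URL from keyword parameters."""
    return base + "?" + urlencode({"REQUEST": "queryData", **params})


def norm_name(name):
    """Assumed normalisation: drop all whitespace and ignore case,
    so that 'HR 1847 B' and 'hr1847b' compare equal."""
    return "".join(name.split()).upper()


def by_target(spectra, name):
    """Keep only records whose normalised target name matches."""
    key = norm_name(name)
    return [s for s in spectra if norm_name(s["target"]) == key]


print(ssa_url(SSA_BASE, TARGETNAME="HR1847B"))
# prints https://example.org/ssa?REQUEST=queryData&TARGETNAME=HR1847B
```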

I agree the protocol should be simple, but IMHO, if it is designed
orthogonally, it should allow me to isolate the individual entities (e.g.
spectra, images) by a proper combination of parameters. In the case of SSA
I have the combination of POS and SIZE (still yielding the ambiguity) +
TARGETNAME. In the case of SIAP2 I need BAND (with a wider range) +
BANDNAME (or better, BANDID) to get exactly THAT image.

Does anybody see now why this is important?

In terms of "ontology" or semantics:
The object of my investigation is an entity described by an extremely
large number of different attributes, each characterizing one property or
another. I assign a label to it, so when I want to select it, I use the
label.

Where possible, I can map one label to another in a unique, bijective way,
so I can use the new label to point to it as well. Whether this label is
Halpha or 6562.8e-10 m (in air), it is fine (though I must use some
tolerance).
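
The bijective label-to-value mapping with a tolerance might look like the
following sketch; the two-entry table and the relative tolerance are
illustrative assumptions.

```python
import math

# Illustrative two-way table: line label -> rest wavelength in air (metres).
LINES = {"Halpha": 6562.8e-10, "Hbeta": 4861.3e-10}


def label_to_wavelength(label):
    return LINES[label]


def wavelength_to_label(wl, rel_tol=1e-4):
    """Inverse lookup. A tolerance is needed because quoted or measured
    wavelengths rarely match the table value to the last digit."""
    matches = [name for name, ref in LINES.items()
               if math.isclose(wl, ref, rel_tol=rel_tol)]
    if len(matches) != 1:
        raise LookupError(f"{wl} m does not map to exactly one label: {matches}")
    return matches[0]
```

As long as each value falls within the tolerance of exactly one entry, the
round trip label -> wavelength -> label returns the original label.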

But if I want to map a symbol to an interval, the mapping is no longer
bijective, because a wider interval may contain several narrower
subintervals (narrow-band photometry images). Stating the wider interval
(490e-9/700e-9 m) will return ALL the sub-band images, and I cannot imagine
how to select only the Johnson V filter image.
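
The non-bijective case can be sketched with a hypothetical image list. The
band limits are placeholders, containment is an assumed matching rule, and
BANDNAME is the parameter proposed above, not one defined in the current
SIAP2 draft.

```python
# Hypothetical image records; band limits in metres are placeholders,
# NOT real filter definitions.
IMAGES = [
    {"id": "img1", "bandname": "V", "band": (500e-9, 590e-9)},
    {"id": "img2", "bandname": "NB502", "band": (500e-9, 504e-9)},
    {"id": "img3", "bandname": "NB656", "band": (654e-9, 659e-9)},
    {"id": "img4", "bandname": "B", "band": (390e-9, 490e-9)},
]


def band_query(images, lo, hi):
    """Images whose whole band lies inside [lo, hi] (assumed matching rule)."""
    return [im for im in images if lo <= im["band"][0] and im["band"][1] <= hi]


def band_and_name_query(images, lo, hi, bandname):
    """The same interval query plus the proposed BANDNAME constraint."""
    return [im for im in band_query(images, lo, hi) if im["bandname"] == bandname]


print([im["id"] for im in band_query(IMAGES, 490e-9, 700e-9)])
# prints ['img1', 'img2', 'img3']: every sub-band inside the interval
print([im["id"] for im in band_and_name_query(IMAGES, 490e-9, 700e-9, "V")])
# prints ['img1']
```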

If there are thousands of images, I am doomed to die doing manual
selection, or to get very angry at the whole stupid VO garbage ;-)