datatypes (affects all 3 WDs to some extent)

Patrick Dowler patrick.dowler at nrc-cnrc.gc.ca
Wed Mar 19 13:02:18 PDT 2014


This is very long so if you are inclined to TL;DR then now is the time 
to bail out :-)

--

The current WD-SIA-2.0 and WD-AccessData-1.0 make use of serialised 
values (for parameters) that are ad-hoc in nature. We did more or less 
the same thing in ADQL-2.0 with the POINT, CIRCLE, BOX, POLYGON, and 
REGION functions -- they implicitly defined some datatypes. When it came 
to VOTable serialisation, we adopted a (non-normative) STC-S encoding 
into a FIELD of datatype="char" arraysize="*" and had to add on the 
xtype="adql:POLYGON" (eg.).... all doing what we intended but admittedly 
a pretty big snowballing kind of hack.

Well, it kind of works, but there are lots of issues and complications 
with this in TAP. We gained a lot of valuable experience and we will 
discuss that in Madrid.


For now, the SIA and AccessData drafts make use of much-simplified but 
still ad-hoc datatypes. There is no generic region that people might 
think is a base class, only CIRCLE, RANGE (not box), and POLYGON. And 
there are no coordinate/reference system metadata in the values (such as 
STC-S would permit). Back in Heidelberg and Hawaii we generally agreed 
that these pure geometry-values were the way to go in future.

The WDs also implicitly define "interval" datatypes for numeric 
intervals: the range query eg BAND=500e-9/700e-9. It looks simple enough 
(syntax details not important now, just that there is a syntax), but 
Markus has pointed out that if you try to describe the BAND parameter 
using the mechanism in DataLink you would say that it is
datatype="double" and a relatively naive client cannot tell that they 
can call it with this range syntax. Generic server-side code also cannot 
be written to accept that syntax since it would sensibly try to parse a 
double. Really, we are passing in a numeric interval as the value of 
the query parameter, so add that to the list of implicit datatypes. In 
future, one might think about describing the spectral or time coverage 
using such a numeric interval instead of two scalar values, just like we 
describe the spatial footprint using geometry.
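
To make the BAND mismatch concrete, here is a minimal Python sketch. The 
helper names are hypothetical and the lo/hi syntax is only the one used in 
the example above, not a settled definition. A generic handler that trusts 
datatype="double" rejects the range value, while a handler that knows the 
value is an interval accepts it.

```python
def parse_double(text):
    """What a generic handler would do for a parameter described as datatype="double"."""
    return float(text)


def parse_interval(text):
    """Parse 'lo/hi' into a (lo, hi) tuple of floats (hypothetical helper)."""
    lo, sep, hi = text.partition("/")
    if not sep:
        raise ValueError("expected lo/hi, got %r" % text)
    lo_val, hi_val = float(lo), float(hi)
    if lo_val > hi_val:
        raise ValueError("lower bound exceeds upper bound in %r" % text)
    return (lo_val, hi_val)


value = "500e-9/700e-9"  # the BAND value from the example above

# The generic double parser cannot handle the range syntax.
try:
    parse_double(value)
    double_ok = True
except ValueError:
    double_ok = False

# A parser that knows about the implicit interval datatype can.
band = parse_interval(value)
```

Nothing in a datatype="double" description tells a client or a generic 
server which of the two parsers applies.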

So, we have a mismatch between the datatypes we are trying to use and 
our ability to describe them. How far back does that go? Well, ADQL-2.0 
also uses datatypes but doesn't define them... And VOTable defines some 
datatype descriptions but not these ones (xtype hack noted). I would say 
that current DMs don't define these things as datatypes per se: they do 
have structures that accomplish the same sorts of things within the 
domain but that is not the same as defining datatypes.

*** What to do?

Ideally, we would have a document that defines datatypes and serialised 
values. Then ADQL-2.x would refer to that document, VOTable-1.x (x>3) 
would need to support serialisation of values of those datatypes. 
TAP-1.x would not have to say as much about datatypes as it does now, so 
it would get stripped down a bit. Other DAL and DM documents could use 
the datatypes by reference to a definitive document.


Right now though, we need to do the final step (other DAL documents):

Option #1:  use values of these datatypes in an ad-hoc way; if we do 
that right (by keeping it simple) then once they are formally defined we 
could simply strip these docs down in a future version... essentially 
test-drive the concepts in the current work. We would need a way to 
describe them in service descriptor, which means defining some new 
values for xtype in the DataLink document.
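
As a rough illustration of what option #1 would need in a service 
descriptor, the sketch below builds a VOTable PARAM following the same 
pattern as the adql:POLYGON work-around described earlier (datatype="char", 
arraysize="*", plus an xtype). The xtype value "interval" and the helper 
name are assumptions made for this example only; no document defines them 
yet.

```python
import xml.etree.ElementTree as ET


def describe_param(name, xtype):
    """Build a PARAM element that flags a string-serialised structured value.

    Mirrors the char/* + xtype pattern used for adql:POLYGON; the xtype
    values passed in are hypothetical until DataLink defines them.
    """
    return ET.Element(
        "PARAM",
        {
            "name": name,
            "datatype": "char",
            "arraysize": "*",
            "xtype": xtype,
            "value": "",
        },
    )


band_param = describe_param("BAND", "interval")  # "interval" is an assumed xtype
band_xml = ET.tostring(band_param, encoding="unicode")
```

A client that does not recognise the xtype still sees a string parameter, 
which is exactly the limitation of the work-around.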

Option #2: The alternative for right now would be to *not* use any 
ad-hoc datatypes and just use simple parameters that don't imply such. 
This would mean, for example, that we would have to adopt either FOO_MIN 
and FOO_MAX or FOO and FOO_SIZE ways of specifying a range. For a 2D 
construct like spatial axes, we would only be able to specify ranges 
along the axes using 4 different parameters. This would be describable 
correctly using the DataLink service descriptor mechanism and would 
probably define a pattern for such simple parameter usage. Note that 
even the old POS parameter does not fit here. I don't know how this would 
affect evolution of the standards (eg. could we add those datatypes 
later in a minor rev? major rev? Haven't thought it through).


My take:

Both of these can work for the base datacube use cases from CSP. #2 
offers a little less power for spatial axes.

We would get the equivalent features for the energy and time axes as 
long as all these parameters are limited to single-value only. You just 
cannot make sense of a request with multiple values of FOO_MIN and 
FOO_MAX (for example) because you cannot pair them up as intended. In 
principle #1 makes sense with multiple values of a parameter -- passing 
multiple structured values intact is the main feature of these ad-hoc 
datatypes.
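
The pairing problem can be sketched in a few lines of Python. This assumes 
plain HTTP query strings and the lo/hi syntax from the BAND example; the 
function names are hypothetical. With structured values each interval 
arrives intact; with FOO_MIN and FOO_MAX the server only gets two separate 
lists and has to guess how they pair up.

```python
from urllib.parse import parse_qs


def intervals_from_structured(query, name):
    """Option #1: each value of the parameter is a complete lo/hi interval."""
    values = parse_qs(query).get(name, [])
    return [tuple(float(x) for x in v.split("/")) for v in values]


def lists_from_min_max(query, name):
    """Option #2: the bounds arrive as two independent lists of strings."""
    params = parse_qs(query)
    return params.get(name + "_MIN", []), params.get(name + "_MAX", [])


# Two intervals, each passed as one structured value.
structured = intervals_from_structured("FOO=1/2&FOO=5/6", "FOO")

# Two lower bounds but one upper bound: which FOO_MIN goes with FOO_MAX=2?
mins, maxs = lists_from_min_max("FOO_MIN=1&FOO_MAX=2&FOO_MIN=5", "FOO")
ambiguous = len(mins) != len(maxs)
```

Even when the two lists have the same length, pairing them by position is 
only a convention that the parameter description cannot express.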

I haven't really thought it through for #2, but I think both can be made 
to work with UPLOAD of tables. Well, one could not make any use of the 
s_region column of ObsCore in #2, or even LONG_MIN and LONG_MAX, but you 
could use s_ra, s_dec, and s_fov with LONG and LONG_SIZE and LAT and 
LAT_SIZE... that feels like really being painted into a corner though 
and I don't like it. Some other output used as input could just as 
easily not work with _SIZE and require _MIN and _MAX. Then we'd be 
stuck. I think with #2 clients would have to fabricate or reformat 
tables to be usable and tables output from other services would 
generally not be usable as-is.
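
The kind of client-side reformatting meant here could look like the Python 
sketch below. It assumes that a _SIZE value is the full width centred on 
the coordinate and treats each axis independently, ignoring any 
spherical-geometry subtleties; the function names and the sample row are 
made up for illustration.

```python
def size_to_min_max(center, size):
    """Convert a (FOO, FOO_SIZE) pair to (FOO_MIN, FOO_MAX), size = full width."""
    half = size / 2.0
    return (center - half, center + half)


def min_max_to_size(lo, hi):
    """Convert (FOO_MIN, FOO_MAX) back to (FOO, FOO_SIZE)."""
    return ((lo + hi) / 2.0, hi - lo)


# A made-up ObsCore-like row; s_fov is used as the size on both axes.
row = {"s_ra": 10.0, "s_dec": -5.0, "s_fov": 2.0}

long_min, long_max = size_to_min_max(row["s_ra"], row["s_fov"])
lat_min, lat_max = size_to_min_max(row["s_dec"], row["s_fov"])
```

Every client would have to carry conversions like these for whichever 
parameter pattern a given service happened to choose.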

---------------

Summary: ad-hoc datatypes used in parameters cannot currently be 
described using VOTable PARAM elements. We need to either:

#1 temporarily use ad-hoc datatypes anyway, describe with the xtype 
work-around provided, eventually formally define the datatypes and make 
them fully supported in VOTable, OR

#2 don't use ad-hoc datatypes and stick with what can be described now

I hope I have faithfully captured the issues that Markus has diligently 
and persistently raised. We cannot really proceed without a decision 
that puts this topic to rest (and potentially makes work in other 
documents).


-- 

Patrick Dowler
Canadian Astronomy Data Centre
National Research Council Canada
5071 West Saanich Road
Victoria, BC V9E 2E7

250-363-0044 (office) 250-363-0045 (fax)

