datatypes (affects all 3 WDs to some extent)
Patrick Dowler
patrick.dowler at nrc-cnrc.gc.ca
Wed Mar 19 13:02:18 PDT 2014
This is very long so if you are inclined to TL;DR then now is the time
to bail out :-)
--
The current WD-SIA-2.0 and WD-AccessData-1.0 make use of serialised
values (for parameters) that are ad-hoc in nature. We did more or less
the same thing in ADQL-2.0 with the POINT, CIRCLE, BOX, POLYGON, and
REGION functions -- they implicitly defined some datatypes. When it came
to VOTable serialisation, we adopted a (non-normative) STC-S encoding
into a FIELD of datatype="char" arraysize="*" and had to add on the
xtype="adql:POLYGON" (e.g.)... all doing what we intended, but admittedly
a pretty big snowballing kind of hack.
Well, it kind of works, but there are lots of issues and complications
with this in TAP. We gained a lot of valuable experience, and we will
discuss that in Madrid.
For now, the SIA and AccessData drafts make use of much-simplified but
still ad-hoc datatypes. There is no generic region that people might
think is a base class, only CIRCLE, RANGE (not box), and POLYGON. And
there are no coordinate/reference system metadata in the values (such as
STC-S would permit). Back in Heidelberg and Hawaii we generally agreed
that these pure geometry-values were the way to go in future.
The WDs also implicitly define "interval" datatypes for numeric
intervals: the range query, e.g. BAND=500e-9/700e-9. It looks simple enough
(syntax details not important now, just that there is a syntax), but
Markus has pointed out that if you try to describe the BAND parameter
using the mechanism in DataLink you would say that it is
datatype="double" and a relatively naive client cannot tell that they
can call it with this range syntax. Generic server-side code also cannot
be written to accept that syntax since it would sensibly try to parse a
double. Really, we are passing in a numeric interval as the value of
the query parameter, so add that to the list of implicit datatypes. In
future, one might think about describing the spectral or time coverage
using such a numeric interval instead of two scalar values, just like we
describe the spatial footprint using geometry.
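To make the mismatch concrete, here is a minimal sketch in Python. The names Interval, parse_double, and parse_interval are made up for illustration and come from none of the WDs; the "/" separator is simply the one used in the BAND example above. A generic parser driven by datatype="double" rejects the range syntax, while a parser that knows the value is an interval accepts it.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical numeric interval value: lower and upper bound."""
    lo: float
    hi: float


def parse_double(value: str) -> float:
    # What generic code would do if the parameter is described
    # only as datatype="double".
    return float(value)


def parse_interval(value: str) -> Interval:
    # Assumes the "lo/hi" syntax from the BAND example; the final
    # syntax is not settled in the drafts.
    parts = value.split("/")
    if len(parts) != 2:
        raise ValueError(f"expected 'lo/hi', got {value!r}")
    lo, hi = float(parts[0]), float(parts[1])
    if lo > hi:
        raise ValueError(f"lower bound exceeds upper bound in {value!r}")
    return Interval(lo, hi)


if __name__ == "__main__":
    band = "500e-9/700e-9"
    try:
        parse_double(band)
    except ValueError as exc:
        print("double parser rejects it:", exc)
    print("interval parser accepts it:", parse_interval(band))
```

The point is only that the parser has to be chosen from the parameter description, so the description has to say "interval" somehow.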
So, we have a mismatch between the datatypes we are trying to use and
our ability to describe them. How far back does that go? Well, ADQL-2.0
also uses datatypes but doesn't define them... And VOTable defines some
datatype descriptions but not these ones (xtype hack noted). I would say
that current DMs don't define these things as datatypes per se: they do
have structures that accomplish the same sorts of things within the
domain but that is not the same as defining datatypes.
*** What to do?
Ideally, we would have a document that defines datatypes and serialised
values. Then ADQL-2.x would refer to that document, and VOTable-1.x (x>3)
would need to support serialisation of values of those datatypes.
TAP-1.x would not have to say as much about datatypes as it does now, so
it would get stripped down a bit. Other DAL and DM documents could use
the datatypes by reference to a definitive document.
Right now though, we need to do the final step (other DAL documents):
Option #1: use values of these datatypes in an ad-hoc way; if we do
that right (by keeping it simple) then once they are formally defined we
could simply strip these docs down in a future version... essentially
test-drive the concepts in the current work. We would need a way to
describe them in the service descriptor, which means defining some new
values for xtype in the DataLink document.
Option #2: The alternative for right now would be to *not* use any
ad-hoc datatypes and just use simple parameters that don't imply such.
This would mean, for example, that we would have to adopt either FOO_MIN
and FOO_MAX or FOO and FOO_SIZE ways of specifying a range. For a 2D
construct like spatial axes, we would only be able to specify ranges
along the axes using 4 different parameters. This would be describable
correctly using the DataLink service descriptor mechanism and would
probably define a pattern for such simple parameter usage. Note that
even the old POS parameter does not fit here. I don't know how this would
affect evolution of the standards (e.g. could we add those datatypes
later in a minor rev? major rev? Haven't thought it through).
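As a rough sketch of the two scalar-only patterns in Option #2, the Python below converts between a FOO_MIN/FOO_MAX pair and a FOO/FOO_SIZE pair. It assumes FOO is the centre of the range and FOO_SIZE its full width, which is an assumption made here and not something the drafts define; the function names are hypothetical.

```python
from typing import Tuple


def min_max_to_center_size(lo: float, hi: float) -> Tuple[float, float]:
    # Assumption: FOO is the midpoint and FOO_SIZE the full width.
    return (lo + hi) / 2.0, hi - lo


def center_size_to_min_max(center: float, size: float) -> Tuple[float, float]:
    # Inverse of the above under the same assumption.
    half = size / 2.0
    return center - half, center + half


if __name__ == "__main__":
    # A range along one axis expressed both ways.
    center, size = min_max_to_center_size(10.0, 20.0)
    print({"FOO": center, "FOO_SIZE": size})
    lo, hi = center_size_to_min_max(center, size)
    print({"FOO_MIN": lo, "FOO_MAX": hi})
```

Either pattern is describable as plain doubles, but a service has to pick one, and a 2D spatial query needs four such scalars (two per axis).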
My take:
Both of these can work for the base datacube use cases from CSP. #2
offers a little less power for spatial axes.
We would get the equivalent features for the energy and time axes as
long as all these parameters are limited to single-value only. You just
cannot make sense of a request with multiple values of FOO_MIN and
FOO_MAX (for example) because you cannot pair them up as intended. In
principle #1 makes sense with multiple values of a parameter -- passing
multiple structured values intact is the main feature of these ad-hoc
datatypes.
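A small Python sketch of the pairing problem, using the standard urllib.parse.parse_qs. The parameter names BAND, BAND_MIN, and BAND_MAX and the helper functions are illustrative only. With structured values each interval arrives intact; with separate _MIN and _MAX parameters the server just gets two independent lists and can only guess at the pairing.

```python
from typing import List, Tuple
from urllib.parse import parse_qs


def intervals_from_structured(query: str, name: str) -> List[Tuple[float, float]]:
    # Option #1 style: each value of the parameter is one "lo/hi" interval.
    result = []
    for value in parse_qs(query).get(name, []):
        lo, hi = value.split("/")
        result.append((float(lo), float(hi)))
    return result


def lists_from_min_max(query: str, name: str) -> Tuple[List[float], List[float]]:
    # Option #2 style: the server sees two unrelated lists of scalars.
    params = parse_qs(query)
    mins = [float(v) for v in params.get(name + "_MIN", [])]
    maxs = [float(v) for v in params.get(name + "_MAX", [])]
    return mins, maxs


if __name__ == "__main__":
    print(intervals_from_structured("BAND=500e-9/700e-9&BAND=1e-6/2e-6", "BAND"))
    # Same four numbers, but the client listed the maxima in another order;
    # pairing by position would now produce the wrong intervals.
    print(lists_from_min_max(
        "BAND_MIN=500e-9&BAND_MIN=1e-6&BAND_MAX=2e-6&BAND_MAX=700e-9", "BAND"))
```

Pairing by position could be mandated, of course, but that would be one more convention that the parameter description cannot express.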
I haven't really thought it through for #2, but I think both can be made
to work with UPLOAD of tables. Well, one could not make any use of the
s_region column of ObsCore in #2, or even LONG_MIN and LONG_MAX, but you
could use s_ra, s_dec, and s_fov with LONG and LONG_SIZE and LAT and
LAT_SIZE... that feels like really being painted into a corner though
and I don't like it. Some other output used as input could just as
easily not work with _SIZE and require _MIN and _MAX. Then we'd be
stuck. I think with #2, clients would have to fabricate or reformat
tables to be usable, and tables output from other services would
generally not be usable as-is.
---------------
Summary: ad-hoc datatypes used in parameters cannot currently be
described using VOTable PARAM elements. We need to either:
#1 temporarily use ad-hoc datatypes anyway, describe with the xtype
work-around provided, eventually formally define the datatypes and make
them fully supported in VOTable, OR
#2 don't use ad-hoc datatypes and stick with what can be described now.
I hope I have faithfully captured the issues that Markus has diligently
and persistently raised. We cannot really proceed without a decision
that puts this topic to rest (and potentially makes work in other
documents).
--
Patrick Dowler
Canadian Astronomy Data Centre
National Research Council Canada
5071 West Saanich Road
Victoria, BC V9E 2E7
250-363-0044 (office) 250-363-0045 (fax)