ADQL 2.1: TIMESTAMP() vs CAST()
Markus Demleitner
msdemlei at ari.uni-heidelberg.de
Wed Oct 9 15:07:29 CEST 2019
Dear DAL,
[Sorry for a longish mail right before the interop. To make up for it,
there's some actual prose at the end.]
At an informal meeting of several people involved with implementing ADQL
and writing the standard, a question came back up we've been trying to
decide for some time. It is about how we'd like to do type conversions
in ADQL in general. The two options we were discussing were:
* constructor functions: TIMESTAMP('2019-10-10") [in ADQL 2.1],
INTERVAL(4, 5), BIGINT(30), DOUBLE PRECISON(magv) [later]
* cast: CAST('2019-10-10', TIMESTAMP), CAST(30, BIGINT),
CAST(magv, DOUBLE PRECISON) [in ADQL 2.1],
CAST([4,5], INTERVAL) or CAST('4 5', INTERVAL) [later]
Since at this point ADQL 2.1 forsees both TIMESTAMP and CAST but doesn't
properly specify either in the grammar, it'd be great if we made up our
mind which way we'd like to go, because clarly it'd be nice if we'd be
halfway consistent about how we want to do type conversions ("casting"
from here on), and if we didn't specify two ways to accomplish the same
thing.
Up font: I like the CAST way better. I hope that's not going to show in
the following treatment.
On constructor functions
========================
On the plus side for constructor functions I see that python programmers
are used to them and that they're possibly a bit nicer to write. Also,
and perhaps most pertinently, we already have them to a certain degree
for geometries (POINT(ra, dec) and friends; allowing construction from
DALI literals as in POINT('23 42') would be an obvious extension).
On the minus side I'd say that it's not what SQL does in general and
that the grammar will become a good deal more ambiguous once we allow
general types in there (or so I guess). I also suspect it'll be more
effort grammar-wise in the end.
If we went this way, we'd have to add grammar rules to the 2.1 grammar,
because that's still missing. Following SQL92, this would involve
changing the value expression from
<value_expression> ::= <numeric_value_expression>
| <string_value_expression>
| <boolean_value_expression>
| <geometry_value_expression>
to
<value_expression> ::= <numeric_value_expression>
| <string_value_expression>
| <boolean_value_expression>
| <datetime_value_expression>
| <geometry_value_expression>
and then define
<datetime_value_expression> ::= TIMESTAMP( <string_value_expression> )
This only gives us TIMESTAMP, of course. The other conversion functions
[BIGINT(20) and friends] we'd postpone to later versions.
On cast
=======
On the plus side for cast I'd say that it's what SQL92 does for type
conversions and we'll get all kinds of conversions essentially for free.
And syntax changes remain rather localised. Also, an enumeration of the
type literals ADQL knows might come in handy in other places, too.
On the minus side, it's not what we've been doing for geometries. Also,
it's essentially impossible with cast to catch certain forbidden
conversions or operations on the grammar level. If that's an actual
disadvantage I doubt, though.
CAST currently is mentioned in the spec like this:
The ADQL CAST() function does not replicate the full functionality and
range of types supported by common RDBMS implementations of CAST.
The ADQL CAST() function only supports type conversion between the
standard numeric datatypes. The CAST() function does not support casting
to or from the character, binary, datetime or geometric datatypes.
The rounding mechanism used when converting from floating point values
(REAL or DOUBLE) to integer values (SHORTINT, INTEGER or BIGINT)
is implementation dependent.
When converting a numeric value to a datatype that is too small to
represent the value, this SHOULD be treated as an error. Details of the
mechanism for reporting the error condition are implementation dependent.
-- which already points at some other difficulties you're encountering
when doing type conversions (but these would apply to constructor
functions).
If we wanted to replace TIMESTAMP, we'd have to write instead of the
second paragraph quoted above:
The ADQL CAST function makes no guarantees about what can be cast to
what. Implementors SHOULD enable casting string representations of
DALI literals to their corresponding database types when they support
the latter. In particular, CAST('2019-10-19T12:42:31', TIMESTAMP)
should reliably turn a DALI timestamp into a database timestamp.
or something like that.
As with timestamp, CAST is not yet in the grammar. In that case, you'd
say (mostly stolen from SQL 92):
<value_expression_primary> ::= [...]
| <cast_specification>
<cast_specification> :== CAST ( <cast_operand> AS <data_type> )
<cast_operand> ::=
<value_expression>
| NULL
[I'd not veto forgetting about NULL and inlining the value expression;
I don't think there's many cases for casting NULL outside view
definitions, and these we don't to with ADQL]
<data_type> ::=
CHAR [ <left paren> <length> <right paren> ]
| VARCHAR <left paren> <length> <right paren>
| NUMERIC
| INTEGER
| SMALLINT
| BIGINT
| REAL
| DOUBLE PRECISION
| TIMESTAMP
<length> ::= <unsigned integer>
The <data_type> is massively simplified from SQL92, which in particular
supports lots of aliases for what's written here, and we should say
that implementors are of course allowed to support more. And that users
must be prepared to get execution time errors if they try to convert
things that cannot be converted (e.g., CAST('foo', SMALLINT)) or that
overflow (e.g., CAST(3000000, SMALLINT). An interesting question is
whether we should add POINT, CIRCLE, and POLYGON here right away.
Perhaps even REGION?
So... Opinions? Which way would *you* like to go? Or would you go with
Ralph Waldo Emerson ("A foolish consistency is the hobgoblin of little
minds") and have both [full disclosure: It's what DaCHS does right now,
but, now going with Herman Melville, I would prefer not to] ?
-- Markus
More information about the dal
mailing list