ADQL-2.1 internal draft

Walter Landry wlandry at caltech.edu
Thu Jun 11 00:05:31 CEST 2015


Marco Molinaro <molinaro at oats.inaf.it> wrote:
> Hi Walter, hi DAL,
> 
> 2015-06-10 1:53 GMT+02:00 Walter Landry <wlandry at caltech.edu>:
>> Marco Molinaro <molinaro at oats.inaf.it> wrote:
>>> Dear DAL members,
>>> a first internal draft of the ADQL-2.1 document is available online,
>>> linked on the IVOA TWiki at
>>>
>>> http://wiki.ivoa.net/twiki/bin/view/IVOA/ADQL
>>
>> Here are some of my thoughts.
>>
>> 1) Coordinate systems are deprecated in the geometric functions (BOX,
>>    CIRCLE, etc.).  Why don't we just declare new functions that do not
>>    take the first argument?  So instead of
>>
>>      BOX('', 25.4, -20.0, 10, 10)
>>
>>    it would be
>>
>>      BOX(25.4, -20.0, 10, 10)
>>
>>    The functions have different arity, so it is easy to distinguish
>>    between them in the parser.  In general, empty strings feel odd to me.
> 
> It may be odd, but this solution was chosen because of
> backward-compatibility constraints.
> That's why from ADQL-2.1 on, until a major revision, the first string,
> if not empty, can be ignored by servers,
> and clients are encouraged to pass an empty one. I.e. that parameter is
> deprecated from revision 2.1 on.

I am not suggesting removing the 2.0 version with a coordinate system.
I am suggesting adding an overload and not having an implicit meaning
for an empty string.  That would retain the property of the current
proposal that all 2.0 queries are valid in 2.1, but not all 2.1
queries are valid in 2.0.
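
The overload can be told apart from the 2.0 form by argument count
alone. A minimal sketch in Python, assuming the parser has already
split the call into an argument list; parse_box and the dictionary it
returns are hypothetical names for illustration, not part of any ADQL
library:

```python
def parse_box(args):
    """Dispatch a BOX call on its argument count (hypothetical helper)."""
    if len(args) == 5:
        # ADQL-2.0 form: BOX(coordsys, ra, dec, width, height)
        coordsys, ra, dec, width, height = args
    elif len(args) == 4:
        # Proposed overload: BOX(ra, dec, width, height)
        coordsys = None
        ra, dec, width, height = args
    else:
        raise ValueError(f"BOX takes 4 or 5 arguments, got {len(args)}")
    return {
        "coordsys": coordsys,
        "center": (float(ra), float(dec)),
        "size": (float(width), float(height)),
    }


old_form = parse_box(["", 25.4, -20.0, 10, 10])
new_form = parse_box([25.4, -20.0, 10, 10])
```

In this sketch the 4-argument form never sees a coordinate-system
string, so an empty string carries no implicit meaning.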

<snip>

> Pasting here also the other two points you made, i.e. LOWER/UPPER and ILIKE.
> Probably there was not full discussion on them.
> 
> LOWER/UPPER: Initially they were set as optional for this revision, but
> the point was also made that it would be better to have them
> mandatory... and also to have only one of them, to help with table
> indexing.
> Probably this is something to discuss.

If anything we should be normalizing to upper case.  There are some
letters that do not round trip properly through lower case.

  Start: Greek Rho Symbol (U+03f1) ϱ
  Uppercase: Capital Greek Rho (U+03a1) Ρ
  Lowercase: Small Greek Rho (U+03c1) ρ

  http://stackoverflow.com/a/14128850/1446838
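
The round trip can be checked with a short sketch. It assumes
Python 3, whose built-in str.upper() and str.lower() apply the Unicode
case mappings; a database engine may behave differently:

```python
import unicodedata

rho_symbol = "\u03f1"    # GREEK RHO SYMBOL
small_rho = "\u03c1"     # GREEK SMALL LETTER RHO

upper = rho_symbol.upper()       # maps to U+03A1
round_trip = upper.lower()       # maps to U+03C1, not back to U+03F1

for label, ch in [("Start", rho_symbol),
                  ("Uppercase", upper),
                  ("Lowercase", round_trip)]:
    print(f"{label}: U+{ord(ch):04X} {unicodedata.name(ch)}")

# Both rho variants compare equal after upper-casing ...
equal_when_upper = rho_symbol.upper() == small_rho.upper()
# ... but stay distinct after lower-casing.
equal_when_lower = rho_symbol.lower() == small_rho.lower()
```

Here equal_when_upper is True and equal_when_lower is False, which is
the asymmetry behind preferring upper case for normalization.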

As someone who would have to implement this, I would probably
implement both regardless of what the spec says.  It just feels like
an obvious hole otherwise.

> I don't think the question of UTF-8 or not came into play in using
> them. What the TAPNotes Note says about it is:
> ---
> ADQL currently has no facility reliably allowing case-insensitive
> string comparisons. This is particularly regrettable since UCDs and at
> least the majority of the defined utypes are to be compared
> case-insensitively.
> 
> Thus, we propose the addition of a string function LOWER and the
> case-insensitive variant of LIKE, ILIKE. Since case folding is a
> nontrivial operation in a multi-encoding world, ADQL would only
> require standard behaviour for the ASCII characters (which would
> suffice for UCDs and utypes) and only recommend following algorithm R2
> in section 3.13, "Default Case Algorithms" of [std:UNICODE] outside of
> ASCII.
> ---
> 
> Thus the level at which these functions play a role is already somewhat limited.

This still does not specify whether it is UTF-8, UTF-16, or UTF-32.  I
think we should just choose one, with my vote being UTF-8 since ASCII
is unchanged.
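
A small sketch of the "ASCII is unchanged" property, assuming Python 3
and its standard codec names (utf-8, utf-16-le, utf-32-le); the sample
strings are arbitrary:

```python
ascii_text = "LOWER"

as_ascii = ascii_text.encode("ascii")
as_utf8 = ascii_text.encode("utf-8")
as_utf16 = ascii_text.encode("utf-16-le")
as_utf32 = ascii_text.encode("utf-32-le")

# UTF-8 bytes are identical to the ASCII bytes for ASCII-only input.
utf8_matches_ascii = as_utf8 == as_ascii

# The 16- and 32-bit encodings pad every ASCII character with zero bytes.
sizes = (len(as_utf8), len(as_utf16), len(as_utf32))

# Outside ASCII, UTF-8 switches to multi-byte sequences.
rho_utf8 = "\u03c1".encode("utf-8")
```

For the five-character sample, sizes comes out as (5, 10, 20), and the
small rho takes two bytes in UTF-8.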

Cheers,
Walter Landry

