ADQL and ORDER BY

Patrick Dowler pdowler.cadc at gmail.com
Thu Feb 25 21:18:45 CET 2021


I agree that it is non-trivial thing and hard to retrofit. Fortunately for
us, I ran into this in a new system where we could tear down the DB and
create it with explicit encoding (UTF-8) and collation (C)... we got the
default from the system (containers) which was en_US.UTF-8...

other PG databases we have behind TAP (argus and youcat services) happen to
be in en_CA.UTF-8 and they have a lot of content so changing is painful. It
just so happens that we don't have operational s/w that relies on character
sorting to be consistent with s/w or other DBs with a different collation
so we never noticed. Not sure we'll "fix" them (personally, I would but it
is a lot of work)

While googling, I found this very good document:
https://gist.github.com/rraval/ef4e4bdc63e68fe3e83c9f98f56af7a4

--
Patrick Dowler
Canadian Astronomy Data Centre
Victoria, BC, Canada


On Thu, 25 Feb 2021 at 01:46, Grégory Mantelet <
gregory.mantelet at astro.unistra.fr> wrote:

> Hi Pat,
>
> ADQL does not specify anything about collation.
>
> According to my experience, DBMS support several collations so that the DB
> manager can choose which one is needed on specific tables.
>
> SQL-92 does not seem to enforce one in particular ; it seems to say that
> the default collation (collating sequence) is implementation dependent.
> Based on that, I don't think ADQL should define a default collation.
>
> Besides, generally the databases published in TAP (or any other ADQL
> services) already exist, and generally either with a default collation or
> with a specific collation required for a precise usage in the DB. So, we
> certainly do not want to make implementer change their tables' collation,
> which generally means rebuild the table and indexes....quite an important
> task if the table is large.
>
> However, maybe a collation should be recommended on some standardized
> tables/schemas like 'obscore' and 'registry' so that ensuring having a
> common behavior when querying them ; so, for a better interoperability. But
> that would be up to the corresponding IVOA standards/documents, I think.
>
> Tell me if I missed something.
>
> Cheers,
> Grégory
>
>
> On 24/02/2021 22:33, Patrick Dowler wrote:
>
>
> I have just discovered (to my horror) that some of our databases sort in
> different order than others depending on the LC_COLLATE value...
> specifically thus is noticeable with underscore which comes before letters
> with C but after letters with en_CA or en_US... ouch.
>
> Does ADQL specify this? Which locale (?) is assumed to be correct for
> ADQL?
>
>
> gory details:
>
> in java, underscore comes before letters:
> groovy:000> "FOO_bar".compareTo("FOObar");
> ===> -3
> groovy:000> "FOObar".compareTo("FOO_bar");
> ===> 3
> so java is equiv to C even when system locale is en_CA
>
> In PG server 1:
> cvomaqdb=> show lc_collate;
>  lc_collate
> -------------
>  en_US.UTF-8
>
> cvomaqdb=> select 'FOO_bar' < 'FOObar';
>  f
>
> cvomaqdb=> select 'FOObar' < 'FOO_bar';
>  t
>
> In PG server 2:
> content=> show lc_collate;
>  lc_collate
> ------------
>  C
>
> content=> select 'FOO_bar' < 'FOObar';
>  t
>
> content=> select 'FOObar' < 'FOO_bar';
>  f
>
>
> --
> Patrick Dowler
> Canadian Astronomy Data Centre
> Victoria, BC, Canada
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ivoa.net/pipermail/dal/attachments/20210225/7a93a9f1/attachment.html>


More information about the dal mailing list