<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<div class="moz-cite-prefix">Hi Pat,<br>
<br>
ADQL does not specify anything about collation.<br>
<br>
According to my experience, DBMS support several collations so
that the DB manager can choose which one is needed on specific
tables. <br>
<br>
SQL-92 does not seem to enforce one in particular ; it seems to
say that the default collation (collating sequence) is
implementation dependent. Based on that, I don't think ADQL should
define a default collation.<br>
<br>
Besides, generally the databases published in TAP (or any other
ADQL services) already exist, and generally either with a default
collation or with a specific collation required for a precise
usage in the DB. So, we certainly do not want to make implementer
change their tables' collation, which generally means rebuild the
table and indexes....quite an important task if the table is
large.<br>
<br>
However, maybe a collation should be recommended on some
standardized tables/schemas like 'obscore' and 'registry' so that
ensuring having a common behavior when querying them ; so, for a
better interoperability. But that would be up to the corresponding
IVOA standards/documents, I think.<br>
<br>
Tell me if I missed something.<br>
<br>
Cheers,<br>
Grégory<br>
<br>
<br>
On 24/02/2021 22:33, Patrick Dowler wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAFK8nrpyoXXXSS7DBVA4p=zs7Uvp95AdEyPQwmoMvP3r24PHmw@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">
<div class="gmail_default" style="font-size:small"><br>
</div>
<div class="gmail_default" style="font-size:small">I have just
discovered (to my horror) that some of our databases sort in
different order than others depending on the LC_COLLATE
value... specifically thus is noticeable with underscore
which comes before letters with C but after letters with en_CA
or en_US... ouch.</div>
<div class="gmail_default" style="font-size:small"><br>
</div>
<div class="gmail_default" style="font-size:small">Does ADQL
specify this? Which locale (?) is assumed to be correct for
ADQL? <br>
</div>
<div class="gmail_default" style="font-size:small"><br>
</div>
<div class="gmail_default" style="font-size:small"><br>
</div>
<div class="gmail_default" style="font-size:small">gory details:</div>
<div class="gmail_default" style="font-size:small"><br>
</div>
<div class="gmail_default" style="font-size:small">in java,
underscore comes before letters:<br>
</div>
<div class="gmail_default" style="font-size:small">groovy:000>
"FOO_bar".compareTo("FOObar");<br>
===> -3</div>
<div class="gmail_default" style="font-size:small">groovy:000>
"FOObar".compareTo("FOO_bar");<br>
===> 3</div>
<div class="gmail_default" style="font-size:small">so java is
equiv to C even when system locale is en_CA<br>
</div>
<div class="gmail_default" style="font-size:small"><br>
</div>
<div class="gmail_default" style="font-size:small">In PG server
1:</div>
<div class="gmail_default" style="font-size:small">cvomaqdb=>
show lc_collate;<br>
lc_collate <br>
-------------<br>
en_US.UTF-8<br>
<br>
cvomaqdb=> select 'FOO_bar' < 'FOObar';<br>
f<br>
<br>
cvomaqdb=> select 'FOObar' < 'FOO_bar';<br>
t</div>
<div>
<div dir="ltr" class="gmail_signature"
data-smartmail="gmail_signature">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div><br>
</div>
<div>
<div style="font-size:small" class="gmail_default">In
PG server 2:</div>
<div style="font-size:small" class="gmail_default">content=>
show lc_collate;<br>
lc_collate <br>
------------<br>
C<br>
<br>
content=> select 'FOO_bar' < 'FOObar';<br>
t<br>
<br>
content=> select 'FOObar' < 'FOO_bar';<br>
f<br>
</div>
<br>
</div>
<div><br>
</div>
<div>--<br>
</div>
<div>Patrick Dowler<br>
</div>
Canadian Astronomy Data Centre<br>
</div>
Victoria, BC, Canada<br>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</body>
</html>