<div dir="ltr"><div class="gmail_default" style="font-size:small"><br></div><div class="gmail_default" style="font-size:small">I agree that it is non-trivial thing and hard to retrofit. Fortunately for us, I ran into this in a new system where we could tear down the DB and create it with explicit encoding (UTF-8) and collation (C)... we got the default from the system (containers) which was en_US.UTF-8... <br></div><div class="gmail_default" style="font-size:small"><br></div><div class="gmail_default" style="font-size:small">other PG databases we have behind TAP (argus and youcat services) happen to be in en_CA.UTF-8 and they have a lot of content so changing is painful. It just so happens that we don't have operational s/w that relies on character sorting to be consistent with s/w or other DBs with a different collation so we never noticed. Not sure we'll "fix" them (personally, I would but it is a lot of work)<br></div><div class="gmail_default" style="font-size:small"><br></div><div class="gmail_default" style="font-size:small">While googling, I found this very good document: <a href="https://gist.github.com/rraval/ef4e4bdc63e68fe3e83c9f98f56af7a4">https://gist.github.com/rraval/ef4e4bdc63e68fe3e83c9f98f56af7a4</a><br></div><div class="gmail_default" style="font-size:small"><br></div><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div>--<br></div><div>Patrick Dowler<br></div>Canadian Astronomy Data Centre<br></div>Victoria, BC, Canada<br></div></div></div></div></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, 25 Feb 2021 at 01:46, Grégory Mantelet <<a href="mailto:gregory.mantelet@astro.unistra.fr">gregory.mantelet@astro.unistra.fr</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<div>Hi Pat,<br>
<br>
ADQL does not specify anything about collation.<br>
<br>
According to my experience, DBMS support several collations so
that the DB manager can choose which one is needed on specific
tables. <br>
<br>
SQL-92 does not seem to enforce one in particular ; it seems to
say that the default collation (collating sequence) is
implementation dependent. Based on that, I don't think ADQL should
define a default collation.<br>
<br>
Besides, generally the databases published in TAP (or any other
ADQL services) already exist, and generally either with a default
collation or with a specific collation required for a precise
usage in the DB. So, we certainly do not want to make implementer
change their tables' collation, which generally means rebuild the
table and indexes....quite an important task if the table is
large.<br>
<br>
However, maybe a collation should be recommended on some
standardized tables/schemas like 'obscore' and 'registry' so that
ensuring having a common behavior when querying them ; so, for a
better interoperability. But that would be up to the corresponding
IVOA standards/documents, I think.<br>
<br>
Tell me if I missed something.<br>
<br>
Cheers,<br>
Grégory<br>
<br>
<br>
On 24/02/2021 22:33, Patrick Dowler wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_default" style="font-size:small"><br>
</div>
<div class="gmail_default" style="font-size:small">I have just
discovered (to my horror) that some of our databases sort in
different order than others depending on the LC_COLLATE
value... specifically thus is noticeable with underscore
which comes before letters with C but after letters with en_CA
or en_US... ouch.</div>
<div class="gmail_default" style="font-size:small"><br>
</div>
<div class="gmail_default" style="font-size:small">Does ADQL
specify this? Which locale (?) is assumed to be correct for
ADQL? <br>
</div>
<div class="gmail_default" style="font-size:small"><br>
</div>
<div class="gmail_default" style="font-size:small"><br>
</div>
<div class="gmail_default" style="font-size:small">gory details:</div>
<div class="gmail_default" style="font-size:small"><br>
</div>
<div class="gmail_default" style="font-size:small">in java,
underscore comes before letters:<br>
</div>
<div class="gmail_default" style="font-size:small">groovy:000>
"FOO_bar".compareTo("FOObar");<br>
===> -3</div>
<div class="gmail_default" style="font-size:small">groovy:000>
"FOObar".compareTo("FOO_bar");<br>
===> 3</div>
<div class="gmail_default" style="font-size:small">so java is
equiv to C even when system locale is en_CA<br>
</div>
<div class="gmail_default" style="font-size:small"><br>
</div>
<div class="gmail_default" style="font-size:small">In PG server
1:</div>
<div class="gmail_default" style="font-size:small">cvomaqdb=>
show lc_collate;<br>
lc_collate <br>
-------------<br>
en_US.UTF-8<br>
<br>
cvomaqdb=> select 'FOO_bar' < 'FOObar';<br>
f<br>
<br>
cvomaqdb=> select 'FOObar' < 'FOO_bar';<br>
t</div>
<div>
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div><br>
</div>
<div>
<div style="font-size:small" class="gmail_default">In
PG server 2:</div>
<div style="font-size:small" class="gmail_default">content=>
show lc_collate;<br>
lc_collate <br>
------------<br>
C<br>
<br>
content=> select 'FOO_bar' < 'FOObar';<br>
t<br>
<br>
content=> select 'FOObar' < 'FOO_bar';<br>
f<br>
</div>
<br>
</div>
<div><br>
</div>
<div>--<br>
</div>
<div>Patrick Dowler<br>
</div>
Canadian Astronomy Data Centre<br>
</div>
Victoria, BC, Canada<br>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</div>
</blockquote></div>