VOResource 1.1 and i18n

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Wed Jul 20 16:36:11 CEST 2016


Dear Registry crowd,

Here's the next part of my series of open issues for VOResource 1.1:
Language and script.

Right now, the status is that VOResource says nothing about this, but
you're really expected to have the natural language pieces
((resource/column/table/schema/capability/...)description,
resource/title, content/subject, and a few more) in English.  You're
also expected to have names in Latin transliteration, where it's not
quite clear whether Latin characters with diacritics (ä, ç, ø) and
painful ligatures like ß count or not.

My current plan for VOResource 1.1 is:

(1) essentially codify the status quo ("natural-language content is
expected in English"; suggestions for what to say about
transliteration are welcome).  As long as we have strong cultural
biases, let's at least be honest about them.

(2) say that registry extensions (the edu IG will want this) may use
the xml:lang mechanism from the XML spec, but the elements then have
to be repeatable, and a version without xml:lang and English content
should always be provided (that's going to be a tough nut for
RegTAP, I suppose).
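
To make (2) a bit more concrete, here is a minimal sketch of how a
client might pick a description under that rule: take the element
whose xml:lang matches, and fall back to the untagged (English) one
otherwise.  The record fragment and the function name pick_description
are made up for illustration and are not taken from any VOResource
extension; the case-insensitive tag comparison is likewise just an
assumption.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical record fragment; element content is for illustration only.
RECORD = """
<resource>
  <description>A catalogue of variable stars.</description>
  <description xml:lang="de">Ein Katalog veränderlicher Sterne.</description>
</resource>
"""


def pick_description(resource, wanted_lang):
    """Return the description tagged wanted_lang, else the untagged one."""
    default = None
    for el in resource.findall("description"):
        lang = el.get(XML_LANG)
        if lang is None:
            # The version without xml:lang is the mandatory English one.
            default = el.text
        elif lang.lower() == wanted_lang.lower():
            return el.text
    return default


resource = ET.fromstring(RECORD)
print(pick_description(resource, "de"))
print(pick_description(resource, "fr"))
```

A request for a language that is not present ("fr" here) ends up with
the English text, which is the reason for always requiring an
untagged version.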


So, that still outlaws non-Latin names, and it doesn't let people use
national languages in normal resource records.  As someone from a
region that's pretty much favoured by this arrangement (though the
transliteration rule, strictly interpreted, would require
transliteration of ä,ü,ö and, thank God, ß...), I'd like to gather
some opinions on whether that's reasonable.

I'd be particularly curious about opinions from non-Latin-writing
countries (and if you're too shy to speak up in public, I'll take
private mail, too).

Here are a few questions in this context:

(1) Is the restriction to English an actual problem for research-level
resources?

(2) What about the requirement for Latin transliteration?  Have you
had to transliterate things when writing your resource records, and
was that a problem?  Would a plausible alternative, allowing
non-Latin names *in addition* to transliterated names, really help?

(3) xml:lang says its values are governed by RFC 3066.  That's stuff
like bai-de vs. bai-at (baiuvarian dialects as spoken in Germany or
Austria, respectively).  I'm a bit afraid that this level of detail
might render the whole thing hard to use.  Does anyone have practical
experience with RFC 3066 language tags and hence advice on what
practices VOResource could recommend to help interoperability?  Can
we require the language tags to be lowercase in the VO?  I really
don't want another case-insensitive type in the VO...

(4) VOResource is also silent so far about en-us vs. en-gb.  It would
*probably* be preferable if people knew if they had to search for
colour or color.  Should we recommend something?  Just incidentally,
there's a type vr:Organisation in VOResource, which would suggest
British spelling is in effect for our class names.  I'm sure you
could find a counterexample, though.

(5) Should we say something about transliterating non-ASCII
"borderline Latin" (the ä, ç, ø problem)?  It would certainly be
useful if people knew if they had to look for Zwolf, Zwölf, or Zwoelf
when they search for our PDL standard...
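
For what it's worth, the Zwolf/Zwölf/Zwoelf ambiguity in (5) can be
illustrated with a small sketch that derives the ASCII spellings a
searcher might try from a name.  Nothing here is proposed VOResource
behaviour: the function names and the German-style expansion table
are assumptions chosen for the example, and other languages would
need other tables.

```python
import unicodedata

# Assumed German-style expansions; other languages need other tables.
EXPANSIONS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def strip_marks(text):
    """Drop combining marks after canonical decomposition (ö -> o)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def search_keys(name):
    """Return the lowercase spellings a searcher might plausibly type."""
    lowered = unicodedata.normalize("NFC", name).lower()
    expanded = "".join(EXPANSIONS.get(c, c) for c in lowered)
    return {strip_marks(lowered), strip_marks(expanded)}


print(sorted(search_keys("Zwölf")))
```

Note that stripping combining marks only helps for characters that
decompose, such as ä or ç; ø and ß have no canonical decomposition
and need explicit table entries, which is part of why a written rule
would be useful.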

Cheers,

         Markus

