VOResource 1.1 and i18n

Accomazzi, Alberto aaccomazzi at cfa.harvard.edu
Thu Aug 4 14:10:01 CEST 2016


Hi Markus,

As you can imagine, we at ADS have had to struggle with similar issues.  As
you mention, the goal of faithfully representing a record (in particular a
person's name) and the goal of discoverability run against each other.

On Thu, Aug 4, 2016 at 5:20 AM, Markus Demleitner <
msdemlei at ari.uni-heidelberg.de> wrote:

>
>   Several VOResource elements contain names.  Again, for reliable global
>   discoverability, such names must be given in (common) English
>   transliteration where their original form uses non-Latin scripts.
>   Latin letters with diacritics should also be transliterated.
>

The transliteration of Latin letters with diacritics seems a bit harsh to
me.  If it's there to make sure that searches containing non-diacritic
terms match the original strings with diacritics, there are other ways to
do this (for example, folding everything to ASCII both when indexing and
when searching).  Since this is a problem that only affects searchable
registries, I would investigate whether the technologies currently used to
host these databases allow for that, in which case there should be no extra
work involved in keeping the diacritics in.
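
To make the folding idea concrete, here is a minimal sketch in Python.  It
is only an illustration under stated assumptions, not a description of how
any existing registry or ADS does it: the function name fold_diacritics,
the sample names, and the in-memory dictionary index are all made up for
the example.  The same key function is applied to the stored names and to
the query, so the records themselves keep their diacritics.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Return a case-insensitive search key with combining marks removed."""
    # NFKD splits a letter such as "ö" into "o" plus a combining diaeresis.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


# Hypothetical index: folded key -> names exactly as they were published.
names = ["Bengt Strömgren", "Henri Poincaré"]
index = {}
for name in names:
    index.setdefault(fold_diacritics(name), []).append(name)


def search(query: str) -> list:
    """Look up a query using the same folding that built the index."""
    return index.get(fold_diacritics(query), [])


print(search("bengt stromgren"))
print(search("Henri Poincaré"))
```

Note that this only handles letters that Unicode decomposes into a base
letter plus a mark; letters such as "ø" or "ł" have no such decomposition
and would need an explicit mapping table.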


> I feel a bit bad about codifying this amount of cultural bias, but
> I'm convinced that for reliable discovery, we'll have to say
> something pretty close to that.
>

I agree that in principle it seems unfair, but in practice nobody has
complained so far when we ask our Chinese and Japanese colleagues to
give us abstracts in English...


> In particular on the question of names, I'm really uncertain, though.
> It seems patently wrong to me to have no place for names in, say,
> Cyrillic or Chinese or Japanese scripts.  At least for elements with
> an explicit name element (creator, contributor, contact), it would
> not be hard to add an additional element (perhaps originalName?) that
> could legally contain non-latin letters.  I'd be happy to introduce
> them if people asked for them and would volunteer to put out records
> using them.
>

We have a field in our bibliographic data that can be used to retain the
author name in its native script.  Although at the moment we don't do
anything useful with it, the plan is to expose it and use it for indexing to
help with disambiguation.  I don't think you should worry about
disambiguation now, but it seems like a good idea to capture the faithful
representation of somebody's name, so I'd vote for that.

BTW, I noticed that DataCite says nothing of the sort, and one of their
examples has a name in Chinese script in the <creator> field:
https://schema.datacite.org/meta/kernel-3.1/example/datacite-example-complicated-v3.0.xml
The schema, however, allows for a title and a translated title (which is in
English).

-- Alberto



>
> Any takers?
>
>        -- Markus
>



-- 
Dr. Alberto Accomazzi
Principal Investigator
NASA Astrophysics Data System - http://ads.harvard.edu
Harvard-Smithsonian Center for Astrophysics - http://www.cfa.harvard.edu
60 Garden St, MS 83, Cambridge, MA 02138, USA