VOUnit encodings

Chalk, Stuart schalk at unf.edu
Thu Jul 26 14:45:24 CEST 2018


Markus

From someone outside the VO community, I thank you very much for the detailed and very useful response to my email. I have to agree that I am surprised that there is a tolerance for unknown units. Do you think that the community would ever re-evaluate this decision? It would be much better if the parsers would flag an unknown unit as such and provide suggested units.
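
To make the "flag and suggest" idea concrete, here is a minimal sketch using Python's standard difflib module. The KNOWN_UNITS list and the check_unit helper are hypothetical and exist only for illustration; this is not how any existing VOUnit parser behaves, and a real implementation would work on the parsed unit rather than on plain strings.

```python
import difflib

# Hypothetical whitelist for illustration; a real one would come from the
# VOUnit specification's tables of known units.
KNOWN_UNITS = ["m", "s", "kg", "km/s", "Jy", "mag", "deg", "arcsec"]


def check_unit(unit, known=KNOWN_UNITS):
    """Return (is_known, suggestions) for a unit string.

    Suggestions are the known units most similar to the input, ranked by
    difflib's similarity ratio; the list is empty for known units.
    """
    if unit in known:
        return True, []
    return False, difflib.get_close_matches(unit, known, n=3, cutoff=0.5)


if __name__ == "__main__":
    for candidate in ["km/s", "arcsecs", "furlong"]:
        ok, suggestions = check_unit(candidate)
        print(candidate, "known" if ok else "unknown", suggestions)
```

Plain string similarity is a crude stand-in here; it catches typos such as a stray plural, but it knows nothing about prefixes or compound units.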

This becomes an important issue when you take into consideration the current move toward FAIR data (https://www.go-fair.org/) where the expectation is that FAIR data can be reused by other researchers and thus needs (in part) to have standardized units.

Anyway, as soon as my project at NIST is ready for public access I will send another message.

Regards,
Stuart

On Jul 26, 2018, at 4:12 AM, Markus Demleitner <msdemlei at ari.uni-heidelberg.de> wrote:

Hi Stuart,

On Wed, Jul 25, 2018 at 06:00:04PM +0000, Chalk, Stuart wrote:
I am using the VOUnit encoding specification and wondering if there
is a list of units encoded in the VOUnit spec anywhere.
If anyone has any info on available lists of units, please send me
an email directly.

VOUnit has a grammar to generate unit strings, and the way that
grammar is written, you can generate infinitely many valid units
strings.  So, there can be no exhaustive list of VOUnits.
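
To illustrate why a grammar rules out an exhaustive list, here is a toy sketch. The prefix and base-unit tables and the toy_units function are invented for this example and are a deliberately tiny stand-in, not the VOUnit grammar itself. Even with only an optional prefix, a base unit, and an integer power, the number of distinct strings grows without bound as larger powers are allowed.

```python
from itertools import product

# Toy tables for illustration only; the real VOUnit grammar is much richer
# (products, quotients, scale factors, functions, ...).
PREFIXES = ["", "k", "m"]
BASES = ["m", "s", "g"]


def toy_units(max_power):
    """Return all prefix+base[**power] strings with 1 <= power <= max_power."""
    units = []
    for prefix, base, power in product(PREFIXES, BASES, range(1, max_power + 1)):
        suffix = "" if power == 1 else f"**{power}"
        units.append(f"{prefix}{base}{suffix}")
    return units


if __name__ == "__main__":
    # The count is len(PREFIXES) * len(BASES) * max_power, so it keeps
    # growing as max_power increases.
    for max_power in (1, 2, 5):
        print(max_power, len(toy_units(max_power)))
```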

On the other hand, there is a reasonably simple way to obtain a list
of valid VOUnit strings that are out in the wild.  First, you'll need
a VOUnit parser; unity (https://bitbucket.org/nxg/unity) has parsers
in a couple of languages, and
http://svn.ari.uni-heidelberg.de/svn/gavo/python/trunk/gavo/base/unitconv.py
contains a pyparsing-based grammar relatively easily lifted for
python use (I'll happily assist removing the minor DaCHS dependencies
if there's interest).

Second, the vast majority of unit strings used in VO metadata[1] is
available from the registry -- try

 select distinct unit from rr.table_column

on a RegTAP mirror (if you don't have a TAP client, just retrieving

 http://reg.g-vo.org/tap/sync?FORMAT=text/plain&LANG=ADQL&QUERY=select+distinct+unit+from+rr.table_column

will do as well).
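
For other queries, the same kind of URL can be assembled with Python's standard library instead of escaping the ADQL by hand. This is only a sketch: the build_sync_url helper is a made-up name, and the parameter set simply mirrors the URL above rather than the full TAP interface.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL quoted above.
SYNC_ENDPOINT = "http://reg.g-vo.org/tap/sync"


def build_sync_url(adql, endpoint=SYNC_ENDPOINT):
    """Return a synchronous TAP query URL returning plain text."""
    params = {"FORMAT": "text/plain", "LANG": "ADQL", "QUERY": adql}
    # safe="/" keeps the slash in text/plain readable; spaces become "+".
    return endpoint + "?" + urlencode(params, safe="/")


if __name__ == "__main__":
    print(build_sync_url("select distinct unit from rr.table_column"))
```

The resulting string can be handed to any HTTP client, for instance urllib.request.urlopen or curl.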

Now, people are naughty, and so many of these units are
non-compliant.  That's why you want the parser.

As usual, things become complicated when you encounter the real
world.  VOUnit -- IMHO unfortunately -- encourages parsers to accept
"unknown units".  You'll want to filter them out in some way.

Based on DaCHS and pyVO (both are Debian-packaged), you can write
this:

 import pyvo
 from gavo import api

 for row in pyvo.dal.TAPService("http://dc.g-vo.org/tap"
         ).run_sync("select distinct unit from rr.table_column").table:
     unit = row["unit"]
     try:
         tree = api.parseUnit(unit)
         if "U?(" in repr(tree):  # Filter "unknown units"
             raise api.BadUnit("Unknown unit used")
     except api.BadUnit:
         pass
     else:
         print(unit)

I'm surprised myself that the output is just a measly 2.5k, so I'm
attaching what I just got.

Some of the units get me curious, though.  m/ms, for instance, begs
the question why they didn't go for the much more common km/s...

       -- Markus


[1] Well, people might disagree here, because I'm sure there's lots
of FITS files you can pull through VO facilities that contain a
plethora of unit strings; but it was fun stating it this way.
<goodunits.txt>
