Units parser in Java and C

Norman Gray norman at astro.gla.ac.uk
Thu Jul 21 09:59:04 PDT 2011


Greetings, all.

At the Naples interop, Sébastien gave a presentation on the range of current standards for unit strings <http://www.ivoa.net/internal/IVOA/InteropMay2011Semantics/VOUnits_Semantics.pdf>.  During that discussion, or after it, I volunteered to look at defining grammars for these syntaxes, with a view to defining a consensus grammar for them in future.

So I've done so.

I've developed explicit grammars, and recommended unit lists, for the three unit specifications mentioned in Sébastien's presentation, namely FITS, OGIP and CDS.  These are implemented by a C library and a Java class library, which can parse unit strings in an indicated syntax (rejecting malformed ones), and write them out in another syntax or in LaTeX.  See:

    http://www.astro.gla.ac.uk/users/norman/ivoa/unity-0.1.tar.gz

for a download.  This is an alpha release, circulated for comments.  I haven't tried porting it to Unix systems other than OS X (I don't imagine it'd be difficult).

Please take a look, let me know of any build, functionality or usability problems, and suggest where we might go from here.

I've sent this just to the semantics group at present.  Should it be advertised further afield?

Best wishes,

Norman


--------

The following is from the README:


This is the unity library, which is able to parse scientific unit
specifications using a variety of syntaxes.

THIS SHOULD BE REGARDED AS ALPHA-QUALITY SOFTWARE AT PRESENT.
The implementation and interface may change between versions without notice.

The recognised syntaxes are:

fits:
    FITS v3.0, section 4.3, W.D. Pence et al., A&A 524, A42, 2010.
    doi:10.1051/0004-6361/201015362 

ogip:
    OGIP memo OGIP/93-001, 1993
    ftp://legacy.gsfc.nasa.gov/fits_info/fits_formats/docs/general/ogip_93_001/ogip_93_001.ps

cds:
    Standards for Astronomical Catalogues, Version 2.0, section 3.2, 2000
    http://cdsweb.u-strasbg.fr/doc/catstd-3.2.htx

The grammars are available in src/grammar.

The grammars are implemented by (at present) two libraries, one in C
and one in Java.  See src/c/docs and src/java/docs for documentation.

Each of the implementations supports reading each of the three
grammars, and writing output in the three syntaxes, plus LaTeX output
(in the form supported by the LaTeX siunitx package).

If you want to experiment with the library, build src/c/unity:

    % ./unity -icds -oogip 'mm2/s'
    mm**2 /s
    % ./unity -icds -ofits -v mm/s
    mm s-1
    check: all units recognised?           yes
    check: all units recommended?          yes
    check: all units satisfy constraints?  yes
    % ./unity -ifits -ocds -v 'merg/s'
    merg/s
    check: all units recognised?           yes
    check: all units recommended?          no
    check: all units satisfy constraints?  no
    % ./unity -icds -ofits -v 'merg/s'
    merg s-1
    check: all units recognised?           no
    check: all units recommended?          no
    check: all units satisfy constraints?  yes

In the last three examples, the -v option _validates_ the input string
against various constraints.  The expression mm/s is completely valid
in all the syntaxes.  In the FITS syntax, the erg is a recognised
unit, but it is deprecated, so it is not recommended; and although it
is recognised, it is not permitted to have SI prefixes, so merg fails
the constraints check.  In the CDS syntax, the erg is neither
recognised nor (a fortiori) recommended; since there are no
constraints on it in this syntax, it satisfies all of them (this
latter behaviour is admittedly slightly counterintuitive).


Prerequisites
-------------

To build from a distribution, the only prerequisites are a C and a
Java compiler.

To build from a source checkout, you need
  * autoconf
  * bison or byacc (original yacc might work), and flex or lex
  * byaccj (http://byaccj.sourceforge.net/)
  * jflex (http://jflex.de/)
  * doxygen if you wish to build the documentation


Building
--------

The usual:

    % ./configure
    % make
    % make check

If you're building from a source checkout, you'll need to start with 'autoconf'.


Limitations
-----------

No mathematical functions in the FITS or OGIP parsers.

No support for [log] (logarithmic) units in the CDS parser.

The CDS specification permits non-round factors (that is, factors
which aren't a power of ten).  These are not permitted in this CDS
parser, partly because they're arguably quantities rather than units,
but more practically because supporting them would significantly
complicate the implementation.

The software has been developed on OS X, so it definitely builds there.
I have as yet made no serious attempt to port the library to a
different platform, but I don't expect major problems.


Norman Gray
http://nxg.me.uk


-- 
Norman Gray  :  http://nxg.me.uk
School of Physics and Astronomy, University of Glasgow, UK


