From norman at astro.gla.ac.uk  Thu Jul 21 09:59:04 2011
From: norman at astro.gla.ac.uk (Norman Gray)
Date: Thu, 21 Jul 2011 17:59:04 +0100
Subject: Units parser in Java and C
Message-ID: <080AEAE2-0947-49DC-98DD-80665E3EE00E@astro.gla.ac.uk>


Greetings, all.

At the Naples interop, Sébastien gave a presentation on the range of current standards for unit strings.  During that discussion, or after it, I volunteered to look at defining grammars for these syntaxes, with a view to defining a consensus grammar for them in future.

So I've done so.

I've developed explicit grammars, and recommended unit lists, for the three unit specifications mentioned in Sébastien's presentation, namely FITS, OGIP and CDS.  These are implemented by a C library and a Java class library, which can parse unit strings in an indicated syntax (rejecting malformed ones), and write them out in another syntax or in LaTeX.

See:

    http://www.astro.gla.ac.uk/users/norman/ivoa/unity-0.1.tar.gz

for a download.

This is an alpha release, for comments.  I haven't tried porting it to other unixes than OS X (I don't imagine it'd be difficult).

Please take a look, let me know of any build, functionality or usability problems, and suggest where we might go from here.

I've sent this just to the semantics group at present.  Should it be advertised further afield?

Best wishes,

Norman

--------

The following is from the README:

This is the unity library, which is able to parse scientific unit specifications using a variety of syntaxes.

THIS SHOULD BE REGARDED AS ALPHA-QUALITY SOFTWARE AT PRESENT.  The implementation and interface may change between versions without notice.

The recognised syntaxes are:

    fits: FITS v3.0, section 4.3, W.D. Pence et al., A&A 524, A42, 2010.
          doi:10.1051/0004-6361/201015362
    ogip: OGIP memo OGIP/93-001, 1993
          ftp://legacy.gsfc.nasa.gov/fits_info/fits_formats/docs/general/ogip_93_001/ogip_93_001.ps
    cds:  Standards for Astronomical Catalogues, Version 2.0, section 3.2, 2000
          http://cdsweb.u-strasbg.fr/doc/catstd-3.2.htx

The grammars are available in src/grammar.

The grammars are implemented by (at present) two libraries, one in C and one in Java.  See src/c/docs and src/java/docs for documentation.

Each of the implementations supports reading each of the three grammars, and writing output in the three syntaxes, plus LaTeX output (supported by the LaTeX siunitx package).

If you want to experiment with the library, build src/c/unity:

    % ./unity -icds -oogip 'mm2/s'
    mm**2 /s
    % ./unity -icds -ofits -v mm/s
    mm s-1
    check: all units recognised? yes
    check: all units recommended? yes
    check: all units satisfy constraints? yes
    % ./unity -ifits -ocds -v 'merg/s'
    merg/s
    check: all units recognised? yes
    check: all units recommended? no
    check: all units satisfy constraints? no
    % ./unity -icds -ofits -v 'merg/s'
    merg s-1
    check: all units recognised? no
    check: all units recommended? no
    check: all units satisfy constraints? yes

In the latter cases, the -v option _validates_ the input string against various constraints.  The expression mm/s is completely valid in all the syntaxes.  In the FITS syntax, the erg is a recognised unit, but it is deprecated; although it is recognised, it is not permitted to have SI prefixes.  In the CDS syntax, the erg is neither recognised nor (a fortiori) recommended; since there are no constraints on it in this syntax, it satisfies all of them (this latter behaviour is admittedly slightly counterintuitive).


Pre-requirements
----------------

To build from a distribution, the only pre-requirements are a C and a Java compiler.
To build from a source checkout, you need

  * autoconf
  * bison or byacc (original yacc might work), and flex or lex
  * byaccj (http://byaccj.sourceforge.net/)
  * jflex (http://jflex.de/)
  * doxygen if you wish to build the documentation


Building
--------

The usual:

    % ./configure
    % make
    % make check

If you're building from a source checkout, you'll need to start with 'autoconf'.


Limitations
-----------

No mathematical functions in FITS or OGIP parsers

No [log] in CDS parser

The CDS specification permits non-round factors (that is, factors which aren't a power of ten).  These are not permitted in this CDS parser, partly because they're arguably quantities rather than units, but more practically because it significantly complicates the implementation.

The software has been developed on OS X, so definitely builds there.  I have as yet made no serious attempt to port the library to a different platform, but I don't expect major problems.


Norman Gray
http://nxg.me.uk

-- 
Norman Gray  :  http://nxg.me.uk
School of Physics and Astronomy, University of Glasgow, UK


From norman at astro.gla.ac.uk  Fri Jul 22 05:34:49 2011
From: norman at astro.gla.ac.uk (Norman Gray)
Date: Fri, 22 Jul 2011 13:34:49 +0100
Subject: Units parser in Java and C
In-Reply-To: <080AEAE2-0947-49DC-98DD-80665E3EE00E@astro.gla.ac.uk>
References: <080AEAE2-0947-49DC-98DD-80665E3EE00E@astro.gla.ac.uk>
Message-ID: <84BE0C6A-8828-4EB2-BF48-F3555995C854@astro.gla.ac.uk>


Greetings, all.

On 2011 Jul 21, at 17:59, Norman Gray wrote:

> I've developed explicit grammars, and recommended unit lists, for the three unit specifications mentioned in Sébastien's presentation, namely FITS, OGIP and CDS.  These are implemented by a C library and a Java class library, which can parse unit strings in an indicated syntax (rejecting malformed ones), and write them out in another syntax or in LaTeX.

I've made some porting fixes.
The download URL is now

    http://www.astro.gla.ac.uk/users/norman/ivoa/unity/

This should be fairly stable in the short term.

Best wishes,

Norman

-- 
Norman Gray  :  http://nxg.me.uk
School of Physics and Astronomy, University of Glasgow, UK
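
The three -v checks described in the README in the first message (recognised, recommended, satisfies constraints) can be illustrated with a short Python sketch.  This is not the unity implementation or its API: the names UnitInfo, KNOWN_UNITS and validate are hypothetical, the unit tables are tiny illustrative stand-ins for the real unit lists, and the input is assumed to be already split into (prefix, base unit) pairs rather than parsed from a unit string.  The tables are chosen only so that the results agree with the README's example output for mm/s and merg/s.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitInfo:
    recommended: bool       # is the unit recommended in this syntax?
    prefixes_allowed: bool  # may the unit carry an SI prefix?


# Hypothetical, heavily truncated tables (assumption: the real unit
# lists shipped with the library are much longer than this).
KNOWN_UNITS = {
    "fits": {
        "m": UnitInfo(recommended=True, prefixes_allowed=True),
        "s": UnitInfo(recommended=True, prefixes_allowed=True),
        # Recognised but deprecated in FITS, and no SI prefixes allowed.
        "erg": UnitInfo(recommended=False, prefixes_allowed=False),
    },
    "cds": {
        "m": UnitInfo(recommended=True, prefixes_allowed=True),
        "s": UnitInfo(recommended=True, prefixes_allowed=True),
        # The erg is deliberately absent: not recognised in this table.
    },
}


def validate(syntax, units):
    """Run the three checks on a list of (prefix, base_unit) pairs.

    An empty prefix string means the unit is unprefixed.
    """
    table = KNOWN_UNITS[syntax]
    recognised = recommended = constraints = True
    for prefix, base in units:
        info = table.get(base)
        if info is None:
            # An unknown unit is neither recognised nor recommended, but
            # there are no constraints to violate, so the constraints
            # check passes vacuously.
            recognised = False
            recommended = False
            continue
        if not info.recommended:
            recommended = False
        if prefix and not info.prefixes_allowed:
            constraints = False
    return {
        "recognised": recognised,
        "recommended": recommended,
        "constraints": constraints,
    }


if __name__ == "__main__":
    mm_per_s = [("m", "m"), ("", "s")]
    merg_per_s = [("m", "erg"), ("", "s")]
    print(validate("cds", mm_per_s))
    print(validate("fits", merg_per_s))
    print(validate("cds", merg_per_s))
```

Treating an unrecognised unit as vacuously satisfying its constraints reproduces the behaviour that the README itself calls slightly counterintuitive.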