VOEvent Update: JSON and data models
Tim Staley
tim at timstaley.co.uk
Sun Oct 15 20:23:39 CEST 2017
Hi VOEvent'ers,
I've now left astronomy for pastures green so I have no proverbial skin
in this game, but I do have some idle Sunday-afternoon musings. Please
excuse the 'stream of consciousness' composition style.
In short, I'd cautiously agree with the decision to move to JSON, but
with a bunch of caveats - a few observations on pros / cons are below.
*Readability / hackability: Win to JSON*
I've never really bought the 'human readability' argument, since XML
dropped into a browser is quite easy to work with (maybe it's a
command-line-only thing) but I'll certainly agree that JSON is easier to
work with for a quick-and-dirty data-grab script, using e.g. the builtin
Python json library. +1 for JSON. And it's also more compact (less open/close
tag overhead) so you gain maybe 30-50% on data-rates, though that's
unlikely to matter much for most.
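
To illustrate the quick-and-dirty data-grab case, here is a minimal sketch using only the builtin json module. The packet layout and key names (ivorn, role, WhereWhen, ra, dec, isotime) are assumptions made up for this example; they are not taken from any published VOEvent-JSON schema.

```python
import json

# Hypothetical JSON rendering of a VOEvent packet; the key names are
# illustrative assumptions, not part of any agreed standard.
packet_text = """
{
  "ivorn": "ivo://example.org/demo#1",
  "role": "test",
  "WhereWhen": {"ra": 123.45, "dec": -12.5, "isotime": "2017-10-15T18:23:39"}
}
"""

packet = json.loads(packet_text)

# Pull out the fields a quick script would typically care about.
position = (packet["WhereWhen"]["ra"], packet["WhereWhen"]["dec"])
print(packet["ivorn"], position)
```

Nothing here checks that the keys exist or that the values make sense, which is the convenience and also the risk.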
*Data-storage: Win to JSON*
As a bonus, this actually fits rather well with using Postgres as a
data-store, since there's the potential to exploit the JSONB data-type
and run in-JSON queries directly (though I've no idea how fast / slow
that is in practice).
*Binary data: Win to JSON?*
It's fairly standard practice to base64-encode binary blobs in JSON
(although I guess you could do the same in XML) - so this could be a
useful intermediate step if you wish to hold off on adopting AVRO.
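
A minimal sketch of the base64 round trip, using only the standard library. The key name cutout_b64 and the payload bytes are hypothetical stand-ins for something like an image cutout.

```python
import base64
import json

# Stand-in for a binary payload such as an image cutout (hypothetical data).
blob = bytes(range(256))

# base64 output is plain ASCII, so it can be stored as an ordinary JSON string.
packet = {"cutout_b64": base64.b64encode(blob).decode("ascii")}
wire_text = json.dumps(packet)

# The receiver reverses the two steps to recover the original bytes.
received = json.loads(wire_text)
recovered = base64.b64decode(received["cutout_b64"])
```

The cost is size: base64 emits four characters for every three input bytes, so the encoded blob is roughly a third larger than the raw data.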
*Validation-in-transit: Caution here!*
Now, a couple of downsides. First, the flipside to the compactness - by
dropping the open/close tag verbosity you also lose a bit of
self-validation. So if you lose a few bytes in the middle of a long list
of elements, you might not even notice the data-corruption immediately
(also potentially a problem with XML, but silent failures are less
likely due to tag mismatches). So, you might like to kill two birds with
one stone and consider signing your packets, which covers both
authentication and integrity checking (almost certainly using some
pre-existing library to do so; https://github.com/matrix-org/python-signedjson
was the first Google result).
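
A rough sketch of the failure mode and one way to catch it. This swaps in a shared-secret HMAC from the Python standard library purely for illustration; it is not the signing scheme of the library linked above, and the secret, the flux key and the simulated corruption are all assumptions made up for the example.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret, for illustration only.
SECRET = b"not-a-real-key"

def sign(payload: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for the serialized packet."""
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()

def verify(payload: bytes, tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign(payload), tag)

payload = json.dumps({"flux": [1.0, 2.0, 3.0, 4.0]}).encode("utf-8")
tag = sign(payload)

# Simulate losing a few bytes from the middle of the list.
corrupted = payload.replace(b" 2.0,", b"")

# The damaged text is still valid JSON, so parsing alone raises no error...
still_parses = json.loads(corrupted)["flux"]

# ...but the tag no longer matches the payload.
print(verify(payload, tag), verify(corrupted, tag))
```

A shared secret only works between parties who already trust each other; a public broker network would more plausibly want public-key signatures.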
*Validation-at-rest: This is a can of worms*
More likely to cause issues in practice is that you're starting from
scratch again with regard to implementing data-schemas. At this stage -
the VOEvent standard now being 11 years old according to Wikipedia - I
think it's reasonable to give up on the Betamax standard that is XML and
accept that JSON is probably the way to go, since more people use it and
so more people would be willing to share their transients. But if you are
concerned about data-validation, reuse and discoverability (and I know
that all you Virtual Observatory folks are for sure) then the lack of a
schema should concern you. Maybe you can translate the XML schema to a JSON
schema and thus keep the language independence (supported in Python via
e.g. https://pypi.python.org/pypi/jsonschema). More to the point, even
with a schema (XML, JSON, whichever) it's almost always possible to
generate schema-compliant but semantically nonsense data-packets via
some simple user-error, so your best bet is (IMHO) a good user-library
that makes generating packets as easy, consistent and foolproof as
possible. If the community can coalesce around a single, collaborative
library then this serves as a 'de facto schema by implementation', since
any schema described in text is likely to still result in edge cases
where the final decision lies in implementation details. So, perhaps a
potential candidate for a new Astropy affiliated library? (There's
likely some good synergy with the 'units' functionality therein.)
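
As a sketch of what 'foolproof by library' might look like, here is a hypothetical packet-builder that rejects values a purely structural schema would let through. The function name make_packet, the key names and the specific checks are all assumptions for illustration, not an existing API.

```python
import json
from datetime import datetime, timezone

def make_packet(ivorn: str, ra_deg: float, dec_deg: float, when: datetime) -> str:
    """Build a JSON packet, rejecting values a schema alone might accept."""
    if not ivorn.startswith("ivo://"):
        raise ValueError("identifier must start with 'ivo://'")
    if not 0.0 <= ra_deg < 360.0:
        raise ValueError("RA must be in [0, 360) degrees")
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError("Dec must be in [-90, 90] degrees")
    if when.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    packet = {
        "ivorn": ivorn,
        "ra_deg": ra_deg,
        "dec_deg": dec_deg,
        "isotime": when.astimezone(timezone.utc).isoformat(),
    }
    return json.dumps(packet, sort_keys=True)

event_time = datetime(2017, 10, 15, 18, 23, 39, tzinfo=timezone.utc)
good = make_packet("ivo://example.org/demo#1", 123.45, -12.5, event_time)

# Swapping RA and Dec would pass a "both fields are numbers" schema check,
# but the builder refuses it.
try:
    make_packet("ivo://example.org/demo#2", -12.5, 123.45, event_time)
    swapped_rejected = False
except ValueError:
    swapped_rejected = True
```

Range checks like these catch only the crudest mistakes (a swap where both values happen to be in range would still slip through), but they show how semantic rules can live in the shared library rather than in each author's script.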
*Hidden cost of rewrites / Tower-of-Babel effect*
My biggest issue with all this is that you're at risk (cf.
https://xkcd.com/927/) of creating a 'VOEvent-JSON standard' which in
practice is used as a rough guideline but can't be relied on to be
followed correctly, because everyone's just writing their own ad-hoc
JSON entries. So then you'll get a lot of cases of people *expecting*
packets to behave consistently when really they should be treating
them as bags of unvalidated JSON until proven otherwise. Perhaps that's a
price worth paying for getting more people on-board. There's also the
risk of being constantly in flux - will anyone put effort into this
generation of JSON tools if the LSST AVRO standard is about to become
the New Hot Thing?
Of course, I have something of a sunk-cost attachment to the XML
standard, having contributed development effort and made heavy use of
the VOEvent-XML tools for the 4 Pi Sky project, so while trying to be as
objective as possible let me just link to the existing tools as a point
of reference to future developers. My feeling is that apart from the
XML-specific voevent-parse, these could quite likely be retooled for
JSON (or whatever serialization standard, really) if someone wanted to
take on the effort:
Authoring: http://voevent-parse.rtfd.io/
Distribution: http://comet.readthedocs.io/en/stable/
Storage / retrieval: http://voeventdb.readthedocs.io/en/latest/
Best of luck!
-Tim
On 15/10/17 16:02, Roy Williams wrote:
> Dear Colleagues
> We respectfully submit an IVOA Note about some proposed improvements to the VOEvent standard. We would appreciate some discussion on this email address and at the upcoming meeting in Chile.
> Thank you for your attention
> Roy Williams, Scott Barthelmy, Eric Bellm, Matthew Graham, Rob Seaman
>
> ==================
>
> VOEvent Update: JSON and data models
> Author(s):
> Roy Williams, Scott Barthelmy, Eric Bellm, Matthew Graham, Rob Seaman
>
> URL:
> http://ivoa.net/documents/Notes/VOEventJSON/index.html
>
> Abstract
> We propose an extension of the VOEvent format, to translate the packet from XML to JSON – with no semantic change. We also propose to use the VOEvent data model system to define three data-model Groups: “Light Curve”, “Associated Sources”, and “Followup Imaging”. This straightforward update of VOEvent simplifies the syntax and provides simple, standard representation of common astronomical datasets.