VOEvent Update: JSON and data models

Tim Staley tim at timstaley.co.uk
Sun Oct 15 20:23:39 CEST 2017


Hi VOEvent'ers,

I've now left astronomy for pastures green so I have no proverbial skin 
in this game, but I do have some idle Sunday-afternoon musings. Please 
excuse the 'stream of consciousness' composition style.

In short, I'd cautiously agree with the decision to move to JSON, but 
with a bunch of caveats - a few observations on pros / cons are below.

*Readability / hackability: Win to JSON*
I've never really bought the 'human readability' argument, since XML 
dropped into a browser is quite easy to work with (maybe it's a 
command-line-only thing) but I'll certainly agree that JSON is easier to 
work with for a quick-and-dirty data-grab script, using e.g. the builtin 
Python library. +1 for JSON. And it's also more compact (less open/close 
tag overhead) so you gain maybe 30-50% on data-rates, though that's 
unlikely to matter much for most.
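
To give a feel for the quick-and-dirty case, here's a minimal sketch
using only the builtin json module. The packet layout and the field
names (ivorn, role, WhereWhen, ra, dec) are made up for illustration;
they are assumptions, not the layout proposed in the IVOA note.

```python
import json

# Hypothetical packet: field names are illustrative assumptions,
# not the layout proposed in the IVOA note.
packet_text = """
{
  "ivorn": "ivo://example.org/test#1",
  "role": "test",
  "WhereWhen": {"ra": 123.45, "dec": -12.5}
}
"""

# One call turns the text into plain dicts / lists / floats.
packet = json.loads(packet_text)

# Grab the bits we care about with ordinary dictionary lookups.
position = (packet["WhereWhen"]["ra"], packet["WhereWhen"]["dec"])
print(packet["ivorn"], position)
```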

*Data-storage: Win to JSON*
As a bonus, this actually fits rather well with using Postgres as a 
data-store, since there's the potential to exploit the JSONB data-type 
and run in-JSON queries directly (though I've no idea how fast / slow 
that is in practice).

*Binary data: Win to JSON?*
It's fairly standard practice to base64-encode binary blobs in JSON 
(although I guess you could do the same in XML) - so this could be a 
useful intermediate step if you wish to hold off on adopting AVRO.
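
A minimal sketch of the base64 round trip, using only the standard
library. The key name "cutout_b64" and the helper names pack_blob /
unpack_blob are hypothetical, chosen just for this example.

```python
import base64
import json


def pack_blob(raw: bytes) -> str:
    """Base64-encode raw bytes into an ASCII string that JSON can carry."""
    return base64.b64encode(raw).decode("ascii")


def unpack_blob(text: str) -> bytes:
    """Reverse of pack_blob: recover the original bytes."""
    return base64.b64decode(text)


# Stand-in for e.g. a small image cutout; any bytes would do.
cutout = bytes(range(256))

# "cutout_b64" is a hypothetical key name, not part of any standard.
wire = json.dumps({"cutout_b64": pack_blob(cutout)})
recovered = unpack_blob(json.loads(wire)["cutout_b64"])
```

One thing to keep in mind: base64 inflates the payload by roughly a
third, which eats into the compactness gain for blob-heavy packets.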

*Validation-in-transit: Caution here!*
Now, a couple of downsides. First, the flipside to the compactness - by 
dropping the open/close tag verbosity you also lose a bit of 
self-validation. So if you lose a few bytes in the middle of a long list 
of elements, you might not even notice the data-corruption immediately 
(also potentially a problem with XML, but silent failures are less 
likely due to tag mismatches). So, you might like to kill two birds with 
one stone and consider signing / validating your packets for both 
authentication and validation purposes (almost certainly using some 
pre-existing library to do so; 
https://github.com/matrix-org/python-signedjson was the first Google 
result).
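
To make the failure mode concrete, here's a rough sketch. Note that it
does not use the python-signedjson library linked above; as a stand-in
it uses an HMAC from the Python standard library, which assumes sender
and receiver share a secret key. The key and the "flux" field are
invented for the example.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret; real deployments need proper key handling.
SECRET = b"not-a-real-key"


def sign(payload: bytes) -> str:
    """Return a hex HMAC-SHA256 digest of the serialized packet."""
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


def verify(payload: bytes, signature: str) -> bool:
    """Check the payload against a digest, using a constant-time compare."""
    return hmac.compare_digest(sign(payload), signature)


payload = json.dumps({"flux": [1.0, 2.0, 3.0, 4.0]}).encode("utf-8")
signature = sign(payload)

# Simulate losing a few bytes mid-list: the result is still valid JSON.
corrupted = payload.replace(b", 3.0", b"")
still_parses = json.loads(corrupted)  # no exception, just a shorter list
```

Here json.loads happily accepts the corrupted payload, whereas
verify(corrupted, signature) returns False, so the damage is caught
before anyone trusts the data.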

*Validation-at-rest: This is a can of worms*
More likely to cause issues in practice is that you're starting from 
scratch again with regards to implementing data-schemas. At this stage - 
the VOEvent standard now being 11 years old according to Wikipedia - I 
think it's reasonable to give up on the Betamax standard that is XML and 
accept that JSON, by making more people willing to share their 
transients, is probably the way to go. But if you are concerned about 
data-validation, reuse and discoverability (and I know that all you 
Virtual Observatory folks are for sure) then the lack of a schema 
should concern you. Maybe you can translate the XML schema to a JSON 
schema and thus keep the language independence (supported in Python via 
e.g. https://pypi.python.org/pypi/jsonschema). More to the point, even 
with a schema (XML, JSON, whichever) it's almost always possible to 
generate schema-compliant but semantically nonsense data-packets via 
some simple user-error, so your best bet is (IMHO) a good user-library 
that makes generating packets as easy, consistent and foolproof as 
possible. If the community can coalesce around a single, collaborative 
library then this serves as a 'de facto schema by implementation', since 
any schema described in text is likely to still result in edge cases 
where the final decision lies in implementation details. So, perhaps a 
potential candidate for a new Astropy affiliated library? (There's 
likely some good synergy with the 'units' functionality therein.)
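
As a rough sketch of what 'foolproof by construction' could look like,
here's a tiny packet-builder using only the standard library. All the
names (Position, make_packet) and the packet layout are hypothetical,
invented for this example; it is not the API of any existing VOEvent
tool.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Position:
    """Hypothetical sky position in degrees (not a VOEvent-defined type)."""
    ra: float
    dec: float

    def __post_init__(self):
        # Semantic checks that go beyond "is it a number?".
        if not 0.0 <= self.ra < 360.0:
            raise ValueError(f"ra out of range [0, 360): {self.ra}")
        if not -90.0 <= self.dec <= 90.0:
            raise ValueError(f"dec out of range [-90, 90]: {self.dec}")


def make_packet(ivorn: str, position: Position) -> str:
    """Serialize a packet; the layout is an illustrative assumption."""
    if not ivorn.startswith("ivo://"):
        raise ValueError(f"ivorn should start with 'ivo://': {ivorn}")
    return json.dumps({"ivorn": ivorn, "position": asdict(position)})


good = make_packet("ivo://example.org/test#1", Position(ra=123.45, dec=-12.5))

# Swapped ra/dec: both are numbers, so a schema that only checks types
# would accept this, but the range check catches it.
try:
    Position(ra=-12.5, dec=123.45)
    swapped_rejected = False
except ValueError:
    swapped_rejected = True
```

The point is that the user never hand-assembles the dictionary, so the
easy mistakes fail loudly at authoring time rather than downstream.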

*Hidden cost of rewrites / tower-of-babel effect*
My biggest issue with all this is that you're at risk (cf. 
https://xkcd.com/927/) of creating a 'VOEvent-JSON standard' which in 
practice is used as a rough guideline but can't really be relied on to 
be adhered to correctly because everyone's just writing their own ad-hoc 
JSON entries. So then you'll get a lot of cases of people *expecting* 
their packets to behave consistently when really they should be treating 
it as a bag of unvalidated JSON until proven otherwise. Perhaps that's a 
price worth paying for getting more people on-board. There's also the 
risk of being constantly in flux - will anyone put effort into this 
generation of JSON tools if the LSST AVRO standard is about to become 
the New Hot Thing?

Of course, I have something of a sunk-cost attachment to the XML 
standard, having contributed development effort and made heavy use of 
the VOEvent-XML tools for the 4 Pi Sky project, so while trying to be as 
objective as possible let me just link to the existing tools as a point 
of reference to future developers. My feeling is that apart from the 
XML-specific voevent-parse, these could quite likely be retooled for 
JSON (or whatever serialization standard, really) if someone wanted to 
take on the effort:

Authoring: http://voevent-parse.rtfd.io/
Distribution: http://comet.readthedocs.io/en/stable/
Storage / retrieval: http://voeventdb.readthedocs.io/en/latest/

Best of luck!

-Tim


On 15/10/17 16:02, Roy Williams wrote:
> Dear Colleagues
> We respectfully submit an IVOA Note about some proposed improvements to the VOEvent standard. We would appreciate some discussion on this email address and at the upcoming meeting in Chile.
> Thank you for your attention
> Roy Williams, Scott Barthelmy, Eric Bellm, Matthew Graham, Rob Seaman
>
> ==================
>
> VOEvent Update: JSON and data models
> Author(s):
> Roy Williams, Scott Barthelmy, Eric Bellm, Matthew Graham, Rob Seaman
>
> URL:
> http://ivoa.net/documents/Notes/VOEventJSON/index.html
>
> Abstract
> We propose an extension of the VOEvent format, to translate the packet from XML to JSON – with no semantic change. We also propose to use the VOEvent data model system to define three data-model Groups: “Light Curve”, “Associated Sources”, and “Followup Imaging”. This straightforward update of VOEvent simplifies the syntax and provides simple, standard representation of common astronomical datasets.



