VOEvent at Heidelberg InterOp

Fri May 17 05:35:01 PDT 2013

Hi John,

I'm also traveling; responses will be intermittent.

> I'm currently on the train home from the (ongoing) InterOp in Heidelberg. I hope that Matthew (or possibly his successor as TDIG chair, depending on the outcome of today's Exec meeting) will provide some commentary on the general TDIG-relevant discussion at the meeting, but there were a few VOEvent-specific items which I thought might be worthy of further discussion.

The word was that you were to be TDIG chair.  You might want to contact the Exec :-)

> the latest draft of the VOEvent Transport Protocol document is available from <http://tinyurl.com/20130513vtp>.

Will read and comment in a week or so.

> With that out of the way: firstly, Mario worried about the relatively heavyweight nature of the XML serialization, pointing to a particular example from the IVOA website where a several-hundred-character VOEvent provides about 40 characters of actually useful information. In the LSST world of 2e6 events/night, that's obviously a substantial overhead. This naturally makes me recall previous discussions of alternative VOEvent serializations (JSON, anybody…?).

The complicated part here is not the serialization but conveying knowledge of the survey-specific schema.  For a long-lived project like LSST it is also reasonable to expect the schema to evolve.  What is the status of the registry work?  JSON has been mentioned before.  We could also look into XML compression options: the corollary of Mario's observation about the size versus content of the messages is simply that the raw entropy is low, so the messages should compress well.
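As a rough illustration of the low-entropy point, the sketch below compresses a hypothetical, heavily abbreviated VOEvent-like packet with zlib, once on its own and once as a batch of 1000 near-identical packets. The template, IVORNs and values are invented for illustration; the general expectation is that compression pays off far more on a stream or batch of similar messages than on a single small one.

```python
import zlib

# Hypothetical, heavily abbreviated VOEvent-like packet. Element names follow
# the general VOEvent layout, but the values are invented for illustration.
TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<voe:VOEvent xmlns:voe="http://www.ivoa.net/xml/VOEvent/v2.0"
    ivorn="ivo://example.org/survey#{n}" role="observation" version="2.0">
  <Who><AuthorIVORN>ivo://example.org/survey</AuthorIVORN></Who>
  <What><Param name="mag" value="{mag:.2f}" unit="mag" ucd="phot.mag"/></What>
</voe:VOEvent>
"""


def packet(n: int) -> bytes:
    """Render one hypothetical packet as UTF-8 bytes."""
    return TEMPLATE.format(n=n, mag=18.0 + (n % 100) / 100).encode("utf-8")


def compression_ratio(data: bytes, level: int = 9) -> float:
    """Return the original size divided by the zlib-compressed size."""
    return len(data) / len(zlib.compress(data, level))


single = packet(1)
batch = b"".join(packet(n) for n in range(1000))

print(f"one packet:   {len(single)} bytes, ratio {compression_ratio(single):.1f}")
print(f"1000 packets: {len(batch)} bytes, ratio {compression_ratio(batch):.1f}")
```

Compressing per connection (or per batch) rather than per message is what lets the repeated XML structure be shared across events.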

> Of course, by ~2020 (indeed, by 2013) shifting a couple of million messages a night, even if they contain kilobytes of XML overhead, doesn't seem prohibitively expensive – and even if that were an issue within some part of the LSST system, they could presumably define their own internal event representation, and only reserialize it to XML for broadcast to the rest of the world.

Rather, VOEvent should seek to remain the standard for community-wide event representation, for both internal and external purposes.  Evolution is inevitable whether or not VOEvent remains the go-to format.

However, since the first VOEvent workshop we have discussed LSST use cases, and the natural idea has been to package the events from each two-image LSST "visit" into some composite data structure that should certainly be layered on VOEvent (perhaps a table).  LSST has only ever committed itself to publishing the full firehose, relying on external projects and institutions to filter the events into different streams.
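A minimal sketch of the per-visit packaging idea, assuming hypothetical detection fields (visit_id, source_id, ra_deg, dec_deg, mag) that are not taken from any LSST schema. Detections are grouped by visit into one table each, so column names are stated once per visit instead of once per event; how such a table would actually be serialized inside a VOEvent is left open here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """A hypothetical per-source detection; field names are illustrative only."""
    visit_id: int
    source_id: int
    ra_deg: float
    dec_deg: float
    mag: float


COLUMNS = ("source_id", "ra_deg", "dec_deg", "mag")


def group_by_visit(detections):
    """Bundle detections into one composite table per visit.

    Column names appear once per table rather than once per event, which is
    the overhead saving a per-visit container would offer.
    """
    tables = defaultdict(list)
    for d in detections:
        tables[d.visit_id].append((d.source_id, d.ra_deg, d.dec_deg, d.mag))
    return {visit: {"columns": COLUMNS, "rows": rows} for visit, rows in tables.items()}


detections = [
    Detection(101, 1, 150.10, 2.20, 21.3),
    Detection(101, 2, 150.12, 2.25, 19.8),
    Detection(102, 3, 210.50, -5.10, 22.0),
]
composite = group_by_visit(detections)
print(len(composite), len(composite[101]["rows"]))  # 2 2
```

Downstream filtering brokers could then split or re-bundle these tables into their own streams.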

> Another issue Mario raised was that of embedding richer content, such as thumbnail images, into VOEvent packets.

This was also a feature mentioned early on (by, among others, me).  Perhaps it will receive more attention now.  There is a bit of tension between these two notions, of course.  (Streamlining the format while adding rich content :-)

> The argument here is that the existing reference mechanism isn't necessarily scalable to the volumes LSST needs: they don't want to have to field 2e6 * n_subscribers * n_references call-back requests every night, and, further, worry about the additional latency for event consumers (who, rather than immediately making a decision on whether to perform follow-up, now have to request additional information and wait for that to be delivered before they can proceed).
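To put the callback concern in numbers, the load scales as events × subscribers × references. The sketch below uses the 2e6 events/night figure from the discussion; the subscriber count, references per event and 10-hour night are hypothetical assumptions chosen only to show the scale, and the result is a mean rate (real traffic would be burstier).

```python
def callback_load(events_per_night: float, n_subscribers: int,
                  n_references: int, night_hours: float):
    """Return (total callbacks per night, mean callbacks per second)."""
    total = events_per_night * n_subscribers * n_references
    return total, total / (night_hours * 3600)


# Hypothetical: 100 subscribers, 2 references per event, 10-hour night.
total, rate = callback_load(2e6, n_subscribers=100, n_references=2, night_hours=10)
print(f"{total:.1e} requests/night, ~{rate:,.0f} requests/s on average")
# prints: 4.0e+08 requests/night, ~11,111 requests/s on average
```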

I think I recall Tim Axelrod making this same point at the first VOEvent workshop in 2005.  There's no reason we couldn't put more effort into some enhanced referencing mechanism at this point.

> While I see the argument, my gut is rather sceptical of the above: it seems to me that, rather than event authors fielding millions of callbacks, they can easily and (relatively…) cheaply make their content available on distribution networks to which the number of requests is fairly trivial (let S3 take the strain!), and rather focus on driving down latency by keeping the VOEvent packets themselves small and distribution networks fast.

There certainly have been numerous changes to the computing landscape since 2005.
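One way to picture the "let S3 take the strain" approach is a packet that carries only a Reference to externally hosted content. The sketch below builds a minimal VOEvent-style packet with ElementTree containing a single Reference (uri and type attributes); the IVORN and CDN URL are hypothetical, and a 30 kB zero-filled byte string stands in for a thumbnail. It compares the packet size against the base64 cost of carrying that thumbnail inline, which grows as roughly 4/3 of the image size.

```python
import base64
import xml.etree.ElementTree as ET

VOE_NS = "http://www.ivoa.net/xml/VOEvent/v2.0"
ET.register_namespace("voe", VOE_NS)


def packet_with_reference(ivorn: str, thumbnail_url: str) -> bytes:
    """Build a minimal VOEvent-style packet that points at, rather than embeds, a thumbnail."""
    root = ET.Element(f"{{{VOE_NS}}}VOEvent",
                      {"ivorn": ivorn, "role": "observation", "version": "2.0"})
    what = ET.SubElement(root, "What")
    ET.SubElement(what, "Reference", {"uri": thumbnail_url, "type": "url"})
    return ET.tostring(root, encoding="utf-8")


def inline_cost(thumbnail: bytes) -> int:
    """Bytes needed to carry the thumbnail inline as base64 text."""
    return len(base64.b64encode(thumbnail))


thumb = bytes(30_000)  # stand-in for a 30 kB image; contents are irrelevant here
url = "https://cdn.example.org/thumbs/12345.png"  # hypothetical CDN location
ref_packet = packet_with_reference("ivo://example.org/survey#12345", url)
print(len(ref_packet), inline_cost(thumb))
```

The referenced packet stays the same size however large the thumbnail is; the cost moves to the distribution network instead of the event stream.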

> Indeed, the above makes me wonder if there should be a standardized upper limit on the size of VOEvent messages being distributed over VTP.

Rather, perhaps recommendations for best practices.  We should be skeptical of hard limits.
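A best-practice recommendation could be applied by a broker as a soft check rather than a new hard cap. In this sketch, the recommended size is a purely hypothetical figure (no standard sets it), and the only hard boundary is what a 32-bit length field can express, assumed here to be unsigned.

```python
from enum import Enum

MAX_FRAMEABLE = 2**32 - 1  # assumes an unsigned 32-bit length field


class SizeVerdict(Enum):
    OK = "ok"
    ABOVE_RECOMMENDATION = "above recommendation"  # carried, but flagged
    UNFRAMEABLE = "unframeable"  # cannot be expressed in a 32-bit length prefix


def classify(size_bytes: int, recommended_max: int = 64 * 1024) -> SizeVerdict:
    """Apply a best-practice size guideline without imposing a new hard cap.

    recommended_max is a hypothetical value chosen for illustration only.
    """
    if size_bytes > MAX_FRAMEABLE:
        return SizeVerdict.UNFRAMEABLE
    if size_bytes > recommended_max:
        return SizeVerdict.ABOVE_RECOMMENDATION
    return SizeVerdict.OK


for size in (2_000, 500_000, 2**32):
    print(size, classify(size).value)
```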

> Of course, such a limit already exists in that we use a 32-bit integer to specify the size of the packet being transmitted.

That should be generous enough :-)  I would say a reasonable goal is that VOEvent software and formats should require no more than 32-bit hosts.  (All those legacy facilities on mountaintops.)
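For reference, length-prefixed framing of the kind discussed above can be sketched as follows, assuming a four-byte length in network byte order ahead of each message. Whether the field is treated as signed or unsigned changes the ceiling (2^31 - 1 versus 2^32 - 1 bytes); this sketch assumes unsigned. Either way, a 32-bit host would run out of address space long before a message approached that limit.

```python
import struct

LENGTH_PREFIX = struct.Struct("!I")  # assumed: unsigned 32-bit, network byte order


def frame(payload: bytes) -> bytes:
    """Prefix a payload with its length, as in a length-delimited stream protocol."""
    if len(payload) > 0xFFFFFFFF:
        raise ValueError("payload too large for a 32-bit length prefix")
    return LENGTH_PREFIX.pack(len(payload)) + payload


def unframe(buffer: bytes):
    """Split complete messages off the front of a buffer.

    Returns (messages, remainder); the remainder holds any partial message
    still waiting for more bytes from the socket.
    """
    messages = []
    offset = 0
    while len(buffer) - offset >= LENGTH_PREFIX.size:
        (length,) = LENGTH_PREFIX.unpack_from(buffer, offset)
        end = offset + LENGTH_PREFIX.size + length
        if end > len(buffer):
            break  # incomplete message; wait for more data
        messages.append(buffer[offset + LENGTH_PREFIX.size:end])
        offset = end
    return messages, buffer[offset:]


# One complete message followed by the first 6 bytes of a second one.
stream = frame(b"<voe:VOEvent .../>") + frame(b"<trn:Transport .../>")[:6]
msgs, rest = unframe(stream)
print(msgs, len(rest))
```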

> But should we mandate that any brokers signing up to the "VOEvent backbone" are obliged to carry messages up to that size? Or up to some other limit?

We need to give thought to the general issue of service requirements.

> Thoughts on any of the above?

LSST as an operational facility is several years away.  LSST as a project has been ongoing for years, and will kick into high gear very soon.  The pertinent deadline for bringing some of these issues to a solid conceptual stage (and soon after, to a proof of concept) is months rather than years.  They are preparing for design reviews right now.

Addressing the concerns expressed by Mario would be a good topic for Hotwired3 (http://hotwireduniverse.org).

Rob