queries on VOEvent collections

Roy Williams roy.williams at ligo.org
Mon May 23 08:39:13 PDT 2011


Post about VOEvents and their collections, from an informatics point of 
view, from http://news.skyalert.org/?p=86. How can we better promote 
interoperability of VOEvent services? How do you want to interact with a 
repository? Your thoughts most welcome.

Events, Portfolios, and Mental Models
-------------------------------------
The Recommendation of VOEvent 2.0 draws ever closer, after successfully 
running the gauntlet at the spring meeting of the International Virtual 
Observatory Alliance. Our thoughts now move beyond the standard to the 
exciting science that can be done now that interoperability is solved: 
we can have multiple authors and multiple software systems 
contributing to the rapidly evolving picture of an astronomical 
transient, with machines fusing that data to make rapid, accurate 
decisions. We will need to 
think of how VOEvents are authored, forwarded, selected, stored, 
queried, and mined. In each of these cases, we wish to provide the most 
appropriate ‘representation’ of the data in the VOEvent. Below are some 
suggestions for this representation.

We want to keep the semantics of VOEvent as much as possible, so that 
the same data is available in all the representations as much as 
reasonable. We can focus on individual events, or extend the 
representation to the VOEvent aggregates that we have come to call a 
‘portfolio’. A portfolio is a collection of multi-sourced VOEvent 
packets whose subject is the same astronomical transient, associated 
through citation of one by another: the original observation is a 
VOEvent, combined with followups or classification results also 
formatted as VOEvents. Each VOEvent is an observation of something, and 
it is that something that brings multiple events together.

(1) XML API
A single VOEvent is an XML file. This representation carries the most 
fidelity to the intent of the original author, even though some links 
may be replaced for caching. A portfolio is a collection of VOEvent 
files that are mutually connected through citations in a graph, and it 
can be stored as a zip or tar archive. Querying is through XPath, 
XQuery, or XML libraries like lxml or Jax. A custom API can be made from 
the VOEvent schema through code binding.
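
As a rough sketch of what querying this representation might look like, 
the Python below parses one packet and pulls out a Param value and the 
cited events. It uses the standard-library ElementTree, which supports 
only a subset of XPath, rather than lxml. The packet is hand-written for 
illustration: the ivorn values and the Group and Param names are 
invented, not taken from any real stream.

```python
import xml.etree.ElementTree as ET

# Hand-written, heavily trimmed packet for illustration only. The ivorn
# values and the Group/Param names are invented (assumptions).
SAMPLE = """
<voe:VOEvent xmlns:voe="http://www.ivoa.net/xml/VOEvent/v2.0"
             ivorn="ivo://example.org/stream#1001" role="test" version="2.0">
  <What>
    <Param name="nDetections" value="3"/>
    <Group name="photometry">
      <Param name="mag" value="18.2" unit="mag"/>
      <Param name="filter" value="V"/>
    </Group>
  </What>
  <Citations>
    <EventIVORN cite="followup">ivo://example.org/stream#1000</EventIVORN>
  </Citations>
</voe:VOEvent>
"""

root = ET.fromstring(SAMPLE.strip())

# Only the root element is namespace-qualified in this packet, so the
# child paths below need no prefix.
mag = root.find("./What/Group[@name='photometry']/Param[@name='mag']")
mag_value = float(mag.get("value"))

# Citations are the edges of the portfolio graph: this event -> cited events.
cited = [el.text for el in root.findall("./Citations/EventIVORN")]

print(root.get("ivorn"), mag_value, cited)
```

Collecting the (ivorn, cited) pairs over all files in an archive would 
give the citation graph of the portfolio.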

(2) Dictionary API
Each event can be thought of as a key-value dictionary, one for each 
piece of data extracted from the event XML. Some keys are mandated by 
the VOEvent schema (e.g. AuthorName, ISOtime), and others come from the 
Group and Param name combinations specific to that stream, with the 
value an int, float, or string. Internal tables can be handled by 
allowing the value of such a key to be a vector: the values from the 
table column. A portfolio can then be a union of these dictionaries, 
each representing an event; to prevent name collision, each key would 
also contain the name of the event it comes from. This representation is 
natural for presentation templates and dictionary expressions: a table 
column such as e['lightCurve']['Vmag'] can become a Python list of 
numbers. This representation is effective when a single portfolio is to 
be examined in detail or annotated, perhaps for classification of light 
curves or analysis of an ephemeris. It can also be a table structure, with 
each row of the table having Stream, Group, Param, and Value, with 
queries that select on these tables.
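
One possible way to build such a dictionary is sketched below in Python. 
The function names (coerce, event_to_dict), the nesting by Group and 
Table name, and the sample packet are all assumptions made for 
illustration; a real service might well choose flat keys instead.

```python
import xml.etree.ElementTree as ET

# Invented sample packet: names and values are for illustration only.
SAMPLE = """
<voe:VOEvent xmlns:voe="http://www.ivoa.net/xml/VOEvent/v2.0"
             ivorn="ivo://example.org/stream#1001" role="test" version="2.0">
  <What>
    <Param name="nDetections" value="3"/>
    <Group name="photometry">
      <Param name="filter" value="V"/>
      <Param name="mag" value="18.2"/>
    </Group>
    <Table name="lightCurve">
      <Field name="mjd"/>
      <Field name="Vmag"/>
      <Data>
        <TR><TD>55704.1</TD><TD>18.2</TD></TR>
        <TR><TD>55705.1</TD><TD>17.9</TD></TR>
      </Data>
    </Table>
  </What>
</voe:VOEvent>
"""


def coerce(text):
    """Return text as an int if possible, else a float, else unchanged."""
    for cast in (int, float):
        try:
            return cast(text)
        except (ValueError, TypeError):
            pass
    return text


def event_to_dict(root):
    """Flatten the Params, Groups and Tables under <What> into a dict."""
    what = root.find("What")
    event = {"ivorn": root.get("ivorn")}
    for param in what.findall("Param"):
        event[param.get("name")] = coerce(param.get("value"))
    for group in what.findall("Group"):
        event[group.get("name")] = {
            p.get("name"): coerce(p.get("value"))
            for p in group.findall("Param")
        }
    for table in what.findall("Table"):
        names = [f.get("name") for f in table.findall("Field")]
        rows = [[coerce(td.text) for td in tr.findall("TD")]
                for tr in table.findall("Data/TR")]
        # One vector per key: the values from that table column.
        event[table.get("name")] = {
            name: [row[i] for row in rows] for i, name in enumerate(names)
        }
    return event


e = event_to_dict(ET.fromstring(SAMPLE.strip()))
print(e["lightCurve"]["Vmag"])

# A portfolio keyed by event identifier, so that keys coming from
# different events cannot collide.
portfolio = {e["ivorn"]: e}
```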

(3) Relational table API
Here we are not representing a single astronomical transient but rather 
doing astro-informatics: many transients in a table, with searching, 
selecting, sorting, visualizing, and clustering. The columns of the 
table come from the stream of which the events are instances. Each 
VOEvent is translated to a row in the table (internal tables are not 
shown in this representation). Some columns are defined in the VOEvent 
standard, such as sky position and time; others are part of the stream 
definition (params). We can also represent a collection of similarly 
structured portfolios; perhaps an event from stream A together with one 
from stream B, one the observation and the other the classification. For 
portfolios, there is a choice about joins: we can work with multiple 
tables, each representing the events from a given stream, and portfolio 
relationships represented as foreign keys; alternatively, we can fuse 
all stream tables into one, so that each portfolio is represented by a 
single record of this super-table. Working in this representation, we 
use SQL queries that depend on Param and other values from the events.
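
The multiple-table choice can be sketched with Python's built-in sqlite3 
module. Everything specific here is hypothetical: the table names 
(stream_a, stream_b), the column names, and the rows are made up to show 
the shape of the idea, with the citation stored as a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table per stream. Table and column names are hypothetical.
conn.executescript("""
CREATE TABLE stream_a (              -- observations
    ivorn   TEXT PRIMARY KEY,
    ra      REAL,
    decl    REAL,
    isotime TEXT,
    mag     REAL                     -- a stream-specific Param
);
CREATE TABLE stream_b (              -- classifications of stream_a events
    ivorn       TEXT PRIMARY KEY,
    cites       TEXT REFERENCES stream_a(ivorn),
    class_name  TEXT,
    probability REAL
);
""")

# Invented rows, for illustration only.
conn.executemany("INSERT INTO stream_a VALUES (?, ?, ?, ?, ?)", [
    ("ivo://example.org/a#1", 150.1, 2.2, "2011-05-20T03:00:00", 18.2),
    ("ivo://example.org/a#2", 210.7, -5.0, "2011-05-21T04:30:00", 16.4),
])
conn.executemany("INSERT INTO stream_b VALUES (?, ?, ?, ?)", [
    ("ivo://example.org/b#1", "ivo://example.org/a#1", "SN", 0.8),
    ("ivo://example.org/b#2", "ivo://example.org/a#2", "CV", 0.6),
])

# Each joined row is one two-event portfolio: the observation together
# with the classification that cites it.
rows = conn.execute("""
    SELECT a.ivorn, a.mag, b.class_name
    FROM stream_a AS a
    JOIN stream_b AS b ON b.cites = a.ivorn
    WHERE a.mag < 17.0 AND b.probability > 0.5
    ORDER BY a.isotime
""").fetchall()

print(rows)
```

Saving the joined result as its own table would correspond to the fused 
super-table, with one record per portfolio.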

