queries on VOEvent collections
Roy Williams
roy.williams at ligo.org
Mon May 23 08:39:13 PDT 2011
Post about VOEvents and their collections, from an informatics point of
view, from http://news.skyalert.org/?p=86. How can we better promote
interoperability of VOEvent services? How do you want to interact with a
repository? Your thoughts most welcome.
Events, Portfolios, and Mental Models
-------------------------------------
The Recommendation of VOEvent 2.0 draws ever closer, after successfully
running the gauntlet at the spring meeting of the International Virtual
Observatory Alliance. Our thoughts now move beyond the standard to the
exciting science that can be done now that interoperability is solved:
we can have multiple authors and multiple software systems contributing
to the rapidly evolving picture of an astronomical transient, with
machines fusing that data to make rapid, accurate decisions. We will need to
think of how VOEvents are authored, forwarded, selected, stored,
queried, and mined. In each of these cases, we wish to provide the most
appropriate ‘representation’ of the data in the VOEvent. Below are some
suggestions for this representation.
We want to preserve the semantics of VOEvent, so that as far as is
reasonable the same data is available in every representation. We can
focus on individual events, or extend the representation to the VOEvent
aggregates that we have come to call ‘portfolios’. A portfolio is a
collection of multi-sourced VOEvent packets whose subject is the same
astronomical transient, associated through citation of one by another:
for example, the original observation is a VOEvent, combined with
followups or classification results that are also formatted as VOEvents.
Each VOEvent is an observation of something, and it is that something
that brings multiple events together.
(1) XML API
A single VOEvent is an XML file. This representation carries the most
fidelity to the intent of the original author, even though some links
may be replaced for caching. A portfolio is a collection of VOEvent
files that are mutually connected through citations in a graph, and it
can be stored as a zip or tar archive, etc. Querying is through XPath,
XQuery, or XML libraries like lxml or Jax. A custom API can be made from
the VOEvent schema through code binding.
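As a rough illustration of querying in this representation, here is a
minimal sketch using the limited XPath support in Python's standard
xml.etree.ElementTree module. The packet is a hand-written, heavily
abbreviated example: the ivorn values, the Param and Group names, and
the example.org authority are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abbreviated VOEvent 2.0 packet (not from a real stream).
PACKET = """
<voe:VOEvent xmlns:voe="http://www.ivoa.net/xml/VOEvent/v2.0"
             ivorn="ivo://example.org/streamA#obs1" role="test" version="2.0">
  <Who><Date>2011-05-23T08:39:13</Date></Who>
  <What>
    <Param name="mag" value="17.2"/>
    <Group name="lightCurve">
      <Param name="filter" value="V"/>
    </Group>
  </What>
  <Citations>
    <EventIVORN cite="followup">ivo://example.org/streamA#obs0</EventIVORN>
  </Citations>
</voe:VOEvent>
"""

root = ET.fromstring(PACKET.strip())

# Identifier of this event, taken from the root element's attribute.
ivorn = root.get("ivorn")

# Params directly under What (those not inside a Group).
top_params = {p.get("name"): p.get("value")
              for p in root.findall("./What/Param")}

# Params inside one named Group.
group_params = {p.get("name"): p.get("value")
                for p in root.findall("./What/Group[@name='lightCurve']/Param")}

# Events cited by this one: these are the edges of the portfolio graph.
cited = [e.text for e in root.findall("./Citations/EventIVORN")]
```

Following the cited identifiers from packet to packet is one way to
assemble the citation graph that makes up a portfolio.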
(2) Dictionary API
Each event can be thought of as a key-value dictionary, with one entry
for each piece of data extracted from the event XML. Some keys are
mandated by the VOEvent schema (e.g. AuthorName, ISOtime), and others
come from the Group and Param name combinations specific to that stream,
with the value an int, float, or string. Internal tables can be handled
by allowing the value of such a key to be a vector: the values from the
table column. A portfolio can then be a union of these dictionaries,
each representing an event; to prevent name collision, each key would
also contain the name of the event it comes from. This representation is
natural for presentation templates and dictionary expressions: a table
column such as e['lightCurve']['Vmag'] can become a Python list of
numbers. This representation is effective when a single portfolio is to
be examined in detail or annotated, perhaps for classification of light
curves or analysis of an ephemeris. It can also be a table structure, with
each row of the table having Stream, Group, Param, and Value, with
queries that select on these tables.
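The dictionary view might be sketched as below. This is only one
possible layout: the event names (obs1, cls1), the stream name, the
sample values, and the choice of (event name, key) tuples as portfolio
keys are assumptions made for illustration. The helper functions
merge_portfolio and to_rows are likewise hypothetical.

```python
# Two events as dictionaries; Group names map to nested dictionaries,
# and a table column becomes a list (vector) of values.
observation = {
    "AuthorName": "Example Survey",
    "ISOtime": "2011-05-23T08:39:13",
    "lightCurve": {"Vmag": [17.2, 17.0, 16.8]},
}
classification = {
    "AuthorName": "Example Classifier",
    "ISOtime": "2011-05-23T09:00:00",
    "classifier": {"type": "SN", "probability": 0.8},
}


def merge_portfolio(events):
    """Union of event dictionaries; each key carries its event name."""
    portfolio = {}
    for event_name, event in events.items():
        for key, value in event.items():
            portfolio[(event_name, key)] = value
    return portfolio


def to_rows(stream, event):
    """Flatten one event into (Stream, Group, Param, Value) rows."""
    rows = []
    for key, value in event.items():
        if isinstance(value, dict):
            # A Group: one row per Param inside it.
            for param, param_value in value.items():
                rows.append((stream, key, param, param_value))
        else:
            # A top-level value has no Group.
            rows.append((stream, "", key, value))
    return rows


portfolio = merge_portfolio({"obs1": observation, "cls1": classification})
vmag = portfolio[("obs1", "lightCurve")]["Vmag"]  # a Python list of numbers
rows = to_rows("streamA", observation)
```

Prefixing every key with its event name keeps the two AuthorName
entries distinct, which is the name-collision point made above.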
(3) Relational table API
Here we are not representing a single astronomical transient but rather
doing astro-informatics: with many transients in a table, we can search,
select, sort, visualize, and cluster. The columns of the table come from
the stream of which the events are instances. Each VOEvent is translated
to a row in the table (internal tables are not shown in this
representation). Some columns are defined in the VOEvent standard, such
as sky position and time; others are part of the stream definition
(params). We can also represent a collection of similarly
structured portfolios; perhaps an event from stream A together with one
from stream B, one the observation and the other the classification. For
portfolios, there is a choice about joins: we can work with multiple
tables, each representing the events from a given stream, and portfolio
relationships represented as foreign keys; alternatively, we can fuse
all stream tables into one, so that each portfolio is represented by a
single record of this super-table. Working in this representation, we
use SQL queries that depend on Param and other values from the events.
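The multiple-table option can be sketched with Python's built-in
sqlite3 module. The table names, column names, and rows here are
invented for illustration: stream_a stands for an observation stream,
and stream_b for a classification stream whose events cite it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table per stream; the citation becomes a foreign key.
conn.executescript("""
CREATE TABLE stream_a (
    ivorn   TEXT PRIMARY KEY,
    isotime TEXT,
    ra      REAL,
    decl    REAL,
    mag     REAL
);
CREATE TABLE stream_b (
    ivorn       TEXT PRIMARY KEY,
    cites       TEXT REFERENCES stream_a(ivorn),
    label       TEXT,
    probability REAL
);
""")

conn.executemany(
    "INSERT INTO stream_a VALUES (?, ?, ?, ?, ?)",
    [
        ("ivo://example.org/streamA#1", "2011-05-23T08:39:13", 150.1, 2.2, 17.2),
        ("ivo://example.org/streamA#2", "2011-05-23T08:45:00", 151.3, 2.4, 18.9),
    ],
)
conn.executemany(
    "INSERT INTO stream_b VALUES (?, ?, ?, ?)",
    [
        ("ivo://example.org/streamB#1", "ivo://example.org/streamA#1", "SN", 0.8),
        ("ivo://example.org/streamB#2", "ivo://example.org/streamA#2", "CV", 0.9),
    ],
)

# Each joined row is one portfolio: an observation plus its classification.
rows = conn.execute("""
    SELECT a.ivorn, a.mag, b.label
    FROM stream_a AS a
    JOIN stream_b AS b ON b.cites = a.ivorn
    WHERE a.mag < 17.5 AND b.probability > 0.5
    ORDER BY a.mag
""").fetchall()
```

Storing the result of such a join as a table of its own would give the
fused super-table, with one record per portfolio.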