Describing events and streams

Fri May 5 11:45:26 CEST 2017

Hi all,

Looking ahead to the time-domain sessions in Shanghai, I'd like to raise 
some questions which I hope may act as catalysts for further discussion.

I'd like to start by asking people to give a short description of the 
type of events they are interested in, either as producers or as consumers.

In particular, it would be interesting to hear from event consumers on 
how they would describe the type of events they are interested in.

Not just the type of astronomical event that you are interested in (SN, 
GRB etc), but also the contents of the VOEvents that you are interested 
in receiving (everything, everything that matches [criteria], everything 
apart from [criteria] ...).
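
To make the three cases concrete, here is a rough Python sketch of how a 
consumer's selection might be written down as include and exclude criteria. 
Everything in it is an assumption for illustration only: the flattened 
event dictionary, the `classification` and `magnitude` field names, and 
the `Subscription` class are hypothetical and do not come from the VOEvent 
standard or any existing broker.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical flattened view of an event. A real VOEvent is an XML
# document with a much richer structure than this.
Event = Dict[str, object]
Criterion = Callable[[Event], bool]


@dataclass
class Subscription:
    include: List[Criterion]  # an event must match every one of these
    exclude: List[Criterion]  # an event must match none of these

    def accepts(self, event: Event) -> bool:
        return (all(c(event) for c in self.include)
                and not any(c(event) for c in self.exclude))


def is_class(name: str) -> Criterion:
    # Assumed field name: "classification".
    return lambda e: e.get("classification") == name


def brighter_than(mag: float) -> Criterion:
    # Assumed field name: "magnitude"; smaller values are brighter.
    return lambda e: float(e.get("magnitude", float("inf"))) < mag


# "everything"
everything = Subscription(include=[], exclude=[])
# "everything that matches [criteria]"
bright_sn = Subscription(include=[is_class("SN"), brighter_than(18.0)],
                         exclude=[])
# "everything apart from [criteria]"
not_grb = Subscription(include=[], exclude=[is_class("GRB")])
```

Calling `bright_sn.accepts(event)` on each incoming event then decides 
whether it is passed on. Real criteria would probably be more complex 
than this, which is part of what I'd like to hear about.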

As the bandwidth and data rate of event streams increase, it will 
become impractical for an individual scientist to request and process 
all of the events from a primary source like LSST.

The current solution to this is that there will be a number of brokers 
who will consume primary streams from the large surveys and then publish 
smaller filtered streams containing specific types of events.

I'm interested in how we describe these secondary event streams.

How important does provenance become when we start to have third or 
fourth generation streams further processing and filtering the events 
from multiple sources, including other brokers?
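
As a starting point for discussion, the sketch below shows one way a 
stream description could carry its provenance as a chain of upstream 
streams. This is only an assumption about what such a description might 
look like: the `StreamDescription` class, its fields, and the stream and 
publisher names are all made up for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StreamDescription:
    name: str
    publisher: str
    filter_summary: str = ""  # free-text note on what this stream selects
    upstream: List["StreamDescription"] = field(default_factory=list)

    def generation(self) -> int:
        """1 for a primary stream, 2 for a broker fed by primaries, ..."""
        if not self.upstream:
            return 1
        return 1 + max(s.generation() for s in self.upstream)

    def primary_sources(self) -> List[str]:
        """Names of the primary streams this stream ultimately derives from."""
        if not self.upstream:
            return [self.name]
        names: List[str] = []
        for s in self.upstream:
            for n in s.primary_sources():
                if n not in names:
                    names.append(n)
        return names


# Hypothetical example: two primary streams and two brokers.
survey = StreamDescription("survey-alerts", "Example Survey")
other = StreamDescription("other-alerts", "Another Facility")
broker_a = StreamDescription("broker-a-sn", "Broker A",
                             "supernova candidates", [survey])
broker_b = StreamDescription("broker-b-bright", "Broker B",
                             "bright transients", [broker_a, other])
```

With something like this, a consumer looking at `broker_b` could see that 
it is a third generation stream and trace it back to both primary 
sources, without having to trust the broker's own summary.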

What criteria will people want to use to select new streams?

Will people stick to using streams from the well known projects, or will 
they want to explore a range of available streams to find what they 
want?

How does this change the use case for third party brokers?

If someone is interested in specific types of astronomical events, what 
criteria will they use to choose which event streams they listen to?

Is the reputation of the primary source important?

Is the algorithm used to classify events important?

Perhaps they choose stream A because it is from a well known source with 
a good reputation, and stream B because it uses a new filtering 
algorithm they think is interesting?

How can we describe the filtering algorithm for stream B in a way that 
users can look for similar streams using similar algorithms?

Is response time important?

Is classification accuracy important?

A new filtering algorithm may be much better at detecting specific types 
of astronomical events, but it requires a larger set of historic 
measurements to make the assessment, which increases the latency between 
the first event and the classification result.

In some cases latency might not be an issue, and the event consumer will 
prefer classification accuracy over fast response time.

In other cases, fast response time is vital, and the event consumer has 
to be willing to cope with a corresponding drop in accuracy.

How do we describe things like classification accuracy, false positive 
rates, etc?
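
For reference, the usual statistics can all be derived from four counts, 
assuming someone has a labelled sample to compare the classifications 
against. The sketch below uses the standard textbook definitions. The 
`ConfusionCounts` class and the numbers are invented for illustration, 
and it is not a proposal for how a stream should publish these values.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ConfusionCounts:
    true_pos: int   # real events correctly flagged
    false_pos: int  # non-events wrongly flagged
    true_neg: int   # non-events correctly ignored
    false_neg: int  # real events missed


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN rather than raising when a rate is undefined.
    return numerator / denominator if denominator else float("nan")


def summarise(c: ConfusionCounts) -> Dict[str, float]:
    total = c.true_pos + c.false_pos + c.true_neg + c.false_neg
    return {
        "accuracy": _ratio(c.true_pos + c.true_neg, total),
        "precision": _ratio(c.true_pos, c.true_pos + c.false_pos),
        "recall": _ratio(c.true_pos, c.true_pos + c.false_neg),
        "false_positive_rate": _ratio(c.false_pos,
                                      c.false_pos + c.true_neg),
    }


# Made-up counts, purely to show the calculation.
stats = summarise(ConfusionCounts(true_pos=80, false_pos=20,
                                  true_neg=880, false_neg=20))
```

Note that a stream can score well on overall accuracy while still 
missing a large fraction of rare events, so a single number is unlikely 
to be enough.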

Is it up to the event provider to measure and publish their own 
accuracy statistics, or would some form of third party rating mechanism 
be useful, possibly based on feedback on results from event consumers?

What happens when the processing behind an event stream changes?

We are used to dealing with static archives of data that produce fixed 
data releases.

A specific data release of an archive is fixed. Running the same query 
will give the same results today and tomorrow. An event stream will 
change over time, not only because tomorrow will have different data 
flowing through it, but also because the processing pipeline that 
generates the events may change.

How do we describe changes to an upstream processing pipeline in a way 
that is useful to end users?

Some of these may be non-issues, some of these may need to wait until we 
build some prototypes and experiment with them.

The answers to some of these questions may influence how we design the 
event networks and brokers.

Interested to hear what you think.

Thanks,
Dave

--------
Dave Morris
Research Software Engineer
Wide Field Astronomy Unit
Institute for Astronomy
University of Edinburgh
--------