Shanghai Interop: Last Call

John Swinbank jds at astro.princeton.edu
Thu May 11 07:49:42 CEST 2017


Dear all,

A final reminder that there will be a TDIG session focusing on VOEvents
and transient alerts at next week's Interop. We'll convene in Room E at
16:00 on Monday (15 May). We'll hear Pierre Le Sidaner discuss the work
he and his colleagues have done on setting up contemporary VOEvent
networks, while Maria Patterson will talk about the alerting system she
is setting up for ZTF and LSST. We'll follow that up with a discussion
focused on responding to the needs expressed by those speakers and the
the topics suggested by Dave last week (attached below for reference).

NB: there is still space in the session to add more presentations. If
you'd like to contribute, please drop me a note.

In addition, there are two joint sessions planned with the DAL and DM
working groups on Thursday (18 May) in Room D, the first at 14:00 and
the second at 16:00. These will focus on developing a data model for
time series data. There's a varied programme of talks as well as time for
discussion: please come along and make sure your use cases are
represented.

Looking forward to seeing everybody in a few days,

John

----- Forwarded message from Dave Morris <dave.morris at metagrid.co.uk> -----

Date: Fri, 5 May 2017 10:45:26 +0100
From: Dave Morris <dave.morris at metagrid.co.uk>
Subject: Describing events and streams
To: voevent at ivoa.net

Hi all,

Looking ahead to the time-domain sessions in Shanghai, I'd like to raise
some questions which I hope may act as catalysts for further discussion.

I'd like to start by asking people to give a short description of the type
of events they are interested in, either as producers or as consumers.

In particular, it would be interesting to hear from event consumers on
how they would describe the type of events they are interested in.

Not just the type of astronomical event that you are interested in (SN,
GRB etc), but also the contents of the VOEvents that you are interested
in receiving (everything, everything that matches [criteria], everything
apart from [criteria] ...).
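
For concreteness, one machine-readable version of "everything that
matches [criteria]" might look something like the sketch below. The
structure and every field name are invented purely for illustration;
nothing here is drawn from an existing VOEvent or registry schema.

    # Hypothetical subscription: purely illustrative field names.
    subscription = {
        "stream": "ivo://example.org/surveys/transients",   # invented ID
        "include": {
            "event_type": ["SN", "GRB"],           # classes of interest
            "min_classification_probability": 0.9,
            "region": {"ra_deg": 180.0, "dec_deg": -30.0,
                       "radius_deg": 5.0},
        },
        "exclude": {
            "known_variable_star": True,           # drop matching events
        },
    }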

As the bandwidth and data rate of event streams increases, it will
become impractical for an individual scientist to request and process
all of the events from a primary source like LSST.

The current solution is for a number of brokers to consume the primary
streams from the large surveys and publish smaller, filtered streams
containing specific types of events.
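
As a rough illustration of that broker pattern, here is a minimal
sketch (in Python) of a secondary stream produced by filtering a
primary one. The event fields and the predicate are made up; a real
broker would parse VOEvent packets and use an actual transport such as
the VOEvent Transport Protocol or a message queue.

    # Minimal sketch: a broker consumes a primary stream and republishes
    # a smaller, filtered stream. Events and predicate are hypothetical.
    def filter_stream(primary_events, predicate):
        """Yield only the events that satisfy the selection criteria."""
        for event in primary_events:
            if predicate(event):
                yield event

    # Example predicate: likely supernovae brighter than magnitude 20.
    def sn_candidates(event):
        return event.get("class") == "SN" and event.get("mag", 99.0) < 20.0

    primary = [
        {"id": "evt-001", "class": "SN", "mag": 18.2},
        {"id": "evt-002", "class": "VariableStar", "mag": 15.0},
    ]
    secondary = list(filter_stream(primary, sn_candidates))
    # secondary == [{"id": "evt-001", "class": "SN", "mag": 18.2}]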

I'm interested in how we describe these secondary event streams.

How important does provenance become when we start to have third- or
fourth-generation streams that further process and filter events from
multiple sources, including other brokers?

What criteria will people want to use to select new streams?

Will people stick to using streams from the well known projects, or will
they want to explore a range of available streams to find what they
want?

How does this change the use case for third party brokers?

If someone is interested in specific types of astronomical events, what
criteria will they use to choose which event streams they listen to?

Is the reputation of the primary source important?

Is the algorithm used to classify events important?

Perhaps they choose stream A because it comes from a well-known source
with a good reputation, and stream B because it uses a new filtering
algorithm they think is interesting.

How can we describe the filtering algorithm behind stream B in a way
that lets users search for similar streams using similar algorithms?
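
One can imagine each stream publishing a machine-readable description
of its algorithm, something like the sketch below. Every field name
here is invented for illustration rather than taken from any existing
IVOA schema.

    # Hypothetical description of stream B, with provenance and a
    # searchable summary of its filtering algorithm. Illustrative only.
    stream_b = {
        "stream_id": "ivo://example.org/brokers/stream-b",    # invented
        "upstream": ["ivo://example.org/surveys/primary"],    # provenance
        "algorithm": {
            "name": "example-transient-classifier",
            "version": "2.1",
            "method": "random forest",    # keyword users could search on
            "features": ["light-curve shape", "colour",
                         "host association"],
            "documentation": "https://example.org/stream-b/algorithm",
        },
    }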

Is response time important?

Is classification accuracy important?

A new filtering algorithm may be much better at detecting specific
types of astronomical events, but may require a larger set of historic
measurements to make the assessment, increasing the latency between the
first event and the classification result.

In some cases latency might not be an issue, and consumers will prefer
classification accuracy over fast response time.

In other cases, fast response time is vital, and the event consumer has
to be willing to cope with a corresponding drop in accuracy.

How do we describe things like classification accuracy, false positive
rates, etc?

Should event providers measure and publish their own accuracy
statistics, or would some form of third-party rating mechanism be
useful, possibly based on feedback from event consumers about results?
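
Either way, the statistics themselves would need an agreed,
machine-readable form. A purely illustrative sketch, with invented
field names and numbers:

    # Hypothetical self-reported quality record for a stream.
    stream_b_quality = {
        "classification_accuracy": 0.92,  # fraction later confirmed
        "false_positive_rate": 0.05,
        "median_latency_seconds": 3600,   # detection -> classification
        "evaluation_sample": "archival events from 2016, N=1200",
        "last_updated": "2017-04-30",
    }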

What happens when the processing behind an event stream changes?

We are used to dealing with static archives of data that produce fixed
data releases.

A specific data release of an archive is fixed: running the same query
will give the same results today and tomorrow. An event stream will
change over time, not only because different data will flow through it
tomorrow, but also because the processing pipeline that generates the
events may change.

How do we describe changes to an upstream processing pipeline in a way
that is useful to end users?
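
One possibility, sketched below with invented fields, is to stamp every
event with a pipeline version and publish a machine-readable changelog
for the stream, so that consumers can at least detect when the upstream
processing changed.

    # Hypothetical pipeline versioning: field names and contents are
    # illustrative only.
    pipeline_history = [
        {"version": "1.0", "valid_from": "2017-01-01",
         "change": "initial classifier"},
        {"version": "1.1", "valid_from": "2017-04-15",
         "change": "retrained classifier; lower false positive rate"},
    ]

    # Each published event carries the version that produced it.
    event = {"id": "evt-003", "class": "SN", "pipeline_version": "1.1"}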

Some of these may be non-issues; others may need to wait until we build
some prototypes and experiment with them.

The answers to some of these questions may influence how we design the
event networks and brokers.

Interested to hear what you think.

