Time Domain DAL/DM/TDIG

Petr Skoda skoda at sunstel.asu.cas.cz
Tue Jul 11 18:22:12 CEST 2017


Hi all,

I could not resist commenting on this interesting discussion, especially after 
spending two days at the EWASS exoplanet session, giving a talk on local radio 
about exoplanets and preparing a foreword for a book about exoplanets ;-) 
I had to study the current state of the art.

But first, a comment on KDD involvement:

I am sure that Kai will not comment on metadata, as I know from many of 
our discussions that he is not interested AT ALL in any metadata about the 
object. The only important item is some ID of a given feature vector.
Mapping an interesting feature vector back to physical reality 
is not in the domain of KDD but is the logical continuation of the scientific work.
The utmost interest from the KDD point of view is a statistical 
characterization (namely the uncertainties) of the input data, and Kai 
has started to think in terms of probability distribution functions (PDFs). So the task 
of the KDD IG is to think about how to replace a catalogue number (+ stat. error) by a 
full PDF. This is also a requirement of LSST.
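To make "replace a catalogue number (+ stat. error) by a full PDF" concrete, here is a minimal sketch, assuming a PDF is simply stored as density values on a grid. The class and function names (GridPDF, from_value_and_error) are hypothetical and not taken from our TS data model or from LSST. It shows that value +/- sigma is just the Gaussian special case, while a two-peaked posterior (two competing explanations of the same data) keeps its shape and is badly summarized by a mean and an error bar.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class GridPDF:
    """A measured quantity described by its PDF sampled on a grid (hypothetical structure)."""
    x: List[float]        # grid points, in increasing order
    density: List[float]  # un-normalized PDF values at the grid points

    def __post_init__(self) -> None:
        # Normalize with the trapezoidal rule so that the PDF integrates to 1.
        area = self._integrate(self.density)
        self.density = [d / area for d in self.density]

    def _integrate(self, values: List[float]) -> float:
        # Trapezoidal integral of `values` over the grid.
        return sum((self.x[i + 1] - self.x[i]) * (values[i] + values[i + 1]) / 2
                   for i in range(len(self.x) - 1))

    def mean(self) -> float:
        return self._integrate([xi * d for xi, d in zip(self.x, self.density)])

    def std(self) -> float:
        m = self.mean()
        return math.sqrt(self._integrate(
            [(xi - m) ** 2 * d for xi, d in zip(self.x, self.density)]))


def from_value_and_error(value: float, sigma: float, n: int = 401) -> GridPDF:
    """The classic catalogue entry 'value +/- sigma', read as a Gaussian PDF."""
    lo, hi = value - 6 * sigma, value + 6 * sigma
    x = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return GridPDF(x, [math.exp(-0.5 * ((xi - value) / sigma) ** 2) for xi in x])


# A value with a symmetric error is just one special case of a PDF ...
single = from_value_and_error(1.5, 0.2)

# ... whereas a two-peaked posterior is poorly summarized by mean and std:
# its mean falls where the density is almost zero.
grid = [0.5 + 0.01 * i for i in range(251)]
two_peaks = GridPDF(grid, [math.exp(-0.5 * ((xi - 1.2) / 0.1) ** 2)
                           + math.exp(-0.5 * ((xi - 2.4) / 0.1) ** 2) for xi in grid])
print(f"single:    mean={single.mean():.3f} std={single.std():.3f}")
print(f"two peaks: mean={two_peaks.mean():.3f} std={two_peaks.std():.3f}")
```

A real model would also need units, the grid (or a parametric form, or samples) and the provenance of the PDF; the grid here is only one of several possible encodings.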

Our TS data model is prepared for this (and some LSST people have seen it and 
agree, though of course it will still be discussed).


But the most crucial part of the exoplanet business is that no data are certain 
(with few exceptions). So far most data were obtained from KEPLER (about 
4500 exoplanets) and a lot of data is still not processed. The transit 
method relies on Bayesian analysis of extremely noisy time series, which 
almost always yields multiple explanations. Many discoveries are 
rushed by the aim to be the first to announce an exoEarth, which IMHO makes 
some data about exoplanets more speculative than other fundamental 
observations of the universe ...

So neither the planetary characteristics (rocky, gaseous, mass, diameter, 
orbital and rotation period, chemical composition ...) nor the existence of 
multiplanetary extrasolar systems can be taken for certain and catalogued as a 
single commonly accepted truth.

Moreover, the number of confirmed exoplanets (about 3500) is still very low and 
fits in an Excel spreadsheet. Such tables will certainly be updated often.

I consider it nonsense to speculate about a metadata structure for 
hierarchical storage of planetary system parameters associated with every 
star's light curve!


In SSAP there are class and subclass parameters, but for many use cases 
they are not filled. And even where they are (e.g. LAMOST), it is dangerous to claim 
for sure that the object is a QSO or a STAR, and the spectral classification 
(M star, B star etc.) is definitely wrong for a considerable fraction of 
objects. Still, it is helpful to have such a rough classification 
originating from the pipelines available in the object 
catalogue ....

So I would suggest concentrating ONLY on the observable characteristics of a 
light curve (and also of spectra).

Coordinates are good; however, for a lot of objects (namely exoplanets) 
they may not be known: they are kept secret and strongly protected until 
the authors are sure. So you will hardly find the HD number of the star 
(like HD20794) or another publicly known catalogue designation (Gliese 667, 581); 
people always talk about CoRoT-7, Kepler-11, -22, -62 (and planets like 62e, 
62f), TRAPPIST-1g etc ...

In addition, the coordinates of most exoplanet targets used to be erased from 
FITS headers, as was the exact time of observation (MACHO), or the time was, 
e.g., rounded to the full minute or even hour. In spectra from the OHP SOPHIE 
spectrograph, the time information in spectra of exoplanet-hosting stars was 
intentionally modified.
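Why the exact time matters for later folding can be estimated with simple arithmetic: if times are rounded to the nearest unit u, the timing error is at most u/2, which becomes a phase error of (u/2)/P for a period P (if times were truncated instead of rounded, the worst case doubles to u/P). The 1-day period and the 2-hour transit in the sketch below are hypothetical numbers chosen only for illustration.

```python
def max_phase_error(rounding_unit_hours: float, period_days: float) -> float:
    """Worst-case phase error (in cycles) caused by rounding observation
    times to the nearest `rounding_unit_hours`: half a unit, divided by P."""
    return (rounding_unit_hours / 2.0) / (period_days * 24.0)


# Hypothetical 1-day period; a 2-hour transit would span 2/24 ~ 0.083 in phase.
for unit, label in [(1 / 60, "full minute"), (1.0, "full hour")]:
    print(f"{label:12s}: max phase error = {max_phase_error(unit, 1.0):.4f}")
```

Rounding to the full hour therefore smears a short event over a noticeable part of its own duration, while rounding to the minute is nearly harmless for such a period.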

In addition, the exact position is also difficult to state for multiple 
stars on a spectrograph slit or a double blob on a CCD frame, so for the most 
interesting objects the coordinates are not sufficient to identify the 
object (even for binary stars, not only exoplanets).
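A small illustration of the identification problem: a positional (cone) match around a measured position returns every catalogue entry within the matching radius, so for a close pair it cannot decide which star the spectrum or light curve belongs to. The sketch below uses the haversine formula for angular separation; the catalogue entries, the 1.5 arcsec pair and the 2 arcsec radius are hypothetical values, not real objects.

```python
import math
from typing import List, Tuple


def angular_sep_arcsec(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation via the haversine formula; degrees in, arcsec out."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h)))) * 3600.0


def cone_match(ra: float, dec: float, catalogue: List[Tuple[str, float, float]],
               radius_arcsec: float) -> List[str]:
    """Names of all catalogue entries within `radius_arcsec` of (ra, dec)."""
    return [name for name, r, d in catalogue
            if angular_sep_arcsec(ra, dec, r, d) <= radius_arcsec]


# Hypothetical close pair (1.5 arcsec apart) plus an unrelated star ~34 arcsec away.
catalogue = [
    ("star A", 150.0, 20.0),
    ("star B", 150.0, 20.0 + 1.5 / 3600),
    ("star C", 150.01, 20.0),
]
# Position measured from a blended image, radius comparable to the seeing/slit width.
matches = cone_match(150.0, 20.0 + 0.7 / 3600, catalogue, radius_arcsec=2.0)
print(matches)
```

Both members of the pair match, so the coordinate alone does not identify the object; some other identifier is needed.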

Concerning the characterization of activity: that is another problem, but a key 
problem for the future ... most stars show stronger activity, either in light-curve 
variability or in RV variability, than the expected ExoEarth signature.
So I can imagine a planet hunter wanting to query the database to select 
calm, low-activity G2 main-sequence stars (or e.g. M red dwarfs) to 
get new candidates to investigate ...
But the activity index is shaky: e.g. how do we describe solar activity?
What about rapid flare outbursts observed (though only with great difficulty) 
on some late-type stars... What about Be stars and QSO outbursts?

If we return only to observable data:

We may think about amplitude, periodicity, min and max values etc ...

So again statistical parameters. But the period is a problem: typically many 
periods are associated with one star....
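A minimal sketch of such an observable-only summary, assuming plain lists of times and values. The field names are hypothetical, "amplitude" is taken here as peak-to-peak (other conventions exist), and the periods are a list precisely because one star may have several.

```python
import statistics
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class LightCurveSummary:
    """Observable-only summary of a light curve (hypothetical field names)."""
    n_points: int
    time_span: float        # same unit as the input times
    min_value: float
    max_value: float
    peak_to_peak: float     # max - min; other amplitude conventions exist
    mean_value: float
    stdev: float            # sample standard deviation
    periods: List[float] = field(default_factory=list)  # zero, one or many periods


def summarize(times: Sequence[float], values: Sequence[float],
              periods: Sequence[float] = ()) -> LightCurveSummary:
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    if len(values) < 2:
        raise ValueError("need at least two points")
    lo, hi = min(values), max(values)
    return LightCurveSummary(
        n_points=len(values),
        time_span=max(times) - min(times),
        min_value=lo,
        max_value=hi,
        peak_to_peak=hi - lo,
        mean_value=statistics.mean(values),
        stdev=statistics.stdev(values),
        periods=list(periods),
    )


# Toy magnitudes; the two periods stand for, e.g., a pulsation and a rotation period.
s = summarize([0.0, 1.0, 2.0, 3.0], [10.0, 10.4, 10.2, 9.8], periods=[0.5, 3.2])
print(s)
```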

There is one issue I have been thinking about for many years (IMHO since the 2007 
IVOA meeting in Cambridge, when I saw the first idea of time series in the VO, 
perhaps from Roy Williams):

In fact, time series are used in two ways: as something (magnitude, RV 
..., flux ...) depending on time, the product of pipeline processing, and

as a periodogram, filtered by various methods, from which the periods are 
derived. And for a lot of research goals it is important to fold the light 
curve with the given periods; it is the daily bread of variable star researchers 
....
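As one example of the periodogram step, the sketch below implements the classical Lomb-Scargle periodogram, a common choice for unevenly sampled data, in plain Python. It is only one of the "various methods" (phase dispersion minimization or box least squares are others); the frequency grid and the synthetic 2.5-day light curve are assumptions made for the illustration.

```python
import math
import random
from typing import List, Sequence


def lomb_scargle(t: Sequence[float], y: Sequence[float],
                 freqs: Sequence[float]) -> List[float]:
    """Classical Lomb-Scargle power (without variance normalization) at each
    trial frequency, in cycles per unit of t. Works for uneven sampling."""
    mean_y = sum(y) / len(y)
    dy = [v - mean_y for v in y]
    power = []
    for f in freqs:
        w = 2.0 * math.pi * f
        # Time offset tau that makes the sine and cosine terms orthogonal.
        tau = math.atan2(sum(math.sin(2 * w * tj) for tj in t),
                         sum(math.cos(2 * w * tj) for tj in t)) / (2 * w)
        c = [math.cos(w * (tj - tau)) for tj in t]
        s = [math.sin(w * (tj - tau)) for tj in t]
        yc = sum(d * ci for d, ci in zip(dy, c))
        ys = sum(d * si for d, si in zip(dy, s))
        cc = sum(ci * ci for ci in c)
        ss = sum(si * si for si in s)
        p = (yc * yc / cc if cc > 0 else 0.0) + (ys * ys / ss if ss > 0 else 0.0)
        power.append(0.5 * p)
    return power


def best_period(t: Sequence[float], y: Sequence[float],
                f_min: float, f_max: float, n_freq: int = 1500) -> float:
    """Period at the highest periodogram peak on a uniform frequency grid."""
    step = (f_max - f_min) / (n_freq - 1)
    freqs = [f_min + i * step for i in range(n_freq)]
    power = lomb_scargle(t, y, freqs)
    i_best = max(range(n_freq), key=power.__getitem__)
    return 1.0 / freqs[i_best]


# Synthetic, unevenly sampled light curve with a 2.5-day period plus noise.
rng = random.Random(1)
t = sorted(rng.uniform(0.0, 100.0) for _ in range(150))
y = [12.0 + 0.3 * math.sin(2 * math.pi * tj / 2.5) + rng.gauss(0.0, 0.05) for tj in t]
print(f"best period: {best_period(t, y, 0.05, 1.5):.3f} d")
```

Real data usually show several significant peaks (aliases, harmonics, genuinely multiple periods), which is exactly why a light curve may need to carry more than one period.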

So the important question is the storage of an arbitrary number of periods in 
a light curve, and a client which will do the folding ...

Then the data are represented not as a function of time but of circular phase 
(typically extended from 0 to 2, but other combinations are also used).
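The folding itself is simple once the periods are stored with the light curve. A minimal sketch, assuming phase is counted in cycles from a reference epoch and every point is repeated at phase + 1 so the folded curve covers 0 to 2; the function names and the toy data are illustrative only.

```python
from typing import Dict, List, Sequence, Tuple


def fold(times: Sequence[float], values: Sequence[float], period: float,
         epoch: float = 0.0, extend: bool = True) -> List[Tuple[float, float]]:
    """Fold a light curve with one period. Phase is measured in cycles from
    `epoch`; with extend=True every point is also repeated at phase + 1, so the
    folded curve covers 0 <= phase < 2."""
    if period <= 0:
        raise ValueError("period must be positive")
    folded = []
    for t, v in zip(times, values):
        phase = ((t - epoch) / period) % 1.0  # Python's % maps negatives into [0, 1)
        if phase >= 1.0:  # guard: a tiny negative value can round up to exactly 1.0
            phase = 0.0
        folded.append((phase, v))
        if extend:
            folded.append((phase + 1.0, v))
    folded.sort()
    return folded


def fold_all(times: Sequence[float], values: Sequence[float],
             periods: Sequence[float],
             epoch: float = 0.0) -> Dict[float, List[Tuple[float, float]]]:
    """One folded curve per stored period; a light curve may carry any number."""
    return {p: fold(times, values, p, epoch) for p in periods}


times = [0.0, 1.0, 2.5, 3.75, -0.5]
values = [10.0, 10.3, 10.1, 10.4, 10.2]
curves = fold_all(times, values, periods=[2.5, 0.8])
for p, pts in curves.items():
    print(p, [(round(ph, 3), v) for ph, v in pts])
```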

So important metadata for the light curve might also be the method used for 
the period estimate ...
(similar to the method of planet discovery...)
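One possible shape for such period metadata, sketched as plain Python dataclasses serialized to JSON. All field names, the identifier and the assumed unit (days) are hypothetical; "PDM" (phase dispersion minimization) and "BLS" (box least squares) are mentioned only as examples of methods.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class PeriodEstimate:
    value: float                         # period (days assumed here)
    method: str                          # e.g. "Lomb-Scargle", "PDM", "BLS"
    uncertainty: Optional[float] = None  # None if the method gives no error
    epoch: Optional[float] = None        # time of phase zero, if defined
    reference: Optional[str] = None      # publication or pipeline version


@dataclass
class LightCurveRecord:
    obs_id: str                          # link to the observed data only
    periods: List[PeriodEstimate] = field(default_factory=list)


record = LightCurveRecord("hypothetical-lc-0001", [
    PeriodEstimate(2.5, "Lomb-Scargle", uncertainty=0.01),
    PeriodEstimate(0.71, "PDM"),
])
print(json.dumps(asdict(record), indent=2))
```

Keeping the periods as a list of small records, each with its own method and uncertainty, avoids choosing one "true" period for the star.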

But in general I think that a light curve should contain just the data obtained, 
and the rest should be in some catalogue; combining TAP and a sparse cube 
client should cover most use cases ....
(as shown by Laurent)
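As a sketch of the catalogue side, here is how a client might build a synchronous TAP query (REQUEST=doQuery, LANG=ADQL, QUERY, FORMAT) that selects calm G2 dwarfs and returns only identifiers, which a light-curve client could then use to fetch the data. The service URL, table name, column names and the activity threshold are all hypothetical, and the sketch only builds the URL; it does not contact any service.

```python
from urllib.parse import urlencode


def tap_sync_url(base_url: str, adql: str, fmt: str = "votable") -> str:
    """Build a synchronous TAP query URL (REQUEST/LANG/QUERY/FORMAT parameters)."""
    params = {"REQUEST": "doQuery", "LANG": "ADQL", "FORMAT": fmt, "QUERY": adql}
    return base_url.rstrip("/") + "/sync?" + urlencode(params)


# Hypothetical service, table and column names; the activity threshold is arbitrary.
adql = (
    "SELECT c.obs_id, c.sp_type, c.activity_index "
    "FROM hypothetical.star_catalogue AS c "
    "WHERE c.sp_type LIKE 'G2V%' AND c.activity_index < 0.1"
)
url = tap_sync_url("https://tap.example.org/tap", adql)
print(url)
```

The light curves themselves stay untouched; only the catalogue carries the (shaky) classification and activity values.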


Petr

I fully agree with Matthew



On Mon, 10 Jul 2017, Matthew Graham wrote:

>
> Object classification is the endpoint of a significant scientific 
> workflow and will not necessarily be available for most time series.
>
> So these have to be optional and not really in a minimal list.
>
> I think we need to be quite careful here about feature creep from 
> specific scientific subdomains because that makes things easier for 
> them.



