A Virtual Observatory Data Model

Ed Shaya edward.j.shaya.1 at gsfc.nasa.gov
Mon May 12 08:52:41 PDT 2003


Frank,
    I think this is the right direction.  I just want to add a few items 
that the scientist, even the non-instrument scientist, needs to have but 
I think are missed here.
1. resolution - Although photons fall into bins or pixels, the 
resolution tells one the probability that a photon in a particular bin 
could have fallen into a nearby bin.
2. crosstalk - There is some tendency for the liberated electrons to 
leak to the adjacent pixel.  So, even when pixel size is much larger than 
the spatial resolution, the photon has a distinct probability of 
registering in the wrong pixel.
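
Both items can be pictured as a redistribution of counts between
neighbouring bins. The following is only a minimal one-dimensional sketch
of that idea: the function name redistribute and the three-element kernel
are hypothetical, and the probabilities are made-up illustration values,
not properties of any real detector.

```python
from typing import List, Sequence


def redistribute(true_counts: Sequence[float],
                 kernel: Sequence[float]) -> List[float]:
    """Spread each bin's counts over its neighbours.

    kernel has odd length; kernel[k] is the assumed probability that a
    photon is recorded (k - len(kernel) // 2) bins away from its true bin.
    """
    half = len(kernel) // 2
    observed = [0.0] * len(true_counts)
    for i, counts in enumerate(true_counts):
        for k, prob in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(observed):  # counts leaking off the edge are lost
                observed[j] += counts * prob
    return observed


# 100 photons truly belong to bin 1; 10% leak to each adjacent bin.
true_counts = [0.0, 100.0, 0.0, 0.0]
kernel = [0.1, 0.8, 0.1]
observed = redistribute(true_counts, kernel)
```

A data model would need to carry something equivalent to the kernel
(possibly varying from bin to bin) for an observer to account for this.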

I believe that time-series data is also covered by this model.
VOCLASS = 4DBIN.TIMESERIES

Finally, what do we do about observations on a moving target (asteroid, 
planet, comet, perhaps even high proper motion star)?   I think there 
should be an option to substitute an object name for the two spatial 
coordinates (unless you want to define a path on the celestial sphere).

Ed


Frank Valdes wrote:

>                        [NOAO Data Products Program]
>
>                      A Virtual Observatory Data Model
>
>                              Francisco Valdes
>                              fvaldes at noao.edu
>                                May 9, 2003
>
>COVER LETTER
>
>The following (also http://iraf.noao.edu/projects/vo/dal/datamodel.html) is
>a contribution for the data model discussions at the working groups meeting
>next week in Cambridge. It is an extension of some earlier ideas ([1], [2])
>on including celestial coordinates in the WCS for 1D and 2D spectra and on
>the question about whether accessing spectra and images in the prototype VO
>framework requires different protocols. Because of the deadline imposed by
>the meeting the discussion is abbreviated in some areas. My hope is to
>emphasize the general philosophy and approach. There are some important
>ideas which I support in the Spectral Data Models draft from Jonathan
>McDowell and Steve Lowe but there are some philosophical differences which I
>wanted to offer, primarily the ideas of treating images and spectra as
>projections of a more general class and simplifying as much as possible by
>limiting VO data to "calibrated" forms which don't require complex metadata
>to interpret.
>
>Because I decided it was more valuable to try and build a consistent
>discussion from my perspective I did not have time to also critique the
>McDowell and Lowe concepts. But it makes sense that before doing that one
>really needs to reach consensus on whether to treat spectra of various
>dimensions separately or whether to work towards an integrated
>spectrum/image data model.
>
>Good luck in the meetings. I'm sorry I can't be there.
>
>Frank Valdes
>
>   * [1] Spectral WCS Conventions
>     About FITS WCS and how 1D and 2D spectra can include celestial
>     coordinates.
>   * [2] Incorporating Spectra in the Next Phase of the Virtual Observatory
>
>1. What is a virtual observatory data model?
>
>The first hurdle to overcome in defining virtual observatory (VO) data
>models is to understand what they are and what they are not. In the
>discussion given here a VO data model is the SIMPLEST abstraction of
>physically calibrated, wavelength regime and detector technology independent
>astronomical data.
>
>We emphasize simplest because a key part of the VO concept is that users,
>called VO observers, should not need to be experts in every regime of
>astronomy; they need only be educated astrophysicists. The science done by
>VO observers generally involves data from various telescopes and various
>energy subdisciplines. The reason for striving towards the simplest
>description is to allow consensus and interoperability between a wide
>variety of data providers.
>
>The other side of the question, which should be a mantra of sorts, is:
>
>    "VO data models are not FITS or file formats"
>    "VO data models are not archived data"
>    "VO data models are not instrumental data"
>
>2. Celestial Sphere Binned Photon Observations - 4DBIN
>
>This document defines a broad class of astronomical data called "Celestial
>Sphere Binned Photon Observations". Note that the detailed definition of the
>class identified by this label is more specific than the literal
>interpretation of the words. The definition of the class flows from the name
>as follows.
>
>     Celestial Sphere
>          Restricts the class to data about the two dimensional
>          celestial sphere. There are two spatial parameters specifying
>          a longitude and a latitude in some specified celestial
>          system.
>     Photon
>          Restricts the class to data about the photon energies as
>          described by an energy parameter.
>     Binned
>          Restricts the class to data about the number of photons
>          arriving over finite regions, called bins, of the parameter
>          domain. A way to look at this is that photon events are
>          indistinguishable within a bin. A further restriction is that
>          the bins are rectangular so they may be described by a center
>          and width in each parameter.
>     Observations
>          Restricts the class to data about photons over a time
>          described by a time parameter. Observation evokes the idea of
>          detecting photons over an integration period, though
>          simulation and model results can be cast into simulated
>          observations.
>
>This definition of the class has four parameters: two celestial position
>coordinates, energy, and time. These form a continuous space or domain which
>is divided into a set of bins that are not necessarily uniformly distributed
>or of equal size. Each bin is associated with the number of photons it
>contains. The number of photons may be expressed in various ways such as
>number, energy, and flux.
>
>This class may be thought of as data obtained through the following process.
>Photons of various energies are detected as a function of time coming from
>points on the sky. Each photon is tagged by four numbers from a four
>dimensional continuous space. The numbers are a latitude and longitude on
>the celestial sphere from which the photon arrives, the energy of the
>photon, and the time. The continuous space is divided into a set of discrete
>regions or bins which are indexed in some fashion. The photons are counted
>in each bin. The details of the continuous energy, position, and time
>parameters are lost and only the bin index and bin counts are retained.
>
>This definition makes a notable distinction between the measured quantity,
>the photons, and the sampling, the bins. This distinction is often confused
>or lost. The photons, sometimes thought of as the "z" axis in an image, are
>the scientific content which is conveyed in standard physical units. The
>sampling or binning is variable and dependent on the way the data was
>obtained. The VO infrastructure or the data providers may "convert" units
>for the photon values and "resample" the bins at the request of the VO
>observer.
>
>To identify data which falls into this class we define a top level tag
>
>        VOCLASS = 4DBIN
>
>2.1 What is the difference between VO data and observational data?
>
>A key aspect of virtual observatory photon binned data is that the primary
>bin values be calibrated to standard physically meaningful units. There are
>two important reasons for this. One is to allow VO observers to easily
>intercompare data with only simple physical unit conversions. The other is
>to simplify the data model and limit metadata which must be supplied to
>allow meaningful interpretation.
>
>This does impose a small burden on the data providers above what has been
>typical. For instance, optical imaging often provides data in digital units
>with the conversion to photons implicit in a gain and a magnitude zeropoint.
>For VO data the data provider does the gain multiplication and conversion of
>the magnitude system to photon based units. Non-optical astronomers then
>don't need to understand the detector technology or many of the ideas of
>magnitudes, and the metadata doesn't need to include a gain and magnitude
>zeropoint.
>
>In order to provide a "caveat emptor" option to the VO observers and data
>providers, a top-level metadata declaration is whether the primary data
>values meet the VO standard for this class:
>
>        4DBIN.CALIBRATED = [yes|no|relative]
>
>When "no" is asserted the data may still be useful but would require the VO
>observer to calibrate it in some way. The "relative" calibration is a way
>to assert that the data is proportional to photon counts and that the
>response to photon fluxes is independent of position (after taking
>differences in bin sizes into account). Therefore, relative comparison
>between different bins is scientifically meaningful even though an absolute
>calibration is not defined.
>
>Note that the first sentence of this section refers to the "primary bin
>values". The reason for this is that the observational and calibration
>characteristics appear in the ancillary data and metadata. This is primarily
>contained in the uncertainties but some other useful information may be
>provided in exposure maps and data quality flags.
>
>2.2 What is an image and a spectrum?
>
>Inasmuch as astronomers define and distinguish between "images" and
>"spectra", an image is a subclass with only a single energy bin, a single
>time bin, and multiple bins in both spatial parameters. The energy bin is
>often fairly wide but not always. A spectrum also has only a single time
>bin, but has more than one energy bin, and one or more spatial bins.
>
>Astronomers also typically discriminate between spectra having a single
>spatial bin, called a "one-dimensional spectrum", and multiple spatial bins,
>often called a "data cube". The special case of spatial bins restricted to a
>curve on the celestial sphere is called a "slit spectrum".
>
>In this document there is no distinction made between spectra and images.
>However, one could choose to subclass the metadata concepts. A subclass
>means using implicit and explicit conventions and defaults. The subclasses
>might be:
>
>        VOCLASS = 4DBIN.IMAGE
>        VOCLASS = 4DBIN.1DSPECTRUM
>        VOCLASS = 4DBIN.SLITSPECTRUM
>        VOCLASS = 4DBIN.DATACUBE
>
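
The definitions in section 2.2 can be read as rules on the number of bins
along each parameter. A minimal sketch follows, assuming the data are
described by four bin counts; the function name voclass is hypothetical,
and treating a slit as "exactly one spatial axis with a single bin" is a
simplifying assumption, since the text only requires the spatial bins to
lie along a curve on the celestial sphere.

```python
def voclass(n_lon: int, n_lat: int, n_energy: int, n_time: int) -> str:
    """Guess a VOCLASS tag from the number of bins along each parameter."""
    if n_time != 1:
        return "4DBIN"  # images and spectra both have a single time bin
    if n_energy == 1:
        # An image needs multiple bins in both spatial parameters.
        return "4DBIN.IMAGE" if n_lon > 1 and n_lat > 1 else "4DBIN"
    # More than one energy bin: some kind of spectrum.
    if n_lon == 1 and n_lat == 1:
        return "4DBIN.1DSPECTRUM"
    if n_lon == 1 or n_lat == 1:
        return "4DBIN.SLITSPECTRUM"  # assumption: slit along one array axis
    return "4DBIN.DATACUBE"


image_tag = voclass(512, 512, 1, 1)
spectrum_tag = voclass(1, 1, 2048, 1)
slit_tag = voclass(1, 100, 2048, 1)
cube_tag = voclass(30, 30, 100, 1)
```

Anything that does not match one of the four subclasses falls back to the
top-level 4DBIN tag.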
>3. Metadata
>
>Data from the 4DBIN class fundamentally consists of a set of numbers related
>to photon counts. To make sense of this set of numbers requires metadata or
>conventions which describe the relationship between photon counts and the
>bin value, define the bins, the uncertainties in the values, and associated
>attributes.
>
>As a thought experiment, which we use to identify the metadata through a use
>case, suppose one is given the set of numbers {0,6,7,2,5,3,1,4}. What do we
>need to understand something about the photons observed on the sky? Along
>these lines the minimal metadata necessary should be separated from optional
>metadata. Here we suggest the minimal description is provided by section 3.1
>on the bin geometry and section 3.2 on the bin values.
>
>First we need a top level piece of metadata defining the class and
>conventions. This type of metadata is sometimes associated with a name, such
>as FITS (with SIMPLE=T). For this document we define this metadata class
>domain
>
>        VOCLASS = 4DBIN
>
>3.1 Bin Geometry
>
>The metadata for the bin geometry describes the mapping from the continuous
>four dimensional photon parameter space to the discrete indexed bins. As
>noted in section 2, the bins are required to be described by a center and
>width along each of the four parameter dimensions. This constitutes the bin
>geometry.
>
>The first thing we need is a definition for the indexing of the data bin
>values. There are two straightforward ways to do this. One is to use the
>ordinal of the data value set. The other is to arrange the values into an
>array. For the 4DBIN class the array is required to be four dimensional.
>
>        4DBIN.INDEXING = ordinal
>        4DBIN.INDEXING = array(N1,N2,N3,N4)
>
>3.1.1 Ordinal or tabular indexing
>
>The first method is completely general while the second requires the number
>of data values to be the product of the array dimensions. At this point the
>two indexing schemes seem pretty much the same. The distinction comes in how
>the indices are used to map to the bin geometries in the four dimensional
>parameter space. In practice, the ordinal indexing is used with a table and
>the array is used for gridded bins.
>
>In the ordinal indexing the metadata includes a table of bin geometry
>values. The table is a set of numbers ordered such that each sequential set
>of eight values defines a line and the line number corresponds to the data
>value with matching ordinal. For example, the first eight numbers apply to
>the first data value, the second eight to the second data value, and so
>forth. The eight values are the bin centers in longitude, latitude, energy,
>and time followed by bin widths.
>
>In the simple 1D spectrum example we might have
>
>  0 : 12h10m15s 32d15m10s 4001A 2003-05-07T12:10:15 1arcsec 1arcsec 1A 300s
>  6 : 12h10m15s 32d15m10s 4002A 2003-05-07T12:10:15 1arcsec 1arcsec 1A 300s
>
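
The ordinal layout described in section 3.1.1 (eight numbers per data
value: four centers followed by four widths) could be unpacked roughly as
below. This is a sketch only: BinGeometry and rows_of_eight are
hypothetical names, and the values are kept as strings because the
document does not say how units are to be encoded.

```python
from typing import List, NamedTuple, Sequence


class BinGeometry(NamedTuple):
    lon_center: str
    lat_center: str
    energy_center: str
    time_center: str
    lon_width: str
    lat_width: str
    energy_width: str
    time_width: str


def rows_of_eight(flat: Sequence[str]) -> List[BinGeometry]:
    """Group a flat sequence into one eight-value row per data value."""
    if len(flat) % 8 != 0:
        raise ValueError("bin geometry table must hold a multiple of 8 values")
    return [BinGeometry(*flat[i:i + 8]) for i in range(0, len(flat), 8)]


# The two rows from the 1D spectrum example, as one flat sequence.
flat = (
    "12h10m15s 32d15m10s 4001A 2003-05-07T12:10:15 1arcsec 1arcsec 1A 300s "
    "12h10m15s 32d15m10s 4002A 2003-05-07T12:10:15 1arcsec 1arcsec 1A 300s"
).split()
values = [0, 6]  # first two data values of the thought-experiment set

geometry = rows_of_eight(flat)
if len(geometry) != len(values):
    raise ValueError("need exactly one geometry row per data value")
table = list(zip(values, geometry))  # row n describes data value n
```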
>3.1.2 Array or raster indexing
>
>For the array indexing we use a metadata description along the lines of the
>FITS WCS. This is a complex description which we only touch on here with
>attention to the restrictions imposed by the 4DBIN class. The metadata
>components would include many of the basic elements of the FITS WCS
>metadata. Besides the actual formalism for evaluating the bin centers and
>widths another key piece of metadata is the units of the four parameters.
>
>The main restriction on the FITS WCS formalism as it applies to the 4DBIN
>class is that the axes ordering is required to be latitude, longitude,
>energy, and time and so the FITS WCS is always a WCSDIM of 4. The FITS WCS
>does not currently explicitly define time coordinates. But for the main data
>types of interest, images and spectra with a single time bin, we simply use
>a linear WCS.
>
>The bin centers are a direct analog to the pixel centers in the FITS WCS.
>There is a linear mapping from the array index to an intermediate WCS
>coordinate. There is potentially a distortion transformation to an ideal
>intermediate coordinate. For calibrated data typical of the VO this should
>not be required except possibly to describe the path of a slit spectrum on
>the sky. Finally there is a projection or standard non-linear transformation
>to the final coordinates.
>
>One new feature of the FITS WCS formalism is the use of a lookup table. This
>allows for bin centers which are not uniformly arrayed in the parameter
>space. It can provide similar information to the ordinal description.
>
>The concept of bin widths is only implicit in the FITS WCS formalism. For
>the array indexing metadata model defined here, the bin widths are computed
>from the WCS using the idea that the WCS functions are continuous in the
>index space. So the bin edges are computed by adding and subtracting
>one-half to the integer indices and evaluating the parameter value at those
>points. The WCS formalism is more general than simple rectangular bins so
>this computation is done by varying only the index of one parameter. The
>width of the bin is the average difference between the value at the integer
>index center and the values at the two half-index points.
>
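
The half-index rule for bin edges can be sketched as follows, assuming the
WCS is available as a continuous function from four (possibly fractional)
array indices to four parameter values. The names bin_edges, bin_width and
toy_wcs are hypothetical, the toy WCS is linear with made-up coefficients
and an arbitrary axis order, and the width is taken here as the full
distance between the two edges.

```python
from typing import Callable, Sequence, Tuple

Coord = Tuple[float, float, float, float]
Wcs = Callable[[Sequence[float]], Coord]


def bin_edges(wcs: Wcs, index: Sequence[int], axis: int) -> Tuple[float, float]:
    """Evaluate the WCS at index -/+ 0.5 along one axis only."""
    lower = [float(i) for i in index]
    upper = [float(i) for i in index]
    lower[axis] -= 0.5
    upper[axis] += 0.5
    return wcs(lower)[axis], wcs(upper)[axis]


def bin_width(wcs: Wcs, index: Sequence[int], axis: int) -> float:
    """Full width of the bin along one axis (distance between its edges)."""
    lo, hi = bin_edges(wcs, index, axis)
    return abs(hi - lo)


def toy_wcs(idx: Sequence[float]) -> Coord:
    # Hypothetical linear WCS: degrees, degrees, Angstroms, seconds.
    return (
        182.5 + 0.001 * idx[0],
        32.25 + 0.001 * idx[1],
        4000.0 + 1.0 * idx[2],
        300.0 * idx[3],
    )


energy_edges = bin_edges(toy_wcs, (0, 0, 1, 0), axis=2)
energy_width = bin_width(toy_wcs, (0, 0, 1, 0), axis=2)
time_width = bin_width(toy_wcs, (0, 0, 1, 0), axis=3)
```

For a non-linear WCS the two half-steps on either side of the center are
generally unequal, which is presumably why the text speaks of averaging.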
>3.2 Bin Values
>
>Section 2.1 declares that calibrated 4DBIN data be in certain physical units
>directly related to the photons and the bin sizes. The primary metadata for
>the bin values is then the units. For example,
>
>    4DBIN.VALUES.UNITS = ergs/s/cm^2/A
>    4DBIN.VALUES.UNITS = photons
>    4DBIN.VALUES.UNITS = Jy
>
>The definition of the allowed units also needs to provide standards such as
>calibrations to above the atmosphere.
>
>When there is a significant variation in the detection of photons across an
>energy bin, such as occurs with a filter in a broadband image, the
>calibration must be referenced to the filter system.
>
>    4DBIN.FILTER = Johnson(B)
>
>Background contributions need to be described by primary metadata.
>
>    4DBIN.VALUES.BACKGROUND = Subtracted using nearby simultaneous observations
>    4DBIN.VALUES.BACKGROUND = Subtracted by CCD shuffling
>    4DBIN.VALUES.BACKGROUND = None subtracted
>
>3.4 Uncertainties
>
>For identification purposes, such as finding sources or redshifts, and when
>the magnitude of the signal is high, such as continuum shapes over decades
>of energy, the uncertainties about the data bin values may not be important.
>In other words, there are a number of uses for calibrated VO data that just
>depend on the data units and the binning.
>
>But for detailed measurements where detection and instrumental effects are
>important, a significant piece of metadata is the uncertainties. There are
>two approaches which might be provided by the data model. The more rigorous
>approach would be to give statistical information about each bin (possibly
>including covariances).
>
>The statistical description of the uncertainties implicitly carries
>information about exposure times, rejected data in combined observations,
>variable sensitivities, and so on. Other attribute metadata may explicitly
>provide the means to separate these implicit contributions to the total
>uncertainties.
>
>The other approach is to provide a functional description. This is only
>really useful if the data is relatively homogeneous so that variable DQE, bin
>sizes, and backgrounds are not present. A typical model describes the
>variances as a function of the data values. For instance,
>
>        V = A + B N ...
>
>where N is the binned photon number.
>
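
As an illustration of the functional description, the sketch below
evaluates only the two terms written out in the formula, V = A + B N, and
ignores whatever higher terms the "..." stands for. The function names and
the coefficient values are assumptions chosen for the example, not values
from any instrument.

```python
import math


def variance(n_photons: float, a: float, b: float) -> float:
    """Variance model V = A + B*N, truncated after the linear term."""
    return a + b * n_photons


def sigma(n_photons: float, a: float, b: float) -> float:
    """One-sigma uncertainty implied by the variance model."""
    return math.sqrt(variance(n_photons, a, b))


# Hypothetical coefficients: a constant noise floor plus a term
# proportional to the photon count.
A, B = 25.0, 1.0
v = variance(100.0, A, B)
s = sigma(100.0, A, B)
```

With these coefficients the metadata would need to carry only A and B
rather than a separate uncertainty for every bin.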
>3.5 Attributes
>
>This section on attributes is a catch-all for all the rest of the metadata.
>This is all to be defined. However a quick list of common useful attributes
>is given below.
>
>     label/title
>          a label or title provided by the observer
>     object ID
>          a standard object id
>     instrument
>          details of the telescope and instrumentation
>     conditions
>          information about the observing conditions
>     calibrations
>          details of the calibrations
>
>     data quality
>          a table of data quality indicators for:
>             o uncalibrated bins due to vignetting or masking
>             o poorly calibrated bins
>     exposure map
>          a table of effective exposure times
>     exposure filter
>          a table describing chopping, shuffling, sequences of combined
>          exposures, etc. This is a filter function for the time
>          dimension of a bin.
>
>  
>


