VI ) Provenance and Characterization 2

bonnarel at bonnarel at
Fri Oct 31 12:29:43 PDT 2008

Follow up of recent Obs DM task group compilation

*Step 1 A recent Note I wrote to summarize the current ideas and use cases
NB: Fabien proposed a note on a statistical vision of computing 
characterization, which I am addressing here. It is up to him to send 
it to the DM list or publish it as a Note.

Observation, Provenance and characterisation 2
USE CASES AND
We discussed use cases for extending the characterization data model 
towards a full observation model (including Characterization version 
2, Provenance, DatasetID, curation and access reference). We had to 
define priorities among them.
In Trieste (and before) we listed the following features/use cases 
that need to be considered now.

o	Transmission curves of 2D images
o	PSF or beam profile for Spectra, Images, radio data ( see a recent 
use case described by Anita. I will send it to you).
o	Linkage of Characterization, DataSetId and curation + whatever other 
Obs package in the XML (or other) serialisation. Request coming from 
the footprint community and from the VOSpace designers.
o	complex data : resolution and sampling changes on multiple parts 
o	complex data : support estimated for different parts
o	combining data: resolution for a coadded file (estimated from 
the individual detailed characterizations). This can be seen as an 
example of “data with ancestors”
o	combining data: transmission curve from a coadded file...
o	reduction stuff,
o	various displays and views of  a science dataset, associated to the 
simple retrieval (preview, descriptions, extractions).
o	Characterization of Theory data and linkage to SimDB data model

High Priority was set on
a ) Transmission curves
b ) fine resolution description including variability maps, PSF and mappings
c ) explicit demand for linkage of Characterization and  DatasetID
d ) linkage to dataset progenitors or derived products (cutouts, views, 
composition, compressed images).

How do we organize these features using
a ) the general observation container frame
b ) the concepts of characterization level 4, characterization of 
complex data and Provenance
c ) backward compatibility with the current IVOA char

Let's start from what has been described in the Char DM document and 
interpret it.
Level 4
Level 4 is an a priori estimation of how an input signal will be 
affected by the observation process. Damping and smearing are the main 
effects. Damping of an input signal in the parameter space is 
“sensitivity” and is the basis for the definition of coverage (support 
is where sensitivity is significant). Smearing of the signal directly 
gives what we call resolution. It varies in width but also in shape 
with the position in parameter space.
The level 4 (fine level) description of sampling may be a little more 
cumbersome to define, because everything we have used so far in char is 
defined in terms of a given sampling. Generally the same sampling will 
be used for the data and for the damping or smearing information. But 
in which terms is the sampling information itself to be represented? If 
it is irregular we may have difficulty describing it in its own terms. 
The exact form of the sampling function, if not trivial, will have to be 
described in a continuous mode: in practice, with a much finer sampling.
It is obvious how we can derive support, and then the lower levels, from 
level 4 for a simple dataset. But what about a complex one? In that 
case the inferred support may be made of different subparts, each of 
them with very different ranges of resolution or sampling. We may be 
able to partition our observation into "sub-observations" in that case. 
The really specific part for each sub-observation will be a set of 
coverage + resolution + sampling for each part of the partition.
This naturally introduces the characterisation of complex data as global 
for levels 1 and 2, but for each part of the partition we also need 
something specific going down to level 3, and linking them together. So, 
sampling, resolution and coverage: that is a full IVOA char for each part!

Characterisation of a dataset is not all that we want to know about an 
observation or dataset, because char is a static view of a dataset which 
is actually part of a process of information transformation through the 
observing path and data processing. So for a full knowledge of these 
data, linkage to previous stages of the data process will be necessary. 
The previous stages will again be described in terms of 
characterization. In addition we have to describe the process leading 
from the previous stages to the dataset: this is actually the provenance.

Level 4 has been described above as the damping and smearing of the 
signal by the observing process. Actually this definition fails to show 
that this is very statistical in nature. This is basically what Fabien 
thought to be required for a correct definition of Char.
So we can see char as the probability distribution of observed events 
for a given input signal. Hereafter, and unlike Fabien, I will 
consider the mapping function as a deterministic process outside the 
scope of characterisation. Generally this mapping is well described 
through WCS or WCS-like techniques.
Basically I have input signals coming from the physical, multi-axis 
outside world, and the observing process theoretically maps each of 
them to a given position in the internal grid (contrary to Fabien, as 
far as I understand him, I do think the observing device has internal 
axes). Each signal is made of a set of events (e.g. photons) and we can 
study the distribution of these events around the theoretical mapping 
position. This distribution shows both the damping (by its amplitude) 
and the smearing (by its shape and width) affecting the signal during 
the observing process. So it is both the resolution and the sensitivity 
or variability map for all the axes together.
If we want to address this in practice, what we need is a 
decomposition of the probability distribution along the axes, and into 
the char properties (resolution, sampling and coverage).
We may write the distribution at a given position as the product of an 
amplitude function and a dimensionless "PSF". This will split 
sensitivity/coverage from resolution.
We may also project the distribution onto each of the output axes. 
This will give us the distribution for each axis, as a function of all 
the physical axes. Some variables may vanish if the axes are independent.
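
The split and the projection described above can be sketched numerically. This is only an illustration under stated assumptions: the 3x3 grid of values is invented, and the function names `split_amplitude_and_psf` and `project` are hypothetical, not part of any IVOA model. The amplitude is taken as the total of the distribution and the "PSF" as the shape normalised to unit sum.

```python
# Hypothetical 2D event distribution around a mapped position, sampled on a
# small grid (rows = y axis, columns = x axis). Values are illustrative only.
distribution = [
    [0.02, 0.06, 0.02],
    [0.06, 0.48, 0.06],
    [0.02, 0.06, 0.02],
]


def split_amplitude_and_psf(dist):
    """Return (amplitude, psf) such that dist[i][j] == amplitude * psf[i][j].

    The amplitude is the total of the distribution (damping / sensitivity);
    the PSF is the dimensionless shape normalised to unit sum (smearing).
    """
    amplitude = sum(sum(row) for row in dist)
    psf = [[value / amplitude for value in row] for row in dist]
    return amplitude, psf


def project(dist, axis):
    """Project (marginalise) a 2D distribution onto one output axis.

    axis=0 sums over rows and keeps the column (x) axis;
    axis=1 sums over columns and keeps the row (y) axis.
    """
    if axis == 0:
        return [sum(column) for column in zip(*dist)]
    return [sum(row) for row in dist]


amplitude, psf = split_amplitude_and_psf(distribution)
profile_x = project(psf, axis=0)  # 1D resolution profile along x
profile_y = project(psf, axis=1)  # 1D resolution profile along y
```

Here the amplitude carries the sensitivity/coverage information, while `profile_x` and `profile_y` carry the resolution along each axis.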
TRANSMISSION curve use case
The spectral axis for an image is unsampled.
We need to know the spectral shape of what can be considered as a 
spectral « pixel ». It is a particular case of spectral sampling "level 
4" as discussed above. This description may be considered as constant 
for any position in space, time, polarisation, flux. So what we will 
need in that case is a single function.
Some (like Doug, in private) may argue that this is not part of 
characterisation but is actually a description of the Filter. In fact 
the Filter response is only one of the contributions to the 
Observation response function. In some cases we may have only the 
Filter response, which definitely belongs to Instrumental Provenance. 
This is true, but the description can probably be the same in 
both cases and can be hooked at different places.
Others argue that this belongs to the Photometry model, which is also 
true, but the same remark applies.
It looks very much like a spectrum but with a few restrictions (no 
Target, no location in space, no location in time, etc.). So a 
serialisation of the transmission curve values using the Spectrum data 
model in e.g. VOTABLE would be rather well adapted.

We can hook to such a transmission curve from our level 4 structure, 
with some additional information.
We may say that:
-       It is a simple function of lambda (no map of values, or matrix 
or map of matrices for the spatial axis).
-       It is a Dataset, a parametric function or a set of moments 
(just choose).
-       We may give the URI in the case of a Dataset.
-       We may give the Model/serialisation: here Spectrum / VOTABLE.
-       We give (already in char version 1) additional documentation.
-       What else?
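
The list above can be read as a small record attached to a level 4 property. The following is a minimal sketch, assuming a hypothetical class `Level4Hook` with invented field names and a placeholder URI; none of these names come from the Char DM document.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Description kinds taken from the list above; the spelling is an assumption.
ALLOWED_TYPES = ("dataset", "parametric", "moments")


@dataclass
class Level4Hook:
    """Hypothetical record hooking a level 4 description to a char axis."""
    axis: str                           # e.g. "spectral"
    char_property: str                  # "sampling", "resolution" or "coverage"
    description_type: str               # one of ALLOWED_TYPES
    varies_along: Tuple[str, ...] = ()  # empty tuple: constant on other axes
    uri: Optional[str] = None           # where the Dataset can be retrieved
    model: Optional[str] = None         # e.g. "Spectrum"
    serialisation: Optional[str] = None  # e.g. "VOTABLE"
    documentation: Optional[str] = None

    def problems(self) -> List[str]:
        """Return a list of human-readable inconsistencies (empty if none)."""
        found = []
        if self.description_type not in ALLOWED_TYPES:
            found.append("unknown description type: " + self.description_type)
        if self.description_type == "dataset" and self.uri is None:
            found.append("a Dataset description needs a URI")
        return found


# A transmission curve: a single function of lambda, constant on other axes.
transmission = Level4Hook(
    axis="spectral",
    char_property="sampling",
    description_type="dataset",
    uri="http://example.org/filters/r_band.vot",  # placeholder, not a real URI
    model="Spectrum",
    serialisation="VOTABLE",
)
```

Keeping `varies_along` empty expresses the "simple function of lambda" case; a PSF varying across the field would list the spatial axes there.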

beam profile / psf use case / changing resolution:
Why describe this? To answer such questions as:
Which part of the extended object is observed in the spectrum?
Can we separate merged objects by deconvolution?

What we need is again a function giving the PSF. It can be constant or it can
vary along the spatial, spectral or time axes...
The shape may also be unimportant, with only the PSF FWHM given everywhere.

To describe this we may again use a functional or a matrix view.
The FWHM is a moment of the functional view.
In the serialisation we have to say:
-       the variation along the other axes ("sampling" of the PSF variations)
-       whether it is
        a matrix representation, or
        a functional one (nature, parameters), or
        one based on moments (FWHM + sometimes higher orders)

Common features in the two preceding use cases:
We give a variation domain on the other axes: it may be constant (as in a 
transmission curve or a single PSF across the field) or variable.
We give the description type: map ("matrix"), functional (nature, 
parameters) or moments (including the case with one single moment = 
local sampling period or PSF / LSF FWHM).
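
To make the link between the "matrix" view and the "moments" view concrete, here is a small sketch that reduces a sampled 1D profile to its first moments and then to a width. The conversion from the second moment to a full width at half maximum assumes a Gaussian shape, which is an assumption of this example and not something the discussion above requires; the function names are hypothetical.

```python
import math


def moments(positions, values):
    """Return (total, mean, variance) of a sampled 1D profile."""
    total = sum(values)
    mean = sum(p * v for p, v in zip(positions, values)) / total
    variance = sum(v * (p - mean) ** 2 for p, v in zip(positions, values)) / total
    return total, mean, variance


def gaussian_fwhm(variance):
    """Width at half maximum of a Gaussian with the given variance."""
    return 2.0 * math.sqrt(2.0 * math.log(2.0)) * math.sqrt(variance)


# "Matrix" view: a Gaussian profile with sigma = 2 pixels, sampled per pixel.
sigma = 2.0
positions = list(range(-20, 21))
values = [math.exp(-0.5 * (p / sigma) ** 2) for p in positions]

# "Moments" view of the same profile.
total, centre, variance = moments(positions, values)
fwhm = gaussian_fwhm(variance)
```

For a non-Gaussian beam the same second moment still exists, but the width at half maximum would have to be measured on the profile itself.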
Provenance of complex data: Combination of observations
Besides the Char of the combined observation we need a new "Provenance" box:
-       Pointers towards the members' metadata (e.g. their own char and 
provenance, etc.)
-       We describe the nature of the provenance: here, combination...
-       We describe the algorithm: coaddition, drizzling, statistical fusion, 
etc.
(we may give some parameters, links to weight maps, etc.)
Level 3 or 4 of the combined observation char may sometimes be
inferred from the level 3 or 4 of the individual data.
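
As a hedged illustration of such an inference, the sketch below stores pointers to the members of a combination and derives a resolution for the coadded product. It assumes a linear weighted coaddition of perfectly registered frames whose PSFs share the same centroid; in that case the second moment of the combined PSF is the weighted mean of the individual second moments. The class names, field names and dataset identifiers are invented for the example.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Member:
    """Pointer to one progenitor and the part of its char used here."""
    dataset_id: str    # hypothetical identifier of the member dataset
    psf_sigma: float   # resolution of the member, as an rms width in arcsec
    weight: float      # weight of the member in the coaddition


@dataclass
class CombinationProvenance:
    """Hypothetical "Provenance" box for a combined observation."""
    algorithm: str                       # e.g. "coaddition", "drizzling"
    members: List[Member] = field(default_factory=list)

    def combined_psf_sigma(self) -> float:
        """Rms width of the coadded PSF under the assumptions stated above."""
        total_weight = sum(m.weight for m in self.members)
        variance = sum(m.weight * m.psf_sigma ** 2 for m in self.members)
        return math.sqrt(variance / total_weight)


mosaic = CombinationProvenance(
    algorithm="coaddition",
    members=[
        Member("ivo://example/frame-1", psf_sigma=0.3, weight=1.0),
        Member("ivo://example/frame-2", psf_sigma=0.4, weight=1.0),
    ],
)
sigma_coadd = mosaic.combined_psf_sigma()
```

Other algorithms (drizzling, statistical fusion) would need their own rule, which is why the algorithm name is kept next to the member list.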
*Step 2 Anita's question
How do you see the Observation model being used?

One thing I was thinking of, is that in general, data published to the 
VO should have had the instrumental signature removed; i.e., 
Observation should be for reference only... but of course, sometimes 
you need to recalibrate etc.  Hence you need to find out what was the 
state of the instrument when the data were taken.

I recall that Andreas and Alberto have explained that in the context of 
the ESO archive, and for e.g. the VLA you would want to know what 
configuration the array was in; for ALMA you might want the water 
vapour radiometry records for that day...  I think that this is getting 
too much to model, especially as you will almost certainly be using a 
specialised dedicated package to handle the information.

So I propose that we add somewhere a field for a link or links to 
ObservationalConditions (or a better name), probably under DataID.

Or is this dealt with somewhere else? (my answer in previous posting)
*Step 3 Igor's answer
> How do you see the Observation model being used?
> One thing I was thinking of, is that in general, data published to 
> the VO should have had the instrumental signature removed; i.e., 
> Observation should
This is what everybody's talking about, but this is, unfortunately, an 
idealisation. It cannot be done for real datasets, only for 
simulated ones. One can't, say, "remove the instrumental effects" from 
direct images by increasing the spatial resolution to a 
Delta-function PSF, converting the filter transparency curve into the 
reference one etc. The same applies to spectroscopic data and to 
anything else, meaning that "removal of the instrumental signature" is 
simply unachievable. And I would add that it's absolutely unnecessary.

Therefore, the only solution is to give a thorough description of all 
the instrumental effects in sufficient detail to do science with the 
data. This description can then be applied to the models which are used 
to interpret the observations. As far as I know, this is very common in 
X-ray and Gamma-ray observations, where one has to apply the response 
function to the model and not ``remove'' it from the data.
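
The idea of applying the response to the model, rather than removing it from the data, can be sketched in a few lines. This is a toy 1D example with an invented three-sample response and a point-source model; real X-ray or Gamma-ray response matrices are far richer, and the function names here are hypothetical.

```python
def convolve_same(model, kernel):
    """Discrete convolution of model with an odd-length, centred kernel.

    Returns as many samples as the model; samples outside the model are
    treated as zero.
    """
    half = len(kernel) // 2
    predicted = []
    for i in range(len(model)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i - (k - half)
            if 0 <= j < len(model):
                acc += model[j] * weight
        predicted.append(acc)
    return predicted


def chi_square(observed, predicted, sigma):
    """Compare the data with the model folded through the response."""
    return sum(((o - p) / sigma) ** 2 for o, p in zip(observed, predicted))


model = [0.0, 0.0, 1.0, 0.0, 0.0]   # candidate model: a point source
response = [0.25, 0.5, 0.25]        # instrumental smearing (sums to 1)
observed = [0.0, 0.25, 0.5, 0.25, 0.0]

predicted = convolve_same(model, response)
fit_quality = chi_square(observed, predicted, sigma=0.05)
```

The observed data are never deconvolved; only the model is folded through the description of the instrument before the comparison.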

> be for reference only... but of course, sometimes you need to 
> recalibrate etc.  Hence you need to find out what was the state of 
> the instrument when the data were taken.
Therefore, I don't think that your conclusion about "the reference 
only" is correct. The "observation" metadata is absolutely required for 
the data analysis.

> I recall that Andreas and Alberto have explained that in the context 
> of the ESO archive, and for e.g. the VLA you would want to know what 
> configuration the array was in; for ALMA you might want the water 
> vapour radiometry records for that day...  I think that this is 
> getting too much to model, especially as you will almost certainly be 
> using a specialised dedicated package to handle the information.
Most of the things you're mentioning here belong to the "provenance". 
However, there are other things which one should be able to learn from 
it. For example, what was the proposal (link to its abstract, perhaps) 
and who was the PI, what instrument was used, how the data were reduced 
etc. These things go to components other than Char or Prov.
*Step 4 Anita to Igor
*Step 5 Fabien's comments
>> ... meaning that "removal of the instrumental signature" is simply 
>> unachievable. And I would add that it's absolutely unnecessary.
>> Therefore, the only solution is to give the thorough description of 
>> all the instrumental effects in sufficient details to make science 
>> with the data.
> I disagree - that is _not_ true in a practical sense for many 
> instruments - a lot of us are working very hard to provide 
> science-ready data, or data ready for customisation using generic 
> packages! Even where it is the case, it is not the role of the VO to 
> _do_ the data reduction, but to enable it.

I think the problem is again to define what we are talking about. I 
agree with Igor that if we speak about the observed dataset itself, the 
instrumental signature can never be removed. By definition the observed 
data lie in their own data space made of pixels. And you can never take 
them out of this space, whatever operations you perform.
However, the characterization of the dataset, i.e. the description of 
the mapping that projects the information it contains back into the 
real world (the real world is defined here by the char axes), should 
be given with as much detail as possible. When you remove the flat 
field from an image, you just simplify this mapping by precomputing a 
subtraction operation (but it adds some new noise to the mapping).
*Step 6 Anita to Fabien
> I think the problem is again to define what we are talking about. I 
> agree with Igor that if we speak about the observed dataset itself, 
> the instrumental signature can never be removed. By definition the 
> observed data lie in their own data space made of pixels. And you can 
> never take them out of this space, whatever operations you perform.

I take it that you mean 'pixels' as a shorthand for sampling of various 
kinds. Thanks for the good point that we need to define terms, which I 
failed to do - sorry!

To most astronomers, 'removing the instrumental signature' does not 
mean acquiring a perfect reproduction of the sky - even if the spatial 
domain was fully sampled you would be unlikely to cover the entire e-m 
spectrum... What it does mean, is correcting the data for 
instrument-specific artefacts to the point of diminishing returns (as 
in Fabien's example of flat-fielding).  What that point is, depends 
both on the data and on the purpose for which it is to be used.

> However, the description of characterization of the dataset, i.e. the 
> description of mapping back projecting the information it contains 
> into the real world (the real world is defined here by the char axis) 
> should be given with as much detail as possible.

I'd reword that - as much detail as necessary, with links to all the 
detail available. We must avoid making models intimidating.
*Step 7 Juan de Dios comments
> One thing I was thinking of, is that in general, data published to 
> the VO should have had the instrumental signature removed; i.e., 
> Observation should be for reference only... but of course, sometimes 
> you need to recalibrate etc.  Hence you need to find out what was the 
> state of the instrument when the data were taken.

First, I'll take Anita's "instrument signature removal" as "best 
reduced data we can provide giving a sampled physical measurement".

Second, I think we should think about what we want to do with the VO.
In this regard, we need first a data selection part, where Registry and 
Characterisation entries are the main filtering point for datasets.

Once datasets have been selected, and possibly downloaded, or queued 
for download, we need to explore Provenance metadata to know about 
observing conditions. But many times those observing conditions and 
configuration affect things like seeing, or UV coverage, translating 
into the characterisation, or are reflected in additional dampening 
factors (i.e., excessive airmasses at low elevations) or instrument 
settings which are known not to be reliable, so that they should have 
been

If working with a small number of datasets, one might want to access 
observing logs for particular observations which are found by the 

> I recall that Andreas and Alberto have explained that in the context 
> of the ESO archive, and for e.g. the VLA you would want to know what 
> configuration the array was in; for ALMA you might want the water 
> vapour radiometry records for that day... I think that this is 
> getting too much to model, especially as you will almost certainly be 
> using a specialised dedicated package to handle the information.

I'm starting to feel that we need to generalise, somehow, some of the 
Provenance procedures. But I fear that this turns it even more into 
"meta-programming", dealing with many of these exceptions.

I'm also starting to believe that we need to have a metric of what a 
good or a bad observation is, with Provenance aiding in finding out 
what went wrong for particularly faulty observations, or for 
medium-grade observations that a pipeline marked as bad, but a trained 
human can treat better.
*Step 8 Anita answers Juan de Dios

> I'm also starting to believe that we need to have a metric of what a 
> good or a bad observation is, with Provenance aiding in finding out 
> what went wrong for particularly faulty observations, or for 
> medium-grade observations that a pipeline marked as bad, but a 
> trained human can treat better.
Yes.  We already have the Registry (self) grades of calibration status
Partly calibrated

And we have the provisions in Char for errors, but what is acceptable 
or typical varies enormously from instrument to instrument, or more 
finely.

So perhaps we need to include the Registry grades in Char (as 
consistently as possible)
plus something like
Self-assessment of data quality:


but I think that the interpretation would have to be up to the user. 
Your example is a good one; pipelining good-quality data may give 
immediately usable results, whilst data affected by weather might need 
a human touch.

I think that it is a VO principle that we do not judge data, so we 
would have to provide guidelines but ultimately leave it up to 
providers to decide whether e.g. radio data which have been pipelined 
but may contain RFI are
Science-ready but Low quality
Partly calibrated but High quality
*Step 9  Igor's comments

So, Anita and Juan de Dios (and perhaps others) -- could you answer me 
a couple of questions about the quality/level of data reduction? I just 
want to understand what exactly you mean by "removing instrument 
signature".

What is, in your opinion, the grade of HST ACS data (direct images) 
provided by the Hubble Legacy Archive? Is it "science ready" or "partly 
calibrated"? On one side, this is one of the best examples of high 
quality data reduction, I'd even say almost the highest possible. On 
the other hand you have: a) variations of the PSF across the FoV due to 
distortions introduced by the HST optics; b) a filter response which 
does not correspond 100% to the photometric standards (Johnson/Gunn 
SDSS/etc.). These two points are ACS-specific so, strictly speaking, 
the data do not have "instrument signature removed".

The same questions apply to SDSS spectra. They are in absolute flux 
units and wavelength calibrated. However, there is information about 
changes of the spectral resolution along the wavelength which is 
SDSS-specific. Moreover, the spectra are obtained through fibers with 
round apertures, and the aperture filling depends on the (varying) 
seeing conditions, the quality of centering (i.e. if there is an error 
in the fiber position there is flux loss) etc.
*Step 10 Anita answers Igor
And in some cases at least, you have dodgy astrometry

As the discussion moved on, the question is not 'are these data 
perfect', but 'are these data as good as possible for most purposes' or 
'are these data good enough to improve if necessary with non-dedicated 
tools'.  We have to assume that astronomers use critical judgement.  I 
usually use HST images to compare morphology, make identifications etc. 
and so I am not bothered by small photometry issues, but I am bothered 
by astrometry - but that can be fixed using Aladin, GAIA, or many other 
tools depending on what I am used to and the method I want to use. It 
would be useful to have an estimate of the systematic astrometric error 
in the Char description, but even if that level of detail is not 
available the argument is the same - the data are fit for a 
non-specialist with generic astronomical tools.

This is in contrast to data which are presented in, say, raw counts.
*Step 11 Juan de Dios goes on
And for some kinds of work, even having the raw counts might be 
useful... but astronomers need to know what they are dealing with. 
Sometimes we might even use tools which know nothing about astrometry, 
but which are good at making sense, fast, of data (say ImageJ, OsiriX, 
and other "medical-astronomy" tools).
*Step 12 Juan de Dios answers Igor
> What is, in your opinion, the grade of HST ACS data (direct images) 
> provided by the Hubble Legacy Archive? Is it "science ready" or 
> "partly calibrated"? On one side, this is one of the best examples of 
> high quality data reduction, I'd even say almost the highest 
> possible. On the other hand you have: a) variations of the PSF across 
> the FoV due to distortions introduced by the HST optics; b) filter 
> response which does not correspond 100% to the photometric standards 
> (Johnson/Gunn SDSS/etc.). These two points are ACS-specific so, 
> strictly speaking, the data do not have "instrument signature removed"

We are always talking about degrees here: first, you have an image 
characterisation at the archive level. I think the non-conformance to 
photometric standards (after all, a filter response; even for 
photometric standards we should, at least, be able to get that filter 
response) should go in the archive, if it is observation-independent. 
We might get, of course, into the realm of non-linearities, but I think 
those should be left to the specific data reduction, but with good 
characterisation of each particular image/spectrum/data cube/single 
point flux measurement...

Perhaps the problem is that we implicitly think "the VO will allow you 
to use the same reduction tools with all data" (it feels like that 
could be an ultimate, utopian goal, and I thought of the VO that way at 
the beginning), but we should make the "the VO will allow you to get 
either reduced data, which you might use right away depending on your 
application, or raw data, which everyone will need to reduce" use cases 
really easy.

The value of the VO in this latter case would be that
   a) you get extra metadata for refining your query (that is, outside  
the file,
      both at the service and dataset level)
   b) you get extra metadata pointing to the data reduction processes  
      so that you know what has not been performed.
> The same questions will apply to SDSS spectra. Although they are in 
> absolute flux units, wavelength calibrated. However, there is 
> information about changes of the spectral resolution along the 
> wavelength which is SDSS-specific. Moreover, the spectra are obtained 
> through the fibers with round apertures, the aperture filling depends 
> on the (varying) seeing conditions, quality of centering (i.e. if 
> there is an error in the fiber position there is flux loss) etc.

Surveys create so many different data products that each of them has 
its own share of problems to be solved... but ultimately you (an 
astronomer working on the SDSS pipeline) are able to take a given table 
row, or row set, and trace it back to the originating images, 
calibrated images, etc. That power should also be made available 
outside the "sausage machine" (as Robert Lupton calls survey workflows 
;-)).

Thus, for the highest quality data providers, this provenance 
information should be available... and made public. And still there 
will be hidden information, encoded in the form of processing rules in 
specific data reduction packages, that should be made explicit either 
by pointing to the tool, or to the mechanism it uses.

Again, a rough data quality assessment mark should be given _a 
priori_, from a checklist of things provided (this is not a complete 
list by any means, just something off the top of my head, to get a 
discussion started... or stopped ;-)):

- Registry entry Char level
- Dataset entries max Char level
- Provenance
   - Calibration and other corrections
     - Raw data for calibration
   - Weather
   - Observing annotations
   - Automatic quality assessment in provenance

Perhaps it is not the mark that we need, but whether or not these 
things can be obtained for a given dataset (collection).
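
A minimal sketch of that last idea, reporting what is obtainable rather than computing a mark, could look as follows. The key names are a direct, hypothetical transcription of the checklist above, and the example record and its URL are invented.

```python
# Hypothetical keys, one per item of the checklist above.
CHECKLIST = (
    "registry_char_level",
    "dataset_max_char_level",
    "calibration",
    "calibration_raw_data",
    "weather",
    "observing_annotations",
    "automatic_quality_assessment",
)


def availability_report(metadata):
    """Split the checklist into (available, missing) for one dataset.

    An item counts as available when the metadata record carries a
    non-None value for it; no quality judgement is made.
    """
    available = [key for key in CHECKLIST if metadata.get(key) is not None]
    missing = [key for key in CHECKLIST if metadata.get(key) is None]
    return available, missing


# Invented example: a dataset with char levels and calibration links only.
record = {
    "registry_char_level": 2,
    "dataset_max_char_level": 3,
    "calibration": "http://example.org/calib/1234",  # placeholder link
}
available, missing = availability_report(record)
```

The interpretation of the two lists is left to the user, in line with the principle that the VO does not judge data.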

ps. My computer just selected this quote from John Gall, that we  
should all take into

    John Gall: A complex system that works is invariably found to
    have evolved from a simple system that worked. A complex system
    designed from scratch never works and cannot be made to work.
    You have to start over, beginning with a working simple system.
*Step 13 FB example for Provenance
     My example is my lawyer and I only speak if my lawyer is in the room.

     This is a very preliminary example for the provenance of, let's say,
a "CFHTLS mosaic".

     Where is it coming from:
             - The IVOA observation Note for the basic structure (2004-05-16)
             (ObservingConfig and Elements + Processing, 
ProcessingStage, etc...)
             - a rather simple and general description.
             - The need for a generic framework with various possibilities
      to hook standardized or project Specific metadata and documentation.
                eg : filter transmission curve (IVOA standard)
                     algorithm metadata (project specific)
             - need to access progenitor and associated data or their metadata
             - Our top priority use cases : Filter transmission curve,
Mosaic progenitors, confidence map ...

             - I had a quick look at Juan's and Fabien's examples but did not
integrate much from them for now; it is surely possible.

       Do you like it ? ( ;-) )

<?xml version="1.0" encoding="UTF-8"?>
                stc description
                      <type>weight map</type>
*Step 14 Anita's answer
Witness statement:


How do we handle
1) Space observatories
2) Arrays

What information does a data user need (which is different from the 
requirements for
planning observations)?

Someone else probably knows what is customary for satellites and other space

The most basic level of detail should be just an indication that it is 
a space-based

2) Interferometry arrays

The basic level should be the nominal centre of the array (a bit hard 
to define for global VLBI but one could make a guess).

For global VLBI plus one (or more) orbiting antennas, would the best 
solution be a
nominal position for the array _plus_ space-based observatory?

For detailed data reduction of visibility data, an antenna file is 
almost invariably attached to the data (or occasionally available via a 
separate link, for some VLBI) in a format which the required data 
reduction packages can understand or interconvert - thus I cannot see 
any need to convert the positions of, potentially, > 60 antennas into 
STC.  In some cases e.g. MERLIN, a static link to a table of antenna 
positions might be useful. In other cases e.g. VLA, there might be a 
link to the best-known positions which can be an update on the 
positions in the antenna table.

So, one requirement is the option of a link to antenna position 
information.  In the case
where that is provided in a single file for the whole array (or at 
least, some form other
than per-telescope) that needs to come here?

Should there be the option of a top level for 'array', with the longest 
baseline as the
diameter, and then optional details of the individual dishes?
The detail might be name, size and location e.g. for Global VLBI, where 
all the dishes
are different and different combinations are used.
In the case of visibility data the locations would be redundant but at 
least a link would
be useful for information for images (although of no relevance to 
reprocessing FITS
images without going back to visibility data)
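
The 'array' top level suggested above could in principle be derived mechanically from an antenna table. A minimal sketch in Python, assuming antenna positions are available as Cartesian (x, y, z) coordinates in metres; the antenna names and coordinates are invented for illustration, and `array_summary` is a hypothetical helper, not part of any Char or STC schema:

```python
from itertools import combinations
from math import dist

def array_summary(antennas):
    """Return (nominal centre, longest baseline) for a dict name -> (x, y, z) in metres.

    Needs at least two antennas, otherwise there is no baseline to measure.
    """
    positions = list(antennas.values())
    n = len(positions)
    # Nominal centre: plain average of the antenna coordinates.
    centre = tuple(sum(p[i] for p in positions) / n for i in range(3))
    # "Diameter" of the array: the longest antenna-to-antenna separation.
    longest = max(dist(a, b) for a, b in combinations(positions, 2))
    return centre, longest

# Hypothetical three-antenna array (coordinates invented for the example).
antennas = {
    "A1": (0.0, 0.0, 0.0),
    "A2": (3000.0, 4000.0, 0.0),
    "A3": (0.0, 4000.0, 0.0),
}
centre, diameter = array_summary(antennas)
print(centre, diameter)
```

The per-dish details (name, size, location) would stay optional and could equally be replaced by a link to the antenna file.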

However for something like ALMA, 49x12 m  8x10 m (or whatever) might be 
enough, with the
positions in the antenna file.

Grating, Filter etc.
Should these be sub elements of a SpectralConfig (or something) 
element, which could also
for interferometry

Does AssociatedData also cover calibration sources?
*Step 15 Fabien to Anita
> observatoryLocation>
> How do we handle
> 1) Space observatories
> 2) Arrays
> What information does a data user need (which is different from the 
> requirements for
> planning observations)?
> Someone else probably knows what is customary for satellites and other space
> observatories.
> The most basic level of detail should be just an indication that it 
> is a space-based
> observatory.
I think the observatory location should be managed just like any other char 
axis. It is a
3D axis on which are defined the central position, a bounding box (a 3D 
volume), etc.

This would also manage the arrays in an elegant way.
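
As an illustration of treating the observatory location like any other char axis, the following Python sketch derives a central position and a 3D bounding box from a set of positions. The class and field names (`LocationAxis`, `centre`, `lower`, `upper`) are assumptions made for the example, not names from the Characterization schema, and the coordinates are invented:

```python
from dataclasses import dataclass

@dataclass
class LocationAxis:
    centre: tuple   # central position (x, y, z)
    lower: tuple    # bounding-box corner with minimum coordinates
    upper: tuple    # bounding-box corner with maximum coordinates

def characterize_location(points):
    """Build a coarse location description from a list of (x, y, z) positions."""
    lower = tuple(min(p[i] for p in points) for i in range(3))
    upper = tuple(max(p[i] for p in points) for i in range(3))
    # Use the middle of the box as the central position.
    centre = tuple((lo + hi) / 2 for lo, hi in zip(lower, upper))
    return LocationAxis(centre, lower, upper)

# A single telescope collapses to a zero-volume box; an array gives a real volume.
single = characterize_location([(10.0, 20.0, 30.0)])
array = characterize_location([(0.0, 0.0, 0.0), (100.0, -50.0, 10.0), (40.0, 80.0, 5.0)])
print(single)
print(array)
```

The same function covers a single dish and an array, which is the sense in which the description is generic.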
*Step 16 Arnold answer

So, that's what ObservatoryLocation does; you can just include it.
For space-based observatories it would be an orbit ephemeris file,
typically a table of state vectors: time, x, y, z, vx, vy, vz.
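
A table of state vectors of this kind could be handled roughly as follows. This is a sketch in Python under stated assumptions: rows are sorted by time, there are at least two rows, and the position is interpolated linearly between rows (a real orbit would need a better interpolation scheme). The type `StateVector`, the function `position_at` and all numbers are invented for the example:

```python
from bisect import bisect_right
from typing import NamedTuple

class StateVector(NamedTuple):
    time: float  # seconds since some epoch (assumed)
    x: float
    y: float
    z: float
    vx: float
    vy: float
    vz: float

def position_at(ephemeris, t):
    """Linearly interpolate (x, y, z) at time t from rows sorted by time."""
    times = [row.time for row in ephemeris]
    if not times[0] <= t <= times[-1]:
        raise ValueError("time outside ephemeris range")
    # Index of the last row whose time is <= t, clamped so that i + 1 exists.
    i = min(bisect_right(times, t) - 1, len(ephemeris) - 2)
    a, b = ephemeris[i], ephemeris[i + 1]
    f = (t - a.time) / (b.time - a.time)
    return tuple(pa + f * (pb - pa) for pa, pb in zip(a[1:4], b[1:4]))

# Two invented rows: time, x, y, z, vx, vy, vz.
ephemeris = [
    StateVector(0.0, 7000.0, 0.0, 0.0, 0.0, 7.5, 0.0),
    StateVector(60.0, 6990.0, 450.0, 0.0, -0.5, 7.4, 0.0),
]
print(position_at(ephemeris, 30.0))
```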
*Step 17 Anita to Arnold
A 3-D STC cube enclosing the whole array would be a neat way to give 
the coarsest-level
description of a terrestrial array, or even one with an orbiting 
antenna.  However... it
is not the 'natural' way to describe it, in the sense that it is not 
what any known
software would expect, and it does not necessarily give any useful 
information.  It is
certainly an option, but it is not worth forcing such a description if 
it would ever be
used.

For visibility data, only an antenna file in one of a few formats is 
really useful
directly; otherwise a link is probably as good as it gets. It often 
would be useful, even
for images, to have a reference to the antennas used by name (if not a 
full description
of individual size, location etc - however, as I said before, that is 
not always the best
thing to do because the positions may be corrected - hence a link is better).

For space-based data, Arnold is right of course for situations where 
you want (or might
want) to re-reduce the data using the ephemeris, or some part of the 
information.  But
what about, say, CADC associations - or some other highly processed 
product, which cannot
be deconstructed into individual pointings? Just a reference, I suppose.
*Step 18 Fabien to Anita
The problem is that we want to find a generic way of describing the 
observer location,
just the same way we want to describe the other axes.
For software that wants to manage observations in a generic way, 
implementing a
different approach for every case is out of the question.
I believe this is precisely the challenge of the Char WG to find what 
is the universal
way of describing this information.

Giving a 3D bounding box is not enough for re-reducing the data, but it is an
instrument-independent way to give the level 2 char. If you want a more 
detailed description, it is part of level 4 and it is then a different topic.
> PS I am labouring this point because we must not make it harder to 
> publish data than is
> absolutely necessary!
That is what I have been trying to explain since the beginning of the char discussion!!
*Step 19 Arnold again
Arrays are not a problem for ObservatoryLocation:
It accommodates any number of coordinate pairs and each can be labeled
with its (antenna) name.
One should remember, though, to also provide the array's origin or
reference position, since that's what the phase center and timing are
referenced to.

Positions are labeled with uncertainties. This can be used to cover
multiple pointings or data where the exact position has been lost
(e.g., within 8000 km of the geocenter).
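
The labelling and uncertainty ideas can be illustrated with a small Python sketch. The class and field names below are hypothetical and do not reproduce the STC ObservatoryLocation schema; positions are taken to be geocentric Cartesian coordinates in km, and all values are invented:

```python
from dataclasses import dataclass, field
from math import dist

@dataclass
class Position:
    name: str
    xyz: tuple            # geocentric Cartesian coordinates in km (assumed frame)
    uncertainty: float    # radius in km within which the true position lies

    def is_consistent_with(self, point):
        """True if `point` lies within the uncertainty radius of this position."""
        return dist(self.xyz, point) <= self.uncertainty

@dataclass
class ArrayLocation:
    reference: Position                       # array origin / reference position
    antennas: list = field(default_factory=list)

# Well-known positions carry small uncertainties ...
array = ArrayLocation(
    reference=Position("origin", (3800.0, -150.0, 5100.0), 0.001),
    antennas=[
        Position("ANT1", (3800.0, -150.0, 5100.0), 0.001),
        Position("ANT2", (3820.0, -140.0, 5085.0), 0.001),
    ],
)
# ... while a lost position can still be stated as "within 8000 km of the geocentre".
lost = Position("unknown", (0.0, 0.0, 0.0), 8000.0)
print(lost.is_consistent_with(array.reference.xyz))
```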
*Step 20 Anita to Arnold and Fabien
Yes, I agree that the locations of antennas in arrays *can* be 
described by a large 3D
bounding box and by a series of coordinates, as you point out.  The 
question I am asking,
is what is the use?  A token location for relatively compact arrays 
would be useful, to
give an idea of the horizon, but apart from that, I cannot imagine 
anyone searching on
the basis of antenna location - baseline length, yes, but that is in a 
different model.
It will not be in a format of direct use for data reduction and antenna 
positions with
errors of more than a small fraction of the observing wavelength are 
worse than useless
for data reduction.  For visibility data, a correct antenna table 
should come with the

In the case of an image, the antenna positions alone are not much use 
for reconstructing
the range of spacings in the data, either, since you also need to know 
how long the
source was observed for.  Char will be _very_ useful for providing that 
sort of
information, since it does not normally come 'inside' an image (only 
inside visibility

So I still feel that in terms of usefulness, a token position and a 
reference to a web
location for the appropriate detailed information will usually be the 
best solution.
(Actually, MERLIN is probably the easiest instrument to conform to 
Arnold's model since
we have relatively few, fixed antennas, and a small number of standard 
But that will be up to other interferometry data providers...
Maybe I should stop wittering here and do just that, e.g. have a look 
at the ALMA data

Anyway, the point is that we have got to keep use in mind, not aesthetic 
completeness.
*Step 21 Fabien answers

> Yes, I agree that the locations of antennas in arrays *can* be 
> described by a large 3D
> bounding box and by a series of coordinates, as you point out.  The 
> question I am
> asking, is what is the use?  A token location for relatively compact 
> arrays would be
> useful, to give an idea of the horizon, but apart from that, I cannot 
> imagine anyone
> searching on the basis of antenna location - baseline length, yes, 
> but that is in a
> different model.

If you make imaging observations of a solar system object, you would be 
very interested
in the position of the telescope in the solar system (especially if 
it's on a satellite)
to interpret the parallax. My 3D bounding box example is generic to 
handle those cases.
This information is just a level 2 presentation of the antenna table. 
Because you need
more detailed meta data, then you should in principle go to a level 4 
However, a device independent description is I guess currently out of 
reach for Char
definition. Therefore I fully agree that for the moment a reference to 
a web location for
the appropriate detailed information will usually be the best solution. 
But it has to be
clearly understood that this cannot really be thought of as being part of the
Characterization model because it is just not standard and generic. 
This would just be
extra meta-data coming along with char in the Observation instance file.

I really think the solution for having something robust and working 
well (and quickly) is
to standardize only descriptors that are really relevant for all 
observations in a
generic way. All the rest should be thought of as specific 
archive-dependent meta-data.
Those are very important meta-data as well, and our serialization 
should allow to include

> Anyway, the point is that we have got to keep use in mind, not 
> aethetic >completeness.

I agree with that, however, for a developer, implementing an unclear or 
standard is worse than having no standard at all. E.g. I am still waiting for the
characterization group to define formally and precisely what bounds, 
resolution and error
really mean. Francois asked yesterday in a slide what the resolution 
for polarization
axis means. This is an excellent question, and I believe that until it 
is properly
answered, there is a risk of defining broken standards.

My suggestion is that in the char2 document we should take all the axes 
descriptors 1 by
1, and try to define them thoroughly. We could maybe set up a wiki page 
structuring the
discussion on that.
*Step 22 Juan de Dios
> I am labouring this point because we must not make it harder to 
> publish data than is
> absolutely necessary!

Let's take this as a motto for the whole VO ;-)
> A 3-D STC cube enclosing the whole array would be a neat way to give 
> the coarsest-level
> description of a terrestrial array, or even one with an orbiting 
> antenna. However... it
> is not the 'natural' way to describe it, in the sense that it is not 
> what any known
> software would expect, and it does not necessarily give any useful 
> information.  It is
> certainly an option, but it is not worth forcing such a description 
> if it would ever be
> used.
I think we need to be more pragmatic: for compact arrays (even for  
ALMA), a central
position might be enough. After all, I don't think  you will be 
querying the VO for
observatories based on their  location... or at least not right now. 
So, for compact
arrays we have  a level 1 Char position of a single point.

What do we do, then, for VLBI, VLBA, or even wider arrays? I think it  
should be fair to
say there is _no_ level 1 char location, while the  bounds should be an 
STC region. And I
think we should think a little  bit about being able to express "I 
don't know" in more VO
*Step 23 Arnold on referencing
Referencing is no problem, either, since it is provided in the STC
standard. In the document you will find at least one example of an
ObservatoryLocation being referenced through Xlink - the same will
work for parts of the OL: the array reference center could be present,
the individual antennae Xlink-ed in.
*Step 24 An example from FB for characterization 2 information
In my example you will find a modified characterization document 
supposed to be
consistent with characterization version 2 model and schema ( and don't 
look for it, it
doesn't exist yet !!!)

     It tackles several use cases

           a ) complex data. That is where we have segments or parts in 
our observation
(eg the HST WFPC2 case with 4 sub-images with different samplings and 
sizes). I
introduced only three new tags in char to manage that.
            ---> globalChar to encapsulate a global raw level 
characterization of the
whole dataset.
            ---> the segment tag to encapsulate a fine level detailed 
char of a subpart.
            ---> the number tag, which identifies each segment.

           b ) spectral Response of a 2D image / PSF use case

               I modified our level 4 variationMap and resolutionMap or
samplingPrecisionMap to contain actual <Map> elements.
               I addressed the spectralResponse use case (something very similar
to the transmission curve in nature, BUT SPECIFIC to this observation).
               I proposed to write it in three ways:
                    - either by giving the link to the curve in a spectrumDM
consistent table
                    - or by giving directly in the xml the list of 
moments representing
the curve.
                    - or by giving a functional/parametric 
representation of the

                I didn't do it in the example, but something rather 
similar could
be done for resolution variation maps and psf
               Of course a link to additional documentation is always possible.
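
Of the three representations listed for the spectral response, the list of moments is easy to derive from a sampled curve. The Python sketch below assumes the response is sampled at known wavelengths and uses trapezoidal integration; taking the "moments" to be the integral, the mean wavelength and the rms width is an assumption about what would be stored, not something fixed by the example document. The function names and the sample data are invented:

```python
from math import sqrt

def trapezoid(xs, ys):
    """Integrate y(x) over sampled points with the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))

def response_moments(wavelengths, response):
    """Return (integral, mean wavelength, rms width) of a sampled response curve."""
    area = trapezoid(wavelengths, response)
    mean = trapezoid(wavelengths, [w * r for w, r in zip(wavelengths, response)]) / area
    var = trapezoid(wavelengths,
                    [(w - mean) ** 2 * r for w, r in zip(wavelengths, response)]) / area
    return area, mean, sqrt(var)

# Invented, symmetric response curve sampled every 10 nm.
wavelengths = [500.0, 510.0, 520.0, 530.0, 540.0]
response = [0.0, 0.5, 1.0, 0.5, 0.0]
area, mean, width = response_moments(wavelengths, response)
print(area, mean, width)
```

A short list of such numbers could sit directly in the XML, while the full sampled curve would stay in the linked table.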

<?xml version="1.0" encoding="UTF-8"?>
<characterization xmlns:xsi="" 
                     <coordsystem id="TT-ICRS-TOPO" xlink:type="simple" 
                       <coord coord_system_id="TT-ICRS-TOPO">
                       <limits coord_system_id="TT-ICRS-TOPO">

                     <unit> none</unit>
<!-- none unit is for ISO-8601 format -->
                     <coordsystem idref="TT-ICRS-TOPO" />
                       <coord coordsystem_id="TT-ICRS-TOPO">

                    <coordsystem idref="TT-ICRS-TOPO"/>

                       <coord coord_system_id="TT-ICRS-TOPO">

                     <coordsystem id="TT-ICRS-TOPO" xlink:type="simple" 

                       <coord coord_system_id="TT-ICRS-TOPO">
                       <limits coord_system_id="TT-ICRS-TOPO">
                     <unit> deg </unit>
                     <coordsystem id="TT-ICRS-TOPO" xlink:type="simple" 
                       <coord coord_system_id="TT-ICRS-TOPO">
                       <limits coord_system_id="TT-ICRS-TOPO">
                     <unit> deg </unit>
                     <coordsystem id="TT-ICRS-TOPO" xlink:type="simple" 

                       <coord coord_system_id="TT-ICRS-TOPO">
                       <limits coord_system_id="TT-ICRS-TOPO">
                     <unit> deg </unit>
                     <coordsystem id="TT-ICRS-TOPO" xlink:type="simple" 

                       <coord coord_system_id="TT-ICRS-TOPO">
                       <limits coord_system_id="TT-ICRS-TOPO">
                     <unit> deg </unit>
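
The relation between the per-segment characterization and the global one (use case a above) can be sketched in Python. This is only an illustration, assuming each segment carries 2D bounds and a pixel sampling; the class and key names loosely mirror the proposed tags but are hypothetical, and the numbers are invented and are not actual WFPC2 values:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    number: int        # identifies the segment, like the number tag
    bounds: tuple      # (xmin, xmax, ymin, ymax) in degrees
    sampling: float    # pixel size in arcsec

def global_char(segments):
    """Coarse description of the whole dataset derived from its segments."""
    return {
        # Global bounds: the smallest box enclosing every segment.
        "bounds": (
            min(s.bounds[0] for s in segments),
            max(s.bounds[1] for s in segments),
            min(s.bounds[2] for s in segments),
            max(s.bounds[3] for s in segments),
        ),
        # Sampling differs between segments, so report its range.
        "sampling_range": (
            min(s.sampling for s in segments),
            max(s.sampling for s in segments),
        ),
        "segments": [s.number for s in segments],
    }

segments = [
    Segment(1, (10.00, 10.01, 20.00, 20.01), 0.05),
    Segment(2, (10.01, 10.03, 20.00, 20.02), 0.10),
    Segment(3, (9.99, 10.01, 20.01, 20.03), 0.10),
    Segment(4, (10.01, 10.03, 19.98, 20.00), 0.10),
]
print(global_char(segments))
```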


*Step 24 Igor's comment
Francois -- very interesting example. What I'm not happy with is a 
dependence. Where does the syntax come from? Won't it be wiser to use 
some existing
mathematical XML solutions for this?

> You are probably right. Can you show what it would look like?

I have no time to prepare an example, but you may want to take a look here:
