SSA working draft

Doug Tody dtody at nrao.edu
Mon Nov 20 11:50:00 PST 2006


Hi Inga -

Thanks for the timely comments.

> I think it's not necessary to state how a dataset was created in the SSA
> paper. The SSA should not get into the details about how a datacenter may
> choose to serve the datasets. It's up to the datacenters to do that and I
> guess there will be as many different ways as there are datacenters.

The CreationType refers to what the service does to create the data
returned to the client, defined relative to the DataSource.  This is
necessary to describe virtual data, and is a function of the service,
not what the data center does to produce a data product (that is the
more general data provenance problem and would be described elsewhere).

> Same holds for 1.1 where the paper explicitly says that dynamically
> created datasets are the way a service will respond. I don't think this is
> true. It should be left to the discretion of the service provider to
> decide if he wants to deliver dynamically created or static or some
> mixture of files.
>
> 2.1, 3.3.2.5 are other examples where this comes up.

If the service generates cutouts, filters the data, does spectral
extraction, etc., this can *only* be done at access time, because
these are intrinsically on-demand operations driven by parameters
supplied by the client at access time.

If you are referring to pre-generating SSA compliant versions
of archival spectra, e.g., for Echelle data, then you probably
have DataSource=pointed and CreationType=archival.  As you say,
in principle we don't know in this case if the data product was
pre-generated and cached.
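
For concreteness, here is a minimal sketch of how those two fields might pair up in a query-response record, one for the pre-generated archival case and one for an on-demand cutout (the record layout is illustrative, not the draft's serialization):

```python
# Illustrative query-response metadata: the same DataSource can pair
# with different CreationType values depending on what the service
# does at access time.
archival_record = {
    "DataSource": "pointed",     # data from a pointed observation
    "CreationType": "archival",  # served as archived; may be pre-generated and cached
}

cutout_record = {
    "DataSource": "pointed",
    "CreationType": "cutout",    # generated on demand from a parent dataset
}
```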



> 1.4:
>
> What is the difference between MUST and REQUIRED? It is not explained
> anywhere.

They are the same: mandatory ("required") interface elements are
referred to with MUST.  This is covered in section 1.4, but I agree
it could be worded better and will revise the text.

> 1.4.1:
>
> Why do you want to single one type of response format out? Why not have
> them all equal?

Not sure what you are referring to here; 1.4.1 describes the levels of
compliance of a service, not response formats.

> Isn't there a gap between query compliant and fully compliant?

Yes, they are not the same.

> What if a service did not implement all the "should" elements, but
> still answers with SSA-compliant data?

Then it is minimally compliant.  These are somewhat coarse descriptions,
of course; one would need to look at the detailed service capabilities
to fully understand what is provided.


> 2.3:
>
> How can you have metadata on virtual data? How should we anticipate all
> possible ways a user may ask for spectral cutouts, extraction etc. to be
> prepared to answer?

This is what on-demand data generation and virtual data are all about.
The service describes the metadata of the virtual data product it would
generate.

There are an infinite number of possible virtual data products.
You don't have to describe them all; rather, given what the client
requested, the limitations of your service, and the characteristics
of the data, you describe what the service would generate to best
match the request.

A simple example: if the client requests a certain bandpass range
and you have a cutout service, the virtual data product would be a
spectrum covering only the given wavelength region (or as close as
the service can get, given various other details).

If the query is detailed enough, the query response may refer to a
single data product.  Hence, the query mechanism may be used not only
for data discovery, but to negotiate with the service on the details
of the data product to be generated.
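
As an illustration, a bandpass-constrained query of this sort might be formed as below (the endpoint is hypothetical; REQUEST, POS, SIZE, and BAND follow the draft's query interface):

```python
from urllib.parse import urlencode

# Hypothetical SSA service endpoint; parameter values are examples only.
base = "http://example.org/ssa"
params = {
    "REQUEST": "queryData",
    "POS": "180.0,-30.0",     # ICRS position in decimal degrees
    "SIZE": "0.05",           # search region size in degrees
    "BAND": "5.0e-7/6.0e-7",  # vacuum wavelength range in meters
}
url = base + "?" + urlencode(params)
```

A cutout service receiving this would describe, and on access generate, a virtual spectrum restricted to roughly the 500-600 nm range.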


> 2.4.2:
>
> spectral extraction - you may want to mention grism here as well.

Yes, dynamic extraction of a 1-D spectrum from a grism plate would
be another example of spectral extraction.


> 3.3.2.3:
>
> I thought that BAND is always a string. How can you have then "If a
> bandpass is specified as a string it is..."

BAND is either a numerical bandpass (wavelength in vacuum in meters)
or a bandpass name (unspecified; prior discovery is needed to determine
the possible values).
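
A sketch of the distinction (the parsing logic here is mine, not the draft's; I am assuming the "lo/hi" range syntax for the numeric case):

```python
def parse_band(band: str):
    """Classify an SSA BAND value: either a numeric bandpass
    (vacuum wavelength in meters, e.g. "5e-7/6e-7" or "5e-7")
    or a bandpass name (e.g. "J").  Sketch only; open-ended
    ranges and lists are not handled."""
    try:
        if "/" in band:
            lo, hi = band.split("/", 1)
            return ("numeric", float(lo), float(hi))
        return ("numeric", float(band), float(band))
    except ValueError:
        # Not parseable as a number: treat it as a bandpass name.
        return ("name", band)
```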


> 3.3:
>
> REQ is the abbreviation for recommended and then in the table, you use
> REC. This is inconsistent.

Indeed, there is a typo where it says "...recommended parameters by REQ".
In the column label, "Req" stands for "Required".

> Apertures need not be circular, so you may want to phrase the respective
> sentence differently.

Only circular apertures are currently supported for on-demand spectral
extraction and this should be adequate for point-source or compact
objects (even for grism data).  We could generalize this if needed,
but it complicates the interface.


> 3.3.3.6:
>
> What does a photometric redshift mean in the context of spectra? What is
> it you want to allow the user to do? You confuse them by talking about
> blueshifts and local neighborhood if what you mean is a galactic
> photometric redshift.

I agree, use of the term photometric redshift is confusing here.

The intent here is merely to provide a means to query by an approximate
observed redshift (or blueshift) range for the target object observed.
This is imprecise, but the query can be refined client-side using
the more rigorous and precise metadata which is returned.
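
The client-side refinement is straightforward once the per-record redshift metadata comes back; a sketch (field name and values here are illustrative):

```python
# Hypothetical query-response records carrying a per-target redshift;
# refine a coarse server-side redshift query on the client side.
records = [
    {"Target.Redshift": 0.021},
    {"Target.Redshift": 0.154},
    {"Target.Redshift": 0.048},
]
refined = [r for r in records if 0.0 <= r["Target.Redshift"] <= 0.05]
```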


> 3.3.6:
>
> "Hence when data model attributes are indicated as mandatory of
> recommended in this document, this overrides any similar requirements
> specified in the Spectrum data model document."
>
> I think these are two different things. The spectrum data model tells how
> to return the data. The SSA talks about how to contruct the answer to a
> request. I think it will happen that the data is returned in whatever
> units are appropriate for wavelength and the VOTable answer to the service
> is in meters. They are independent.

Much of the data model is common to both datasets and the SSA query
response, but as you say the usage is independent, and hence the units,
what is required or recommended, etc., are different in the two cases.

I agree that in general for an actual dataset one would prefer to
not change the units (hence we need flexibility to specify the units,
reference frames, etc. at this level), whereas we can fix the units
in the query response.


> Do you think exposure time should be given in days? I would think that
> second is the fundamental unit to use. And if you think this makes
> observing dates look bad, why not allow both, days and seconds.

All things being equal I would agree, and if we had an
explicit "exposure" attribute it might make sense to define
the units explicitly as seconds.  But what we have instead is
Char.TimeAxis.Coverage.Bounds.Extent, followed by optional Start and
Stop values which do want to be in units of days, and it makes sense
to be simple and consistent with units within Char.  In any case this
is a broader issue of what we do within Char, and not specific to SSA.
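
In practice the conversion is trivial either way; a minimal sketch, assuming the exposure is measured in seconds and Char wants days:

```python
SECONDS_PER_DAY = 86400.0

def exposure_days(exptime_s: float) -> float:
    """Convert an exposure time in seconds to days, for use in
    Char.TimeAxis.Coverage.Bounds.Extent (units of days)."""
    return exptime_s / SECONDS_PER_DAY
```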


> 3.3.11:
>
> I think that we have to solve the problem with spectral resolution. Some
> datasets will have the spectral resolution vary substantially over the
> spectrum. Thus a mean value or value at mean wavelength might be useless
> if the user is interested in the blue or the red part of the spectrum.
> Wouldn't it be easier to have two options for resolution reflecting the
> two fundamentally different ways instruments operate (either constant
> delta lambda or constant lambda/delta lambda)?
>
> The service can then still choose how to answer by using a conversion
> (reference wavelength, etc.), but at least it's clear that the metadata is
> as accurate as we can get it.

At the level of the actual dataset, one can specify the spectral
resolution for each individual data point, so there is no problem for
actual analysis.  For the query we discussed specifying resolution in
Char as L/dL (which would also be more consistent with the SPECRES
parameter), however this causes problems for consistency within Char
and with the data arrays within Spectrum.
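
The two conventions are related by R = lambda / delta-lambda at a reference wavelength, so converting between them is a one-liner; a minimal sketch (the function names are mine):

```python
def resolving_power(wavelength_m: float, delta_lambda_m: float) -> float:
    """R = lambda / delta-lambda: resolving power implied by a
    constant wavelength resolution element at a given wavelength."""
    return wavelength_m / delta_lambda_m

def delta_lambda(wavelength_m: float, R: float) -> float:
    """Inverse: the wavelength resolution element implied by a
    constant resolving power R at a given wavelength."""
    return wavelength_m / R
```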

Char is all about characteristic values, which are by definition
approximations, to try to simplify things at a higher level.  Everything
is like this if one looks carefully enough; it is the difference between
characterization and calibration.

If this is really a problem at the level of Char, we would need to
generalize the Accuracy model slightly.

> Best wishes,
> Inga

Thanks again Inga - these are the first careful comments we have gotten
back on this!

 	- Doug


