Registry science metadata

Anita Richards amsr at jb.man.ac.uk
Wed Apr 30 05:34:10 PDT 2003


Having just glanced at Alberto's mail in response to Keith, I have realised
my attempt to post some suggestions has not appeared, sorry - this may
answer some of Alberto's points regarding a controlled vocabulary,
separating layers of description etc.

The material below can also be found at
http://wiki.astrogrid.org/bin/view/Astrogrid/DataServiceSchema with links
which help to explain it.

It is intended to describe the science content of the *summary* of
astronomical datasets held by the registry, not the list of UCDs or
evaluations of the contents of the dataset.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Dr. Anita M. S. Richards, AVO Astronomer
MERLIN/VLBI National Facility, University of Manchester,
Jodrell Bank Observatory, Macclesfield, Cheshire SK11 9DL, U.K.
tel +44 (0)1477 572683 (direct); 571321 (switchboard); 571618 (fax).

Tentative amendments to Keith's RegistrySchema

AMSR comments and changes marked *

Hanisch et al. is the most recent (presently v.6) version of the document
also referred to as (not)Bob's.

Mostly this is as per Hanisch et al. but some changes - e.g.
- spatial is incorrect for sky coverage/position/resolution, should be
angular
- some things added

SEE NOTES AT END

-----------------------------------------
dataService


Schema: http://www.w3.org/2001/XMLSchema

include: schemaLocation="serviceLocation.xsd"

elements:

CONTENT
"content"               string          (see elements following)*
"facility"              string
"instrument"            string
"format"                string          (VOTable, ascii, FITS etc?)*
"briefsummary"          string
"tablenrows"            integer         (Number of rows in table)*
"tablencols"            integer         (Number of columns in table)*
"tablesize"             decimal         (bytes - size of table excl.
                                         linked nDim data)*
"ndimdatasetsizemin"    decimal         (bytes)*
"ndimdatasetsizemax"    decimal         (bytes)*
"nndimdatasets"         integer         (number of nDim data sets)*
"type"                  string          (archive, survey, catalogue,
                                         bibliography, journal, library,
                                         outreach, education, eporesource,
                                         integrated, nameresolver)
"subjectkeyword"        string          (Galaxies, Milky Way, Nebulae,
                                         Planets, Solar system, Stars)*
---

COVERAGE
"coverage"              string          (see elements following)*
"wavelengthrange"       string          (gammaray, xray, xuv, uv, optical,
                                         ir, mmwave, radio)
"wavelengthshort"       decimal         (metres)
"wavelengthlong"        decimal         (metres)
"ramin"                 decimal         (degrees)
"ramax"                 decimal         (degrees)
"decmin"                decimal         (degrees)
"decmax"                decimal         (degrees)
"sensitivity"           decimal         (Jansky? also allow Magnitude? eV?)
"startdate"             decimal         (JD) or (YYYY.DD) or date (CCYY-MM-DD)*
"enddate"               decimal         (JD) or (YYYY.DD) or date (CCYY-MM-DD)*
"angularfraction"       decimal         (dimensionless fraction)
"spectralfraction"      decimal         (dimensionless fraction)
"temporalfraction"      decimal         (dimensionless fraction)
"sourcedensity"         decimal         (counts per square degree)
---

RESOLUTION
"resolution"            string          (see elements following)*
"angularresolution"     decimal         (degrees? arcsec?)
"spectralresolution"    decimal         (dimensionless fraction)
"temporalresolution"    decimal         (sec)
---

DATAQUALITY
"dataquality"           string          (see elements following)*
"astrometryerror"       decimal         (degrees? arcsec?)
"photometryerror"       decimal         (Jy? Magnitudes? eV?)
"timingerror"           decimal         (sec)

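As a rough illustration of the element list above, a summary record might be
assembled as below. This is only a sketch: the build_summary helper is
hypothetical, the root element name is taken from the "dataService" heading,
and the values are placeholders, not a real registry entry.

```python
import xml.etree.ElementTree as ET


def build_summary(values):
    """Build a <dataService> element from a dict of element name -> value."""
    root = ET.Element("dataService")
    for name, value in values.items():
        child = ET.SubElement(root, name)
        child.text = str(value)
    return root


# Placeholder values for illustration only, not a real registry entry.
record = build_summary({
    "facility": "EXAMPLE-FACILITY",
    "format": "VOTable",
    "type": "catalogue",
    "ramin": 0.0,
    "ramax": 360.0,
    "decmin": -30.0,
    "decmax": 90.0,
})
xml_text = ET.tostring(record, encoding="unicode")
```
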
- - - - - - - - - -
NOTES

Suggested standard units/conventions:

see http://www.iau.org/IAU/Activities/nomenclature/units.html (This is just
for the ResourceMetadata; for DataSets generally the wider conventions of
CDS can be used).

I have suggested units; in some cases I suggest alternatives where the
conversion may be tricky or where being totally consistent might lead to
very small/large numbers (e.g. degrees for angular position, but arcsec for
error is more usual). However I would prefer to be consistent: the first
unit, before the ?, is preferred. Approximate conversions suffice to answer
'Is this catalogue any use' with 'maybe/no'.
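
A minimal sketch of such a 'maybe/no' test, assuming the resource and the
query are both given as dicts keyed by the element names above. The
any_use and ranges_overlap names are hypothetical, RA is left out because
of wrap-around at 0/360 degrees, and the numbers are placeholders.

```python
def ranges_overlap(lo1, hi1, lo2, hi2):
    """True if the closed intervals [lo1, hi1] and [lo2, hi2] intersect."""
    return lo1 <= hi2 and lo2 <= hi1


def any_use(resource, query):
    """Return 'no' if any checked range misses the query, else 'maybe'."""
    checks = [("wavelengthshort", "wavelengthlong"), ("decmin", "decmax")]
    for lo, hi in checks:
        if not ranges_overlap(resource[lo], resource[hi], query[lo], query[hi]):
            return "no"
    return "maybe"


# Placeholder resource: a radio dataset (wavelengths in metres, Dec in degrees).
resource = {"wavelengthshort": 0.0136, "wavelengthlong": 0.735,
            "decmin": -30.0, "decmax": 90.0}
optical_query = {"wavelengthshort": 4e-7, "wavelengthlong": 7e-7,
                 "decmin": 0.0, "decmax": 10.0}
radio_query = {"wavelengthshort": 0.05, "wavelengthlong": 0.07,
               "decmin": 0.0, "decmax": 10.0}
```

The answer is deliberately never 'yes': overlapping ranges only show that
the dataset is worth a closer look.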

Hanisch et al. use decimal years for date but this is non-standard? The
convention for leap years is not well-known.
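
To show where the ambiguity comes in, here is a sketch converting a calendar
date to JD (at 0h) and to a decimal year. The decimal-year rule used, year
plus elapsed days divided by the length of that year, is only one possible
convention and is my assumption; the function names are hypothetical.

```python
from datetime import date

# Offset between Python's proleptic Gregorian ordinal and JD at 0h.
JD_OFFSET = 1721424.5


def julian_date(d):
    """Julian Date at 0h on the given calendar date."""
    return d.toordinal() + JD_OFFSET


def decimal_year(d):
    """Year plus the fraction of that year elapsed at 0h (one convention)."""
    start = date(d.year, 1, 1)
    days_in_year = (date(d.year + 1, 1, 1) - start).days  # 365 or 366
    return d.year + (d - start).days / days_in_year
```

With this rule the same day of the year maps to a slightly different
fraction in leap and non-leap years, which is the kind of detail a
registry would need to pin down.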

This has implications for the user query; for the very first iterations we
may have to force the user to use standard units but very soon we should be
able to interconvert Jy/Mag/?x-ray units? and wavelength/freq/eV units etc.
For Resource metadata selection this does not have to be precise.
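
The wavelength/freq/eV part is straightforward; a sketch follows, reducing
everything to metres as used for "wavelengthshort" and "wavelengthlong".
The function names and unit labels are hypothetical. Jy/Mag is not
attempted here since it needs a zero point per band.

```python
C_M_PER_S = 299792458.0      # speed of light
H_J_S = 6.62607015e-34       # Planck constant
EV_IN_J = 1.602176634e-19    # one electronvolt in joules


def frequency_to_wavelength_m(freq_hz):
    return C_M_PER_S / freq_hz


def energy_ev_to_wavelength_m(energy_ev):
    return H_J_S * C_M_PER_S / (energy_ev * EV_IN_J)


def to_wavelength_m(value, unit):
    """Convert a value in 'm', 'Hz' or 'eV' to a wavelength in metres."""
    converters = {
        "m": lambda v: v,
        "Hz": frequency_to_wavelength_m,
        "eV": energy_ev_to_wavelength_m,
    }
    return converters[unit](value)
```

For example, to_wavelength_m(22e9, "Hz") gives roughly 0.0136 m.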

Should the units be added to the schema?

Data types and null values

Is it simplest if every element should occur at least once, and we use null
values as suggested in the VOTable documentation
(http://cdsweb.u-strasbg.fr/doc/VOTable/votable-1-0.htx)? e.g. use NaN for
decimals with no value and NULL for strings? Alternatively, if we need to
use the null value to sort by (e.g. to (de)prioritise DataSets lacking the
relevant ResourceMetadata entry), xml allows INF and -INF.
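
A sketch of how a client might sort on such values, pushing entries with no
value to the end. Note that, as far as I know, NaN, INF and -INF belong to
the XML Schema float and double types rather than decimal, so the numeric
elements would need to be declared accordingly. The sort_key helper and the
entries are hypothetical.

```python
import math


def sort_key(value, missing_last=True):
    """Map NaN to +/- infinity so entries without a value sort to one end."""
    if math.isnan(value):
        return math.inf if missing_last else -math.inf
    return value


# (name, angularresolution as text); placeholder entries for illustration.
entries = [("A", "0.05"), ("B", "NaN"), ("C", "0.008")]

# Python's float() accepts the lexical forms NaN, INF and -INF directly.
ranked = sorted(entries, key=lambda e: sort_key(float(e[1])))
names = [name for name, _ in ranked]
```
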


CONTENT

"subjectkeywords" (new element)

One or more keywords taken from the dataset header, e.g. a subset of the
third column on the Vizier catalogue selection page. See
http://adc.gsfc.nasa.gov/adc/adc_keyword_index.html and
http://vizier.u-strasbg.fr/doc/ADCkwds.htx. We should add from the ADC list
or the Vizier simplification as required, sparingly.

Note planetary nebulae are Nebulae not planets
Galaxies means external galaxies, not the Milky Way

Is there anything equivalent for Solar/STP?

In Hanisch et al. 'subject' is included in curation metadata, but I feel it
fits better in content. However I don't really mind; this and some other
things listed under CONTENT below should maybe be in CURATION?

"type" means FITS, ASCII etc?

I am using 'table' to mean data which could be searched directly in a
database or be converted to VOTable, e.g. a list of sources and properties.
Other 'nDim' data which requires special viewers/extraction software, e.g.
FITS, will always? have an associated table describing it, e.g. a list of
pointings and other observational details.

I think that we can cover whether nDim data are images, spectra etc. by
whether elements like "decmin" or "spectralresolution" have meaningful
values, or the null value.
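
A sketch of that idea, assuming NaN is the null value for the numeric
elements. The mapping from elements to kinds of data is my guess at the
intention, and the helper names are hypothetical.

```python
import math


def is_set(record, name):
    """True if the element is present and not the NaN null value."""
    value = record.get(name, math.nan)
    return not math.isnan(value)


def guess_content(record):
    """Guess what kinds of nDim data a summary describes from its nulls."""
    kinds = []
    if all(is_set(record, n) for n in ("ramin", "ramax", "decmin", "decmax")):
        kinds.append("positional")
    if is_set(record, "spectralresolution"):
        kinds.append("spectral")
    if is_set(record, "temporalresolution"):
        kinds.append("temporal")
    return kinds
```
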

COVERAGE

"angularfraction" (a fraction) is for datasets containing images or
imageable data; "sourcedensity" (sources/deg^2) is for datasets containing
lists of sources with positions. Note that the total fractional coverage is
different from the resolution. In future iterations AstroGrid want to go
for indexing/matrix representation rather than the shapes suggested by
Hanisch et al.?
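
For a simple RA/Dec box both quantities follow from the solid angle
(ramax - ramin) * (sin(decmax) - sin(decmin)). The sketch below assumes the
box is fully covered, does not wrap through RA = 0, and reads
"angularfraction" as a fraction of the whole sky; if it is meant as a
filling factor within the bounds, the denominator would be the box area
instead. The function names are hypothetical.

```python
import math

SQ_DEG_PER_SR = (180.0 / math.pi) ** 2
FULL_SKY_SQ_DEG = 4.0 * math.pi * SQ_DEG_PER_SR  # about 41253 square degrees


def box_area_sq_deg(ramin, ramax, decmin, decmax):
    """Solid angle of an RA/Dec box (all inputs in degrees)."""
    d_ra = math.radians(ramax - ramin)
    d_sin = math.sin(math.radians(decmax)) - math.sin(math.radians(decmin))
    return d_ra * d_sin * SQ_DEG_PER_SR


def angular_fraction(ramin, ramax, decmin, decmax):
    """Fraction of the whole sky inside the box."""
    return box_area_sq_deg(ramin, ramax, decmin, decmax) / FULL_SKY_SQ_DEG


def source_density(n_sources, ramin, ramax, decmin, decmax):
    """Sources per square degree within the box."""
    return n_sources / box_area_sq_deg(ramin, ramax, decmin, decmax)
```
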

RESOLUTION

Note that it is easier to express spectral resolution as (finest channel
width)/(central value), e.g. delta-lambda/lambda, as this avoids unit
problems, but this cannot be done in a universal way for other sorts of
resolution.
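
A small check of why the ratio avoids unit problems: for a narrow channel
the same number comes out whether the channel is described in frequency or
in wavelength. The 1 MHz channel at 5 GHz is a placeholder, and the
function name is hypothetical.

```python
C_M_PER_S = 299792458.0


def fractional_resolution(channel_width, central_value):
    """Dimensionless resolution; both arguments must share the same unit."""
    return channel_width / central_value


# Placeholder channel: 1 MHz wide, centred on 5 GHz.
nu, d_nu = 5.0e9, 1.0e6
from_frequency = fractional_resolution(d_nu, nu)

# The same channel expressed in wavelength (metres).
lam = C_M_PER_S / nu
d_lam = C_M_PER_S / (nu - d_nu / 2) - C_M_PER_S / (nu + d_nu / 2)
from_wavelength = fractional_resolution(d_lam, lam)
```

The two values differ only at second order in the channel width, which is
negligible at the precision needed for resource selection.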

We should use the best value in the data for now, and later include
algorithms to allow for e.g. angular resolution as a function of frequency
for multi-frequency data.

DATA QUALITY

Things like angularresolution and sensitivity will initially probably be
given as the best value of all errors (systematic and random) correctly
combined. However in some data sets these may cover a wide range. E.g.
astrometry error can depend on sensitivity and resolution; in observing
logs resolution may be frequency-dependent. Ultimately we should be able to
express these things as functions which are evaluated depending on other
bits of the data set or even the query. E.g. the MERLIN archive covers
frequencies from 0.408 to 22 GHz and has a best resolution of 0."008, but
this is at 22 GHz; the resolution at 5 GHz is 0."050, and if you want
higher resolution you have to go to e.g. the EVN archive.
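
As a sketch of what such a function could look like, the two MERLIN values
quoted above can be joined by a power law and evaluated at the query
frequency. The power-law interpolation is purely my assumption for
illustration, not how MERLIN resolution is actually modelled, and the
function names are hypothetical.

```python
import math

# (frequency in GHz, resolution in arcsec), the two values quoted above.
KNOWN_POINTS = [(5.0, 0.050), (22.0, 0.008)]


def resolution_arcsec(freq_ghz, points=KNOWN_POINTS):
    """Power-law (log-log) interpolation through two known points."""
    (f1, r1), (f2, r2) = points
    slope = math.log(r2 / r1) / math.log(f2 / f1)
    return r1 * (freq_ghz / f1) ** slope


def meets_request(freq_ghz, wanted_arcsec):
    """True if the estimated resolution is at least as fine as requested."""
    return resolution_arcsec(freq_ghz) <= wanted_arcsec
```

A query for 0."01 resolution would then match at 22 GHz but not at 5 GHz,
rather than matching on the single best value for the whole archive.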

UCDs

Should there also be an element to link to the UCDs for the dataset?

Many of my changes are probably incorrect xml, sorry, but I hope the
intention is clear.




