R(S)M v0.8

Thu Jul 10 06:52:50 PDT 2003

----- Original Message ----- 
From: "Patricio F. Ortiz" <pfo at star.le.ac.uk>
To: "Robert Hanisch" <hanisch at stsci.edu>
Cc: <registry at ivoa.net>
Sent: Thursday, July 10, 2003 9:18 AM
Subject: Re: R(S)M v0.8

>
> Hi Bob,
>
> I've read the document you just sent and have some questions/comments
> about it.
>
> The first is that the HTML used makes use of some non-standard
> representation of symbols like '<, >, etc', which makes it a bit
> unreadable. Instead of < it uses some binary representation which shows
> here as a Sterling-pound symbol :-) :-)

The HTML is what MS Word makes automatically.  I will provide the original
.doc plus PDF formats once this is uploaded to the ivoa.net document
directory.  I would be happy for someone to clean up the HTML if they wish.

> I'll go section by section OK?
>
> {Resource Metadata Concepts} -> {Identity Metadata}
>
> Three elements are listed:
> Title
> ShortName
> Identifier (URI)
>
> My concern goes to mirrored data. What happens with catalogs located in
> several places? Title and ShortName are the same; will the URI be enough
> to identify these resources as the same? Do we have a mechanism to
> reliably compare two resources and decide that they are the same
> resource, only located in a different place or available through
> different services (DBMSs)?
We have had lengthy debates on this topic.  There is to be a separate
document describing Identifiers and the associated issues regarding
uniqueness, replication, etc.

> {Content Metadata} -> { Coverage Spatial}
>
> Polygon may not replace a box.
>
> box: This is my bias. Box brings to my mind a photographic plate, a CCD,
> etc, that is, a detector. Coordinate extrema don't represent well
> the above mentioned objects,
> plus a box may be skewed with respect to a parallel/meridian grid (eg,
> HST detectors), and this would be the case if you keep the same
> box center and change the coordinate system.
>
> According to the definition, a 4 vertices polygon centered in the pole
> will not describe well a photographic plate or a 1deg CCD mosaic
> centered there.
> Aren't we missing somehow projections on a tangent plane?
>
>
> I'm also missing something in the line of splitting the sphere (in
> whatever coordinate system is adopted) into 3 regions: equatorial(xx),
> north(yy), south(zz)
> where xx represents the distance to the system's equator (many of the
> molecular studies only encompass the galactic plane). yy represents the
> southmost point of a northern hemisphere coverage (Sloan, for instance)
> zz represents the northmost boundary in a southern coverage (eg, LCO
> redshift survey).
We spent over a year developing the space-time metadata, and this document
simply refers to it for these definitions.  Polygon certainly does
everything that Box does, plus more.  Plus, the edges of the polygon can
either be great or small circles (though the notation for this probably
requires XML).

> {Content Metadata} -> { Coverage Spectral}
>
> coverageSpectral: IMHO, what's proposed applies very well
> to a particular data column, however, we may want to talk about
> describing catalogues or tables, and here is where we should adopt
> some convention, perhaps in the line you suggest, ie, that what
> goes as argument of any of the coverageSpectral*Wavelength fields
> should be "lists of elements". Imagine a catalogue with SII, OIII,
> Halpha and a couple of other spectral line fluxes. Either we allow
> multiple coverageSpectral*Wavelength entries or we allow the list
> as an argument to a unique field.
Coverage.Spectral is a list and can take on multiple values.
Coverage.Spectral.Bandpass is also a list, and this is where I would imagine
spectral line names being used.

> Alternatively one could use something in the line of
> coverageSpectralRangeWavelength lambda1i lambda1f, lambda2i lambda2f
> etc.
> with the argument being a list, imagine combined Xray and optical
> catalogues. if lambda_min is in the Xrays and lambda_max is the
> I-band, it does not mean that UV is included in the set.
The explicit spectral wavelength metadata elements came from the AstroGrid
resource metadata.  As the document says, the idea is to be inclusive rather
than overly specific.
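
To make the trade-off between a single inclusive range and a list of
ranges concrete, here is a minimal Python sketch. It is only an
illustration: the helper names (envelope, covers) and the wavelength values
are assumptions made up for this example and do not come from the RSM
document or the AstroGrid metadata.

```python
# Hypothetical illustration; wavelengths in metres, values only approximate.
XRAY = (1e-10, 1e-8)
OPTICAL_TO_I_BAND = (4e-7, 9e-7)
ranges = [XRAY, OPTICAL_TO_I_BAND]


def envelope(ranges):
    """Collapse a list of (min, max) ranges into one inclusive range."""
    return (min(lo for lo, _ in ranges), max(hi for _, hi in ranges))


def covers(ranges, wavelength):
    """Return True if the wavelength falls inside any of the ranges."""
    return any(lo <= wavelength <= hi for lo, hi in ranges)


uv = 2e-7  # a UV wavelength lying in the gap between the two ranges
inclusive = envelope(ranges)

print(covers([inclusive], uv))  # True: the inclusive range also claims the UV
print(covers(ranges, uv))       # False: the list of ranges keeps the gap
```

The inclusive range errs on the side of returning the resource, which is
consistent with favoring inclusion; the list form is more precise but asks
more of the publisher.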

> {Content Metadata} -> { Coverage Temporal}
> coverageTemporal:
> StartTime and StopTime are fine for describing single observations,
> but how should we describe the temporal coverage of a mosaic image
> (eg, HDF)? or a composite radial velocity cube observed over
> several years but having a short integration time?
> Even an innocent looking Vmag, B-V, V-I may imply several observing
> intervals.
As an inclusive interval, just like the spectral wavelength ranges.  I do
not think it is practical to get more detailed than this.

> Perhaps we also need the argument to be a list of start/end pairs.
> I know that this only applies to very specific metadata, but it
> would be nice to build the capacity from the beginning to prepare
> for the time we would like to use as much temporal resolution as
> possible.
>
> {Content Metadata} -> { Coverage Object Density}
>
> What is typical? Some catalogues are more or less homogeneous and one
> could take the mean, but others are quite inhomogeneous
> (particularly those reflecting observational biases) Can we have
> something like
> ObjectDensity: mean value, minimum value, maximum value
We could, if people think it's really useful.

> It is conceivable that I could be interested in selecting
> catalogs based on their max-Object-Density being greater than a
> certain limit if I'm looking for highly observed areas. The mean
> will just not cut it.
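
A minimal Python sketch of what a mean/minimum/maximum ObjectDensity could
look like in a query. Everything here is hypothetical: the class, the field
names, the catalogue names and the density numbers are invented for
illustration and are not part of the RSM document.

```python
from dataclasses import dataclass


@dataclass
class ObjectDensity:
    """Hypothetical three-valued density (objects per square degree)."""
    mean: float
    minimum: float
    maximum: float


# Invented example catalogues; the numbers are not real measurements.
catalogs = {
    "uniform_survey": ObjectDensity(mean=500.0, minimum=400.0, maximum=600.0),
    "pointed_fields": ObjectDensity(mean=300.0, minimum=0.0, maximum=20000.0),
}


def select(catalogs, field, threshold):
    """Names of catalogues whose chosen density field exceeds the threshold."""
    return sorted(
        name for name, density in catalogs.items()
        if getattr(density, field) > threshold
    )


print(select(catalogs, "mean", 1000.0))     # []
print(select(catalogs, "maximum", 1000.0))  # ['pointed_fields']
```

With only the mean available, the inhomogeneous catalogue would be missed
by a search for densely observed areas; the maximum recovers it.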

>
> {Content Metadata} -> { resolution }
> a) why strings?
Should be floats -- sorry.

> b) is this enough to describe interferometry?  (I have no idea)
Not in detail, no, since one can construct images of different resolution
depending on how one tapers the visibility data.  However, a collection of
processed images will have some maximum angular resolution, or typical
angular resolution, and this is what I would expect a publisher to provide.
If the collection is visibility data, then the appropriate angular
resolution value is probably the best that can be attained with proper image
processing.

> {Content Metadata} -> { ContentLevel }
>
> I have a personal bias with the word "general". To me it means
> "the least common denominator", "not particularly specific", as
> in "general use", "general audience". I can't think of something
> better right now though :-(
That's the intent -- a "general audience".

> {Content Metadata} -> { Facility }
> {Content Metadata} -> { Instrument }
>
> I would transform these elements into lists as combinations are
> likely to occur.
Yes, I agree.  Came across this when describing the HST archive, for
example, where Facility = HST and Instrument = FOS, GHRS, FOC, HSP, WFPC1,
WFPC2, NICMOS, STIS, ACS, FGS.  Another way to do this is to say Instrument
= Various, which in some cases may be the only thing you can do.

> That's all folks! :-)
>
> Cheers,
>
> Patricio

A general issue -- we can make the metadata very specific, very detailed.  I
believe we have to strike a balance, though.  If data providers are
confronted with the task of populating hundreds of metadata fields in order
to publish a resource, one of several things will happen:
o  They will not bother to publish the resource at all
o  They will leave many elements blank, making potential queries against
these fields useless
o  They will fill in wrong information, making potential queries against
these fields misleading

I continue to work on the basis that, to first order, the registries should
favor inclusion of resources over exclusion, and that we should collect
enough information to discriminate among resources, but not so much as to be
self-defeating.

Thanks for the careful reading, Patricio, and the feedback.

Bob