Data set metadata schemas
Tony Linde
ael at star.le.ac.uk
Tue Jun 17 13:04:23 PDT 2003
Hi Anita,
Thanks for all that. Are these the sum total of changes made to RSMv7? Or do
you need more time to come up with a full commentary on RSMv7?
Cheers,
Tony.
> -----Original Message-----
> From: owner-registry at eso.org [mailto:owner-registry at eso.org]
> On Behalf Of Anita Richards
> Sent: 17 June 2003 20:06
> To: registry at ivoa.net
> Subject: Data set metadata schemas
>
> I have learnt from the debate on Registry Schema but I can't
> add much except to say that I think the differences will only
> be resolved in use. We should start applying the schema we
> have got to real data sets and science queries (even if these
> have to be carefully selected at first) - and that seems to
> mean starting from
> http://www.ivoa.net/internal/IVOA/IvoaResReg/ResourceServiceMetadataV7.pdf
> (RSMV7). So to that extent I agree with Bob, and maybe that
> is not controversial as I think I am only talking about what
> Tony says it is OK for, i.e. "Sky coverage services", if that
> is taken to include spatial, spectral and temporal coverage
> and other metadata tied to data sets. That is: what
> information does the Registry need to hold in order to select
> potentially useful data sets, and then to use their metadata
> to select the appropriate services for accessing the data for
> processing? (I am not commenting on the parts of the
> registry which describe those services.)
>
> I would like to understand better how feasible it is to link
> sections, for example an entry in COMMUNITY may be the same
> as one for Contributor in CURATION.
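One way such linking could work is to define a party once, give it an identifier, and refer to it from both places, much as XML Schema's xs:ID/xs:IDREF (or xs:key/xs:keyref) mechanism allows. The sketch below is only an illustration under that assumption: the element and attribute names (`party`, `id`, `ref`, `contributor`, `member`) are hypothetical and are not taken from RSMV7, and the observatory name is a placeholder.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: a party is defined once and referenced by id from
# both CURATION (as a contributor) and COMMUNITY (as a member).
RECORD = """
<resource>
  <parties>
    <party id="p1"><name>Example Observatory</name></party>
  </parties>
  <curation><contributor ref="p1"/></curation>
  <community><member ref="p1"/></community>
</resource>
"""

def resolve_refs(root):
    """Map every element carrying a 'ref' attribute to the element it names."""
    by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
    resolved = {}
    for el in root.iter():
        ref = el.get("ref")
        if ref is not None:
            if ref not in by_id:
                raise KeyError(f"dangling reference: {ref}")
            resolved[el] = by_id[ref]
    return resolved

root = ET.fromstring(RECORD)
links = resolve_refs(root)
contributor = root.find("curation/contributor")
member = root.find("community/member")
print(links[contributor].findtext("name"))  # Example Observatory
print(links[contributor] is links[member])  # True
```

Storing the party once means a correction to its details propagates to every section that refers to it; the cost is that a validator (or code like `resolve_refs`) must check for dangling references.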
>
> We also need to think a bit more about how to acquire the
> dataset metadata. At least at first, we want to make sure
> this is done in an exemplary fashion because we will be
> judged by the results, so it is no good using difficulty in
> getting information as an excuse. In my experience with the
> 4 data sets so far, the relevant information is not held in
> one place, it requires human searching of web-sites and human
> discrimination, for example to decide what is the region of
> regard for a catalogue - PSF (but what about systematic
> errors)? Pixel size (but this is arbitrary in radio images)?
> Largest error given (may be spurious/huge)? Eventually we
> will have algorithms to help decide but these will be evolved
> through experience, not trying to imagine all possible
> circumstances. For data sets which are actively curated, we
> can ask someone to fill in a questionnaire; however, again, we
> will only discover what is open to misinterpretation after a
> few rounds with archivists. More seriously, we do not yet
> have the kudos to get people to fill them in unless they are
> already VO enthusiasts and even then they often just point
> you at web sites with (usually) far too much detail.
>
> However, I suggest that we start by designing a plain text
> form which gives examples/selections where appropriate. This
> could be interpreted and written to xml using a perl script,
> which would also catch the commonest ambiguities
> (metre/meter) and unit conversions. We can progress to a web
> form as long as it is really platform-independent and avoids
> problems with over-long selection lists, instability if
> completion is interrupted etc. We are going to have to solve
> this problem anyway of course for user input! The protocols
> for submitting data sets to CDS are one precedent; I would
> welcome comments from people involved with that.
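The form-to-XML step could look roughly like this. The message suggests a Perl script; this sketch uses Python (standard library only) to illustrate the same idea. The form layout (`key: value` lines, `#` comments), the spelling table, the unit table and the element names are all assumptions for illustration, not an agreed AstroGrid or IVOA format.

```python
import xml.etree.ElementTree as ET

# Hypothetical normalisation tables: common spelling variants, and the
# factor that converts a size in a known unit into megabytes.
SPELLINGS = {"meter": "metre", "meters": "metre", "metres": "metre"}
TO_MB = {"kb": 1.0 / 1024, "mb": 1.0, "gb": 1024.0}

FORM = """
# Data set questionnaire (hypothetical layout)
title: Example survey
size: 2 GB
wavelength_unit: meter
"""

def parse_form(text):
    """Read 'key: value' lines, skipping blank lines and '#' comments."""
    fields = {}
    for lineno, line in enumerate(text.splitlines(), 1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"line {lineno}: expected 'key: value'")
        fields[key.strip().lower()] = value.strip()
    return fields

def normalise(fields):
    """Fix common spelling variants and convert sizes to megabytes."""
    out = dict(fields)
    if "wavelength_unit" in out:
        unit = out["wavelength_unit"].lower()
        out["wavelength_unit"] = SPELLINGS.get(unit, unit)
    if "size" in out:
        number, _, unit = out["size"].partition(" ")
        factor = TO_MB.get(unit.strip().lower())
        if factor is None:
            raise ValueError(f"unknown size unit: {unit!r}")
        out["size"] = f"{float(number) * factor:g}"  # now in MB
    return out

def to_xml(fields):
    """Write each field as a child element of a <dataset> element."""
    root = ET.Element("dataset")
    for key, value in fields.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")

print(to_xml(normalise(parse_form(FORM))))
```

Rejecting unknown units rather than guessing keeps ambiguities visible, so they can go back to the archivist in the next round of the questionnaire.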
>
> The AstroGrid Registry work-group have created a set of
> Resource Registry schemas for AstroGrid. These are based on
> RSMV7 with a few additions suggested by trying to use them to
> describe four real datasets. I apologise for the baby XML; I
> am trying to learn - all mistakes are my responsibility
> alone. I also apologise for possibly reinventing (but less
> adequately) the schemas linked to
> http://www.ivoa.net/internal/IVOA/IVOARegWp03/MDinXML-Summary.html
> - however I think I am covering a small part of this in
> more detail. I also note that those schemas are based on RSMV6, which
> explains some of the differences in organisation.
>
> You can find my schemas for AstroGrid at
> http://wiki.astrogrid.org/bin/view/Astrogrid/RegistryIt02Schema
> - see a little way down the page:
>
> ------------------------------
> ------------------------------
>
> "Iteration 2 resource registry schema
>
> ...
> resourceRegistry.xsd and the component schemas for describing
> an astronomical/solar/STP resource: identity.xsd,
> curation.xsd, content.xsd, service.xsd." ...
>
> "Examples of the identity, curation and content xml files (in a single
> file) have been prepared for the 1XMM (X-ray satellite), SURF
> (Solar), USNO-B (reference stars), WFCSUR (Isaac Newton
> Telescope survey) archives."
>
> and
> http://wiki.astrogrid.org/bin/view/Astrogrid/RegistryUnits
> which explains where/why I have added to RSMV7. In summary,
> the differences are:
>
> CURATION
>
> 1) I have added some elements to describe the size of data sets - in
> Mb, and for tabular data nRows/nCols, or nPixels for 'image' data
> (extensible to any number of dimensions). This is to aid
> optimising the order of query execution and in case servers have
> limits on the size of data which can be returned/need to invoke a
> cutout server for images etc.
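The size elements in item 1 might be used roughly as follows: the total pixel count is the product of the per-axis counts, so any number of dimensions is handled, and sub-queries are ordered smallest data set first. The class and field names (`DatasetSize`, `n_rows`, `n_pixels`) are hypothetical, the example values are invented, and smallest-first is only one plausible ordering heuristic.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List, Optional

@dataclass
class DatasetSize:
    """Hypothetical container for the proposed CURATION size elements."""
    name: str
    megabytes: float
    n_rows: Optional[int] = None   # tabular data
    n_cols: Optional[int] = None   # tabular data
    n_pixels: List[int] = field(default_factory=list)  # image data, one count per axis

    def total_pixels(self) -> int:
        """Product of the per-axis counts, so any number of dimensions works."""
        return prod(self.n_pixels) if self.n_pixels else 0

def execution_order(datasets: List[DatasetSize]) -> List[str]:
    """Order sub-queries smallest data set first (one possible heuristic)."""
    return [d.name for d in sorted(datasets, key=lambda d: d.megabytes)]

# Hypothetical example values, not taken from any real archive.
cube = DatasetSize("radio_cube", 512.0, n_pixels=[1024, 1024, 128])
table = DatasetSize("source_list", 40.0, n_rows=1_000_000, n_cols=12)
print(cube.total_pixels())             # 134217728
print(execution_order([cube, table]))  # ['source_list', 'radio_cube']
```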
>
> CONTENT
>
> 2) Added element for UCDs - this will be for dumb matching at first,
> can become more sophisticated or moved to a different level as UCDs
> become more sophisticated.
>
> 3) Added spatial region Healpix - this is the CMB way of indexing the
> sky, added at the request of the Planck people. At the coarsest
> resolution there are 12 regions.
>
> 4) In a future iteration we should extend region of regard to the
> spectral and temporal regimes. NB I don't think this is the same
> as resolution in most cases; for source lists the error may be
> greater than the resolution (e.g. systematic errors due to
> reference source position uncertainty) or less (point source at
> good signal-to-noise); for images the spatial size of a single
> image is the same as the resolution for e.g. 1D radio spectra, but
> not for a radio synthesis or a CCD image.
>
> 5) Added UNKNOWN to the cframe types and spectral waveband coverage; we
> might want this elsewhere as well. At present this is mainly because I
> do not know how to deal with solar data but it might be a useful
> general distinction between 'exists but unknown' v. 'NULL'.
>
> 6) In a future iteration, add after object count coverage etc., the
> spatial fraction of the BOX etc. covered by images, and similarly
> for spectral and temporal coverage.
>
> 7) Added Resolution (spatial spectral temporal)
>
> 8) Added Data Quality (spatial spectral temporal)
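Items 2, 3 and 5 above can be illustrated together in a small sketch. UCD matching here is the "dumb" exact-string comparison item 2 describes; the HEALPix pixel count uses the standard relation N_pix = 12 N_side^2, which gives the 12 base regions at N_side = 1; and `Waveband.UNKNOWN` is kept distinct from `None` so that "exists but unknown" differs from NULL. The enum members and the UCD strings are illustrative assumptions, not the RSMV7 vocabulary.

```python
from enum import Enum
from typing import Iterable, Optional, Set

class Waveband(Enum):
    """Illustrative subset of waveband values, plus UNKNOWN."""
    RADIO = "Radio"
    OPTICAL = "Optical"
    X_RAY = "X-ray"
    UNKNOWN = "UNKNOWN"   # coverage exists but is not known

def healpix_npix(nside: int) -> int:
    """HEALPix pixel count, N_pix = 12 * N_side**2 (12 base regions at N_side = 1)."""
    if nside < 1:
        raise ValueError("nside must be a positive integer")
    return 12 * nside * nside

def dumb_ucd_match(wanted: Iterable[str], offered: Iterable[str]) -> Set[str]:
    """Exact string matching only: no UCD hierarchy or synonyms are considered."""
    return set(wanted) & set(offered)

def describe(waveband: Optional[Waveband]) -> str:
    """Keep 'exists but unknown' (UNKNOWN) distinct from NULL (None)."""
    if waveband is None:
        return "NULL (no value given)"
    if waveband is Waveband.UNKNOWN:
        return "exists but unknown"
    return waveband.value

print(healpix_npix(1))  # 12
print(dumb_ucd_match({"POS_EQ_RA_MAIN", "PHOT_MAG_V"}, ["POS_EQ_RA_MAIN"]))
print(describe(Waveband.UNKNOWN), "/", describe(None))
```

Exact matching is deliberately simple; as UCDs become more structured, `dumb_ucd_match` is the one function that would need replacing.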
>
> Other future additions
>
> * Allow coverage to include multiple non-contiguous regions in
> spatial spectral and temporal domains, e.g. to allow for discrete
> radio wavebands covering 1.3-1.7 GHz, 4.5-6.7 GHz, 21-24 GHz etc.
> These are not bandpasses because any individual observation
> probably only covers a smaller region e.g. 16 MHz within this.
> Similarly for optical observations of variable stars which can
> only be observed when they are in the night sky, etc. This is a
> more sophisticated variant on 6) above.
>
> * Allow element values to be inter-dependent, e.g. the radio
> resolution depends on wavelength and in the above example varies
> by a factor of almost 20.
>
> * We discussed adding (probably to CURATION) a set of elements to
> cover linked data sets, for example if the properties of observed
> sources and the coverage of the facility used are in separate
> tables. However this may be covered by the separate Data
> Collection section? Or is this more like an entire data centre
> e.g. MAST, LEDAS?
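The first two ideas in this list can be sketched together: spectral coverage stored as a list of disjoint intervals rather than a single range, and a resolution that depends on frequency. The bands are the radio example from the text. The assumption that angular resolution scales as lambda/D, i.e. inversely with frequency for a fixed array, is a simplification supplied here for illustration, and the reference resolution is a hypothetical placeholder.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (low, high) in GHz

# Discrete radio bands from the example in the text.
COVERAGE: List[Interval] = [(1.3, 1.7), (4.5, 6.7), (21.0, 24.0)]

def overlaps(coverage: List[Interval], low: float, high: float) -> bool:
    """True if the query range [low, high] touches any covered interval."""
    return any(lo <= high and low <= hi for lo, hi in coverage)

def resolution_arcsec(freq_ghz: float, ref_freq_ghz: float = 1.4,
                      ref_res_arcsec: float = 0.15) -> float:
    """Diffraction-limited scaling: resolution ~ lambda / D ~ 1 / frequency.

    ref_freq_ghz and ref_res_arcsec are hypothetical placeholders for a
    value that would be stored in the registry entry.
    """
    return ref_res_arcsec * ref_freq_ghz / freq_ghz

low_edge = min(lo for lo, _ in COVERAGE)
high_edge = max(hi for _, hi in COVERAGE)
ratio = resolution_arcsec(low_edge) / resolution_arcsec(high_edge)
print(f"resolution varies by a factor of {ratio:.1f}")  # 18.5
```

A query for 2-4 GHz falls between the listed bands and is correctly rejected, even though it lies inside the overall 1.3-24 GHz span that a single range would report. The computed factor (24/1.3, about 18.5) matches the "almost 20" quoted above.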
>
> SERVICE
>
> 9) Added maximum image size allowed by service to the restrictions.
> There could be more, e.g. maximum time interval to search etc.
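A client could use the maximum-image-size restriction in item 9 to decide, before submitting a request, whether to ask for the whole image or fall back to a cutout (the situation item 1 anticipates). This sketch assumes the restriction is expressed as a total pixel count for a 2-D image; the parameter name `max_image_pixels` and the aspect-preserving shrink are illustrative choices, not part of the schema.

```python
import math
from typing import Tuple

def plan_request(shape: Tuple[int, int], max_image_pixels: int):
    """Return ('full', shape) if the service limit allows the whole image,
    otherwise ('cutout', smaller_shape) with roughly the same aspect ratio."""
    ny, nx = shape
    if ny * nx <= max_image_pixels:
        return "full", shape
    # Shrink both axes by the same factor so the product fits the limit;
    # flooring keeps the result at or under the limit (except in the
    # degenerate case where an axis would drop below one pixel).
    scale = math.sqrt(max_image_pixels / (ny * nx))
    cut = (max(1, math.floor(ny * scale)), max(1, math.floor(nx * scale)))
    return "cutout", cut

print(plan_request((1000, 1000), 2_000_000))  # ('full', (1000, 1000))
print(plan_request((4096, 4096), 1_048_576))  # ('cutout', (1024, 1024))
```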
>
> best wishes
>
> Anita
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> - Dr. Anita M. S. Richards, AVO Astronomer MERLIN/VLBI
> National Facility, University of Manchester, Jodrell Bank
> Observatory, Macclesfield, Cheshire SK11 9DL, U.K. tel +44
> (0)1477 572683 (direct); 571321 (switchboard); 571618 (fax).
>