Data set metadata schemas

Tony Linde ael at star.le.ac.uk
Tue Jun 17 13:04:23 PDT 2003


Hi Anita,

Thanks for all that. Are these the sum total of changes made to RSMv7? Or do
you need more time to come up with a full commentary on RSMv7?

Cheers,
Tony. 

> -----Original Message-----
> From: owner-registry at eso.org [mailto:owner-registry at eso.org] 
> On Behalf Of Anita Richards
> Sent: 17 June 2003 20:06
> To: registry at ivoa.net
> Subject: Data set metadata schemas
> 
> I have learnt from the debate on Registry Schema but I can't 
> add much except to say that I think the differences will only 
> be resolved in use.  We should start applying the schema we 
> have got to real data sets and science queries (even if these 
> have to be carefully selected at first) - and that seems to 
> mean starting from
> http://www.ivoa.net/internal/IVOA/IvoaResReg/ResourceServiceMetadataV7.pdf
> (RSMV7). So to that extent I agree with Bob, and maybe that
> is not controversial, as I think I am only talking about what
> Tony says it is OK for, i.e. "Sky coverage services", if that
> is taken to include spatial, spectral and temporal coverage
> and other metadata tied to data sets. That is, what 
> information does the Registry need to hold to select 
> potentially useful data sets and use their metadata to select 
> the appropriate services to then access the data for 
> processing (but I am not commenting on the parts of the 
> registry which describe appropriate services).
> 
> I would like to understand better how feasible it is to link 
> sections, for example an entry in COMMUNITY may be the same 
> as one for Contributor in CURATION.
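One way such links could work: store a shared party once and point at it by id from both sections. XML Schema supports this declaratively (xs:ID/xs:IDREF, or xs:key/xs:keyref). The standard-library Python sketch below shows the consumer side, resolving each reference to the party it names. The Party, Contributor and Member elements and the id/ref attributes are hypothetical, not RSMv7 names.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: Party/Contributor/Member and the id/ref attributes are
# illustrative only; they are not element names taken from RSMv7.
DOC = """
<Resource>
  <Parties>
    <Party id="party-1"><Name>Example Data Centre</Name></Party>
  </Parties>
  <Curation>
    <Contributor ref="party-1"/>
  </Curation>
  <Community>
    <Member ref="party-1"/>
  </Community>
</Resource>
"""

def resolve_refs(root):
    """Return (element tag, party name) for every element carrying a ref attribute."""
    parties = {p.get("id"): p for p in root.iter("Party")}
    links = []
    for elem in root.iter():  # document order
        ref = elem.get("ref")
        if ref is None:
            continue
        if ref not in parties:
            raise KeyError(f"dangling reference: {ref}")
        links.append((elem.tag, parties[ref].findtext("Name")))
    return links

root = ET.fromstring(DOC.strip())
links = resolve_refs(root)
```

Here `links` pairs both the Contributor and the Member with the same party, so the party is edited in one place only. A schema-level xs:keyref would also let a validator reject dangling references before any code runs.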
> 
> We also need to think a bit more about how to acquire the 
> dataset metadata. At least at first, we want to make sure 
> this is done in an exemplary fashion because we will be 
> judged by the results, so it is no good using difficulty in 
> getting information as an excuse.  In my experience with the 
> 4 data sets so far, the relevant information is not held in 
> one place, it requires human searching of web-sites and human 
> discrimination, for example to decide what is the region of 
> regard for a catalogue - PSF (but what about systematic 
> errors)? Pixel size (but this is arbitrary in radio images)? 
> Largest error given (may be spurious/huge)? Eventually we 
> will have algorithms to help decide but these will be evolved 
> through experience, not trying to imagine all possible 
> circumstances.  For data sets which are actively curated, we 
> can ask someone to fill in a questionnaire; however, again, we 
> will only discover what is open to misinterpretation after a 
> few rounds with archivists.  More seriously, we do not yet 
> have the kudos to get people to fill them in unless they are 
> already VO enthusiasts and even then they often just point 
> you at web sites with (usually) far too much detail.
> 
> However, I suggest that we start by designing a plain text 
> form which gives examples/selections where appropriate.  This 
> could be interpreted and written to xml using a perl script, 
> which would also catch the commonest ambiguities 
> (metre/meter) and unit conversions. We can progress to a web 
> form as long as it is really platform-independent and avoids 
> problems with over-long selection lists, instability if 
> completion is interrupted etc.  Of course, we will have to solve 
> this problem for user input anyway! The protocols for 
> submitting data sets to CDS are one precedent; I would 
> welcome comments from people involved with that.
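The plain-text-form idea can be sketched as a small converter. The email suggests Perl for this step; to keep all examples in one language it is sketched here in Python. It assumes a hypothetical "Key: value [unit]" line format, hypothetical field names, and small illustrative tables for spelling variants (meter/metre) and unit conversion. A real form would need an agreed vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup tables; a real form would need an agreed vocabulary.
SPELLING = {"meter": "metre", "meters": "metres"}
# unit (lower case) -> (canonical unit, multiplicative factor)
TO_CANONICAL = {
    "arcsec": ("arcsec", 1.0),
    "arcmin": ("arcsec", 60.0),
    "deg": ("arcsec", 3600.0),
    "mhz": ("GHz", 1e-3),
    "ghz": ("GHz", 1.0),
}

def parse_form(text):
    """Parse 'Key: value' lines, skipping blank lines and '#' comments."""
    fields = {}
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if ":" not in line:
            raise ValueError(f"line {lineno}: expected 'Key: value'")
        key, value = (part.strip() for part in line.split(":", 1))
        tag = "".join(key.split())  # 'Spatial Resolution' -> 'SpatialResolution'
        if not tag.isidentifier():
            raise ValueError(f"line {lineno}: unusable field name {key!r}")
        fields[tag] = value
    return fields

def normalise(value):
    """Convert 'number unit' to canonical units; otherwise fix spelling variants."""
    parts = value.split()
    if len(parts) == 2 and parts[1].lower() in TO_CANONICAL:
        try:
            number = float(parts[0])
        except ValueError:
            number = None
        if number is not None:
            unit, factor = TO_CANONICAL[parts[1].lower()]
            return f"{number * factor:.10g}", unit
    words = [SPELLING.get(word.lower(), word) for word in parts]
    return " ".join(words), None

def form_to_xml(text, root_tag="Resource"):
    """Write the parsed, normalised form as a flat XML document."""
    root = ET.Element(root_tag)
    for tag, value in parse_form(text).items():
        text_value, unit = normalise(value)
        elem = ET.SubElement(root, tag)
        elem.text = text_value
        if unit is not None:
            elem.set("unit", unit)
    return ET.tostring(root, encoding="unicode")

FORM = """
# Hypothetical answers typed into the plain-text form
Title: Example radio survey
Spatial Resolution: 0.25 arcmin
Channel Width: 16 MHz
Length Unit: meter
"""
xml_text = form_to_xml(FORM)
```

Keeping the form flat and converting it in one pass means a mistyped unit or an unknown field fails loudly at conversion time rather than producing ambiguous XML.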
> 
> The AstroGrid Registry work-group have created a set of 
> Resource Registry schemas for AstroGrid.  These are based on 
> RSMV7 with a few additions suggested by trying to use them to 
> describe four real datasets.  I apologise for the baby xml, I 
> am trying to learn - all mistakes are my responsibility 
> alone. I also apologise for possibly reinventing (but less 
> adequately) the schemas linked to 
> http://www.ivoa.net/internal/IVOA/IVOARegWp03/MDinXML-Summary.html
> - however, I think I am covering a small part of this in 
> more detail.  I also note that those schemas are based on RSMV6, which 
> explains some of the differences in organisation.
> 
> You can find my schemas for AstroGrid at 
> http://wiki.astrogrid.org/bin/view/Astrogrid/RegistryIt02Schema
> - see a little way down the page:
> 
> ------------------------------
> ------------------------------
> 
> "Iteration 2 resource registry schema
> 
> ...
> resourceRegistry.xsd and the component schemas for describing 
> an astronomical/solar/STP resource: identity.xsd, 
> curation.xsd, content.xsd, service.xsd." ...
> 
> "Examples of the identity, curation and content xml files (in a single
> file) have been prepared for the 1XMM (X-ray satellite), SURF 
> (Solar), USNO-B (reference stars), WFCSUR (Isaac Newton 
> Telescope survey) archives."
> 
> and
> http://wiki.astrogrid.org/bin/view/Astrogrid/RegistryUnits
> which explains where/why I have added to RSMV7.  In summary, 
> the differences are:
> 
> CURATION
> 
> 1) I have added some elements to describe the size of data sets - in
>    Mb, and for tabular data nRows/nCols, or nPixels for 'image' data
>    (extensible to any number of dimensions).  This is to aid
>    optimising the order of query execution and in case servers have
>    limits on the size of data which can be returned/need to invoke a
>    cutout server for images etc.
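A minimal sketch of how the proposed size elements might feed query planning, assuming a smallest-first strategy (for example, querying the smaller table first in a cross-match). The DataSize record, its field names and the example resources are hypothetical placeholders, not real archive sizes.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List, Optional

@dataclass
class DataSize:
    """Hypothetical size record for the proposed CURATION additions."""
    megabytes: float
    n_rows: Optional[int] = None      # tabular data
    n_cols: Optional[int] = None
    n_pixels: List[int] = field(default_factory=list)  # image data, one length per axis

    def n_values(self):
        """Rows x columns for tables, product of axis lengths for images, else None."""
        if self.n_rows is not None and self.n_cols is not None:
            return self.n_rows * self.n_cols
        if self.n_pixels:
            return prod(self.n_pixels)
        return None

def query_order(sizes):
    """Names ordered smallest first by volume, then by value count (assumed strategy)."""
    return sorted(sizes, key=lambda name: (sizes[name].megabytes, sizes[name].n_values() or 0))

# Hypothetical resources; the numbers are placeholders, not real archive sizes.
sizes = {
    "image_survey": DataSize(megabytes=50_000, n_pixels=[2048, 2048, 100]),
    "source_list": DataSize(megabytes=120, n_rows=500_000, n_cols=30),
    "reference_stars": DataSize(megabytes=2_000, n_rows=10_000_000, n_cols=20),
}
order = query_order(sizes)
```

Because nPixels is a list, the same record handles 2-D images and cubes with any number of axes, as the proposal intends.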
> 
> CONTENT
> 
> 2) Added element for UCDs - this will be for dumb matching at first,
>    can become more sophisticated or moved to a different level as UCDs
>    become more sophisticated.
> 
> 3) Added spatial region HEALPix - this is the CMB way of indexing the
>    sky, added at the request of the Planck people.  At the coarsest
>    resolution there are 12 regions.
> 
> 4) In a future iteration we should extend region of regard to the
>    spectral and temporal regimes.  NB I don't think this is the same
>    as resolution in most cases; for source lists the error may be
>    greater than the resolution (e.g. systematic errors due to
>    reference source position uncertainty) or less (point source at
>    good signal-to-noise). For images, the spatial size of a single
>    image equals the resolution for e.g. 1D radio spectra, but not
>    for a radio synthesis or a CCD image.
> 
> 5) Added UNKNOWN to cframe types and spectral waveband coverage; we might
>    want this elsewhere as well.  At present this is mainly because I
>    do not know how to deal with solar data, but it might be a useful
>    general distinction between 'exists but unknown' v. 'NULL'.
> 
> 6) In a future iteration, add after object count coverage etc., the
>    spatial fraction of the BOX etc. covered by images, and similarly
>    for spectral and temporal coverage.
> 
> 7) Added Resolution (spatial, spectral, temporal)
> 
> 8) Added Data Quality (spatial, spectral, temporal)
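To make the HEALPix addition in item 3 concrete: HEALPix divides the sphere into 12 * nside^2 equal-area pixels, so nside = 1 gives the 12 coarsest regions mentioned above. The sketch below only does that arithmetic. The function names are hypothetical, and nside is restricted to powers of 2, as the NESTED ordering requires.

```python
import math

# Whole sphere in square degrees: 4*pi steradians * (180/pi)^2, about 41253 deg^2.
FULL_SKY_SQ_DEG = 4 * math.pi * (180 / math.pi) ** 2

def healpix_npix(nside):
    """Pixel count at resolution parameter nside: 12 * nside**2."""
    # Restricted here to powers of 2, as the NESTED ordering requires.
    if nside < 1 or nside & (nside - 1):
        raise ValueError("nside must be a positive power of 2")
    return 12 * nside * nside

def healpix_pixel_area_sq_deg(nside):
    """HEALPix pixels are equal-area, so each covers full sky / npix."""
    return FULL_SKY_SQ_DEG / healpix_npix(nside)

for nside in (1, 2, 4, 8):
    print(f"nside={nside}: {healpix_npix(nside)} pixels of "
          f"{healpix_pixel_area_sq_deg(nside):.2f} deg^2")
```

Each coarsest-level region covers roughly 3438 deg^2. In practice the healpy package provides `nside2npix` and `nside2pixarea` for the same calculations.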
> 
> Other future additions
> 
>    * Allow coverage to include multiple non-contiguous regions in
>      spatial, spectral and temporal domains, e.g. to allow for discrete
>      radio wavebands covering 1.3-1.7 GHz, 4.5-6.7 GHz, 21-24 GHz etc.
>      These are not bandpasses because any individual observation
>      probably only covers a smaller region e.g. 16 MHz within this.
>      Similarly for optical observations of variable stars which can
>      only be observed when they are in the night sky, etc. This is a
>      more spohisticated variant on 6) above.
> 
>    * Allow element values to be inter-dependent, e.g. the radio
>      resolution depends on wavelength and in the above example varies
>      by a factor of almost 20.
> 
>    * We discussed adding (probably to CURATION) a set of elements to
>      cover linked data sets, for example if the properties of observed
>      sources and the coverage of the facility used are in separate
>      tables. However this may be covered by the separate Data
>      Collection section? Or is this more like an entire data centre
>      e.g. MAST, LEDAS?
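Non-contiguous coverage and the "fraction covered" idea in item 6 reduce, in one dimension (spectral or temporal), to operations on a list of intervals. The sketch below uses the discrete radio bands quoted above. All function names are hypothetical. `scaled_resolution` assumes a fixed, diffraction-limited array, where resolution scales as wavelength, i.e. as 1/frequency.

```python
def merge(intervals):
    """Merge overlapping or touching (lo, hi) intervals into a sorted disjoint list."""
    merged = []
    for lo, hi in sorted(intervals):
        if lo > hi:
            raise ValueError(f"bad interval ({lo}, {hi})")
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged

def covers(coverage, x):
    """True if x lies inside any interval of the coverage."""
    return any(lo <= x <= hi for lo, hi in coverage)

def fraction_covered(coverage, lo, hi):
    """Fraction of [lo, hi] overlapped by the coverage (item 6 above, in 1D)."""
    if hi <= lo:
        raise ValueError("range must have positive width")
    overlap = sum(max(0.0, min(b, hi) - max(a, lo)) for a, b in merge(coverage))
    return overlap / (hi - lo)

def scaled_resolution(ref_resolution, ref_freq, freq):
    """Assumed diffraction-limited scaling for a fixed array: resolution ~ 1/frequency."""
    return ref_resolution * ref_freq / freq

# Discrete radio bands from the example above, in GHz.
radio_bands = merge([(1.3, 1.7), (4.5, 6.7), (21.0, 24.0)])
```

Merging first prevents overlapping observations from being double-counted in the covered fraction. Under the 1/frequency assumption, moving from 24 GHz down to 1.3 GHz coarsens the resolution by 24/1.3, about 18.5. This matches the "factor of almost 20" quoted above and shows why such values need to be stored as inter-dependent rather than as a single number.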
> 
> SERVICE
> 
> 9) Added maximum image size allowed by service to the restrictions.
>    There could be more, e.g. maximum time interval to search etc.
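A minimal sketch of how a client might use a maximum-image-size restriction: requests that fit are served directly, and larger ones are routed to a cutout service. The ServiceRestrictions fields, the "direct"/"cutout" labels and the time-interval check (one of the further restrictions suggested above) are all hypothetical illustrations.

```python
from dataclasses import dataclass
from math import inf, prod

@dataclass
class ServiceRestrictions:
    """Hypothetical SERVICE restrictions; field names are illustrative."""
    max_image_pixels: int                  # largest image (total pixels) returned directly
    max_time_interval_days: float = inf    # a further restriction of the kind suggested

def plan_image_request(naxis, limits):
    """'direct' if the requested image fits the service limit, otherwise 'cutout'."""
    if not naxis or any(n <= 0 for n in naxis):
        raise ValueError("axis lengths must be positive")
    return "direct" if prod(naxis) <= limits.max_image_pixels else "cutout"

def time_window_ok(start_mjd, end_mjd, limits):
    """True if a search window (in days, e.g. MJD) is within the service's limit."""
    return 0 <= end_mjd - start_mjd <= limits.max_time_interval_days

limits = ServiceRestrictions(max_image_pixels=4096 * 4096, max_time_interval_days=30.0)
```

Combined with the size elements in CURATION, such a check lets a client decide before sending a query whether a cutout server is needed.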
> 
> best wishes
> 
> Anita
> 
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> Dr. Anita M. S. Richards, AVO Astronomer
> MERLIN/VLBI National Facility, University of Manchester,
> Jodrell Bank Observatory, Macclesfield, Cheshire SK11 9DL, U.K.
> tel +44 (0)1477 572683 (direct); 571321 (switchboard); 571618 (fax).
> 



