Data set metadata schemas

Anita Richards amsr at jb.man.ac.uk
Thu Jun 19 10:25:15 PDT 2003


> Anita -- thanks for the pointers.  For others who may wish to review Anita's
> suggestions, note that the correct URLs are
> http://wiki.astrogrid.org/bin/view/Astrogrid/RegistryUnits and
> http://wiki.astrogrid.org/bin/view/Astrogrid/RegistryServiceMetadataConcepts.
Thanks Bob, and apologies again...

> When you say something "needs a namespace of recognized abbreviations" (as
> in Ticker/ShortName and PublisherID) do you mean like a pick-list?  We have
> had some debate over whether ShortName really needs to be unique.  Stock
> symbols are, and conflicts are resolved with sometimes rather arbitrary
> choices (LUV for Southwest Airlines, for example).  So far it is only the
> Identifier that we have strictly required to be unique.  I guess my approach
> would be to leave ShortName to the discretion of the resource publisher, and
> see if non-uniqueness is really a problem.  Of the Sloan Digital Sky Survey
> and Schmidt Data Service System, who owns SDSS?  Does it matter that they
> both use the same initials?
I am not quite sure what it is for if it isn't unique, but I am willing
to see how it works in practice.

>
> Am not sure about DataSize.  There can be many answers for a given resource.
> HST archive DataSize might be 10 TB (entire holdings), 300KB (number of rows
> in pointings catalog), 160 MB (size of an ACS observation data set), 10 KB
> (size of a GIF preview image), etc.
So maybe this needs more definition, but I do think that the Registry
needs to know
1) how much data it potentially has to search through and
2) how much data it might need to return.
I would include sizes for both catalogue and nDim data (e.g. image size), as
both are relevant depending on the query.  The size quoted should be that of
the 'unit' of holding which might have to be searched, so probably a single
catalogue or a single image, etc.  Like all these things, we can't allow for
everything in advance, but e.g. if I were cross-matching stars from the
MERLIN archive with USNO-B it would be a lot quicker to search MERLIN (of
order 10^4 rows) before USNO-B.  Conversely, if I were after full-size images
from the MERLIN archive, at 2048x2048 pixels, I might want to pass them to a
cut-out service first ('me' being an agent or a human).
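
As a rough sketch of how a client might use such a DataSize element (not a
proposal for the schema itself): the field names unit_rows and max_pixels,
the USNO-B row count and the cut-out threshold below are all hypothetical
placeholders; only the MERLIN figures come from the text above.

```python
from dataclasses import dataclass


@dataclass
class ResourceEntry:
    short_name: str
    unit_rows: int  # rows in one searchable 'unit' of holding (hypothetical field)


def search_order(resources):
    """Order resources so that the smallest unit is searched first."""
    return sorted(resources, key=lambda r: r.unit_rows)


def needs_cutout(width_px, height_px, max_pixels):
    """True if an image is larger than the caller is prepared to receive whole."""
    return width_px * height_px > max_pixels


merlin = ResourceEntry("MERLIN", 10**4)   # of order 10^4 rows, as quoted above
usno_b = ResourceEntry("USNO-B", 10**9)   # illustrative placeholder, not a real count

order = [r.short_name for r in search_order([usno_b, merlin])]
print(order)

# A 2048x2048 image against an assumed 1024x1024 limit goes to a cut-out service.
print(needs_cutout(2048, 2048, max_pixels=1024 * 1024))
```

The point is only that the Registry needs one comparable number per unit of
holding; how that number is defined is the part that still needs work.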

>
> UCDList:  Not sure this is practical.  A resource could have information on
> hundreds of UCDs.  I'm not sure I could get through registering even one
> resource if I had to figure out in advance all the UCDs it might be able to
> provide information about.  And do we actually learn anything by putting in
> things like POS_EQ_MAIN_RA when we also have metadata about spatial
> coverage?
What I meant was, where the catalogues are already in Vizier, extract the
existing UCD list.  One might want to allocate UCDs to other holdings but
that should be optional.  It is easy to do after the CDS fashion, and I
know we all know it isn't perfect, but it is a cheap and nasty way of
seeing if catalogues already contain information on things which are not
otherwise in the Registry, e.g. proper motion, colour.  I was thinking that
it would be used literally as a UCD matcher rather than the VO having to
do anything else, but it will keep us interoperable with Vizier/Aladin,
which I think is important.  Eventually, like the bourgeois state, UCDs may
wither away...
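
To illustrate what I mean by a literal UCD matcher, here is a minimal
sketch.  The registry contents and catalogue names are made up, and the UCD
strings other than POS_EQ_MAIN_RA are only illustrative; a real list would
be extracted from Vizier.

```python
def match_ucds(registry, wanted):
    """Return names of resources whose UCD list contains every wanted UCD.

    This is a plain string match: no UCD hierarchy or synonyms are considered.
    """
    wanted = set(wanted)
    return sorted(name for name, ucds in registry.items() if wanted <= set(ucds))


# Hypothetical registry entries: resource name -> optional UCD list.
registry = {
    "catalogue_a": ["POS_EQ_MAIN_RA", "POS_EQ_MAIN_DEC", "POS_EQ_PMRA"],
    "catalogue_b": ["POS_EQ_MAIN_RA", "POS_EQ_MAIN_DEC"],
    "image_archive": [],  # UCDList left empty, since it is optional
}

with_proper_motion = match_ucds(registry, ["POS_EQ_PMRA"])
print(with_proper_motion)
```

Resources with no UCD list are simply never matched, which is consistent
with keeping the element optional.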

>
> I think the place to handle healpix is in the Space-Time Metadata.
I will see what comes out of the next iteration.  And hopefully we will
get feedback from the Planck people, who have set up a VO working group.

> We need a general specification concerning what to do with unknown,
> unspecified, or unapplicable metadata elements.  For the RSM document I
> think I will use those or similar strings, recognizing that they might be
> implemented in other ways (NULL field in a database, for example, for
> unspecified).
>
> Another potential metadata element we may wish to consider adding is a
> bibliographic reference, namely, the bibcode.
>
> I don't think we need to debate the above points extensively right now.  In
> the interest of moving ahead, as recent e-mails from Francoise Genova and
> Andy Lawrence have advocated, I will now try to finish up the RSM draft as
> quickly as possible.
>

Awaited eagerly ...

thanks

Anita


