Extensions to VOResource - was Error column in VOResource

Tony Linde tony at linde.me.uk
Tue May 18 22:57:44 PDT 2004


> I wonder if we are confusing what we mean by the term metadata?

Absolutely. We all, at times, use the term metadata to describe information
about the data, where the data came from, how it was obtained, how it was
manipulated, the service which is providing the data... I came into this
work assuming metadata simply described the data itself, then found there
was ten times more metadata about the telescope, the people who funded the
telescope, etc. Not only that, but there were many hierarchies of metadata
above the data: metadata about the original raw data, about the CCD the
data was collected on, about the observation that collected the data, about
the session that observation was part of, about the programme the session
was part of, etc. And the complicating factor is that no two datasets will
hold the same set of metadata, and even if they do, they probably won't be
represented in any comparable way.

What we need to do is factor out the most common metadata, ensure that it
can be extended, and then look at standardising the extensions over time. If
we have this standard way of representing the metadata (DM-based schema) and
a standard way of storing and getting it (registry) then it is more likely
that in the future new missions will ensure their metadata conforms to the
standard and that their extensions are understood in advance so that when
the data appears everyone is ready to deal with it.
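
To make the core-plus-extensions idea concrete, here is a minimal Python
sketch. It is an illustration only: the field names, the ResourceRecord
class and the read_record function are hypothetical and are not taken from
VOResource or any DM schema. It assumes a record carries a small agreed core
plus labelled extension blocks, and that an application uses the blocks it
understands and passes the rest on untouched.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Tuple

# Hypothetical core field names, chosen only for illustration.
CORE_FIELDS = ("title", "coverage", "bandpass")


@dataclass
class ResourceRecord:
    # The agreed common metadata.
    core: Dict[str, Any]
    # Extension blocks, keyed by a namespace-like label.
    extensions: Dict[str, Dict[str, Any]] = field(default_factory=dict)


def read_record(
    record: ResourceRecord, understood: Iterable[str] = ()
) -> Tuple[Dict[str, Any], Dict[str, Dict[str, Any]]]:
    """Return (usable, passed_through) for one application.

    usable: the core fields plus any extension blocks the application
    understands. passed_through: extension blocks it does not understand,
    kept intact so they can still be shown to a user or forwarded.
    """
    understood = set(understood)
    usable: Dict[str, Any] = {name: record.core.get(name) for name in CORE_FIELDS}
    passed_through: Dict[str, Dict[str, Any]] = {}
    for label, block in record.extensions.items():
        if label in understood:
            usable[label] = block
        else:
            passed_through[label] = block
    return usable, passed_through


record = ResourceRecord(
    core={"title": "Example survey", "coverage": "all-sky", "bandpass": "optical"},
    extensions={"xray-events": {"time_filter": "supported"}},
)

# An application that knows only the core still works down to that level.
usable, passed_through = read_record(record)
```

An application written for a particular extension would call
read_record(record, understood=["xray-events"]) and find that block among
the usable metadata instead.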

Cheers,
Tony.

> -----Original Message-----
> From: owner-registry at eso.org [mailto:owner-registry at eso.org] 
> On Behalf Of Doug Tody
> Sent: 19 May 2004 00:03
> To: Martin Hill
> Cc: registry at ivoa.net
> Subject: Re: Extensions to VOResource - was Error column in VOResource
> 
> I wonder if we are confusing what we mean by the term metadata?
> By metadata in the earlier discussion I referred primarily to 
> descriptive metadata used to describe a dataset - sky 
> coverage, bandpass, WCS, and so forth.  WSDL however refers 
> to the service interface.  Indeed, the service type and 
> interface, the capability matrix for a particular service 
> instance, etc., are all fairly static, well standardized 
> things which should be available via the registry.  I 
> certainly agree that all VO services should be registered and 
> described in the registry.  Doug
> 
> 
> On Tue, 18 May 2004, Martin Hill wrote:
> 
> > I'll second that.
> > 
> > I am concerned though that avoiding the standardisation process *too*
> > much will leave us with n x m connections between n applications and m
> > services.  Using the registry as a central connection point for
> > metadata gives us long term resilience - in fact 'loosely couples' the
> > applications and services (and indeed the service <-> service
> > connections).
> > 
> > Services that dynamically generate data still have consistent metadata
> > - SIAP for example *could* be defined using WSDL and it strikes me is
> > such a common VO service that we should describe it in VOResource so
> > that discovery tools can find them and 'understand' what is available
> > at each one.  Such things as sky coverage, units used to select
> > regions, other query parameters (incl UCDs), etc are all things that
> > apply to any queryable dataset, whether the data itself is static or
> > dynamically generated.
> > 
> > One-off or rare services are candidates for not trying to describe
> > using VO metadata.  And certainly to begin with we should concentrate
> > on the very common services.
> > 
> > Cheers (looking forward to more discussions on this over a few 
> > microbrewery pints)
> > 
> > Martin
> > 
> > Tony Linde wrote:
> > 
> > > I completely agree, Doug. We should standardize on what we can agree
> > > as a common standard - via the DM effort. But any extensions should
> > > follow some standard extension mechanism so that, as you say, they
> > > can at least be seen by users or included and passed on by
> > > applications.
> > > 
> > > Cheers,
> > > Tony.  
> > > 
> > > 
> > >>-----Original Message-----
> > >>From: owner-registry at eso.org [mailto:owner-registry at eso.org] On 
> > >>Behalf Of Doug Tody
> > >>Sent: 18 May 2004 18:05
> > >>To: Tony Linde
> > >>Cc: registry at ivoa.net
> > >>Subject: RE: Error column in VOResource
> > >>
> > >>On Tue, 18 May 2004, Tony Linde wrote:
> > >>
> > >>
> > >>>Right back at ya :) ... How does an application know how to handle
> > >>>metadata that conforms to no known standard? Whatever the problems
> > >>>for the registry, they are a thousand times worse for the apps since
> > >>>there'll be thousands of applications wanting to use the resources.
> > >>
> > >>We will never be able to standardize everything.  We will never even
> > >>be able to know about all the telescopes, survey projects, etc.,
> > >>being developed or underway around the world.
> > >>Even if we do know about a project it will be constantly changing.
> > >>All we can really hope to do is standardize the core, and define a
> > >>standard framework for things like resource description, dataset
> > >>characterization, data formatting, etc.
> > >>
> > >>People will use these standard mechanisms, try to adhere to the
> > >>standard core, but will need to add nonstandard extensions to do new
> > >>things, or to specialize the services, data model, or data packaging
> > >>to fully describe their data.
> > >>Sure, all applications will not be able to understand and deal with
> > >>the extensions, but this is how new standards develop, and some
> > >>subset of applications will really need those extensions to process
> > >>certain classes of data, and will be written to do so.  So long as
> > >>the service or dataset is compliant to some core model then all
> > >>applications which support the core will work down to that level,
> > >>ignoring the extensions.
> > >>Even nonstandard extensions can be useful if packaged in a standard
> > >>way, e.g., a human can browse them to better understand the data,
> > >>generic searches can be performed, generic tools can be used in an
> > >>ad-hoc fashion, and so forth.
> > >>
> > >>Basically I am arguing that the standard VO framework should only
> > >>try to go so far, but should be designed to be extensible.  If it
> > >>tries to be all-inclusive it will be too complicated to be used, and
> > >>will never work anyway.
> > >>
> > >> 
> > >>
> > >>>I don't understand how the metadata can be dynamic (other than by a
> > >>>data centre accumulating more data). Surely the coverage of a
> > >>>dataset, say, is based on the data in it? Even virtual data has to
> > >>>be generated from some real data and it is on that data that the
> > >>>metadata is based.
> > >>>Maybe some examples would help, Doug.
> > >>
> > >>This is all true for static archive data products, e.g., precomputed
> > >>survey images in an archive.  But what if we have, e.g., an image
> > >>access service which generates images on the fly, e.g., image
> > >>cutouts or mosaics?
> > >>Or perhaps the service generates images on the fly from X-ray event
> > >>data, applying a time filter in the process and generating the image
> > >>with the desired celestial projection?
> > >>SIA for example already supports all this.
> > >>Basically what happens is the client application tells the service
> > >>what it would ideally like to get back, the service decides what it
> > >>can provide, and returns metadata for one or more virtual datasets
> > >>which it can generate to satisfy the query.  The image is not
> > >>actually generated until the access reference URL is invoked.
> > >>
> > >>What we need the registry for is to tell us what services are out
> > >>there, what they are capable of, and the characteristics of the data
> > >>they can serve (specific data collections, bandpass, sky coverage,
> > >>etc.).  We also need to register all data collections and be able to
> > >>find services which can serve them up.  It could also be useful to
> > >>register individual static datasets within a data collection,
> > >>including caching dataset metadata of some type (at least that which
> > >>uniformly characterizes the data at a high level).
> > >>This would start to provide a replica management capability for
> > >>managing large data collections.  One has to ask though, whether
> > >>this is something which should be provided by the registry or by a
> > >>separate replica management service.  If it gets complicated enough,
> > >>it may be better to split it off as a separate service in order to
> > >>avoid over-complicating the registry.
> > >>
> > >>Anyway, enough!  I have to get back to DAL stuff or I won't be ready
> > >>for next week.
> > >>
> > >>	- Doug
> > >>
> > >>
> > >> 
> > >>
> > >>>Tony.
> > >>>
> > >>>
> > >>>>-----Original Message-----
> > >>>>From: owner-registry at eso.org [mailto:owner-registry at eso.org] On 
> > >>>>Behalf Of Doug Tody
> > >>>>Sent: 18 May 2004 17:18
> > >>>>To: Tony Linde
> > >>>>Cc: registry at ivoa.net
> > >>>>Subject: RE: Error column in VOResource
> > >>>>
> > >>>>Tony, how does your approach handle services which return virtual
> > >>>>data, or datasets which contain metadata which has not been
> > >>>>standardized?
> > >>>>
> > >>>>In the case of virtual data, the metadata for the virtual dataset is
> > >>>>not static hence cannot be cached in the registry.
> > >>>>One has to ask the actual service what it can generate to service a
> > >>>>specific query, and the metadata for the virtual dataset is
> > >>>>generated on the fly.  Probably this sort of thing will be the case
> > >>>>for most sophisticated VO services.  Hence, the registry is limited
> > >>>>primarily to service discovery based on fairly high
> > >>>>level, static resource descriptors.    Doug
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>On Tue, 18 May 2004, Tony Linde wrote:
> > >>>>
> > >>>>
> > >>>>>As I keep saying, the coupling is an issue of implementation, not
> > >>>>>design. We design the registry interface so that it is a one-stop
> > >>>>>shop for all metadata but if one implementation gets the metadata
> > >>>>>from the resource every time it is asked and another caches that
> > >>>>>metadata, it is transparent to the calling application.
> > >>>>>
> > >>>>>
> > >>>>>>Making the registry a one-stop shop for metadata demands a tight
> > >>>>>>coupling with the services they describe.  Any change
> > >>>>>
> > >>>>>I don't see why - the registry only needs to know how to ask for
> > >>>>>the metadata and how to return it to the calling app. We will have
> > >>>>>a standard way of getting the metadata (from Wil's proposal for a
> > >>>>>standard baseline for all services), a standard representation
> > >>>>>(VOResource which includes the DM-based schema) and a standard way
> > >>>>>for apps to get that metadata (RI spec).
> > >>>>
> > >>>>>This is about as loose a coupling as I can think of.
> > >>>>>
> > >>>>>And if it *did* require tight coupling, all the more reason to put
> > >>>>>all this processing into the registry, otherwise you end up with
> > >>>>>every single application having to be tightly coupled to every
> > >>>>>resource - but this is not the case.
> > >>>>>
> > >>>>>Making the registry the source of all metadata means that all
> > >>>>>applications only have to manage one interface right down until
> > >>>>>they select the service they want to invoke - they don't have to
> > >>>>>each and every one be coded to fish around lots of services
> > >>>>>looking for the metadata they want.
> > >>>>
> > >>>>>T.
> > >>>>>
> > >>>>>
> > >>>>>>-----Original Message-----
> > >>>>>>From: Ray Plante [mailto:rplante at ncsa.uiuc.edu]
> > >>>>>>Sent: 18 May 2004 16:51
> > >>>>>>To: Tony Linde
> > >>>>>>Cc: registry at ivoa.net
> > >>>>>>Subject: RE: Error column in VOResource
> > >>>>>>
> > >>>>>>On Tue, 18 May 2004, Tony Linde wrote:
> > >>>>>>
> > >>>>>>>the registry is a one stop shop for all metadata.
> > >>>>>>
> > >>>>>>I disagree with this statement in general.  Besides various
> > >>>>>>practical reasons of scaling and scope, there is an issue of
> > >>>>>>volatility.
> > >>>>>>
> > >>>>>>Making the registry a one-stop shop for metadata demands a tight
> > >>>>>>coupling with the services they describe.  Any change in the
> > >>>>>>service must be reflected back into the registry.  If the registry
> > >>>>>>is simply about discovering services, the coupling is looser, and
> > >>>>>>the service is more flexible to changes in implementation.
> > >>>>>>
> > >>>>>>It can be argued that the tighter the coupling, the more costly
> > >>>>>>the system in terms of software development and coordination of
> > >>>>>>people.  A tightly coupled design may be appropriate for a
> > >>>>>>particular VO project that can manage that coordination; however,
> > >>>>>>it's less appropriate for the IVOA as a whole.
> > >>>>>>
> > >>>>>>It's an interesting issue that I expect we'll learn more about
> > >>>>>>with experience.
> > >>>>>>
> > >>>>>>cheers,
> > >>>>>>Ray
> > >>>>>>
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>
> > > 
> > > 
> > 
> > 
> > 
>


