table metadata and the registry

Tony Linde Tony.Linde at leicester.ac.uk
Tue May 8 06:21:04 PDT 2007


Hi Ray,

Thanks for that - it does make your proposal clearer. I'll try to make 
my own argument clearer.

 > to see is that all of the table information be retrievable in one URL.

That is certainly more reasonable. But what I don't understand is why, 
if we're to have a single method of getting the metadata, it cannot be 
returned in the format recognised by the registry: VOResource? It would 
take no more work to format the metadata as VOResource than some other 
format so why not stick with what we have, why come up with another format?

 > that once you have located the resource, you should go to the service
 > for the table metadata; even in this case, it is one extra call to
 > access the URL that retrieves it directly from the service.

As I said, this is better but, as an application writer, I know I'd want 
to hit only registries that contain all the information I need to 
complete a task (eg, if building a workflow, I'd want the user to be 
able to construct a query regardless of whether the service is currently 
up since the workflow execution engine will wait around until the 
service is available, not the user).

So, we would need to ensure that full registries can be distinguished 
from pointer-only ones and that applications can specify, at 
installation time, that they expect to be connected to a full registry.

 > You have to give me a little credit here :-)

I certainly do, Ray. I just want to determine whether your proposal can 
work and, at the moment, I see issues that aren't currently addressed.

 > See the latest VOSI doc from Guy.  The motivation is to address the
 > fine-grained metadata issue:  the former provides information intended
 > for the registry and the latter is a fatter record.  So while I see

As is obvious, I'd not read it. And, as I said, I can see no reason for 
more than one method which returns the full VOResource record.

 > what is the point of making them optional in one place when you force the
 > user to provide it anyway in another.

They are only optional because we could not agree on making them 
mandatory - I am not talking about metadata which is inapplicable to a 
given service but that which is simply not provided by a service 
although it is applicable (table and column metadata).

 > The general idea is that over time, we can develop discovery services
 > that leverage more and more information about resources.  Not all of
 > this information need be expressable in a VOResource schema.

But, as I said previously and above, this will lead to registries 
serving information in ways no other does, and so applications which 
rely on that information will only be able to work against specific 
registries.

 > To put it in concrete terms, the AstroGrid registry effectively
 > pressures the NVO registries into supporting fine-grained table
 > ...  First, you
 > encourage your publishers to provide table metadata.  We harvest these
 > records which in turn go out to our users as a result of queries.  We
 > have to then help users make sense of this information.

I don't understand this - the information is the same whether you get it 
from the registry alone or from the registry plus the service. What is 
the difference?

 > When there are
 > problems with the information, it reflects poorly on us, not you.

Poor metadata information is a problem we all have to tackle, not just 
AstroGrid: it is exacerbated by poor registry population applications 
and has nothing to do with the type of registry.

 > Second, your application in effect encourages our publishers to provide
 > table metadata to our registry if they are to be used in your
 > application, because your application only gets this information from
 > the registry.

This is not going to change, however the metadata is collected, whether 
by entry into the registry, by VOSI:getWhatever from the service or by 
the metadata URLs you propose. We will continue to provide full registry 
information and applications that rely on it: it is the only effective 
way to build responsive applications.

It is also, I might add, the only way to discover resources based on the 
additional metadata. If someone wants to discover x-ray catalogs with a 
given type of information (specified by a UCD), how do they do this from 
a pointer-only registry, apart from getting every possible x-ray source 
and querying every one of them?
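
To put rough numbers on that, here is a minimal Python sketch. Everything 
in it is made up for illustration: the Resource record, the two find 
functions, the sample identifiers and the UCD string are assumptions, not 
any real registry interface. It only counts how many service calls each 
style of registry forces on the caller.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical registry record; column_ucds is empty in a pointer-only registry."""
    ivoid: str
    waveband: str
    column_ucds: list = field(default_factory=list)


def find_in_full_registry(records, waveband, ucd):
    """A single registry query: the column metadata lives in the registry."""
    return [r.ivoid for r in records
            if r.waveband == waveband and ucd in r.column_ucds]


def find_via_pointers(records, waveband, ucd, fetch_column_ucds):
    """The registry can only narrow by waveband; each candidate service
    must then be asked for its column metadata, one call per service."""
    hits, service_calls = [], 0
    for r in records:
        if r.waveband != waveband:
            continue
        service_calls += 1
        if ucd in fetch_column_ucds(r.ivoid):
            hits.append(r.ivoid)
    return hits, service_calls


# Illustrative data only; none of these identifiers are real resources.
COLUMNS = {
    "ivo://example/xray-a": ["pos.eq.ra", "phot.flux;em.X-ray"],
    "ivo://example/xray-b": ["pos.eq.ra", "pos.eq.dec"],
    "ivo://example/optical-c": ["phot.mag;em.opt.V"],
}
WAVEBANDS = {
    "ivo://example/xray-a": "x-ray",
    "ivo://example/xray-b": "x-ray",
    "ivo://example/optical-c": "optical",
}

full = [Resource(i, WAVEBANDS[i], COLUMNS[i]) for i in COLUMNS]
pointer_only = [Resource(i, WAVEBANDS[i]) for i in COLUMNS]

wanted = "phot.flux;em.X-ray"
full_hits = find_in_full_registry(full, "x-ray", wanted)
pointer_hits, calls = find_via_pointers(
    pointer_only, "x-ray", wanted, COLUMNS.__getitem__)
```

Both routes give the same answer, but the second needs one service call 
per x-ray resource held in the registry, and every one of those services 
has to be up at the time.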

I guess if we do follow your proposal it will, over the next couple of 
years, show what the application developers really want by the number 
that tie themselves to AG-style registries vs NVO-style registries.

 > We do need to have a common understanding of what qualifies as
 > "fine-grained" information and develop mechanisms of exposing it only
 > when desired.  I don't think we have this, yet, but I will offer my
 > strawman at the meeting.

I'd like to expose more to the list since I won't be at the meeting.

 >> Bottom-line, Ray. ...
 > I hope I have clarified that this is not what I am proposing.

I'm still not so sure. I think that digging into the repercussions of 
your proposal will show that it is a major change.

At the core, I think the fundamental disagreement is over how the 
service provides its metadata: either by a standard method, getWhatever, 
which returns a full VOResource record; or, by a URL which returns some 
yet-to-be-determined format.

One last query: if the URL only returns the 'extra' metadata, where does 
the core service metadata come from? The registry only? Does this mean 
the service provider has to maintain metadata in two locations? Surely 
one additional benefit of the getWhatever method is that a service 
provider can update their registry record simply by changing the 
VOResource record served up by getWhatever?
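
As a rough sketch of that last point (Python, and purely hypothetical: 
get_whatever stands in for whatever the VOSI method ends up being called, 
the registry is just a dict keyed by identifier, and the sample XML is a 
placeholder rather than a valid VOResource record):

```python
import xml.etree.ElementTree as ET


def refresh_record(registry, identifier, get_whatever):
    """Replace the registry's copy of a record with what the service serves now.

    Returns True if the stored record changed.
    """
    xml_text = get_whatever(identifier)  # hypothetical call to the service
    ET.fromstring(xml_text)              # raises ParseError if not well-formed XML
    changed = registry.get(identifier) != xml_text
    registry[identifier] = xml_text
    return changed


# Illustrative use: the provider edits the record in one place (the service)
# and the registry picks the change up on its next pass.
served = {"ivo://example/cat": "<Resource><title>Demo catalogue</title></Resource>"}
registry = {}

first = refresh_record(registry, "ivo://example/cat", served.__getitem__)
again = refresh_record(registry, "ivo://example/cat", served.__getitem__)

served["ivo://example/cat"] = "<Resource><title>Demo catalogue v2</title></Resource>"
after_edit = refresh_record(registry, "ivo://example/cat", served.__getitem__)
```

The provider only ever touches the copy held at the service; the registry 
copy follows it.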

(Shall go and lie down now... :) )

Cheers,
Tony.

Ray Plante wrote:
> Hi Tony,
> 
> On Tue, 8 May 2007, Tony Linde wrote:
>>> I don't understand this.  Anybody who wants the fine-grained
>>> information
>>> can get it by following the URLs.  Anybody who doesn't want this
>>
>> But this is an enormous waste of time. I thought the VO was supposed 
>> to make
>> things better. What you propose will mean that anyone who wants to 
>> provide a
>> general query builder will have to query the registry for resources and,
>> when the user selects a resource, find the service and query it for table
>> information, then, for each table, query the service for column 
>> information:
>> all while the user patiently stares at a spinning hourglass. This is not,
>> IMO, an improvement on existing services.
> 
> This is not at all what I am proposing.  First of all, what I would like 
> to see is that all of the table information be retrievable in one URL. 
> (Multiple URLs might be allowed only as a means to enable very large 
> collections of tables.)
> 
> If you want the AstroGrid registry's search interface to return an 
> "expanded" VOResource that includes the table metadata for the benefit 
> of your query builder, I think that is fine.  Others will probably argue 
> that once you have located the resource, you should go to the service 
> for the table metadata; even in this case, it is one extra call to 
> access the URL that retrieves it directly from the service.
> 
> The important thing for registries is the form of the records we share
> through the harvesting interface.  If these records simply have pointers 
> to table metadata, then those registries that do not wish to manage this 
> information don't have to.  The AstroGrid registry can pull the table 
> metadata via the URL when the record is harvested.
> 
> This idea was conceived to fit well into what you are already doing.  
> For example, if we choose to use the table model from VOResource as the 
> standard format, then it is trivial to pull the table metadata from the 
> service and insert it into your internal copy of the VOResource record. 
> You have to give me a little credit here :-)
> 
>>> (getRegistration() or getMetadata()) to include it in is the service
>>
>> I thought there was only one method to get metadata and it returned the
>> VOResource record. I cannot see the need for more than one such.
> 
> See the latest VOSI doc from Guy.  The motivation is to address the 
> fine-grained metadata issue:  the former provides information intended 
> for the registry and the latter is a fatter record.  So while I see the 
> reason, I don't think it will accomplish its goal.
> 
>>> Furthermore, with no guideline as to what information should go in
>>
>> I would certainly mandate that the full VOResource record be returned 
>> with
>> all the optional bits of that made mandatory.
> 
> How does this help the provider?  One of the reasons metadata are 
> optional is because they won't necessarily apply to all resources.  And 
> what is the point of making them optional in one place when you force the 
> user to provide it anyway in another.  This is not a recipe for quality 
> metadata.
> 
>>> is all or nothing.  Not only does the URL solution allow a registry to
>>> choose what fine-grained information it collects, but also it does not
>>> require that that information fit into the VOResource format.
>>
>> Why would the registry care about non-VOResource information? And what 
>> use
>> is a registry which cannot supply the information a calling service
>> requires?
> 
> As a discovery service.  Some will argue that a client should be getting 
> information like table data directly from the service when it plans its 
> queries, but I don't want to prevent you from getting it all from your 
> registry.
> 
> The general idea is that over time, we can develop discovery services 
> that leverage more and more information about resources.  Not all of 
> this information need be expressable in a VOResource schema.
> 
>> Do we now have to specify all the levels of metadata that a
>> registry can and cannot supply?
> 
> While we may not agree at the moment on the best way to address the 
> fine-grained issue, I hope we can at least agree on what the problem is.
> 
> To put it in concrete terms, the AstroGrid registry effectively 
> pressures the NVO registries into supporting fine-grained table metadata 
> needed to support your query builder but which we feel should be handled 
> in a different way.  This pressure comes in two forms.  First, you 
> encourage your publishers to provide table metadata.  We harvest these 
> records which in turn go out to our users as a result of queries.  We 
> have to then help users make sense of this information.  When there are 
> problems with the information, it reflects poorly on us, not you.  
> Second, your application in effect encourages our publishers to provide 
> table metadata to our registry if they are to be used in your 
> application, because your application only gets this information from 
> the registry.
> 
> We need to find a way that allows a registry like AstroGrid to innovate 
> and provide new discovery and automated retrieval techniques that do not 
> force other registries to follow suit.
> 
>> Do we now have to specify all the levels of metadata that a
>> registry can and cannot supply?
> 
> We do need to have a common understanding of what qualifies as 
> "fine-grained" information and develop mechanisms of exposing it only 
> when desired.  I don't think we have this, yet, but I will offer my 
> strawman at the meeting.
> 
>>> metadata.  A simple service (provided by a registry) can translate that
>>> information into a standard format, so off the bat you get good
>>
>> How can the registry do that? None of the catalog services have 
>> *standard*
>> ways of providing metadata: a registry will have to implement separate 
>> code
>> for every potential service unless we specify new standards for these
>> URL-based metadata retrieval methods.
> 
> SIA has a *standard* way of getting the table metadata: FORMAT=METADATA. 
> A simple service that takes only an SIA base URL as a GET input can 
> apply a stylesheet to return this information in a standard format.  The 
> others have *standard* ways but they are all different.  A converter for 
> each one provides a single way to get the table metadata from all of them.
> 
>> Bottom-line, Ray. I think what you are proposing is a radical change 
>> to the
>> way the VO works. This turns the registry into a simple pointer to 
>> resources
>> and puts the onus on VO applications to do all the searching for 
>> metadata,
> 
> I hope I have clarified that this is not what I am proposing.
> 
> cheers,
> Ray

-- 
Tony Linde
Phone:  +44 (0)116 223 1292    Mobile: +44 (0)785 298 8840
Fax:    +44 (0)116 252 3311    Email:  Tony.Linde at leicester.ac.uk
Post:   Department of Physics & Astronomy,
         University of Leicester
         Leicester, UK   LE1 7RH
Web:    http://www.star.le.ac.uk/~ael

Project Manager, EuroVO VOTech   http://eurovotech.org
Programme Manager, AstroGrid     http://www.astrogrid.org


