too much complexity?

Ray Plante rplante at poplar.ncsa.uiuc.edu
Wed Sep 17 09:21:05 PDT 2003


Hi Roy,

I understand your concern.  My overall goals in this effort include 
  *  modularity and extensibility:  use as little or as much metadata as 
        your application needs.
  *  clarity and easy access to meaning:  I stress having ready access to 
        definitions (that's why I put definitions right in the schema and 
        use them to create dictionaries and help files).
  *  re-usability:  so that we're not unnecessarily creating new metadata
        to capture the same thing. 
  *  operate well with tools

We are charting new waters here, so some things we won't get right the 
first time.  Plus, things change fast--it's not possible to have everything 
at production quality.  Let me ask you, as an NVO developer, a couple of 
questions: 
  *  With the two posted versions of VOResource, did you look at the 
        associated overview documents?  Did you find them helpful?  
  *  Do you consult the definitions in the schemas as an aid to 
        understanding them?  How about the related documents, like RM or 
        the data 
        dictionary?  (Clearly, one can't get the big picture from 
        individual definitions.)
  *  What would help communicate the big picture to keep from getting lost 
        in the details?
  *  Do you feel that progress towards consensus is not happening fast 
        enough?  

More specifically...

On Tue, 16 Sep 2003, Roy Williams wrote:
> This registry schema is getting to be very complex. Even to understand the
> simplest xml instance, there need to be 6 or 8 schemas ingested. When we
> make binding tools for VOResource, there are hundreds of classes generated,
> one for each element. 

In v0.8.1: 
   VOResource (core):  32 elements
   VOOrg:               3
   VODataService:      15
   VOPerson:            2

> I am reminded of a
> Bill going through Parliament, having special interests adding their own
> pork-barrel projects. The rule in NVO is not to attempt completeness, but
> rather to get 95% of the use cases with 20% of the work. How can we return
> to this maxim?

I don't think we're trying for the 95%.  What is in there, for the most 
part, I believe, comes from current needs.  

> (1) Is this schema modular? Do I need to parse all the optional modules in
> order to work with the core? 

Yes, it is modular, and it is intended that you do not have to parse the 
optional modules in order to work with the core.  Binding tools, however, 
may not be set up well to do that, while other parsing tools can handle 
this better.  This is one of the things we need to learn how to cope with.  
This is the research.

Do you really want extensibility?  Do you want to be able to define your 
"Elephant" resource with specialized elephant metadata?  Then this is what 
we have to figure out how to do.
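
To make the "core only" idea a bit more concrete, here is a rough sketch 
(not how any existing registry tool works) of a generic parser that reads 
the core elements and skips anything from a namespace it doesn't 
recognize.  The namespace URIs and element names are made up for the 
example; they are not taken from the actual schemas.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, for illustration only.
CORE_NS = "http://example.org/VOResource-core"

# A made-up "Elephant" resource: two core elements plus one element
# from a made-up extension namespace.
SAMPLE = """\
<Resource xmlns="http://example.org/VOResource-core"
          xmlns:el="http://example.org/Elephant-extension">
  <Title>Elephant Survey Archive</Title>
  <Identifier>example.org/elephants</Identifier>
  <el:TrunkLength unit="m">1.8</el:TrunkLength>
</Resource>
"""


def read_core(xml_text):
    """Return {local name: text} for child elements in the core namespace.

    Children from any other namespace are skipped, so the extension
    module never has to be loaded or understood.
    """
    root = ET.fromstring(xml_text)
    prefix = "{" + CORE_NS + "}"
    core = {}
    for child in root:
        if child.tag.startswith(prefix):
            core[child.tag[len(prefix):]] = (child.text or "").strip()
    return core


core_metadata = read_core(SAMPLE)
```

An application that does understand the elephant extension could look at 
the skipped elements as well; one that doesn't loses nothing from the core.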

> What is the semantic nature of the core module?

It describes generic resources.  DataCollections and Services are 
specializations of a generic Resource.

> (2) What is the list of metadata formats that the registry covers? To me it
> is Services, Datasets, Projects, Organizations. Why are "people" still in
> the registry? Can't Astrogrid do their own thing somehow without bothering
> IVOA, since they are the ones that want this? They can make a "person"
> schema that includes VOResource, rather than forcing VOResource to include
> "person".

Recall from previous discussions:
  *  VOPerson is not meant to be part of the core metadata proposal.
  *  If you don't want to support Person resources, you do not have to.  
       (Don't create the classes for these.  Barf when you see Person 
       resources if you don't want to be bothered.)
  *  Tony has said that handling descriptions of people's preferences and 
       privileges is a requirement for AstroGrid.  This is a perfect
       example of how a project can create an extension for its own 
       purposes.  If it's useful in a wider context, the IVOA can adopt it 
       as a standard.  We should encourage this.
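
As a sketch of the "barf" option, assuming (my assumption, purely for 
illustration) that the kind of resource can be read off the name of the 
root element, a registry that doesn't want Person records could reject 
them up front.  The names SUPPORTED, UnsupportedResource, and accept are 
all hypothetical.

```python
import xml.etree.ElementTree as ET

# Resource kinds this hypothetical registry chooses to handle.
SUPPORTED = {"Resource", "Service", "DataCollection"}


class UnsupportedResource(Exception):
    """Raised for a resource kind this registry does not handle."""


def local_name(tag):
    """Strip the '{namespace}' part that ElementTree puts before a tag."""
    return tag.rsplit("}", 1)[-1]


def accept(xml_text):
    """Return the resource kind, or raise if we don't support it."""
    kind = local_name(ET.fromstring(xml_text).tag)
    if kind not in SUPPORTED:
        raise UnsupportedResource("cannot handle %s resources" % kind)
    return kind
```

No Person classes need to exist anywhere in such a registry; the record 
is turned away before any binding happens.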

Why do you feel "bothered" by "people"?

> (3) What small committee is responsible for additions -- and pruning -- in
> the light of experience? Let us form this in Strasbourg. What is the best
> number of people? 6? 10?

(I underscore Tony's response here.)

> (4) Why are there suddenly five kinds of linking relationship? If simple
> "citation" is good enough for the Journals, why is it not good enough for
> VO? Half the people filling in these forms will do nothing in response to a
> complicated question -- and so we lose metadata -- but they will recognize
> and respond to the word "citations".

Do we need to show that one resource is a mirror of another somehow?  Is 
this an important issue for compatibility with ADEC identifiers, which are 
location-independent, and which you feel is a high priority?  

What Tony has suggested is an approach to describing relationships that 
will actually reduce complexity in the future as we find the need to add 
more.  It puts them all in one place.  

> (5) If a Fortran programmer even older than me approaches the registry to
> publish, or to query, can we make something understandable for him/her? What
> does that form look like? Our primary purpose is capturing that metadata,
> not pandering to the most complex cases.

What would be helpful are some examples of simple queries or input forms 
we want to create, so that we can test whether we can accomplish this 
simply.  

If you want to present a simple form, then leave out the bits you don't want.  
As I've mentioned, we've been working on schemes for doing this which I 
would be happy to share with you.

> (7) Am I the only one with these mutinous thoughts?

Probably not, I'm sure.  However, I think that as IVOA developers, it is 
our job to put ourselves on the front line of difficult, complex issues.  
In doing so, we try to protect the lines behind us:  the data providers 
and the users.  Yes, we don't try to solve the entire problem now, but we 
also don't paint ourselves into corners by not thinking ahead--that's 
where modularity and extendible architectures come in.  

I think of a few Z39.50 metadata schemas from previous decades.  The 
BIB-1, used by librarians, has over 300 terms.  GEO-1, for earth science, 
has over 500.  If you look at these, you'll see incredible redundancy.  
Talk about complexity.  How do we support the simple when it needs to be 
simple, and how do we extend to the complex without things getting 
unwieldy?  

cheers,
Ray


