too much complexity?
Ray Plante
rplante at poplar.ncsa.uiuc.edu
Wed Sep 17 09:21:05 PDT 2003
Hi Roy,
I understand your concern. My overall goals in this effort include
* modularity and extensibility: use as little or as much metadata as
your application needs.
* clarity and easy access to meaning: I stress having ready access to
definitions (that's why I put definitions right in the schema and
use them to create dictionaries and help files).
* re-usability: so that we're not unnecessarily creating new metadata
to capture the same thing.
* tool compatibility: the schemas should operate well with tools.
We are charting new waters here, so some things we won't get right the
first time. Plus, things change fast--it's not possible to have everything
at production quality. Let me ask you, as an NVO developer, a couple of
questions:
* With the two posted versions of VOResource, did you look at the
associated overview documents? Did you find them helpful?
* Do you consult the definitions in the schemas as an aid to understanding
them? How about the related documents, like RM or the data
dictionary? (Clearly, one can't get the big picture from
individual definitions.)
* What would help communicate the big picture to keep from getting lost
in the details?
* Do you feel that progress towards consensus is not happening fast
enough?
More specifically...
On Tue, 16 Sep 2003, Roy Williams wrote:
> This registry schema is getting to be very complex. Even to understand the
> simplest xml instance, there need to be 6 or 8 schemas ingested. When we
> make binding tools for VOResource, there are hundreds of classes generated,
> one for each element.
In v0.8.1:
VOResource (core): 32 elements
VOOrg: 3
VODataService: 15
VOPerson: 2
> I am reminded of a
> Bill going through Parliament, having special interests adding their own
> pork-barrel projects. The rule in NVO is not to attempt completeness, but
> rather to get 95% of the use cases with 20% of the work. How can we return
> to this maxim?
I don't think we're trying for the 95%. What is in there, for the most
part, I believe, comes from current needs.
> (1) Is this schema modular? Do I need to parse all the optional modules in
> order to work with the core?
Yes, it is modular, and it is intended that you do not have to parse the
optional modules in order to work with the core. Binding tools, however,
may not be set up well to do that, while other parsing tools can handle
this better. This is one of the things we need to learn how to cope with.
This is the research.
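
To make the "core without the optional modules" idea concrete, here is a minimal sketch of a reader that picks up only core-namespace elements and silently skips anything from an extension. The namespace URIs, element names, and the core_fields helper are all hypothetical placeholders, not the actual VOResource ones, and this is only one way a non-binding parser might cope.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, for illustration only.
CORE_NS = "urn:example:voresource-core"
PERSON_NS = "urn:example:voperson"

# A made-up instance mixing core elements with one extension element.
SAMPLE = f"""
<r:Resource xmlns:r="{CORE_NS}" xmlns:p="{PERSON_NS}">
  <r:Title>Example Survey</r:Title>
  <r:Identifier>ivo://example/survey</r:Identifier>
  <p:Preferences>ignored by core-only readers</p:Preferences>
</r:Resource>
""".strip()


def core_fields(xml_text, core_ns=CORE_NS):
    """Return {local name: text} for direct children in the core namespace."""
    root = ET.fromstring(xml_text)
    prefix = "{" + core_ns + "}"
    fields = {}
    for child in root:
        # ElementTree spells qualified names as "{namespace}local".
        if child.tag.startswith(prefix):
            fields[child.tag[len(prefix):]] = (child.text or "").strip()
    return fields


print(core_fields(SAMPLE))
```

The reader never needs the extension schema; it only needs to know which namespace it cares about. A binding tool that generates classes from every imported schema does not get this for free, which is the difficulty noted above.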
Do you really want extensibility? Do you want to be able to define your
"Elephant" resource with specialized elephant metadata? Then this is what
we have to figure out how to do.
> What is the semantic nature of the core module?
It describes generic resources. DataCollections and Services are
specializations of a generic Resource.
> (2) What is the list of metadata formats that the registry covers? To me it
> is Services, Datasets, Projects, Organizations. Why are "people" still in
> the registry? Can't Astrogrid do their own thing somehow without bothering
> IVOA, since they are the ones that want this? They can make a "person"
> schema that includes VOResource, rather than forcing VOResource to include
> "person".
Recall from previous discussions:
* VOPerson is not meant to be part of the core metadata proposal.
* If you don't want to support Person resources, you do not have to.
(Don't create the classes for these. Barf when you see Person
resources if you don't want to be bothered.)
* Tony has said that handling descriptions of people's preferences and
privileges is a requirement for AstroGrid. This is a perfect
example of how a project can create an extension for its own
purposes. If it's useful in a wider context, the IVOA can adopt it
as a standard. We should encourage this.
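
As a rough sketch of the "don't create the classes, barf on Person" option, an application could register handlers only for the resource types it supports and reject the rest explicitly. Everything here (the record layout, the handler names, the UnsupportedResource exception) is hypothetical and only illustrates the opt-out pattern.

```python
class UnsupportedResource(Exception):
    """Raised for resource types this application chooses not to handle."""


# Hypothetical handlers; a record is assumed to be a dict with
# "type" and "title" keys.
def handle_service(record):
    return f"service: {record['title']}"


def handle_data_collection(record):
    return f"collection: {record['title']}"


HANDLERS = {
    "Service": handle_service,
    "DataCollection": handle_data_collection,
    # No "Person" entry: this application opts out of that extension.
}


def process(record):
    """Dispatch a record to its handler, or reject an unsupported type."""
    handler = HANDLERS.get(record["type"])
    if handler is None:
        raise UnsupportedResource(record["type"])
    return handler(record)
```

A registry that does want Person resources would add one more entry to the table; nothing in the core handlers has to change.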
Why do you feel "bothered" by "people"?
> (3) What small committee is responsible for additions -- and pruning -- in
> the light of experience? Let us form this in Strasbourg. What is the best
> number of people? 6? 10?
(I underscore Tony's response here.)
> (4) Why are there suddenly five kinds of linking relationship? If simple
> "citation" is good enough for the Journals, why is it not good enough for
> VO? Half the people filling in these forms will do nothing in response to a
> complicated question -- and so we lose metadata -- but they will recognize
> and respond to the word "citations".
Do we need to show that one resource is a mirror of another somehow? Is
this an important issue for compatibility with ADEC identifiers, which are
location-independent, and which you feel is a high priority?
What Tony has suggested is an approach to describing relationships that
will actually reduce complexity in the future as we find the need to add
more. It puts them all in one place.
> (5) If a Fortran programmer even older than me approaches the registry to
> publish, or to query, can we make something understandable for him/her? What
> does that form look like? Our primary purpose is capturing that metadata,
> not pandering to the most complex cases.
What would be helpful are some examples of simple queries or input forms
we want to create and test whether we can accomplish this simply.
If you want to present a simple form, then leave out the bits you don't want.
As I've mentioned, we've been working on schemes for doing this which I
would be happy to share with you.
> (7) Am I the only one with these mutinous thoughts?
Probably not, I'm sure. However, I think that as IVOA developers, it is
our job to put ourselves on the front line of difficult, complex issues.
In doing so, we try to protect the lines behind us: the data providers
and the users. Yes, we don't try to solve the entire problem now, but we
also don't paint ourselves into corners by not thinking ahead--that's
where modularity and extendible architectures come in.
I think of a few Z39.50 metadata schemas from previous decades. The
BIB-1, used by librarians, has over 300 terms. GEO-1, for earth science,
has over 500. If you look at these, you'll see incredible redundancy.
Talk about complexity. How do we support the simple when it needs to be
simple, and how do we extend to the complex without things getting
unwieldy?
cheers,
Ray