AstroGrid registry structure

Wed Jun 11 13:43:35 PDT 2003

Hi Tony,

In response to your take on AstroGrid registries
(http://wiki.astrogrid.org/bin/view/Main/TonyOnRegistryStructure), I
wanted to bring up three issues.  First is a question on where
interoperability is required, second is about motivating your
modifications to the data model, and third is about an approach to
using XML.  I'm warming up to the idea of metadata modules, but I do
have concerns about what we are trying to achieve and how we progress
toward it.

So looking at your UML diagram, I'm wondering where you see
interoperability being required.  You make the statement that putting
all resources into one registry "was considered impractical and
unnecessary."  Obviously given our modeling and prototyping to date,
we in the NVO project would disagree with this statement.  This brings
up a philosophical question: where can we agree to disagree?  Can our
interoperability requirements afford us sufficient freedom to
implement solutions in the way that serves our individual projects best?

I think we have been assuming that we would use common metadata
schemas for exchange of metadata--that's represented by the bottom
half of your diagram.  Do you see us sharing the top half,
representing registry structures, as well?  Given that the structure
of the bottom half is based on the top, it suggests the answer is yes.
When I think of the top half in terms of interfaces, I don't see a lot
of difference between many of the components there.  What specifically
do you see as the difference between the different registries?
Different software?  Different interfaces?  Different port addresses?

I could also use further clarification on what the drivers are for a
redesign of the metadata model.  In particular, I would like to
understand how our current model is insufficient.  Since we have put
together a prototype that is actually being used by the Data Inventory
Service, I wonder if it is premature to consider it "impractical".  I
have some notions about what your concerns are, but I feel the current
model addresses them (or at least from my understanding of them, I'm
not catching on to how our current model fails to address them).  

There is first the concern about putting lots of different kinds of
resources in the same registry.  As I've described, I don't see this
as a difficult problem to deal with.  I've discussed how the model
affords full freedom to the various players in the registry game to
deal with as much complexity as they care to.  (In particular, if you
want to segregate resources into different, separately accessed
registries, you are free to do so.)  This flexibility is borne out in
our prototype as well: Tom's DIS is only interested in cone search and
SIA services, and that's all he gets even though the registry contains
descriptions of all types of resources.
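
A minimal sketch of that kind of selection, assuming a registry held as a
simple list of records.  The record fields and type labels below are
hypothetical and are not taken from the prototype; the point is only that
a client interested in two service types never has to deal with the rest.

```python
# Hypothetical registry records; the field names and type labels are
# illustrative assumptions, not the actual registry schema.
REGISTRY = [
    {"id": "res-1", "type": "ConeSearch", "title": "Example cone search"},
    {"id": "res-2", "type": "SIA", "title": "Example image access"},
    {"id": "res-3", "type": "Organisation", "title": "Example data center"},
    {"id": "res-4", "type": "Person", "title": "Example curator"},
]


def select_by_type(records, wanted_types):
    """Return only the records whose type is one of wanted_types."""
    wanted = set(wanted_types)
    return [record for record in records if record["type"] in wanted]


# A client that only cares about cone search and SIA services.
services = select_by_type(REGISTRY, ["ConeSearch", "SIA"])
```

The registry still holds every kind of resource; the segregation is
virtual and happens at query time.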

There is the concern that some or much of what is currently called the
generic resource metadata does not apply equally well to all
resources.  That is a legitimate concern which I agree with at some
level.  The solution that remains consistent with the model is to
move the metadata of limited applicability to the extensions of the
appropriate class.  That said, I don't think it is as severe a problem
as I sense you think it is.  First, if you look at the definitions of
the various items in detail they are not as inappropriate as you
think.  For example, Creator is the "entity primarily responsible for
making the content of the resource."  If we are talking about a person
as a resource, it is not outlandish to interpret Creator such that the
person him/herself is the one responsible.  Second, making most of the
items optional allows the registrant to not fill in items that seem
inappropriate to the resource.  

There is also the assertion that metadata modules make the model more
flexible.  I think the key difference is that in my model, which
metadata items can be included is controlled completely by the schema
defining the class.  More precisely, a class is defined by (1) a
semantic definition, and (2) a defined set of metadata used to describe
it (i.e. in XML-speak, its type).  In your model, the class defines
some of the metadata, but the rest is the choice of the registrant
(yes?).  This is interesting.  However, with the way you've diagrammed
the modules, I'm concerned that it dilutes the usefulness of classes.
First, if a Community resource attaches Service metadata, does that
confuse what kind of resource it is?  In particular, is it possible
that one might not find it in a search for services (because it's held
in a different registry)?  It might make more sense to have the
modules be more orthogonal to classes; e.g. a Coverage module.  My
other concern has to do with how MMs are handled in XML, which is my
next issue.

So, while I think the metadata module idea is worth examining, I'm not
convinced that an alternative design--particularly your notion of
classes based on segregated registries--is sufficiently motivated, at
least without some further prototyping.  I hope I'm not just being
protective, but my sense is that your diagram is more constrained and
less extensible.  Of course, this is not a problem if your model is
meant to be an AstroGrid rendering of a less constrained model that we
interoperate with.  

Finally, I wanted to stress the importance of considering how we
render metadata in XML.  The approach I've strived for is to leverage
as much as possible the mechanisms defined in XML standards.  The
more we do this, the less software we have to write.  XML's type
system and use of namespaces provides a natural and elegant way to
evolve our metadata (including the definition of class extensions).
The disadvantage of metadata modules is that they suggest a mechanism
outside of the XML technologies for "contracting" support for
different schemas.  That is, the modules supported are listed in some
tagged way in the resource metadata.  We have to implement mechanisms
(on the client and server sides) for figuring out which modules we
want and to retrieve them.  It suggests that getting all the
information desired may require multiple calls as opposed to a
single call to retrieve all the information you need.
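
As a rough sketch of the namespace-based alternative, assume a single
resource record in which the extra metadata sits in its own namespace.
The namespace URIs and element names here are invented for illustration.
A client can then tell from one parsed document which vocabularies are
present, with no separate module list and no second call.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaces and element names, for illustration only.
CORE_NS = "http://example.org/core"
COV_NS = "http://example.org/coverage"

RECORD = f"""
<r:Resource xmlns:r="{CORE_NS}" xmlns:c="{COV_NS}">
  <r:Title>Example survey</r:Title>
  <c:Coverage>
    <c:Waveband>optical</c:Waveband>
  </c:Coverage>
</r:Resource>
"""


def namespaces_used(root):
    """Return the namespace URIs of the root's direct children."""
    # ElementTree spells qualified tags as "{namespace-uri}localname".
    return {child.tag[1:].split("}")[0] for child in root}


root = ET.fromstring(RECORD)
used = namespaces_used(root)

# Read the extension metadata from the same document, in the same pass.
waveband = root.findtext("c:Coverage/c:Waveband", namespaces={"c": COV_NS})
```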

(In contrast, breaking an identifier into its components when rendered
in XML is a good idea as opposed to just tagging a URI.  This means
you don't need a separate parser to separate the components on colons
and slashes; you just use the XML parser.)
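
To illustrate, here is a small sketch in which an identifier is rendered
as separate child elements.  The element names (AuthorityID, ResourceKey)
and the values are assumptions made for this example.  The components come
straight out of the XML parser, with no string splitting.

```python
import xml.etree.ElementTree as ET

# Hypothetical rendering of an identifier broken into components;
# the element names and values are assumptions for illustration.
RECORD = """
<Resource>
  <Identifier>
    <AuthorityID>example.org</AuthorityID>
    <ResourceKey>surveys/demo</ResourceKey>
  </Identifier>
</Resource>
"""


def identifier_parts(xml_text):
    """Return (authority, key) using only the XML parser."""
    root = ET.fromstring(xml_text)
    ident = root.find("Identifier")
    return ident.findtext("AuthorityID"), ident.findtext("ResourceKey")


authority, key = identifier_parts(RECORD)
```

Note that the key may itself contain slashes without confusing anything,
since no one is splitting on them.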

Other important XML leveraging issues include being able to use tools
that auto-generate software from classes.  I would also like to be
able to auto-generate metadata dictionaries that are reasonably
comprehensible to humans.  I like metadata models which result in
XPaths that are reasonably intuitive to construct.  I think the key to
most of these goals is a data model that captures real meaning.  Any
reorganization of the model should take this into account.
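
As an example of what "intuitive XPaths" might look like in practice,
consider the sketch below.  The document layout and element names are
hypothetical, and the query uses only the XPath subset supported by
Python's standard ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry document; element and attribute names are
# assumptions chosen so that the path reads like the model itself.
REGISTRY = """
<Registry>
  <Resource type="ConeSearch">
    <Title>Example catalog</Title>
    <Curation><Creator>Example Observatory</Creator></Curation>
  </Resource>
  <Resource type="Organisation">
    <Title>Example data center</Title>
  </Resource>
</Registry>
"""

root = ET.fromstring(REGISTRY)

# "The creators of all cone search resources", written as a path.
path = "./Resource[@type='ConeSearch']/Curation/Creator"
creators = [element.text for element in root.findall(path)]
```

When the element names carry real meaning, the path can be guessed from
the model without consulting a separate dictionary.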

So [breathe], I am hoping that we wouldn't just yet throw out what we
have.  We should dig a little deeper into the metadata model as it is
currently defined in the RSM and VOResource schema to see if individual
items need to be moved around or definitions modified.  It would be great
if we could see more prototypes implemented; does the current model allow
one to effectively segregate resources (truly or virtually)?  And it
would be good to see how this notion of metadata modules--i.e. extra
metadata attachments chosen by the registrant--could be incorporated using
standard XML typing and namespace mechanisms.

cheers,
Ray