AstroGrid registry structure
Ray Plante
rplante at poplar.ncsa.uiuc.edu
Thu Jun 12 23:16:02 PDT 2003
Hi Tony,
I have a number of comments about your registry metadata model, but I
won't attempt to get into all of them now. For now, I'd like to just talk
about general modeling issues.
The general concern I have looking at your UML and schema diagrams is that
many of your entities represent categories rather than concepts. This is
reflected in the names of your entities (classes in UML and elements in
XML), and in how you translate your model into XML.
My point about the names was more obvious in your first & second versions
of the UML diagram. You had names exclusively like "ClassMD",
"ServiceCMd", and "ServiceMM". In the XML, you have less of this: there
is "Service" and "Resource", but you still have "Class" and
"ServiceClass". To me, these represent categories for collecting entities
that are related in some way, rather than capturing any particular
meaning. A Service and a Resource represent concepts we are trying to
capture; in contrast, "Class" is a construct reflecting the organization
of the entities, not a concept that is directly used by the consumer.
Obviously, I think meaning is vital to defining metadata, but I'll get
into why later.
A good test is to try forming some XPaths to some of your nodes. In my
model, every element has a meaning relevant to consumers, and when nested
in the hierarchy, the individual meanings are meant to compose naturally
into a single one. For example, in VOResource,
"Resource/Content/Description" is a description of a resource's contents.
"Service/Capability/StandardURL" is the URL that describes the standard
capabilities of the service. In contrast, what meaning is captured in
"Resource/Class/ServiceClass/Service"? It is, at least, less intuitive.
A big part of the problem is how you have translated your UML
relationships to XML: the difference between "has-a" and "is-a" is lost.
Elements that contain other elements reflect containment. Service is a
descendant of Resource: are you trying to say that a Resource has a
Service? A Service has a SkyService? Conflating the two types of
relationships diminishes the meaning that metadata is meant to convey, and
the elements turn into shopping bags.
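To make the XPath test and the containment reading concrete, here is a
minimal Python sketch using the standard-library xml.etree.ElementTree.
The two sample documents and their text values are made up for
illustration; only the element names come from the paths quoted above, and
the Description child under Service is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance in the VOResource style: every nesting step reads
# as "has-a" (a Resource has Content, which has a Description).
VORESOURCE_STYLE = """
<Resource>
  <Content>
    <Description>Catalogue of radio sources</Description>
  </Content>
</Resource>
"""

# Hypothetical instance in the category-layer style: Class and ServiceClass
# only group things, and Service nested under Resource reads as
# "a Resource has a Service" even though "is-a" was intended.
CATEGORY_STYLE = """
<Resource>
  <Class>
    <ServiceClass>
      <Service>
        <Description>Cone search over the catalogue</Description>
      </Service>
    </ServiceClass>
  </Class>
</Resource>
"""

def lookup(xml_text, path):
    """Return the text found at `path`, relative to the document root."""
    root = ET.fromstring(xml_text)
    return root.findtext(path)

readable = lookup(VORESOURCE_STYLE, "Content/Description")
layered = lookup(CATEGORY_STYLE, "Class/ServiceClass/Service/Description")
print(readable)
print(layered)
```

Both lookups succeed; the difference is in how the path reads. The first
can be read aloud as a sentence about the resource, while the second has
to pass through two grouping layers that carry no meaning of their own.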
I recommend using the XML Schema mechanisms of subtyping (primarily
extension) and substitutionGroups to capture the is-a relationship. When
you do, you'll find many of your layers disappearing. (see below for
details.)
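As a rough sketch of what this could look like, the Python snippet below
embeds a small XML Schema fragment and reads the is-a relations back out
of it. The type names ResourceType and ServiceType and the child elements
Title and AccessURL are hypothetical, not taken from VOResource or from
your schema. The standard library has no XSD validator, so the code only
parses the schema as XML and inspects it.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Illustrative schema only; names are assumptions made for this example.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Base concept; its children are things a Resource has. -->
  <xs:complexType name="ResourceType">
    <xs:sequence>
      <xs:element name="Title" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- is-a: ServiceType extends ResourceType instead of nesting inside it. -->
  <xs:complexType name="ServiceType">
    <xs:complexContent>
      <xs:extension base="ResourceType">
        <xs:sequence>
          <xs:element name="AccessURL" type="xs:anyURI"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:element name="Resource" type="ResourceType"/>
  <!-- Service may appear wherever Resource is allowed. -->
  <xs:element name="Service" type="ServiceType" substitutionGroup="Resource"/>
</xs:schema>
"""

def is_a_relations(schema_text):
    """Return (derived type -> base type, element -> substitution head)."""
    root = ET.fromstring(schema_text)
    extends = {}
    for ctype in root.findall(f"{{{XS}}}complexType"):
        ext = ctype.find(f"{{{XS}}}complexContent/{{{XS}}}extension")
        if ext is not None:
            extends[ctype.get("name")] = ext.get("base")
    substitutes = {
        el.get("name"): el.get("substitutionGroup")
        for el in root.findall(f"{{{XS}}}element")
        if el.get("substitutionGroup")
    }
    return extends, substitutes

extends, substitutes = is_a_relations(SCHEMA)
print(extends)
print(substitutes)
```

With this arrangement a Service instance carries Title and AccessURL
directly, with no Class or ServiceClass wrapper in between.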
Meaning is crucial to metadata as it defines how the information is meant
to be used. In the "old" days when we defined keyword-value pairs, each
keyword had a meaning associated with it. The limitation is that complex
concepts could not be described with a single value; several components
are needed. With XML, we can now define a complex concept like "Position"
with meaning and have its components, a longitude and latitude, contained
within that metadatum. This is what I feel we are trying to capture.
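A small Python sketch of the contrast, using xml.etree.ElementTree. The
keyword names POS_LON and POS_LAT, the child element names Longitude and
Latitude, and the coordinate values are all assumptions made for
illustration.

```python
import xml.etree.ElementTree as ET

# Keyword-value style: two independent keywords (hypothetical names).
# Nothing in the structure says they belong to one concept.
flat = {"POS_LON": "83.63", "POS_LAT": "22.01"}

def position_element(longitude, latitude):
    """Build a single Position metadatum that contains its components."""
    pos = ET.Element("Position")
    ET.SubElement(pos, "Longitude").text = str(longitude)
    ET.SubElement(pos, "Latitude").text = str(latitude)
    return pos

position = position_element(83.63, 22.01)
xml_text = ET.tostring(position, encoding="unicode")
print(xml_text)
```

Here the grouping is part of the metadata itself: a consumer that asks for
Position gets both components together, rather than having to know which
keywords go with which.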
The metadata dictionary I pointed you to earlier was not just a cool
after-thought--it's central to what I'm trying to achieve.
Please have a look at section 2 of the VOResource Overview document
(http://www.ivoa.net/internal/IVOA/IVOARegWp03/MDinXML-Summary.html). It
represents the beginnings of a style guide for defining metadata with XML
Schema that I've been working on in conjunction with the VOResource
modeling work. It's a work in progress, but I believe the basic ideas
there are applicable to all our data modeling--not just for registry
metadata. If there are things there you don't agree with, I think we need
to talk about it. Consult the VOResource schema and example described in
section 1 for examples of these techniques (or just ask me questions).
cheers,
Ray
P.S. On a related item:
> we do not disagree on the registry content. Every
> resource will go in the/a registry....
> the summary lists the *full* registry mode; ...
> the uml diagram shows all resources as
> belonging to a common class of ResourceRegistry.
Okay. I saw both these in your documents; however, the "is-a" symbol
under ResourceRegistry confused me. I interpreted this as saying that
a ResourceRegistry comes in one of the three forms below it. Should
it be a "has-a" relationship with "0..1"?