handling metadata with multiple values
martin hill
mch at roe.ac.uk
Wed Aug 6 03:17:20 PDT 2003
Just to add my two pennies'/cents' worth: we (Astrogrid) came to the same
conclusion while putting together the parameter-passing XML document for our
SExtractor service (ACE). We just haven't changed the code yet...
Quoting Ray Plante <rplante at poplar.ncsa.uiuc.edu>:
> Hey Marco,
>
> On Wed, 6 Aug 2003, Marco C. Leoni wrote:
> > quick question: what was wrong with the first choice (<SUBJECT>
> > <item>...</item> <item>...</item> </SUBJECT>) ?
>
> Two main reasons: first, you don't need these <item> tags in practice
> (rather, they get in the way); second, they make SUBJECT conceptually
> more opaque as a piece of metadata.
>
> The motivation for the above pattern is to have a node that contains all
> the subjects together; however in practice, we found that with typical
> techniques for extracting metadata from XML, multiple nodes of the same
> type are usually packaged up together anyway. For example, when using DOM
> on:
>
> <SUBJECT>...</SUBJECT>
> <SUBJECT>...</SUBJECT>
> <SUBJECT>...</SUBJECT>
>
> one would use getElementsByTagName('SUBJECT') on the parent node to get
> all the subjects, returned together in a NodeList object. You can use this
> technique for all elements, whether you expect multiple values or not. On
> the other hand, with the <item> pattern, you have to treat the multiple
> value case as a special case, first getting the <SUBJECT> node, and then
> returning the <item> nodes as a list. Thus, these <item>s really just get
> in the way.
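
For illustration, here is a minimal sketch of the DOM point above using
Python's xml.dom.minidom. The choice of Python, the CONTENT wrapper element,
the sample subject values and the helper names (text_of, values,
values_item_pattern) are all assumptions made for this example; none of it
comes from the registry prototype.

```python
from xml.dom.minidom import parseString

# Hypothetical sample documents; the subject values are made up.
FLAT = """<CONTENT>
  <SUBJECT>galaxies</SUBJECT>
  <SUBJECT>quasars</SUBJECT>
  <SUBJECT>surveys</SUBJECT>
</CONTENT>"""

NESTED = """<CONTENT>
  <SUBJECT>
    <item>galaxies</item>
    <item>quasars</item>
    <item>surveys</item>
  </SUBJECT>
</CONTENT>"""


def text_of(node):
    """Concatenate the direct text children of a node, stripped of whitespace."""
    parts = [c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE]
    return "".join(parts).strip()


def values(parent, tag):
    """Generic accessor: the same call serves single- and multi-valued elements."""
    return [text_of(n) for n in parent.getElementsByTagName(tag)]


def values_item_pattern(parent, tag):
    """Special case for the <item> pattern: find the container, then its items."""
    containers = parent.getElementsByTagName(tag)
    if not containers:
        return []
    return [text_of(n) for n in containers[0].getElementsByTagName("item")]


if __name__ == "__main__":
    flat_root = parseString(FLAT).documentElement
    nested_root = parseString(NESTED).documentElement
    print(values(flat_root, "SUBJECT"))
    print(values_item_pattern(nested_root, "SUBJECT"))
    # The generic accessor finds only the container in the <item> pattern.
    print(values(nested_root, "SUBJECT"))
```

With the flat pattern the generic accessor is all that is needed; with the
<item> pattern it only reaches the container, so a second, list-aware
accessor has to be written and chosen per element.
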
>
> The same is true with other techniques. With data binding tools--such as
> JAXB and Castor for Java, or MS XSD--multiple, sequential occurrences of
> SUBJECT will automatically be parsed into a list container (e.g.
> ArrayList). When using XPath, "SUBJECT" will return all of the subjects.
>
> (Perhaps Wil can give the concrete example that he encountered when we
> were working on our prototype registry.)
>
> My second reason is that <item>s clutter the meaning behind the metadata
> model. I would like to see schemas in which all our elements carry
> meaning that can be pieced together to create more complex meaning.
> XPaths, as pointers into the data model, can be a very effective way of
> carrying that meaning. A good example would be
> "RESOURCE/CONTENT/SUBJECT", which points to a subject of the resource's
> content. If you use my preferred pattern for listing these (with no
> <item>s), then this path points to the actual subject values; however, in
> the <item> pattern, you have to use "RESOURCE/CONTENT/SUBJECT/item" to get
> the values. I don't like this because "item" adds no additional meaning
> to the path--it just clutters it. (Really, this second reason is just
> an abstract form of the first reason.)
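
The path argument above can be sketched with Python's
xml.etree.ElementTree, which supports a limited subset of XPath. This is
only an illustration under assumptions: the sample documents and values are
invented, the helper name subjects is hypothetical, and because ElementTree
evaluates paths relative to the root element, the leading "RESOURCE/" step
is dropped from the paths.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents; the subject values are made up.
FLAT = (
    "<RESOURCE><CONTENT>"
    "<SUBJECT>galaxies</SUBJECT><SUBJECT>quasars</SUBJECT>"
    "</CONTENT></RESOURCE>"
)

NESTED = (
    "<RESOURCE><CONTENT><SUBJECT>"
    "<item>galaxies</item><item>quasars</item>"
    "</SUBJECT></CONTENT></RESOURCE>"
)


def subjects(xml_text, path):
    """Return the text of every element matched by path, relative to the root."""
    root = ET.fromstring(xml_text)  # root is the RESOURCE element
    return [el.text for el in root.findall(path)]


if __name__ == "__main__":
    # Flat pattern: the meaningful path leads straight to the values.
    print(subjects(FLAT, "CONTENT/SUBJECT"))
    # <item> pattern: the path needs an extra step that carries no meaning.
    print(subjects(NESTED, "CONTENT/SUBJECT/item"))
```
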
>
> Note that all of the above applies to this non-preferred pattern as well:
>
> <SUBJECTS>
> <SUBJECT>...</SUBJECT>
> <SUBJECT>...</SUBJECT>
> <SUBJECT>...</SUBJECT>
> </SUBJECTS>
>
> The extra layer is not needed.
>
> (Not quite a quick answer for a quick question ;-)
>
> cheers,
> Ray
>
--
Software Engineer
Astrogrid, ROE (www.astrogrid.org)
Mob: +44 7901 55 24 66
Fax: +44 131 668 82 64