handling metadata with multiple values

Wed Aug 6 06:56:50 PDT 2003

The concrete example is when you use a SAX parser.

In SAX you see each tag one at a time ...
If I see <SUBJECT> I know immediately what it is.
If I see <item> I have to remember the parent tag to know what it is, and
this makes life a little more difficult.
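
A rough sketch of the difference, using Python's xml.sax (not the code
from our prototype registry). The class names, the sample values and the
<TYPE> element are made up for illustration. The first handler only needs
the tag name; the second has to keep a stack of open tags so that it can
tell which <item> belongs to a <SUBJECT>.

```python
import xml.sax


class FlatHandler(xml.sax.ContentHandler):
    """Flat pattern: the tag name alone identifies the value."""

    def __init__(self):
        super().__init__()
        self.subjects = []
        self._buf = None  # collects text while inside a <SUBJECT>

    def startElement(self, name, attrs):
        if name == "SUBJECT":
            self._buf = []

    def characters(self, content):
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == "SUBJECT":
            self.subjects.append("".join(self._buf).strip())
            self._buf = None


class ItemHandler(xml.sax.ContentHandler):
    """<item> pattern: the parent tag must be remembered."""

    def __init__(self):
        super().__init__()
        self.subjects = []
        self._stack = []  # names of the currently open tags
        self._buf = None  # collects text while inside SUBJECT/item

    def startElement(self, name, attrs):
        # An <item> is a subject only when its parent is <SUBJECT>.
        if name == "item" and self._stack and self._stack[-1] == "SUBJECT":
            self._buf = []
        self._stack.append(name)

    def characters(self, content):
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        self._stack.pop()
        if name == "item" and self._buf is not None:
            self.subjects.append("".join(self._buf).strip())
            self._buf = None


def parse_subjects(xml_text, handler):
    """Run the handler over xml_text and return the subjects it collected."""
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.subjects


# Hypothetical sample documents, one per pattern.
FLAT = "<CONTENT><SUBJECT>stars</SUBJECT><SUBJECT>galaxies</SUBJECT></CONTENT>"
ITEMS = (
    "<CONTENT><SUBJECT><item>stars</item><item>galaxies</item></SUBJECT>"
    "<TYPE><item>catalog</item></TYPE></CONTENT>"
)
```

Both handlers collect the same two subjects, but only the second one needs
the extra bookkeeping, and it needs it for every element that uses <item>.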

wil
On Wed, Aug 06, 2003 at 03:45:06AM -0500, Ray Plante wrote:
> Hey Marco,
> 
> On Wed, 6 Aug 2003, Marco C. Leoni wrote:
> >     quick question: what was wrong with the first choice (<SUBJECT> 
> > <item>...</item> <item>...</item> </SUBJECT>) ?
> 
> Two main reasons: one is you don't need these <item> tags in practice 
> (rather they get in the way), and second is that they make SUBJECT as a 
> piece of metadata conceptually more opaque. 
> 
> The motivation for the above pattern is to have a node that contains all 
> the subjects together; however in practice, we found that with typical 
> techniques for extracting metadata from XML, multiple nodes of the same 
> type are usually packaged up together anyway.  For example, when using DOM 
> on: 
> 
>    <SUBJECT>...</SUBJECT>
>    <SUBJECT>...</SUBJECT>
>    <SUBJECT>...</SUBJECT>
> 
> one would use getElementsByTagName('SUBJECT') on the parent node to get 
> all the subjects, returned in a NodeList object.  You can use this 
> technique for all elements, whether you expect multiple values or not.  On 
> the other hand, with the <item> pattern, you have to treat the multiple-value 
> case as a special case, first getting the <SUBJECT> node, and then 
> returning the <item> nodes as a list.  Thus, these <item>s really just get 
> in the way.
> 
> The same is true with other techniques.  With Java Binding tools--such as 
> JAXB, Castor, and MS XSD--multiple, sequential occurrences of SUBJECT will 
> automatically be parsed into a list container (e.g. ArrayList).  When 
> using XPath, "SUBJECT" will return all of the subjects.  
> 
> (Perhaps Wil can give the concrete example that he encountered when we 
> were working on our prototype registry.)
> 
> My second reason is that <item>s clutter the meaning behind the metadata
> model.  I would like to see schemas in which all our elements carry
> meaning that can be pieced together to create more complex meaning.  
> XPaths, as pointers into the data model, can be a very effective way of
> carrying that meaning.  A good example would be
> "RESOURCE/CONTENT/SUBJECT", which points to a subject of the resource's
> content.  If you use my preferred pattern for listing these (with no
> <item>s), then this path points to the actual subject values; however, in
> the <item> pattern, you have to use "RESOURCE/CONTENT/SUBJECT/item" to get
> the values.  I don't like this because "item" adds no additional meaning 
> to the path--it just clutters it.  (Really, this second reason is just 
> an abstract form of the first reason.)
> 
> Note that all of the above applies to this non-preferred pattern as well:
> 
>   <SUBJECTS>
>     <SUBJECT>...</SUBJECT>
>     <SUBJECT>...</SUBJECT>
>     <SUBJECT>...</SUBJECT>
>   </SUBJECTS>
> 
> The extra layer is not needed.
> 
> (Not quite a quick answer for a quick question ;-)
> 
> cheers,
> Ray
> 
>
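
For reference, Ray's DOM and path points can be tried out with a short
sketch like the one below. It assumes Python's xml.dom.minidom and
xml.etree.ElementTree (the latter supports only a subset of XPath); the
function names and sample values are made up for illustration.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

# Hypothetical sample documents, one per pattern.
FLAT = (
    "<RESOURCE><CONTENT>"
    "<SUBJECT>stars</SUBJECT><SUBJECT>galaxies</SUBJECT>"
    "</CONTENT></RESOURCE>"
)
ITEMS = (
    "<RESOURCE><CONTENT><SUBJECT>"
    "<item>stars</item><item>galaxies</item>"
    "</SUBJECT></CONTENT></RESOURCE>"
)


def dom_subjects_flat(xml_text):
    """Flat pattern: one call on the parent node returns every subject."""
    doc = xml.dom.minidom.parseString(xml_text)
    content = doc.getElementsByTagName("CONTENT")[0]
    return [n.firstChild.data for n in content.getElementsByTagName("SUBJECT")]


def dom_subjects_items(xml_text):
    """<item> pattern: find the <SUBJECT> wrapper first, then list its items."""
    doc = xml.dom.minidom.parseString(xml_text)
    content = doc.getElementsByTagName("CONTENT")[0]
    subject = content.getElementsByTagName("SUBJECT")[0]
    return [n.firstChild.data for n in subject.getElementsByTagName("item")]


def path_values(xml_text, path):
    """Evaluate a path relative to the root element (RESOURCE here)."""
    root = ET.fromstring(xml_text)
    return [e.text for e in root.findall(path)]
```

With the flat pattern the path "CONTENT/SUBJECT" reaches the values
directly; with the <item> pattern it has to be "CONTENT/SUBJECT/item".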