<div dir="ltr"><div><div><div><div><div><div><div><div>I've been giving this some thought over the long weekend.<br><br></div>I have been, and still am, leaning toward adding a simple Party model to the DatasetMetadata model for use in normalizing the DataID and Curation elements a bit<br></div>(and for general use/extension thereafter). I want to be careful with this, because it quickly leads to needing to review/revamp/define the Observation model, and <br></div>incorporate the Provenance Entity =>Activity pattern, which is beyond the scope of this version of that model.<br><br></div>Gerard: In the SimDM, this is just an ObjectType in the Resource package, but in the diagram you attached here, it is labeled as a Model. We could probably<br></div> do it as an imported model, defined in the DatasetMetadata document, which would require its own prefix, etc. What do you think? Would it need to be <br></div> its own model? or is a Package under Dataset good enough?<br><br></div>I'll start a thread tagged for the [ds] model to go into the details.<br><br></div><div>Now, about the Party object, which you said is properly normalized in the SimDM, and basically matches the diagram(s)<br></div> <a href="http://www.ivoa.net/documents/SimDM/20120503/html/SimDM.html#SimDM:/resource/Party" target="_blank">http://www.ivoa.net/documents/SimDM/20120503/html/SimDM.html#SimDM:/resource/Party</a><br><div><div><div><div><div><br></div><div> Some of these attributes are clearly objects<br></div><div> + Name (of individual) { prefix, first, last, suffix, middle[*] }<br></div><div> + Address, the description even says 'all components of the address are given in one string'<br></div><div> + Telephone, has different forms depending on country<br><br></div><div> And there certainly can be >1 relation to Address, Telephone, and perhaps email.<br><br></div><div><div><div> It is this sort of thing which makes it challenging for those of us working through these issues.<br></div><div> Why is 
it OK to denormalize Address in Party, but not Publisher in Curation?<br><br></div><div>On the more technical side.<br></div><div> The ContributorRole enumeration.. <br></div><div> This seems like an unlikely enumeration candidate because:<br></div><div> 1) high likelihood of change as new roles are generated<br></div><div> 2) in vo-dml, Attribute and Relation extend Role, so the enumeration would be redundant with the name of the relation<br></div><div> eg: Publication.author assigns the author role to the target<br></div><div> eg: Curation.contact assigns the contact role to the target.<br><br></div><div>Do I have that right?<br><br></div><div>Mark<br><br></div><div><br></div><div><br></div></div></div></div></div></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jan 4, 2016 at 5:50 PM, Gerard Lemson <span dir="ltr"><<a href="mailto:glemson1@jhu.edu" target="_blank">glemson1@jhu.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Mark and Markus<br>
Comments below.<br>
<div><div class="h5"><br>
<br>
> On Wed, Dec 30, 2015 at 12:54:17PM -0500, CresitelloDittmar, Mark<br>
> wrote:<br>
> > A couple quick comments.<br>
> ><br>
> > 1) various [string 0..*] attributes<br>
> > I agree that all of the examples I gave represent concepts<br>
> > which **could** be expanded with other attributes. In that<br>
> > regard, modeling them as such would be more 'correct'.<br>
> > However, we have NO use cases which require any more than the<br>
> > simple string representation. So the question becomes "is it<br>
> > cost effective to do so?".<br>
><br>
> Let me put in a bit of experience from Registry modelling efforts<br>
> here, as I believe this is a classic case of the old saying that there's<br>
> nothing more practical than a good theory.<br>
><br>
> In the (informal) models behind the Registry, there's a number of<br>
> person-like entities -- creator, publisher, contact, contributor.<br>
> When the models were written, it seemed wise to model each entity in<br>
> an ad-hoc manner, such that, for instance, only a contact can have a<br>
> telephone number, and only creators and contacts have logos.<br>
><br>
> This made total sense at the time -- after all, what would a client<br>
> do with a logo of a contributor? But as soon as you start doing<br>
> something unforeseen -- like perhaps a relational mapping -- all these<br>
> well-meaning shortcuts tend to make things complicated, and after a<br>
> while you end up introducing extra code ("business logic") that has<br>
> nothing to do with the world but is only needed because of the little<br>
> shortcuts you took in your model.<br>
><br>
> I'm not sure I follow you here. I certainly am not suggesting that we flatten<br>
> required structure. I'm asking if we need to add unrequired structure.<br>
<br>
</div></div>My answer to this would be that it depends on what aspects of the structure you are not adding.<br>
I believe that a concept that in a full-fledged model would have to be modeled as an object type (or maybe multiple object types!), should be represented as such also in a model that may not require all its details. E.g. an Author should really be modeled as a kind of "role" played by some identified Person in the creation of a Publication. The Author role has special features (e.g. affiliation) and should itself be modelled as an object type, with a reference to Person. See nr 1 in attached "picture" for a simple model.<br>
Note that by using this pattern we can formally query for all the Publications on which a particular person was an author. For only by modelling Person as an Object Type can we explicitly identify instances independently of their name. In a model that only uses their name as a string, or as an element in a string array, this is not possible.<br>
Now in some model you may not require all possible features of Person, only the fact that (s)he has a name. So (if this is the first time you encounter the Person concept) you might create a restricted version of that type, say without email or phone. But the fact that the concept exists and corresponds to an object type, i.e. that its instances are identified explicitly, should not be removed from even the basic model. For by "normalizing" Person as an object type on its own it can be reused in the proper manner. See the second part of the model, which uses the same pattern for associating Persons as Contributors on some Resource. Note that the contributor has a label that indicates the particular role played by the person.<br>
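As a rough illustration of the Author-as-role pattern described above, here is a minimal Python sketch. It is not taken from the attached picture: the class names, the fields, and the helper functions publications_by and publications_by_name are all assumptions made for illustration. It only tries to show why identifying Person instances explicitly lets us query by person rather than by name string.<br>

```python
from dataclasses import dataclass, field


# eq=False keeps Python's default identity-based equality, so two Person
# instances with the same name remain distinct objects (hypothetical model).
@dataclass(eq=False)
class Person:
    name: str


@dataclass(eq=False)
class Author:
    """Role played by a Person in a Publication; holds a reference to Person."""
    person: Person
    affiliation: str


@dataclass(eq=False)
class Publication:
    title: str
    authors: list = field(default_factory=list)


def publications_by(person, publications):
    """Select publications through the Author.person reference (by identity)."""
    return [p for p in publications
            if any(a.person is person for a in p.authors)]


def publications_by_name(name, publications):
    """What a string-only model can do: match on the name text alone."""
    return [p for p in publications
            if any(a.person.name == name for a in p.authors)]


# Two different people who happen to share a name (made-up example data).
smith_1 = Person("J. Smith")
smith_2 = Person("J. Smith")
pubs = [
    Publication("Paper A", [Author(smith_1, "Institute X")]),
    Publication("Paper B", [Author(smith_2, "Institute Y")]),
    Publication("Paper C", [Author(smith_1, "Institute Z")]),
]
by_identity = [p.title for p in publications_by(smith_1, pubs)]
by_name = [p.title for p in publications_by_name("J. Smith", pubs)]
```

In this sketch the identity-based query separates the two people, while the name-based query cannot, which is the limitation of a string or string-array attribute.<br>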
<br>
And we might even extend this model for Person further by interpreting it as a special Party, another subtype of which is Organization (see part three in the picture). You *might* wish to introduce a Role supertype (though I would not do that), but a separate "party" model like this could easily be reused in many other efforts in the IVOA.<br>
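A possible shape for such a separate "party" model, again only as a hedged Python sketch: Party, Person, Organization, Contributor and Resource follow the names used in this thread, but the fields and the helper parties_with_role are hypothetical and not copied from the SimDM or from the picture.<br>

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Party:
    """Common supertype; instances are identified independently of their name."""
    name: str


@dataclass(eq=False)
class Person(Party):
    email: str = ""  # optional detail a restricted model could leave out


@dataclass(eq=False)
class Organization(Party):
    pass


@dataclass(eq=False)
class Contributor:
    """Associates a Party with a Resource; 'role' is the label for the part played."""
    party: Party
    role: str


@dataclass(eq=False)
class Resource:
    title: str
    contributors: list = field(default_factory=list)


def parties_with_role(resource, role):
    """Return the parties that play the given role on a resource."""
    return [c.party for c in resource.contributors if c.role == role]


# Made-up example data: the same Party type serves several roles.
person = Person("Example Person", email="person@example.org")
org = Organization("Example Observatory")
dataset = Resource("Example dataset", [
    Contributor(person, "creator"),
    Contributor(org, "publisher"),
    Contributor(person, "contact"),
])
publishers = parties_with_role(dataset, "publisher")
contacts = parties_with_role(dataset, "contact")
```

Here the role is a plain label on the association rather than an enumeration or a Role supertype, so new roles do not require a model change.<br>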
<span class=""><br>
> The examples for this case are from the DatasetMetadata document,<br>
> the content of which is heavily rooted in the Resource Metadata document.<br>
><br>
> The DatasetMetadata model also has these person-like entities -- creator,<br>
> publisher, contact..<br>
> creator and publisher are modeled as simple strings, contact is an object<br>
> since there is a requirement for more structure (name, email). It (contact)<br>
> does NOT contain other elements which are listed in the RM document,<br>
> (address, telephone) since we have no use-cases where those elements<br>
> are needed.<br>
<br>
> No one has commented that the simplification of these entities to a string is a<br>
> problem. These elements have a 0..1 multiplicity, so are valid vo-dml.<br>
> So, why then, should it be a problem for 'Collection' to be simplified to a<br>
> single string.. simply because there can be more than one?<br>
<br>
</span>When I saw the original XML schemas for the registry I *did* actually believe I would have modelled this differently. I also believe the registry schemas (being XML schemas, not VO-DML) are implementation specific and have been based on implementation specific considerations. In the Simulation data model, which has a Resource super type, we do have Contact and Party properly normalized.<br>
<br>
In fact, a particular problem for XML schemas is that mapping VO-DML/References is not trivial. If the referenced object resides in the same document you can use ID/IDREFs, but in general this may not be the case and then it is hard to represent these properly. In VO-URP Laurent (Bourges) and I spent lots of time dealing with this. There we represented our VO-DML-like data model in a relational database, within which References are trivially mapped to foreign keys. Mapping the corresponding instances to XML documents required a custom solution (which we based on utypes!) to represent references to objects not serialized with the referencing objects.<br>
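To illustrate the "References map trivially to foreign keys" point, here is a minimal sketch using Python's standard sqlite3 module. This is not the VO-URP implementation; the table and column names (party, contact, party_id) and the function build_db are assumptions for illustration only.<br>

```python
import sqlite3


def build_db():
    """Create an in-memory database where a reference is a foreign key."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE party (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE contact (
            id       INTEGER PRIMARY KEY,
            resource TEXT NOT NULL,
            -- the reference to a Party becomes a foreign key column
            party_id INTEGER NOT NULL REFERENCES party(id)
        );
    """)
    return conn


conn = build_db()
conn.execute("INSERT INTO party (id, name) VALUES (1, 'Example Person')")
conn.execute("INSERT INTO contact (resource, party_id) VALUES ('dataset-1', 1)")

# Following the reference is an ordinary join on the key.
rows = conn.execute(
    "SELECT c.resource, p.name FROM contact c "
    "JOIN party p ON p.id = c.party_id"
).fetchall()

# A reference to a party that does not exist is rejected by the database.
try:
    conn.execute(
        "INSERT INTO contact (resource, party_id) VALUES ('dataset-2', 99)"
    )
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True
```

In an XML serialization the same link needs ID/IDREF within one document, or some custom identifier scheme when the referenced object is serialized elsewhere, which is where the extra work comes in.<br>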
<span class=""><br>
<br>
><br>
> For the record, I am OK with the resolution being that all of these person-like<br>
> entities should be modeled more formally (whether the structure is required<br>
> or not), both the single and multiple relations. In which case, this group<br>
> of conflicts with vo-dml multiplicity would go away. However, without that sort<br>
> of comment/decision being posted to the DatasetMetadata document, they<br>
> exist.<br>
><br>
</span>Ok, so let's do that :)<br>
I promise you that very similar things can be said for most if not all other examples.<br>
<span class=""><br>
><br>
><br>
><br>
><br>
> So, I'd pose the question: Is it cost effective *not* to do the<br>
> proper modelling? My suspicion is: probably almost never. Perhaps<br>
> your XML may look a bit more complicated when you actually model<br>
> your<br>
> universe of discourse rather than some guess at what your model might<br>
> be used for in the future, but that's more than made up for by it<br>
> being more regular and thus easier to handle with actual code.<br>
><br>
><br>
> > And this is the point. I/we understand that we are not doing<br>
> > justice to the "Contributor" concept (for example), but the<br>
> > simple string array attribute is easier to interpret and serves<br>
> > the requirements.<br>
><br>
> Well, it *may* be easier to interpret for humans, but for programmers<br>
> and thus machines such shortcuts usually turn out to be more work.<br>
><br>
<br>
</span>I agree in general. And I think that this also touches the core of what VO-DML is supposed to be used for, which is annotating VOTable(s) (and through them, for example, trivially TAP Services) with metadata that explains what types of objects are stored in them. For this primary use case in particular, the more explicitly a model defines its concepts, the more expressive such an annotation can be. Particularly if we want to be able to support the variety of representations we may encounter in reality, we should try to stay away from making assumptions about how the model is to be used.<br>
<br>
Cheers<br>
<span class="HOEnZb"><font color="#888888">Gerard<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
<br>
><br>
> > 2) Polynomial<br>
> > My initial reaction is "what you describe is an entirely<br>
> > different goal". The STC Transform model is not trying to<br>
> > model the mathematical operations/functions. It is<br>
><br>
> Why not? It would seem to me that WCS, which has been reasonably<br>
> successful at what you're trying to do, is doing exactly that, no?<br>
><br>
> > encapsulating the user-provided specifications required by<br>
> > those operations. The Axis, for example, is not part of the<br>
><br>
> I'm not sure I follow the difference -- are you saying the actual<br>
> expressions are expected to be opaque to the model?<br>
><br>
> Yes, and I think that is consistent with what WCS provides..<br>
> The expression for generating a Tangent plane projection is opaque,<br>
> one merely knows that it is TAN, the projection point, and corresponding pixel,<br>
> (ie: encapsulation of the interface).<br>
><br>
<br>
><br>
> Mark<br>
><br>
<br>
</div></div></blockquote></div><br></div>