MIVOT: fully qualified attribute names

Wed Mar 8 18:21:08 CET 2023

btw I wanted to bump up this comment Markus made earlier in this thread.
While we have different views on whether to include the fully qualified
ID, his comment frames the problem perfectly. In fact, in years-old
conversations about the mapping we did have <dmtype> as an element for
exactly the reason Markus points out, which shows why simplicity without a
rationale in terms of design and requirements risks falling short:

> No, where I think the idea of the fully qualified attribute names
> came from was to lessen the pains of inheritance.  Say, a client
> doesn't know about RealQuantity but just about Quantity.  When we
> fully qualify the attribute names, it still can find attributes that
> are already part of Quantity.

> I will not speculate on whether or not that's a useful thing to do --
> as I said, we have far too little adoption of the whole thing to make
> a call.  But *if* we actually want to enable this kind of thing, we
> could go back to the proposal of annotating the class hierarchy used
> in the instance, perhaps by making dmtype an element rather than an
> attribute; that would still be a lot easier on annotators than the
> fully qualified attribute names.
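A minimal Python sketch of the lookup Markus describes. It assumes a hypothetical annotation in which an instance carries a flat map from fully qualified attribute names to values; this is not MIVOT syntax, and the `ivoa:Quantity` / `ivoa:RealQuantity` names and their attributes are only illustrative.

```python
# Hypothetical instance: keys are fully qualified attribute names
# ("prefix:DefiningType.attribute"), values are the data.
instance = {
    "dmtype": "ivoa:RealQuantity",
    "attributes": {
        "ivoa:Quantity.unit": "deg",      # inherited from the parent type
        "ivoa:RealQuantity.value": 12.5,  # defined on the subtype
    },
}


def attributes_known_to(client_type, inst):
    """Return the attributes whose qualified name belongs to client_type.

    A client that only knows ivoa:Quantity still finds the attributes that
    Quantity defines, even though the instance is a RealQuantity, because
    each qualified name records the type where the attribute was defined.
    """
    prefix = client_type + "."
    return {
        name[len(prefix):]: value
        for name, value in inst["attributes"].items()
        if name.startswith(prefix)
    }


print(attributes_known_to("ivoa:Quantity", instance))  # {'unit': 'deg'}
```

The client never needs to know that RealQuantity exists; it only matches the prefix of a type it already understands.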

On Wed, Mar 8, 2023 at 11:54 AM Laurino, Omar <olaurino at cfa.harvard.edu>
wrote:

> Paul,
>
> I believe much of this conversation fits in work we had crystallized in a
> number of documents over the past 10 years, starting with requirements we
> collected from the IVOA community regarding data models and their
> serializations.
>
> The old draft of the Mapping VODML to VOTABLE spec linked below lists a
> number of requirements and use cases for the serializations. See in
> particular sections 2.3.4 and 2.4. Some of the design decisions we are
> discussing come from those very explicit and detailed requirements. My
> experience tells me that there may be different ways of fulfilling those
> requirements, however what often risks getting lost in these conversations
> is the need to strike the balance among a number of competing requirements
> from the perspective of different potential client and server
> implementations. I can argue for a simplification that makes sense, but in
> doing so I might be blocking significant use cases that would require
> additional complexity. I don't think any single person can strike that
> balance, which is why in the past we strived to document as much of the
> rationale for a specific design as possible, as well as attempting a very
> formal approach to the mapping of VODML to other meta-models like VOTABLE,
> but also OOP and Relational Schemata, with an eye on the Object-Relational
> mismatch. Much of that thoroughness was ditched in subsequent approaches
> that focused on alleged simplicity.
>
> My point is that the answer to some of your comments may depend on the
> kind of client we are implementing and how we want to use the data. For
> example, a client might be interested in all the instances that contain
> std:Type.attribute1 and std:Type.attribute2, irrespective of the instance
> type they belong to, duck-typing style, confident that the semantics of
> those attributes are exactly what they are supposed to be according to the
> std: data model. How does a mapping specification enable that while also
> allowing a client who couldn't care less about specific types to provide a
> dictionary-like, weakly typed, human readable representation of the same
> instance? Note that in neither case is there any just-in-time parsing of
> the model XML documents. In the former case the data model is assumed and
> maybe even hard-coded in the client's implementation; in the latter it is
> completely ignored. Other use cases might include pre-parsing of the
> model XML, or even just-in-time parsing.
>
> If one wants to leave the door open for additional uses of the
> serialization, using the full vodml-id provides a future-proof solution at
> a small cost in terms of complexity.
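A sketch of the two clients described above, both reading the same annotated instance without parsing any model XML. The flat dict of vodml-refs to values and the `std:` / `custom:` names are hypothetical stand-ins for whatever the mapping actually serializes.

```python
# Hypothetical annotated instance: fully qualified vodml-refs -> values.
instance = {
    "std:Type.attribute1": 1.0,
    "std:Type.attribute2": "ICRS",
    "custom:MyType.myAttribute": 42,
}

REQUIRED = ("std:Type.attribute1", "std:Type.attribute2")


def duck_typed_view(inst):
    """Client 1: the std model is assumed (hard-coded). Accept any instance
    that carries both std attributes, whatever its declared type."""
    if all(key in inst for key in REQUIRED):
        return {key: inst[key] for key in REQUIRED}
    return None


def readable_view(inst):
    """Client 2: ignores the model and builds human-readable keys from the
    text after the last '.' in each vodml-ref."""
    return {key.rsplit(".", 1)[-1]: value for key, value in inst.items()}


print(duck_typed_view(instance))
print(readable_view(instance))
# {'attribute1': 1.0, 'attribute2': 'ICRS', 'myAttribute': 42}
```

Note that the readable view would silently merge two attributes with the same short name defined on different types; only the qualified keys keep them apart.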
>
>
> https://volute.g-vo.org/svn/trunk/projects/dm/vo-dml-mapping/doc/VO-DML_mapping_WD.pdf
>
> On Wed, Mar 8, 2023 at 6:20 AM Paul Harrison <
> paul.harrison at manchester.ac.uk> wrote:
>
>>
>>
>> On 7 Mar 2023, at 22:52, Laurino, Omar <olaurino at cfa.harvard.edu> wrote:
>>
>>> I really hope I didn't miss some connections in this long thread, but...
>>>
>>>
>>>> The real point is that the VODML-ID "coords:TimeFrame.timescale" could
>>>> be "coords:a.b" according to the VO-DML standard - there is no connection
>>>> between the vodml-id and the name of the model element as defined in the
>>>> standard - I want to make the connection, and once the connection is made,
>>>> the VODML-ID is redundant as it can be generated from the model structure.
>>>>
>>>
>>> A change could be made to the VODML spec to make the vodml-id generation
>>> a requirement rather than a preference, by promoting Appendix C to
>>> normative status. And while I remember believing that both approaches (full
>>> vodml-id or just name) would work, as long as the mapping provides enough
>>> markup to make the references unambiguous, I did have a preference for the
>>> full vodml-id for two reasons: 1. because explicit is better than implicit
>>> and 2. because it is more future-proof.
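A sketch of what a mandatory generation rule would buy, assuming the rule amounts to joining the names of the enclosing model elements with dots (which is how `coords:TimeFrame.timescale` is formed). The `Element` class and both functions are hypothetical, and packages are treated as just another enclosing element.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    """A toy model element: a name plus an optional enclosing element."""
    name: str
    parent: Optional["Element"] = None


def generated_vodml_id(elem):
    """Join names from the outermost enclosing element down to elem."""
    parts = []
    while elem is not None:
        parts.append(elem.name)
        elem = elem.parent
    return ".".join(reversed(parts))


def follows_path_rule(declared_id, elem):
    """True if a declared vodml-id equals the generated one."""
    return declared_id == generated_vodml_id(elem)


time_frame = Element("TimeFrame")
timescale = Element("timescale", parent=time_frame)
print("coords:" + generated_vodml_id(timescale))  # coords:TimeFrame.timescale
print(follows_path_rule("a.b", timescale))        # False
```

The last line is Paul's "coords:a.b" case: legal as an arbitrary id today, but it would fail a normative path rule.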
>>
>>
>> I had not noticed Appendix C - so yes I would support making that
>> normative, and moving it out of the appendix - that does at least part of
>> what I am arguing for. In fact I am more concerned about this happening
>> than about my slightly more controversial desire to remove VODML-IDs; they
>> would just be repeated information then.
>>
>>> If I understand Paul's point correctly, I'd like to point out that the
>>> reason for having the entire vodml-id was to make sure that a model's
>>> element could always be identified unambiguously in any context, in
>>> particular when extending models. VODML allows data providers to extend a
>>> type (section 4.6.1). When they do, parsers need a way to identify fields
>>> in an unambiguous way, which includes mapping them to the model document
>>> where they are defined.
>>>
>>> In that sense, the vodml-id becomes redundant not only if one makes the
>>> connection with the name, but also if a mapping scheme defines a way to
>>> represent extensions that provides that unambiguous mechanism. If an
>>> instance is of type <custom:MyType> (which extends standard:Type), one
>>> would have attributes identified by <custom:MyType.myAttribute> and
>>> <standard:Type.attribute> within that instance, which the parser could map
>>> to the respective definitions without having to rely on any heuristics or
>>> complex logic.
>>>
>>> If one has a <custom:MyType> instance with attributes <myAttribute> and
>>> <attribute>, the parser wouldn't really know where to look them up unless
>>> the connection between <custom:MyType> and <standard:Type> is made explicit
>>> in the serialization markup. And even in that case, since the parser
>>> doesn't know whether myAttribute is defined in custom: or in standard:,
>>> it'll have to try both.
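A sketch contrasting the two lookups, assuming the model has already been loaded into a hypothetical index from each type to its parent and the attribute names it defines. A qualified reference resolves with one split; a bare name has to be tried on each type up the chain.

```python
# Hypothetical model index: type vodml-ref -> (parent type, own attributes).
MODEL = {
    "standard:Type": (None, {"attribute"}),
    "custom:MyType": ("standard:Type", {"myAttribute"}),
}


def resolve_qualified(ref):
    """A full vodml-ref names its defining type directly: no search."""
    type_ref, attr = ref.rsplit(".", 1)
    if attr not in MODEL[type_ref][1]:
        raise KeyError(ref)
    return type_ref, attr


def resolve_unqualified(instance_type, attr):
    """A bare name must be tried on the instance type, then on each
    ancestor, until some type turns out to define it."""
    current = instance_type
    while current is not None:
        parent, own = MODEL[current]
        if attr in own:
            return current, attr
        current = parent
    raise KeyError(f"{attr} not found in hierarchy of {instance_type}")


print(resolve_qualified("standard:Type.attribute"))
print(resolve_unqualified("custom:MyType", "attribute"))
# both print ('standard:Type', 'attribute')
```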
>>
>>
>> Agreed that you would have to traverse the hierarchy, but I am not
>> convinced that much value is obtained from having a data model
>> representation unless you are prepared to do that. Even if you have some
>> code that does a simple switch-case-statement-like string match on
>> VODML-IDs to do “stuff” for a standard:Type, how are you going to arrange
>> things for the custom:MyType? It quickly gets messy if you just keep adding
>> to the monster “global” switch statement for all the possibilities. It is
>> actually not impossible that you might want to do something different for
>> <attribute> when it appears in custom:MyType compared with when it appears
>> in standard:Type.
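One way to avoid the "monster" global switch is to register handlers per type and dispatch to the most specific registered type by walking up the parent chain. This is a generic sketch, not something either spec prescribes; the `PARENT` table, the decorator, and the handler names are all hypothetical.

```python
# Hypothetical type hierarchy: child type -> parent type.
PARENT = {
    "custom:MyType": "standard:Type",
    "custom:OtherType": "standard:Type",
    "standard:Type": None,
}

HANDLERS = {}


def handles(type_ref):
    """Register a handler for one type instead of growing a global switch."""
    def register(func):
        HANDLERS[type_ref] = func
        return func
    return register


def dispatch(type_ref, instance):
    """Call the handler of the most specific type that has one."""
    current = type_ref
    while current is not None:
        if current in HANDLERS:
            return HANDLERS[current](instance)
        current = PARENT.get(current)
    raise LookupError(f"no handler for {type_ref}")


@handles("standard:Type")
def show_standard(instance):
    return f"standard view of {instance['attribute']}"


@handles("custom:MyType")
def show_custom(instance):
    # The subtype may treat 'attribute' differently from its parent.
    return f"custom view of {instance['attribute']}"


print(dispatch("custom:MyType", {"attribute": 1}))     # custom view of 1
print(dispatch("custom:OtherType", {"attribute": 2}))  # standard view of 2
```

Types without their own handler fall back to the nearest ancestor, which is the hierarchy traversal Paul argues a client has to do anyway.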
>>
>>
>>> People have argued in the past that inheritance requires parsers to have
>>> complex type algebra, which may be true depending on the use case and on
>>> the mapping strategy. However, extensibility was one of the main
>>> requirements for VODML. A mapping strategy can minimize that effort by
>>> identifying an instance as both custom:MyType and standard:Type. And since
>>> we recommend that vodml-ids be generated algorithmically, a parser could
>>> decide to ignore model definitions completely, and parse the vodml-ids to
>>> display the attribute names, which would be human-readable. Other parsers
>>> would be interested in the unambiguous identification of attributes to
>>> provide richer context-dependent features to client software.
>>
>> I think that identifying attributes from an inheritance hierarchy would
>> only become “difficult” (i.e. requiring more than the name) if VO-DML
>> allowed multiple inheritance. However it does not, so I do not think that
>> the extra naming is necessary if there is a rule that attribute names have
>> to be unique across a type and all of its ancestors - the only case where
>> overrides can occur is explicitly dealt with by subsetting.
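A sketch of the uniqueness rule Paul proposes: with single inheritance each type has exactly one parent chain, so checking that no name repeats along it is enough to make bare names unambiguous. The model index and type names are hypothetical, and subsetting (the one sanctioned override) is deliberately not modelled.

```python
# Hypothetical model index: type -> (parent type, attributes it defines).
MODEL = {
    "standard:Type": (None, {"attribute"}),
    "custom:MyType": ("standard:Type", {"myAttribute"}),
    "custom:Clash": ("standard:Type", {"attribute"}),  # repeats a parent name
}


def owners_by_name(model, type_ref):
    """Map each bare attribute name visible on type_ref to the type that
    defines it, walking the single parent chain up to the root.

    Raises ValueError if a name repeats along the chain; subsetting is
    not modelled in this sketch.
    """
    owners = {}
    current = type_ref
    while current is not None:
        parent, own = model[current]
        for name in sorted(own):
            if name in owners:
                raise ValueError(
                    f"{name!r} is defined in both {owners[name]} and {current}"
                )
            owners[name] = current
        current = parent
    return owners


print(owners_by_name(MODEL, "custom:MyType"))
# {'myAttribute': 'custom:MyType', 'attribute': 'standard:Type'}
```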
>>
>>
>>> A reference to a full vodml-id is always going to unambiguously identify
>>> a single element, like a URI. I can go from custom:MyType.myAttribute to
>>> myAttribute and from standard:Type.attribute to attribute, but I can't go
>>> from myAttribute to custom:MyType.myAttribute without some effort parsing
>>> definition documents.
>>
>> So here I concede that more parsing of the definition documents would be
>> needed without the VODML-ID. However, given that the definition documents
>> are XML, it is necessary to do more than simple string matching to have any
>> sort of robustness, so the documents would need to be read as a DOM at a
>> minimum, and then traversing the hierarchy looking for a particular element
>> from a VODML-REF is not much more effort. If Appendix C were mandatory, my
>> objection to VODML-IDs would rest on the DRY principle. To create an
>> in-memory index for referencing the various elements, the key could be
>> formed either by just reading the VODML-ID or by constructing it from the
>> hierarchy path.
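A sketch of the two ways of keying the in-memory index, using the standard library's `xml.etree.ElementTree`. The XML below is a simplified, hypothetical snippet loosely modelled on VO-DML/XML (no namespaces, not schema-valid), and the set of element tags treated as "named" is an assumption.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical model document; not schema-valid VO-DML/XML.
MODEL_XML = """
<model>
  <name>coords</name>
  <objectType>
    <vodml-id>TimeFrame</vodml-id>
    <name>TimeFrame</name>
    <attribute>
      <vodml-id>TimeFrame.timescale</vodml-id>
      <name>timescale</name>
    </attribute>
  </objectType>
</model>
"""

# Assumed set of tags that carry both <vodml-id> and <name>.
NAMED = ("package", "objectType", "dataType", "attribute")


def build_indexes(xml_text):
    """Index every named element twice: by its declared vodml-id and by
    a key constructed from the chain of enclosing element names."""
    root = ET.fromstring(xml_text)
    prefix = root.findtext("name")
    from_ids, from_paths = {}, {}

    def walk(node, path):
        for child in node:
            if child.tag in NAMED:
                child_path = path + [child.findtext("name")]
                # Option 1: read the declared vodml-id.
                from_ids[f"{prefix}:{child.findtext('vodml-id')}"] = child
                # Option 2: construct the key from the hierarchy path.
                from_paths[f"{prefix}:{'.'.join(child_path)}"] = child
                walk(child, child_path)

    walk(root, [])
    return from_ids, from_paths


ids, paths = build_indexes(MODEL_XML)
print(sorted(ids) == sorted(paths))  # True: the ids here follow the path rule
print(sorted(paths))  # ['coords:TimeFrame', 'coords:TimeFrame.timescale']
```

When the declared ids follow the path rule, the two indexes have identical keys, which is the redundancy Paul objects to; when they do not, only the id-keyed index matches a VODML-REF such as "coords:a.b".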
>>
>> Paul.
>>
>>
>>
>>
>
> --
>
> Omar Laurino (he/him)
>
> Smithsonian Astrophysical Observatory
>
> Center for Astrophysics | Harvard & Smithsonian
>
> Office: (617) 495-7227
>
> 100 Acorn Park Dr. R-377 | MS 81 | Cambridge, MA 02140
>
>
> cfa.harvard.edu | Facebook
> <https://www.facebook.com/CenterForAstrophysicsHarvardSmithsonian/> |
> Twitter <https://twitter.com/CenterForAstro> | YouTube
> <https://www.youtube.com/channel/UC-UUo6Y7fP0N41Qw7KcKtcQ> | Newsletter
> <https://harvard.us14.list-manage.com/subscribe/post?u=13f357b8637e4a05e4a5d2845&id=c6100e9a6c>
>
>
