MIVOT: fully qualified attribute names

Wed Mar 8 17:54:15 CET 2023

Paul,

I believe much of this conversation fits into work we crystallized in a
number of documents over the past 10 years, starting with requirements we
collected from the IVOA community regarding data models and their
serializations.

The old draft of the Mapping VODML to VOTABLE spec linked below lists a
number of requirements and use cases for the serializations. See in
particular sections 2.3.4 and 2.4. Some of the design decisions we are
discussing come from those very explicit and detailed requirements. My
experience tells me that there may be different ways of fulfilling those
requirements. However, what often risks getting lost in these conversations
is the need to strike a balance among a number of competing requirements
from the perspective of different potential client and server
implementations. I can argue for a simplification that makes sense, but in
doing so I might be blocking significant use cases that would require
additional complexity. I don't think any single person can strike that
balance, which is why in the past we strove to document as much of the
rationale for a specific design as possible, and attempted a very formal
approach to the mapping of VODML to other meta-models such as VOTABLE,
but also OOP and Relational Schemata, with an eye on the Object-Relational
mismatch. Much of that thoroughness was ditched in subsequent approaches
that focused on alleged simplicity.

My point is that the answer to some of your comments may depend on the kind
of client we are implementing and how we want to use the data. For example,
a client might be interested in all the instances that contain
std:Type.attribute1 and std:Type.attribute2, irrespective of the instance
type they belong to, duck-typing style, confident that the semantics of
those attributes are exactly what they are supposed to be according to the
std: data model. How does a mapping specification enable that while also
allowing a client who couldn't care less about specific types to provide a
dictionary-like, weakly typed, human-readable representation of the same
instance? Note that in neither case is there any just-in-time parsing of the
model XML documents. In the former case the data model is assumed and maybe
even hard-coded in the client's implementation; in the latter it is
completely ignored. Other use cases might include a pre-parsing of the
model XML, or even a just-in-time parsing.
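
To make the two kinds of client concrete, here is a minimal Python sketch.
It assumes the serialization has already been parsed into plain dictionaries
whose attribute keys are full vodml-ids. That in-memory layout, the sample
values, and the helper names (having, as_plain_dict) are all hypothetical
and are not part of any mapping syntax; other:Thing is an invented model
used only for contrast.

```python
# Hypothetical in-memory form of parsed instances: attribute values keyed by
# fully qualified vodml-id. This is an illustration, not actual mapping syntax.
instances = [
    {"dmtype": "std:Type",
     "attributes": {"std:Type.attribute1": 1.0,
                    "std:Type.attribute2": 2.0}},
    {"dmtype": "custom:MyType",
     "attributes": {"std:Type.attribute1": 3.0,
                    "std:Type.attribute2": 4.0,
                    "custom:MyType.myAttribute": "x"}},
    {"dmtype": "other:Thing",
     "attributes": {"other:Thing.attribute1": 5.0}},
]


def having(instances, *vodml_ids):
    """Duck-typing client: keep every instance that carries all the given
    vodml-ids, irrespective of the instance's own type."""
    wanted = set(vodml_ids)
    return [inst for inst in instances
            if wanted.issubset(inst["attributes"])]


def as_plain_dict(instance):
    """Weakly typed client: ignore the model entirely and keep only the
    last dot-separated segment of each vodml-id as a readable key."""
    return {vodml_id.rsplit(".", 1)[-1]: value
            for vodml_id, value in instance["attributes"].items()}


matches = having(instances, "std:Type.attribute1", "std:Type.attribute2")
print([inst["dmtype"] for inst in matches])
print(as_plain_dict(instances[1]))
```

Neither function reads a model document: the first relies on the ids being
exactly those of the std: model, the second only on the ids being generated
so that the attribute name is their last segment.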

If one wants to leave the door open for additional usages of the
serialization, using the full vodml-id provides a future-proof solution at
a small price in terms of complexity.
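
The asymmetry behind that preference can be sketched in a few lines of
Python. The sketch assumes vodml-ids are generated algorithmically as the
type id plus "." plus the attribute name, and it replaces the parsed model
documents with a small hand-written table. MODEL, short_name and qualify
are hypothetical names used only for illustration.

```python
# Hypothetical stand-in for parsed model documents: each type lists its
# parent (or None) and the attribute names it defines itself.
MODEL = {
    "standard:Type": {"extends": None, "attributes": {"attribute"}},
    "custom:MyType": {"extends": "standard:Type",
                      "attributes": {"myAttribute"}},
}


def short_name(vodml_id):
    """Full vodml-id -> attribute name: a string operation, no model needed."""
    return vodml_id.rsplit(".", 1)[-1]


def qualify(dmtype, name, model=MODEL):
    """Attribute name -> full vodml-id: requires the model definitions and a
    walk up the single-inheritance chain of the instance type."""
    current = dmtype
    while current is not None:
        if name in model[current]["attributes"]:
            return f"{current}.{name}"
        current = model[current]["extends"]
    raise KeyError(f"{name!r} not found in {dmtype} or its parents")


print(short_name("custom:MyType.myAttribute"))
print(qualify("custom:MyType", "attribute"))
```

Going from the full id to the name needs nothing but the id itself, while
the reverse direction only works once the model definitions are available.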

https://volute.g-vo.org/svn/trunk/projects/dm/vo-dml-mapping/doc/VO-DML_mapping_WD.pdf

On Wed, Mar 8, 2023 at 6:20 AM Paul Harrison <paul.harrison at manchester.ac.uk>
wrote:

>
>
> On 7 Mar 2023, at 22:52, Laurino, Omar <olaurino at cfa.harvard.edu> wrote:
>
> I really hope I didn't miss some connections in this long thread, but...
>
>
>> The real point is that the VODML-ID "coords:TimeFrame.timescale" could be
>> "coords:a.b" according to the VO-DML standard - there is no connection
>> between the vodml-id and the name of the model element as defined in the
>> standard - I want to make the connection, and once the connection is made,
>> the VODML-ID is redundant as it can be generated from the model structure.
>>
>
> A change could be made to the VODML spec to make the vodml-id generation a
> requirement rather than a preference, by promoting Appendix C to normative
> state. And while I remember believing that both approaches (full vodml-id
> or just name) would work, as long as the mapping provides enough markup to
> make the references unambiguous, I did have a preference for the full
> vodml-id for two reasons: 1. because explicit is better than implicit and
> 2. because it is more future-proof.
>
>
> I had not noticed Appendix C - so yes I would support making that
> normative, and moving it out of the appendix - that does at least part of
> what I am arguing for. In fact I am more concerned by this happening rather
> than my slightly more controversial desire to remove VODML-IDs….they would
> just be repeated information then.
>
> If I understand Paul's point correctly, I'd like to point out that the
> reason for having the entire vodml-id was to make sure that a model's
> element could always be identified unambiguously in any context, in
> particular when extending models. VODML allows data providers to extend a
> type (section 4.6.1). When they do, parsers need a way to identify fields
> in an unambiguous way, which includes mapping them to the model document
> where they are defined.
>
> In that sense, the vodml-id becomes redundant not only if one makes the
> connection with the name, but also if a mapping scheme defines a way to
> represent extensions that provides that unambiguous mechanism. If an
> instance is of type <custom:MyType> (which extends standard:Type), one
> would have attributes identified by <custom:MyType.myAttribute> and
> <standard:Type.attribute> within that instance, which the parser could map
> to the respective definitions without having to rely on any heuristics or
> complex logic.
>
> If one has a <custom:MyType> instance with attributes <myAttribute> and
> <attribute> the parser wouldn't really know where to look them up unless
> the connection between <custom:MyType> and <standard:Type> is made explicit
> in the serialization markup. And even in that case, since the parser
> doesn't know whether myAttribute is defined in custom: or in standard:
> it'll have to try both.
>
>
> Agreed that you would have to traverse the hierarchy, but I am not
> convinced that much value is obtained from having a data model
> representation unless you are prepared to do that. Even if you have some
> code that does a simple switch-case-statement-like string match on
> VODML-IDs to do “stuff” for a standard:Type, how are you going to arrange
> things for the custom:MyType? It quickly gets messy if you just keep adding
> to the monster “global” switch statement for all the possibilities. It is
> actually not impossible that you might want to do something different for
> <attribute> when it appears in custom:MyType compared with when it appears
> in standard:Type.
>
>
> People have argued in the past that inheritance requires parsers to have
> complex type algebra, which may be true depending on the use case and on
> the mapping strategy. However, extensibility was one of the main
> requirements for VODML. A mapping strategy can minimize that effort by
> identifying an instance as both custom:MyType and standard:Type. And since
> we recommend vodml-ids to be generated algorithmically, a parser could
> decide to ignore model definitions completely, and parse the vodml-ids to
> display the attribute names, which would be human-readable. Other parsers
> would be interested in the unambiguous identification of attributes to
> provide richer context-dependent features to client software.
>
> I think that identifying attributes from an inheritance hierarchy would
> only become “difficult” (i.e. requiring more than the name) if VO-DML
> allowed multiple inheritance. However it does not, so I do not think that
> the extra naming is necessary if there is a rule that attribute names have
> to be unique within the parents of the hierarchy - the only case where
> overrides can occur is explicitly dealt with by subsetting.
>
>
> A reference to a full vodml-id is always going to unambiguously identify a
> single element, like a URI. I can go from custom:MyType.myAttribute to
> myAttribute and from standard:Type.attribute to attribute, but I can't go
> from myAttribute to custom:MyType.myAttribute without some effort parsing
> definition documents.
>
> So here I concede that more parsing of the definition documents would be
> needed without the VODML-ID - however given that the definition documents
> are XML it is necessary to do more than simple string matching to have any
> sort of robustness, so the documents would need to be read as a DOM at a minimum
> and then traversing the hierarchy looking for a particular element from a
> VODML-REF is not so much more effort. If appendix C were mandatory, then my
> objection to VODML-IDs is on the DRY principle. To create an in-memory
> index for referencing the various elements, the key could be formed either
> by just reading the VODML-ID or by constructing it from the hierarchy path.
>
> Paul.
>
>
>
>

-- 

Omar Laurino (he/him)

Smithsonian Astrophysical Observatory

Center for Astrophysics | Harvard & Smithsonian

Office: (617) 495-7227

100 Acorn Park Dr. R-377 | MS 81 | Cambridge, MA 02140
