Datalink vocabulary extension: sibling/co-generated

Thu May 7 11:58:29 CEST 2020

Dear Lists,

On Wed, May 06, 2020 at 04:40:45PM -0700, Patrick Dowler wrote:
> with trees and graphs without confusing anyone. When I think of
> co-generated, that is a much tighter relationship: if it was kids, they
> would be twins, not just siblings :-)... with data it seems to mean

That is an interesting point that, I have to say, makes me feel a bit
less enthusiastic about #co-generated.

> instance also implying/saying "at the same time". So I'm pretty sure
> co-generated is a narrow term just because it seems quite
> specific/restrictive. Is it a narrower than sibling? I think sibling is
> just the "from the same input" because I think we are talking about
> siblings-by-provenance.

I suspect one could sensibly define #sibling as something like the
recursive closure of #co-generated (in some twisted way).  Which
would make it, in set-informed semantics, a wider term
(extension(#sibling) is a superset of extension(#co-generated)).
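To make the set argument concrete, here is a toy sketch.  It assumes
#co-generated is read as a symmetric relation between dataset
identifiers and takes #sibling to be its transitive closure.  The
identifiers and the helper function are made up for illustration and
are not part of any DataLink service or vocabulary tooling:

```python
from itertools import product

def sibling_closure(pairs):
    """Symmetric, transitive closure of a relation (self-pairs excluded).

    Hypothetical helper: models #sibling as the closure of #co-generated.
    """
    # Treat co-generation as symmetric: if a is co-generated with b, so is b with a.
    rel = set(pairs) | {(b, a) for a, b in pairs}
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so we can add to rel inside the loop.
        for (a, b), (c, d) in product(list(rel), repeat=2):
            if b == c and a != d and (a, d) not in rel:
                rel.add((a, d))
                changed = True
    return rel

# Invented example: an image co-generated with a catalogue, and the
# catalogue co-generated with a lightcurve.
co_generated = {("img-1", "cat-1"), ("cat-1", "lc-1")}
sibling = sibling_closure(co_generated)

# extension(#co-generated) is a proper subset of extension(#sibling):
# img-1 and lc-1 are siblings without being directly co-generated.
print(co_generated < sibling, ("img-1", "lc-1") in sibling)
```

With only three datasets, the closure relates every pair of distinct
datasets, so the image and the lightcurve become siblings even though
nobody declared them co-generated.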

> My gut feeling is that vocabularies are easier to grow if one defines
> general broad terms and adds narrower terms later, when differentiation is
> needed. The other way around (define a term and later on realise it is a
> narrower term for a new or existing broad term) is a kind of refactoring.
> That could be harmless in practice or it could imply a subtle change in
> meaning. Still, refactoring is also a normal kind of evolution so it isn't
> wrong to define a narrow term and add a broad parent later.

The process certainly admits that.

But: before I put #co-generated into my service and thus make
François' proposal for VEP-004 valid and publishable: Pat, are you
saying you would not like the proposed description:

  Data products derived from the same progenitor as #this.  This
  could be a lightcurve for an object catalog derived from repeated
  observations, the dataset processed using a different pipeline, or
  the like.

for #co-generated?  And how much would you dislike it? [as in: enough
to block the VEP?  Because then I'd probably save myself the effort
until we're closer to consensus or someone else goes ahead with
#co-generated]

Meanwhile, a process point: To avoid later confusion with running
numbers:

(a) Please try to have the Used-in: field ready (and the VEP
otherwise complete) before submitting a VEP.

(b) Please do not assign VEP numbers yourself.  For instance, as long
as nobody uses #counterpart, what François has sent around as VEP-005
will not enter the VEP repo (assuming we play by the current WD's
rules).  Now, if another VEP comes in in the meantime, it will become
VEP-005, and people doing a web search and (hypothetically) ending up
in the mailing list archive will be confused when they see a
completely unrelated thing.

So, my request would be to submit to semantics@ without a running
number, just as "new VEP".

I also suspect it would be cool if there were a non-public way to
submit the requests, as people may worry about publicly making fools
of themselves.  Which brings us back to the question of whether there
should be mail aliases for each WG/IG's chair/vice-chair combo.  But
that's for another day.

Thanks for making it to here in these busy times,

         Markus