Datalink vocabulary extension: sibling/co-generated

Mon May 11 11:37:53 CEST 2020

Hi Paul,

On Mon, May 11, 2020 at 09:09:27AM +0000, Paul Harrison wrote:
> > On 2020-05-07, at 10:58, Markus Demleitner <msdemlei at ari.uni-heidelberg.de> wrote:
> > for #co-generated?  And how much would you dislike it? [as in: enough
> > to block the VEP?  Because then I'd probably save the time until
> > we're closer to consensus or someone else goes ahead with
> > #co-generated]
> 
> It strikes me that as finding the correct term is proving so
> difficult then it is indeed the description/concept that is the
> problem, and perhaps it should be dropped entirely - I get the

Hmmm, no, frankly, in this case I am by now pretty much convinced
we've identified a useful concept, and people in the debate haven't
really been contesting it for quite some time now.  I think by now
it's really about "choose a word that gives people who don't bother
to read the description the right idea".

> I always like to keep things simple and imagine what a fairly dumb

Sure -- but keeping things as simple as possible is difficult not
only because you have to fend off feature creep but also because
making it simpler than possible is another danger.  And...

> client is going to do with the information to further process it
> rather than what a human looking at the pointed to information can
> do, and I generally think that in many cases you cannot really do
> better than “related” - Markus had said something similar earlier
> in the thread

..."related" is just too simple.  If it weren't related, it wouldn't
be in the datalink document in the first place.

For the datalink vocabulary, the central question I think is "what
would a reasonably smart client do with it?".  And I've argued for a
long time: Organise the datalink matches in a tree, preferably in a
consistent order so users' muscle memory can kick in.  

That way, users don't see 200 links in the pop-up on a
datalink-enabled result but something like

  Dataset
  Preview
  Derived data > 
  Progenitors and Calibration data >
  ...

-- and then you can follow the non-atomic items down in sub-menus.

That way, people get to access the most frequently used stuff in a
common place, and you can still rather predictably place even fairly
obscure stuff.
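
To make the tree idea a bit more concrete, here is a minimal sketch in
Python of how a client might sort datalink rows into such a menu.  The
menu labels follow the example above; the BROADER table, the function
names and the "Other" bucket are assumptions made up for illustration,
and a real client would presumably read the term hierarchy from the
vocabulary rather than hard-code it.

```python
# Assumed menu labels and their fixed order, following the example above.
MENU_ORDER = [
    ("#this", "Dataset"),
    ("#preview", "Preview"),
    ("#derivation", "Derived data"),
    ("#progenitor", "Progenitors and Calibration data"),
]

# Assumed "narrower term -> broader term" relations (illustration only).
BROADER = {
    "#calibration": "#progenitor",
    "#bias": "#calibration",
    "#dark": "#calibration",
    "#flat": "#calibration",
}


def top_level_term(term):
    """Follow the broader-term chain up to the top-level term."""
    seen = set()
    while term in BROADER and term not in seen:
        seen.add(term)  # guard against accidental cycles in the table
        term = BROADER[term]
    return term


def build_menu(links):
    """Group (semantics, description) pairs under fixed-order menu entries.

    Entries always appear in MENU_ORDER, whatever order the links arrive
    in; terms that fit nowhere are collected under "Other" at the end.
    """
    labels = dict(MENU_ORDER)
    menu = {label: [] for _, label in MENU_ORDER}
    other = []
    for semantics, description in links:
        label = labels.get(top_level_term(semantics))
        if label is None:
            other.append(description)
        else:
            menu[label].append(description)
    if other:
        menu["Other"] = other
    return menu
```

Because the top-level entries are created in a fixed order before any
link is looked at, the menu comes out the same way for every dataset,
which is the point of the muscle-memory argument.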

And while you're right, we're categorising along provenance relations
here, I think that absolutely makes sense for datalink (which is why
we've had #progenitor and #derivation from day one) because it
intersects with what you'd like to do with the data:

Debugging? -> #progenitor [wait for my VEP pushing #calibration below
  #progenitor]
Analyses other people have made? -> #derivation
Other things people have done with the observation? -> #sibling/#co-derived

What's still missing is "Other things I might like to know
about...<ugh>", as in, perhaps "HST spectra taken for items shown on
this image".   I think that's, indeed, a tough one because you'd have
to say what <ugh> is, and datalink doesn't really have a way to have
<ugh> anything but "the thing you've just discovered", and that
absolutely does not cover the use case I've made up above (where
<ugh> is "objects in this image" rather than "this image").

Given that, for now I'd like to agree with you on:

> Because linking is a very powerful tool, there is a tendency to
> want to put in lots of links, but can you be sure that you are
> presenting a “complete” set of links? The IVOA has a whole lot of
> “discovery” protocols which are in competition with these “hard”

I hope that as people get used to things like Aladin 10's discovery
tree it'll be more natural to them to look for the HST spectra (or
any spectra at all) using a generic client, rather than get them
pre-packaged from the service operator.

         -- Markus

(perhaps getting carried away a bit in that last paragraph...)