[VEP-003]: datalink/core#sibling
Markus Demleitner
msdemlei at ari.uni-heidelberg.de
Tue Jan 7 10:45:59 CET 2020
Hi François,
On Fri, Dec 20, 2019 at 05:00:57PM +0100, François Bonnarel wrote:
> Le 20/12/2019 à 08:34, Markus Demleitner a écrit :
> > New Term: sibling
> > Action: Addition
> > Label: Sibling Data
> > Description: Data products derived from the same progenitor as #this.
> > This could be a lightcurve for an object catalog derived from repeated
> > observations, the dataset processed using a different pipeline, or the
> > like.
> If I compare this to the initial VEP-001 "associated-data" proposal
> and to the use case exposed in the other thread I wonder if
> "sibling" is the right word. I'm not sure we can always identify a
> common progenitor for what I called the "Main" and what I called
> the "Target" (see the other thread for what I mean there) in the
> use cases VEP-001 was supposed to solve.
Can you describe the cases where you can't see the common progenitor?
Perhaps that would help us work out if
(a) #sibling is too special and needs to be generalised,
(b) #sibling is useful and at the right level of generalisation, but
a second term is required for something related but not quite
identical, or
(c) #sibling isn't useful at all and should be replaced by something
else.
> That's why instead of "associated_data" or "sibling" I proposed
> "Observation_Result_of_source".
Hm... I have to say I don't like it. Why? Well, datalink/core is a
vocabulary of properties, i.e., of things that in a simple
subject-predicate-object sentence work as predicates (with a minimum
of embellishment). A datalink response row with columns ID,
semantics, and access_url thus expands to
<ID> has-a-<semantics> <access_url>
as in
<ivo://example.edu/data?a/b/c> has-a-preview <http://example.edu/prev/a/b/c>
As there's little that's as practical as a good theory, I'd like to
try really hard to make sure that new terms match that pattern. And,
well,
X has-an-observation-result-of-source Y
is at least severely counter-intuitive.
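
The expansion above can be sketched in a few lines of Python. This is only
an illustration of the subject-predicate-object reading, not part of any
Datalink library: the names DatalinkRow, article, and as_sentence are
hypothetical, and the choice between "a" and "an" is a naive vowel check.

```python
from typing import NamedTuple


class DatalinkRow(NamedTuple):
    """Hypothetical container for the three datalink columns discussed here."""
    ID: str          # publisher dataset identifier (the subject)
    semantics: str   # term from datalink/core, e.g. "#preview" (the predicate)
    access_url: str  # where the linked thing lives (the object)


def article(term: str) -> str:
    """Naive English article choice: 'an' before a vowel, else 'a'."""
    return "an" if term[:1].lower() in "aeiou" else "a"


def as_sentence(row: DatalinkRow) -> str:
    """Expand a row to '<ID> has-a-<semantics> <access_url>'."""
    term = row.semantics.lstrip("#")
    return f"<{row.ID}> has-{article(term)}-{term} <{row.access_url}>"


row = DatalinkRow(
    "ivo://example.edu/data?a/b/c",
    "#preview",
    "http://example.edu/prev/a/b/c",
)
sentence = as_sentence(row)
print(sentence)
```

Feeding a candidate term through as_sentence is a quick way to check
whether the resulting sentence still reads as a sensible statement about
the dataset named in ID.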
I think what you're implicitly trying to do here is change the domain
of the datalink predicates, i.e., change what set X can be drawn
from. So far, since ID in Datalink columns is a publisher dataset
identifier, it was implicit that all datalink/core properties had the
set of datasets (as defined by SSAP, say) as domain.
If I understand your intent correctly, then appending -of-source to
the term tries to change this at least for this term to say "well,
this term's domain isn't datasets at all, it's 'sources'". I think
that goes far beyond the question of how to name or define a single
term; this is a large change in how clients should interpret datalink
results, and, indeed, it's a large change in what dataset identifiers
are supposed to mean.
Frankly, that's all a bit unnerving to me -- I mean, perhaps it's a
good idea to assign ivoids to "sources", but I'd rather wait with
that until we have defined what we think a source is (i.e., probably
the definition of a source DM).
Luckily, I think for what triggered VEP-001 and VEP-003 -- linking
gaia_source table rows to Gaia spectra and time series -- we don't
need to go all that profound. A Gaia catalogue row may relate to a
source in a sense that we will have to make more precise in a source
DM, but it very certainly works just fine as a dataset. Not a large
one, but still a dataset, complete with a pubDID.
This dataset is derived from a set of observations which also yielded
epoch photometry, RP/BP spectra, etc. And in that sense at least for
this use case it seems #sibling is exactly on point.
Which of course doesn't mean I'm claiming we're done already; as I said
above: if you have different cases, it might well be that using some
other concept might work better in the long run, which is why I was
asking for them. Let's just make sure we don't needlessly blur what
datalink rows actually mean, because that's going to hurt all clients
down the line: Computers are bad at guessing.
-- Markus