Provide VOTable with metadata tracing Origin.

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Fri Feb 16 17:33:16 CET 2024


Hi Gilles,

Thanks for your feedback.

On Fri, Feb 16, 2024 at 04:27:31PM +0100, gilles landais wrote:
> with ExportRequested, you specify that the data are available to be
> harvested and you put a date on it.
> I'm a little confused by this date, which comes on top of the existing
> "Created" and "Updated" dates already available in VOResource curation.
> Why separate the "Updated" and "ExportRequested" dates?

Well, that's because "most" (in some sense) VO resources should, in
the view of the note, not turn up in bibliographic services.  That,
in turn, is because I believe that the default mode for citing data
should be citing the paper the data comes from; we shouldn't require
two citations, and it's bad if people are unsure what to cite.

So, indicating that something ought to be picked up by bibliographic
services after all (presumably because it is not linked 1:1 to
something that's already in them) is simply a different thing from
marking it as updated.

Incidentally, Updated refers to the *data*, not the metadata (which is
what ExportRequested is relevant for).  The date the metadata was updated
is in the record's @updated attribute (in RegTAP, that's
rr.resource.updated).

> Isn't there a risk of desynchronization between the "Updated" date (after a
> modification such as appending an author list) and the "ExportRequested" date?

Well, *if* we really want to enable incremental harvesting of these
records, we would, indeed, have to say that metadata updates should
result in updates to ExportRequested, too.

For now, I'd say we should wait for implementation feedback from ADS.
If they think incremental harvests are a good idea for this kind of
thing, we can revisit this and work out a sane prescription.  It's not
terribly hard, either: you'd select rr.resource.updated (which contains
the date the *metadata* was updated) for all records with an
ExportRequested res_date and compare it with the value of updated in
your local resource record.
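
In code, that check might look about like this (a sketch only: the
RegTAP endpoint is just the one I happen to run, any other will do, and
whether the role ends up in rr.res_date.value_role spelled
'exportrequested' is of course up to what the note finally prescribes):

  # Incremental check: which ExportRequested records have changed
  # metadata since we last looked?  Endpoint URL and the role spelling
  # are assumptions for illustration.
  import pyvo

  regtap = pyvo.dal.TAPService("http://reg.g-vo.org/tap")

  QUERY = """
  SELECT res.ivoid, res.updated
  FROM rr.resource AS res
    JOIN rr.res_date AS dt ON (res.ivoid = dt.ivoid)
  WHERE dt.value_role = 'exportrequested'
  """

  # what we stored when we last harvested: ivoid -> updated timestamp
  previously_seen = {}

  for row in regtap.search(QUERY):
      if previously_seen.get(row["ivoid"]) != row["updated"]:
          print("re-harvest", row["ivoid"])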

> * Making VO Resources Citable -
>
> This suggests that datasets would be included in bibliographic services,
> with their metadata coming from the registry.

Waitwait... the data *sets* are a different matter, and for them only
the endpoints would come from the registry, *not* any metadata
itself.  So, I'd say this:

> Then it becomes important that the bibliographic service is synchronized
> with the "update" happening in the registry.
> Adding dataset inputs is great.  Would it be possible (in a next step) to
> extend the article-dataset links to link a dataset to another dataset?

refers to section 3.  If that is indeed what you were thinking of:
Well... where would ADS display those?  ADS (like the Registry) does
not have a notion of datasets as such (though their terminology is a
bit different).  So, there is no "landing page" or anything similar
for a dataset.

If you, as a biblink-harvest operator, want to mark up
dataset-dataset relationships, show them in whatever pages you
produce for the cardinality >1 links.

> * About the biblink-harvest endpoint, you propose a service that provides
> JSON output, and you also specify that a "link record can be interpreted as
> an RDF triple".
> Do you also plan to include RDF in the serialization?  For
> instance, RDF expressed with a JSON-LD header?

Well... I have of course thought about specifying some proper RDF
here (where I'd frankly choose Turtle as the hands-down most readable
RDF serialisation).  But I couldn't find a scenario in which the
resulting complication would buy us anything, and so I opted for
plain and simple custom JSON.
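
Just to make the triple reading concrete (the key names and values
here are made up for illustration; the actual keys are whatever the
note ends up saying):

  # Sketch only: bib_ref, relationship, related_id are invented key
  # names; the real biblink-harvest format is what the note specifies.
  import json

  record = json.loads("""
      {"bib_ref": "some-bibcode",
       "relationship": "Cites",
       "related_id": "ivo://example.org/some-resource"}
      """)

  # Read the link record as a (subject, predicate, object) triple:
  triple = (record["bib_ref"], record["relationship"], record["related_id"])
  print(triple)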

Do you have a scenario in mind where proper RDF would open up a use
case that's noticeably harder with our custom JSON?

       -- Markus


