ObsCore Time extension : first version of working draft issued

Fri Jul 19 18:59:30 CEST 2024

Hi Mireille,

On Thu, Jul 18, 2024 at 07:16:29PM +0200, Mireille LOUYS via dm wrote:
> I posted the first version of the Time extension proposed  for ObsCore /
> ObsTAP services .
>
> This wiki page summarizes the access to the document and possibilities for
> comments .

At this early stage I think discussion on the mailing list ought to
work better.  Oh, and I've also put in a couple of hopefully
uncontroversial editorial changes into
<https://github.com/ivoa-std/ObscoreTimeExtension/pull/28>.

For the rest, here is a collection of issues that I think need at
least some thought:

(1) Given section 2, I am not sure this should be called "ObsCore
Metadata Extension for Time Properties", because you are actually
talking about metadata you would need in datasets.  I think that's
laudable, but it's also pretty hard given we don't even have a
uniform container format.  Perhaps sect. 2 should be an appendix and
then concretely discuss PARAMs VOTable-serialised time series should
have, complete with utypes and all?

(2) Table 2: time scales should be taken from http://www.ivoa.net/rdf/timescale

(3) Table 2: reference positions should be taken from
http://www.ivoa.net/rdf/refposition

(4) Table 2: "t_uncertainty -- Resolution or uncertainty of the time
stamps"  I think that needs a lot more explanation given we already
have t_resolution in obscore itself.  If this is saying something
like "mean error of the timestamps within the dataset" I'd say: "is
there really a discovery use case that requires that piece of
metadata?".

(5) Table 2: I would drop the part of the discussion of t_sys_error
starting "Approximately 100s is good for the time\_scale since" --
people either can guess that much or they would need more
explanations.  If we want that discussion, it should be outside of
that table.

(6) Table 2: t_format -- that's supposed to be "within the dataset"?
Is there a discovery use case for that?  You see, I'd assume that if
I find some interesting piece of data, having to translate from "ISO"
to MJD would not keep me back :-).
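
To show how little work that translation is, here is a minimal sketch
in Python.  The function name iso_to_mjd is made up for illustration,
and the sketch assumes a DALI-style timestamp without a timezone
designator, ignoring time scales and leap seconds altogether:

```python
from datetime import datetime, timezone

# MJD 0 corresponds to 1858-11-17T00:00:00
MJD_EPOCH = datetime(1858, 11, 17, tzinfo=timezone.utc)


def iso_to_mjd(iso_string):
    """Return the MJD for an ISO timestamp like 2000-01-01T12:00:00.

    Hypothetical helper: the input is treated as a plain count of
    86400-second days since the MJD epoch; no time scale handling.
    """
    stamp = datetime.fromisoformat(iso_string).replace(tzinfo=timezone.utc)
    return (stamp - MJD_EPOCH).total_seconds() / 86400.0


mjd_j2000 = iso_to_mjd("2000-01-01T12:00:00")
```

Anything that really cares about time scales would of course use a
proper time library instead.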

(7) Table 2: t_offset -- I am *very* sure there is no discovery case
for that.  I grant you that data sets will have to specify something
like this if the timestamps are not in one of the standard
representations (JD, MJD, DALI-timestamp), but storing that in the
obscore table seems very wrong to me.

(8) Table 2: t_description -- I'm not arguing against dropping some
free text here; you should just make sure that people know that they
are not supposed to use that in constraints and, really, never tell a
(non general-AI) machine to touch it.  Relying on freetext in data
discovery is the end of interoperability.

Also, I don't think the name is quite right.  This isn't a
description of the time coordinate or the time axis or anything else
related to time at all.  This is a text description of the
observable.  What that means for the column name prefix... I'm not
competent to decide.  I'd just feel that "t_" would be really
confusing.  Having o_description in analogy to o_ucd would sound
good, but then that, I'd argue, would need to sit in obscore itself.
Perhaps that's what we should do: Tell people to add o_description to
ivoa.obscore until we specify it in obscore 1.2?

(9) Table 3: What vocabulary is the "Field" used here from?  I think
things would be cleaner if you dropped the Field column.  Oh, and IMHO
prettier if you dropped the horizontal rules.

(10) p. 10 -- I like the "time_" prefix on time_variant.  Perhaps we
should design these extensions such that their columns all have a
uniform prefix?  Obs-radio would have radio_ then; that way, it would be
certain that different extensions will not use the same names, and
that will make NATURAL JOIN-s a lot safer.
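
To illustrate the hazard, here is a minimal sketch using Python's
sqlite3 module.  All table and column names except obs_publisher_did
and t_resolution are made up for the example; the point is only that
a NATURAL JOIN silently joins on *every* shared column name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE obscore (obs_publisher_did TEXT, t_resolution REAL);
    -- hypothetical extension that reuses an obscore column name
    CREATE TABLE ext_unprefixed (obs_publisher_did TEXT, t_resolution REAL);
    -- hypothetical extension with a uniform time_ prefix
    CREATE TABLE ext_prefixed (obs_publisher_did TEXT, time_resolution REAL);
    INSERT INTO obscore VALUES ('ivo://x/ds1', 10.0);
    INSERT INTO ext_unprefixed VALUES ('ivo://x/ds1', 0.5);
    INSERT INTO ext_prefixed VALUES ('ivo://x/ds1', 0.5);
""")

# Joins on obs_publisher_did AND t_resolution; the values differ,
# so the dataset quietly drops out of the result.
clash = conn.execute(
    "SELECT * FROM obscore NATURAL JOIN ext_unprefixed").fetchall()

# Joins on obs_publisher_did only, as intended.
safe = conn.execute(
    "SELECT * FROM obscore NATURAL JOIN ext_prefixed").fetchall()
```

With a per-extension prefix, the only shared names are the ones meant
as join keys.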

(11) p. 10 f -- time_variant itself I don't particularly like.  Table
4 suggests that the dataproduct_type uniquely determines the
time_variant.  Why bother storing it then?  Table 4 seems easy to
include in any software that would need time_variant in the first
place.  Incidentally, if you do keep Table 4, please either use the
correct identifiers from the product-type vocabulary -- or (in this
case probably rather not) consistently the human-readable labels.

(12) p. 12 "Having it [time frame] as part of the query response coming
back for a search for time series would help the user application to
interpret time stamps precisely." -- mmmmh... I don't buy this, really.
We've tried to work around badly deficient container formats (CSV,
say) in SSAP to the general confusion of everyone.  No, I'm sure it's
better for *everyone* if datasets that *really* don't give the
minimal metadata (including reference position and hopefully
timescale) are fixed rather than pretending all is dandy because some
metadata is available *during discovery*.  The datasets outlive
discovery.  If they come without minimal metadata, everyone suffers.
So, please let's not do t_scale and t_ref_position in the obscore
table.

Well, also because: I don't think there is a discovery case for them.
If I want to use some data, I will certainly not drop it because I
will have to apply some light-time correction (I will need to know
*whether* I need to do it, but that's for the dataset to say, not
something relevant during discovery).

(13) p. 13 "When the sampling period, or cadence is even, t_delta_min ,
t_delta_max have the same value." -- should "even" be "constant" here?
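
For concreteness, this is how I read the two columns, as a small
Python sketch.  That t_delta_min and t_delta_max are the smallest and
largest gap between successive time stamps is my assumption here, and
sampling_deltas is a made-up name:

```python
def sampling_deltas(timestamps):
    """Return (smallest gap, largest gap) between successive time stamps.

    Assumed reading of t_delta_min/t_delta_max; needs at least two
    time stamps.
    """
    ordered = sorted(timestamps)
    gaps = [later - earlier
            for earlier, later in zip(ordered, ordered[1:])]
    return min(gaps), max(gaps)


constant_cadence = sampling_deltas([0, 10, 20, 30])
uneven_cadence = sampling_deltas([0, 10, 25])
```

Under that reading, the two values coincide exactly when the cadence
is constant.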

(14) p. 13 While I can see discovery cases that want to distinguish
folded and unfolded time series, I can't see any for
t_fold_phaseReference; having the fold_period NULL and some value
otherwise seems prudent, but even that doesn't seem to have immediate
use cases beyond the boolean indication that you have folded time
series in either Enrique's list or the sample queries unless I'm
missing something.

(15) p. 14 -- ivoa.time-obscore is not a regular SQL identifier.  Since
radio started with obs_radio, and we have to change this anyway, why not
go for ivoa.obs_time?
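
As a quick check of what "regular" means here, a minimal Python
sketch, assuming the usual rule that a regular identifier is a letter
followed by letters, digits, or underscores.  The helper name is made
up, and reserved words are not checked:

```python
import re

# Assumed rule: a letter, then letters, digits, or underscores.
REGULAR_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_regular_table_name(name):
    """Return True if every dot-separated part is a regular identifier."""
    return all(REGULAR_IDENTIFIER.fullmatch(part)
               for part in name.split("."))


draft_name_ok = is_regular_table_name("ivoa.time-obscore")
proposed_name_ok = is_regular_table_name("ivoa.obs_time")
```

The hyphen is what breaks the draft's name; anything with a hyphen
would have to be written as a delimited identifier in every query.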

(16) p. 14 "If they cannot be retrieved nor calculated from the data they
may be set to UNKNOWN."  -- no, please not.  SQL has proper null values
(NULL).  Please use that for missing metadata unless there is a really
striking and overarching reason why you would need a custom null
indicator.
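
A minimal sketch of what can go wrong, again with Python's sqlite3.
The table names are made up, and t_fold_period just stands in for any
numeric column.  Note that SQLite is lenient about types; a stricter
engine would probably refuse the string in a numeric column outright:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE with_null (obs_id TEXT, t_fold_period REAL);
    CREATE TABLE with_marker (obs_id TEXT, t_fold_period REAL);
    INSERT INTO with_null VALUES ('a', 0.5), ('b', NULL);
    INSERT INTO with_marker VALUES ('a', 0.5), ('b', 'UNKNOWN');
""")

query = "SELECT obs_id FROM {} WHERE t_fold_period > 1 ORDER BY obs_id"

# NULL > 1 is not true, so the row without metadata is left out.
hits_null = conn.execute(query.format("with_null")).fetchall()

# In SQLite, any text sorts above any number, so the 'UNKNOWN' row
# matches a numeric constraint it should never match.
hits_marker = conn.execute(query.format("with_marker")).fetchall()

# Missing metadata is also trivial to select with proper NULLs.
missing = conn.execute(
    "SELECT obs_id FROM with_null WHERE t_fold_period IS NULL").fetchall()
```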

(17) p. 14 "In an extended ObsTAP service the main ObsCore
table and the other extension tables must be gathered in a TAP_SCHEMA
with utype..." -- I don't really understand what you are saying here,
but I think I don't like it :-).  So... let me ask first what the
problem is that you want to solve with this regulation?

(18) p. 15f -- In general, I'm against prescribing UCDs.  This has been
nothing but trouble in past standards without giving any operational
advantage.  For some fields it is actually useful if data providers
can use different UCDs to give some extra (machine-readable)
information.  Let's have "UCD suggestion" in the respective table
columns and not make the UCDs validity criteria.

(19) Appendix A, the time scales: See above, please don't, there's
the vocabulary http://www.ivoa.net/rdf/timescale for that.

(20) Appendix B: similar, just with refposition.

(21) The use cases I find somewhat unsatisfying.  It would be a lot more
convincing if you gave an actual science case ("Are there photometric
time series around the location of a specific gamma ray burst in
space and time?") in each case.  The way things are, it is unclear
why there is the "slots>1000" constraint in the existing gamma ray
burst example.  On the pulsar example, why do we require exposure
times longer than 5 seconds?  On the MUSE example, we should at least
confess that at this point there is no way to guess or know the
'MUSE' string.  On the "using a specified Time system", you should
explain why anyone would want to do that.

(22) In general, *never* cite vocabularies like
https://www.ivoa.net/rdf/product-type/2024-03-22/product-type.html,
i.e., with versions and an extension.  These are only useful in the
rare cases in which there is a need to be version-sharp.  No, our
vocabularies have identifiers; these happen to resolve, but they are
still identifiers, and they are of the form
http://www.ivoa.net/rdf/<vocabulary name>.  That's better to format
in text, it'll age well, and you can retrieve machine-readable
artefacts for these vocabularies in predictable ways (admittedly:
predictable for that rare species, the semantic web nerd).

Thanks,

          Markus