Generic FIELD/PARAM metadata items in VOTable
F.-X. Pineau
francois-xavier.pineau at astro.unistra.fr
Wed May 24 12:34:41 CEST 2023
Mark and All,
1 - General case of adding "arbitrary key/value pairs with a FIELD/PARAM"
Sorry if this has already been discussed (and let me know if it is dumb):
what about allowing additional, non-VOTable-reserved attributes in the
FIELD/PARAM tags?
(Yes, it will break existing parsers that are too restrictive.)
The fact that you (Mark) use the term "key/value pairs" seems to
indicate that you have
in mind the uniqueness of keys (so it is compatible with XML tag
attributes), right?
If we serialize in JSON or TOML, we just get additional key/value pairs
in the FIELD/PARAM objects.
It seems pretty straightforward and elegant (no additional complexity
with sub-objects, ...).
The Rust VOTable parser has supported this from the beginning
https://github.com/cds-astro/cds-votable-rust/blob/81f8c481dca03f1766ab1d922c64e9726c29ef52/src/field.rs#L118-L120
and it would be even simpler if we assumed values are only strings (as
you suggest).
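
To make point 1 concrete, here is a minimal sketch in Python (not the
Rust parser's actual code). It assumes a FIELD carrying a hypothetical
non-reserved attribute `healpix_order`, and the RESERVED set is my own
reading of the FIELD attributes in VOTable 1.4, so treat it as an
assumption. The function name `field_to_dict` is made up for the example.

```python
import json
import xml.etree.ElementTree as ET

# Attributes assumed to be reserved by VOTable for FIELD (to be checked
# against the standard before relying on this list).
RESERVED = {"ID", "name", "datatype", "unit", "precision", "width", "xtype",
            "ref", "ucd", "utype", "arraysize", "type"}

def field_to_dict(xml_text):
    """Split a FIELD's attributes into reserved ones and extra key/value pairs."""
    elem = ET.fromstring(xml_text)
    reserved = {k: v for k, v in elem.attrib.items() if k in RESERVED}
    extra = {k: v for k, v in elem.attrib.items() if k not in RESERVED}
    # Flat layout: the extra pairs sit next to the reserved ones, no sub-object.
    return {**reserved, **extra}, extra

# `healpix_order` is a hypothetical, non-reserved attribute.
flat, extra = field_to_dict(
    '<FIELD name="healpix_id" datatype="int" healpix_order="8"/>'
)
print(json.dumps(flat))
```

Since XML attribute names are unique within a tag, the extra pairs map
directly onto a JSON object, with all values kept as strings.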
2 - On the particular HEALPix example
VOTable represents tabular data, thus a very flat view of the data.
I lack the imagination to see what to do with a HEALPix number without
knowing its order.
The order is an important piece of data going along with the HEALPix
indices.
When each row has a different order, it is natural to provide the order
in a separate column.
We do not consider the "order" column as a sub-column.
So, if all rows have the same order, it seems natural to me to provide
the info in a PARAM
(= column of constant value), thus at a table level (and not at a column
level).
From my point of view, it is the role of GROUPs, VODML or refs
(or another mechanism, possibly complementary, that leaves the FIELD/PARAM
structure unchanged)
to introduce a hierarchy/logic/semantics(?) (beyond UCDs) in the set of
FIELDs/PARAMs.
> You could also end up with multiple PARAMs having the same name, but
> referring to different columns, but I don't think there is any rule
> against that.
I am probably missing your point, since I don't see this as problematic,
given that
there is (at most) one ref per PARAM and the IDs are supposed to be unique.
(A good practice, though, would be to have unique FIELD/PARAM names.)
Are you thinking of several columns sharing the same PARAM (which would
have to be duplicated)?
(I agree about readability, but are VOTables made to be human-readable?
And from a human point of view, the column name (e.g. 'hpx8') and the
DESCRIPTION
of the column should be enough to know the order.)
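
To illustrate why same-named PARAMs do not look ambiguous to me, here is
a rough Python sketch of how a client could resolve the ref/ID linkage.
The table content, the IDs `hpx_a`/`hpx_b` and the function name
`params_by_field` are all hypothetical, and I left out the VOTable XML
namespace to keep it short (a real document would need it).

```python
import xml.etree.ElementTree as ET

# Hypothetical table: two PARAMs share a name but point at different FIELDs.
TABLE = """
<TABLE>
  <PARAM name="healpix_order" datatype="int" value="8" ref="hpx_a"/>
  <PARAM name="healpix_order" datatype="int" value="10" ref="hpx_b"/>
  <FIELD ID="hpx_a" name="hpx8" datatype="long"/>
  <FIELD ID="hpx_b" name="hpx10" datatype="long"/>
</TABLE>
"""

def params_by_field(table_xml):
    """Map each FIELD name to the PARAMs whose ref points at its ID."""
    table = ET.fromstring(table_xml)
    # IDs are unique, so ID -> FIELD name is a plain dictionary.
    id_to_name = {f.get("ID"): f.get("name")
                  for f in table.iter("FIELD") if f.get("ID")}
    linked = {}
    for param in table.iter("PARAM"):
        ref = param.get("ref")
        if ref in id_to_name:  # PARAMs without a known ref stay table-level
            column = linked.setdefault(id_to_name[ref], {})
            column[param.get("name")] = param.get("value")
    return linked

linked = params_by_field(TABLE)
print(linked)
```

Each PARAM ends up attached to exactly one column, whatever its name is.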
Bonus: when serializing a VOTable in CSV, I tend to think that
PARAMs should be represented as columns containing constant values
(so even if the metadata is lost, we still have the (redundant) PARAM
values in the output).
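
A small Python sketch of what I mean, using only the standard `csv`
module. The function name `to_csv`, the column names and the values are
invented for the example, and a real writer would of course read them
from the VOTable itself.

```python
import csv
import io

def to_csv(params, field_names, rows):
    """Write rows as CSV, appending one constant column per PARAM."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Header: FIELD names first, then PARAM names.
    writer.writerow(list(field_names) + list(params))
    constants = list(params.values())
    for row in rows:
        # The same PARAM values are repeated on every row.
        writer.writerow(list(row) + constants)
    return out.getvalue()

csv_text = to_csv({"healpix_order": 8},
                  ["healpix_id", "flux"],
                  [(42, 1.5), (43, 2.0)])
print(csv_text)
```

The repetition is wasteful, but the PARAM values survive in a format
that has no place for metadata.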
What do you think?
fx
On 23/05/2023 at 11:34, Mark Taylor wrote:
> FX,
>
> I hadn't thought of that, it's definitely a possibility.
> The semantics of the various ref/ID linkages are rather under-documented
> in VOTable, so like the other options it would need to be written
> in the standard what the meaning of this construction would be.
> Compared to the other options it's less obvious to a human reader
> what's going on, but it's a bonus that it doesn't require any changes
> to the schema.
>
> One negative consideration is that legacy software (e.g. current
> version of STIL/STILTS/TOPCAT) would see such PARAMs, ignore the ref,
> and assume that this was table-level rather than column-level metadata -
> but the same might happen for option (1).
> You could also end up with multiple PARAMs having the same name, but
> referring to different columns, but I don't think there is any rule
> against that.
>
> Interested in other people's opinions.
>
> Mark
>
> On Mon, 22 May 2023, Francois-Xavier PINEAU wrote:
>
>> Hi Mark and all,
>>
>> Only considering the given example (so the following may be irrelevant), what
>> about something like:
>>
>> <PARAM name="healpix_order" value="8" ref="healpix_id"/>
>> <FIELD ID="healpix_id" name="healpix_id" datatype="int"/>
>>
>> which popped up as the more natural way of describing this to me
>> (PARAM <=> constant column; with a ref to "link" it to another existing
>> column).
>>
>> If the order is different for each row, it will naturally be described as
>> (attributes in square brackets are optional):
>>
>> <FIELD name="healpix_order" datatype="int" [ref="healpix_id"]/>
>> <FIELD [ID="healpix_id"] name="healpix_id" datatype="int"/>
>>
>> Cheers,
>>
>>
>> fx
>>
>>
>> On 17/05/2023 at 18:07, Mark Taylor wrote:
>>> Dear Applications,
>>>
>>> this mail is a summary of a proposed modification to VOTable that has
>>> been discussed on Github (https://github.com/ivoa-std/VOTable/issues/29)
>>> and that may make it into the proposed VOTable 1.5; I'm summarising it
>>> for comment on the apps mailing list at the request of Tom Donaldson,
>>> VOTable editor.
>>>
>>> Requirement
>>> -----------
>>>
>>> People sometimes want to add arbitrary key=value metadata to VOTable FIELD
>>> or PARAM columns, the sort of thing that doesn't fit into the existing
>>> attributes (unit, UCD, xtype, utype). Some examples:
>>>
>>> - Labelling DataLink PARAMs as mandatory or optional
>>> (https://github.com/ivoa-std/DataLink/issues/51)
>>>
>>> - Indicating HEALPix order for a column containing a HEALPix index
>>> (http://mail.ivoa.net/pipermail/apps/2016-August/001131.html)
>>>
>>> - Domain-specific standard metadata items from outside of astronomy
>>> (CAIO ATTRIBUTE
>>> at https://www.cosmos.esa.int/web/csa-guide/tap-tables-and-views)
>>>
>>> At present there's really no way to do this, though in some cases it's
>>> possible to achieve the required effect by ad hoc abuse of some underused
>>> VOTable elements or attributes.
>>>
>>> I would like to see a way to associate arbitrary key/value pairs with a
>>> FIELD/PARAM to address issues like the above, and others we haven't
>>> foreseen.
>>> The idea would not be to associate any semantics to such per-column metadata
>>> within the VOTable standard, though other client standards or applications
>>> could do that using their own key vocabularies if they wanted to.
>>> I don't think the values need to be typed (i.e. key and value can just
>>> be strings as far as VOTable is concerned).
>>>
>>> Solutions
>>> ---------
>>>
>>> Since multiple instances per FIELD/PARAM might in principle be required,
>>> the obvious thing is to use child elements each with a key and value
>>> attribute.
>>> Some possibilities:
>>>
>>> (1) Allow FIELD/PARAM to contain INFO children:
>>>
>>> <FIELD name="healpix_id" datatype="int">
>>> <INFO name="healpix_order" value="8"/>
>>> </FIELD>
>>>
>>> (2) Invent a new element for this purpose, say META:
>>>
>>> <FIELD name="healpix_id" datatype="int">
>>> <META key="healpix_order" value="8"/>
>>> </FIELD>
>>>
>>> (3) Use the existing LINK element using RDF to indicate semantics:
>>>
>>> <FIELD name="healpix_id" datatype="int">
>>> <LINK action="rdf" content-role="#healpix_order" value="8"/>
>>> </FIELD>
>>>
>>> (1) and (2) would require modifications to the VOTable schema.
>>> (1) is arguably less disruptive since it doesn't introduce a new element;
>>> however it may be more prone to confusing existing clients, which may assume
>>> that an INFO anywhere within a TABLE represents table-level, rather than
>>> column-level, metadata.
>>>
>>> (3) requires no change to the VOTable schema, the only change required is
>>> an explanation somewhere in the document text about what this means,
>>> and that this pattern is the recommended way to do this sort of thing.
>>>
>>> Markus and I have had discussions on the relative merits of these options
>>> at https://github.com/ivoa-std/VOTable/issues/29.
>>> Markus likes (3) because it fits into RDF semantic technology;
>>> I find (3) obscure (not obvious when reading what it means, not obvious
>>> when writing that this is how to communicate key=value intent)
>>> and therefore tend to favour (1) or (2) (probably (2)).
>>> But the fact that (3) requires no schema changes is clearly a significant
>>> bonus.
>>>
>>> I think either of us could live with either solution.
>>> Markus feel free to correct or clarify any of the above.
>>>
>>> Discussion
>>> ----------
>>>
>>> So, do others have opinions on:
>>>
>>> (a) whether this is a requirement worth expending effort to satisfy
>>> (b) which of options (1), (2), (3) or (other) is preferred
>>>
>>> I guess initial followups should go to this list, but presumably the
>>> discussion
>>> will make its way back to https://github.com/ivoa-std/VOTable/issues/29
>>> eventually; feel free to consult that Issue for more detail on the summary
>>> above.
>>>
>>> Mark
>>>
>>> --
>>> Mark Taylor Astronomical Programmer Physics, Bristol University, UK
>>> m.b.taylor at bristol.ac.uk https://www.star.bristol.ac.uk/mbt/
> --
> Mark Taylor Astronomical Programmer Physics, Bristol University, UK
> m.b.taylor at bristol.ac.uk https://www.star.bristol.ac.uk/mbt/