Generic FIELD/PARAM metadata items in VOTable

F.-X. Pineau francois-xavier.pineau at astro.unistra.fr
Wed May 24 12:34:41 CEST 2023


Mark and All,

1 - General case of adding "arbitrary key/value pairs with a FIELD/PARAM"

Sorry if this has already been discussed (and let me know if it is dumb):
what about allowing additional, non-VOTable-reserved attributes in the
FIELD/PARAM tags?
(Yes, it will break overly strict existing parsers.)
The fact that you (Mark) use the term "key/value pairs" seems to indicate
that you have in mind unique keys (which is compatible with XML tag
attributes), right?

If we serialize to JSON or TOML, we just get additional key/value pairs
in the FIELD/PARAM objects.
It seems pretty straightforward and elegant (no additional complexity
with sub-objects, ...).

The Rust VOTable parser has supported this from the beginning:
https://github.com/cds-astro/cds-votable-rust/blob/81f8c481dca03f1766ab1d922c64e9726c29ef52/src/field.rs#L118-L120
and it will be even simpler if we assume that values are only strings (as
you suggest).
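
To make the idea concrete, here is a minimal Python sketch (not the Rust
implementation linked above). It assumes that extra attributes are plain
strings; the RESERVED_ATTRS set and the function name split_attributes are
illustrative assumptions, not part of any standard or library.

```python
import json
import xml.etree.ElementTree as ET

# Assumption: an illustrative (possibly incomplete) list of the attribute
# names that VOTable itself reserves for FIELD/PARAM.
RESERVED_ATTRS = {
    "ID", "name", "datatype", "arraysize", "width", "precision",
    "unit", "ucd", "utype", "xtype", "ref", "type", "value",
}

def split_attributes(xml_text):
    """Parse one FIELD/PARAM tag and return (all attributes, extra keys).

    All attributes end up in one flat dict of strings; the extra keys are
    simply those that are not in RESERVED_ATTRS.
    """
    elem = ET.fromstring(xml_text)
    attrs = dict(elem.attrib)
    extras = sorted(k for k in attrs if k not in RESERVED_ATTRS)
    return attrs, extras

# Hypothetical FIELD carrying one non-reserved attribute.
field_xml = '<FIELD name="healpix_id" datatype="int" healpix_order="8"/>'
attrs, extras = split_attributes(field_xml)

# The JSON view is just the same flat object, with no sub-objects.
print(json.dumps(attrs, indent=2))
print("non-reserved attributes:", extras)
```

Since XML attribute names are unique within a tag, the flat dict cannot lose
a key, which is why this only works if keys are meant to be unique.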


2 - On the particular HEALPix example

VOTable represents tabular data, and thus a very flat view of the data.

I lack the imagination to see what to do with a HEALPix number without
knowing its order.
The order is an important piece of data that goes along with the HEALPix
indices.
When each row has a different order, it is natural to provide the order
in a separate column.
We do not consider the "order" column as a sub-column.
So, if all rows have the same order, it seems natural to me to provide
the info in a PARAM (= column of constant value), thus at the table level
(and not at the column level).
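
As a rough illustration of why the index is useless without the order, and
of the "PARAM = constant column" view, here is a small Python sketch. The
row/param dict layout and the function names are assumptions made for this
example only; the one fact used is that HEALPix has 12 * 4**order cells at a
given order.

```python
def healpix_n_cells(order):
    """Number of HEALPix cells at a given order: 12 * 4**order."""
    return 12 * 4 ** order

def with_order(rows, params):
    """Return (index, order) pairs for all rows.

    rows:   list of dicts, one per row (hypothetical in-memory table).
    params: dict of table-level constants (PARAMs).
    The order is taken from a 'healpix_order' column if the row has one,
    otherwise from the PARAM of the same name.
    """
    result = []
    for row in rows:
        order = row.get("healpix_order", params.get("healpix_order"))
        if order is None:
            raise ValueError("HEALPix order unknown: index cannot be interpreted")
        if not 0 <= row["healpix_id"] < healpix_n_cells(order):
            raise ValueError("index out of range for this order")
        result.append((row["healpix_id"], order))
    return result

# Same order for all rows: the order lives in a PARAM.
same_order = with_order(
    [{"healpix_id": 100}, {"healpix_id": 786431}],
    {"healpix_order": 8},
)

# One order per row: the order lives in its own column.
mixed_order = with_order(
    [{"healpix_id": 100, "healpix_order": 3},
     {"healpix_id": 100, "healpix_order": 8}],
    {},
)
print(same_order)
print(mixed_order)
```

In both layouts the consumer ends up with the same (index, order) pairs; the
PARAM is just the constant-column special case.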

From my point of view, it is the role of GROUPs, VODML or refs
(or some other mechanism that complements the FIELD/PARAM structure while
leaving it unchanged)
to introduce a hierarchy/logic/semantics(?) (beyond UCDs) in the set of
FIELDs/PARAMs.

> You could also end up with multiple PARAMs having the same name, but
> referring to different columns, but I don't think there is any rule
> against that.

I am probably missing your point, since I don't see this as problematic,
given that
there is (at most) one ref per PARAM and the IDs are supposed to be unique.
(A good practice, though, would be to have unique FIELD/PARAM names.)
Are you thinking of several columns sharing the same PARAM (which would
have to be duplicated)?
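
A minimal Python sketch of how a client could resolve such PARAMs, assuming
the "PARAM ref points to a FIELD ID" convention discussed in this thread (it
is not defined by the standard). The table fragment, the IDs and the
function name are hypothetical, and XML namespaces are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: two PARAMs share a name but refer to different
# columns through their (unique) IDs.
TABLE_XML = """
<TABLE>
  <PARAM name="healpix_order" datatype="int" value="8" ref="hpx_a"/>
  <PARAM name="healpix_order" datatype="int" value="10" ref="hpx_b"/>
  <FIELD ID="hpx_a" name="hpx8" datatype="long"/>
  <FIELD ID="hpx_b" name="hpx10" datatype="long"/>
</TABLE>
"""

def params_by_field(table):
    """Map each FIELD name to the {PARAM name: value} pairs that ref it."""
    # FIELD IDs are unique, so they can be used as dictionary keys.
    fields = {f.get("ID"): f.get("name")
              for f in table.findall("FIELD") if f.get("ID")}
    linked = {}
    for param in table.findall("PARAM"):
        ref = param.get("ref")
        if ref in fields:  # PARAMs without a matching ref stay table-level
            linked.setdefault(fields[ref], {})[param.get("name")] = param.get("value")
    return linked

linked = params_by_field(ET.fromstring(TABLE_XML))
print(linked)
```

Because the lookup goes through the ID and not the name, two PARAMs with the
same name do not collide as long as they ref different columns.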

(I agree about the readability, but are VOTables meant to be human-readable?
And from a human point of view, the column name (e.g. 'hpx8') and the
DESCRIPTION
of the column should be enough to know the order.)

Bonus: when serializing a VOTable to CSV, I tend to think that
PARAMs should be represented as columns containing constant values
(so even if the metadata is lost, we still have the (redundant) PARAM
values in the output).
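
For example, a small Python sketch of that CSV convention, using only the
standard csv module. The function name and the in-memory representation of
the table (lists of names and rows, a dict of PARAMs) are assumptions made
for the illustration.

```python
import csv
import io

def table_to_csv(field_names, rows, params):
    """Write rows as CSV, appending each PARAM as a constant column.

    field_names: column names of the FIELDs.
    rows:        iterable of row tuples.
    params:      dict mapping PARAM name to its (constant) value.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(list(field_names) + list(params))
    constants = list(params.values())
    for row in rows:
        # The PARAM values are repeated on every row.
        writer.writerow(list(row) + constants)
    return out.getvalue()

csv_text = table_to_csv(
    ["healpix_id", "flux"],
    [(1234, 0.5), (5678, 1.25)],
    {"healpix_order": 8},
)
print(csv_text)
```

The output is redundant, but the order survives even though the CSV carries
no metadata.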

What do you think?


fx


On 23/05/2023 at 11:34, Mark Taylor wrote:
> FX,
>
> I hadn't thought of that, it's definitely a possibility.
> The semantics of the various ref/ID linkages are rather under-documented
> in VOTable, so like the other options it would need to be written
> in the standard what the meaning of this construction would be.
> Compared to the other options it's less obvious to a human reader
> what's going on, but it's a bonus that it doesn't require any changes
> to the schema.
>
> One negative consideration is that legacy software (e.g. current
> version of STIL/STILTS/TOPCAT) would see such PARAMs, ignore the ref,
> and assume that this was table-level rather than column-level metadata -
> but the same might happen for option (1).
> You could also end up with multiple PARAMs having the same name, but
> referring to different columns, but I don't think there is any rule
> against that.
>
> Interested in other people's opinions.
>
> Mark
>
> On Mon, 22 May 2023, Francois-Xavier PINEAU wrote:
>
>> Hi Mark and all,
>>
>> Only considering the given example (so the following may be irrelevant), what
>> about something like:
>>
>> <PARAM name="healpix_order" value="8" ref="healpix_id"/>
>> <FIELD ID="healpix_id" name="healpix_id" datatype="int"/>
>>
>> which popped up as the more natural way of describing this to me
>> (PARAM <=> constant column; with a ref to "link" it to another existing
>> column).
>>
>> If the order is different for each row, it will naturally be described as
>> (italic = optional, shown here between slashes):
>>
>> <FIELD name="healpix_order" datatype="int" /ref="healpix_id"//>
>> <FIELD /ID="healpix_id"/ name="healpix_id" datatype="int"/>
>>
>> Cheers,
>>
>>
>> fx
>>
>>
>> On 17/05/2023 at 18:07, Mark Taylor wrote:
>>> Dear Applications,
>>>
>>> this mail is a summary of a proposed modification to VOTable that has
>>> been discussed on Github (https://github.com/ivoa-std/VOTable/issues/29)
>>> and that may make it into the proposed VOTable 1.5; I'm summarising it
>>> for comment on the apps mailing list at the request of Tom Donaldson,
>>> VOTable editor.
>>>
>>> Requirement
>>> -----------
>>>
>>> People sometimes want to add arbitrary key=value metadata to VOTable FIELD
>>> or PARAM columns, the sort of thing that doesn't fit into the existing
>>> attributes (unit, UCD, xtype, utype).  Some examples:
>>>
>>>      - Labelling DataLink PARAMs as mandatory or optional
>>>        (https://github.com/ivoa-std/DataLink/issues/51)
>>>
>>>      - Indicating HEALPix order for a column containing a HEALPix index
>>>        (http://mail.ivoa.net/pipermail/apps/2016-August/001131.html)
>>>
>>>      - Domain-specific standard metadata items from outside of astronomy
>>>        (CAIO ATTRIBUTE
>>> at https://www.cosmos.esa.int/web/csa-guide/tap-tables-and-views)
>>>
>>> At present there's really no way to do this, though in some cases it's
>>> possible to achieve the required effect by ad hoc abuse of some underused
>>> VOTable elements or attributes.
>>>
>>> I would like to see a way to associate arbitrary key/value pairs with a
>>> FIELD/PARAM to address issues like the above, and others we haven't
>>> foreseen.
>>> The idea would not be to associate any semantics to such per-column metadata
>>> within the VOTable standard, though other client standards or applications
>>> could do that using their own key vocabularies if they wanted to.
>>> I don't think the values need to be typed (i.e. key and value can just
>>> be strings as far as VOTable is concerned).
>>>
>>> Solutions
>>> ---------
>>>
>>> Since multiple instances per FIELD/PARAM might in principle be required,
>>> the obvious thing is to use child elements each with a key and value
>>> attribute.
>>> Some possibilities:
>>>
>>>      (1) Allow FIELD/PARAM to contain INFO children:
>>>
>>>           <FIELD name="healpix_id" datatype="int">
>>>             <INFO name="healpix_order" value="8"/>
>>>           </FIELD>
>>>
>>>      (2) Invent a new element for this purpose, say META:
>>>
>>>           <FIELD name="healpix_id" datatype="int">
>>>             <META key="healpix_order" value="8"/>
>>>           </FIELD>
>>>
>>>      (3) Use the existing LINK element using RDF to indicate semantics:
>>>
>>>           <FIELD name="healpix_id" datatype="int">
>>>             <LINK action="rdf" content-role="#healpix_order" value="8"/>
>>>           </FIELD>
>>>
>>> (1) and (2) would require modifications to the VOTable schema.
>>> (1) is arguably less disruptive since it doesn't introduce a new element;
>>> however it may be more prone to confusing existing clients, which may assume
>>> that an INFO anywhere within a TABLE represents table-level, rather than
>>> column-level, metadata.
>>>
>>> (3) requires no change to the VOTable schema, the only change required is
>>> an explanation somewhere in the document text about what this means,
>>> and that this pattern is the recommended way to do this sort of thing.
>>>
>>> Markus and I have had discussions on the relative merits of these options
>>> at https://github.com/ivoa-std/VOTable/issues/29.
>>> Markus likes (3) because it fits into RDF semantic technology;
>>> I find (3) obscure (not obvious when reading what it means, not obvious
>>> when writing that this is how to communicate key=value intent)
>>> and therefore tend to favour (1) or (2) (probably (2)).
>>> But the fact that (3) requires no schema changes is clearly a significant
>>> bonus.
>>>
>>> I think either of us could live with either solution.
>>> Markus feel free to correct or clarify any of the above.
>>>
>>> Discussion
>>> ----------
>>>
>>> So, do others have opinions on:
>>>
>>>     (a) whether this is a requirement worth expending effort to satisfy
>>>     (b) which of options (1), (2), (3) or (other) is preferred
>>>
>>> I guess initial followups should go to this list, but presumably the
>>> discussion
>>> will make its way back to https://github.com/ivoa-std/VOTable/issues/29
>>> eventually; feel free to consult that Issue for more detail on the summary
>>> above.
>>>
>>> Mark
>>>
>>> --
>>> Mark Taylor  Astronomical Programmer  Physics, Bristol University, UK
>>> m.b.taylor at bristol.ac.uk            https://www.star.bristol.ac.uk/mbt/
> --
> Mark Taylor  Astronomical Programmer  Physics, Bristol University, UK
> m.b.taylor at bristol.ac.uk           https://www.star.bristol.ac.uk/mbt/

