SQLite + TAP_SCHEMA or similar for self-describing SQLite files

Mon Sep 25 10:59:20 CEST 2017

Hi Brian,

On Thu, Sep 21, 2017 at 10:30:03PM +0000, Van Klaveren, Brian N. wrote:
> For the metadata describing the tables, I'm hoping to use a variant
> of the TAP_SCHEMA supporting tables and/or the RegTAP tables (e.g.
> res_table, table_column), sticking as close as possible to those

Uh, apologies, then, that rr.tes_tables uses column names different
from tap_schema.tables.  I have to admit this is the first time I've
noticed that compatiblity with tap_schema should have been a
desideratum for RegTAP, too.  But at least I've just convinced myself
that there are valid technical reasons for the deviation between the
two.

> So we were thinking about having one table per schema. In that
> scenario, we need to pick a few good names, which hopefully never
> collide with the names people might otherwise choose. An easy
> option might be to just use "$" or "_" for the namespacing, e.g.
> "meta$tables" or "rr_res_table". Or we could just create new names,
> e.g. "meta_table", "rs_column".

Gut feeling: yes, that sounds a lot better than having semi-hidden
per-table metadata tables.

> 1. If these SQLite tables were an optional output of a TAP
> endpoint, I think we'd like some way of denoting which table is the
> primary result table. I was thinking we'd allow arbitrarily-named
> tables in out SQLite files, and the implementation would need to
> add a ";meta.main" to a table's UCD in the metadata table
> corresponding. Then, when reading in a file, a client could
> determine quickly which table is the default table for that file
> from a simple query.

That sounds a bit odd to me -- UCDs on tables are at least uncommon,
and given the UCD's semantics having them doesn't sound quite right.

But I don't think I've understood the use case here.  If you do a TAP
query into a persistent server-side table, people will either give
the result table name in the query ("select into...") or will get it
from a UWS result document (say), no?  What kind of discovery do you
think would then use this kind of annotation?

> 2. Representing column groupings in a table. Something similar to
> gavo's TAP_SCHEMA.groups I think would be fine.

I'm a bit surprised that in the, what, five years since the
TAP_SCHEMA.groups idea was floated, no further activity in that
direction has taken place -- I, for one, still believe it's a good
idea, and I'd be happy to work with you on figuring out the datails.

> 3. Other TAP/VOTable-specific features (params, fields, info,
> etc...). Do also we need to store information about indexes, keys,
> constraints, or would UCDs work fine for most of this?

Indexes and foreign keys you need for TAP_SCHEMA, and even though
this information currently isn't employed terribly usefullly by
common clients as far as I've seen, I think there's very good reasons
to show them to the user ("suggested joins", mainly).

Constraints: I doubt users will in general be terribly interested in
common column constraints (NOT NULL, UNIQUE, etc).  I could be wrong,
and if that turns out to be true, I suppose we should develop on
TAP_SCHEMA to cover more of these.  Note that there's nothing wrong
with experimentally appending a couple of columns to TAP_SCHEMA
tables.

> 4. Versioning - Since it's not an XSD, we would at least need one
> extra table to denote the version of the metadata tables we've
> included.

This point lets me doubt I understand your use case(s) that well.

But this *sounds* like you have one or more global data model(s).  If
that were the case I'd use a canonical URL (do yourself a favor and
make it all lower case; monkeying around with claiming
case-insensitivity might seem inoccuous but always ends in pain) with
the models' URLs and dump that URL into the TAP_SCHEMA.schema and/or
the TAP_SCHEMA.table's utype column.  

They're doing a similar thing in EPN-TAP, and in general I'm pretty
sure DM declaration (including Obscore and RegTAP) will have have to
migrate to something like this anyway -- DM conformance simply is a
property of a schema or a table, not of a service (where this is
currently declared through TAPRegExt).  Having URIs (rather than
essentially undefined opaque strings) in utypes, on the other hand,
seems like a good way to salvage the fields that are already there.

      -- Markus