Persistent TAP uploads
Markus Demleitner
msdemlei at ari.uni-heidelberg.de
Wed Oct 16 08:57:55 CEST 2024
Hi Pat,
On Tue, Oct 15, 2024 at 09:31:44AM -0700, Patrick Dowler via dal wrote:
> (ii) our very simplistic parameter-based solution allows for single
> column indices (unique optional) and while it seems very
> underwhelming on the surface it has been "good enough" for the
> youcat use cases.
Well, I'm always in favour of taking step after step, doing what's
simple and immediately doable. But spatial indexes are *so*
fundamental to what our users do that I'm sure we need that
from the start. Actually, as a legacy of what I do for conventional
TAP uploads, DaCHS will currently automatically create a spatial
index if there's a (pos.eq.dec, pos.eq.ra) pair with a meta.main
qualifier.
But that feels too implicit to me to make it into a standard.
Perhaps we should have an endpoint create_index below the table URI,
to which you can post one or more COLUMN parameter(s) for a (possibly
multi-column, but I don't expect that to be something many people
will do) "normal" (in Postgres: B-Tree, I suppose) index and
SPATIALCOLUMNS for the preferred sort of 2d spherical index that the
data centre has?
For reference, postgres' create index has this syntax:
CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] name ] ON [ ONLY ] table_name [ USING method ]
( { column_name | ( expression ) } [ COLLATE collation ] [ opclass [ ( opclass_parameter = value [, ... ] ) ] ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
[ INCLUDE ( column_name [, ...] ) ]
[ NULLS [ NOT ] DISTINCT ]
[ WITH ( storage_parameter [= value] [, ... ] ) ]
[ TABLESPACE tablespace_name ]
[ WHERE predicate ]
-- with this proposal, only
CREATE INDEX (col1, ...)
would be left (i.e., in particular no partial indexes, which might
sometimes make sense; but then that's really large-table stuff), with
a dash of USING via SPATIALCOLUMNS.
For now, this feels like a simplification that's good for the
overwhelming majority of what we want with TAP user uploads.
> Since TAP (tap_schema) is also too simple to describe a
> multi-column index correctly, it seems like extending this will
> grow in scope/complexity.
Introspectability seems rather less urgent to me; at least for the
sort of ephemeral tables I have in mind here, I think people can
remember what they just did. And of course, presenting the sort of
information required to help them exploit the indexes in a way that
works across DB engines and is understandable without a lot of
service-specific knowledge *is* rather daunting.
In practice, though, I have to say almost all my user-visible indexes
are one of four kinds:
* Single-column numeric
* Single-column for some sort of string operation (ok, there's simple
patterns vs. full-text, IR-kind; that distinction actually would be
relevant to work out what queries to write; hm...)
* Two-column spatial
* One-column geometry.
I don't think I actually do partial indexes anywhere.
If that's halfway typical, I could imagine figuring out some
non-magic description of indexes that help people write good queries.
Thanks,
Markus
More information about the dal
mailing list