Persistent TAP uploads

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Wed Oct 16 08:57:55 CEST 2024


Hi Pat,

On Tue, Oct 15, 2024 at 09:31:44AM -0700, Patrick Dowler via dal wrote:
> (ii) our very simplistic parameter-based solution allows for single
> column indices (unique optional) and while it seems very
> underwhelming on the surface it has been "good enough" for the
> youcat use cases.

Well, I'm always in favour of taking step after step, doing what's
simple and immediately doable.  But spatial indexes are *so*
fundamental to what our users do that I'm sure we need that
from the start.  Actually, as a legacy of what I do for conventional
TAP uploads, DaCHS will currently automatically create a spatial
index if there's a (pos.eq.dec, pos.eq.ra) pair with a meta.main
qualifier.

But that feels too implicit to me to make it into a standard.

Perhaps we should have an endpoint create_index below the table URI,
to which you can post one or more COLUMN parameter(s) for a (possibly
multi-column, but I don't expect that to be something many people
will do) "normal" (in Postgres: B-Tree, I suppose) index and
SPATIALCOLUMNS for the preferred sort of 2d spherical index that the
data centre has?

For reference, postgres' create index has this syntax:

  CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] name ] ON [ ONLY ] table_name [ USING method ]
      ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass [ ( opclass_parameter = value [, ... ] ) ] ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
      [ INCLUDE ( column_name [, ...] ) ]
      [ NULLS [ NOT ] DISTINCT ]
      [ WITH ( storage_parameter [= value] [, ... ] ) ]
      [ TABLESPACE tablespace_name ]
      [ WHERE predicate ]

-- with this proposal, only

  CREATE INDEX (col1, ...)

would be left (i.e., in particular no partial indexes, which might
sometimes make sense; but then that's really large-table stuff), with
a dash of USING via SPATIALCOLUMNS.

For now, this feels like a simplification that's good for the
overwhelming majority of what we want with TAP user uploads.

> Since TAP (tap_schema) is also  too simple to describe a
> multi-column index correctly, it seems like extending this will
> grow in scope/complexity.

Introspectability seems rather less urgent to me; at least for the
sort of ephemeral tables I have in mind here, I think people can
remember what they just did.  And of course, presenting the sort of
information required to help them exploit the indexes in a way that
works across DB engines and is understandable without a lot of
service-specific knowledge *is* rather daunting.

In practice, though, I have to say almost all my user-visible indexes
are one of four kinds:

* Single-column numeric
* Single-column for some sort of string operation (ok, there's simple
  patterns vs. full-text, IR-kind; that distinction actually would be
  relevant to work out what queries to write; hm...)
* Two-column spatial
* One-column geometry.

I don't think I actually do partial indexes anywhere.

If that's halfway typical, I could imagine figuring out some
non-magic description of indexes that help people write good queries.

Thanks,

         Markus



More information about the dal mailing list