xmin/xmax in postgres-based TAP services?
pdowler.cadc at gmail.com
Thu Nov 23 23:39:08 CET 2017
I agree that TAP services probably have to deal with this via column
renaming; I already do that with upload table names and have code to
rename columns so it isn't much of a stretch to do it all the time.
On validators: there are a lot of edge cases that services may not be
able to handle correctly or figure out. As long as the request fails
with a suitable error message and code I don't think that is reason
for a validator to declare it non-compliant. Sure, there are queries
all services should support (select * from tap_schema.tables) so some
test queries may be more compliance-mandating than others... For
example, I personally think that anyone using DISTINCT is just lazy so
I could put that word in my black list or maybe only allow it for
columns with a known finite set of values. That probably should not
make my service invalid. There are a wide range of quality-of-service
choices implementors make in figuring out which things to put effort
into making work and which to ignore... I don't think we can/should
make those decisions for them.
On 15 November 2017 at 05:17, Grégory Mantelet
<gmantele at ari.uni-heidelberg.de> wrote:
> Dear Markus,
> I fear I currently do the same as you in my TAP Library: I just quote table
> and column names, but I do not do any smart rename.
> But considering the issue you raised, I will probably do so now. I already
> offer the possibility to have a different name in the database (what I call
> SQL names) than in the ADQL queries (ADQL names). So that should not be that
> difficult in my case to change behind the scene the SQL name of all uploaded
> tables' columns. It won't affect the ADQL queries and should run as expected
> in Postgres (and other supported databases).
> I personally don't see any way to standardize column names of uploaded
> tables. Users will surely do as they want and putting more constraints on
> the input is probably not a good idea if we want them to use TAP willingly.
> So that's probably something that TAP servers should deal with....though I
> admit it is not always that straightforward to do.
> On 11/14/2017 01:19 PM, Markus Demleitner wrote:
>> Dear DAL,
>> in particular operators of TAP services over postgres,
>> the mapping between database column names and VOTable column names --
>> current spec: "For VOTable output, the name attribute of FIELD
>> elements must be the same as the column names (or aliases if
>> specified in the query) in the query" -- has all kinds of interesting
>> subtleties. I've discovered a new one the other day, in connection
>> with table uploads.
>> Actually, the upload section doesn't have an analogous regulation
>> ("the name attributes of FIELDs in uploaded tables must match the
>> column names in the database" -- should it? of course, there are
>> more subtleties in there, too, such as that FIELD/@names can be
>> repeated in a single VOTable...). I guess one still can take that as
>> For names that collide with reserved words, quoting helps, and that's
>> what DaCHS does.
>> However, I recently noticed that there's a couple of names in
>> Postgres that are, in a way, hyper-reserved in that the respective
>> columns exist in *all* tables and there's nothing you can do about
>> it. In particular, you can't have your own columns with these names.
>> That's, specifically, oid, tableoid, xmin, cmin, xmax, cmax, ctid.
>> In particular xmin, xmax and cmin, cmax have a large potential of
>> hurting. I'm sure there are lots of tables out there that have these
>> names -- a quick search in the relational registry turns up 9 "cmin"
>> columns in VO-published tables.
>> The question is: What do we do?
>> If a TAP service had the right to modify column names in uploaded
>> tables, the whole table upload business becomes, I think, unusable.
>> So, that's a no-no.
>> TAP probably cannot (really, usefully) forbid certain column names in
>> uploads, and even if we wanted that, TAP obviously shouldn't codify
>> Postgres reserved names -- so having a blacklist of certain names
>> isn't an escape route either.
>> DaCHS currently simply fails when you use one of the dirty name, and
>> I was pleasantly surprised that it even gives a reasonably
>> understandable error message ("column name 'xmax' conflicts with a
>> system column name"; I guess it should add what table xmax is in, but
>> it's a start).
>> The question is: Am I invalid with this behaviour? The current spec
>> says "Table upload must support all valid VOTable content", so an
>> ill-meaning validator could fail me. It also says "Services may
>> limit the size and number of uploaded tables; if the service refuses
>> to accept the entire table it must respond with an error as described
>> in section 3.3." -- there's no provision in here letting me refuse a
>> table just because I don't like the names in there.
>> So I guess by the current spec DaCHS *is* in violation of the TAP
>> I suppose I *could* re-write both uploads and queries to
>> transparently rename the columns to postgres-palatable names (with
>> additional subtleties, because the mangled names might clash with names
>> already in the table...).
>> But is it worth it? What do other people do? I'm particularly
>> intrigued by the appearance of three "cmin" columns in VizieR, I have
>> to say...
>> -- Markus
Canadian Astronomy Data Centre
Victoria, BC, Canada
More information about the dal