TAP information schema

Thu Oct 11 09:40:04 PDT 2007

On 2007-10-10 06:10, Keith Noddle wrote:
> Cases so dictate. Finally, it was made abundantly clear to us in Beijing
> - and it remains the case - that the priority for TAP V1.0 is to define
> how we handle ADQL querying. Period. No arguments.

I agree with this 100%. We all agree that TAP 1.0 should be a minimal spec we 
can move forward with and at the core this means doing ADQL querying.

As for metadata, one really does need more than tables and columns in the 
general case. Specifically, some RDBMSs (DB2, e.g.) require that the SQL 
contain the schema name on the front of every table name. I do not think that 
ADQL requires this (and maybe it shouldn't), but as a site using such a 
database I need to be able to tell people what the schema name is. Now, I 
could stretch the table name to include it (e.g. mySchema.myTable), but that 
actually throws a lot of stuff away: the fact that I use different schemata 
for different versions, that I would like to describe what each schema means, 
and that maybe the schema as a whole implements some data model (as would 
likely be the case, since few data models can be sensibly stored in a single 
table).
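
To illustrate what gets lost when the schema name is folded into the table 
name, here is a minimal sketch in Python. The class and field names (Schema, 
Table, data_model, qualified_name) and the schema names mySchema_v1 and 
mySchema_v2 are hypothetical, made up for this example; they are not taken 
from VOResource or any TAP draft.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    description: str = ""


@dataclass
class Schema:
    # All field names here are illustrative assumptions, not VOResource elements.
    name: str
    description: str = ""
    data_model: str = ""  # label for the model the schema implements, if any
    tables: list = field(default_factory=list)

    def qualified_name(self, table_name: str) -> str:
        """Return schema.table for a table that belongs to this schema."""
        for table in self.tables:
            if table.name == table_name:
                return f"{self.name}.{table.name}"
        raise KeyError(f"no table {table_name!r} in schema {self.name!r}")


# Two versions of the same content kept in different schemata.
v1 = Schema("mySchema_v1", "first release", tables=[Table("myTable")])
v2 = Schema("mySchema_v2", "second release", tables=[Table("myTable")])

print(v1.qualified_name("myTable"))
print(v2.qualified_name("myTable"))
```

With the schema as its own element, the qualified name can still be produced 
for databases that need it, while the per-schema description and data model 
stay available. A bare string such as "mySchema_v2.myTable" carries none of 
that.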

That's not a big deal right now, but if we ignore it and force services and 
apps to ignore schema names then in future we could have some problems when 
we try to expose them. The same goes for the metadata that tells people how 
to write more complex queries with joins etc. We probably should not 
standardise that now, but whatever we do now needs to be done in a way that 
still lets the future detailed metadata be the definitive metadata.

So, my gut feeling right now is that basic resource discovery in the registry 
is going to use VOResource (or some specialisation of that) and users need to 
be able to see what the content is (tables and columns) for that task. We 
should aim to support that task only -- suitable content discovery -- and we 
should not try very hard to make that VOResource description the way to 
actually formulate queries (just "accidentally on purpose" as a friend used 
to say :-)

What I am thinking is this: the "suitable content discovery" will describe 
content, which effectively means tables and columns. Even assuming there was 
detailed metadata for building queries elsewhere, you would still need to ask 
for it, so the VOResource needs to have the schema (namespace) and table 
names. And because people will be looking for things via utype and/or ucd of 
columns, the only thing not really needed for discovery that we can stick in 
so people can write queries is the actual column names*. Once we have a 
detailed metadata system for TAP 1.1 we could deprecate the column names in 
the VOResource, or not if no one cares enough.

* nominally, discovery doesn't care about units either, but practically client 
software will care if it doesn't have some generic unit conversion utility

Summary: VOResource describes tables and columns (maybe namespaces aka 
schemata) aimed at "suitable content discovery", but we stick in column names 
and units for completeness/symmetry with the table description. The service 
emits this document via the standard service method. This is good enough for 
full ADQL queries of single tables, with joins reserved for users who 
actually know the target schema or care to learn it via documentation.
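
As a rough illustration of that summary, the sketch below (Python) shows how 
a client might go from a discovery-oriented description to a single-table 
query. Everything in it is an assumption made for the example: the Column and 
Table classes, the helper names, and the mySchema.myTable content are 
hypothetical, and the real description would be a VOResource document rather 
than Python objects.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str      # only needed to write the query, not for discovery
    ucd: str       # what people actually search on
    unit: str = ""


@dataclass
class Table:
    schema: str
    name: str
    columns: list


def find_tables_by_ucd(tables, ucd):
    """Discovery step: tables with at least one column carrying this ucd."""
    return [t for t in tables if any(c.ucd == ucd for c in t.columns)]


def single_table_query(table, ucds):
    """Build a single-table SELECT over the columns matching the given ucds."""
    names = [c.name for c in table.columns if c.ucd in ucds]
    if not names:
        raise ValueError("no columns match the requested ucds")
    return f"SELECT {', '.join(names)} FROM {table.schema}.{table.name}"


# Hypothetical content description for one table.
catalogue = [
    Table("mySchema", "myTable", [
        Column("ra", "pos.eq.ra", "deg"),
        Column("dec", "pos.eq.dec", "deg"),
        Column("vmag", "phot.mag", "mag"),
    ]),
]

hits = find_tables_by_ucd(catalogue, "pos.eq.ra")
query = single_table_query(hits[0], {"pos.eq.ra", "pos.eq.dec"})
print(query)
```

Nothing here says how two tables relate, so a join cannot be derived from the 
description alone; that matches the point that joins are left to users who 
know the schema.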

This would be "good enough" and not shut off any future development.

-- 

Patrick Dowler
Tel/Tél: (250) 363-6914                  | fax/télécopieur: (250) 363-0045
Canadian Astronomy Data Centre   | Centre canadien de donnees astronomiques
National Research Council Canada | Conseil national de recherches Canada
Government of Canada                  | Gouvernement du Canada
5071 West Saanich Road               | 5071, chemin West Saanich
Victoria, BC                                  | Victoria (C.-B.)