TAP information schema
Patrick Dowler
patrick.dowler at nrc-cnrc.gc.ca
Thu Oct 11 09:40:04 PDT 2007
On 2007-10-10 06:10, Keith Noddle wrote:
> Cases so dictate. Finally, it was made abundantly clear to us in Beijing
> - and it remains the case - that the priority for TAP V1.0 is to define
> how we handle ADQL querying. Period. No arguments.
I agree with this 100%. We all agree that TAP 1.0 should be a minimal spec we
can move forward with and at the core this means doing ADQL querying.
As for metadata, one really does need more than tables and columns in the
general case. Specifically, some RDBMSs (DB2, e.g.) require that the SQL
include the schema name in front of every table name. I do not think that
ADQL requires this (and maybe it shouldn't), but as a site using such a
database I need to be able to tell people what the schema name is. Now, I
could stretch the table name to include it (e.g. mySchema.myTable), but that
actually throws a lot of information away (like the fact that I use different
schemata for different versions). I would also like to describe what each
schema means, and that the schema as a whole may implement some data model,
as would likely be the case since few data models can be sensibly stored in a
single table.
That's not a big deal right now, but if we ignore it and force services and
apps to ignore schema names, then in future we could have some problems when
we try to expose them. The same goes for the metadata that tells people how to
write more complex queries with joins, etc.: we probably should not
standardise that now, but we need to do it in a way that still allows future
detailed metadata to become the definitive metadata.
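To make the schema point concrete, here is a minimal Python sketch, assuming a hypothetical in-memory metadata record (the `Schema` and `Table` classes and their fields are illustrative only, not taken from TAP or VOResource). Keeping the schema as its own record preserves its description and data model, while the schema-qualified names that a database like DB2 expects can still be derived on demand.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical metadata records: class and field names are illustrative
# assumptions, not part of any TAP or VOResource specification.
@dataclass
class Table:
    name: str              # bare table name, e.g. "myTable"
    description: str = ""


@dataclass
class Schema:
    name: str              # e.g. one schema per data release
    description: str = ""  # what this schema means
    data_model: str = ""   # data model implemented by the schema as a whole, if any
    tables: List[Table] = field(default_factory=list)

    def qualified_names(self) -> List[str]:
        """Schema-qualified table names, as some RDBMSs (e.g. DB2) require in SQL."""
        return [f"{self.name}.{t.name}" for t in self.tables]


# Two versions of the same content kept in separate schemata.
v1 = Schema("mySchema1", "first data release", tables=[Table("myTable")])
v2 = Schema("mySchema2", "second data release", tables=[Table("myTable")])

for s in (v1, v2):
    print(s.description, s.qualified_names())
```

Flattening everything into the single string "mySchema1.myTable" would lose the per-schema description and data-model fields; deriving the qualified name keeps both.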
So, my gut feeling right now is that basic resource discovery in the registry
is going to use VOResource (or some specialisation of that), and users need to
be able to see what the content is (tables and columns) for that task. We
should aim to support only that task (suitable content discovery), and we
should not try very hard to make that VOResource description the way to
actually formulate queries (just "accidentally on purpose", as a friend used
to say). :-)
What I am thinking is this: the "suitable content discovery" description will
describe content, which effectively means tables and columns. Even assuming
there were detailed metadata for building queries elsewhere, you would still
need to ask for it, so the VOResource needs to have the schema (namespace)
and table names; and because people will be looking for things via the utype
and/or ucd of columns, it needs the columns as well. The only thing not really
needed for discovery that we can stick in so people can write queries is the
actual column names*. Once we have a detailed metadata system for TAP 1.1, we
could deprecate the column names in the VOResource (or not, if no one cares
enough).
* Nominally, discovery doesn't care about units either, but in practice client
software will care if it doesn't have some generic unit conversion utility.
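The discovery-then-query flow described above might look roughly like this minimal Python sketch. It assumes a hypothetical in-memory form of the VOResource table description: the `Column` and `Table` classes, the `find_columns` and `single_table_query` helpers, and the example table are all invented for illustration. UCDs are matched by exact string for simplicity, whereas a real registry search would likely be more lenient. Discovery uses only ucd/utype; the column names and units are the extras that let a client write a single-table ADQL query.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


# Hypothetical, simplified stand-in for a VOResource table description.
@dataclass
class Column:
    name: str        # not needed for discovery, but needed to write queries
    ucd: str = ""
    utype: str = ""
    unit: str = ""   # practically useful to clients without unit conversion


@dataclass
class Table:
    schema: str      # namespace (schema) name
    name: str
    columns: List[Column] = field(default_factory=list)


def find_columns(tables: List[Table], ucd: Optional[str] = None,
                 utype: Optional[str] = None) -> List[Tuple[str, Column]]:
    """Return (schema-qualified table name, column) pairs matching ucd and/or utype exactly."""
    hits = []
    for t in tables:
        for c in t.columns:
            if ucd is not None and c.ucd != ucd:
                continue
            if utype is not None and c.utype != utype:
                continue
            hits.append((f"{t.schema}.{t.name}", c))
    return hits


def single_table_query(table: str, columns: List[Column]) -> str:
    """Build a simple single-table ADQL SELECT from discovered column names."""
    return f"SELECT {', '.join(c.name for c in columns)} FROM {table}"


# Example description; schema, table and column names are made up.
tables = [Table("cat", "sources", [
    Column("ra", ucd="pos.eq.ra;meta.main", unit="deg"),
    Column("decl", ucd="pos.eq.dec;meta.main", unit="deg"),
    Column("mag_r", ucd="phot.mag", unit="mag"),
])]

ra = find_columns(tables, ucd="pos.eq.ra;meta.main")
dec = find_columns(tables, ucd="pos.eq.dec;meta.main")
query = single_table_query(ra[0][0], [ra[0][1], dec[0][1]])
print(query)  # SELECT ra, decl FROM cat.sources
```

This only covers single-table queries; writing joins still requires knowing how the tables in a schema relate, which this flat description deliberately does not carry.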
Summary: VOResource describes tables and columns (maybe namespaces aka
schemata) aimed at "suitable content discovery", but we stick in column names
and units for completeness/symmetry with the table description. The service
emits this document via the standard service method. This is good enough for
full ADQL queries of single tables, with joins reserved for users who
actually know the target schema or care to learn it via documentation.
This would be "good enough" and not shut off any future development.
--
Patrick Dowler
Tel/Tél: (250) 363-6914 | fax/télécopieur: (250) 363-0045
Canadian Astronomy Data Centre | Centre canadien de donnees astronomiques
National Research Council Canada | Conseil national de recherches Canada
Government of Canada | Gouvernement du Canada
5071 West Saanich Road | 5071, chemin West Saanich
Victoria, BC | Victoria (C.-B.)