prototype: scalable VOSI-tables-1.1

Mark Taylor m.b.taylor at bristol.ac.uk
Thu May 7 15:21:29 CEST 2015


Hallo Pat.

On Mon, 4 May 2015, Patrick Dowler wrote:

> I have implemented a prototype VOSI-tables-1.1 resource to deal with the
> issues that came up in the TAP discussion in Banff: some services have many
> tables and many columns and the top-level VOSI-tables document can be very
> large.

I've taken a look at implementing a client for your proposed
VOSI-tables-1.1 interface.  The general idea looks OK, but
I have a couple of comments.

> The basic approach is to define a RESTful resource tree following the
> VODataService model: tableset, schema, table. All the example URLs below work
...
> Any table name found in a schema can be used as a child; this returns a
> <table> document:
> 
> http://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/tap/tables/TAP_SCHEMA/TAP_SCHEMA.tables

The hierarchical <base_url>/<schema_name>/<table_name> scheme used
here means that you need to know the schema name for a table in order
to query the details (e.g. columns) for that table.
Since, as per the recent table_name discussion on the DAL list,
the table_name must already be fully qualified (i.e. it lives in a flat
namespace), it's not clear that this is a good idea.
If you're iteratively querying the /tables endpoint from the top
down to reach a table of interest, that may not matter,
since you presumably have the schema/table hierarchy already[*].
However, if for instance you're trying to parse and validate
some ADQL from scratch, you may only have the table_name,
and no indication of what schema it lives in (unless you're allowed
to pull the table name apart in order to guess, which we have
established elsewhere is not reliable), so you couldn't use this
service to find out the table's columns.

That would argue instead for something like

   <base_url>?schema=<schema_name>
   <base_url>?table=<table_name>

rather than

   <base_url>/<schema_name>
   <base_url>/<schema_name>/<table_name>

(the detail=<level> parameter can still be appended using an
ampersand separator in the usual way).
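A minimal Python sketch contrasting the two URL layouts, to show where the schema name becomes mandatory. The parameter names schema, table and detail follow the text above; the function names, the example base URL and the detail value "min" used in the usage line are hypothetical illustrations, not part of any published VOSI interface.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def hierarchical_url(base_url: str, schema_name: str,
                     table_name: Optional[str] = None) -> str:
    """<base_url>/<schema_name>[/<table_name>]: schema_name is mandatory."""
    url = base_url.rstrip("/") + "/" + quote(schema_name, safe="")
    if table_name is not None:
        url += "/" + quote(table_name, safe="")
    return url


def query_url(base_url: str, schema: Optional[str] = None,
              table: Optional[str] = None,
              detail: Optional[str] = None) -> str:
    """<base_url>?schema=... or <base_url>?table=..., with an optional
    detail=<level> joined on with '&'; a table needs only its own name."""
    params = {}
    if schema is not None:
        params["schema"] = schema
    if table is not None:
        params["table"] = table
    if detail is not None:
        params["detail"] = detail  # hypothetical level value, e.g. "min"
    # urlencode joins the parameters with '&' in insertion order.
    return (base_url + "?" + urlencode(params)) if params else base_url
```

For instance, query_url("http://example.org/tap/tables", table="TAP_SCHEMA.tables") can be built from the bare table name alone, whereas hierarchical_url cannot even be called until the schema is known.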

[*] As it happens, in my implementation code table metadata objects
    don't know their parent schemas, so this is a practical issue for me,
    but that may be down to my poor design.
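To make the cost concrete: under the hierarchical scheme, a client holding only a table_name has to recover the parent schema from a listing of every schema and its tables. A minimal Python sketch of that lookup, assuming the tableset XML has unqualified schema, table and name child elements as in VODataService (namespace handling on the root element is glossed over); the function name is hypothetical. Note that it needs essentially the whole top-level schema/table listing, which is the kind of download the proposal aims to avoid.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def schema_index(tableset_xml: str) -> Dict[str, str]:
    """Map each (fully qualified) table name to the name of its parent schema,
    without trying to pull the table name apart."""
    root = ET.fromstring(tableset_xml)
    index: Dict[str, str] = {}
    for schema in root.findall("schema"):
        schema_name = schema.findtext("name")
        for table in schema.findall("table"):
            index[table.findtext("name")] = schema_name
    return index
```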

> There is no REST binding to get a single <column> within a <table> as I don't
> see the use case for that.

Probably you're right, though there are cases when I want the columns
without the foreign keys or vice versa, which might argue for some
more detail options.  However, if I have to get both columns and fkeys
in either case, it's not a big deal, so the additional complication
may not be warranted.

> The change to the VOSI-tables XSD is to add <schema> and <table> as valid
> document root elements with types taken from VODataService-1.1 xsd.

Not necessarily.  You could just require that every response from
this (modified) tables endpoint still has the tableset top-level
element, but only contains the elements that have been requested
(e.g. the ancestor schema and tableset of a requested table, but
not the sibling schemas).  This would (arguably) simplify the code
required for parsing these responses, and have the advantage
that schema information is provided for the table, which you
may not otherwise have as per my previous point.
It also means no changes required to the VOSI-tables XSD.
It would mean slightly more output for table requests,
but probably that schema metadata is not very bulky.
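A minimal server-side sketch of this alternative, assuming the full tableset is available as an XML string with unqualified schema, table and name children (as in VODataService); the function name is hypothetical, and a real service would presumably generate the pruned document directly rather than filter a full one. The output keeps the tableset root and the ancestor schema's own metadata, so a client can parse every response with the same tableset code.

```python
import xml.etree.ElementTree as ET


def prune_tableset(tableset_xml: str, table_name: str) -> str:
    """Return a tableset holding only the schema that contains table_name,
    with that schema's other (sibling) tables removed."""
    root = ET.fromstring(tableset_xml)
    for schema in list(root.findall("schema")):
        tables = schema.findall("table")
        if any(t.findtext("name") == table_name for t in tables):
            # Ancestor schema: keep its name/description, drop sibling tables.
            for t in tables:
                if t.findtext("name") != table_name:
                    schema.remove(t)
        else:
            # Sibling schema: not requested, drop it entirely.
            root.remove(schema)
    return ET.tostring(root, encoding="unicode")
```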

Finally: in topcat (not yet released, but I'll talk about it in
Sesto, and working previews are available if you're interested at
ftp://andromeda.star.bris.ac.uk/pub/star/topcat/pre/topcat-full_tap.jar)
I'm finally tackling the client-side issues that this is trying to
address, i.e. acquiring and presenting to the user metadata for
very large tablesets.  Although I'm still experimenting, I currently
use a hybrid metadata acquisition policy that uses the /tables
endpoint for small services, and TAP_SCHEMA for large ones:

   ncol = (SELECT COUNT(*) FROM TAP_SCHEMA.columns)
   if ncol < 5000
      slurp entire VODataService doc from /tables endpoint
   else
      read all schemas and tables, but not columns, from TAP_SCHEMA in one go
      read per-table column/foreign key info from TAP_SCHEMA as required
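The same policy as a runnable Python sketch. Only the 5000-column threshold and the branching come from the pseudocode above; the four hook functions (count_columns, slurp_tables_doc, read_schemas_and_tables, read_table_columns) are hypothetical placeholders for whatever TAP query and XML-parsing code a client already has, and for brevity only columns, not foreign keys, are tracked.

```python
from typing import Callable, Dict, List, Tuple

NCOL_THRESHOLD = 5000  # cutoff taken from the pseudocode above

Tables = Dict[str, List[str]]   # schema name -> table names
Columns = Dict[str, List[str]]  # table name -> column names


class HybridMetadata:
    """One /tables download for small services; TAP_SCHEMA reads for large ones."""

    def __init__(self,
                 count_columns: Callable[[], int],
                 slurp_tables_doc: Callable[[], Tuple[Tables, Columns]],
                 read_schemas_and_tables: Callable[[], Tables],
                 read_table_columns: Callable[[str], List[str]]) -> None:
        # count_columns is assumed to run SELECT COUNT(*) FROM TAP_SCHEMA.columns
        self._read_table_columns = read_table_columns
        self._columns: Columns = {}
        if count_columns() < NCOL_THRESHOLD:
            # Small service: whole VODataService doc, columns included.
            self.tables, self._columns = slurp_tables_doc()
            self.lazy = False
        else:
            # Large service: schemas and tables now, columns only on demand.
            self.tables = read_schemas_and_tables()
            self.lazy = True

    def columns(self, table_name: str) -> List[str]:
        """Return a table's columns, querying (and caching) on first use."""
        if table_name not in self._columns:
            self._columns[table_name] = self._read_table_columns(table_name)
        return self._columns[table_name]
```

Passing the hooks in as callables keeps the policy itself independent of any particular HTTP or ADQL library, and makes it easy to exercise against fake services.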

It seems to work well for the services I've tested against,
in particular it's OK for TAPVizier.  So for my purposes,
it doesn't look essential to have a scalable reworking of VOSI-tables
as presented by this proposal.  Of course that's not to say it's
not a useful thing to have.

Mark

--
Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/

