TAP RFC [MTIME]

Thu Sep 17 10:03:44 PDT 2009

Hi Gerard -

On Thu, 17 Sep 2009, Gerard wrote:

>> The main issue is
>> what to do about deleted table rows.  [...]
> 
> For reasons such as this I sided with Pat and Francois to at least
> postpone introducing this feature to a next version.  Let's not
> introduce now features that aim to implement ill-understood
> requirements.  It likely will lead to problems and may for backwards
> compatibility have to be maintained even in a next version.

As I said earlier, since MTIME is an optional advanced capability
it could be deferred (like VOSpace integration etc.) to allow more
time to consider how to solve this problem and do some prototyping.

At this point the key thing I would emphasize is the deleted row
issue.  I think this would be quite hard to handle with the approach
of just adding visible metadata to tables (assuming we could get data
providers to modify tables in that way at all).
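
To make the deleted row problem concrete, here is a minimal sketch of
a replica kept in sync using only a visible per-row time stamp.  The
table layout, the "mtime" column name and the sync_by_timestamp
function are all made up for illustration; none of this is from the
TAP spec.

```python
# Hypothetical illustration only: incremental replica sync driven by a
# visible per-row time stamp column.  All names here are assumptions.

def sync_by_timestamp(source, replica, since):
    """Copy rows whose mtime is newer than `since` into the replica."""
    for key, row in source.items():
        if row["mtime"] > since:
            replica[key] = dict(row)
    return replica

# Initial state of the remote table, and a first full sync.
source = {
    1: {"name": "obs-a", "mtime": 10},
    2: {"name": "obs-b", "mtime": 10},
}
replica = sync_by_timestamp(source, {}, since=0)

# Later, row 1 is updated and row 2 is deleted at the source.
source[1] = {"name": "obs-a2", "mtime": 20}
del source[2]

# Incremental sync: ask only for rows modified after the last sync.
replica = sync_by_timestamp(source, replica, since=10)

# The update arrives, but the deleted row is still in the replica:
# a row that no longer exists has no time stamp left to query.
stale_keys = sorted(set(replica) - set(source))
```

After the second sync the replica still holds row 2, so the client
cannot tell a deleted row from an unchanged one without some extra
mechanism on the service side.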

> This poor man's support for version control will open up a can of
> worms and there is arguably in the general case no need for it as
> often published data sets may actually not change (should we even
> insist that published data sets should be unchangeable?).  E.g. SDSS
> created new data releases, keeping the old ones alive. They did not
> update them. Same for Vizier I think.

Tables are used for many things and they do change - we cannot tell
the world to stop changing their data.  An index table describing a
telescope data collection for example, will continually grow as new
data is added.  There are any number of such cases.  We would like
to be able to monitor such tables in order to implement capabilities
such as global data discovery.  Also, this is not merely a TAP issue;
SSA, SIA etc. need a similar capability - a parameter-based approach
such as MTIME (already in SSA) can handle all such cases uniformly.

> For now I doubt the necessity
> of this mechanism, and instead can see many possible pitfalls.  If I
> (am allowed to?) update one table, but not another and join them
> together, does the MTIME apply to both tables? What if a row within
> the time interval is joined to one outside?

MTIME applies only to individual base tables; one has to specify the
individual table name.  Joins etc. are not an issue and certainly
have nothing to do with maintaining a remote table replica.  These
complications arise only with this other approach of inserting visible
time stamps directly into tables.

> > Extrapolating on Francois'
> > comment, I would propose to remove the "primary" attribute from the
> > column metadata? When I do a
> 
> > Unfortunately this would prevent switching between the "narrow" and
> > "wide" views of a table on the client side in smart table viewers.
> > This is a popular feature for viewing astronomical catalogs and other
> > tables which can have a lot of columns.  In previous discussions
> > we considered using only views but the consensus at the time was to
> > provide a simpler mechanism (the "primary" flag) for this most basic
> > narrow/wide use case, providing a view mechanism as well to allow
> > the data provider to define additional custom views of the data.
> > It is a simple enough thing to support in the column metadata which
> > does not take anything away.
> 
> Again I have problems with such a use case.  If table viewers are
> so smart, they likely have been written that way by the clients and
> have not been based on decisions of the publishers. I imagine that
> vizier's support for such narrow views was not derived from metadata
> assigned to tables in astronomical journals that said certain columns
> were less important than others. SDSS SkyServer did introduce various
> useful views, showing that different columns may be of interest in
> different cases, instead of one blanket choice of primary or not.

We discussed this again in an NVO telecon just this morning, and
identified several sites just from the people on the telecon where we
routinely do this already.  Providing narrow/wide views of tables is
a very popular feature which many of us already provide both at the
archive and client app level and which we would like TAP to support.

> It still does not answer the question how to treat "select * from
> sometable".  Should that only return primary columns? There is no
> feature in ADQL to select only primary columns.  So the most important
> (imho) reason for TAP, publishing relational databases with a rich
> query language (ADQL), has no use for it. Could we not concentrate
> on supporting this case (including ParamQuery) and postpone these
> features to future versions?

A "select * from sometable" query returns all the columns.  Even with an
ADQL query a smart table viewer can use this metadata effectively.
ParamQuery does support narrow/wide views directly and requires this
metadata to be able to do so.  Why remove something so simple which
is already in the spec and has already proven quite useful?
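
As a rough sketch of what a table viewer could do with this on the
client side: the query returns every column, and the viewer uses the
"primary" flag from the column metadata to switch between the narrow
and wide views.  The column names and the view_columns and project
helpers below are invented for the example, and the metadata is shown
as plain dictionaries rather than in any real TAP serialization.

```python
# Hypothetical column metadata as a viewer might hold it after parsing
# the service response.  Only the "primary" flag comes from the
# discussion above; the column names are made up.
columns = [
    {"name": "ra", "primary": True},
    {"name": "dec", "primary": True},
    {"name": "mag_g", "primary": True},
    {"name": "mag_g_err", "primary": False},
    {"name": "flags", "primary": False},
]

def view_columns(columns, wide=False):
    """Return column names for the wide view (all) or narrow view (primary)."""
    if wide:
        return [c["name"] for c in columns]
    return [c["name"] for c in columns if c["primary"]]

def project(rows, names):
    """Keep only the named columns of each row, in the given order."""
    return [{n: row[n] for n in names} for row in rows]

# The full result of a "select *" style query: all columns present.
rows = [
    {"ra": 10.5, "dec": -3.2, "mag_g": 17.1, "mag_g_err": 0.02, "flags": 0},
]

narrow = project(rows, view_columns(columns))
wide = project(rows, view_columns(columns, wide=True))
```

Nothing is removed from the query result here; the flag only tells
the viewer which columns to show by default.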

 	- Doug