TAP 1.0: Editorial comments

Wed Jul 15 13:02:23 PDT 2009

Below I've enclosed some editorial comments on TAP.  These
are explicitly not intended to be substantive, but aim to clarify the
document.   Substantive comments were sent in an earlier message.

A lot of these comments have to do with carefully delineating the three 
levels of HTTP, TAP and ADQL/PQL and making sure that the standards 
documents work together cleanly.  Many thanks to the authors for putting
all of the pieces together even if I occasionally have slight 
disagreements on the wording.

	Regards,

	Tom McGlynn

1.1.4

For someone who doesn't understand what ADQL and PQL are coming into
the document the organization is a little confusing.  1.1.4-1.1.6 are
actually subtopics of 1.1.1.  I.e., they are the languages for the
data queries.  I'd either put them within or below 1.1.1  (i.e., 
1.1.1.1-1.1.1.3) or move them  immediately afterward.

I'd prefer 'Support for ADQL Data Queries is mandatory.'  Without the
context in which ADQL is used in the interface I'm not quite clear
what's meant by 'Support for ADQL is mandatory'.

While I understand the sense in which 'support arbitrary queries' is
intended, it's not really true. It's arbitrary within the constraints of
ADQL but that makes it a tautology.  I think an example and a discussion
of what an ADQL construct looks like would be better, e.g.,

"An ADQL query may query one or more tables using the Astronomical
Data Query Language, a version of SQL adapted for use in the
astronomical context (see ...).  An example query is:
[Some example]
ADQL queries generally specify the tables used, and columns displayed
and used in constraints so that queries must be customized for each TAP
service."

The last sentence is very unclear to me.  I don't know what it means at
all, so I can't suggest a clarification.

1.1.5

"Support for PQL Data Queries is optional."

I'd cut down on the discussion of the PQL language -- or add to the
discussion of ADQL.  Right now they are unbalanced.

1.1.5

"to [an] underlying RDBMS"  not [the] since we don't require that the
user have an RDBMS.  Maybe they have a purely XML based system and are
amusing themselves translating relational queries to hierarchical ones.

1.2

Get rid of "Conversely,"  The converse is a logical relationship which
these two definitions do not satisfy.

I would add:

'Users select Synchronous and Asynchronous access by selecting the
appropriate base URL for their requests and then adding the query
specific parameters to the base.  If the underlying TAP service is at
"http://myhost/stem", then synchronous access is through the URL
"http://myhost/stem/sync" and asynchronous access through
"http://myhost/stem/async"'.

1.2.2

The second paragraph is commentary and not appropriate as part of the
standard.

Given the last sentence, it would probably be useful to have a table of
sync/async versus data/meta/vosi, i.e., which types of queries can/must
be supported sync and async -- I see that's later on -- maybe just
delete this.

1.3

I've been reading this section and it has an immense amount of
information.  Even if this is only thought of as informative rather
than prescriptive I think it needs to be rewritten so that we don't
throw too much at the user at once.  Some of the topics that are covered
in the next few paragraphs:
    Sync versus async examples,
    HTTP return codes,
    POSTs and GETs
    Caching of results,
    Typical request parameters.

I've written a replacement for this.  While I don't think this needs to
be used as I've written it, I do think we need a rewrite.  Some of this 
may be coordinated with changes to section 7 [which I'd prefer to see as 
appendix 1]. The  replacement is delimited by the ---  lines.

------------------------------

1.3 Using TAP over HTTP.

TAP is implemented over the HTTP protocol using standard HTTP GET and
POST requests and conventions.  A TAP request specifies one or more 
parameter key/value pairs of strings.  Both the key and value are 
strings.  The keys used in TAP are explicitly discussed in the document 
below and in the documents describing the query languages a given 
service supports.  The values may need to be encoded, depending upon 
exactly how the request is made.  In GET requests, the parameters are 
specified as
    key=value
pairs separated by &'s.  The key and value strings are URL-encoded, with 
most non-alphanumeric characters replaced by %xx escapes. The parameters
follow a ? in the GET URL.
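
[A minimal sketch of this encoding in Python, using only the standard
library.  The base URL, table name and query are hypothetical and are
there for illustration only.  Note that urlencode() writes a space as
'+' rather than as a %xx escape; both forms are commonly accepted in
query strings.]

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical base URL and parameters, for illustration only.
base_url = "http://example.com/tap/sync"
params = {
    "REQUEST": "doQuery",
    "LANG": "ADQL",
    "QUERY": "SELECT * FROM t WHERE t.x>10 AND t.x<16",
}

# urlencode() escapes each key and value and joins the pairs with '&'.
# Spaces become '+', and characters such as '*', '<' and '>' become
# %xx escapes.
query_string = urlencode(params)

# The encoded parameters follow a '?' in the GET URL.
get_url = base_url + "?" + query_string

# Decoding the query string recovers the original key/value strings.
decoded = {k: v[0] for k, v in parse_qs(urlsplit(get_url).query).items()}

print(get_url)
```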

In POST requests, the standard encoding for parameters is similar but
the parameters are passed in a data stream that is sent to the request
processor.  Other encodings are possible for POST requests, and a very
different encoding is used when a file is uploaded.

This document will describe the HTTP parameters using the
   KEY=VALUE
syntax, but does not generally describe the process of HTTP encoding for
GET and POST requests except in section 7 [or appendix 1] which
discusses HTTP in more detail.

HTTP GET and POST requests can often be used interchangeably.  However,
there are some differences.  GET's can be encoded within documents, since
the parameter information is explicitly part of the URL.  Services and
intermediate proxies are allowed to cache GET requests, but POST
requests must be independently processed each time they are received.
Requests which have voluminous inputs must be specified as POSTs since a
GET URL may be truncated if it exceeds ~ 1kB.  If a file upload is part
of a request, a POST must be used.
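
[A client could apply these rules mechanically.  The sketch below is
only one way to do it: the helper name choose_method and the 1024-byte
threshold are assumptions taken from the "~ 1kB" figure above, and real
limits vary between clients, servers and proxies.]

```python
from urllib.parse import urlencode

# Assumed limit, taken from the "~ 1kB" figure in the text; not a value
# defined by HTTP or by TAP.
MAX_GET_URL_BYTES = 1024


def choose_method(base_url, params, has_upload=False):
    """Return "GET" or "POST" for a request (illustrative helper)."""
    if has_upload:
        # A file upload can only be sent with POST.
        return "POST"
    url = base_url + "?" + urlencode(params)
    if len(url.encode("ascii")) > MAX_GET_URL_BYTES:
        # A long GET URL risks truncation, so fall back to POST.
        return "POST"
    return "GET"


print(choose_method("http://example.com/tap/sync", {"REQUEST": "doQuery"}))
```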

HTTP returns a variety of response codes.  Successful synchronous calls
will normally return the status code 200.  Successful asynchronous calls
will normally return a 303.  If an error condition is handled by the
server, then a status code of 200 may be returned but the returned
document will indicate the error.  Other error codes indicate conditions
where the server was not able to handle the request.

1.3.1 Data Examples

A user might want to query a table for objects in a given range of
r-magnitudes.  A synchronous ADQL query might have the following
parameters in the request:
     LANG=ADQL
     REQUEST=doQuery
     QUERY=SELECT * FROM magnitudes as m where m.r>10 and m.r<16

The LANG parameter indicates that we are using ADQL, the REQUEST
parameter is used to specify that this is a Data Query, and the actual
content of the query is in the QUERY  parameter.

If a TAP service has a synchronous base URL of
     http://example.com/tap/sync
we can execute the query by sending a GET HTTP request
     ...the fully HTTP encoded URL...

[This is the only time I'd give a URL in this section.  I think just
specifying the parameters is clearer and doesn't push the user to GET
URLs.  The other place where full URLs are fine is in the UWS stuff
where there are no parameters.  Note, by the by, that we probably
shouldn't be using GET URLs for status requests, since they can be cached.]

Since this is a synchronous request, the response to this should be the
requested data or some error indication.

This request could also have been sent using HTTP POST.

An asynchronous request would be initiated almost identically.  Only
the base URL differs, with 'async' replacing 'sync'.  However, the
anticipated response is an HTTP 303 'See other' response code which
indicates that the actual response is available in some other URL.  The 
user will be able to get the request number from the response and can 
then poll specified locations until a result or error is found.

The equivalent example in PQL might have parameters:

    REQUEST=doQuery
    LANG=PQL
    FROM=magnitudes
    WHERE=r,10/16
[TAM: I hope that the WHERE syntax in PQL can be improved, see my
comments there...]

and could be sent as GET or POST and either synchronously or asynchronously.

1.3.2  Metadata examples

Each TAP service has its own 'tableset': a collection of tables and
columns with locally-defined names. Those local names are the operands
in the queries and so a client needs to know the tableset for a
particular service to form a query. There are two ways of exploring the
tableset. First, a description of the entire tableset may be obtained in
XML using VOSI.  This has the parameters:

    REQUEST=getTableMetadata

Metadata requests must be synchronous so the base URL must use the
synchronous end point.  The result is returned immediately. These
metadata are in the format defined for the IVOA resource-registry and
the client may find a cached copy in the registry.

Alternatively, the structure of the tableset is described by a set of
tables with fixed names beginning with TAP_SCHEMA which may be queried
using any supported language.

Users can also request metadata about tables whose existence they know
of, by making requests which return header information, but no rows of data.

1.3.3 Checking availability

The service's availability can be read using the fixed VOSI
availability.  This is a synchronous request with the parameter
    REQUEST=getAvailability.
Since GET requests may be cached by intermediate proxy services, POSTs
are recommended for getAvailability since changes to the state of the
service may be hidden by a cache.

------------------------

Back to comments...

2.1

I'm not sure what the Parameters column of the table is since it doesn't
include all of the parameters required in given situations.  If it's
intended to show how we distinguish the various items, then the values
of the parameters should be noted.  I.e., have
    REQUEST=doQuery
    LANG=ADQL
in the first row, not just REQUEST, LANG.

The table needs more explanation:

"This table indicates the services that a TAP implementation must
support and those which are optional.  The last column indicates the
parameters which are used to distinguish the type of request but does
not show all required parameters for that type."

n/a should be replaced with "no" or something less ambiguous.

2.2.1  Might want a warning about using GET for getAvailability

2.2.2  I think the UWS tree needs to be reproduced here.  This is too
central to the protocol to be left to another document. [Or deep in the
bowels of section 5].  I.e., replace the list you have here with a table 
that gives the meaning of each of those URLs.

2.3.1 Unless I misunderstand REQUEST is not used for every request, 
i.e., requests that are followups to an initial asynchronous request do
not have a REQUEST parameter.  It may be that request is being used in
a way that precludes these, but if so that should be stated.

2.3.3
I would not specify lists of REQUEST using the & syntax.  This is
occasionally (if there are file uploads) incorrect and in any case
seems less clear than specifying the parameters independently.

E.g.,  Not
   REQUEST=doQuery&LANG=ADQL&<ADQL-specific parameters>
but
   REQUEST=doQuery
   LANG=ADQL
   <ADQL specific parameters>

The current usage also suggests that the order of parameters is significant.

2.3.4 We should be clear to indicate optionality at only one level.  It
seems to me that the ADQL standard indicated what was optional.  If so
then TAP need not mention it.  Similarly points regarding the subtleties
of the ADQL syntax (viz. the POINT discussion) are not appropriate here.

I'm not sure about the timestamp discussion.  Again it seems to me that
this belongs in the ADQL (and PQL) documents.

2.3.5
The discussion of @ parameters belongs in the PQL document.

2.3.6
The mime-types are also case insensitive.  The text might suggest to a
reader the opposite.

2.3.11
I find the wording... "Parameter values must be case sensitive".  I
don't think that's true.  The document is trying to indicate the service
must respect the input case, but this suggests that 1.6E7 should be
distinguished from 1.6e7.  I think the correct wording is "Parameter
values may be case sensitive."  The problem is that the subject of the
sentence is wrong.  Maybe.
"Parameter names must be treated as case insensitive"  and "Parameter
values must be treated as case sensitive"  It's the treater that the
must applies to not the parameter name or value.
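
To make the distinction concrete, here is a minimal sketch in Python of
a service-side treatment of parameters.  The function name
normalize_params is hypothetical and nothing here is taken from the TAP
draft: names are folded to one case, values are passed through exactly
as received, and whether 1.6E7 and 1.6e7 are "the same" is left to
whatever later parses the value.

```python
def normalize_params(pairs):
    """Fold parameter names to upper case; leave values untouched."""
    return {name.upper(): value for name, value in pairs}


received = [("request", "doQuery"), ("Lang", "ADQL"), ("limit", "1.6E7")]
params = normalize_params(received)

# Names match regardless of the case the client used.
print(params["REQUEST"], params["LANG"])

# The value string keeps its case, but a numeric parser still treats
# 1.6E7 and 1.6e7 as the same number.
print(params["LIMIT"], float(params["LIMIT"]) == float("1.6e7"))
```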

2.5.2

[Got some substantive issues here but on the editorial side I have some
questions that should be addressed]

Can a table name be the same as a valid parameter, e.g., 'request'?
Are table names case sensitive?  They are 'values' in some sense.

2.7.2

Paragraph 3 is language specific.  Perhaps "With the
exception of columns ... the columns should be as specified in the
appropriate language document."  E.g., in PQL I don't believe the
user needs to specify the output columns at all.

Paragraph 5.  What happens if a column value contains both a comma and
double quote?
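
For comparison, one common convention (the one in RFC 4180, and the
default in Python's csv module) is to wrap such a value in double quotes
and double each embedded quote.  I'm not claiming this is what the TAP
draft intends; the sketch below just shows that convention, with a
made-up value.

```python
import csv
import io

# Made-up column value containing both a comma and double quotes.
value = 'a "quoted", comma-separated value'

# Write one row with the csv module's default dialect.
buffer = io.StringIO()
csv.writer(buffer).writerow(["id1", value])
line = buffer.getvalue()

# The field is wrapped in quotes and each embedded quote is doubled.
print(line)

# Reading the line back recovers the original value unchanged.
row = next(csv.reader(io.StringIO(line)))
print(row[1] == value)
```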

2.9.1
Should probably make clear that these will return HTTP code 200 even
when there is an error.  [I think...]

3.

"The resource document"  -> "A resource document"

There might be more than one (in different registries) or none.

4.

Not clear what "Generic client must not ..." means.  Were you going to
come and beat me over the head?  I think this sentence should be
deleted.  Its meaning is conveyed later.

5.1.  I believe a short tabular summary of this is appropriate in 2.2.2.
[Or maybe this whole section belongs there.]  Section 2.2.2 and this are
coupled at the hip.

7.  I like having this section and since it's here I think we should make
much less use of URLs and URL fragments with multiple parameters in the
preceding sections.  I'd call it an appendix though.  It should have
nothing to do with TAP per se.  As written it tends to repeat some of
the requirements that have been made earlier though with a slightly
different spin.

Table 1.

I'm not clear if the last line of this table refers to the usage in
parameters defined in TAP and the TAP language document or to something
else.  I.e., I'm unaware of  HTTP 'list directed' parameters.  If this
is pulling in the special usage of these characters in TAP, PQL and
ADQL, then this doesn't belong here at all and perhaps the table should
go entirely.  Maybe it would be better to explicitly list the characters
that are allowed unescaped in parameter values.  There aren't many
beyond [A-Za-z0-9] and then discuss escaping all of the rest.

7.1.3, 7.1.4

The language at the beginning of 7.1.3 and 7.1.4 is not parallel so I'm
not clear if I need to support POST if I don't support file upload.  In
general I think this chapter would do better as an appendix with the
Must and such earlier on in section 2 and 3.

7.2

I think the requirements on the response should be made earlier and this
should just be an appendix.