DALI comments
Tom McGlynn (NASA/GSFC Code 660.1)
tom.mcglynn at nasa.gov
Mon Feb 29 21:46:31 CET 2016
While I'm often agnostic on the merits of most of Markus' recent
comments on the DALI specification, they have at least prompted me to
read the document myself. I've got a number of editorial comments on
the document. Given the central role of this document for accessing all
VO data, its clarity is particularly important. These comments refer to
version 1.1 of the document.
[When I started this was just going to be two or three points, but it
grew in the writing. Hope this is helpful. The points are in the
document order pretty much.]
Tom
1. The introductory sentence is very nice: the document is going to
describe resources, parameters and responses. The next three sections
deal with each of these in turn. The rest of the intro is IVOA
gobbledygook but that's OK. This structure is simple and easy to
follow. I wish more of our standards had such a straightforward
organization.
2. However, I think that there is something missing, either in this
paragraph or at the beginning of each of the three following sections: a
clear definition of what a resource, parameter or response is.
This is particularly acute for resource. The intro in section 2 talks
about REST and jobs, but this is jumping to the implementation before we
set any context. E.g., Section 2 might begin:
--
DAL services are implemented as a set of resources on the web.
DAL services use HTTP protocols to support their communications and each
service must implement multiple URL endpoints to access these
resources. This section describes the structure and relationships
between the required and optional endpoints that are provided by a
service. E.g., all services must implement a URL which allows a user to
ask if a service is available and may implement an endpoint that allows
for asynchronous access. The conventions described below allow a user to
infer the URL for each endpoint from some base location.
--
Note that this makes it clear that DAL is an HTTP based protocol and
introduces URLs. The current words are very nebulous. If someone is
not already clear what's going on, I don't think the words help.
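
To make the "infer the URL for each endpoint from some base location" idea concrete, here is a minimal sketch. The base URL and the resource names are assumptions chosen for illustration; a given DAL service specification may fix different names.

```python
# Sketch only: the resource names below are illustrative assumptions,
# not names that DALI mandates for every service.
def endpoint_urls(base_url, resources=("availability", "capabilities", "sync", "async")):
    """Return a mapping of resource name -> URL under a service base URL."""
    base = base_url.rstrip("/")  # tolerate a trailing slash on the base
    return {name: f"{base}/{name}" for name in resources}


urls = endpoint_urls("http://example.org/dal/")  # hypothetical base URL
for name, url in urls.items():
    print(name, url)
```

The point is only that a client needs one configured location and can derive the rest by convention.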
3. I think the discussion of the resources available would be cleaner
if it really were limited to a discussion of the resources. E.g., the
vast bulk of sections 2.n is spent discussing the responses to
invocations of resources. This belongs in section 4.
4. I think there should be a very explicit definition of a job in terms
of resources. This is pretty subtle and should not be done en passant
in the introductory paragraph of section 2. E.g., I think something like:
--
The invocation of some resources may define a job within that
service. Once a job is created new resources may be available to
monitor or modify the actions of the service. E.g., a service may allow
users to create an asynchronous query as a job. The response to the job
creation resource will normally include an identifier for the job which
is used in the specification of resources to monitor, cancel or get the
results of the query. The resources associated with a specific job will
normally have the job identifier as part of the URL. [I'd include a
specific example of a job creation and then subsequent job resources
that are available.]
--
Again the idea is to start with the three things we're going to talk
about and build up from them.
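
As an illustration of the kind of example I have in mind, here is a rough sketch of how job resources could be derived once a job identifier is known. The child resource names follow UWS-style naming as I understand it, and the URL and job identifier are made up; treat all of them as assumptions rather than as what DALI requires.

```python
# Sketch only: child resource names and the job id are assumptions
# for illustration, not taken from the DALI text.
def job_resource_urls(async_url, job_id):
    """Build the URLs of resources associated with one job."""
    job = f"{async_url.rstrip('/')}/{job_id}"
    return {
        "job": job,                   # the job itself
        "phase": f"{job}/phase",      # monitor or change execution state
        "results": f"{job}/results",  # retrieve the output
        "error": f"{job}/error",      # inspect a failure
    }


job_urls = job_resource_urls("http://example.org/dal/async", "42")
print(job_urls["phase"])
```

This shows the job identifier appearing as part of each URL, which is the relationship the definition should spell out.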
5. Keeping this restricted to the actual calls would more clearly
expose the structure of DALI than the current words which hide the
structure in a clutter of details of the responses.
6. The discussion of parameters has some issues too... E.g., why do we
start with what purports to define a DALI job when what we need to
define is a parameter? We need to define things in terms of the three
elements we're talking about in this section: resources, parameters, and
responses. So I'd start with something like:
--
A parameter is a key-value pair that is passed to a resource to
control the response of the resource. When a user creates a job, the
parameters for the job may be specified in either the initial
job-creating resource invocation or in subsequent calls to job resources
if this is supported by the DAL protocol for the service.
--
7. We should be specific about how we pass in parameters. We are
assuming the standard CGI formats when we use the '=' notation later
on. So just call it out. [I don't know if there is a formal reference
for this but it should be referenced if appropriate.] If you don't
want to assume this structure for passing in parameters, at least note
that this is one way we can send them in.
--
Most DAL services use standard web conventions for passing parameters.
Parameters using the standard URL encoding use a
key=value
syntax where the usual encoding rules for the key and value are observed
when the value is used within a URL or a POST stream. Other encodings
are possible and multipart-form encoding is mandatory in any resource
invocation which involves a file upload. However, in this document
we conventionally display the parameters using the unencoded key=value
string even though other encodings may be supported.
--
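
A short sketch of the difference between the displayed key=value form and what actually goes into a URL or POST stream. The parameter names and values here are hypothetical and chosen only because they contain characters that need encoding.

```python
from urllib.parse import urlencode, parse_qsl

# Hypothetical parameters; the names are assumptions for illustration.
params = [("BAND", "1 2"), ("FORMAT", "application/x-votable+xml")]

# What the document would display: the unencoded key=value strings.
for key, value in params:
    print(f"{key}={value}")

# What is actually sent in a query string or urlencoded POST body.
encoded = urlencode(params)
print(encoded)

# Decoding recovers the original pairs.
decoded = parse_qsl(encoded)
```

The space, slash and plus sign all change on the wire, which is why it is worth stating that the displayed form is a notational convention.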
8. Section 3.3 does not belong in section 3. It very explicitly notes
that it discusses values in both parameters and responses. If so it
needs to go in a new section "5. Literal values". Section 5 should
then be explicitly referenced in sections 3 and 4.
9. I think that you do readers a disservice by not being more explicit
in defining how integers and real numbers are to be represented and by
using an obscure reference. There is not even a link in the version I'm
seeing. Reading this I've no idea if octal or hex numbers are supported.
Is exponential notation? Can I use e and E in the exponent? At least
give the user a taste of valid formats. As always examples are good!
Examples of what's not supported too.
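
To show the sort of examples that would help, here is a sketch of a checker under one possible reading of the rules. The grammar is my assumption (decimal digits only, optional sign, optional fraction, optional exponent with e or E); the document may well intend something different, which is exactly the ambiguity.

```python
import re

# Assumed grammar, NOT taken from DALI: decimal only, optional sign,
# optional fraction, optional exponent introduced by e or E.
INTEGER = re.compile(r"[+-]?[0-9]+")
REAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][+-]?[0-9]+)?")


def classify(text):
    """Return 'integer', 'real' or 'rejected' under the assumed grammar."""
    if INTEGER.fullmatch(text):
        return "integer"
    if REAL.fullmatch(text):
        return "real"
    return "rejected"


for sample in ["42", "-7", "3.14", "1e5", "1.5E-3", "0x1F", "1,000", "NaN"]:
    print(f"{sample!r}: {classify(sample)}")
```

A table of accepted and rejected strings like the one this prints would answer most of the questions above in a few lines.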
10. Section 3.3.2 is self-contradictory. First it states that all date
and time values must be represented using the ISO-like FITS format.
Then later it has "where values may be expressed using Julian dates."
But by the first sentence that is never. I'm not quite sure what
is meant here.
11. In 3.3.2 I think it would be helpful to clarify if the boolean
values are case sensitive. E.g., can I use TRUE or just true?
12. I can't follow what 3.4.2 is saying.
13. I find the discussion of VOTable encodings a bit out of place but
I'm not sure why. I wonder if this structure is causing some
contortions. If we separate the parameter encoding and the VOTable
encodings, then we've no problem allowing a single-valued numeric
parameter to be interpreted as a range. I.e., user specifies
band=1
which gets interpreted as band=1 1 so that the current discussion (in
other DAL email) of array=2 versus array=2* is moot.
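
A minimal sketch of that interpretation at the parameter-parsing level. The function name and the whitespace-separated two-value form are assumptions for illustration.

```python
# Sketch only: treats a single numeric value as a degenerate range.
def parse_interval(value):
    """Parse 'lo hi' into (lo, hi); a single value 'x' becomes (x, x)."""
    parts = value.split()
    if len(parts) == 1:
        parts = parts * 2  # band=1 is read as band=1 1
    if len(parts) != 2:
        raise ValueError(f"expected one or two numbers, got {value!r}")
    lo, hi = (float(p) for p in parts)
    return lo, hi


print(parse_interval("1"))
print(parse_interval("1 2"))
```

Because the expansion happens when the parameter is parsed, nothing about the VOTable array declaration needs to change.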
14. Aaargh. I do have a substantive comment. The discussion of 3.4.4
allows MAXREC to fail because there is no data matching the request (as
I read it) or to succeed regardless. I don't think this is right.
Either it should be a way of getting the metadata regardless of whether
there is matching data, or it should always check that there is matching
data and use the overflow indicator to indicate whether any data was
found (i.e., if there would be data, then the overflow is set to yes).
15. The discussion in 3.4.5 is confusing. We first say we would have
this parameter
UPLOAD=table3,param:t3
and this content, where we then show the multipart-form data in great
detail. However, the UPLOAD parameter will also have been
encoded this way, not as a simple key=value string. So I'd add some
caveat like:
--
Note that in this case the UPLOAD parameter would also be encoded
using the multipart/form-data encoding but we
have presented it as a simple key/value pair.
--
... or just note that you're using multipart form encoding and that
given the UPLOAD parameter above the CGI parameter name
for the file upload should be t3.
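
For what it's worth, here is a sketch of what I mean: both the UPLOAD parameter and the file travel as parts of the same multipart/form-data body, and the file part's name matches the name after "param:". The helper name, filename and content type are assumptions for illustration.

```python
import uuid


# Sketch only: hand-built multipart/form-data body showing that UPLOAD
# itself is one of the parts. Filename and content type are assumptions.
def build_upload_body(table_name, part_name, file_bytes, filename="upload.xml"):
    boundary = uuid.uuid4().hex
    upload_value = f"{table_name},param:{part_name}"
    lines = [
        f"--{boundary}".encode(),
        b'Content-Disposition: form-data; name="UPLOAD"',
        b"",
        upload_value.encode(),
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{part_name}"; filename="{filename}"'.encode(),
        b"Content-Type: application/x-votable+xml",
        b"",
        file_bytes,
        f"--{boundary}--".encode(),
        b"",
    ]
    body = b"\r\n".join(lines)
    content_type = f"multipart/form-data; boundary={boundary}"
    return content_type, body


content_type, body = build_upload_body("table3", "t3", b"<VOTABLE/>")
print(content_type)
print(body.decode())
```

Printing the body makes it obvious that UPLOAD=table3,param:t3 never appears on the wire as a literal key=value string.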
16. The first sentence in section 4 jumps a little too quickly to
implementation. I'd suggest something like:
--
The output of the resource invocation is returned as a response using
the HTTP protocol. The response indicates the status of the request.
It either directly provides the requested information or directs the
user on how and where to find the desired data.
E.g., the response to an availability request will include whether the
service is ready, while a request to initiate an asynchronous query
typically returns a job ID that can be used to construct resource URLs
to monitor the progress of the job and eventually retrieve the output.
In HTTP terms, DAL service responses can be of three types...
--
17. I think much of the discussion in section 2 discussing the
responses to requests belongs in section 4.
18. I'd suggest the Content-type be mandatory. Not sure I care about
the other HTTP headers.
19. The discussion of OVERFLOW makes handling it more complex, since
we have to handle two cases rather than the one used in TAP. Not sure
how it helps.