Comments on PQL

Tom McGlynn Thomas.A.McGlynn at nasa.gov
Fri Jul 10 10:54:04 PDT 2009


Hi Doug,

Comments on comments on comments...

	Tom

> On Thu, 9 Jul 2009, Thomas McGlynn wrote:
>> We should not let the WHERE parameter drive the definition of
>> parameters generally.
> 
> It is the other way around actually.  WHERE just adopts the standard
> range-list syntax used elsewhere in DAL.  We don't have to do this
> if we decide we need something more complex for WHERE, but the
> current syntax addresses the most common queries simply.

I think it's manifest that the current syntax does not match the WHERE 
clause.  E.g., if I want to ask for all sources where the instrument is
HRI, PSPCA or PSPCB and the exposure is greater than 10000, the 
suggested syntax is:

    WHERE=instrument,HRI,PSPCA,PSPCB;exposure,10000/

The list of parameter values for WHERE has the following elements (as 
defined in section 2):
    instrument
    HRI
    PSPCA
    PSPCB;exposure
    10000/

This contradicts the BNF given for the WHERE clause, which groups the 
elements differently.  The WHERE clause needs a syntax consistent with 
the rest of the parameters; the current document instead introduces a 
new syntax in which ';' and ',' form a hierarchy of separators.
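To make the conflict concrete, here is a minimal Python sketch of the two readings of the example above.  The first splits the value on commas, as section 2 defines parameter value lists; the second splits first on ';' and then on ',', which is how the WHERE BNF groups things.  The function names are illustrative only and are not part of any PQL draft.

```python
def split_as_value_list(value: str) -> list[str]:
    # Section 2 reading: a parameter value is a comma-separated list.
    return value.split(",")


def split_as_where_clauses(value: str) -> list[tuple[str, list[str]]]:
    # WHERE BNF reading: ';' separates constraints, then ',' separates
    # the column name from its list of values or ranges.
    clauses = []
    for clause in value.split(";"):
        column, *values = clause.split(",")
        clauses.append((column, values))
    return clauses


where = "instrument,HRI,PSPCA,PSPCB;exposure,10000/"
print(split_as_value_list(where))
# ['instrument', 'HRI', 'PSPCA', 'PSPCB;exposure', '10000/']
print(split_as_where_clauses(where))
# [('instrument', ['HRI', 'PSPCA', 'PSPCB']), ('exposure', ['10000/'])]
```

The same string yields two different structures depending on which rule the parser follows, which is the inconsistency at issue.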

> 
>> - The multicone discussion is unclear.  The coordinate systems
>> supported are not discussed (nor in the standard POS for that matter)
>> and how the positional columns are found is confusing.
...
>> Users should be able to explicitly specify the columns in the uploaded
>> tables where the positions are to be found.
> 
> Yes, but this can be done on the client side when the input multipos
> table is generated.  

No...  If the idea were that we were creating the table being uploaded, 
then we'd not need the complex hierarchy currently described, where we 
first check UTYPE, then UCD, then NAME; we would simply mandate that the 
VOTable be constructed with whichever of those we chose set properly.  
The document implicitly recognizes that a very common use case is one 
where we take a VOTable that is the result of some other action and 
send it along to match against.  We cannot mandate that users edit 
their VOTables, either manually or through a software proxy.  
Note that I am not suggesting that we cannot use UTYPEs or UCDs, rather 
that there needs to be an option for explicit specification.

>... Other issues such as conversion of CSV or whatever
> to VOTable can be addressed that way as well.  In general if we are
> uploading tables some preparation of the input table is going to be
> required in any case.  Use of metadata to identify the input columns
> (default columns anyway) is attractive as this allows the operation
> to be automated.  Once we have such a table it can be used many times
> without having to explicitly specify the columns.
> 
> An alternative would be to use a parameter or set of parameters to
> allow the columns to be explicitly identified.  Metadata, if provided,
> could still be used to automatically identify the default columns.

This is what I am suggesting.  Basically I'm adding one more level to 
the hierarchy determining which column to use: user-specified, utype, 
ucd, name.  I also think name should be deleted from this list.
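A minimal sketch of that lookup order, assuming the uploaded table's column metadata is available as simple records.  The `Column` class, the `resolve_column` helper, and the particular UCD strings in the usage lines are hypothetical illustrations, not taken from the PQL draft; name matching is omitted, as proposed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Column:
    name: str
    ucd: Optional[str] = None
    utype: Optional[str] = None


def resolve_column(columns, user_name=None, utype=None, ucd=None):
    """Pick a column: explicit user choice first, then UTYPE, then UCD."""
    if user_name is not None:
        for col in columns:
            if col.name == user_name:
                return col
        # In this sketch an explicit request that cannot be satisfied is
        # an error rather than a reason to fall back to metadata.
        raise KeyError(f"no column named {user_name!r}")
    if utype is not None:
        for col in columns:
            if col.utype == utype:
                return col
    if ucd is not None:
        for col in columns:
            if col.ucd == ucd:
                return col
    return None


cols = [
    Column("RA_2000", ucd="pos.eq.ra;meta.main"),
    Column("alpha_obs", ucd="pos.eq.ra"),
]
print(resolve_column(cols, ucd="pos.eq.ra;meta.main").name)  # RA_2000
print(resolve_column(cols, user_name="alpha_obs",
                     ucd="pos.eq.ra;meta.main").name)  # alpha_obs
```

The explicit choice simply overrides whatever the metadata would have selected, so a table produced by some other service can be uploaded unedited.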


> 
>> - I believe there is a running confusion in the document regarding
>> the need for escaping characters.  While many characters in PQL defined
>> strings will need to be escaped when PQL parameters are encoded
>> in an HTTP request, that is not part of PQL and need not be discussed
>> here.
> 
> This is a valid point so long as we have a strong separation of the
> logical service interface from the transport (HTTP in this case).
> In general in TAP and PQL we have only partially done this so far.
> 

This document should parallel the ADQL document.  There is no usage of 
the string HTTP or URL in that document except when giving an actual URL 
to a document.  Similarly the only discussion of escaping is in the 
context of escaping SQL keywords and using them as names of variables.

>> The only character that might need to be escaped within
>> PQL itself is ',' (and possibly '/' if we allow range searches
>> in strings).  I'd prefer a backslash quoting for these if we decide
>> we need it since otherwise there are multiple levels of URL encoding
>> required in sending and receiving messages.
> 
> The issues of URL encoding and escaping (quoting) characters within
> a WHERE clause are distinct.  An escape mechanism is needed to
> include metacharacters within strings.  Quoting is needed as well,
> e.g., to force case sensitive treatment of strings or substrings.

I don't believe there is any need for quoting; or, more precisely, we 
need either quoting or an escape mechanism for characters that are 
special within PQL, but not both.
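As an illustration of using a single mechanism, here is a minimal sketch of splitting a PQL value list in which a backslash escapes the list separator (and itself).  Only ',' is treated as special here; handling '/' would be an analogous change.  The function name is hypothetical, and the result is independent of any URL encoding applied later by the HTTP layer.

```python
def split_escaped(value: str, sep: str = ",") -> list[str]:
    """Split on sep, treating a backslash as escaping the next character."""
    items, current = [], []
    chars = iter(value)
    for ch in chars:
        if ch == "\\":
            # Keep the escaped character literally; a trailing lone
            # backslash is kept as-is.
            nxt = next(chars, None)
            current.append("\\" if nxt is None else nxt)
        elif ch == sep:
            items.append("".join(current))
            current = []
        else:
            current.append(ch)
    items.append("".join(current))
    return items


print(split_escaped(r"Smith\, J.,Jones"))  # ['Smith, J.', 'Jones']
```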

Independently, there is a general question of whether text matches are 
case sensitive or not.  Personally I like the rule that seems implicit 
in the current examples, though it should be made explicit: string 
matches are case sensitive unless they contain wildcards, in which case 
they are case insensitive.  That corresponds to standard SQL '=' and 
LIKE operations on strings.
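A minimal sketch of that rule, assuming '*' is the only wildcard character (the wildcards PQL actually adopts may differ).  A pattern with no wildcard is compared exactly and case-sensitively; a pattern containing '*' is matched case-insensitively against the whole value.  The helper name is hypothetical.

```python
import re


def pql_string_match(pattern: str, value: str) -> bool:
    if "*" not in pattern:
        # No wildcard: exact, case-sensitive comparison.
        return value == pattern
    # Wildcard present: translate '*' to '.*', escape everything else,
    # and match the whole string ignoring case.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, value, flags=re.IGNORECASE) is not None


print(pql_string_match("HRI", "hri"))      # False
print(pql_string_match("PSPC*", "pspcb"))  # True
```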

> The simplest thing is to use the same quoting mechanism for both cases,
> which is what the document currently proposes.
>

There aren't two cases.  HTTP quoting/escaping is done by an entirely 
different level of the software.  In practice I find that it is better 
to define independent quoting/escaping techniques for different 
protocols.  [E.g., note that ADQL quoting is different from HTTP's.]

>> - I strongly disagree with the behavior suggested in section 2.8.
> 
> (This refers to ignoring data model-based query constraints that don't
> apply to the data being queried).  I agree that this probably does
> not need to be in this document, and is probably hard to understand
> without more context.  But what it describes is how DAL queries such
> as SIA and SSA have worked for years.  For example if we query for
> images by POS and BAND but no information on the spectral band is
> available, the query ignores the BAND constraint without error,
> leaving it to the client to refine the query.  General discovery
> queries are not the same as table data queries and tend to err on
> the side of including candidate datasets.  If we did not do this we
> could not pose the same query to 100 services.  But it is not how
> one would normally want to query a simple data table.

Rereading this I am less concerned, but I think the placement in the 
document is very confusing.

My original interpretation of this statement was that

     WHERE=exposure,10000/ (or WHERE=exposure=10000/)

would be ignored if the table did not have an exposure field, but I now 
see that that is not the case.

I think this should be removed from here and placed in section 4.4. 
That is where it is relevant.  I found it quite confusing in section 2.8.
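To make the distinction explicit, here is a minimal sketch of applying one range constraint such as exposure,10000/ to a table row.  A missing column is an error for an ordinary table query but is skipped for a discovery-style query, as Doug describes for SIA and SSA.  The `discovery` flag and the function names are hypothetical, and the range parsing assumes the DAL range-list convention in which 'lo/' means lo or greater and '/hi' means hi or less.

```python
from typing import Optional


def parse_range(text: str) -> tuple[Optional[float], Optional[float]]:
    # 'lo/hi', 'lo/', '/hi', or a single value 'v' (treated as v/v).
    if "/" not in text:
        v = float(text)
        return v, v
    lo, hi = text.split("/", 1)
    return (float(lo) if lo else None, float(hi) if hi else None)


def row_matches(row: dict, column: str, rng: str,
                discovery: bool = False) -> bool:
    if column not in row:
        if discovery:
            # Discovery query: the constraint does not apply, so ignore it.
            return True
        # Table query: constraining a nonexistent column is an error.
        raise KeyError(f"unknown column {column!r}")
    lo, hi = parse_range(rng)
    value = row[column]
    return (lo is None or value >= lo) and (hi is None or value <= hi)


print(row_matches({"exposure": 12000.0}, "exposure", "10000/"))  # True
print(row_matches({"ra": 10.0}, "exposure", "10000/", discovery=True))  # True
```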

> 
> Basically this is a good example of specific parameter semantics that
> cannot easily be generalized and are better left to the specification
> of an individual interface.  It is a carry over from an attempt to
> generalize PQL and should probably be removed in the next draft.
> 
>> - The parameter qualifier syntax seems to have no real purpose and doesn't
>> seem to be needed. An additional parameter could be used to specify
>> the coordinate system.  I'd find this much cleaner.
>> [I saw no other uses of qualifiers.]
> 
> This is how POS,SIZE, BAND, etc. work in SSA; one can specify a
> qualifier in some of these cases.  Again, this is PQL attempting to
> generalize specific syntax and semantics of existing DAL interfaces.
> I agree that this is confusing if one does not understand where it is
> coming from and we should clean some of this up in the next draft.
> In general this level of detail is best left to the individual
> service interfaces.
> 
I think it's probably a bad idea there too, but that may be water under 
the bridge.  If qualifiers are to be used, then the set of them 
(presumably significantly larger than one) needs to be enumerated 
explicitly in a table giving the valid strings for each qualifier (a 
reference to another document, à la the coordinate-system strings, is 
fine), the meaning of each, and the parameters each may qualify.
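A minimal sketch of the kind of enumeration being asked for: a table keyed by parameter that lists the qualifier strings each parameter accepts and the meaning of each, plus a check that rejects anything else.  The entries shown (POS with two coordinate-frame qualifiers) are placeholders for illustration, not a proposed list.

```python
# Hypothetical registry: parameter -> {qualifier string: meaning}.
QUALIFIERS = {
    "POS": {
        "ICRS": "ICRS coordinates",
        "GALACTIC": "Galactic coordinates",
    },
}


def check_qualifier(param: str, qualifier: str) -> str:
    """Return the meaning of a qualifier, or raise if it is not allowed."""
    allowed = QUALIFIERS.get(param.upper())
    if allowed is None:
        raise ValueError(f"{param} does not accept qualifiers")
    try:
        return allowed[qualifier.upper()]
    except KeyError:
        raise ValueError(
            f"{qualifier!r} is not a valid qualifier for {param}; "
            f"expected one of {sorted(allowed)}"
        ) from None


print(check_qualifier("POS", "galactic"))  # Galactic coordinates
```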

I'm guessing that qualifiers are supposed to answer a need when there 
are multiple pieces of information that are linked together.  So, e.g., 
the position and coordinate system are tightly coupled, so it seemed 
like a good idea that they should be specified in a single string.  This 
becomes a serious issue only when there are multiple instances of the 
coupled sets, e.g., several positions.  As far as I can tell -- from the 
current document -- there are no instances of this except one 
abomination which I assume is not intended to be supported:
     POS=1.2;GALACTIC,2.3;ICRS

Note that the POS/SIZE combination is quite analogous, but there the size 
is given as a separate parameter, which is the style I advocate for all 
such cases.
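A minimal sketch of that style applied to the coordinate system: POS carries only the coordinates, and the frame comes from a separate parameter, just as SIZE accompanies POS.  The parameter name COORDSYS and the ICRS default are assumptions for illustration, not part of the draft.

```python
def parse_pos(params: dict) -> tuple[float, float, str]:
    """Read POS as 'lon,lat' and the frame from a separate parameter."""
    lon_text, lat_text = params["POS"].split(",")
    # A ';' inside POS would mean an embedded qualifier; reject it here.
    if ";" in lon_text or ";" in lat_text:
        raise ValueError("qualifiers inside POS are not supported")
    frame = params.get("COORDSYS", "ICRS").upper()
    return float(lon_text), float(lat_text), frame


print(parse_pos({"POS": "1.2,2.3", "COORDSYS": "galactic"}))
# (1.2, 2.3, 'GALACTIC')
```

With this arrangement the mixed-frame string POS=1.2;GALACTIC,2.3;ICRS simply cannot be expressed, which removes the need to decide what it would mean.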

Maybe later versions of PQL will have significant justification for 
qualifiers, but this version does not seem to.

>  	- Doug
> 


