TAP, automated site monitoring, and gzip encoding.

Douglas Tody dtody at nrao.edu
Tue Jul 5 14:39:01 PDT 2011


On Tue, 5 Jul 2011, Tom McGlynn wrote:

> Just to bring people up to date...
>
> The problem was that a standard TAP metadata query/response looked like a SQL 
> injection attack and triggered flags in GSFC security monitors.
>
> We changed to using the STREAM/BINARY encoding in the VOTables used in our 
> TAP interface (suggested by Mark) and so far this seems to be satisfying our 
> security types.  [I don't want to get into a discussion of what should or 
> should not trigger such flags.]

Since probably few clients will be able to handle such a response, this
is probably not the best solution, although it is technically legal, I
guess. It would be better to address the real problem (smarter security
checking), as someone else suggested.

> I'm a little confused by Mark's quote from the SSA standard, since the 
> compress keyword seems to be duplicating the role of the Accept-encoding 
> header at the HTTP level.  I'd agree that some overall strategy that 
> addresses all of the DAL interfaces would be desirable.  Personally I'd 
> suggest that we recommend/require support for some level of compression using 
> the standard HTTP protocols and not add anything to the DAL protocols 
> themselves.
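To make concrete what "clients that look for the appropriate HTTP headers" would involve, here is a minimal client-side sketch in Python using only the standard library. The function names `decode_body` and `fetch_result` are hypothetical, only `gzip` (and its legacy alias `x-gzip`) is handled, and no particular TAP endpoint is assumed. Note that `urllib` does not decompress responses on its own, which is why the `Content-Encoding` response header is checked explicitly.

```python
import gzip
import urllib.request
from typing import Optional


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Undo an HTTP Content-Encoding. Only gzip and identity are handled here."""
    coding = (content_encoding or "identity").strip().lower()
    if coding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if coding == "identity":
        return body
    raise ValueError(f"unsupported Content-Encoding: {coding}")


def fetch_result(url: str, timeout: float = 60.0) -> bytes:
    """Fetch a DAL/TAP response, advertising gzip support to the server.

    urllib does not decompress responses automatically, so the
    Content-Encoding response header is inspected and undone here.
    """
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read()
        coding = response.headers.get("Content-Encoding")
    return decode_body(body, coding)
```

Because the client only advertises gzip, a server that ignores the header (or cannot compress) still returns a plain response, which `decode_body` passes through unchanged.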

Note that we are discussing this very same issue again right now on the
DM list, in connection with ObsTAP.  ObsTAP (ObsCore) describes the file
formats of archive data products, independently of the particular
transport protocol used for any subsequent data access, e.g., HTTP, FTP,
whatever.  Support is included to describe the compression type if used
(for Rob's benefit this can include the astronomy-specific compression
schemes).  The query capabilities of the DAL protocols used for data
discovery and description are higher level and have nothing to do with
any possible stream compression or accept-type file format capabilities
of the particular lower level transport protocol used.
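On the service side, stream compression at the HTTP level can be layered on without touching the DAL query interface at all. The sketch below, in Python, assumes a framework-independent helper that receives the raw `Accept-Encoding` request header and returns the (possibly gzipped) body plus the extra response headers to send. `accepts_gzip` and `encode_response` are illustrative names, not part of any TAP or DALI specification, and the header parsing is simplified.

```python
import gzip
from typing import Dict, Optional, Tuple


def accepts_gzip(accept_encoding: Optional[str]) -> bool:
    """Return True if an Accept-Encoding header value allows gzip (q > 0)."""
    if not accept_encoding:
        return False
    qvalues: Dict[str, float] = {}
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        coding = parts[0].lower()
        if not coding:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        qvalues[coding] = q
    # An explicit gzip entry takes precedence over the "*" wildcard.
    for coding in ("gzip", "x-gzip", "*"):
        if coding in qvalues:
            return qvalues[coding] > 0
    return False


def encode_response(body: bytes, accept_encoding: Optional[str]) -> Tuple[bytes, Dict[str, str]]:
    """Gzip a response body (e.g. a text VOTable) only if the client asked for it."""
    headers = {"Vary": "Accept-Encoding"}
    if accepts_gzip(accept_encoding):
        headers["Content-Encoding"] = "gzip"
        return gzip.compress(body), headers
    return body, headers
```

Sending `Vary: Accept-Encoding` in both branches lets intermediate caches keep the compressed and uncompressed variants apart.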

 	- Doug


> Douglas Tody wrote:
>> Right - we distinguished between compression of the dataset itself and
>> compression as used in the transport protocol.  HTTP already supports
>> the latter and ideally the client and server would both support stream
>> compression.  But of course it is optional (where we really need this is
>> to speed up feeding large text VOTables back to the client).  If
>> security is the main issue it might be better to require an
>> authenticated (HTTPS) connection.  Or just limit the TAP implementation
>> and client connection to data which could not be compromised by any
>> amount of SQL trickery.
>>
>>   	- Doug
>> 
>> 
>> On Mon, 4 Jul 2011, Mark Taylor wrote:
>> 
>>> On Thu, 30 Jun 2011, Tom McGlynn wrote:
>>> 
>>>> One solution that I had hoped might work was to use a GZIP transfer
>>>> encoding (or content encoding) for the query results.  Unfortunately it
>>>> doesn't look like clients currently note the HTTP encoding headers.
>>>> 
>>>> NASA is probably a bit more paranoid about this than some, but I suspect
>>>> that this will become a more common issue as time goes on.
>>>> Support for content or transfer encoding is an HTTP-level issue, so I
>>>> don't think it requires any change to the TAP standard, just clients that
>>>> look for the appropriate HTTP headers.  Would it be reasonable to request
>>>> that clients support gzip encoding?  In addition to addressing this
>>>> security issue, I suspect this would generally substantially decrease the
>>>> size of downloaded data and make our queries more responsive.
>>>>
>>>> 	Tom McGlynn
>>> 
>>> FWIW, although TAP does not address this, the SSA standard
>>> (PR-SSA-1.1-20110417) does discuss compression in section 7.3:
>>>
>>>    7.3 Data Compression
>>>
>>>    If the query parameter COMPRESS is present then the service may return
>>>    a compressed dataset, using some standard compression technique such
>>>    as gzip, in place of a normal dataset, without indicating this in the
>>>    query response. Basically the client is indicating that it is prepared
>>>    to receive either compressed or uncompressed datasets and does not
>>>    care which is delivered (the service should pick whichever is more
>>>    efficient). This should be distinguished from protocol-level compression,
>>>    which is transparent to the client, and may occur at the level of the
>>>    HTTP protocol if both client and server support HTTP protocol compression.
>>>
>>>    In case of an HTTP GET the keyword Content-Encoding informs the receiver
>>>    about the encoding of the output data, and should have a value such as
>>>    gzip. Note that the encoding is distinct from the MIME-type (Content-Type)
>>>    of the returned data object.
>>> 
>>> The tone seems to suggest that handling Content-Encoding is something
>>> clients might (but not MUST) be expected to do as a matter of course.
>>> 
>>> Probably DALI ought to say what the general assumption is for DAL
>>> services about content- and/or transfer-encoding.
>>> 
>>> Mark
>>> 
>>> --
>>> Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
>>> m.b.taylor at bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/
>>> 
>


More information about the dal mailing list