TAP, automated site monitoring, and gzip encoding.

Tom McGlynn Thomas.A.McGlynn at nasa.gov
Tue Jul 5 08:00:06 PDT 2011


Just to bring people up to date...

The problem was that a standard TAP metadata query/response looked 
like a SQL injection attack and triggered flags in GSFC security monitors.

We changed to using the STREAM/BINARY encoding in the VOTables used in 
our TAP interface (suggested by Mark) and so far this seems to be 
satisfying our security types.  [I don't want to get into a discussion 
of what should or should not trigger such flags.]

There was some discussion that we should simply assume that we are 
going to have to be responsible for our own security and tell our 
security monitors to ignore everything from our TAP server.  Certainly 
it's the case that we need to do our best to ensure that there are no 
security holes in our TAP interface.  If we had to do so, we could 
have gone this way.  However, an independent layer of checking is 
something I don't want to forgo if I don't have to.  There are lots 
of hackers out there and I daresay many are smarter and certainly more 
versed in the holes in our database's security than I.  So I'm hopeful 
that our format change will enable our security scanners to continue 
monitoring our services without burdening them with large numbers of 
false intrusion detections.

With regard to encoding...  I'd originally thought to use 
transfer-encoding rather than content-encoding, since my rather vague 
understanding is that transfer-encoding is something that clients 
aren't supposed to see, while content-encoding is something they do 
see.  However, it's not clear that gzip is really meant to be used as 
a transfer encoding in any case.  Transfer-encoding seems to be 
something envisaged for chunked downloads.

I'm a little confused by Mark's quote from the SSA standard, since the 
compress keyword seems to be duplicating the role of the 
Accept-encoding header at the HTTP level.  I'd agree that some overall 
strategy that addresses all of the DAL interfaces would be desirable. 
Personally I'd suggest that we recommend/require support for some 
level of compression using the standard HTTP protocols and not add 
anything to the DAL protocols themselves.
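
To make the client side of that concrete, here is a minimal sketch of 
what honoring the standard HTTP headers could look like.  It assumes 
Python and only its standard library; the function names 
(request_headers, decode_body) are hypothetical and not taken from any 
existing DAL client.

```python
import gzip


def request_headers():
    """Headers a client could send to advertise gzip support (illustrative)."""
    return {"Accept-Encoding": "gzip"}


def decode_body(headers, body):
    """Return the response payload, undoing a gzip Content-Encoding if present.

    headers: mapping of response header names to values
    body:    raw bytes as received from the server
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {k.lower(): v for k, v in headers.items()}
    encoding = normalised.get("content-encoding", "identity").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        # The server compressed the body; the VOTable is inside.
        return gzip.decompress(body)
    if encoding == "identity":
        # No content coding applied; pass the bytes through unchanged.
        return body
    # Anything else is outside the scope of this sketch.
    raise ValueError("unsupported Content-Encoding: " + encoding)
```

A client would pass the headers and raw bytes of each response through 
decode_body before handing the result to its VOTable parser, so nothing 
in the DAL query itself has to change.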

Tom

Douglas Tody wrote:
> Right - we distinguished between compression of the dataset itself and
> compression as used in the transport protocol.  HTTP already supports
> the latter and ideally the client and server would both support stream
> compression.  But of course it is optional (where we really need this is
> to speed up feeding large text VOTables back to the client).  If
> security is the main issue it might be better to require an
> authenticated (HTTPS) connection.  Or just limit the TAP implementation
> and client connection to data which could not be compromised by any
> amount of SQL trickery.
>
>   	- Doug
>
>
> On Mon, 4 Jul 2011, Mark Taylor wrote:
>
>> On Thu, 30 Jun 2011, Tom McGlynn wrote:
>>
>>> One solution that I had hoped might work was to use a GZIP transfer encoding
>>> (or content encoding) for the query results.  Unfortunately it doesn't look
>>> like clients currently note the HTTP encoding headers.
>>>
>>> NASA is probably a bit more paranoid about this than some, but I suspect that
>>> this will become a more common issue as time goes on.
>>> Support for content or transfer encoding is an HTTP level issue so I don't
>>> think it requires any change to the TAP standard, just clients that look for
>>> the appropriate HTTP headers.  Would it be reasonable to request that clients
>>> support gzip encoding?  In addition to address this security issue I suspect
>>> this would generally substantially decrease the size of downloaded data and
>>> make our queries more responsive.
>>>
>>> 	Tom McGlynn
>>
>> FWIW, although TAP does not address this, the SSA standard
>> (PR-SSA-1.1-20110417) does discuss compression in section 7.3:
>>
>>    7.3 Data Compression
>>
>>    If the query parameter COMPRESS is present then the service may return
>>    a compressed dataset, using some standard compression technique such
>>    as gzip, in place of a normal dataset, without indicating this in the
>>    query response. Basically the client is indicating that it is prepared
>>    to receive either compressed or uncompressed datasets and does not
>>    care which is delivered (the service should pick whichever is more
>>    efficient). This should be distinguished from protocol-level compression,
>>    which is transparent to the client, and may occur at the level of the
>>    HTTP protocol if both client and server support HTTP protocol compression.
>>
>>    In case of an HTTP GET the keyword Content-Encoding informs the receiver
>>    about the encoding of the output data, and should have a value such as
>>    gzip. Note that the encoding is distinct from the MIME-type (Content-Type)
>>    of the returned data object.
>>
>> the tone seems to suggest that Content-Encoding is something that
>> clients might (but not MUST) be expected to do as a matter of course.
>>
>> Probably DALI ought to say what the general assumption is for DAL
>> services about content- and/or transfer-encoding.
>>
>> Mark
>>
>> --
>> Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
>> m.b.taylor at bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/
>>


