TAP, automated site monitoring, and gzip encoding.
Tom McGlynn
Thomas.A.McGlynn at nasa.gov
Tue Jul 5 08:00:06 PDT 2011
Just to bring people up to date...
The problem was that a standard TAP metadata query/response looked
like a SQL injection attack and triggered flags in GSFC security monitors.
We changed to using the STREAM/BINARY encoding in the VOTables used in
our TAP interface (suggested by Mark) and so far this seems to be
satisfying our security types. [I don't want to get into a discussion
of what should or should not trigger such flags.]
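As a rough illustration of why the BINARY serialization helps here: the table data travels as a base64 stream, so literal strings such as schema or table names never appear as plain text in the response body. The sketch below is a simplification, not our server code. It assumes a table with a single variable-length char column, each value stored as a 4-byte big-endian length followed by its ASCII bytes, and the helper names (encode_char_field, to_stream, from_stream) are made up for the example.

```python
import base64
import struct


def encode_char_field(value: str) -> bytes:
    """Encode one variable-length char value: 4-byte big-endian count, then the bytes."""
    data = value.encode("ascii")
    return struct.pack(">i", len(data)) + data


def to_stream(values):
    """Concatenate the encoded values and base64-encode the result."""
    raw = b"".join(encode_char_field(v) for v in values)
    return base64.b64encode(raw)


def from_stream(stream):
    """Reverse to_stream: base64-decode, then read length-prefixed values."""
    raw = base64.b64decode(stream)
    values = []
    pos = 0
    while pos < len(raw):
        (count,) = struct.unpack_from(">i", raw, pos)
        pos += 4
        values.append(raw[pos:pos + count].decode("ascii"))
        pos += count
    return values


if __name__ == "__main__":
    names = ["TAP_SCHEMA.tables", "TAP_SCHEMA.columns"]
    stream = to_stream(names)
    print(stream.decode("ascii"))   # opaque base64 text
    print(from_stream(stream))      # original values recovered
```

A scanner matching on plain-text patterns sees only the base64 characters, while a VOTable-aware client decodes the stream back to the original values.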
There was some discussion that we should simply assume that we are
going to have to be responsible for our own security and tell our
security monitors to ignore everything from our TAP server. Certainly
it's the case that we need to do our best to ensure that there are no
security holes in our TAP interface. If we had to do so, we could
have gone this way. However, an independent layer of checking is
something I don't want to forgo if I don't have to. There are lots
of hackers out there and I daresay many are smarter and certainly more
versed in the holes in our database's security than I. So I'm hopeful
that our format change will enable our security scanners to continue
monitoring our services without burdening them with large numbers of
false intrusion detections.
With regard to encoding.... I'd originally thought to use
transfer-encoding rather than content-encoding since my rather vague
understanding is that transfer-encoding is something that clients
aren't supposed to see, while content-encoding is something they do. However, it's
not clear that gzip is really meant to be used as a transfer encoding
in any case. Transfer-encoding seems to be something envisaged for
chunked downloads.
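To make the distinction concrete, here is a small sketch of the two layers. It assumes a response sent with "Transfer-Encoding: chunked" and "Content-Encoding: gzip". The chunked framing is normally removed by the HTTP library before the application sees anything, whereas the gzip content encoding is still present on the body handed to the application. The functions chunk and dechunk are illustrative helpers, not part of any library, and trailers are ignored.

```python
import gzip


def chunk(data: bytes, size: int) -> bytes:
    """Frame data as an HTTP/1.1 chunked body (no trailers)."""
    parts = []
    for i in range(0, len(data), size):
        piece = data[i:i + size]
        parts.append(b"%x\r\n" % len(piece) + piece + b"\r\n")
    parts.append(b"0\r\n\r\n")  # terminating zero-length chunk
    return b"".join(parts)


def dechunk(raw: bytes) -> bytes:
    """Remove chunked framing; this is what the HTTP layer does for the client."""
    out = bytearray()
    pos = 0
    while True:
        eol = raw.index(b"\r\n", pos)
        # The size line may carry chunk extensions after ';', which are ignored.
        size = int(raw[pos:eol].split(b";", 1)[0], 16)
        if size == 0:
            return bytes(out)
        start = eol + 2
        out += raw[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data


if __name__ == "__main__":
    votable = b"<VOTABLE>...</VOTABLE>"
    wire = chunk(gzip.compress(votable), 8)  # bytes as sent on the wire
    entity = dechunk(wire)                   # transfer encoding removed
    print(entity[:2] == b"\x1f\x8b")         # still gzip: content encoding remains
    print(gzip.decompress(entity) == votable)
```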
I'm a little confused by Mark's quote from the SSA standard, since the
compress keyword seems to be duplicating the role of the
Accept-encoding header at the HTTP level. I'd agree that some overall
strategy that addresses all of the DAL interfaces would be desirable.
Personally I'd suggest that we recommend/require support for some
level of compression using the standard HTTP protocols and not add
anything to the DAL protocols themselves.
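For what it's worth, supporting this on the client side need not be much work. The following is a minimal sketch using only the Python standard library: the client advertises gzip in the Accept-Encoding request header and decompresses only when the server says it used gzip in Content-Encoding. The URL in the example is a placeholder, and build_request, decode_response and fetch are names invented for this sketch.

```python
import gzip
import urllib.request
from typing import Optional


def build_request(url: str) -> urllib.request.Request:
    """Create a request that tells the server gzip responses are acceptable."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})


def decode_response(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Undo the content encoding named in the response header, if any."""
    encoding = (content_encoding or "identity").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if encoding == "identity":
        return body
    raise ValueError("unsupported Content-Encoding: " + encoding)


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Fetch a URL and return the decoded body."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return decode_response(resp.read(), resp.headers.get("Content-Encoding"))


if __name__ == "__main__":
    # Placeholder endpoint; substitute a real TAP service URL.
    data = fetch("http://example.org/tap/tables")
    print(len(data), "bytes after decoding")
```

A server that does not support compression simply omits Content-Encoding and the client receives the body unchanged, so nothing in the DAL protocols themselves has to change.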
Tom
Douglas Tody wrote:
> Right - we distinguished between compression of the dataset itself and
> compression as used in the transport protocol. HTTP already supports
> the latter and ideally the client and server would both support stream
> compression. But of course it is optional (where we really need this is
> to speed up feeding large text VOTables back to the client). If
> security is the main issue it might be better to require an
> authenticated (HTTPS) connection. Or just limit the TAP implementation
> and client connection to data which could not be compromised by any
> amount of SQL trickery.
>
> - Doug
>
>
> On Mon, 4 Jul 2011, Mark Taylor wrote:
>
>> On Thu, 30 Jun 2011, Tom McGlynn wrote:
>>
>>> One solution that I had hoped might work was to use a GZIP transfer encoding
>>> (or content encoding) for the query results. Unfortunately it doesn't look
>>> like clients currently note the HTTP encoding headers.
>>>
>>> NASA is probably a bit more paranoid about this than some, but I suspect that
>>> this will become a more common issue as time goes on.
>>> Support for content or transfer encoding is an HTTP level issue so I don't
>>> think it requires any change to the TAP standard, just clients that look for
>>> the appropriate HTTP headers. Would it be reasonable to request that clients
>>> support gzip encoding? In addition to addressing this security issue I suspect
>>> this would generally substantially decrease the size of downloaded data and
>>> make our queries more responsive.
>>>
>>> Tom McGlynn
>>
>> FWIW, although TAP does not address this, the SSA standard
>> (PR-SSA-1.1-20110417) does discuss compression in section 7.3:
>>
>> 7.3 Data Compression
>>
>> If the query parameter COMPRESS is present then the service may return
>> a compressed dataset, using some standard compression technique such
>> as gzip, in place of a normal dataset, without indicating this in the
>> query response. Basically the client is indicating that it is prepared
>> to receive either compressed or uncompressed datasets and does not
>> care which is delivered (the service should pick whichever is more
>> efficient). This should be distinguished from protocol-level compression,
>> which is transparent to the client, and may occur at the level of the
>> HTTP protocol if both client and server support HTTP protocol compression.
>>
>> In case of an HTTP GET the keyword Content-Encoding informs the receiver
>> about the encoding of the output data, and should have a value such as
>> gzip. Note that the encoding is distinct from the MIME-type (Content-Type)
>> of the returned data object.
>>
>> the tone seems to suggest that Content-Encoding is something that
>> clients might (but not MUST) be expected to do as a matter of course.
>>
>> Probably DALI ought to say what the general assumption is for DAL
>> services about content- and/or transfer-encoding.
>>
>> Mark
>>
>> --
>> Mark Taylor Astronomical Programmer Physics, Bristol University, UK
>> m.b.taylor at bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/
>>