TAP1.0 Comments

Douglas Tody dtody at nrao.edu
Thu Jul 16 15:43:59 PDT 2009


Hi -

It would certainly be possible to add something like this to the GDS
query when we add this to PQL.  However, before doing so we should
think more carefully about whether this is really a good idea, and
have a clear model in mind.  In a general mechanism like GDS it is not
so simple.  For example, if we did a query and told the service to
return all the data links in extensions linked to the main query
response table, some of those data links might point to large images
or large instrumental datasets, and we would have a problem.  In other
scenarios where we query tables, if the output is not a table we cannot
easily chain queries together, using the output of one query as the
input of another.  As Markus also suggested, since there are issues
here that need more serious thought and we cannot afford to have TAP
1.0 drag on forever, I think this should be considered only for a
later version.
TAP tends to keep growing in scope and complexity which is a warning
sign for any project of this type.
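
To make the chaining point concrete, here is a minimal sketch in Python,
using an in-memory SQLite database as a stand-in for a TAP service.  The
table names (obs, data_links), the column names and the URLs are all
hypothetical and are not taken from any TAP or GDS draft.  It only shows
that when each step returns a single table, the output of one query can
feed the next, and the client can look at the sizes before following
any link.

```python
import sqlite3

# Hypothetical schema: "obs" stands in for a GDS-style index table and
# "data_links" for a table of links keyed on the same dataset id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE obs (dataset_id TEXT PRIMARY KEY, ra_deg REAL, dec_deg REAL);
    CREATE TABLE data_links (dataset_id TEXT, access_url TEXT, size_kb INTEGER);
    INSERT INTO obs VALUES
        ('d1', 10.0, -5.0), ('d2', 10.2, -5.1), ('d3', 200.0, 30.0);
    INSERT INTO data_links VALUES
        ('d1', 'http://example.org/d1.fits', 40),
        ('d2', 'http://example.org/d2.fits', 900000),
        ('d3', 'http://example.org/d3.fits', 12);
""")

# Step 1: a table query whose output is itself a single table of ids.
step1 = conn.execute(
    "SELECT dataset_id FROM obs "
    "WHERE ra_deg BETWEEN 9.5 AND 10.5 AND dec_deg BETWEEN -5.5 AND -4.5 "
    "ORDER BY dataset_id"
).fetchall()
ids = [row[0] for row in step1]

# Step 2: the ids from step 1 become the input of the next query.
placeholders = ",".join("?" for _ in ids)
links = conn.execute(
    "SELECT dataset_id, access_url, size_kb FROM data_links "
    f"WHERE dataset_id IN ({placeholders}) ORDER BY dataset_id",
    ids,
).fetchall()

# The client decides which links are small enough to follow directly.
small = [link for link in links if link[2] < 1000]
```

Here the second query never runs against datasets that the first query
did not select, and nothing large is transferred unless the client asks
for it.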

In the meantime it could be prototyped or provided with an aggregator
service and we could also try doing the same thing other ways,
e.g., as a series of smaller queries, and compare the approaches.
The Vizier table collection could be a very nice use case to try
out a prototype of the GDS query, and it could be interesting to get
GDS/Obs metadata describing the tables.
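
As a rough illustration of the "series of smaller queries" approach, the
sketch below builds one small ADQL cone query per table and merges the
answers into a three-column table (table name plus position).  It is
only a sketch under stated assumptions: the function names, the ra/dec
column names and the run_query callable are hypothetical, and a real
aggregator would submit each query to a TAP service rather than to the
canned stand-in used here.

```python
from typing import Callable, Iterable, List, Tuple

# (table_name, ra, dec): a minimal common schema shared by all tables.
Row = Tuple[str, float, float]


def cone_query(table: str, ra: float, dec: float, radius_deg: float) -> str:
    """Build one small ADQL cone query; the ra/dec column names are assumed."""
    return (
        f"SELECT ra, dec FROM {table} "
        f"WHERE 1=CONTAINS(POINT('ICRS', ra, dec), "
        f"CIRCLE('ICRS', {ra}, {dec}, {radius_deg}))"
    )


def aggregate(
    tables: Iterable[str],
    run_query: Callable[[str], List[Tuple[float, float]]],
    ra: float,
    dec: float,
    radius_deg: float,
) -> List[Row]:
    """Run one query per table and tag each returned row with its table name."""
    merged: List[Row] = []
    for table in tables:
        for row_ra, row_dec in run_query(cone_query(table, ra, dec, radius_deg)):
            merged.append((table, row_ra, row_dec))
    return merged


def fake_run_query(adql: str) -> List[Tuple[float, float]]:
    # Stand-in for a real TAP call: returns a canned row for one table only.
    return [(10.0, -5.0)] if " FROM cat_a " in adql else []


rows = aggregate(["cat_a", "cat_b"], fake_run_query, 10.0, -5.0, 0.01)
```

The merged rows tell the client which tables have data in the region, at
the cost of one request per table, which is the trade-off to compare
against a single GDS query.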

 	- Doug


On Thu, 16 Jul 2009, Francois Bonnarel wrote:

> I was also wondering about this single-table output issue, because I had use cases in
> mind where it could be a serious constraint (observation metadata in the broader sense).
> What would be the price to pay to remove this limitation?
>
> On the other hand, I understand from Doug's answers (which I already knew from private
> discussions) that if we want to keep this mandatory feature of TAP we have to move to a
> "several step query" mode...
> GDS will be the first step; the Data_links table will be the output of the
> second step.
> If it were possible to have these two tables in the same output, with a
> ref/key mechanism between them, it would avoid this.
>
> Regards
> François
>
> Francois Ochsenbein <francois at vizier.u-strasbg.fr> wrote:
>
> On Mon, 13 Jul 2009, Francois Ochsenbein wrote:
> First, the question of TAP result in a single *table* : Alberto's
> question is quite right, and I'm afraid the reduction of the
> result to a single table will generate problems for us (vizier)
> and likely for other services. Yes the relational model implies
> that the result of any query is a single table -- but sticking
> to this means that queries like "give me all objects from any
> table in this region of the sky" are not possible. Such questions
> however are quite frequent... How to deal with those ? I see
> only the following alternatives if TAP sticks to a single
> output table:
> a. the client asks for tables existing in the service;
>  upon the answer (7896 tables), the client generates
>  7896 queries. Not really realistic :-(
> b. the server creates some kind of minimal common schema
>  between all these tables -- in practice this can only be
>  the position and the table name (i.e. a 3 column table).
>  But then you have to get more details about each result,
>  details concerning data as well as metadata.
>  Therefore you still have to generate many 'child' queries.
>
> Or should services like vizier give up on TAP ?
>
> This is an important use case, but not really a conventional (relational)
> table access problem.  It is getting more into the domain of the other
> DAL services which have data models.  Some possible approaches:
>
>    o   For this specific case (find tables with data in some region) PQL
>        could be used since it has a data model.  For example, query
>        TAP_SCHEMA.tables with POS,SIZE or REGION specifying the region
>        of interest.  Other simple constraints could be specified as well.
>
>    o   More generally we could use the Generic Dataset (GDS) query.
>        The GDS (Observation) data model can describe any kind of
>        dataset, including tables (also images, spectra, etc.).  So if
>        Vizier provides a global index table based upon the GDS model
>        it could be queried with either PQL or ADQL in TAP.
>
>    o   A footprint service could also be used, although this is much
>        the same here as a GDS query using REGION.
>
> In both of these cases the response is a single table.  In the first
> case it contains TAP_SCHEMA.tables metadata.  In the second case it
> contains GDS metadata providing a richer description of the tables,
> with the possibility of data links pointing to either the table files
> (if small) or to services which can be used to access the data.
>
> Doug,
>
> What is the Generic Dataset (GDS) query ? Where is it described ?
> I couldn't find any note or document describing this...
>
> I can't see either how a footprint would solve the problem if you
> are looking in very small regions (e.g. a circle of 5 arcsec around
> a position) -- the only footprint I can imagine which could work is
> a union of all the positions contained in the original catalogs;
> otherwise I don't see how your "solutions" differ from my point b. ...
>
> 2.3.5: it looks strange to me that constraints can be ignored in PQL.
>      If a table is queried with just a constraint on TIME, and there
>      is no time in the table, the fact that this parameter is
>      ignored results in a dump of a (potentially very large) table.
>      Similarly for POS query (section 1.1.5) -- if the table
>      queried has no position, is it really a good solution to
>      return the whole table ? Hopefully this is not possible
>      with ADQL :-)
>
> Again, I think people misunderstand what was meant by this.  We should
> just remove this from PQL as it is specific to the semantics of SIA/SSA
> whereas PQL is a table query interface.  When querying an actual table
> the semantics need to be precise.  This is different from global data
> discovery in SIA or whatever where the same query is posed to many
> services, each of which may provide a different subset of metadata.
> Precise queries cannot easily be used in such a case, rather we need an
> iterative query which is what the S*AP interfaces provide.
> ===> I was talking of the TAP document where this is written.
>     Should this remark therefore also be dropped from the TAP document ?
>
> 2.3.8: MTIME -- I still have problems with this. A service may have
>     some tables which have such timestamp columns (typically
>     TAP_SCHEMA tables) while other tables do not have this information.
>     I can't therefore see this feature as a service-wide feature,
>     and the MTIME capability would need to be specified in
>     the TAP_SCHEMA (section 2.6.2)
> MTIME is supposed to be a parameter query, hence it need not specify
> how update/delete/add metadata is maintained internally.
> ===> ... but at least it would be important to know for which tables
>     (none, some, or all of them) this parameter can be meaningful ?
>
> --Francois
> =======================================================================
> Francois Ochsenbein    ------   Observatoire Astronomique de Strasbourg
>   11, rue de l'Universite 67000 STRASBOURG  Phone: +33-(0)390 24 24 29
> Email: francois at astro.u-strasbg.fr (France)    Fax: +33-(0)390 24 24 17
> =======================================================================
>
>
>
> =====================================================================
> Francois   Bonnarel               Observatoire Astronomique de Strasbourg
> CDS (Centre de donnees          11, rue de l'Universite
> astronomiques de Strasbourg)    F--67000 Strasbourg (France)
>
> Tel: +33-(0)3 90 24 24 11       WWW: http://cdsweb.u-strasbg.fr/people/fb.html
> Fax: +33-(0)3 90 24 24 25       E-mail: bonnarel at astro.u-strasbg.fr
> ---------------------------------------------------------------------
>