Table names, VOSI vs. TAP_SCHEMA

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Mon May 4 13:44:42 CEST 2015


Gilles,

On Mon, May 04, 2015 at 11:01:03AM +0200, gilles landais wrote:
> However, this ambiguity could be avoided with removing the schema name of
> the table_name (my implementation which concatenate the schema_name and the
> table_name  was not a good idea).

Well, it was not a good idea because what came out is not a valid
ADQL table reference.  Other than that, I think the TAP author's
solution to the underlying problem (enumerate all tables that can be
referenced in queries to this service) was sound, and splitting
things up is asking for trouble in the presence of DBs running
primarily on catalogs rather than schemas and other DBs that do
namespacing tricks to hide schema names from users.

Hence I believe that VODataservice -- your description:

> The XML tapschema  describes the schema in a XML node containing the schema
> name and the tables of the schema. So it is not needed to give the schema
> name again in the table_name.
> 
> <schema>
> <name>viz4</name>
>      ....
> <table type="base_table">
> <name>B/avo.rad/catalog</name>
> ....
>      </table>
> ....
> </schema>

-- is asking for trouble.  I (and, for dramatic effect, I'm now
putting on my Registry chair hat) believe the VODataService must be
fixed here, both for proper support of delimited identifiers and for
effortless referencing of tables regardless of naming schemes.

> Note: the subdivision in VizieR (viz2,viz3,vizA,...) is not very clear and
> is the result of technicals subdivision without any logic for
> astronomers..(i could update that..)

I'm sure everyone would like that.  That's already one example of why
we should have ready-made names in VODataService's table/name: If you
put in an empty string to schema in that situation, are clients
expected  to construct table names as ".tablename" or just keep it
"tablename"?                           ^ that's the empty schema name

Cheers,

       Markus



> 
> In any case, i would like to improve the TAPVizieR service and so, i am open
> to any proposals.
> 
> Gilles Landais
> 
> 
> 
> 
> On 27/04/2015 17:57, dal-request at ivoa.net wrote:
> >Send dal mailing list submissions to
> >	dal at ivoa.net
> >
> >To subscribe or unsubscribe via the World Wide Web, visit
> >	http://mail.ivoa.net/mailman/listinfo/dal
> >or, via email, send a message with subject or body 'help' to
> >	dal-request at ivoa.net
> >
> >You can reach the person managing the list at
> >	dal-owner at ivoa.net
> >
> >When replying, please edit your Subject line so it is more specific
> >than "Re: Contents of dal digest..."
> >
> >
> >Today's Topics:
> >
> >    1. Re: What does TOP mean ? (Arnold Rots)
> >    2. table_name syntax (Mark Taylor)
> >    3. Re: What does TOP mean ? (Walter Landry)
> >    4. Re: What does TOP mean ? (Markus Demleitner)
> >    5. Re: What does TOP mean ? (Arnold Rots)
> >    6. Re: What does TOP mean ? (Douglas Tody)
> >
> >
> >----------------------------------------------------------------------
> >
> >Message: 1
> >Date: Mon, 27 Apr 2015 09:43:46 -0400
> >From: Arnold Rots <arots at cfa.harvard.edu>
> >To: DAL mailing list <dal at ivoa.net>
> >Subject: Re: What does TOP mean ?
> >Message-ID:
> >	<CAJXToE_roWrKwCDyC9cBgQxcdThLrUvoQkqvikhp2wC5B9sU6A at mail.gmail.com>
> >Content-Type: text/plain; charset="utf-8"
> >
> >Yes, you need an observatory-specific model to calculate a score.
> >By its very nature, therefore, it will only be a relative score.
> >Example:
> >For Chandra we implemented a score that is based on instrument, exposure
> >time, and off-axis angle
> >(PSF degrades with increasing off-axis angle).
> >
> >   - Arnold
> >
> >-------------------------------------------------------------------------------------------------------------
> >Arnold H. Rots                                          Chandra X-ray
> >Science Center
> >Smithsonian Astrophysical Observatory                   tel:  +1 617 496
> >7701
> >60 Garden Street, MS 67                                      fax:  +1 617
> >495 7356
> >Cambridge, MA 02138
> >arots at cfa.harvard.edu
> >USA
> >http://hea-www.harvard.edu/~arots/
> >--------------------------------------------------------------------------------------------------------------
> >
> >
> >On Mon, Apr 27, 2015 at 3:25 AM, Markus Demleitner <
> >msdemlei at ari.uni-heidelberg.de> wrote:
> >
> >>Hi Petr, hi SSA crowd,
> >>
> >>On Sat, Apr 25, 2015 at 01:38:26AM +0200, Petr Skoda wrote:
> >>>And does anybody implemented TOP in scoring manner in SSAP ?
> >>Well, DaCHS accepts TOP but treats it equivalently to MAXREC so far
> >>(which I claim is legal, as that simply means I give all matching
> >>results the same score).
> >>
> >>The basic difficulty in doing a meaningful implementation is that you
> >>need to compute a score, and this kind of scoring in general is
> >>either difficult (because to do it properly you need a statistical
> >>model of both the data and the user) or haphazard (involving
> >>combining more or less ad-hoc measures for how good a match is for a
> >>certain constraint weighted... somehow).
> >>
> >>I think it would be a moderately worthwhile effort to create a model
> >>of either kind for SSAP's parameter set and validate it with user
> >>studies.  Has anyone perhaps already started on such a thing?
> >>
> >>My suspicion is that this model could work well across a multitude of
> >>services and domains with no or just a single parameter.  Be that as
> >>it may, I'm afraid I don't see GAVO tackling something like that any
> >>time soon.
> >>
> >>Cheers,
> >>
> >>         Markus
> >>
> >>
> >-------------- next part --------------
> >An HTML attachment was scrubbed...
> >URL: <http://mail.ivoa.net/pipermail/dal/attachments/20150427/d6f1339f/attachment-0001.html>
> >
> >------------------------------
> >
> >Message: 2
> >Date: Mon, 27 Apr 2015 15:00:31 +0100 (BST)
> >From: Mark Taylor <m.b.taylor at bristol.ac.uk>
> >To: dal at ivoa.net
> >Subject: table_name syntax
> >Message-ID:
> >	<alpine.LRH.2.11.1504271454250.9069 at andromeda.star.bris.ac.uk>
> >Content-Type: TEXT/PLAIN; charset=US-ASCII
> >
> >Hi DAL,
> >
> >I have an arcane query about how to represent and interpret TAP
> >service table names that appear in
> >    (a) the table_name column of the TAP_SCHEMA.tables table
> >    (b) tableset/schema/table/name elements in a TableSet document
> >(I'm assuming the answer is the same for both unless somebody says
> >different)
> >
> >TAP v1.0 sec 2.6.2 says:
> >
> >    The value of the table_name should be the string that is
> >    recommended for use in querying the table; it may or may not be
> >    qualified by schema and catalog name(s) depending on the implementation
> >    requirements.  The fully qualified table name is defined by the
> >    ADQL language and follows the pattern [[catalog.]schema.]table.
> >
> >My question is: if the catalog, schema or table parts of the table_name
> >do not match ADQL's identifier syntax, must they be quoted as delimited
> >identifiers?
> >
> >For many TAP services this is probably not an issue, but it sure is
> >for TAPVizieR, where table names usually contain the "/" character
> >and sometimes other non-identifier characters too.  For instance,
> >(http://tapvizier.u-strasbg.fr/beta/TAPVizieR/tap):
> >
> >    select top 3 schema_name, table_name from tap_schema.tables
> >           where schema_name='viz4'
> >
> >gives
> >
> >    +-------------+-------------------+
> >    | schema_name | table_name        |
> >    +-------------+-------------------+
> >    | viz4        | B/avo.rad/catalog |
> >    | viz4        | B/avo.rad/wsrt    |
> >    | viz4        | B/bax/bax         |
> >    +-------------+-------------------+
> >
> >Should it instead give
> >
> >    +-------------+---------------------+
> >    | schema_name | table_name          |
> >    +-------------+---------------------+
> >    | viz4        | "B/avo.rad/catalog" |
> >    | viz4        | "B/avo.rad/wsrt"    |
> >    | viz4        | "B/bax/bax"         |
> >    +-------------+---------------------+
> >?
> >
> >I initially thought the answer to this was no.
> >
> >But if that's the case, how do I tell[*] whether the
> >first entry in the result above is
> >table "B/avo.rad/catalog" from an unnamed schema (which it is) or
> >table "rad/catalog" from schema "B/avo" (which it's not).
> >
> >If the answer is yes, then (a) TAPVizier and possibly some other
> >services will need to change their content to comply, and
> >(b) what is the rule for other TAP_SCHEMA columns like
> >schema_name and column_name (and others?)?  Quoting these columns
> >would be unnecessary (since there is no possibility of delimited
> >parts in this case), but it would seem inconsistent to require
> >quoting for some of these metadata columns and not others;
> >at least it should be documented explicitly.
> >
> >Thanks!
> >
> >Mark
> >
> >[*] the ugly hack answer is obviously: see if the apparent schema
> >is the same as the schema_name column.  This would give you an
> >almost-certainly-correct indication of what you're looking at,
> >but it's fiddly, inelegant and not bulletproof.
> >
> >--
> >Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
> >m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/
> >
> >
> >------------------------------
> >
> >Message: 3
> >Date: Mon, 27 Apr 2015 07:29:08 -0700 (PDT)
> >From: Walter Landry <wlandry at caltech.edu>
> >To: dal at ivoa.net
> >Subject: Re: What does TOP mean ?
> >Message-ID: <20150427.072908.190108314094379184.wlandry at caltech.edu>
> >Content-Type: Text/Plain; charset=us-ascii
> >
> >Markus Demleitner <msdemlei at ari.uni-heidelberg.de> wrote:
> >>Hi Petr, hi SSA crowd,
> >>
> >>On Sat, Apr 25, 2015 at 01:38:26AM +0200, Petr Skoda wrote:
> >>>And does anybody implemented TOP in scoring manner in SSAP ?
> >>Well, DaCHS accepts TOP but treats it equivalently to MAXREC so far
> >>(which I claim is legal, as that simply means I give all matching
> >>results the same score).
> >I thought that the difference between MAXREC and TOP is that MAXREC
> >requires an overflow indicator, while TOP would prohibit it.
> >
> >Otherwise, I agree.  We also treat them the same.
> >
> >Cheers,
> >Walter Landry
> >
> >
> >------------------------------
> >
> >Message: 4
> >Date: Mon, 27 Apr 2015 17:03:14 +0200
> >From: Markus Demleitner <msdemlei at ari.uni-heidelberg.de>
> >To: dal at ivoa.net
> >Subject: Re: What does TOP mean ?
> >Message-ID: <20150427150314.GA23328 at ari.uni-heidelberg.de>
> >Content-Type: text/plain; charset=utf-8
> >
> >Hi Arnold,
> >
> >On Mon, Apr 27, 2015 at 09:43:46AM -0400, Arnold Rots wrote:
> >>Yes, you need an observatory-specific model to calculate a score.
> >>By its very nature, therefore, it will only be a relative score.
> >>Example:
> >>For Chandra we implemented a score that is based on instrument, exposure
> >>time, and off-axis angle
> >>(PSF degrades with increasing off-axis angle).
> >Well, yes, that could be part of it, but the way I understand
> >things --
> >
> >   [...] the general idea is that the better a candidate dataset
> >   matches the query, the higher the score it receives.  (1.1, P.27)
> >
> >-- TOP's intended function is essentially like Google's ranking: it
> >gives "how well" a given returned row matches a data set.  Hence, for
> >a given dataset score would be different for different queries, and
> >while a global quality measure might play a role, it certainly
> >wouldn't be expected to dominate the response.  Or am I completely
> >off here?
> >
> >And regarding Walter's interjection:
> >
> >>I thought that the difference between MAXREC and TOP is that MAXREC
> >>requires an overflow indicator, while TOP would prohibit it.
> >Interesting thought -- is it intended to work this way? [it doesn't
> >in DaCHS, and a quick search in the 1.1 specs didn't give me anything
> >pointing in that direction]
> >
> >Cheers,
> >
> >         Markus
> >
> >
> >
> >------------------------------
> >
> >Message: 5
> >Date: Mon, 27 Apr 2015 11:37:48 -0400
> >From: Arnold Rots <arots at cfa.harvard.edu>
> >To: DAL mailing list <dal at ivoa.net>
> >Subject: Re: What does TOP mean ?
> >Message-ID:
> >	<CAJXToE-fKk5XcUe94LmyDu37vFf0fo3-WotqdtEfpz-vy01PQg at mail.gmail.com>
> >Content-Type: text/plain; charset="utf-8"
> >
> >No, it's not global, but applied to the datasets that satisfy the query.
> >Though instrument and exposure time do not change, the off-axis
> >parameter does. In the case of a cone search, for instance, it is
> >the angle between the center of the cone and the direction of the
> >optical axis.
> >
> >   - Arnold
> >
> >-------------------------------------------------------------------------------------------------------------
> >Arnold H. Rots                                          Chandra X-ray
> >Science Center
> >Smithsonian Astrophysical Observatory                   tel:  +1 617 496
> >7701
> >60 Garden Street, MS 67                                      fax:  +1 617
> >495 7356
> >Cambridge, MA 02138
> >arots at cfa.harvard.edu
> >USA
> >http://hea-www.harvard.edu/~arots/
> >--------------------------------------------------------------------------------------------------------------
> >
> >
> >On Mon, Apr 27, 2015 at 11:03 AM, Markus Demleitner <
> >msdemlei at ari.uni-heidelberg.de> wrote:
> >
> >>Hi Arnold,
> >>
> >>On Mon, Apr 27, 2015 at 09:43:46AM -0400, Arnold Rots wrote:
> >>>Yes, you need an observatory-specific model to calculate a score.
> >>>By its very nature, therefore, it will only be a relative score.
> >>>Example:
> >>>For Chandra we implemented a score that is based on instrument, exposure
> >>>time, and off-axis angle
> >>>(PSF degrades with increasing off-axis angle).
> >>Well, yes, that could be part of it, but the way I understand
> >>things --
> >>
> >>   [...] the general idea is that the better a candidate dataset
> >>   matches the query, the higher the score it receives.  (1.1, P.27)
> >>
> >>-- TOP's intended function is essentially like Google's ranking: it
> >>gives "how well" a given returned row matches a data set.  Hence, for
> >>a given dataset score would be different for different queries, and
> >>while a global quality measure might play a role, it certainly
> >>wouldn't be expected to dominate the response.  Or am I completely
> >>off here?
> >>
> >>And regarding Walter's interjection:
> >>
> >>>I thought that the difference between MAXREC and TOP is that MAXREC
> >>>requires an overflow indicator, while TOP would prohibit it.
> >>Interesting thought -- is it intended to work this way? [it doesn't
> >>in DaCHS, and a quick search in the 1.1 specs didn't give me anything
> >>pointing in that direction]
> >>
> >>Cheers,
> >>
> >>         Markus
> >>
> >>
> >-------------- next part --------------
> >An HTML attachment was scrubbed...
> >URL: <http://mail.ivoa.net/pipermail/dal/attachments/20150427/d07e2f2d/attachment-0001.html>
> >
> >------------------------------
> >
> >Message: 6
> >Date: Mon, 27 Apr 2015 09:56:50 -0600 (MDT)
> >From: Douglas Tody <dtody at nrao.edu>
> >To: Markus Demleitner <msdemlei at ari.uni-heidelberg.de>
> >Cc: DAL mailing list <dal at ivoa.net>
> >Subject: Re: What does TOP mean ?
> >Message-ID: <alpine.OSX.1.00.1504270948250.517 at colorado2.tuc.noao.edu>
> >Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
> >
> >The algorithm used to compute the score for a dataset is up to the
> >data center and service, but can include both a best-match term and a
> >quality term (clearly best-match would be more important for an
> >individual query).
> >
> >The main difference between MAXREC and TOP is scoring and sort by score.
> >Usually if TOP is specified MAXREC would not matter as it is likely to
> >be much larger, but MAXREC if specified should still be in effect and
> >could result in an overflow indication.
> >
> >A service could treat TOP the same as MAXREC (all records having the
> >same score), however computing a useful scoring would be preferable.
> >
> >Finally, Google made billions developing just such an algorithm!  It is
> >what made their search service stand out from the others, in the early
> >days.
> >
> >  	- Doug
> >
> >
> >On Mon, 27 Apr 2015, Markus Demleitner wrote:
> >
> >>Hi Arnold,
> >>
> >>On Mon, Apr 27, 2015 at 09:43:46AM -0400, Arnold Rots wrote:
> >>>Yes, you need an observatory-specific model to calculate a score.
> >>>By its very nature, therefore, it will only be a relative score.
> >>>Example:
> >>>For Chandra we implemented a score that is based on instrument, exposure
> >>>time, and off-axis angle
> >>>(PSF degrades with increasing off-axis angle).
> >>Well, yes, that could be part of it, but the way I understand
> >>things --
> >>
> >>  [...] the general idea is that the better a candidate dataset
> >>  matches the query, the higher the score it receives.  (1.1, P.27)
> >>
> >>-- TOP's intended function is essentially like Google's ranking: it
> >>gives "how well" a given returned row matches a data set.  Hence, for
> >>a given dataset score would be different for different queries, and
> >>while a global quality measure might play a role, it certainly
> >>wouldn't be expected to dominate the response.  Or am I completely
> >>off here?
> >>
> >>And regarding Walter's interjection:
> >>
> >>>I thought that the difference between MAXREC and TOP is that MAXREC
> >>>requires an overflow indicator, while TOP would prohibit it.
> >>Interesting thought -- is it intended to work this way? [it doesn't
> >>in DaCHS, and a quick search in the 1.1 specs didn't give me anything
> >>pointing in that direction]
> >>
> >>Cheers,
> >>
> >>        Markus
> >>
> >
> >------------------------------
> >
> >_______________________________________________
> >dal mailing list
> >dal at ivoa.net
> >http://mail.ivoa.net/mailman/listinfo/dal
> >
> >End of dal Digest, Vol 67, Issue 10
> >***********************************
> 

-- 
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?



More information about the dal mailing list