dal Digest, Vol 67, Issue 10

Tue May 5 13:34:59 CEST 2015

Gilles,

in the immediate future, I would really like to see TAPVizier
providing VOSI and TAP_SCHEMA table metadata corresponding to the 
"identifiers-usable-without-further-modification-in-ADQL" form
that the recent discussion has clarified was the original intent
of TAP.

So both the ADQL query:

   SELECT table_name FROM tap_schema.tables WHERE table_name LIKE '%B/avo.rad/catalog%'

and the VOSI endpoint:

   curl -s 'http://tapvizier.u-strasbg.fr/TAPVizieR/tap/tables' | grep B/avo.rad/catalog

should give you the value:

   "B/avo.rad/catalog"

(including those double quotes).  Currently the ADQL query returns
the unquoted name and the VOSI one gives you the unquoted name
prefixed with the schema name viz4 (that prefixing is not illegal,
but it's unnecessary).  Column names should also be quoted
*where necessary*, i.e. when they contain strange characters
(don't match the ADQL identifier pattern [a-zA-Z][a-zA-Z0-9_]*),
but not otherwise.

As I understand it, that would just be conforming to the original
intent of the TAP standard, i.e. fixing current incorrect usage
in TAPVizier.

If that can be done, I can write and release a version of topcat
that works with TAPVizier without special casing.
Probably(?) that's not a difficult change to make to the service,
but it may have implications for clients that work with it in
its current form (TapHandle?)

(I'd like all other TAP services to do the same thing too, but
TAPVizier is the main one that has huge numbers of strangely-named
tables and columns).

Mark

On Mon, 4 May 2015, gilles landais wrote:

> Concerning the usage of quotes around the VizieR funny names:
> 
> 
> Curenly TAPVizieR provides a XML output (see url
> http://tapvizier.u-strasbg.fr/TAPVizieR/tap/tables) without quotes but
> completed with the schema (ex: table_name=vizls.II/246/out).
> The main difficulties for clients to work with TAPVizieR are the VizieR "funny
> names" (tables, columns in VizieR contains special characters like "/", "." ,
> ...)
> 
> Dave Morris suggested translation function as: (i'm sceptical with this kind
> of function which are not user-friendly)
> select * from translate("II/246/out")
> 
> Mark Taylor talked about some ambiguities when column, tables contains some
> "." (ex: vizb.B/avo.rad/catalog) because it is difficult/impossible to
> separate the shema_name and the table_name.
> TAPVizieR could provide the XML TAP schema with quotes (as suggested by
> M.Taylor).
> This is may be not an ideal solution, but it is a solution for this ambiguity
> and it could simplify the clients implementation when they have to work with
> the VizieR funny names..
> If we choose this way, to have an explicit specification (as proposed by Mark
> Taylor) in the TAP documentation would be appreciated.
> 
> However, this ambiguity could be avoided with removing the schema name of the
> table_name (my implementation which concatenate the schema_name and the
> table_name  was not a good idea).
> The XML tapschema  describes the schema in a XML node containing the schema
> name and the tables of the schema. So it is not needed to give the schema name
> again in the table_name.
> 
> <schema>
> <name>viz4</name>
>      ....
> <table type="base_table">
> <name>B/avo.rad/catalog</name>
> ....
>      </table>
> ....
> </schema>
> 
> Note: the subdivision in VizieR (viz2,viz3,vizA,...) is not very clear and is
> the result of technicals subdivision without any logic for astronomers..(i
> could update that..)
> 
> In any case, i would like to improve the TAPVizieR service and so, i am open
> to any proposals.
> 
> Gilles Landais
> 
> 
> 
> 
> On 27/04/2015 17:57, dal-request at ivoa.net wrote:
> > Send dal mailing list submissions to
> >  dal at ivoa.net
> >
> > To subscribe or unsubscribe via the World Wide Web, visit
> > 	http://mail.ivoa.net/mailman/listinfo/dal
> > or, via email, send a message with subject or body 'help' to
> >  dal-request at ivoa.net
> >
> > You can reach the person managing the list at
> >  dal-owner at ivoa.net
> >
> > When replying, please edit your Subject line so it is more specific
> > than "Re: Contents of dal digest..."
> >
> >
> > Today's Topics:
> >
> >     1. Re: What does TOP mean ? (Arnold Rots)
> >     2. table_name syntax (Mark Taylor)
> >     3. Re: What does TOP mean ? (Walter Landry)
> >     4. Re: What does TOP mean ? (Markus Demleitner)
> >     5. Re: What does TOP mean ? (Arnold Rots)
> >     6. Re: What does TOP mean ? (Douglas Tody)
> >
> >
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Mon, 27 Apr 2015 09:43:46 -0400
> > From: Arnold Rots <arots at cfa.harvard.edu>
> > To: DAL mailing list <dal at ivoa.net>
> > Subject: Re: What does TOP mean ?
> > Message-ID:
> > 	<CAJXToE_roWrKwCDyC9cBgQxcdThLrUvoQkqvikhp2wC5B9sU6A at mail.gmail.com>
> > Content-Type: text/plain; charset="utf-8"
> >
> > Yes, you need an observatory-specific model to calculate a score.
> > By its very nature, therefore, it will only be a relative score.
> > Example:
> > For Chandra we implemented a score that is based on instrument, exposure
> > time, and off-axis angle
> > (PSF degrades with increasing off-axis angle).
> >
> >    - Arnold
> >
> > -------------------------------------------------------------------------------------------------------------
> > Arnold H. Rots                                          Chandra X-ray
> > Science Center
> > Smithsonian Astrophysical Observatory                   tel:  +1 617 496
> > 7701
> > 60 Garden Street, MS 67                                      fax:  +1 617
> > 495 7356
> > Cambridge, MA 02138
> > arots at cfa.harvard.edu
> > USA
> > http://hea-www.harvard.edu/~arots/
> > --------------------------------------------------------------------------------------------------------------
> >
> >
> > On Mon, Apr 27, 2015 at 3:25 AM, Markus Demleitner <
> > msdemlei at ari.uni-heidelberg.de> wrote:
> >
> > > Hi Petr, hi SSA crowd,
> > >
> > > On Sat, Apr 25, 2015 at 01:38:26AM +0200, Petr Skoda wrote:
> > > > And does anybody implemented TOP in scoring manner in SSAP ?
> > > Well, DaCHS accepts TOP but treats it equivalently to MAXREC so far
> > > (which I claim is legal, as that simply means I give all matching
> > > results the same score).
> > >
> > > The basic difficulty in doing a meaningful implementation is that you
> > > need to compute a score, and this kind of scoring in general is
> > > either difficult (because to do it properly you need a statistical
> > > model of both the data and the user) or haphazard (involving
> > > combining more or less ad-hoc measures for how good a match is for a
> > > certain constraint weighted... somehow).
> > >
> > > I think it would be a moderately worthwhile effort to create a model
> > > of either kind for SSAP's parameter set and validate it with user
> > > studies.  Has anyone perhaps already started on such a thing?
> > >
> > > My suspicion is that this model could work well across a multitude of
> > > services and domains with no or just a single parameter.  Be that as
> > > it may, I'm afraid I don't see GAVO tackling something like that any
> > > time soon.
> > >
> > > Cheers,
> > >
> > >          Markus
> > >
> > >
> > -------------- next part --------------
> > An HTML attachment was scrubbed...
> > URL:
> > <http://mail.ivoa.net/pipermail/dal/attachments/20150427/d6f1339f/attachment-0001.html>
> >
> > ------------------------------
> >
> > Message: 2
> > Date: Mon, 27 Apr 2015 15:00:31 +0100 (BST)
> > From: Mark Taylor <m.b.taylor at bristol.ac.uk>
> > To: dal at ivoa.net
> > Subject: table_name syntax
> > Message-ID:
> > 	<alpine.LRH.2.11.1504271454250.9069 at andromeda.star.bris.ac.uk>
> > Content-Type: TEXT/PLAIN; charset=US-ASCII
> >
> > Hi DAL,
> >
> > I have an arcane query about how to represent and interpret TAP
> > service table names that appear in
> >     (a) the table_name column of the TAP_SCHEMA.tables table
> >     (b) tableset/schema/table/name elements in a TableSet document
> > (I'm assuming the answer is the same for both unless somebody says
> > different)
> >
> > TAP v1.0 sec 2.6.2 says:
> >
> >     The value of the table_name should be the string that is
> >     recommended for use in querying the table; it may or may not be
> >     qualified by schema and catalog name(s) depending on the implementation
> >     requirements.  The fully qualified table name is defined by the
> >     ADQL language and follows the pattern [[catalog.]schema.]table.
> >
> > My question is: if the catalog, schema or table parts of the table_name
> > do not match ADQL's identifier syntax, must they be quoted as delimited
> > identifiers?
> >
> > For many TAP services this is probably not an issue, but it sure is
> > for TAPVizieR, where table names usually contain the "/" character
> > and sometimes other non-identifier characters too.  For instance,
> > (http://tapvizier.u-strasbg.fr/beta/TAPVizieR/tap):
> >
> >     select top 3 schema_name, table_name from tap_schema.tables
> >            where schema_name='viz4'
> >
> > gives
> >
> >     +-------------+-------------------+
> > | schema_name | table_name        |
> >     +-------------+-------------------+
> > | viz4        | B/avo.rad/catalog |
> > | viz4        | B/avo.rad/wsrt    |
> > | viz4        | B/bax/bax         |
> >     +-------------+-------------------+
> >
> > Should it instead give
> >
> >     +-------------+---------------------+
> > | schema_name | table_name          |
> >     +-------------+---------------------+
> > | viz4        | "B/avo.rad/catalog" |
> > | viz4        | "B/avo.rad/wsrt"    |
> > | viz4        | "B/bax/bax"         |
> >     +-------------+---------------------+
> > ?
> >
> > I initially thought the answer to this was no.
> >
> > But if that's the case, how do I tell[*] whether the
> > first entry in the result above is
> > table "B/avo.rad/catalog" from an unnamed schema (which it is) or
> > table "rad/catalog" from schema "B/avo" (which it's not).
> >
> > If the answer is yes, then (a) TAPVizier and possibly some other
> > services will need to change their content to comply, and
> > (b) what is the rule for other TAP_SCHEMA columns like
> > schema_name and column_name (and others?)?  Quoting these columns
> > would be unnecessary (since there is no possibility of delimited
> > parts in this case), but it would seem inconsistent to require
> > quoting for some of these metadata columns and not others;
> > at least it should be documented explicitly.
> >
> > Thanks!
> >
> > Mark
> >
> > [*] the ugly hack answer is obviously: see if the apparent schema
> > is the same as the schema_name column.  This would give you an
> > almost-certainly-correct indication of what you're looking at,
> > but it's fiddly, inelegant and not bulletproof.
> >
> > --
> > Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
> > m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/
> >
> >
> > ------------------------------
> >
> > Message: 3
> > Date: Mon, 27 Apr 2015 07:29:08 -0700 (PDT)
> > From: Walter Landry <wlandry at caltech.edu>
> > To: dal at ivoa.net
> > Subject: Re: What does TOP mean ?
> > Message-ID: <20150427.072908.190108314094379184.wlandry at caltech.edu>
> > Content-Type: Text/Plain; charset=us-ascii
> >
> > Markus Demleitner <msdemlei at ari.uni-heidelberg.de> wrote:
> > > Hi Petr, hi SSA crowd,
> > >
> > > On Sat, Apr 25, 2015 at 01:38:26AM +0200, Petr Skoda wrote:
> > > > And does anybody implemented TOP in scoring manner in SSAP ?
> > > Well, DaCHS accepts TOP but treats it equivalently to MAXREC so far
> > > (which I claim is legal, as that simply means I give all matching
> > > results the same score).
> > I thought that the difference between MAXREC and TOP is that MAXREC
> > requires an overflow indicator, while TOP would prohibit it.
> >
> > Otherwise, I agree.  We also treat them the same.
> >
> > Cheers,
> > Walter Landry
> >
> >
> > ------------------------------
> >
> > Message: 4
> > Date: Mon, 27 Apr 2015 17:03:14 +0200
> > From: Markus Demleitner <msdemlei at ari.uni-heidelberg.de>
> > To: dal at ivoa.net
> > Subject: Re: What does TOP mean ?
> > Message-ID: <20150427150314.GA23328 at ari.uni-heidelberg.de>
> > Content-Type: text/plain; charset=utf-8
> >
> > Hi Arnold,
> >
> > On Mon, Apr 27, 2015 at 09:43:46AM -0400, Arnold Rots wrote:
> > > Yes, you need an observatory-specific model to calculate a score.
> > > By its very nature, therefore, it will only be a relative score.
> > > Example:
> > > For Chandra we implemented a score that is based on instrument, exposure
> > > time, and off-axis angle
> > > (PSF degrades with increasing off-axis angle).
> > Well, yes, that could be part of it, but the way I understand
> > things --
> >
> >    [...] the general idea is that the better a candidate dataset
> >    matches the query, the higher the score it receives.  (1.1, P.27)
> >
> > -- TOP's intended function is essentially like Google's ranking: it
> > gives "how well" a given returned row matches a data set.  Hence, for
> > a given dataset score would be different for different queries, and
> > while a global quality measure might play a role, it certainly
> > wouldn't be expected to dominate the response.  Or am I completely
> > off here?
> >
> > And regarding Walter's interjection:
> >
> > > I thought that the difference between MAXREC and TOP is that MAXREC
> > > requires an overflow indicator, while TOP would prohibit it.
> > Interesting thought -- is it intended to work this way? [it doesn't
> > in DaCHS, and a quick search in the 1.1 specs didn't give me anything
> > pointing in that direction]
> >
> > Cheers,
> >
> >          Markus
> >
> >
> >
> > ------------------------------
> >
> > Message: 5
> > Date: Mon, 27 Apr 2015 11:37:48 -0400
> > From: Arnold Rots <arots at cfa.harvard.edu>
> > To: DAL mailing list <dal at ivoa.net>
> > Subject: Re: What does TOP mean ?
> > Message-ID:
> > 	<CAJXToE-fKk5XcUe94LmyDu37vFf0fo3-WotqdtEfpz-vy01PQg at mail.gmail.com>
> > Content-Type: text/plain; charset="utf-8"
> >
> > No, it's not global, but applied to the datasets that satisfy the query.
> > Though instrument and exposure time do not change, the off-axis
> > parameter does. In the case of a cone search, for instance, it is
> > the angle between the center of the cone and the direction of the
> > optical axis.
> >
> >    - Arnold
> >
> > -------------------------------------------------------------------------------------------------------------
> > Arnold H. Rots                                          Chandra X-ray
> > Science Center
> > Smithsonian Astrophysical Observatory                   tel:  +1 617 496
> > 7701
> > 60 Garden Street, MS 67                                      fax:  +1 617
> > 495 7356
> > Cambridge, MA 02138
> > arots at cfa.harvard.edu
> > USA
> > http://hea-www.harvard.edu/~arots/
> > --------------------------------------------------------------------------------------------------------------
> >
> >
> > On Mon, Apr 27, 2015 at 11:03 AM, Markus Demleitner <
> > msdemlei at ari.uni-heidelberg.de> wrote:
> >
> > > Hi Arnold,
> > >
> > > On Mon, Apr 27, 2015 at 09:43:46AM -0400, Arnold Rots wrote:
> > > > Yes, you need an observatory-specific model to calculate a score.
> > > > By its very nature, therefore, it will only be a relative score.
> > > > Example:
> > > > For Chandra we implemented a score that is based on instrument, exposure
> > > > time, and off-axis angle
> > > > (PSF degrades with increasing off-axis angle).
> > > Well, yes, that could be part of it, but the way I understand
> > > things --
> > >
> > >    [...] the general idea is that the better a candidate dataset
> > >    matches the query, the higher the score it receives.  (1.1, P.27)
> > >
> > > -- TOP's intended function is essentially like Google's ranking: it
> > > gives "how well" a given returned row matches a data set.  Hence, for
> > > a given dataset score would be different for different queries, and
> > > while a global quality measure might play a role, it certainly
> > > wouldn't be expected to dominate the response.  Or am I completely
> > > off here?
> > >
> > > And regarding Walter's interjection:
> > >
> > > > I thought that the difference between MAXREC and TOP is that MAXREC
> > > > requires an overflow indicator, while TOP would prohibit it.
> > > Interesting thought -- is it intended to work this way? [it doesn't
> > > in DaCHS, and a quick search in the 1.1 specs didn't give me anything
> > > pointing in that direction]
> > >
> > > Cheers,
> > >
> > >          Markus
> > >
> > >
> > -------------- next part --------------
> > An HTML attachment was scrubbed...
> > URL:
> > <http://mail.ivoa.net/pipermail/dal/attachments/20150427/d07e2f2d/attachment-0001.html>
> >
> > ------------------------------
> >
> > Message: 6
> > Date: Mon, 27 Apr 2015 09:56:50 -0600 (MDT)
> > From: Douglas Tody <dtody at nrao.edu>
> > To: Markus Demleitner <msdemlei at ari.uni-heidelberg.de>
> > Cc: DAL mailing list <dal at ivoa.net>
> > Subject: Re: What does TOP mean ?
> > Message-ID: <alpine.OSX.1.00.1504270948250.517 at colorado2.tuc.noao.edu>
> > Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
> >
> > The algorithm used to compute the score for a dataset is up to the
> > data center and service, but can include both a best-match term and a
> > quality term (clearly best-match would be more important for an
> > individual query).
> >
> > The main difference between MAXREC and TOP is scoring and sort by score.
> > Usually if TOP is specified MAXREC would not matter as it is likely to
> > be much larger, but MAXREC if specified should still be in effect and
> > could result in an overflow indication.
> >
> > A service could treat TOP the same as MAXREC (all records having the
> > same score), however computing a useful scoring would be preferable.
> >
> > Finally, Google made billions developing just such an algorithm!  It is
> > what made their search service stand out from the others, in the early
> > days.
> >
> >    - Doug
> >
> >
> > On Mon, 27 Apr 2015, Markus Demleitner wrote:
> >
> > > Hi Arnold,
> > >
> > > On Mon, Apr 27, 2015 at 09:43:46AM -0400, Arnold Rots wrote:
> > > > Yes, you need an observatory-specific model to calculate a score.
> > > > By its very nature, therefore, it will only be a relative score.
> > > > Example:
> > > > For Chandra we implemented a score that is based on instrument, exposure
> > > > time, and off-axis angle
> > > > (PSF degrades with increasing off-axis angle).
> > > Well, yes, that could be part of it, but the way I understand
> > > things --
> > >
> > >   [...] the general idea is that the better a candidate dataset
> > >   matches the query, the higher the score it receives.  (1.1, P.27)
> > >
> > > -- TOP's intended function is essentially like Google's ranking: it
> > > gives "how well" a given returned row matches a data set.  Hence, for
> > > a given dataset score would be different for different queries, and
> > > while a global quality measure might play a role, it certainly
> > > wouldn't be expected to dominate the response.  Or am I completely
> > > off here?
> > >
> > > And regarding Walter's interjection:
> > >
> > > > I thought that the difference between MAXREC and TOP is that MAXREC
> > > > requires an overflow indicator, while TOP would prohibit it.
> > > Interesting thought -- is it intended to work this way? [it doesn't
> > > in DaCHS, and a quick search in the 1.1 specs didn't give me anything
> > > pointing in that direction]
> > >
> > > Cheers,
> > >
> > >         Markus
> > >
> >
> > ------------------------------
> >
> > _______________________________________________
> > dal mailing list
> > dal at ivoa.net
> > http://mail.ivoa.net/mailman/listinfo/dal
> >
> > End of dal Digest, Vol 67, Issue 10
> > ***********************************
> 
> 

--
Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/