VODataService

Patrick Dowler patrick.dowler at nrc-cnrc.gc.ca
Thu Dec 11 00:10:43 PST 2008


On 2008-12-10 22:33:53 glemson wrote:
> Hi Pat
>
> It seems to me there might be another possibility to solve your case,
> implement a toURL(..) UDF on your database.
> Users could now use the toURL(pats_uri) in the where clause, though it
> would be very inefficient. But it would be clear to them from the fact that
> a UDF is needed.
>
> To me the big advantage is that it makes explicit to the user what is
> happening, which I think is ALWAYS good. It seems to solve the issues with
> 1,2,3 and you do not need extra concepts such as select-only columns.
> Of course I assume it is possible for you to implement a UDF with this
> functionality.

I agree that with ADQL it is important that things be explicit and I am not 
100% happy about any of the ideas. However, the assumption that one can 
implement a UDF is problematic at best. The problem is that when one uses 
URIs a common practice (and one we have made good use of recently) is that 
different subsystems provide code or services to do that conversion and they 
register their scheme and code/service. This is all implementation detail but 
the point is that it may not be technically feasible to actually run that 
conversion inside the DB in a UDF. Secondarily, I think any solution that 
requires a UDF is not implementable by some people, for example when there is 
an existing DB and TAP access is to be provided but the TAP implementors 
cannot modify the DB in any way (e.g. they have to do everything in the 
service code). 


> Could you explain what is exactly the use case here?
> Do you want/need to keep the URI column itself available to the user?

The URI is the thing actually stored in the db. In our case, it is a real URI, 
but I could image in other cases (existing DBs) that there are several 
columns that together tell one enough to construct a URL. This would be the 
case in any DB that contains metadata for data files that users would want to 
download. 

I don't think there is value in exposing the URIs directly (or the multiple 
columns that are used to generate the URL), so the table metadata would say 
there is a column with URLs in it.

> If you wish to hide the URI column, you could create a view that contains
> the toURL(pats_uri) in the selec, not the pats_uri column itself, and only
> publish the view. (I know you know this.). If your database allows you
> oculd also use the UDF to define a derived column on the table.

Sure, once you have a UDF you can perform some indirection to expose something 
different than your underlying tables.

> For now I am against your solution 4.
> Does the select-only concept have any precedent elsewhere, do you have
> references to this? It will require pretty advanced support in ADQL
> parsers, though it is not an ADQL concept (ADQL being non-semantic).
> As far as I know, but I may be mistaken it goes beyond standard RDB
> concepts.

Yes, it is non-standard RDBMS and SQL. It has precedent in various DAL 
services where the output can (does) contain things that the caller cannot 
constrain in the request. Specifically, the AccessRef in DAL services is a 
URL to the data but you cannot constraint it and I very much doubt that 
anyone actually stores the full URL in the database. And generally it does 
not make much sense to constrain such a column, but in TAP and ADQL users can 
do that if the column is visible.  

> And it seems not to be necessary.

Ah, but only if one accepts that implementing UDFs in the DB is necessary, and 
I think a lot of implementors will be unwilling or unable to do that, for a  
variety of technical reasons.

> Gerard
>
> Patrick Dowler writes:
> > I ran into an odd table/column metadata case when messing with our TAP
> > prototype. In the database I have a column with a URI in it. The URI is
> > meaningless outside our own software as it is a private identifier, but
> > it can be turned into a URL by some custom code (the scheme handler for
> > the subsystem that "owns" the URI).
> >
> > As a result, and to be useful to users, I want to express that there is a
> > column with a URL in it and that users will be able to download data via
> > this URL. I can easily (enough) convert the URI to URL while writing the
> > VOTable output. The catch is that users cannot sensibly use the column in
> > the WHERE clause. For example, if someone did
> >
> >    SELECT * FROM some_table
> >
> > they would see in the result a column called "download" (eg) and values
> > like:
> >
> >    http://www.example.com/foo?bar=123
> >    ftp://ftp.example.com/something.fits
> >
> > However, if they then tried to something like:
> >
> >    SELECT * FROM some_table AS t
> >    WHERE t.download LIKE 'http://%'
> >
> > to only get the http downloads, they would get no results (because the
> > table contains URIs that get turned into http URLs on output). It would
> > be quite complex to convert that LIKE predicate into something that
> > returned the "correct" result (it would require knowledge of all possible
> > URI->URL conversions).
> >
> > I can see several ways to proceed:
> >
> > 1. Don't do that and actually store URLs in the db: very brittle and
> > doesn't allow one to direct downloads (eg like the AccessRef in DAL
> > protocols), most likely requires DB modification as people would not
> > likely put the full URL in the DB
> >
> > 2. Do not expose the URL column at all: for an observation DB, that would
> > mean people cannot actually download the data :-(
> >
> > 3. User needs knowledge of how to convert the URI to URL: this requires
> > that all URI schemes be standardised and service providers cannot use
> > custom ones (which is a highly useful and flexible and one we use
> > extensively) or that users need extra out-of-band knowledge to use the
> > service fully
> >
> > 4. Have columns that users can select but not constrain: basically the
> > service marks columns in its table metadata as "select only", which
> > covers computed columns like above and also columns where searching is
> > not feasible (e.g. columns with binary values or region columns in a db
> > without a spatial querying implementation)



-- 

Patrick Dowler
Tel/Tél: (250) 363-0044                  | fax/télécopieur: (250) 363-0045
Canadian Astronomy Data Centre   | Centre canadien de donnees astronomiques
National Research Council Canada | Conseil national de recherches Canada
Government of Canada                  | Gouvernement du Canada
5071 West Saanich Road               | 5071, chemin West Saanich
Victoria, BC                                  | Victoria (C.-B.)



More information about the registry mailing list