VODataService becoming WD, then PR

Gerard gerard.lemson at mpe.mpg.de
Mon Nov 24 03:05:48 PST 2008


Hi Francois

> 
> Well, in the relational _model_, foreign keys are, as far as 
> I know, a reference to a key (primary or candidate key) of 
> another table.
> Therefore if you specify that the attributes (A,B,C) of a 
> table T2 are a foreign key to the attributes (a,b,c) of the 
> table T1, you should ensure first that the attributes (a,b,c) 
> of the table T1 represent a key to T1, don't you ? And at 
> least in normalized schemas, the definition of keys 
> (uniqueness of a tuple from a subset of its
> attributes) is a constraint involving a single table (T1), 
> and is therefore a property of T1.
> 
> You might describe a constraint saying that e.g. the contents 
> of the attributes (A,B,C) of the table T2 must exist in the 
> attributes (a,b,c) of table T1 (without requiring that 
> (a,b,c) is a key of T1, i.e. removing the uniqueness 
> requirement) -- but then it's not a foreign key any more. 
> 
> But I think we agree on the foreign key definition  -- you 
> can't define a foreign key without having defined the 
> (candidate) key to which it refers (the uniqueness constraint 
> of the target table), right ? 
> Maybe I misunderstood your point -- on my side my message was 
> only to ensure that the various ways of defining the 'foreign 
> key' in the different VO components are fully compatible. The 
> VOTable way may look complex when a single column is 
> involved, but it defines unambiguously the referential constraints.
> 

I think we agree on the foreign key definition as defined in relational
databases.
In a relational database management system one must indeed define a
uniqueness constraint (candidate key) on the column(s) in the target table
before a foreign key can be defined referencing them. 
Note that the foreign key definition in a RDB does not use this candidate
key explicitly, only the target table and columns. See
http://savage.net.au/SQL/sql-92.bnf.html#referential%20constraint%20definiti
on (follow ALTER TABLE down to <referential constraint definition> and
further).

A concrete example may be useful:

-- 1
CREATE TABLE SourceTable(column1 integer not null, column2 integer not null,
foo char(100), bar real);
-- 2
CREATE TABLE TargetTable(acolumn integer not null, bcolumn integer not null,
baz timestamp );
-- 3
ALTER TABLE TargetTable ADD CONSTRAINT uq_target UNIQUE (acolumn, bcolumn);
-- 4
ALTER TABLE SourceTable ADD CONSTRAINT fk_source_target FOREIGN KEY(column1,
column2)
  REFERENCES TargetTable(acolumn, bcolumn);


The fourth statement fails if the 3rd (or an alternative primary key or
unique index definition) is not run.
But one sees that in the 4th statement there is no explicit mention of the
candidate key, uq_target, on TargetTable. This is where your VOTable
convention does not follow relational practice.

My proposal for the ForeignKey complexType tried to be as close a mapping
from the corresponding SQL statement as possible, with the exception
(noticed by Pat) that the source table is not explicitly mentioned as it is
assumed to be in the XML context. That in fact corresponds to an alternative
way of writing the definitions above:

-- 1
CREATE TABLE TargetTable(acolumn integer not null, bcolumn integer not null,
baz timestamp,
  CONSTRAINT uq_target UNIQUE (acolumn, bcolumn))
-- 2
CREATE TABLE SourceTable(column1 integer not null, column2 integer not null,
foo char(100), bar real,
  CONSTRAINT fk_source_target FOREIGN KEY(column1, column2) REFERENCES
TargetTable(acolumn, bcolumn));

Note that 1 must now come before 2.

My proposal was also motivated by what I expect to be the most important
usage of foreign key definitions, namely to indicate to users that a
reference exists between objects stored in two tables and to specify which
columns implement this reference. From this one learns what are the
appropriate join conditions when writing ADQL. One does not need to know the
details of the candidate key, its existence can be inferred from the
definition.
*For this purpose* it is not required that one has a separate definition of
the candidate key/uniqueness on the target table. I.e. using the above
example, we MUST represent 1,2 and 4 in the metadata, but not 3, at least
not for this purpose.

In my opinion it is useful to add more metadata about a database in the
TAP/VODataService schems, especially indexes (as I mailed earlier in this
thread). Maybe the same is true for uniqueness constraints, primary keys
etc. But whichever elements we decide are necessary, I strongly suggest that
we attempt to represent the corresponding relational statements as closely
as possible.

Btw, I assume that in VOTable you use the GROUP construct together with a
naming convention because introducing a more explicit FOREIGNKEY construct
would break backwards compatibility? Or was there another reason?
Also, since I though VOTables are supposedly used for messaging tabular
datasets, not for representing entire databases, why was this convention
added?

Best regards

Gerard



> --Francois
> 
> >
> >Hi Francois
> >
> >
> >> About the primary/foreign key definitions -- there is a 
> convention in 
> >> VOTable which makes use of GROUPs and the ID/REF 
> mechanism, as in the 
> >> following example:
> >>
> >> <TABLE name="table1" ... >
> >>   <GROUP ID="Galaxy_ID" name="primaryKey">
> >>     <!-- full definition of the primary key components: -->
> >>     <FIELDref ref="tab1_col1" >
> >>     <FIELDref ref="tab1_col2" > ...
> >>   </GROUP>
> >>   <FIELD ID="tab1_col1" name="Cluster" ... >
> >>   <FIELD ID="tab1_col2" name="Galaxy" ... >
> >>   ...
> >> </TABLE>
> >>
> >> <TABLE name="table2" ... >
> >>   <GROUP ref="Galaxy_ID" name="foreignKey">
> >>     <!-- Components of the foreign key, referencing the PK of
> >> table1 -->
> >>     <FIELDref ref="tab2_col1" >
> >>     <FIELDref ref="tab2_col2" > ...
> >>   </GROUP>
> >>   <FIELD ID="tab2_col1" name="ClusterName" ... >
> >>   <FIELD ID="tab2_col2" name="GalaxyNumber" ... >
> >>   ...
> >> </TABLE>
> >>
> >> With this mechanism as of as many keys as necessary can be 
> defined, 
> >> and a column may be a component of several foreign keys.
> >> It seems to me that this way should be mappable to TAP 
> schema as well 
> >> as the Registry one. However I don't see how you can avoid to use 
> >> ID/ref mechanisms: a foreign key has always to refer to the 
> >> definition of a primary key. The name of the primary key 
> might be the 
> >> same of a table, since by definition there is only one primary key 
> >> per table. But you might also wish to define several keys for a 
> >> table, in which case you have to assign names to your keys 
> to be able 
> >> to reference them.
> >>
> >You seem to imply that the introduction of foreign keys in 
> the metadata 
> >specification requires the introduction of primary keys as well. Is 
> >that correct?
> >In relational databases foreign keys are constraints, and on 
> the target 
> >table a primary key or at least a uniqueness constraint must 
> have been 
> >defined and must be captured.
> >I don't see though why for the problem at hand it is 
> necessary to add 
> >this complication.
> >
> >To me it seems that the main use of a "foreign key" 
> definition in the 
> >VODataService and the equivalent(/same?) TAP schema is to 
> inform users 
> >about the relationships between tables so they can learn what joins 
> >they can usefully make. That at least the concept I tried to 
> capture in 
> >the little model I proposed. I wanted to be able to say that 
> one tables 
> >references another and how the reference is implemented.
> >To support this you only need to know which column(s) in the 
> "from table"
> >correspond to which columns in the "target table". I used table and 
> >column names iso ID/IDREF combinations as I thought that tables and 
> >columns were not given ID attributes. It is also much more readable.
> >
> >
> >> My wish is really to be able to express one concept (relational 
> >> tables), if not with exactly the same terms, at least in 
> terms which 
> >> can be mapped between the various dialects without any loss of 
> >> information:
> >> if for instance in TAP one column can only be part of a single 
> >> foreign key, while the Registry has not this restriction, we loose 
> >> the full interoperability.
> >>
> >Why would TAP put that restriction on the metadata?
> >Why would you only be allowed to publish your database 
> through TAP if 
> >you had followed a particular design strategy, if it is so simple to 
> >describe the general case?
> >
> >Cheers
> >Gerard
> ==============================================================
> ==================
> Francois Ochsenbein       ------       Observatoire 
> Astronomique de Strasbourg
>    11, rue de l'Universite F-67000 STRASBOURG       Phone: 
> +33-(0)390 24 24 29
> Email: francois at astro.u-strasbg.fr   (France)         Fax: 
> +33-(0)390 24 24 32
> ==============================================================
> ==================
> 



More information about the registry mailing list