VODataService becoming WD, then PR

Mon Nov 24 23:57:35 PST 2008

Hi Gerard,

Good, therefore we agree on the definitions!

But the SQL "practice" is not the relational definition -- 
and in your example #4 it's exactly what I feel should be avoided:
define a constraint (an alternate key) on TargetTable 
within a definition of SourceTable. 

Seen from a 'data consumer' perspective, it looks important to
know the uniqueness definitions on a table (the alternate keys);
therefore these should be defined, IMHO. Then the specification
of the referencing contraints (the foreign key) has to specify
the (table,key) to which it refers; there are only 2 possibilities
of specifying this (table,key) :
-- either as a reference (a pointer to the definition of the key
   in TargetTable's definition)
-- or a repetition of the definition 

The first alternative is less prone to errors (conflicting definitions);
SQL has chosen the second alternative (duplicate definitions).

You are right, the GROUP construct with a naming convention was
chosen for backward compatibility. And this information about
keys was felt as an important (sometimes essential) piece of 
the metadata conveyed within a VOTAble document --- VOTable was
built from the beginning to include the data AND all metadata
necessary for a correct interpretation of the data (hence units,
ucds, utypes, links...)

It seems to me that, if the alternate keys are defined in
the registry, there is a full compatibility between the registry
and votable; but if the uniqueness of table A should be inferred
from a definition of another table B, it becomes problematic...

Best Regards,
Francois

>
>Hi Francois
>
>>
>> Well, in the relational _model_, foreign keys are, as far as
>> I know, a reference to a key (primary or candidate key) of
>> another table.
>> Therefore if you specify that the attributes (A,B,C) of a
>> table T2 are a foreign key to the attributes (a,b,c) of the
>> table T1, you should ensure first that the attributes (a,b,c)
>> of the table T1 represent a key to T1, don't you ? And at
>> least in normalized schemas, the definition of keys
>> (uniqueness of a tuple from a subset of its
>> attributes) is a constraint involving a single table (T1),
>> and is therefore a property of T1.
>>
>> You might describe a constraint saying that e.g. the contents
>> of the attributes (A,B,C) of the table T2 must exist in the
>> attributes (a,b,c) of table T1 (without requiring that
>> (a,b,c) is a key of T1, i.e. removing the uniqueness
>> requirement) -- but then it's not a foreign key any more.
>>
>> But I think we agree on the foreign key definition  -- you
>> can't define a foreign key without having defined the
>> (candidate) key to which it refers (the uniqueness constraint
>> of the target table), right ?
>> Maybe I misunderstood your point -- on my side my message was
>> only to ensure that the various ways of defining the 'foreign
>> key' in the different VO components are fully compatible. The
>> VOTable way may look complex when a single column is
>> involved, but it defines unambiguously the referential constraints.
>>
>
>I think we agree on the foreign key definition as defined in relational
>databases.
>In a relational database management system one must indeed define a
>uniqueness constraint (candidate key) on the column(s) in the target table
>before a foreign key can be defined referencing them.
>Note that the foreign key definition in a RDB does not use this candidate
>key explicitly, only the target table and columns. See
>http://savage.net.au/SQL/sql-92.bnf.html#referential%20constraint%20definiti
>on (follow ALTER TABLE down to <referential constraint definition> and
>further).
>
>A concrete example may be useful:
>
>-- 1
>CREATE TABLE SourceTable(column1 integer not null, column2 integer not null,
>foo char(100), bar real);
>-- 2
>CREATE TABLE TargetTable(acolumn integer not null, bcolumn integer not null,
>baz timestamp );
>-- 3
>ALTER TABLE TargetTable ADD CONSTRAINT uq_target UNIQUE (acolumn, bcolumn);
>-- 4
>ALTER TABLE SourceTable ADD CONSTRAINT fk_source_target FOREIGN KEY(column1,
>column2)
>  REFERENCES TargetTable(acolumn, bcolumn);
>
>
>The fourth statement fails if the 3rd (or an alternative primary key or
>unique index definition) is not run.
>But one sees that in the 4th statement there is no explicit mention of the
>candidate key, uq_target, on TargetTable. This is where your VOTable
>convention does not follow relational practice.
>
>My proposal for the ForeignKey complexType tried to be as close a mapping
>from the corresponding SQL statement as possible, with the exception
>(noticed by Pat) that the source table is not explicitly mentioned as it is
>assumed to be in the XML context. That in fact corresponds to an alternative
>way of writing the definitions above:
>
>-- 1
>CREATE TABLE TargetTable(acolumn integer not null, bcolumn integer not null,
>baz timestamp,
>  CONSTRAINT uq_target UNIQUE (acolumn, bcolumn))
>-- 2
>CREATE TABLE SourceTable(column1 integer not null, column2 integer not null,
>foo char(100), bar real,
>  CONSTRAINT fk_source_target FOREIGN KEY(column1, column2) REFERENCES
>TargetTable(acolumn, bcolumn));
>
>Note that 1 must now come before 2.
>
>My proposal was also motivated by what I expect to be the most important
>usage of foreign key definitions, namely to indicate to users that a
>reference exists between objects stored in two tables and to specify which
>columns implement this reference. From this one learns what are the
>appropriate join conditions when writing ADQL. One does not need to know the
>details of the candidate key, its existence can be inferred from the
>definition.
>*For this purpose* it is not required that one has a separate definition of
>the candidate key/uniqueness on the target table. I.e. using the above
>example, we MUST represent 1,2 and 4 in the metadata, but not 3, at least
>not for this purpose.
>
>In my opinion it is useful to add more metadata about a database in the
>TAP/VODataService schems, especially indexes (as I mailed earlier in this
>thread). Maybe the same is true for uniqueness constraints, primary keys
>etc. But whichever elements we decide are necessary, I strongly suggest that
>we attempt to represent the corresponding relational statements as closely
>as possible.
>
>Btw, I assume that in VOTable you use the GROUP construct together with a
>naming convention because introducing a more explicit FOREIGNKEY construct
>would break backwards compatibility? Or was there another reason?
>Also, since I though VOTables are supposedly used for messaging tabular
>datasets, not for representing entire databases, why was this convention
>added?
>
>Best regards
>
>Gerard
>
>
>
>> --Francois
>>
>> >
>> >Hi Francois
>> >
>> >
>> >> About the primary/foreign key definitions -- there is a
>> convention in
>> >> VOTable which makes use of GROUPs and the ID/REF
>> mechanism, as in the
>> >> following example:
>> >>
>> >> <TABLE name="table1" ... >
>> >>   <GROUP ID="Galaxy_ID" name="primaryKey">
>> >>     <!-- full definition of the primary key components: -->
>> >>     <FIELDref ref="tab1_col1" >
>> >>     <FIELDref ref="tab1_col2" > ...
>> >>   </GROUP>
>> >>   <FIELD ID="tab1_col1" name="Cluster" ... >
>> >>   <FIELD ID="tab1_col2" name="Galaxy" ... >
>> >>   ...
>> >> </TABLE>
>> >>
>> >> <TABLE name="table2" ... >
>> >>   <GROUP ref="Galaxy_ID" name="foreignKey">
>> >>     <!-- Components of the foreign key, referencing the PK of
>> >> table1 -->
>> >>     <FIELDref ref="tab2_col1" >
>> >>     <FIELDref ref="tab2_col2" > ...
>> >>   </GROUP>
>> >>   <FIELD ID="tab2_col1" name="ClusterName" ... >
>> >>   <FIELD ID="tab2_col2" name="GalaxyNumber" ... >
>> >>   ...
>> >> </TABLE>
>> >>
>> >> With this mechanism as of as many keys as necessary can be
>> defined,
>> >> and a column may be a component of several foreign keys.
>> >> It seems to me that this way should be mappable to TAP
>> schema as well
>> >> as the Registry one. However I don't see how you can avoid to use
>> >> ID/ref mechanisms: a foreign key has always to refer to the
>> >> definition of a primary key. The name of the primary key
>> might be the
>> >> same of a table, since by definition there is only one primary key
>> >> per table. But you might also wish to define several keys for a
>> >> table, in which case you have to assign names to your keys
>> to be able
>> >> to reference them.
>> >>
>> >You seem to imply that the introduction of foreign keys in
>> the metadata
>> >specification requires the introduction of primary keys as well. Is
>> >that correct?
>> >In relational databases foreign keys are constraints, and on
>> the target
>> >table a primary key or at least a uniqueness constraint must
>> have been
>> >defined and must be captured.
>> >I don't see though why for the problem at hand it is
>> necessary to add
>> >this complication.
>> >
>> >To me it seems that the main use of a "foreign key"
>> definition in the
>> >VODataService and the equivalent(/same?) TAP schema is to
>> inform users
>> >about the relationships between tables so they can learn what joins
>> >they can usefully make. That at least the concept I tried to
>> capture in
>> >the little model I proposed. I wanted to be able to say that
>> one tables
>> >references another and how the reference is implemented.
>> >To support this you only need to know which column(s) in the
>> "from table"
>> >correspond to which columns in the "target table". I used table and
>> >column names iso ID/IDREF combinations as I thought that tables and
>> >columns were not given ID attributes. It is also much more readable.
>> >
>> >
>> >> My wish is really to be able to express one concept (relational
>> >> tables), if not with exactly the same terms, at least in
>> terms which
>> >> can be mapped between the various dialects without any loss of
>> >> information:
>> >> if for instance in TAP one column can only be part of a single
>> >> foreign key, while the Registry has not this restriction, we loose
>> >> the full interoperability.
>> >>
>> >Why would TAP put that restriction on the metadata?
>> >Why would you only be allowed to publish your database
>> through TAP if
>> >you had followed a particular design strategy, if it is so simple to
>> >describe the general case?
>> >
>> >Cheers
>> >Gerard
>> ==============================================================
>> ==================
>> Francois Ochsenbein       ------       Observatoire
>> Astronomique de Strasbourg
>>    11, rue de l'Universite F-67000 STRASBOURG       Phone:
>> +33-(0)390 24 24 29
>> Email: francois at astro.u-strasbg.fr   (France)         Fax:
>> +33-(0)390 24 24 32
>> ==============================================================
>> ==================
>>
================================================================================
Francois Ochsenbein       ------       Observatoire Astronomique de Strasbourg
   11, rue de l'Universite F-67000 STRASBOURG       Phone: +33-(0)390 24 24 29
Email: francois at astro.u-strasbg.fr   (France)         Fax: +33-(0)390 24 24 32
================================================================================