a personal contribution from the Data Model point of view

Mon Mar 3 00:07:15 PST 2008

Hello Mireille,

> About the versioning:
> For the Characterisation Model, developed and stabilized over
> quite a long period, we had this kind of problem. We managed
> it 'by hand' but with difficulties.
> Besides the XML schema, there can be other kinds of serialisation,
> like Utype-lists derived from UML classes and their attributes, or
> FITS serialisation in the shape of ad hoc FITS KEYWORDS. Versioning
> also applies to these kinds of serialisation.
>
> What would be the overhead in using a proper versioning tool like  
> CVS or Subversion for example? Does anybody have one or two years of
> experience with that?

Our lab has used version control systems for a long time, and a
couple of years ago we made the switch from CVS to
Subversion. In addition to the source code for our software projects,
we also use the repositories to help with sharing the LaTeX files and  
images for publications. For the text-based LaTeX files, it is very  
convenient to be able to see a comparison of two different versions,  
while for the binary image files, just the logging of changes is useful.

One feature that I think greatly reduces the barrier to using version
control, and makes it attractive to more users, is providing a
web-based interface to the repository. Project hosting services (e.g.,
SourceForge, Google Code) provide this, as do software project  
management packages such as Trac or FogBugz. These services and  
packages usually combine a browser for the source code (or XML  
documents and images), along with a Wiki or other means to easily  
create web pages.

As an example, you can look at the NVO's Trac site at
http://trac.us-vo.org/nvo. This link: http://tinyurl.com/32ny82 will show you what
an XML Schema looks like inside of the browser. For comparison, there
is the Google Code project, Volute, set up by Norman Gray for IVOA  
groups, at http://code.google.com/p/volute/. This was set up for the  
Semantics group, and we've added the simulation data model to the  
repository. Google Code is nice, since no one has to manage the  
software or hardware, but the feature list is minimal.

In summary, for binary files, or complicated text files (XMI, Word  
documents), version control helps to track the changes made to the  
files. For text files, such as an XML Schema, or an XHTML web page,  
you can make comparisons of the actual content that was changed. And,  
adding a web interface to the repository helps users to actually see  
the files.
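
To make the text-versus-binary distinction concrete, here is a minimal Python sketch. It is not how Subversion works internally, just an illustration using the standard library. The two schema fragments, the file name and revision labels, and the helper names text_changes and binary_changed are all made up for this example.

```python
import difflib
import hashlib

# Two hypothetical revisions of a small XML Schema fragment
# (invented for illustration, not taken from any real data model).
old_schema = """<xs:element name="resolution" type="xs:double"/>
<xs:element name="coverage" type="xs:string"/>
""".splitlines(keepends=True)

new_schema = """<xs:element name="resolution" type="xs:double"/>
<xs:element name="coverage" type="CoverageType"/>
""".splitlines(keepends=True)


def text_changes(old, new):
    """Return unified-diff lines between two revisions of a text file."""
    return list(
        difflib.unified_diff(
            old, new, fromfile="schema.xsd (r1)", tofile="schema.xsd (r2)"
        )
    )


def binary_changed(old_bytes, new_bytes):
    """For binary content we only learn whether it changed, not how."""
    return hashlib.sha256(old_bytes).digest() != hashlib.sha256(new_bytes).digest()


diff = text_changes(old_schema, new_schema)
print("".join(diff))  # shows the changed line with - and + prefixes
print(binary_changed(b"\x89PNG old", b"\x89PNG new"))
```

For the text file you see exactly which line changed; for the binary data all you can record is that a change happened, which is why the log message matters more there.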

Now, as to the overhead: users will have to learn to use a version  
control client. There are command line and GUI clients for almost  
every platform (I'm using the term "almost" in case there's a VAX  
user out there); also, many editors and IDEs work with version
control systems. If you do not want to host your project on a site  
like SourceForge or Google Code, you'll have to find someone to host  
it for you, which may add some overhead.

I will admit, getting wholesale adoption of a system like this may be  
impossible. You may have a user who either refuses to use the system
or doesn't trust it. The solution we've found is to have the project  
leader accept changes by email, and then post them into the  
repository on behalf of the user. It's not a perfect process, but it  
works.

I hope you've found some of this useful. If you have any questions,  
please let me know.

--Rick

-------------------------------------------------------------------------
Rick Wagner, Graduate Student Researcher
UCSD Physics
9500 Gilman Drive
La Jolla, CA  92093-0424
Email:  rwagner at physics.ucsd.edu
WWW:    http://lca.ucsd.edu/projects/rpwagner
(858) 822-4784 Phone
-------------------------------------------------------------------------
No syllabus survives contact with the students.
--Rick Wagner
-------------------------------------------------------------------------