Codebases and the IVOA

Norman Gray norman at astro.gla.ac.uk
Mon Feb 17 07:49:50 PST 2014


Mark and all, hello.

On 2014 Feb 14, at 15:01, Mark Taylor <M.B.Taylor at bristol.ac.uk> wrote:

> The IVOA Exec has asked the Applications Working Group to start a
> discussion about source code repositories in the context of development
> of VO applications and infrastructure.  The background/stimulus for this
> was the discussion in the Time Domain focus session at the Heidelberg
> interop meeting (May 2013).

I think this is a good discussion to have, and I'm rather surprised it hasn't come up years before now (but I didn't want to be the one to start it).

> For most of the VO standards we do have high-quality implementations
> in various languages available from publicly-readable source code
> repositories.  However these tend to be controlled locally, at the
> national project or institutional level.  This means (a) it is hard
> to guarantee continued support in the long term and (b) it is not
> clear to third parties which is the "right" one to use.
> Since the codebases are developed independently of each other there
> is also no guarantee that the different components will play
> well together.

I think the last sentence is a separate point (c), or at least an important variant of (b), and each of these is distinct from (d), the question of how to manage the codebase qua repository.

Although these are distinct points (and we can't collectively do very much about point (a) in particular), I think that Tim's overall response -- that we conceive of the IVOA software set as just another collaborative Open Source project -- is a very good starting point; I agree with most of what he says.  As a cultural point, I would guess that most of the people actively working on IVOA software are familiar enough with the way such projects work that they would fit in with this approach fairly naturally and comfortably.  There's some stuff about roadmaps and testing and release management that would have to be worked out, but at this stage that's possibly just detail.

Turning to the specific questions:

> * Is it desirable for the IVOA to operate one or more central codebases?
>      - What would they look like in terms of curation?
>      - What would be the pros and cons for (a) contributors (b) users?
>      - Are there different answers for different types of software:
>           client-side/server-side; different languages?
>      - Are IVOA members prepared to supply the (significant) effort
>           involved in curation?
>      - Are contributors willing to cede control to the curators?
>      - How would we handle decisions about curation policy and
>           who's in charge?

I started off the Volute repository <https://code.google.com/p/volute/>, and I think that's been well-received, so I can make a few observations from that point of view.

Curation: The curation has been very lightweight.  There are a number of 'owners' of the project (meaning people with privilege to add committers and reconfigure the repository).  Each of the project/ trees has an informal owner, in the sense that when someone mailed me to suggest a missing tree, I made them an owner and asked them to add a summary to the front page bullet list, which roughly indicates the 'coordinator' for that tree.

Pro: I think the shared resource has been useful because (a) the content is safe and shareable, and (b) it's neutral, in the sense of being adminned and backed up and stuff by a third party.

Con: Having everything in a single tree feels a bit unwieldy, though we haven't run into any actual problems with this.

I don't see any grounds for different answers for different technologies.

The last three questions are probably 'obviously yes' or 'obviously no' as a sensitive function of the curation model.

The Starlink repository at GitHub <https://github.com/Starlink/> works well (I watch this, but haven't contributed non-trivially to this incarnation of the codebase).  That has coherent planning, but it already constituted a funded community before the move to GitHub, so might be a less instructive example than Astropy.

> * Should the IVOA (or parts of it) draw up policy or best practice
>   on code development?
>      - Encouragement to use particular third-party codebases
>           (e.g. Astropy)?
>      - Encouragement to use particular technologies (e.g. github)?
>      - List of required engineering practices for IVOA endorsement?

I think that points 1 and 3 here would be... nice, but would depend more on who (singular or plural) ends up as benevolent dictator, or senate+dictator, or whatever, and what the community feels is important enough to invest in.  It might be that all the community _really_ needs is a securely available code repository that isn't hostage to a grant running out or a university reorganisation; in that case, I don't think it'd be worth anyone's while even _trying_ to mandate code policy.  If the community genuinely wants to create high-quality and coherent releases, then such a policy would be worthwhile.

And, the biggie...

I would strongly argue against using Git.  People who have drunk the Git kool-aid are strongly evangelistic about it, but it is _not_ easy to use, and the people who protest that it is easy to use tend, in my experience, to be those who have already invested quite a lot of time in learning how to use it.  Git has a nasty-shaped learning curve, which means that you end up having to learn quite a lot about Git, and be fairly conversant with its internal storage model, before you can do simple things with much confidence.  Git isn't deeply hard -- it's just a DAG with per-node annotations -- but not everyone wants to make the investments of time and effort required to be comfortable using it in a non-voodoo way.
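To make the "DAG with per-node annotations" remark concrete, here is a minimal sketch in Python.  It is an illustration only, not Git's actual storage format: the names Commit, commit and ancestors are invented for this example, and the short SHA-1 identifiers only loosely imitate Git's object hashes.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """One node of the history graph (hypothetical, not Git's real object model)."""
    message: str         # the per-node annotation
    parents: tuple = ()  # ids of parent commits: the edges of the DAG

    @property
    def id(self) -> str:
        # Identifier derived from the node's content (illustrative only).
        payload = self.message + "|" + ",".join(self.parents)
        return hashlib.sha1(payload.encode("utf-8")).hexdigest()[:8]


def commit(store, message, *parents):
    """Add a node with the given parent ids to the store and return its id."""
    node = Commit(message, tuple(parents))
    store[node.id] = node
    return node.id


def ancestors(store, start):
    """Return the ids of every commit reachable from start, including start."""
    seen = set()
    stack = [start]
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(store[cid].parents)
    return seen


# A small history: two lines of work diverge from the root and are merged.
store = {}
root = commit(store, "initial import")
a = commit(store, "add parser", root)
b = commit(store, "fix typo", root)
merge = commit(store, "merge parser and typo fix", a, b)
```

The merge node is the only thing that makes this a graph rather than a simple list: it has two parents, so walking back from it with ancestors(store, merge) reaches both lines of work and the shared root.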

I'm not much of a fan of Subversion, but it's much more straightforward to use, for simple cases, and more readily intelligible.  The things that Subversion can't do, that Git can, are possibly things that we wouldn't need an IVOA repository to do.  (Volute uses Subversion.)  Having been obliged to use both Git and Subversion, if I were further obliged to choose between exactly those two for a project like this, I'd select Subversion, despite its faults, without difficulty.

Mercurial is somewhere in the middle.  It has most of the functionality of Git, but a model which, for straightforward things, is as accessible as Subversion.  It's also pretty well documented.  There's a large intersection between the collaboration functionality of github.com and bitbucket.org, and there are multiple sites (including Google Code) which can host either Git or Mercurial repositories.

I hesitate to get into this argument, but I think there's a danger that the IVOA would sleepwalk into github simply because it's got the most/noisiest mindshare.

Let fireworks begin (oh, no....).

All the best,

Norman


-- 
Norman Gray  :  http://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK


