Codebases and the IVOA
Paul Harrison
paul.harrison at manchester.ac.uk
Tue Feb 18 01:45:18 PST 2014
On 2014-02-17, at 15:49, Norman Gray <norman at ASTRO.GLA.AC.UK> wrote:
>
> Mark and all, hello.
>
> On 2014 Feb 14, at 15:01, Mark Taylor <M.B.Taylor at bristol.ac.uk> wrote:
>
>> The IVOA Exec has asked the Applications Working Group to start a
>> discussion about source code repositories in the context of development
>> of VO applications and infrastructure. The background/stimulus for this
>> was the discussion in the Time Domain focus session at the Heidelberg
>> interop meeting (May 2013).
>
> I think this is a good discussion to have, and I'm rather surprised it hasn't come up years before now (but I didn't want to be the one to start it).
> I started off the Volute repository <https://code.google.com/p/volute/>, and I think that's been well-received, so I can make a few observations from that point of view.
I do remember trying to bring it up at a TCG meeting and it was met with almost zero enthusiasm - which I thought was a mistake (and was why I was an early joiner of your Volute effort). I think that Volute has been reasonably successful despite being a very loosely curated repository - this might be a function of the relatively small number of contributors and the eclectic mix of (non-overlapping) projects that have been hosted there. However, I believe that the transformative effect of version control systems on the ability to collaborate efficiently cannot be overstated - even for the writing of standards documents (one of the major uses of Volute).
In addition, Volute proves that there is at least some desire for this sort of centralised repository, and I reckon that one of the major successes of Volute is that it has clearly had contributors from several different institutions - so there has been a sense of community ownership, which is essential to the success of any official IVOA repository. I know that there are several other version control repositories containing good VO code (opencadc, DaCHS, AstroGrid…) which, although at least publicly readable, tend to have contributors only from the originating institute.
>
>> For most of the VO standards we do have high-quality implementations
>> in various languages available from publicly-readable source code
>> repositories. However these tend to be controlled locally, at the
>> national project or institutional level. This means (a) it is hard
>> to guarantee continued support in the long term and (b) it is not
>> clear to third parties which is the "right" one to use.
>> Since the codebases are developed independently of each other there
>> is also no guarantee that the different components will play
>> well together.
>
>
>> * Is it desirable for the IVOA to operate one or more central codebases?
>> - What would they look like in terms of curation?
>> - What would be the pros and cons for (a) contributors (b) users?
>> - Are there different answers for different types of software:
>> client-side/server-side; different languages?
>> - Are IVOA members prepared to supply the (significant) effort
>> involved in curation?
>> - Are contributors willing to cede control to the curators?
>> - How would we handle decisions about curation policy and
>> who's in charge?
>> * Should the IVOA (or parts of it) draw up policy or best practice
>> on code development?
>> - Encouragement to use particular third-party codebases
>> (e.g. Astropy)?
>> - Encouragement to use particular technologies (e.g. github)?
>> - List of required engineering practices for IVOA endorsement?
>
> I think that points 1 and 3 here would be... nice, but would depend more on who (singular or plural) ends up as benevolent dictator, or senate+dictator, or whatever, and what the community feels is important enough to invest in. It might be that all the community _really_ needs is a securely available code repository, that isn't hostage to a grant running out, or a university reorganisation; in this case, I don't think it'd be worth anyone's while even _trying_ to mandate code policy. If the community genuinely wants to create high-quality and coherent releases, then this would be worth while.
I think that the most important aspect initially is to sort out the hosting, with some rules as to how this is to be organised. If there are some engineering standards to be followed, then these should be describable at a very high level and fit on one screen of text - huge amounts of detail will just put people off.
I think that doing something that appears to the outside world as a cohesive “whole” would require much curation effort. I imagine that we do not have a single person within the community who could give the necessary time to this job. However, perhaps several curators could be found for separate sections of the repository.
I rather like the idea of “incubator” projects that many organised codebases use, whereby we set the barriers fairly low for a particular code project to become part of the IVOA repository, to encourage openness and collaboration; then, if in the fullness of time the codebase is seen to follow good practices (and be useful), it can be promoted to IVOA endorsement.
I would also tend to favour including software that could be broadly described as “reusable libraries” in the first instance, rather than full-blown “clients” and “servers”. Servers especially often require some sort of execution “environment” that hampers their adoption as is, and it is better to provide some standard building blocks that can be used to create servers, since more people are likely to contribute to a building block that can be used in their particular environment. Indeed, when one looks into the task of trying to include some libraries from existing VO codebases, this problem of the assumed programming “environment” soon becomes apparent - there are often dependencies on other libraries within the existing codebase. So making an IVOA repository would be a good thing, as it would gather these dependencies into one place, but including the best of what already exists will require some reworking of that code, not least because there is much overlap in functionality.
>
> And, the biggie...
>
> I would strongly argue against using Git. People who have drunk the Git kool-aid are strongly evangelistic about it, but it is _not_ easy to use, and the people who protest that it is easy to use tend, in my experience, to be those who have already invested quite a lot of time in learning how to use it. Git has a nasty-shaped learning curve, which means that you end up having to learn quite a lot about Git, and be fairly conversant with its internal storage model, before you can do simple things with much confidence. Git isn't deeply hard -- it's just a DAG with per-node annotations -- but not everyone wants to make the investments of time and effort required to be comfortable using it in a non-voodoo way.
>
> I'm not much of a fan of Subversion, but it's much more straightforward to use, for simple cases, and more readily intelligible. The things that Subversion can't do, that Git can, are possibly things that we wouldn't need an IVOA repository to do. (Volute is Subversion). Having been obliged to use both Git and Subversion, if I were further obliged to choose between exactly those two for a project like this, I'd select Subversion, despite its faults, without difficulty.
>
> Mercurial is somewhere in the middle. It has most of the functionality of Git, but a model which, for straightforward things, is as accessible as Subversion. It's also pretty well documented. There's a large intersection between the collaboration functionality of github.com and bitbucket.org, but there are multiple sites (including Google code) which can host either.
>
> I hesitate to get into this argument, but I think there's a danger that the IVOA would sleepwalk into github simply because it's got the most/noisiest mindshare.
>
> Let fireworks begin (oh, no....).
>
I broadly agree with you - Git seems to have too steep a learning curve. Operations that should be easy are often difficult to do (though admittedly some difficult operations are easy to achieve!), especially if you already have experience of other version control systems. However, Git might be worth the effort, as it allows some development workflows that Subversion does not - indeed the optimal way to organise and curate an IVOA repository will depend crucially on the choice of version control technology.
Finally - perhaps we should just promote Volute as this IVOA codebase repository, and encourage more contributions. Certainly in the time before any official IVOA decision is made, I would urge anyone with code that they think is worthy of publishing to try Volute.
Regards,
Paul.