March 27th, 2009

Alcohol leads to sustainability and other truisms

The NSF workshop on software sustainability for cyberinfrastructure has been pretty good.

There are a few key ideas that are cropping up in the talks and discussions.

First, there is an immense tension in the scientific software ecosystem between the need for multi-year stability and the rate of innovation.  More so than in general computing...because the researchers looking for stability in some areas are the very same people who are innovating software up the stack.  It's a weird mix of production and prototyping and there's no clear distinction right now.

This is the crux of the problem. Grant funding is short term in nature, and it may fund major software developments like shared frameworks or specific science codes...but it seldom prices in long term maintenance, or the understanding that the environments in which software is used change on timescales much shorter than the useful lifetime of a science codebase. If real-world datasets are expected to be available for 10 years as a primary resource for scientific analysis, can we build codes that we can expect to run 10 years after the data is collected, in our current environment where we know that the underlying hardware and software will change drastically in that time?  Can we expect grad students who never studied software engineering in a serious way to write these codes?

Open source is a working assumption in all discussions as to how the research community is going to reach sustainability...but...it's complicated.  Without institutional support, can independent science researchers get together to sustain software even if they aren't funded to do it?  Open source does have benefits for the science community: peer review, reproducibility...things that go to the heart of science.  Open source is a win, but cost savings may not be the biggest win compared to other benefits like accessibility and the openness of knowledge exchange.  The real issue here is whether we are doing a good enough job of measuring the value going into the software tools that researchers are building.

Testing is hard. Testing matters.  We don't do enough of it, and we don't teach science researchers how to test the codes they write...especially in the science areas not traditionally associated with CS. Do we give undergraduate biologists enough software engineering experience before they go on to running and adapting numerical simulation codes?

Software lifecycle. Science codes do not die. Forever is unsustainable. If, as a community, science researchers want to focus on sustainability...we will need to come to grips with end-of-life and actually focus on a better release model across the board.

Software patents...suck.  Are academics and institutions signing up as members of the Open Invention Network patent mafia? If not, why not?
If we are going to have shared open science software...and we want people to reuse it...do we protect individual researchers and institutes from the exact same liability that Red Hat as a corporate entity is already dealing with?

How do existing open source communities help the NSF deal with this and all the associated questions? There's a lot in the webcasts for everyone to chew on.  Far more than I can blog about.