
18 April 2012

An obvious path for Science

I got an e-mail today with a link to a fresh publication in the journal Science. It is entitled "Shining Light into Black Boxes" and, while it mostly states the obvious, it is quite a breakthrough for this sort of journal. Without further ado, here's the abstract:
The publication and open exchange of knowledge and material form the backbone of scientific progress and reproducibility and are obligatory for publicly funded research. Despite increasing reliance on computing in every domain of scientific endeavor, the computer source code critical to understanding and evaluating computer programs is commonly withheld, effectively rendering these programs “black boxes” in the research work flow. Exempting from basic publication and disclosure standards such a ubiquitous category of research tool carries substantial negative consequences. Eliminating this disparity will require concerted policy action by funding agencies and journal publishers, as well as changes in the way research institutions receiving public funds manage their intellectual property (IP).
Amen.

Also of note is the list of authors, representing some of the crème de la crème of academia in the United States:
  • J. Urban and P. Sliz - School of Law, University of California, Berkeley.
  • P. D. Adams - Lawrence Berkeley National Laboratory, Berkeley.
  • I. Foster - Argonne National Laboratory and University of Chicago, Argonne.
  • A. Sali - University of California, San Francisco.
  • D. Baker - University of Washington and Howard Hughes Medical Institute, Seattle.

The opaqueness of software is an obvious affront to the scientific method. If I claimed to have invented a pocket-sized fusion reactor that produces net energy, but refused to let you look inside it, would you believe me? Yet that is precisely what happens every time supposedly scientific results based on closed-source software are published. Science should be synonymous with open source; the question is to what extent.

The software used by scientists can be divided into three main layers: (1) the Operating System, (2) Supporting Libraries, and (3) Ad-hoc code. The Operating System provides the basic interface with the hardware, handling communication with the computer's various components and providing basic logic and calculation functionality. Supporting Libraries provide further functionality that, while closer to the researcher's needs, is still broad enough in scope to apply to different problems or experiments; an example is a statistics programme that helps with calculations like the standard deviation of a time series. Ad-hoc code is the software developed specifically to address a particular experiment or scientific problem.
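The three layers can be made concrete with a toy sketch. Here the data is invented purely for illustration, and Python's standard `statistics` module stands in for the supporting library:

```python
# A toy sketch of the three software layers described above:
#   layer 1, the Operating System, sits below everything and is not visible here;
#   layer 2, the Supporting Library, is Python's general-purpose statistics module;
#   layer 3, the ad-hoc code, is this script, written for one specific "experiment".
import statistics

# Hypothetical time series of measurements (layer 3: experiment-specific).
series = [0.12, 0.08, 0.15, 0.22, 0.18, 0.09, 0.14, 0.20]

# The supporting library (layer 2) does the generic statistical work.
mean = statistics.mean(series)
std = statistics.stdev(series)  # sample standard deviation

print(f"mean = {mean:.4f}, std dev = {std:.4f}")
```

The point of the layering is exactly the one argued below: the lower two layers are generic and independently verifiable, while the ad-hoc script is unique to the experiment and can only be checked if it is published.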

In the future the scientific method will have to demand a strict stance on each of these software layers. Regarding Operating Systems, open source doesn't have to be a requirement (hush, Winblows folk): the sort of calculation/logic functionality they provide is so basic that it can be easily verified. Supporting Libraries may eventually remain closed source, but to be used in Science they must somehow be certified, either against a batch of standard tests or against an open source alternative. In any case I expect most of this software to evolve naturally towards open source, as has been happening with the statistical software R, or the database manager Postgres. Finally, the crux of the matter: the ad-hoc code. There is simply no alternative to all of it being open. And here lies the main conflict with the proprietary attitude some researchers and institutions have towards Science. They must realise that if they wish to keep their code closed, then what they are developing are commercial products, not scientific results.
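The certification idea for supporting libraries amounts to a batch of standard tests: feed the same inputs to the routine under scrutiny and to an open reference implementation, and require the outputs to agree within a tolerance. A minimal sketch of that idea, where the function names, test batches, and tolerance are my own invention (the stdlib's `statistics.stdev` stands in for a hypothetical closed-source routine):

```python
import math
import statistics

def reference_stdev(xs):
    # Open-source reference: textbook sample standard deviation.
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

def certify(candidate, reference, test_batches, tol=1e-9):
    # A routine passes certification when it matches the reference
    # on every standard test case, within a numerical tolerance.
    return all(abs(candidate(b) - reference(b)) <= tol for b in test_batches)

# The "candidate" here stands in for a closed-source library function.
batches = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5, 0.7], [10.0, -3.0, 4.2, 8.8, 0.1]]
print(certify(statistics.stdev, reference_stdev, batches))  # → True
```

The closed routine's internals stay hidden, but its behaviour is pinned down by the open test batch, which is the most one can certify without the source itself.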

Perhaps the most famous controversy over source code in Science is the temperature reconstruction published by three American researchers in 1998, which earned the popular name of "Hockey Stick". Presenting a result markedly different from previous scientific knowledge, it was nonetheless heavily promoted by certain institutions and media outlets. This public exposure drew scrutiny far beyond the usual peer review, with numerous arguments, counter-arguments and re-reconstructions published since. Remarkably, 14 years later, no one has ever been able to exactly reproduce the initial results from the same original data. Had the original source code been published together with the results, everyone would have been spared the ongoing intellectual soap opera.

The time will come when every piece of scientific experimentation must be accompanied by its source code in order to pass peer review. My expectation is that this will take about a decade to fully materialise, fostered by the ever-growing involvement of academia with the open source community.

Finally, I'd also note the obvious contradiction of an article like this being published in a journal that is not freely available to the public. But as Jesus once said: "A physician doesn't visit the home of healthy folk".