Do Darwinian theories of evolution have relevance to Free Software projects?
The evolution of software proceeds at a different pace depending upon the management structure and goals of the project. Here we explore the history of, and the potential for, evolutionary theories of software development.
Picture of Charles Robert Darwin above taken by J. Cameron, via Wikimedia Commons. Photograph is in the public domain.
Charles Robert Darwin and Evolution
Charles Robert Darwin is famous for his theory of evolution by a process which he termed natural selection. Today we are going to attempt to stretch Darwin’s theories to cover software as well. While this may seem a strange attempt to use biological theories in a way they were never intended to be used, we believe that we can show that there is indeed a correlation, and that it applies only to certain types of software.
Darwin’s theory in its time was controversial. In certain backwoods places it is still controversial today, more than one hundred and fifty years after the theory was originally proposed.
Scientifically, the theory is considered proven. Outside the scientific world, a misunderstanding of how the scientific community uses the word “theory” has led certain radical elements to believe that there is insufficient proof that evolution is real. We first need to address this confusion, which Wikipedia does quite admirably:
While theories in the arts and philosophy may address ideas and empirical phenomena which are not easily observable, in modern science the term “theory”, or “scientific theory” is generally understood to refer to a proposed explanation of empirical phenomena, made in a way consistent with scientific method. Such theories are preferably described in such a way that any scientist in the field is in a position to understand, verify, and challenge (or “falsify”) it. In this modern scientific context the distinction between theory and practice corresponds roughly to the distinction between theoretical science and technology or applied science. A common distinction made in science is between theories and hypotheses, with the former being considered as satisfactorily tested or proven and the latter used to denote conjectures or proposed descriptions or models which have not yet been tested or proven to the same standard.
Anyone interested in the full details should consult the Wikipedia article itself for additional information and sources.
The important point is that a theory is a “proposed explanation” of a phenomenon. As long as the explanation works, the theory is considered correct, EVEN IF WE DO NOT UNDERSTAND ALL OF THE DETAILS. Consider gravity, a very complex force. We understand many details of how gravity works, and we have a theory of gravity. We do not, however, know precisely what generates gravity. We know that it has something to do with mass, but is it mass as a whole? Or is it protons, neutrons, electrons, or quarks? Or might it be something that we do not know about yet?
Unless we reach a point where the theory no longer makes sense, this doesn’t matter. This is what happened when Einstein’s physics replaced Newton’s physics. Science had reached a point where the explanations provided by Newton’s theories no longer worked. Einstein produced his General Theory of Relativity, which answered the questions that Newtonian Physics could not answer.
We have not reached that stage with Darwin’s theories yet. Darwin’s Theory of Evolution has held up remarkably well. Other scientists have extended it many times, but the basic theory has not changed.
Evolution and Software
Meir “Manny” Lehman began studying software evolution at IBM in 1968. His interest was in the evolution of the individual program, and he continued to write on the subject until his death in December 2010. Lehman’s Laws of Software Evolution provide an interesting viewpoint from the proprietary perspective. The text below is from the Wikipedia entry.
Prof. Meir M. Lehman, who worked at Imperial College London from 1972 to 2002, and his colleagues have identified a set of behaviours in the evolution of proprietary software. These behaviours (or observations) are known as Lehman’s Laws, and there are eight of them:
1) Continuing Change
2) Increasing Complexity
3) Large Program Evolution
4) Invariant Work-Rate
5) Conservation of Familiarity
6) Continuing Growth
7) Declining Quality
8) Feedback System
It is worth mentioning that the laws are believed to apply mainly to monolithic, proprietary software. For example, some empirical observations coming from the study of open source software development appear to challenge some of the laws.
Further below in the entry is the following:
Software evolution is not likely to be Darwinian, Lamarckian or Baldwinian, but an important phenomenon on its own. Given the increasing dependence on software at all levels of society and economy, the successful evolution of software is becoming increasingly critical. This is an important topic of research that hasn’t received much attention.
We believe that this is incorrect. In a corporate environment, Darwinian Evolution would be regarded as inefficient, because for Darwinian Evolution to work at its best, multiple paths of change must be available. In a corporate setting multiple paths are regarded as wasteful, so the corporation would move to block Darwinian Evolution from occurring. In Free Software Projects, however, we believe that Darwinian Evolution will be the norm: Free Software projects can encourage experimentation with different options in a way that proprietary projects cannot, and Free Software will therefore evolve faster. Returning to the Salon article on Lehman’s work, its third page notes that:
Michael Godfrey, a University of Waterloo scientist, is equally hesitant but still finds the Lehman approach useful. In 2000, Godfrey and a fellow Waterloo researcher, Qiang Tu, released a study showing that several open-source software programs, including the Linux kernel and fetchmail, were growing at geometric rates, breaking the inverse squared barrier constraining most traditionally built programs. Although the discovery validated arguments within the software development community that large system development is best handled in an open-source manner, Godfrey says he is currently looking for ways to refine the quantitative approach to make it more meaningful.
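For readers curious about the “inverse squared barrier” in the quote: the Lehman school modelled the growth of traditionally built systems with an inverse-square law, in which each release adds an increment inversely proportional to the square of the current size. The sketch below is an illustration under assumptions, not a reconstruction of the Godfrey and Tu study. It assumes the inverse-square model in the form usually attributed to Turski, S[i+1] = S[i] + E / S[i]², and the starting size, effort constant E, and 10% per-release geometric rate are arbitrary values chosen only to show how the two curves diverge.

```python
"""Compare inverse-square growth with geometric growth over many releases.

Assumptions (illustrative only): the inverse-square model is taken in the
form S[i+1] = S[i] + E / S[i]**2; E, the starting size, and the 10%
geometric rate are arbitrary values chosen for this sketch.
"""


def inverse_square_growth(start: float, effort: float, releases: int) -> list[float]:
    """Each release adds effort / size**2, so increments shrink as size grows."""
    sizes = [start]
    for _ in range(releases):
        s = sizes[-1]
        sizes.append(s + effort / s**2)
    return sizes


def geometric_growth(start: float, rate: float, releases: int) -> list[float]:
    """Each release grows the system by a fixed fraction of its current size."""
    sizes = [start]
    for _ in range(releases):
        sizes.append(sizes[-1] * (1.0 + rate))
    return sizes


if __name__ == "__main__":
    releases = 20
    inv = inverse_square_growth(start=100.0, effort=500_000.0, releases=releases)
    geo = geometric_growth(start=100.0, rate=0.10, releases=releases)
    for i in range(0, releases + 1, 5):
        print(f"release {i:2d}: inverse-square {inv[i]:8.1f}  geometric {geo[i]:8.1f}")
```

Under the inverse-square rule each release adds less than the one before, so growth flattens; under geometric growth each release adds more than the one before, which is the kind of growth the quote describes.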
Apple, by leveraging Free Software, was able to slingshot past Microsoft in the early years of the new millennium. Using BSD as the basis of its operating system allowed Apple to modernize it, and then to reuse the same basic kernel on the iPhone, the iPod touch, and the iPad. Many of the basic programs provided with Mac OS X are built on a Free Software core; Safari, for example, is based on WebKit.
IBM uses Apache as the base of its WebSphere product, and OpenOffice as the base of Lotus Symphony. Oracle uses OpenOffice as the base of StarOffice. Even Microsoft has used Free Software in the past, though it would rather not admit it, and Windows itself may contain some BSD code.
But is Free Software development Darwinian?
What is Free Software?
Free software is software which meets the Free Software Definition as published by the Free Software Foundation. The aim of the Free Software Definition is to protect four freedoms for the user:
- The freedom to run the program, for any purpose (freedom 0).
- The freedom to study how the program works, and change it to make it do what you wish (freedom 1). Access to the source code is a precondition for this.
- The freedom to redistribute copies so you can help your neighbor (freedom 2).
- The freedom to distribute copies of your modified versions to others (freedom 3). By doing this you can give the whole community a chance to benefit from your changes. Access to the source code is a precondition for this.
A common misunderstanding is that the Free Software Definition is aimed at protecting programmers. Its aim is to protect users; that it also protects programmers is incidental to its main purpose.
What is the Advantage of Free Software?
Software that protects users also strongly attracts programmers, because it protects them too. Consider the various kernel projects. Apple was able to use the FreeBSD project as the basis of Mac OS X. While Apple has contributed back to the FreeBSD project, it was under no obligation to do so.
Compare that to Linux, where the GPL would have obliged Apple to release the source code of any modified kernel it distributed. This difference has caused a philosophical rift in the Free Software community. The copyleft side of the community believes that a strong copyleft license is needed to protect the user, while the permissive side believes that copyleft licenses restrict the user’s freedom to use the code however they wish.
Many programmers side with the copyleft movement, preferring to keep their code free for all time. This is shown by the high percentage of projects that use copyleft licenses: at present well over 60% of projects use copyleft-style licenses, which require that distributed modifications be shared under the same terms.
The most famous example is the Linux kernel, which since 2000 has added support for a truly impressive list of hardware architectures. The enforced sharing, which many in the open source camp have declared a bug, has instead turned out to be a feature: it has attracted many of the best and brightest developers. The result is that the Linux kernel has evolved faster than any other kernel over the last eleven years. Linux now supports a wide range of file systems and more types of hardware than most other kernels, is the leading kernel on supercomputers, powers most of the World Wide Web, is taking over a large portion of the embedded operating system market, and continues to evolve at a furious rate.
Or consider LibreOffice, the recent fork of OpenOffice. After releasing 3.3.0 in January, the project released 3.3.1, 3.3.2, and 3.3.3, and then jumped to 3.4.0 in June. No proprietary software project could handle a release schedule like this, and none ever has. LibreOffice 3.4.1 RC2 is available for download by those who like to test bleeding-edge releases.
Oracle, the owner of OpenOffice, responded by donating the OpenOffice code base to the Apache Software Foundation, a tacit admission that it was unable to compete. Whether Apache will be able to compete is open to question. LibreOffice is under a copyleft license, while OpenOffice will be under a permissive license. This will allow LibreOffice to take code from the OpenOffice project, but will not allow OpenOffice to take code from the LibreOffice project. OpenOffice is currently at least six months behind LibreOffice and falling further behind daily, so it is unlikely to catch up without a code donation of some sort.
Projects Using a Free Software License Attract More Developers
They not only attract more developers, they also attract developers who are willing to work together as a team. It’s interesting spending time on a project mailing list. The camaraderie, the willingness to put the team before self… It is not something that you tend to see when working at a large company.
When working at a large company, it tends to be a dog-eat-dog situation. A Free Software project is more collegial. Problems get worked around. Egos get parked at the door. Which isn’t to say that there aren’t jerks. There are jerks everywhere. But peer pressure tends to keep them under control.
Free Software Projects develop better developers. Because of the collegial atmosphere, mediocre developers get the mentoring they need to become superior developers. Effectively, a Free Software project moves the bell curve: instead of the center of the curve being at 50%, it ends up at 70%. Just think of what this does to the project’s effectiveness.
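To put a rough number on “moving the bell curve”, here is a small worked example. The 50% and 70% centers come from the paragraph above; everything else is a hypothetical assumption for illustration: developer skill is taken to be normally distributed with a standard deviation of 15 points, and anyone scoring 80 or above counts as “superior”.

```python
from statistics import NormalDist

# Hypothetical parameters: a spread of 15 points and a "superior" threshold of 80
# are illustrative assumptions, not measurements of any real project.
SPREAD = 15.0
THRESHOLD = 80.0


def share_above(mean: float, spread: float = SPREAD, threshold: float = THRESHOLD) -> float:
    """Fraction of a normally distributed population scoring at or above the threshold."""
    return 1.0 - NormalDist(mu=mean, sigma=spread).cdf(threshold)


for mean in (50.0, 70.0):
    print(f"center {mean:.0f}%: {share_above(mean):.1%} of developers score {THRESHOLD:.0f}+")
```

Under these assumptions the share of developers above the threshold rises from roughly 2% to roughly 25%, more than a tenfold increase from a 20-point shift in the center.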
As mentioned above, Free Software Projects often attempt more than one solution to a problem; with enough volunteers and enough ideas, a project may attempt four or five. Unlike in biological evolution, the unused code is not switched off; instead it is retained and evaluated for use elsewhere in the project. What is unsuitable for one use may be suitable for another, which provides multiple evolutionary paths.
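The idea that trying several solutions at once speeds up evolution can be illustrated with a toy selection loop, known in the literature as a (1 + λ) evolution strategy. This is a hypothetical model, not a description of how any real project works: in each “generation” the project tries `n_variants` random modifications of its current best solution and keeps whichever scores highest, with the current solution always among the candidates so nothing is lost. The fitness landscape, mutation size, and run lengths are arbitrary assumptions.

```python
import random


def fitness(x: float) -> float:
    """Toy fitness landscape with a single peak at x = 10 (arbitrary choice)."""
    return -abs(x - 10.0)


def evolve(n_variants: int, generations: int, rng: random.Random) -> list[float]:
    """(1 + lambda) selection: try n_variants mutations per generation, keep the best.

    Returns the best fitness after each generation. Because the current solution
    always competes with its variants, fitness can never decrease.
    """
    best = 0.0
    history = []
    for _ in range(generations):
        candidates = [best] + [best + rng.gauss(0.0, 1.0) for _ in range(n_variants)]
        best = max(candidates, key=fitness)
        history.append(fitness(best))
    return history


def mean_final_fitness(n_variants: int, generations: int = 10, trials: int = 200) -> float:
    """Average final fitness over many independent, seeded runs."""
    total = 0.0
    for seed in range(trials):
        total += evolve(n_variants, generations, random.Random(seed))[-1]
    return total / trials


for n in (1, 5):
    print(f"{n} parallel solution(s) per generation: mean final fitness {mean_final_fitness(n):.2f}")
```

Keeping the current solution in the candidate pool mirrors the point above that unused work is retained rather than thrown away; with five variants per generation the toy project gets much closer to the peak in the same number of generations than with one.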
This is a freedom that a commercial developer doesn’t have. Many of the developers working on a Free Software Project are paid to do so by their companies; however, because of the way a Free Software Project works, the company is less concerned if the developer is working on a duplicate path. What the company cares about is whether a release arrives on time. A check of the mailing list archives of any project will show that most of the email addresses are corporate. The path to corporate employment often runs through involvement in a Project, where you prove not only that you can deliver the work, but also that you can work with people.
Projects With More Developers Evolve Faster
In Fred Brooks’ The Mythical Man-Month: Essays on Software Engineering, the central theme is that “adding manpower to a late software project makes it later”. To quote:
The mythical man-month
Assigning more programmers to a project running behind schedule will make it even later, because of the time required for the new programmers to learn about the project, as well as the increased communication overhead. When N people have to communicate among themselves (without a hierarchy), as N increases, their output M decreases and can even become negative, i.e., the total work remaining at the end of a day is greater than the total work that had been remaining at the beginning of that day, such as when many bugs are created.
Group Intercommunication Formula: n(n − 1) / 2
Example: 50 developers give 50 · (50 − 1) / 2 = 1225 channels of communication.
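The intercommunication formula is easy to check, and the quote’s qualifier “(without a hierarchy)” matters. The sketch below computes n(n − 1)/2 for a flat group and compares it with a hypothetical two-level structure in which developers talk freely only within teams of a fixed size, while one maintainer per team talks to every other maintainer. The team size of 5 and the two-level structure are illustrative assumptions, not part of Brooks’ model.

```python
import math


def flat_channels(n: int) -> int:
    """Brooks' group intercommunication formula: every pair of n people is a channel."""
    return n * (n - 1) // 2


def two_level_channels(n: int, team_size: int) -> int:
    """Hypothetical hierarchy: full communication inside each team, plus one
    maintainer per team communicating with every other maintainer."""
    teams = math.ceil(n / team_size)
    full_teams, remainder = divmod(n, team_size)
    within = full_teams * flat_channels(team_size) + flat_channels(remainder)
    return within + flat_channels(teams)


for n in (10, 50, 200):
    print(f"{n:4d} developers: flat {flat_channels(n):6d}  teams of 5 {two_level_channels(n, 5):5d}")
```

With 50 developers the flat formula gives 1,225 channels, matching the example above, while the hypothetical five-person teams need 145; communication still grows faster than headcount, but far more slowly.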
Brooks illustrates the fallacy of adding workers to speed the work by counterexample: If one woman can produce a baby in nine months, then nine women should be able to produce a baby in one month. The reason that this is false is that gestation is a sequential process, whose stages cannot run in parallel. If nine women get pregnant at the same time, in nine months they will produce nine different babies.
The book was originally published in 1975. Free Software Projects leverage the power of Web 2.0 in a way that was impossible to consider in 1975, and which most corporations have not yet adopted (and may never adopt). In Here Comes Everybody, Clay Shirky’s classic book about Web 2.0 civilization, Shirky describes how society is beginning to harness the power of Web 2.0 in positive ways. What Shirky missed is that Free Software developers were using these Web 2.0-style innovations before Web 2.0 existed; in fact, they were using them before the web itself existed.
The result is projects that evolve faster and faster in non-traditional ways. When more developers are added and the project should, under classical rules, slow down, it instead speeds up. When developers leave the project, putting more stress on those who remain, and the project should slow down, it maintains its pace. When the project attempts several solutions to a problem, instead of slowing down, it speeds up. Competition against other projects, whether Free Software or proprietary, will push the Free Software Project to develop at an ever faster pace, while the proprietary project cannot respond in the same manner, since it is limited by the rules of The Mythical Man-Month.
Darwinian evolutionary pressures will force Free Software Projects, whether permissive or copyleft licensed, into faster and faster rates of evolution. We expect that these same Darwinian pressures will eventually force proprietary software companies to adopt Free Software development methodologies. Whether they can successfully adapt them to a corporate environment, or will be forced to file for bankruptcy, we will have to wait and see. S|A