Programming Language Wars: The Movie

In computer science and hacker circles, the programming language wars have, it seems, been raging since the beginning of time. A little electronic archaeology reveals some amusing exchanges:

  • "By all means create your own dialect of FORTH. While your at it, you can add the best features of PL-I, F77 and CORAL66. Then, look me up when you get out of college and we'll show you how it's done when you have to make a living" [1985 thread]
  • "This debate ... is very much like two engineers engaged in building a three-mile bridge arguing over the brand of blue-print paper they use." [1987 thread]

Passionate arguments can often be improved by actual measurements. How fast, expressive, and efficient is a particular language? That's what The Computer Language Benchmarks Game set out to provide, measuring time, source code length, and memory use of several dozen languages across a set of benchmarks.

If you have measurements, why not improve them with a visualization? And so I present to you an interactive, multi-dimensional, dynamic, whizbang-o-matic rendering of the Programming Language Wars.

Each circle is a language. Its horizontal position represents the gzipped source code size used to implement the benchmarks, which is intended to measure the language's "expressiveness". Its vertical position represents the real time used to execute the benchmarks, and its size (and color) indicate how much memory was used.
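To make the "gzipped source size" metric concrete, here is a minimal sketch. I'm assuming it is simply the byte length of the gzip-compressed source text. Any preprocessing the Benchmarks Game applies first is not modeled here. The two snippets are made up for illustration.

```python
import gzip

def gzipped_size(source: str) -> int:
    """Size in bytes of the gzip-compressed UTF-8 source text."""
    return len(gzip.compress(source.encode("utf-8")))

# Hypothetical snippets: a counter incremented line by line 50 times,
# versus a one-liner that produces the same final value.
verbose = "x = 0\n" + "x = x + 1\n" * 50
terse = "x = 50\n"

# gzip squeezes out most of the repetition, so repetitive boilerplate is
# penalized far less than it would be under a raw character count.
print(len(verbose), gzipped_size(verbose))
print(len(terse), gzipped_size(terse))
```

That discounting of repetition is presumably the point of compressing first. Verbose-but-regular code isn't punished as heavily as raw length would suggest.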

The cluster of languages in the top left consists of slow but expressive scripting languages. At the bottom right are C and C++, the fastest languages, which take quite a bit more code to get the job done. In between lies a tradeoff between speed and expressiveness, home to languages like OCaml (which I happen to use whenever possible).

Actually, each point is only a summary of the language's performance: Consider some metric, like real time, and some particular language L. The Benchmarks Game folks ran implementations of a set of about 12 benchmarks (FASTA, Mandelbrot, ...) in L. L's time for each benchmark is divided by the best time across all languages for that benchmark. This gives us a normalized score for each benchmark; we take the median of these to produce a summary real time score for L. Then we do the same for the other metrics: CPU time, source code length, and memory.
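The summarizing step can be sketched in a few lines of Python. The benchmark names are real, but the times are hypothetical numbers picked to make the arithmetic easy. They are not actual Benchmarks Game results.

```python
from statistics import median

def summary_scores(times):
    """times maps benchmark -> {language: measurement}, lower is better.

    Each measurement is divided by the best value for its benchmark, and
    each language's summary is the median of its normalized scores."""
    best = {bench: min(by_lang.values()) for bench, by_lang in times.items()}
    ratios = {}
    for bench, by_lang in times.items():
        for lang, t in by_lang.items():
            ratios.setdefault(lang, []).append(t / best[bench])
    return {lang: median(rs) for lang, rs in ratios.items()}

# Hypothetical real times in seconds.
times = {
    "fasta": {"C": 2.0, "OCaml": 3.0, "Python": 40.0},
    "mandelbrot": {"C": 5.0, "OCaml": 10.0, "Python": 250.0},
    "nbody": {"C": 10.0, "OCaml": 12.0, "Python": 600.0},
}
print(summary_scores(times))  # {'C': 1.0, 'OCaml': 1.5, 'Python': 50.0}
```

The same function works for CPU time, code size, or memory. A language with a missing implementation simply contributes fewer ratios to its median.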

The plot shows data for a single-core x86 box (assuming you haven't yet messed around with the controls). If you press the movie button in the bottom left, it will transition to results on a quad-core box. (Still normalized by the best single-core score. The labels say 1901 and 1904 since Google's API wants dates.) TIP: When you play the animation, select a few languages you're interested in and check the Trails checkbox, so the movement stands out.

To see which languages' implementations took advantage of parallelism, click Play and watch for vertical movement. The languages that move downward have improved their real time. Some stay in the same spot, probably indicating that the Computer Language Benchmarks Game doesn't have implementations for those languages that make good use of the extra cores.
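Here is one way to quantify that downward movement. It's a sketch assuming each quad-core summary is the median of quad-core time divided by the best single-core time per benchmark, which is my reading of the normalization described above. The times are hypothetical.

```python
from statistics import median

def median_ratio(times, best):
    """Per language: median over benchmarks of time / best[benchmark]."""
    ratios = {}
    for bench, by_lang in times.items():
        for lang, t in by_lang.items():
            ratios.setdefault(lang, []).append(t / best[bench])
    return {lang: median(rs) for lang, rs in ratios.items()}

def vertical_movement(single, quad):
    """Quad-core summary divided by single-core summary, per language.

    Both are normalized by the best single-core time for each benchmark,
    so a value below 1.0 means the language moved downward."""
    best_single = {bench: min(by_lang.values()) for bench, by_lang in single.items()}
    s = median_ratio(single, best_single)
    q = median_ratio(quad, best_single)
    return {lang: q[lang] / s[lang] for lang in s if lang in q}

# Hypothetical real times in seconds, not actual Benchmarks Game results.
single = {"fasta": {"C": 2.0, "Erlang": 8.0}, "nbody": {"C": 10.0, "Erlang": 40.0}}
quad = {"fasta": {"C": 2.0, "Erlang": 2.5}, "nbody": {"C": 10.0, "Erlang": 12.0}}
print(vertical_movement(single, quad))
```

In this made-up data C stays put at exactly 1.0. Erlang's summary real time falls to about 0.31 of its single-core value.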

Fine. Just tell me which language is best.

These benchmarks are almost certainly not representative of what you want to do. There are various flaws in this approach — how we choose to summarize (the median here) will affect the ordering of languages; the implementations are not perfect; some languages are missing implementations for some benchmarks; even for one language there are many possible implementations with different tradeoffs and only the fastest was tested; and so on. Perhaps most significantly, we're completely lacking important metrics like programmer time, maintainability, and bugginess.
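Here is a toy illustration of how the choice of summary can reorder languages. "A" and "B" are hypothetical languages with made-up normalized scores. A is best on two benchmarks but terrible on the third, while B is uniformly mediocre.

```python
from statistics import geometric_mean, median

# Hypothetical normalized scores (1.0 = best on that benchmark).
ratios = {"A": [1.0, 1.0, 10.0], "B": [1.5, 1.5, 1.5]}

by_median = sorted(ratios, key=lambda lang: median(ratios[lang]))
by_geomean = sorted(ratios, key=lambda lang: geometric_mean(ratios[lang]))

print(by_median)   # ['A', 'B']
print(by_geomean)  # ['B', 'A']
```

The median ignores A's one bad benchmark entirely. The geometric mean lets it pull A's score up to about 2.15.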

Thus, just as someone out there thinks Circus Peanut Gelatin pie is a good idea, so most of these languages are the right tool for some job. We can't use these benchmarks to brand a language as useless. What I think the benchmarks and visualization can do is introduce you to general-purpose languages that may be a better solution for many tasks.

In particular, you might want to take a gander at the Pareto-optimal languages: those which, for every other language L, are better than L in at least one metric under consideration. If we consider source code length and real time as the two metrics, then the Pareto-optimal languages are:

1 core:
  • Ruby 1.9
  • Ruby JRuby
  • JavaScript TraceMonkey
  • Python PyPy
  • JavaScript V8
  • Lua LuaJIT
  • Haskell GHC
  • Java 6 Steady State
  • C GNU gcc

4 cores:
  • Ruby JRuby
  • Python CPython
  • Erlang HiPE
  • Haskell GHC
  • F# Mono
  • Java 6 Steady State
  • C GNU gcc
  • C++ GNU c++

From top to bottom, these languages trace the best points in the tradeoff between expressiveness (at top) and speed (at bottom). Perhaps what this does best is to illustrate why it is hard to pick a "best" language. For the single-core case, 27% of the languages are on the list above; for quad-core, 48% made the cut. Even with just two simple metrics, many languages might be considered "optimal"!
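The Pareto filter itself can be sketched as below. It uses the standard dominance test, which agrees with the definition above except when two languages tie exactly on every metric; then both are kept. The language names and scores are hypothetical.

```python
def dominates(p, q):
    """True if p is no worse than q on every metric and strictly better on
    at least one (lower is better for all metrics)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def pareto_optimal(points):
    """Names not dominated by any other entry, ordered by the first metric
    (most expressive first when that metric is source size)."""
    front = [
        name for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    ]
    return sorted(front, key=lambda name: points[name][0])

# Hypothetical (source size, real time) summary scores, not measured data.
points = {
    "Scripty": (1.0, 30.0),
    "Middling": (1.8, 3.0),
    "Fasty": (3.0, 1.0),
    "Clunky": (3.5, 5.0),
}
print(pareto_optimal(points))  # ['Scripty', 'Middling', 'Fasty']
```

"Clunky" drops out because "Middling" is both smaller and faster. Each of the other three wins on at least one axis against every rival.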

Coming soon: a visualization of the 3D tradeoff space.


Last year, Guillaume Marceau produced informative plots based on The Computer Language Benchmarks Game, similarly showing the tradeoffs between metrics. The CLBG now includes similar plots on its site. [Updated: the CLBG didn't use Marceau's plots directly.] The visualization here summarizes more (which can be good and bad), includes the memory metric and the quad-core data, and lets you play with the data interactively thanks to Google Motion Charts. A chat with Rodrigo Fonseca several years ago got me interested in plotting these tradeoffs. Finally, my apologies to readers without Flash, which the chart requires.