Programming Language Wars: The Movie

In computer science and hacker circles, the programming language wars have, it seems, been raging since the beginning of time. A little electronic archaeology reveals some amusing exchanges:

  • "By all means create your own dialect of FORTH. While your at it, you can add the best features of PL-I, F77 and CORAL66. Then, look me up when you get out of college and we'll show you how it's done when you have to make a living" [1985 thread]
  • "This debate ... is very much like two engineers engaged in building a three-mile bridge arguing over the brand of blue-print paper they use." [1987 thread]

Passionate arguments can often be improved by actual measurements. How fast, expressive, and efficient is a particular language? That's what The Computer Language Benchmarks Game set out to provide, measuring time, source code length, and memory use of several dozen languages across a set of benchmarks.

If you have measurements, why not improve them with a visualization? And so I present to you an interactive, multi-dimensional, dynamic, whizbang-o-matic rendering of the Programming Language Wars.

What happened to the Internet on Friday

Note to readers: Judging from the past, this blog will have posts related to both computer science and politics. If you like, you can view just CS or just politics posts, or subscribe to feeds for just CS or just politics.

On Friday, a large disruption of Internet traffic made the news as an experiment gone awry. What actually happened? It's a good lesson in how fragile and insecure the Internet's routing protocol can actually be.

Political divisiveness at an all-time high, quantitatively speaking

There's a lot of talk about politics becoming more divisive across parties, with less and less common ground, and about Republicans being called, and embracing, the "party of no" label. But is it true? No need to speculate; we can find out with the help of the raw data in the form of Senate roll call votes (the last ~21.5 years of which are conveniently available), plus some Perl scripts.

Number of votes over time

Let's start with something simple. Here's the number of yea/nay votes over time in the Senate. (By yea/nay votes, I mean I'm excluding some exceptional votes such as when the vote is Guilty/Not Guilty.) As the plot below shows, the number of votes has remained fairly steady across time, except that odd-numbered years get more votes and there was an unprecedented spike in 1995. And of course, 2010 is at a disadvantage for obvious reasons.

The Party of No

Does one party vote "no" more often? Below is the fraction of Yea votes cast by each party across time. There is some signal in this data, such as the Republican takeover in the mid-90s. And 2010 is so far at a very low fraction of Yea votes by Republicans (49%), beaten only by 1993 (43%). But this data clearly requires some more interpretation as to what a Yea or Nay vote actually means. One has to question, for example, why Republicans voted Yea more often in 2009 than Democrats.

The naysayingest senator

Moving to individual member stats, aggregated over the 21.5 years of the data, most members vote no about 30-40% of the time, but there are outliers who are quite agreeable and quite disagreeable:

Here are the top 20 outliers on either end. Current sitting senators, none of whom are in the Yeasayingest column, are linked to their Wikipedia pages.

Yeasayingest                       Naysayingest
Member                     % Nay   Member              % Nay
Barkley (I-MN)*            14.3    Wallop (R-WY)       49.9
Carnahan (D-MO)            21.5    DeMint (R-SC)       47.9
Burdick (D-ND)             23.7    Coburn (R-OK)       47.8
Matsunaga (D-HI)           24.0    LeMieux (R-FL)      45.5
Krueger (D-TX)             24.3    Goodwin (D-WV)*     45.5
Burdick, Quentin S (D-ND)  24.8    Symms (R-ID)        45.1
Mathews (D-TN)             25.6    Armstrong (R-CO)    45.0
Riegle (D-MI)              26.1    Humphrey (R-NH)     44.9
Bentsen (D-TX)             26.2    Vitter (R-LA)       43.5
Sanford (D-NC)             26.8    Barrasso (R-WY)     43.4

Jim DeMint takes the prize as the naysayingest sitting senator, but he's Walloped by the naysayingest of all time, who voted no on almost half his votes. Malcolm Wallop very nearly maximized the entropy of his votes. Again, this data should be taken with a grain of salt, since how often one votes Nay depends on who currently controls Congress.

*Barkley and Goodwin don't really count; Barkley was briefly appointed to replace Paul Wellstone and only cast 14 votes, 12 of them Yea. Goodwin was appointed less than two weeks ago to replace Robert Byrd and has only cast 11 votes, 6 of them Yea.

General disagreement

Yea or Nay votes could mean almost anything, depending on what question is being voted upon. Here's a more robust metric: the agreement of a vote is 1 if everyone voted the same way, 0 if the vote was split 50/50, and linearly interpolated in between. We're currently at an all-time low of 34.4% (where "all-time" = the last 21.5 years), as you can see below.
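As a minimal sketch of the agreement metric (not the original Perl scripts): one reading of "linearly interpolated" that matches both endpoints is |yea - nay| / (yea + nay). The function name and the tallies below are made up for illustration.

```python
def agreement(yea, nay):
    """Agreement of one roll call: 1.0 if unanimous, 0.0 on an even split.

    Assumed formula: |yea - nay| / (yea + nay), which is linear in the
    size of the majority and hits both endpoints from the definition.
    """
    total = yea + nay
    if total == 0:
        raise ValueError("no yea/nay votes cast")
    return abs(yea - nay) / total


# Hypothetical tallies, not real Senate data.
votes = [(100, 0), (50, 50), (75, 25)]
scores = [agreement(y, n) for y, n in votes]
print(scores)                     # [1.0, 0.0, 0.5]
print(sum(scores) / len(scores))  # average agreement over these votes
```

A yearly figure would then be the average of this score over that year's votes, assuming a simple unweighted mean.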

But this doesn't expose the divisiveness between parties.

Political divisiveness

Here's what we really want to see: the average distance between Republicans and Democrats. On some particular vote, we can represent the average Democrat position as the fraction of Dems voting Yea; same for Republicans. Divisiveness is the distance between these two average positions. For example, if 50% of Dems and 50% of Reps vote yes, then divisiveness is 0. If 10% of Dems and 90% of Reps vote yes (or vice versa), then divisiveness is 0.8.
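The definition above is easy to write down directly. This is a small sketch rather than the scripts used for the plots; the function names and per-party tallies are hypothetical, chosen to reproduce the two worked examples.

```python
def yea_fraction(yea, nay):
    """Average party position on one vote: the fraction voting Yea."""
    return yea / (yea + nay)


def divisiveness(dem, rep):
    """Distance between the two parties' average positions on one vote.

    dem and rep are (yea, nay) tallies for each party.
    """
    return abs(yea_fraction(*dem) - yea_fraction(*rep))


# 50% of Dems and 50% of Reps vote yes: no divisiveness.
print(divisiveness((25, 25), (20, 20)))          # 0.0
# 10% of Dems and 90% of Reps vote yes.
print(round(divisiveness((5, 45), (36, 4)), 2))  # 0.8
# Swapping the parties gives the same distance.
print(round(divisiveness((36, 4), (5, 45)), 2))  # 0.8
```

Because the metric uses fractions within each party, it does not depend on how many seats each party holds.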

The data shows a striking difference. Politics were more centrist in the late 80s. Divisiveness didn't move much for about 18 years, but it has spiked dramatically since the beginning of the Obama administration, setting a record in 2009 and another record so far in 2010. The difference here is really quite dramatic: 29% divisiveness in 1989, vs. 70% today.

Party unity

Party unity is closely related to divisiveness. I'm defining unity as the fraction of members who take the majority position in their party, so 0.5 is the minimum score and 1 is the maximum. Party unity appears to have increased over time, though with some wild shifts particularly on the Republican side. Interestingly, Democrats seem to be just as unified as Republicans on the mean vote.
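For concreteness, here is a sketch of the unity score for one party on one vote. The function name and tallies are illustrative, and the averaging over a year's votes is assumed to be a plain mean.

```python
def party_unity(yea, nay):
    """Fraction of a party's voting members on the party's majority side.

    Ranges from 0.5 (party split evenly) to 1.0 (party unanimous).
    """
    total = yea + nay
    if total == 0:
        raise ValueError("party cast no yea/nay votes")
    return max(yea, nay) / total


# Hypothetical tallies for one party on three votes.
tallies = [(30, 30), (45, 15), (0, 58)]
scores = [party_unity(y, n) for y, n in tallies]
print(scores)                     # [0.5, 0.75, 1.0]
print(sum(scores) / len(scores))  # 0.75, the mean over these votes
```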

Anyone know where to get data going back earlier than 1989?

Update (July 29, 2010):

A few important points to emphasize:

  • One shouldn't read too much into this data. Divisiveness, as defined above, has increased, but this says nothing about why it has increased. As commenter GoldenBoy pointed out at politicalwire.com, we should naturally expect divisiveness to increase when one party has control of Congress and the White House, since they have less need to work with the other party in order to pass legislation. The analysis here really says nothing about to what extent this or other factors caused the increased divisiveness.
  • Political scientists have performed much more extensive analysis than the simple graphs I've plotted in this post. Brendan Nyhan pointed me to Voteview.com which has great plots of polarization, party unity, and more back to 1879. As a commenter below noted, they also have complete historical roll call data back to the first Congress.
  • It's important to put the recent increase in divisiveness in the context of a longer-term trend of increasing polarization. There does seem to be a divisiveness spike in the last couple years, but in general an increase is not surprising.

Eyjafjallajökull

Incredible photos of lightning storms over Eyjafjallajökull Volcano in Iceland.

An optimization problem

You are interviewing for an arbitrary job, and the employer happens to run a background check. What criminal record maximizes your chance of being hired?

"None" is a valid criminal record, but is unlikely to be the optimal answer.

Inspired by a true story.

SIGCOMM 2009

For readers interested in networking, I note that the SIGCOMM 2009 program is now available.

Also available is Pathlet Routing, our paper with Igor Ganichev, Scott Shenker, and Ion Stoica. Pathlet routing is a new Internet routing architecture which can improve scalability by enabling very small forwarding tables, and can allow senders to choose between multiple paths for improved reliability and path quality. The idea is basically to do source routing over a virtual topology whose nodes are arbitrary virtual nodes (vnodes) and whose links are sequences of vnodes (pathlets). Intuitively, this architecture is highly flexible because vnodes can represent arbitrary granularities, and because pathlets can represent policy constraints on routing while simultaneously enabling a large number of path choices. This is because sources can stitch together pathlets to form an end-to-end route in potentially exponentially many ways.
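To make the "exponentially many ways" point concrete, here is a toy model, not the protocol from the paper: a pathlet is just a tuple of vnode names, and a route is any chain of pathlets in which each one starts at the vnode where the previous one ended. The `ladder` topology and all names are hypothetical.

```python
from collections import defaultdict


def enumerate_routes(pathlets, src, dst):
    """List every way to stitch pathlets into a route from src to dst.

    Toy simplification: assumes the pathlets form no cycles.
    """
    by_start = defaultdict(list)
    for p in pathlets:
        by_start[p[0]].append(p)

    def extend(vnode):
        if vnode == dst:
            return [[]]  # one way to finish: use no more pathlets
        routes = []
        for p in by_start[vnode]:
            for rest in extend(p[-1]):
                routes.append([p] + rest)
        return routes

    return extend(src)


def ladder(stages):
    """Hypothetical topology: two alternative pathlets per stage."""
    pathlets = []
    for i in range(stages):
        a, b = f"v{i}", f"v{i + 1}"
        pathlets.append((a, f"upper{i}", b))
        pathlets.append((a, f"lower{i}", b))
    return pathlets


for k in (1, 2, 3, 10):
    routes = enumerate_routes(ladder(k), "v0", f"v{k}")
    print(k, len(ladder(k)), len(routes))
```

In this toy topology, 2k advertised pathlets yield 2^k end-to-end routes (20 pathlets give 1024 routes for k = 10), which is the sense in which a small amount of advertised state can offer senders a large number of path choices.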

An interesting property of the design is that it doesn't impose a global requirement on what "style" of routing policy is used, but rather allows multiple styles to coexist. One router could choose to have routes like in today's Internet, with a giant forwarding table specifying only a single allowed route to each destination. And the next router could have a tiny forwarding table that still gives the network owner some control, but provides a high degree of path choice for the senders. I think of this as being very much in the spirit of the principle of designing for variation in outcome advocated by Clark et al. in their Tussle in Cyberspace paper.