What's wrong with computer science reviewing?

There is a sense among some researchers in computer science that many peer reviews in our field are bad — in particular, too often unfairly slanted against papers in various ways that do not encourage good science and engineering. Why might this be happening and what can we do about it?

[Aside: you can now follow me on Google+. Short posts there, long posts here.]

The problem

First of all, let me be clear: (1) I think most reviews, regardless of whether they recommend acceptance or rejection, are well done and reflect care and significant time that the reviewers have invested. (2) Exciting, impactful research still manages to get done, so the system does generally work pretty well. Still, that doesn't mean we can't improve it. (3) Despite the fact that I try to take care with each review, statistically speaking I have probably committed each of the problems discussed here.

With the fine print out of the way ... what is this possible problem? The evidence is almost all anecdotal and biased. But since that is all we have, let me supply some anecdotes, starting with evidence that reviews are often negative:

  • This July's edition of Computer Communication Review rejected all its peer-reviewed submissions, thus matching (at least for one issue) the Journal of Universal Rejection as the most prestigious journal as judged by acceptance rate.

  • Jeffrey Naughton critiqued the state of research in the database community, identifying bad reviewing ("Reviewers hate EVERYTHING!") as a key problem. He gave the anecdote that in SIGMOD 2010, out of 350 submissions, only 1 paper had all of its reviews rate it "accept" or higher, and only 4 had an average rating of "accept" or higher.

  • Taking a recent major systems-and-networking conference as a representative, presumably normal example: papers generally received 4 or 5 reviews each and were scored on a scale of 1 to 5. Out of about 177 submissions, only six of the accepted papers received any 5s, and only one received more than a single 5, which also says something about variance.

  • "...there is a pervasive sense of unease within these communities about the quality and fairness of the review process and whether our publication processes truly serve the purposes for which they are intended. ... It was clear from the reaction to the panel that concerns with the reviewing process cut across many, if not all, fields of computer science." (Panel summary from a session at the 2008 CRA Conference at Snowbird)

Is it just a reweighting problem? If reviews were just conservative numerically, any problem could be fixed by "curving" the scores up. That would be nice, but ... another anecdote:

Even taking into account the fact that the survey included authors whose submissions were rejected and who might just be grumpy, those numbers seem undesirable. Reviews are sometimes wrong, or emphasize unimportant or subjective problems, even when the paper gets accepted. (And that's not necessarily the reviewer's fault.)
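
To make the "curving" idea concrete, here is a minimal sketch of what a purely numerical fix would look like (Python; the scores, the target mean, and the little curve() helper are all made up for illustration):

    # Hypothetical review scores on a 1-to-5 scale (made up for illustration).
    scores = [2.0, 2.5, 3.0, 2.0, 3.5, 2.5]

    def curve(scores, target_mean=3.5, lo=1.0, hi=5.0):
        """Shift every score by the same amount so the batch average hits
        target_mean, clamping to the ends of the scale."""
        shift = target_mean - sum(scores) / len(scores)
        return [min(hi, max(lo, s + shift)) for s in scores]

    print(curve(scores))  # same ordering, same relative gaps, just higher numbers

Of course, the point of the anecdote above is precisely that the problem is not purely numerical: a uniformly shifted score does nothing for a review that is wrong or focused on the wrong things.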

But this is true in every field, right? After all, authors have been complaining about criticism for centuries. Here, by the way, are a couple of favorite criticisms:

"Your manuscript is both good and original. But the part that is good is not original, and the part that is original is not good." (unattributed)
"In one place in Deerslayer, and in the restricted space of two-thirds of a page, Cooper has scored 114 offenses against literary art out of a possible 115. It breaks the record." (Mark Twain, How to Tell a Story and Other Essays.)

Getting back to the point, there is plenty of precedent for reviewers not seeing the light. But there is anecdotal evidence that this is a bigger problem in CS than in certain other areas:

  • A study of NSF panel reviews found that reviewers in computer science give lower scores on average than in other areas. (Note: I read this in CACM or some other magazine but now I can't find it; if you can, please let me know.) Update: Here's the data: CISE proposals average 0.41 points lower than other directorates.

  • While it appears to be a common (but not universal) belief in CS that reviewers are too often wrong and frustrating, I'm told by at least one physicist that this is not the general feeling about reviewers in that field. They are "rarely out to actively find problems with your paper", and while they may often misunderstand parts of the paper, the authors can respond and usually the reviewers or the journal editor will accept the response. Publication is still competitive and often annoying for other reasons, but reviewers are generally reasonable.

So what? Does this have any negative impact? As pointed out by others:

  • Researchers may be discouraged. (I know of at least one top PhD graduate who went to industry citing weariness with "selling" papers as one cause.)

  • It puts CS at a disadvantage relative to other fields, if we are generally more negative in grant proposal reviews. As Naughton wrote, "funding agencies believe us when we say we suck".

  • More speculative or unusual work (with dozens of potential challenges for the approach that a reviewer could cite) is at a disadvantage compared to work with well-known quantitative metrics for evaluation.

  • Variance in reviews may make papers more likely to return for another round of reviewing at another conference, increasing time to publication and reviewer workload.

Causes of the problem

Naughton suggested that "Reviewers are trained by receiving bad reviews from other reviewers who have received bad reviews in the past". Keshav suggested human failings and increasing reviewer workloads. Without disagreeing with those possibilities, I wonder: what about CS in particular might exacerbate the problem? Here are two ideas.

  1. No author response to reviewers. As a consequence of CS's focus on conferences, most venues (in my area) have no opportunity for authors to answer reviewer criticism. The communication flows author → reviewer → author, with no feedback returning to the reviewer. It's a little like putting papers on trial without a defense team. As a result:

    • Bad reviews are more likely to happen, because reviewers typically never learn that they have submitted a bad review and are not really held accountable.
    • Once a bad review does happen, there's no chance to fix it.

  2. Focus on bugs. (This is extremely speculative.) As computer scientists, we are really great at spotting bugs, and that's a good thing when we're writing code. Possibly, some of that carries over into reviewing more than it should. Maybe bug-finding is easier than thinking carefully about the contributions of the paper, especially since, once you honestly think there's a bug, you don't have to do any more work even if you're wrong. (Just noticed that someone else had the same idea.)

Fixing the problem

I'm suggesting these as possible directions to discuss, not as solutions I think are guaranteed to work.

  1. Allow authors to respond to reviewers. Just as in TCP's three-way handshake (see the toy sketch after this item), one would hope that both involved parties get feedback. Responses, at least in theory, (1) create an incentive for better reviews and give reviewers feedback that helps them improve, and (2) allow authors to point out simple misunderstandings in reviews. (Note that some venues, like ASPLOS, have rebuttals. And in fact, CCR has reasonably fast turnaround and allows responses to reviewer comments as they arrive. Apparently that didn't help the July issue, though...)

    One could argue that reviewers already have an incentive to do well, because they have their reviews looked at (or even voted upon!) by other program committee members. But other reviewers don't know the paper and its area as well as the authors; and reviewers have at least as much incentive to maintain a friendly relationship with other reviewers as they do to argue the case for a specific paper. Arguing a case after one reviewer has taken a negative stand involves extra effort and to a certain extent puts one's reputation on the line. I suspect the most effective response comes from the authors. They have the needed incentive and knowledge.
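
    For what it's worth, here is a toy rendering of the handshake analogy (Python; illustration only: the message names are standard TCP, and the mapping to reviewing in the comments is just the analogy above, nothing more):

        # Toy sketch of TCP's three-way handshake: each side hears back from the
        # other before the exchange is considered established.
        def three_way_handshake():
            exchange = [
                ("client -> server", "SYN"),      # think: author submits the paper
                ("server -> client", "SYN-ACK"),  # think: reviewer sends the review
                ("client -> server", "ACK"),      # think: author responds to the review
            ]
            for direction, message in exchange:
                print(f"{direction}: {message}")

        three_way_handshake()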

That seems like the most obvious approach, but it does require organizational change. There are some smaller steps that might be easier on an individual level.

  1. Avoid Naughton's checklist for bad reviewing. Quoting him directly:

    • Is it "difficult"?
    • Is it "complete"?
    • Can I find any flaw?
    • Can I kill it quickly?

  2. Focus on what a paper contributes, not on what it doesn't contribute, which is always an infinitely long list. Focusing on the absent results will inevitably lead any paper to the wastebin of rejection and any author to a pit of misery.

    In particular, it seems to me that "This paper didn't show X" is, by itself, not a valid criticism. It is an irrelevant factoid unless it negates or diminishes some other contribution in the paper. If it is fair to argue that particular results are absent, then my first beef with every paper is going to be that it fails to resolve whether P ≠ NP.

    Of course, a paper should get more "contribution points" for a better and more thorough evaluation, but perhaps it's OK to leave some questions unanswered, particularly since it's often hard to predict which particular dimension or potential inefficiency the reviewers will be interested in. Leaving certain questions unanswered is entirely compatible with the paper making other useful contributions.

  3. Submit to arXiv, bypassing the reviewing process entirely and letting other researchers judge what they want to read. Subscribe to arXiv RSS feeds so you find out about other people's work more quickly. Of course, arXiv currently has limited value for CS systems and networking researchers, since other such researchers tend not to look for papers there. More on that later.

  4. Adopt policies that tolerate some reviewer pessimism. As an example of what seems to me like a bad idea, a recent workshop had a reviewing policy that allowed a single reviewer to effectively veto a paper if they strongly disliked it.

  5. Implement feedback yourself. If a conference doesn't provide a means for author feedback to reviewers, the reviewer could implement this herself by including in the review a way to provide feedback, e.g., a link to a Google Docs form that could preserve the anonymity of the reviewer and authors. Disadvantage: This only fixes a piece of the problem and might seem strange to PC chairs and authors.

Other past suggestions include reducing PC workloads, making reviews public, maintaining memory across conferences (so resubmissions are associated with old reviews), and much more; see links above and below.

The open question is, which of these will best improve the quality of reviews and, ultimately, CS research? My guess is that any good solution will include some form of author response to reviewers, but there are several ways to do that.

There's voluminous past discussion on this topic. Related links:

Update: SIGCOMM 2012 will have rebuttals. Also, Bertrand Meyer has something to say about CS reviewing.

8 comments:

  1. Thanks for writing this post.

    "maintaining memory across conferences (so resubmissions are associated with old reviews)"

    ICFP'11 just tried this and it was a total catastrophe.

    Several papers marked as "resubmissions" received only one or two reviews, and were summarily rejected due to lack of reviewing (or on the basis of a *single* review!).

    I understand and support the reasoning behind the "memory" idea, but unfortunately researchers -- especially in CS -- fetishize novelty. Any sort of indication that the paper has been peer-reviewed in the past can only be negative. Authors are foolish to take advantage of any such mechanism.

  2. That's interesting to know. At a minimum, the memory proposals seem logistically complicated, and don't fulfill the goal of author response.

  3. Very interesting post! I really like the idea of author response to reviewers - another conference that has started to do this in the last couple of years is CCS. However, it is not so clear if the rebuttals are really having some influence in the decision making ...

  4. "However, it is not so clear if the rebuttals are really having some influence in the decision making"

    That has been my experience as well. The author response mechanism makes authors feel better, but I'm not sure it does much more than that.

    On the other hand I am very encouraged by the fact that double-blind reviewing has been making a minor comeback in the last year or so.

    At the moment I think the best hope for improving the situation is some sort of authors-rate-the-reviewers'-understanding mechanism so that lazy and/or malicious reviewers feel like they have something to lose.

    For this to be meaningful, reviewer ratings would need to persist in some way across conferences (even ACM-wide?), which raises anonymity concerns. The solution is not obvious, but I'm sure one of the cryptographers in the CS community could come up with something solid; reviewers already trust the program chairs to shield their identity, so continuing to rely on that assumption should make the problem easier.

  5. "That has been my experience as well. The author response mechanism makes authors feel better, but I'm not sure it does much more than that."

    I've heard that too, but I don't completely buy it. Author response can help by motivating reviewers to write better reviews and perhaps even be more open-minded in the first place. That benefit would be invisible if you just look at whether rebuttals change anyone's mind.

    Also, it's possible that authors and reviewers are just not used to the mechanism yet and are not using it as well as they could. I've never served on a PC with author response, so I can't say. But other fields that publish in journals use author responses as the normal case and apparently find them useful.

  6. One option you didn't mention is jettisoning the conference entirely, since many of the sins of reviewing can be laid at the feet of the compressed timeline forced by conferences.

    While this would seem overly radical, it's what VLDB (one of the flagship DB conferences) has essentially done, with its rolling monthly deadlines for paper submission throughout the year. All papers submitted within a fixed 12-month window are eligible for the next year's conference if they are accepted, and there's a proper journal review process behind it.

    It is hard to measure the effect of this, because VLDB is only one of three/four major conferences in DB, but I think it's a model that provides a gentle transition to a more deliberative journal-like review process.

  7. I don't have any direct experience with it, but conceptually, I like that model a lot.

  8. @Suresh: Have people noticed a change in the quality or carefulness of the reviews since that change?

    I submitted my first VLDB paper last year (albeit on something related to distributed query data structures for virtual worlds, a bit outside the norm), and was quite unimpressed by the brevity and superficiality of our reviews (compared to my usual SIGOPS venues).

    And yes, I realize a single data point is a really bad indicator of the norm :)
