Creating research ideas over time

Research is built on ideas: identifying questions to investigate, problems to solve, or new techniques to solve them. Before I started as faculty, one of my biggest doubts was whether I would have good enough ideas to lead a research group and help shape five or six years of each student's career. There is no deterministic procedure to sit down and generate an idea.

However, we can think about how to improve the conditions for ideas to appear pseudorandomly.

Since sometime in my first year of grad school (2003), I've kept a document logging ideas for research projects. The criterion for including an idea is simple and has remained, at a high level, fairly consistent: when I have an idea that I think has a reasonable chance of leading to a publishable paper, I jot down a description and notes. This helps me remember the idea, and the document is also a convenient place to record notes over time, for example if I notice a related paper months later.

Having grown over almost exactly ten years to 169 entries, this document is now an interesting data set in its own right.

The data

Of course, this is not a uniform-random sample of ideas. There are various biases: not every idea makes it into the document, my inclusion standards might have changed over time, and so on. And many of these ideas are, in retrospect, terrible. But let's take a look anyway.

Here is the number of ideas in the document per year. (The first and last were half years and so the value shown is twice the actual number of ideas.)
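The half-year adjustment described above can be sketched in a few lines of Python. This is only an illustration: the function name `annualized_counts` and the sample years are made up, and the real log is not reproduced here.

```python
from collections import Counter


def annualized_counts(idea_years, half_years):
    """Count ideas per year, doubling the count for years that were
    only half covered so they are comparable with full years."""
    counts = Counter(idea_years)
    return {
        year: counts[year] * (2 if year in half_years else 1)
        for year in sorted(counts)
    }


# Hypothetical data: one entry per idea, giving the year it was logged.
sample_years = [2003, 2003, 2004, 2004, 2004, 2013]
print(annualized_counts(sample_years, half_years={2003, 2013}))
# {2003: 4, 2004: 3, 2013: 2}
```

Doubling is the crudest possible extrapolation. It assumes ideas arrive at the same rate in the unobserved half of the year.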

Now let's probe a bit deeper. The number of ideas might not tell the whole story. Their quality matters, too. To investigate that, I annotated each idea with whether (by 2013) it successfully produced a published paper. I also tagged each of the 169 ideas with an estimate of its quality in retrospect (as subjectively judged by my 2013 self), using a scale from 1 to 10, where:

  • 5 = dubious: maybe not publishable, or too vague to have much value
  • 6 = potential to result in a second-tier publication
  • 8 = potential to result in a top-tier publication (e.g., SIGCOMM or NSDI in my field)
  • 10 = potential to result in a top-tier publication and have significant additional impact (e.g., forming the basis of a student's thesis, producing a series of papers, or a startup)

The number of reasonably high-quality ideas and the number that produced papers both show significant jumps in 2008-2009, though with different behavior later. Also, the plot below shows perhaps a gentle increase in the mean idea quality over time, and a bigger jump in the quality of the top 5 ideas in 2008. Note that even a one-point quality difference is quite significant.
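For readers who want to run the same kind of summary on their own idea log, here is a minimal sketch of the two per-year statistics mentioned above. It assumes that "quality of the top 5" means the mean score of the five highest-rated ideas in a year, which is one plausible reading. The function name `quality_summary` and the sample scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def quality_summary(ideas, top_n=5):
    """ideas: iterable of (year, quality) pairs, quality on the 1-10 scale.
    Returns, per year, the mean quality of all ideas and the mean quality
    of the top_n highest-rated ideas (fewer if the year has fewer ideas)."""
    by_year = defaultdict(list)
    for year, quality in ideas:
        by_year[year].append(quality)

    summary = {}
    for year in sorted(by_year):
        scores = sorted(by_year[year], reverse=True)
        summary[year] = {
            "mean": mean(scores),
            "top_mean": mean(scores[:top_n]),
        }
    return summary


# Hypothetical scores, not the real data.
sample = [(2007, 5), (2007, 6), (2008, 8), (2008, 6), (2008, 10)]
for year, stats in quality_summary(sample, top_n=2).items():
    print(year, stats)
```

Looking at the top few ideas separately is useful because a year with many mediocre entries can pull the overall mean down even when its best ideas are unusually strong.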

The most prominent feature of this graph is an enormous spike in number of ideas in the 2008-2009 timeframe, and a corresponding increase in higher-quality ideas. What might have caused this? And what can we conclude more generally?

Ideas need time for creative thought

A significant change in my life during 2008-2009 was my transition from PhD dissertation work to a postdoc year (split between working on post-dissertation projects at my PhD institution of UC Berkeley, and working with Sylvia Ratnasamy at Intel Labs).

This appears to show the value of having time to step back and think -- and also the opportunity to interact with a new set of people. By May 2008, I was largely done with my dissertation work (though I did work on it more later and finally graduated in May 2009). I had accepted a position here at the University of Illinois and deferred for a year. So I was largely free of responsibilities and concerns about employment, and had more time to be creative. While there are reasons to be concerned about the surge of postdocs in computer science, I think this indicates why this particular kind of postdoc can be extremely valuable: providing time and space for creative thought, and new inspiration.

If that is the explanation, then it seems I was not sufficiently proactive about creating time to be creative after the flood of professorial tasks hit in late 2009.

There are alternative explanations.  For example, knowing that I was about to enter a faculty position, I might have more proactively recorded ideas for my future students to work on. However, that would not explain another observation -- that my creative expression at this time in other areas of my life outside computer science seemed to increase as well.

John Cleese has argued that creativity takes time, and that it is more likely to happen in an "open mode" of playful, even absurd thought than in a "closed mode" of efficiently executing tasks.

His talk makes other points relevant to academic research. In particular, you are less likely to get into an "open mode" of thought if you are interacting with people with whom you're not completely comfortable. This should certainly affect your choice of collaborators.

It's worth noting that having time to enter an "open mode" of creative thought does not mean that one is thinking free of any constraints whatsoever. I personally find that constraints in a problem domain can provide some structure for creative thought, like improvising around a song's fixed chord changes in jazz.

Ideas need time to germinate

In fact, some ideas need years.

You'll note from the second plot above that the number of paper-producing ideas is zero in 2012 and 2013. This is not just random variation: it's fairly unlikely that I have an idea and immediately turn it around into a paper. In fact, it has happened fairly often that an idea takes a year or two to "germinate". I might write down the seed of an idea and, at that time, not recognize whether it is valuable or what it might become. As I come back to it occasionally, combine it with other ideas, and bounce it off other people, the context, motivation, and focus of the idea gradually take shape until it is something much stronger that I can recognize as a worthwhile endeavor.

And that is all before the right opportunity appears to begin the project in earnest, such as a PhD student who is looking for a new project and is interested in the area. Only then is the project actually developed and the paper written, submitted (and resubmitted ...), and finally published. The most extreme example I've been involved with was a 2005 idea-seed that was finally published in a top conference seven years later. In fact, in processing this data I realized there was a second idea from 2005 which lacked sufficient motivation at the time and got somewhat lost until 2011, when it combined with a new take on a similar idea that grew out of a student's work; the result was published in 2012. The plot below shows 14 lines, each corresponding to a project, with points at the inception year of the seed idea, at any intermediate ideas that combined with it, and finally at the year of publication.
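The germination time of a project is simply the gap between its seed year and its publication year. As a small sketch, here are the two 2005 examples described above encoded as timelines. The labels "project A" and "project B" and the function name `germination_years` are placeholders, and the other twelve projects in the plot are not listed in this post, so they are omitted.

```python
def germination_years(timeline):
    """timeline: list of years, with the seed idea first, any
    intermediate ideas next, and the publication year last."""
    return timeline[-1] - timeline[0]


# The two examples from the text; names are placeholders.
projects = {
    "project A": [2005, 2012],        # seed, publication seven years later
    "project B": [2005, 2011, 2012],  # seed, combined with a 2011 idea, published
}

lags = {name: germination_years(years) for name, years in projects.items()}
print(lags)  # {'project A': 7, 'project B': 7}
```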

Ideas from connections

Reading over the document made it clear that very few, if any, of the ideas sprang out of nowhere. They came from connections: with a paper I read, with a previous project, or from chatting with collaborators. Some of these connections can be quite unexpected. For example, one project on future Internet architecture indirectly inspired a project on network debugging.

Many of the ideas on the list in fact owe at least as much to collaborators as they do to me. This is likely a big part of the rise in the number of ideas after I became faculty. Although I lost some of my open creative time after beginning as faculty, I gained a set of fantastic students.

Conclusions

Generating and selecting among ideas is an art, one of the most important arts to learn over years of grad school. I will never feel that I've truly mastered that art. But studying my own history has suggested some strategies and conditions that seem to help, or at least seem to help me.

Ideas do not simply appear for free: they are more likely to appear when I have, or make, time to think creatively.

They often need to germinate over a period of months or years.

And perhaps most importantly, they are most likely to grow out of connections with other work and other people.