Code reviews: the lab meeting for code

While working as a postdoc at CU Boulder I met Rob Knight, a professor in biochemistry and molecular biology who had a strict discipline of doing, in addition to his group lab meetings, regular code reviews with all of his lab members. Rob’s group is extremely active and productive, and I recently asked him about his continued use of this process. He does still hold them and explained that he feels they are a key element of his lab’s success; he also provided some very useful feedback based on which I put these notes together.

With Rob’s permission, I am posting here a set of guidelines that come from his reply to my questions, which I intend to use in my own work here at UC Berkeley. I have made minor edits to his notes, but the full credit of these ideas, fine-tuned and validated over more than 7 years of practice, goes to Rob, to whom I’m very thankful for allowing me to reuse them here. I do claim full credit on any blunders I may have added in editing.

These meetings bridge research and practical implementation questions: sometimes the discussion may focus on fundamental algorithmic ideas or the approach to a problem, other times the problems will be documentation, testing or implementation clarity. But I feel that as our research sits on an increasingly large and complex pile of software tools, many of them written in-house, we must treat these tools with the attention and expectations of rigor that we have for other parts of the research work. And the only way to do that is to scrutinize the code with the same peer-review process that we use for results within a lab before we put them out in a paper sent for publication (where they do get further reviewed).

Presentation guidelines

The most difficult part of writing code is always to make it understandable to other people, including yourself a few months down the track. There’s certainly no shame in finding out that your code wasn’t as easy to understand or use as you’d hoped, so don’t take it personally when it happens (which it always does, at least in my experience), but treat it as an opportunity to improve.

Here’s some general guidelines for making your code review go smoothly:

  • Make sure the motivation for your code is clear. Someone who isn’t intimately involved with your project should understand from the module documentation and the comments what you are trying to do, what approach you’re taking, and why they should expect it to work.
  • Take some time to prepare a presentation about your code that will answer the above questions even for someone who hasn’t read the code. You’re more likely to get useful feedback, rather than nitpicking about syntax, if the audience can see the big picture.
  • Get the code sent out at least a few days beforehand along with some background about what to look at (if large), whether suggestions should be about architecture/implementation/algorithm/requirements, etc. Make sure everyone has enough time to read the code beforehand, and don’t send a series of updated versions immediately before code review.
  • Don’t try to present too much. 200 lines is an absolute maximum – 50 is usually more reasonable.
  • Include examples, either as unit tests or standalone in the module’s __main__ block. It can often be a lot easier to see an example working than to figure out the algorithm from the code.
  • Ask yourself, before sending the code out, if it could be simpler. Break complex routines down into smaller ones, consider using built-in functions and data types instead of custom ones, see if you can separate complex logic into a dispatch routine and simple individual functions, etc. This also generally helps you make the code more testable.
  • Follow the coding guidelines and Python idioms applicable to your group. At UC Berkeley I am following for all of my projects the Python style guide encoded in PEP 8, supplemented where necessary (in particular with regards to docstring standards) with the Numpy coding style guidelines.
  • Take advantage of existing mature code in your group, the Python cookbook, and other sources of existing code to get an idea of how to do things. Many problems have already been solved for you. It’s always better to reuse well-tested code (if it does what you want) than to reinvent the wheel.
  • It’s OK to present code that doesn’t work yet, or that you think needs improvement, as long as you signal this fact very clearly in your email sending out the code. If there are specific areas you want people to focus on, don’t keep it a secret – it’s a waste of time to give detailed feedback on code that is about to be scrapped.

Suggestions for the meeting leader

Rob also provided the following tips for the person leading the meetings, which I think are very valid and complement the general guidelines above for the entire group. These are mostly about setting the right tone for the discussion, so that a productive culture develops where the process can be both rigorous and friendly.

  • Keep it a safe environment, i.e. make sure chastising is relatively gentle even when deserved (but do point out when code doesn’t meet the required standard – frame it as a learning experience though).
  • Make sure there’s a core of vocal participants so it isn’t always you.
  • Make it clear when you’re presenting yourself that your code isn’t perfect, point out some of those imperfections yourself if audience is slow to do so, do present yourself.
  • Patiently explain when things are not wrong but just stylistic differences (but make it clear that some styles are bad, often helpful to e.g. ask people to guess what a function returns from its name).