Reading Code Is Hard
Escalation Engineer JeremyK asks in his blog this morning:
how do you teach people this “art” of digging deep very quickly into unfamilar code that you had no hand in writing? I myself, I come from a very traditional process of learning how to code, by sitting down and writing it. I am struggling with how to tailor a delivery to focus on reading vs. writing source code. To me the only way you can be truly efficient in this process is by having written code yourself.
No kidding Jeremy -- code is way easier to write than it is to read.
First off, I agree with you that there are very few people who can read code who cannot write code themselves. It's not like written or spoken natural languages, where understanding what someone else says does not require understanding why they said it that way. For example, if I were to say something like
"There are two recipes for producing code: a strict and detailed, and a vague and sloppy. The first produces elegant, tiered wedding cakes, the second, spaghetti."
you would understand what I meant to get across, without having to understand that I'm using the literary techniques of "zero anaphora" and "parallel clauses" to produce a balanced, harmonious effect in the listener/reader. Heck, you don't even have to know what a "verb" is to understand a sentence! But with code, it is vitally important that the both intention of the code's author and how the code produces the intended effect be clear from the code itself.
Therefore, I would turn the question around -- how do we WRITE code that is more easily read by people who need to get up to speed very quickly on the code, but who didn't write any part of it?
Here are some of the things I try to do when writing code so that it can be more easily read:
- Make the code amenable to tools. Object browsers and Intellisense are great, but I'll tell you, I'm old school. If I can't find what I want via grep, I'm not happy. What makes code greppable?
- Variables with names like "i" are badness. You can't easily search code without getting false positives.
- Avoid making names that are prefixes of other names. For example, we have a performance marker in our code called "perfExecuteManifest", and another called "perfExecuteManifestInitialize". Drives me nuts every time I want to grep the source code for the former, I have to wade through all the instances of the latter.
- Use the same name for “tramp data” in both places. By tramp data, I mean those variables that you pass to a method only because they need to be passed on to another method. The two variables are basically the same thing, so it helps if they have the same name.
- Don't use macros that rename stuff. If the method is called get_MousePosition then don't declare it with a macro like GETTER(MousePosition) -- because then I can't grep for the actual function name.
- Shadowing is evil. Please don't do it.
- Pick a consistent naming scheme. If you're going to use Hungarian, use it consistently and universally, otherwise it becomes an impediment rather than a benefit. Use Hungarian to document data semantics, not storage types. Use Hungarian to document universal truths, not temporary conditions.
- Use assertions to document preconditions and postconditions.
- Don't abbreviate English words. In particular, don't abbreviate them in really weird ways. In the script engines, the structure that holds a variable name is called NME. Drives me nuts! It should be called VariableName.
- The standard C runtime library is not a paragon of good library design. Do not emulate it.
- Don't write "clever" code; the maintenance programmers don't have time to figure out your cleverness when it turns out to be broken.
- Use the features of the language do to what they were designed to do, not what they can do. Don't use exceptions as a general flow control mechanism even though you can; use them to report errors. Don't cast interface pointers to class pointers, even if you know it will work. Etc.
- Structure the source code tree in functional units, not in political units. For example, on my team now the root subdirectories are "Frameworks" and "Integration", which are team names. Unfortunately, the Frameworks team now owns the Adaptor subdirectory of the Integration directory, which is confusing. Similarly, the various sub-trees have some subdirectories which are for client side components, some for server side components. Some for managed components, some for unmanaged components. Some for in-process components, some for out-of-proc components. Some for retail components, some for internal testing tools. It's kind of a mess. Of all the possible ways to organize a source tree, the political structure is the least important to the maintenance programmer!
Of course, I haven't actually answered Jeremy's question at all -- how do I debug code that I didn't write?
It depends on what my aim is. If I just want to dig into a very specific piece of code due to a bug, I'll concentrate on understanding data flow and control flow in the specific scenario I'm debugging. I'll step through all the code in the debugger, writing down the tree of calls as I go, making notes on which methods are produces and which are consumers of particular data structures. I'll also keep a watchful eye on the output window, looking for interesting messages going by. I'll turn on exception trapping, because usually exceptions are where the interesting stuff is, and because they can screw up your stepping pretty fast. I'll put breakpoints all the heck over the place. I'll make notes of all the places where my suggestions above are violated, because those are the things that are likely to mislead me.
If I want to understand a piece of code enough to modify it, I'll usually start with the headers, or I'll search for the public methods. I want to know what does this class implement, what does it extend, what is it for, how does it fit into the larger whole? I'll try to understand that stuff before I understand how the specific parts are implemented. That takes a lot longer, but you've got to do that due diligence if you're going to be making changes to complex code.
Comments
Anonymous
June 14, 2004
"Don't write 'clever' code".
Agreed!! We've got a bunch of code laying around with stuff like this:
bComplete = (sState = "complete" ) ? true : false;
Hello? How about simply:
bComplete = (sState = "complete");Anonymous
June 14, 2004
The comment has been removedAnonymous
June 14, 2004
What is shadowing?Anonymous
June 14, 2004
The comment has been removedAnonymous
June 14, 2004
It sounds as though Jeremy is trying to teach non-coders to read code, which may not be entirely realistic.Anonymous
June 14, 2004
I read a great book recently (not yet released) that focused on a very simple premise:
Code craft is not about writing code that computers can read. A computer can understand o.e(q); but that doesn't mean you should use it.
No, being a true coder comes from the understanding that you are not writing code for a computer. You are writing a language that other PEOPLE will need to read and understand.
Some people would say "You should make sure you add lots of comments." But isn't the goal to write code that doesn't need to be commented? Shouldn't your code use a consistent, intuitive structure on every level, from the directory to the class library?
True masters of the craft write code that needs no comments, because it is completely apparent, through variable and function naming, what is going on. The only comments that should be neccessary are API documentation tags, which are wholly different from : 'Loop through the collection to enumerate the names
If reading code is hard, it's only because developers are poor communicators. I talked about this concept in more detail a few months ago here: http://weblogs.asp.net/rmclaws/archive/2004/01/30/65445.aspxAnonymous
June 14, 2004
I know some developers who've moved from VB 6 to C#:
if ( blah == true )
{
}
Aaargh!!Anonymous
June 14, 2004
There's a book on the subject:
http://www.spinellis.gr/codereading/
IMHO, it's very good.Anonymous
June 14, 2004
A good and perhaps unique book, all about how to read code:
http://www.spinellis.gr/codereading/Anonymous
June 14, 2004
Your post was interesting, but it didn't directly help my quandary. KC didn't include my entire post... I have been tasked with delivering a class on how to "read" source to troubleshoot issues. My role routinely involves utilizing these abilities to read, rationalize, and reverse engineer code, but how to convey these to others?Anonymous
June 14, 2004
The comment has been removedAnonymous
June 14, 2004
The comment has been removedAnonymous
June 15, 2004
I agree that if you keep the loops short, you can get away with cheap indexers. But still, loops have this tendency to get longer and longer...
And it's not just grep, it's also searching using my editor's regexp search function which is easier if the variables are unique in the text.
I personally will write loop indexers as "iFoo", to call out that I'm indexing over an array of Foos. I know other people who prefer to use ii, jj, kk as their loop indexers. Short, clear, traditional, easy to search.Anonymous
June 15, 2004
Another way to make the code amenable to tools: use a consistent coding style. If the function names in function definitions always start in column 1 you can find them (with your editor or grep) with the regular expression "^name".Anonymous
June 15, 2004
The comment has been removedAnonymous
June 17, 2004
The comment has been removedAnonymous
June 17, 2004
And what about grep -w perfExecuteManifest ?Anonymous
April 19, 2007
Write Readable Code By Making Its Intentions ClearAnonymous
April 19, 2007
I don’t think it’s too much of a stretch to say that the hardest part of coding is not writing code,Anonymous
September 06, 2007
The comment has been removedAnonymous
October 11, 2007
I was thinking about this a bit more and talking with Larry Osterman yesterday, and I came up with someAnonymous
October 28, 2010
as<input type="text" /> <textarea></textarea>