Notes and quotes from Debugging
These are takeaways from the book Debugging, by David Agans.
This book is general; it’s not about specific problems, specific tools, specific programming languages, or specific machines. Rather, it’s about universal techniques that will help you to figure out any problem on any machine in any language using whatever tools you have.
The book introduces the nine golden rules of debugging and devotes a chapter to each. Each chapter starts with a war story in which the rule proved crucial to success; it then describes the rule and shows how it applies to the story. It discusses ways of thinking about and using the rule that are easy to remember in the face of complex technological problems (or even simple ones). Finally, it gives variations showing how the rule applies to other things, such as cars and houses.
DEBUGGING RULES
- UNDERSTAND THE SYSTEM
- MAKE IT FAIL
- QUIT THINKING AND LOOK
- DIVIDE AND CONQUER
- CHANGE ONE THING AT A TIME
- KEEP AN AUDIT TRAIL
- CHECK THE PLUG
- GET A FRESH VIEW
- IF YOU DIDN’T FIX IT, IT AIN’T FIXED
Takeaways
Remember: Understand the System
This is the first rule because it’s the most important. Understand?
- Read the manual. It’ll tell you to lubricate the trimmer head on your weed whacker so that the lines don’t fuse together.
- Read everything in depth. The section about the interrupt getting to your microcomputer is buried on page 37.
- Know the fundamentals. Chain saws are supposed to be loud.
- Know the road map. Engine speed can be different from tire speed, and the difference is in the transmission.
- Understand your tools. Know which end of the thermometer is which, and how to use the fancy features on your Glitch-O-Matic logic analyzer.
- Look up the details. Even Einstein looked up the details. Kneejerk, on the other hand, trusted his memory.
Remember: Make It Fail
It seems easy, but if you don’t do it, debugging is hard.
- Do it again. Do it again so you can look at it, so you can focus on the cause, and so you can tell if you fixed it.
- Start at the beginning. The mechanic needs to know that the car went through the car wash before the windows froze.
- Stimulate the failure. Spray a hose on that leaky window.
- But don’t simulate the failure. Spray a hose on the leaky window, not on a different, “similar” one.
- Find the uncontrolled condition that makes it intermittent. Vary everything you can—shake it, rattle it, roll it, and twist it until it shouts.
- Record everything and find the signature of intermittent bugs. Our bonding system always and only failed on jumbled calls.
- Don’t trust statistics too much. The bonding problem seemed to be related to the time of day, but it was actually the local teenagers tying up the phone lines.
- Know that “that” can happen. Even the ice cream flavor can matter.
- Never throw away a debugging tool. A robot paddle might come in handy someday.
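One way to hunt for the signature of an intermittent failure, sketched in Python: run the operation many times, vary every condition you can control, record each condition alongside the outcome, and then look for a condition value that appears on every failure and on no success. The function `flaky_operation` and the condition names (`call_type`, `hour`, `line`) are hypothetical stand-ins loosely modeled on the bonding-system story, not anything from the book. The `hour` condition is deliberately a red herring, in the spirit of not trusting statistics too much.

```python
import random

def flaky_operation(conditions):
    """Hypothetical system under test: fails only on 'jumbled' calls."""
    return conditions["call_type"] != "jumbled"

def run_trials(n, seed=0):
    """Run n trials, varying every condition we control, and record each one."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        conditions = {
            "call_type": rng.choice(["normal", "jumbled", "long"]),
            "hour": rng.randrange(24),      # red herring: time of day
            "line": rng.choice(["A", "B"]),
        }
        records.append((conditions, flaky_operation(conditions)))
    return records

def find_signature(records):
    """Return (condition, value) pairs seen on every failure and on no success."""
    failures = [c for c, ok in records if not ok]
    successes = [c for c, ok in records if ok]
    if not failures:
        return []
    # Keep only what ALL failures have in common...
    candidates = set(failures[0].items())
    for c in failures[1:]:
        candidates &= set(c.items())
    # ...and discard anything that also shows up when it works.
    for c in successes:
        candidates -= set(c.items())
    return sorted(candidates)
```

With enough trials, `find_signature(run_trials(200))` should narrow things down to the `call_type` of `"jumbled"`; the hour of day drops out because failures happen at many different hours.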
Remember: Quit Thinking and Look
You can think up thousands of possible reasons for a failure. You can see only the actual cause.
- See the failure. The senior engineer saw the real failure and was able to find the cause. The junior guys thought they knew what the failure was and fixed something that wasn’t broken.
- See the details. Don’t stop when you hear the pump. Go down to the basement and find out which pump.
- Build instrumentation in. Use source code debuggers, debug logs, status messages, flashing lights, and rotten egg odors.
- Add instrumentation on. Use analyzers, scopes, meters, metal detectors, electrocardiography machines, and soap bubbles.
- Don’t be afraid to dive in. So it’s production software. It’s broken, and you’ll have to open it up to fix it.
- Watch out for Heisenberg. Don’t let your instruments overwhelm your system.
- Guess only to focus the search. Go ahead and guess that the memory timing is bad, but look at it before you build a timing fixer.
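A minimal sketch of building instrumentation in, using Python’s standard `logging` module. The subsystem name `pump_controller`, the function `select_pump`, and its data are hypothetical. The two habits it illustrates are passing arguments lazily (so the message is formatted only when debug logging is on) and guarding expensive state dumps with `isEnabledFor`, which keeps the instrumentation cheap when it is off and reduces the Heisenberg effect of your probes changing the behavior you are trying to see.

```python
import logging

logger = logging.getLogger("pump_controller")  # hypothetical subsystem name

def expensive_state_dump(state):
    """Stand-in for a costly snapshot, such as formatting a large buffer."""
    return ", ".join(f"{k}={v}" for k, v in sorted(state.items()))

def select_pump(pressures, threshold):
    """Pick which pumps to run, with instrumentation built in."""
    # Lazy %-style arguments: only formatted if DEBUG is actually enabled.
    logger.debug("select_pump called with threshold=%s", threshold)
    chosen = [name for name, p in pressures.items() if p < threshold]
    # Guard the expensive diagnostics so they cost almost nothing when off.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("state: %s -> chosen=%s",
                     expensive_state_dump(pressures), chosen)
    return chosen
```

To see the details, turn it on with something like `logging.basicConfig(level=logging.DEBUG)` and call `select_pump({"sump": 1.2, "well": 3.4}, threshold=2.0)`; the log then tells you which pump, not just that a pump ran.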
Remember: Change One Thing at a Time
You need some predictability in your life. Remove the changes that didn’t do what you expected. They probably did something you didn’t expect.
- Isolate the key factor. Don’t change the watering schedule if you’re looking for the effect of the sunlight.
- Grab the brass bar with both hands. If you try to fix the nuke without knowing what’s wrong first, you may have an underwater Chernobyl on your hands.
- Change one test at a time. I knew my VGA capture phase was broken because nothing else was changing.
- Compare it with a good one. If the bad ones all have something that the good ones don’t, you’re onto the problem.
- Determine what you changed since the last time it worked. My friend had changed the cartridge on the turntable, so that was a good place to start.
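Comparing a good setup with a bad one and changing one thing at a time can be sketched like this, assuming each setup can be described as a dictionary of settings and that you have some way to test whether a setup fails. The names `differences`, `isolate_factor`, and the `fails` predicate are hypothetical illustrations, not a technique the book spells out in code. Each trial starts from a fresh copy of the known-good setup and applies exactly one difference, so a failure can be pinned on that single change.

```python
def differences(good, bad):
    """Settings whose values differ between a known-good and a failing setup."""
    keys = set(good) | set(bad)
    return sorted(k for k in keys if good.get(k) != bad.get(k))

def isolate_factor(good, bad, fails):
    """Apply ONE difference at a time to the good setup and test it.

    Every trial starts again from the good setup, so each earlier change is
    effectively reverted before the next one is tried. Returns the settings
    that, on their own, make the good setup fail.
    """
    culprits = []
    for key in differences(good, bad):
        trial = dict(good)            # fresh copy of the known-good setup
        if key in bad:
            trial[key] = bad[key]     # change exactly one thing
        else:
            trial.pop(key, None)      # the bad setup simply lacks this setting
        if fails(trial):
            culprits.append(key)
    return culprits
```

For the turntable story, with `good = {"cartridge": "old", "amp": "A", "speakers": "X"}` and `bad = {"cartridge": "new", "amp": "A", "speakers": "Y"}`, a test that fails whenever the new cartridge is installed points straight at `"cartridge"`. One caveat of this design: if the failure needs two changes together, no single change reproduces it, and you have to try combinations.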
Remember: Keep an Audit Trail
Better yet, don’t remember “Keep an Audit Trail.” Write down “Keep an Audit Trail.”
- Write down what you did, in what order, and what happened as a result. When did you last drink coffee? When did the headache start?
- Understand that any detail could be the important one. It had to be a plaid shirt to crash the video chip.
- Correlate events. “It made a noise for four seconds starting at 21:04:53” is better than “It made a noise.”
- Understand that audit trails for design are also good for testing. Software configuration control tools can tell you which revision introduced the bug.
- Write it down! No matter how horrible the moment, make a memorandum of it.
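A timestamped audit trail makes correlating events mechanical. The sketch below assumes you record each action or observation with a `datetime`; the class name `AuditTrail`, its methods, and the example entries are hypothetical. `near` pulls out everything that happened within a window around a symptom, which is exactly why a precise timestamp beats “it made a noise.”

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class AuditTrail:
    """A minimal timestamped log of what you did and what happened."""
    entries: list = field(default_factory=list)

    def record(self, when, what):
        """Write it down: a timestamp plus a description."""
        self.entries.append((when, what))

    def near(self, when, window):
        """Entries within +/- window of a given moment, in time order."""
        return sorted((t, w) for t, w in self.entries if abs(t - when) <= window)
```

For example, after recording “replaced fuel filter” at 20:30:00, “compressor switched on” at 21:04:50, and “noise started, lasted 4 s” at 21:04:53, asking for entries within ten seconds of 21:04:53 returns the compressor and the noise, and leaves the filter change out.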
Remember: Check the Plug
Obvious assumptions are often wrong. And to rub it in, assumption bugs are usually the easiest to fix.
- Question your assumptions. Are you running the right code? Are you out of gas? Is it plugged in?
- Start at the beginning. Did you initialize memory properly? Did you squeeze the primer bulb? Did you turn it on?
- Test the tool. Are you running the right compiler? Is the fuel gauge stuck? Does the meter have a dead battery?
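“Are you running the right code?” is easy to check and easy to skip. A small sketch, assuming a Python program: report which interpreter is running and which file each module was actually loaded from, using only `sys` and `importlib` from the standard library. The function name `whats_running` is hypothetical.

```python
import importlib
import sys

def whats_running(module_names):
    """Report which interpreter and which module files are actually in use."""
    report = {
        "python_version": sys.version.split()[0],
        "executable": sys.executable,
    }
    for name in module_names:
        try:
            mod = importlib.import_module(name)
            # Built-in modules have no __file__; say so rather than guess.
            report[name] = getattr(mod, "__file__", "<built-in>")
        except ImportError as exc:
            report[name] = f"<not importable: {exc}>"
    return report
```

Printing this at startup, for example `whats_running(["json", "mymodule"])` where `mymodule` stands for your own (hypothetical) package, quickly reveals a stale copy on the path or the wrong virtual environment.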
Ask for Help
There are at least three reasons to ask for help, not counting the desire to dump the whole problem into someone else’s lap: a fresh view, expertise, and experience. And people are usually willing to help because it gives them a chance to demonstrate how clever they are.
Remember: Get a Fresh View
You need to take a break and get some coffee, anyway.
- Ask for fresh insights. Even a dummy can help you see something you didn’t see before.
- Tap expertise. Only the VGA capture vendor could confirm that the phase function was broken.
- Listen to the voice of experience. It will tell you the dome light wire gets pinched all the time.
- Know that help is all around you. Coworkers, vendors, the Web, and the bookstore are waiting for you to ask.
- Don’t be proud. Bugs happen. Take pride in getting rid of them, not in getting rid of them by yourself.
- Report symptoms, not theories. Don’t drag a crowd into your rut.
- Realize that you don’t have to be sure. Mention that the shirt was plaid.
Remember: If You Didn’t Fix It, It Ain’t Fixed
And now that you have all these techniques, there’s no excuse for leaving it unfixed.
- Check that it’s really fixed. Don’t assume that it was the wires and send that dirty fuel filter back onto the road.
- Check that it’s really your fix that fixed it. “Wubba!” might not be the thing that did the trick.
- Know that it never just goes away by itself. Make it come back by using the original Make It Fail methods. If you have to ship it, ship it with a trap to catch it when it happens in the field.
- Fix the cause. Tear out the useless eight-track deck before you burn out another transformer.
- Fix the process. Don’t settle for just cleaning up the oil. Fix the way you design machines.
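To check that it’s really your fix that fixed it, take the fix out and make the bug come back, then put it in and watch the bug go away. A sketch under the assumption that the failing scenario can be run repeatedly with the fix toggled; `failure_rate`, `verify_fix`, and `example_scenario` (a made-up bug that hits about 5% of runs) are hypothetical names. Because intermittent bugs need many runs, it measures failure rates over a few hundred trials rather than relying on one.

```python
import random

def failure_rate(scenario, fix_enabled, trials=500, seed=0):
    """Fraction of trials in which the scenario fails."""
    rng = random.Random(seed)
    failures = sum(not scenario(rng, fix_enabled) for _ in range(trials))
    return failures / trials

def verify_fix(scenario, trials=500):
    """Confirm the bug comes back without the fix and goes away with it."""
    without = failure_rate(scenario, fix_enabled=False, trials=trials)
    with_fix = failure_rate(scenario, fix_enabled=True, trials=trials)
    return {
        "fails_without_fix": without > 0,   # Make It Fail still works
        "clean_with_fix": with_fix == 0,    # and the fix makes it stop
        "rates": (without, with_fix),
    }

def example_scenario(rng, fix_enabled):
    """Hypothetical intermittent bug: about 5% of runs hit a bad input."""
    bad_input = rng.random() < 0.05
    return fix_enabled or not bad_input
```

If `fails_without_fix` comes back false, you haven’t shown that your change did anything. And a clean run with the fix only bounds how often the bug can still happen; it doesn’t prove the bug is gone.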
Remember: The View From the Help Desk Is Murky
You’re remote, your eyes and ears are not very accurate, and time is of the essence.
- Follow the rules. You have to find ways to apply them in spite of your unenlightened user.
- Verify actions and results. Your users will misunderstand you and make mistakes. Discover these early by verifying everything they do and say.
- Use automated tools. Get the user out of the picture with system-generated logs and remote monitoring and control tools.
- Verify even the simplest assumptions. Yes, some people don’t realize you need power to make your word processor work.
- Use available troubleshooting guides. You are probably dealing with known good designs; don’t ignore the history.
- Contribute to troubleshooting guides. If you find a new problem with a known system, help the next support person by documenting everything.
Remember: The rules are “golden”
Which means that they’re:
- Universal. You can apply them to any debugging situation on any system.
- Fundamental. They provide the framework for, and guide the choice of, the specific tools and techniques that apply to your system.
- Essential. You can’t debug effectively without following all of them.
- Easy to remember. And we keep reminding you of the Debugging Rules: Understand the System; Make It Fail; Quit Thinking and Look; Divide and Conquer; Change One Thing at a Time; Keep an Audit Trail; Check the Plug; Get a Fresh View; If You Didn’t Fix It, It Ain’t Fixed.