Professional C++ [eng].pdf

Conquering Debugging

out to be quite useful in some cases. It allows you to “force” your program to exhibit a bug at the exact point where that bug originates. If you didn’t assert at that point, your program might proceed with those incorrect values, and the bug might not show up until much later. Thus, asserts allow you to detect bugs earlier than you otherwise would.

The behavior of assert depends on the NDEBUG preprocessor symbol: if the symbol is not defined, the assertion takes place; otherwise it is ignored. Compilers often define this symbol when compiling “release” builds. If you want to leave asserts in release builds, you must change your compiler settings or write your own version of assert that isn’t affected by the value of NDEBUG.
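As a rough illustration of "your own version of assert," the following sketch defines a check macro that ignores NDEBUG. The names ALWAYS_CHECK, CheckHandler, abortOnFailure, and checkHandler are hypothetical and do not come from the standard library or from this book; the replaceable handler is an assumption added so the behavior can be exercised without killing the process.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical handler type: receives the failed expression text and its location.
using CheckHandler = void (*)(const char* expr, const char* file, int line);

// Default handler: report the failure and terminate, much like assert does.
inline void abortOnFailure(const char* expr, const char* file, int line)
{
    std::fprintf(stderr, "%s:%d: check failed: %s\n", file, line, expr);
    std::abort();
}

// Accessor for the currently installed handler (assumption: one handler per program).
inline CheckHandler& checkHandler()
{
    static CheckHandler handler = abortOnFailure;
    return handler;
}

// Unlike assert, this macro does not consult NDEBUG, so it stays active in every build.
#define ALWAYS_CHECK(expr) \
    ((expr) ? static_cast<void>(0) : checkHandler()(#expr, __FILE__, __LINE__))
```

With the default handler, a failing ALWAYS_CHECK(ptr != NULL) prints the expression and location and then aborts, regardless of how the program was compiled.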

You should use asserts in your code whenever you are “assuming” something about the state of your variables. For example, if you call a library function that is supposed to return a pointer and claims never to return NULL, throw in an assert after the function call to make sure that pointer isn’t NULL.

Note that you should assume as little as possible. For example, if you are writing a library function, don’t assert that the parameters are valid. Instead, check the parameters and return an error code or throw an exception if they are invalid. Asserts should be reserved for cases in which you have no other option. For example, in the start-time debugging example, the function trickyFunction() takes a parameter of type ComplicatedClass*. Instead of assuming that the argument is valid, it might be a good idea to assert it like this:

    #include <cassert>

    void trickyFunction(ComplicatedClass* obj) throw(exception)
    {
        assert(obj != NULL);
        // Remainder of the function omitted for brevity
    }
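For contrast, the advice about library functions (check the parameters and report an error instead of asserting) might look like the sketch below. The function mean() and its error messages are hypothetical, chosen only to illustrate the pattern; returning an error code would serve equally well.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical library function: validates its arguments and throws,
// so callers get a recoverable error in debug and release builds alike.
inline double mean(const double* values, std::size_t count)
{
    if (values == NULL) {
        throw std::invalid_argument("mean: values must not be NULL");
    }
    if (count == 0) {
        throw std::invalid_argument("mean: count must be greater than zero");
    }

    double sum = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        sum += values[i];
    }
    return sum / static_cast<double>(count);
}
```

A caller can wrap the call in a try block and handle std::invalid_argument, which is not possible when a failed assert terminates the program.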

Be careful not to put any code that must be executed for correct program functioning inside asserts. For example, a line like this is asking for trouble: assert(myFunctionCall() != NULL). If a release build of your code strips asserts, then the call to myFunctionCall() will be missing as well!
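A safer arrangement is to make the call unconditionally and assert only on its result. In this sketch, myFunctionCall() is a hypothetical stand-in that returns a pointer to a static value, and useResult() is an invented caller.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a function whose call must happen in every build.
inline int* myFunctionCall()
{
    static int value = 42;
    return &value;
}

inline int useResult()
{
    // The call is outside the assert, so it still runs when NDEBUG is defined.
    int* result = myFunctionCall();

    // Only this check disappears in a build that strips asserts.
    assert(result != NULL);

    return *result;
}
```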

Debugging Techniques

Debugging a program can be incredibly frustrating. However, with a systematic approach it becomes significantly easier. Your first step in trying to debug a program should always be to reproduce the bug. Depending on whether or not you can reproduce the bug, your subsequent approach will differ. The next three sections explain how to reproduce bugs, how to debug reproducible bugs, and how to debug nonreproducible bugs. Additional sections explain details about debugging memory errors and debugging multithreaded programs.

Reproducing Bugs

If you can reproduce the bug consistently, it will be much easier to determine the root cause. Any reproducible bug can be root-caused and fixed. Bugs that are not reproducible are difficult, if not impossible,


Chapter 20

to root-cause. As a first step to reproduce the bug, run the program with exactly the same inputs as the run when the bug first appeared. Be sure to include all inputs, from the program’s startup to the time of the bug’s appearance. A common mistake is to attempt to reproduce the bug by performing only the triggering action. This technique may not reproduce the bug because the bug might be caused by an entire sequence of actions. For example, if your Web browser program dies with a segmentation violation when you request a certain Web page, it may be due to memory corruption triggered by that particular request’s network address. On the other hand, it may be because your program records all requests in a queue, with space for one million entries, and this entry was number one million and one. Starting the program over and sending one request certainly wouldn’t trigger the bug in that case.
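The queue scenario can be sketched in a few lines. RequestLog and replay() below are hypothetical names, and the capacity is a constructor parameter so that a small number can stand in for one million. The real bug would write past the end of the queue; here the overflow is only detected and reported so that the example stays well-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical request log with a fixed number of slots.
class RequestLog
{
public:
    explicit RequestLog(std::size_t capacity) : mCapacity(capacity) {}

    // Returns false when the request does not fit: the point where the bug would fire.
    bool record(const std::string& url)
    {
        if (mEntries.size() >= mCapacity) {
            return false;
        }
        mEntries.push_back(url);
        return true;
    }

private:
    std::size_t mCapacity;
    std::vector<std::string> mEntries;
};

// Replays a recorded input sequence in order. Returns the index of the first
// request that overflows the log, or inputs.size() if none does.
inline std::size_t replay(RequestLog& log, const std::vector<std::string>& inputs)
{
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        if (!log.record(inputs[i])) {
            return i;
        }
    }
    return inputs.size();
}
```

Replaying only the final request against a fresh RequestLog never overflows it, while replaying the full recorded sequence fails at the same entry every time, which is why the complete input history matters.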

Sometimes it is impossible to emulate the entire sequence of events that leads to the bug. Perhaps the bug was reported by someone who can’t remember everything that he or she did. Alternatively, maybe the program was running for too long to emulate every input. In that case, simply do your best to reproduce the bug. It takes some guesswork, and can be time-consuming, but effort at this point will save time later in the debugging process. Here are some techniques you can try:

Repeat the triggering action in the correct environment and with as many inputs as possible similar to the initial report.

Run automated tests that exercise similar functionality. Reproducing bugs is one benefit of automated tests. If it takes 24 hours of testing before the bug shows up, it’s preferable to let those tests run on their own rather than spend 24 hours of your time trying to reproduce it.

If you have the necessary hardware available, running slight variations of tests concurrently on different machines can sometimes save time.

Run stress tests that exercise similar functionality. If your program is a Web server that died on a particular request, try running millions of browsers simultaneously that make that request.

After you are able to reproduce the bug consistently, you should attempt to determine the simplest and most efficient test case to reproduce it. That makes it simpler to root-cause the problem and easier to verify the fix.

Debugging Reproducible Bugs

When you can reproduce a bug consistently and efficiently, it’s time to figure out the problem in the code that causes the bug. Your goal at this point is to find the exact lines of code that trigger the problem. You can use two different strategies:

1. cout debugging. By adding enough debug messages to your program and watching its output when you reproduce the bug, you should be able to pinpoint the exact lines of code where the bug occurs. If you have a debugger at your disposal, this method is usually not recommended because it requires modifications to the program and can be time-consuming. However, if you have already instrumented your program with debug messages as described earlier, you might be able to root-cause your bug simply by running your program in debug mode while reproducing the bug. This technique may actually be faster than firing up a debugger.

2. Using a debugger. We hope that you are familiar with debuggers, which allow you to step through the execution of your program and to view the state of memory and the values of variables at various points. If you have not yet used debuggers, you should learn to use them as soon as possible. They are often indispensable tools for root-causing bugs. When you have access to the source code, you will use a symbolic debugger: a debugger that utilizes the variable names, class names, and other symbols in your code. In order to use a symbolic debugger you must compile your program with debugging information included. Otherwise, the symbol information is stripped from the program executable and is not available in the debugger.

The debugging example at the end of this chapter demonstrates both these approaches.
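As a small taste of the first strategy, here is one way the instrumentation might look. This is only a sketch: debugEnabled(), debugLog(), and average() are hypothetical names, and the debug-mode mechanism described earlier in the chapter may differ. The output stream is passed in so that messages can go to cerr, a file, or a string buffer.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical global switch for debug output.
inline bool& debugEnabled()
{
    static bool enabled = false;
    return enabled;
}

// Writes one debug message, tagged with its location, when debugging is on.
inline void debugLog(std::ostream& out, const std::string& where, const std::string& message)
{
    if (debugEnabled()) {
        out << "[debug] " << where << ": " << message << std::endl;
    }
}

// Hypothetical function under suspicion, instrumented at entry and at its early exit.
inline int average(const int* values, int count, std::ostream& out)
{
    std::ostringstream entry;
    entry << "called with count=" << count;
    debugLog(out, "average", entry.str());

    if (count <= 0) {
        debugLog(out, "average", "no values, returning 0");
        return 0;
    }

    int sum = 0;
    for (int i = 0; i < count; ++i) {
        sum += values[i];
    }
    return sum / count;
}
```

With debugEnabled() set to true, a call such as average(data, 0, std::cerr) leaves a trail showing which branch was taken, without attaching a debugger.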

Debugging Nonreproducible Bugs

Fixing bugs that are not reproducible is significantly more difficult than root-causing reproducible bugs. You often have very little information and must employ a lot of guesswork. Nevertheless, a few strategies can aid you:

1. Try to turn a nonreproducible bug into a reproducible bug. By using educated guesses, you can often determine approximately where the bug lies. It’s worthwhile to spend some time trying to reproduce the bug. Once you have a reproducible bug you can figure out its root cause using the techniques described earlier.

2. Analyze error logs. Hopefully, you instrumented your program with error log generation as described previously. You should sift through this information because any errors that were logged directly before the bug occurred are likely to have contributed to the bug itself. If you’re lucky (or if you coded your program well), your program will have logged the exact reason for the bug at hand!

3. Obtain and analyze traces. Hopefully you instrumented your program with tracing output via a ring buffer as described previously. At the time of the bug’s occurrence, you hopefully obtained a copy of the traces. These traces should lead you right to the location of the bug in your code.

4. Examine a memory dump file, if it exists. Some platforms generate memory dump files of applications that terminate abnormally. On Unix these memory dumps are called core files. Each platform provides tools for analyzing these memory dumps. Even without symbolic debugging information, you can often obtain a surprising amount of information from these files. For example, you can usually generate a stack trace of the application before its death because global symbols such as function and method names are usually available in stripped binaries.

If you are familiar with the assembly of your platform, you can disassemble the machine code to get assembly code. In addition, you can view the contents of memory, although without symbols it is untyped and unnamed.

5. Inspect the code. Unfortunately, this is often the only strategy to determine the cause of a nonreproducible bug. Surprisingly, it often works. When you examine code, even code that you wrote yourself, with the perspective of the bug that just occurred, you can often find mistakes that you overlooked previously. We don’t recommend spending hours staring at your code, but tracing through the code path by hand will often lead you directly to the problem.

6. Use a memory-watching tool, such as one of those described in the “Debugging Memory Problems” section, which follows. Such tools will often alert you to memory errors that don’t always cause your program to misbehave, but could potentially be the cause of the bug at hand.

7. File or update a bug report. Even if you can’t find the root cause of the bug right away, the report will be a useful record of your attempts if the problem is encountered again. Consult Chapter 19 for details on bug-tracking systems.
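For readers who want a reminder of what the trace ring buffer in strategy 3 involves, the following is a minimal sketch, not the implementation presented earlier in the book. TraceRing and its member names are hypothetical. It keeps only the most recent entries, so it can stay enabled in a long-running program, and dump() returns them oldest first.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical fixed-size trace buffer: new entries overwrite the oldest ones.
class TraceRing
{
public:
    explicit TraceRing(std::size_t capacity) : mSlots(capacity), mNext(0), mCount(0) {}

    void add(const std::string& entry)
    {
        if (mSlots.empty()) {
            return; // A zero-capacity ring records nothing.
        }
        mSlots[mNext] = entry;
        mNext = (mNext + 1) % mSlots.size();
        if (mCount < mSlots.size()) {
            ++mCount;
        }
    }

    // Returns the retained entries in the order they were added, oldest first.
    std::vector<std::string> dump() const
    {
        std::vector<std::string> result;
        // Until the ring wraps, the oldest entry is in slot 0; afterward it is at mNext.
        std::size_t start = (mCount < mSlots.size()) ? 0 : mNext;
        for (std::size_t i = 0; i < mCount; ++i) {
            result.push_back(mSlots[(start + i) % mSlots.size()]);
        }
        return result;
    }

private:
    std::vector<std::string> mSlots;
    std::size_t mNext;
    std::size_t mCount;
};
```

When the bug occurs, the contents returned by dump() can be written to a file and attached to the bug report.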

Once you have root-caused a nonreproducible bug, you should create a reproducible test case and move it to the “reproducible bugs” category. It is important to be able to reproduce a bug before you actually fix it. Otherwise, how will you test the fix? A common mistake when debugging nonreproducible bugs is to fix the wrong problem in the code. Because you can’t reproduce the bug, you don’t know if you’ve really fixed it, so don’t be surprised when it shows up again a month later.

Debugging Memory Problems

Most catastrophic bugs, such as application death, are caused by memory errors. Many noncatastrophic bugs are triggered by memory errors as well. Some memory bugs are obvious: if your program attempts to dereference a NULL pointer, it will terminate immediately. However, others are more insidious. If you write past the end of an array in C++, your program will probably not crash directly at that point. However, if that array was on the stack, you may have written into a different variable or array, changing values that won’t show up until later in the program. Alternatively, if the array was on the heap, you could cause memory corruption in the heap, which will cause errors later when you attempt to allocate or free more memory dynamically. Chapter 13 introduced some of the common memory errors from the perspective of what to avoid when you’re coding. This section discusses memory errors from the perspective of identifying problems in code that exhibits bugs. You should be familiar with the discussion in Chapter 13 before reading this section.
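One way to make an array overrun show up at the point where it happens, instead of much later, is to temporarily switch the suspect code to bounds-checked access. The sketch below uses std::vector::at(), which throws std::out_of_range for an invalid index; fillChecked() is a hypothetical helper invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Writes `fill` into the first `count` elements using bounds-checked access.
// Returns true if every write was in range, false at the first overrun.
inline bool fillChecked(std::vector<int>& values, std::size_t count, int fill)
{
    try {
        for (std::size_t i = 0; i < count; ++i) {
            values.at(i) = fill; // at() throws std::out_of_range past the end
        }
    } catch (const std::out_of_range&) {
        return false;
    }
    return true;
}
```

With a ten-element vector, asking for eleven writes is reported immediately at the eleventh element instead of silently overwriting a neighboring variable.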

Categories of Memory Errors

In order to debug memory problems you should be familiar with the types of errors that can occur. This section describes the major categories of memory errors. Each memory error includes a small code example demonstrating the error and a list of possible symptoms that you might observe. Note that a symptom is not the same thing as a bug itself: a symptom is an observable behavior caused by a bug.

Memory Freeing Errors

The following table summarizes the five major errors involving freeing memory.

Error type: Memory leak
Symptoms: Process grows over time. Process runs slower over time. Eventually, commands and system calls fail because of lack of memory.
Example:

    void memoryLeak()
    {
        int* ip = new int[1000];
        return; // Bug! Not freeing ip.
    }

Error type: Using mismatched allocation and free commands
Symptoms: Does not usually cause a program crash immediately. Can cause memory corruption on some platforms, which might show up as a program crash (segmentation violation) later in the program.
Example:

    void mismatchedFree()
    {
        int* ip1 = (int *)malloc(sizeof(int));
        int* ip2 = new int;
        int* ip3 = new int[1000];

        delete ip1; // BUG! Should use free
        delete[] ip2; // BUG! Should use delete
        free (ip3); // BUG! Should use delete[]
    }

Error type: Freeing memory more than once
Symptoms: Can cause a program crash (segmentation violation) if the memory at that location has been handed out in another allocation between the two calls to delete.
Example:

    void doubleFree()
    {
        int* ip1 = new int[1000];
        delete[] ip1;
        int* ip2 = new int[1000];
        delete[] ip1; // BUG! freeing ip1 twice
    }

Error type: Freeing unallocated memory
Symptoms: Will usually cause a program crash (segmentation violation or bus error).
Example:

    void freeUnallocated()
    {
        int* ip1 = reinterpret_cast<int*>(10000);
        // BUG! ip1 is not a valid pointer.
        delete ip1;
    }

Error type: Freeing stack memory
Symptoms: Technically a special case of freeing unallocated memory. Will usually cause a program crash.
Example:

    void freeStack()
    {
        int x;
        int* ip = &x;
        delete ip; // BUG! Freeing stack memory
    }

As you can see, some of the memory free errors do not cause immediate program termination. These bugs are more subtle, leading to problems later in the run of the program.

Memory Access Errors

The second category of memory errors involves the actual reading and writing of memory.

Error type: Accessing invalid memory
Symptoms: Almost always causes program to crash immediately.
Example:

    void accessInvalid()
    {
        int* ip1 = reinterpret_cast<int*>(10000);
        // BUG! ip1 is not a valid pointer.
        *ip1 = 5;
    }

Error type: Accessing freed memory
Symptoms: Does not usually cause a program crash. If the memory has been handed out in another allocation, can cause “strange” values to appear unexpectedly.
Example:

    void accessFreed()
    {
        int* ip1 = new int;
        delete ip1;
        int* ip2 = new int;

        // BUG! The memory pointed to by ip1
        // has been freed.
        *ip1 = 5;
    }

Error type: Accessing memory in a different allocation
Symptoms: Does not cause a program crash. Can cause “strange” values to appear unexpectedly.
Example:

    void accessElsewhere()
    {
        int x, y[10], z;
        x = 0;
        z = 0;
        // BUG! element 10 is past the
        // end of the array.
        for (int i = 0; i <= 10; i++) {
            y[i] = 10;
        }
    }

Error type: Reading uninitialized memory
Symptoms: Does not cause a program crash unless you use the uninitialized value as a pointer and dereference it (as in the example). Even then, it will not always cause a program crash.
Example:

    void readUninitialized()
    {
        int* ip;

        // BUG! ip is uninitialized.
        cout << *ip << endl;
    }

Memory access errors are more likely than memory free errors to cause program crashes. However, they don’t always do so. They can instead lead to subtle noncatastrophic bugs in your program.

Tips for Debugging Memory Errors

Memory-related bugs often show up in slightly different places in the code each time you run the program. This is usually the case with heap memory corruption. Heap memory corruption is like a time bomb, ready to explode at some attempt to allocate, free, or use memory on the heap. So, when you see a bug that is reproducible, but shows up in slightly different places, suspect memory corruption. For example, the program might get a segmentation violation one time followed by a bus error the next time.

If you suspect a memory bug, your best option is to use a memory-checking tool for C++. Debuggers often provide options to run the program while checking for memory errors. Additionally, there are some excellent third-party tools such as purify from Rational Software (now owned by IBM) or valgrind for Linux (discussed in Chapter 13). These debuggers and tools work by interposing their own memory allocation and freeing routines in order to check for any misuse of dynamic memory, such as freeing unallocated memory, dereferencing unallocated memory, or writing off the end of an array.
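To give a feel for the interposing idea, here is a deliberately tiny sketch of a checking allocator. It is not how Purify or Valgrind are implemented; TrackingHeap and its members are hypothetical names, and a real tool would hook the program's actual allocation routines instead of requiring explicit calls. The sketch wraps malloc and free, remembers which pointers are live, and refuses to free anything it did not hand out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <set>

// Hypothetical checking allocator that records every live allocation.
class TrackingHeap
{
public:
    TrackingHeap() {}
    TrackingHeap(const TrackingHeap&) = delete;
    TrackingHeap& operator=(const TrackingHeap&) = delete;

    ~TrackingHeap()
    {
        // Release anything the caller leaked so the sketch itself stays clean.
        for (void* p : mLive) {
            std::free(p);
        }
    }

    void* allocate(std::size_t bytes)
    {
        void* p = std::malloc(bytes);
        if (p != NULL) {
            mLive.insert(p);
        }
        return p;
    }

    // Frees p and returns true only if p is a live allocation from this heap.
    // Returns false for a double free or for memory this heap never handed out.
    bool release(void* p)
    {
        std::set<void*>::iterator it = mLive.find(p);
        if (it == mLive.end()) {
            return false;
        }
        mLive.erase(it);
        std::free(p);
        return true;
    }

    // Number of allocations not yet released; nonzero at exit suggests a leak.
    std::size_t liveCount() const { return mLive.size(); }

private:
    std::set<void*> mLive;
};
```

Because release() consults the set before calling free, a second release of the same pointer, or a release of a stack address, is reported instead of corrupting the heap, and liveCount() gives a crude leak check at shutdown.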
