C++ Timesaving Techniques (2005) [eng].pdf


            {
                sWord += in[i];
            }
        }
        if ( sWord.length() )
            words.insert( words.end(), sWord );
        return true;
    }

    int num()
    {
        return words.size();
    }

    string word( int idx )
    {
        string s = "";
        if ( idx < 0 || idx > (int)words.size()-1 )
            return s;
        s = words[idx];
        return s;
    }
};

int main(int argc, char **argv )
{
    ParserFile pf( "myfile.txt" );
    Parser p(':');
    while ( !pf.eof() )
    {
        p.clear();
        if (p.parse( pf.getLine() ) == true )
        {
            printf("Parsed:\n");
            for ( int i=0; i<p.num(); ++i )
                printf("Word[%d] = %s\n", i, p.word(i).c_str() );
        }
    }
    return 0;
}

Note that this code does not do anything different from our original code listing. It has simply been restructured to be more componentized. The logic and functionality remain the same. The code to read a file into individual lines has been moved into the ParserFile class (shown at 2). This class does more error-checking for input, and has specific methods to read the file and return individual lines, but it is otherwise functionally equivalent to the previous example code. Likewise, the Parser class (shown at 3) still parses a given line, but is no longer reliant on any file input to do its work. It is now a simple parser class that takes in a string and breaks it down into words, using a developer-supplied delimiter, in place of our hard-coded colon of the first example.

Looking at the main program, you can see how much cleaner the interface is and how much simpler it is to read. It should also be considerably easier to debug: each piece of the code lives in a separate component, so when a problem is encountered, only the component involved needs to be examined.
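Because the start of the listing falls outside this excerpt, here is a minimal sketch of what a file-independent Parser with this interface could look like. It is a reconstruction under assumptions, not the book's listing: the member names, the use of push_back, and the decision to skip empty words between adjacent delimiters are all illustrative choices.

```cpp
#include <string>
#include <vector>

// Hypothetical reconstruction: splits one line into words at a delimiter.
// It knows nothing about files, so it can be exercised with plain strings.
class Parser
{
public:
    explicit Parser( char delimiter ) : delim_( delimiter ) {}

    // Forget the words from any previously parsed line.
    void clear() { words_.clear(); }

    // Break 'in' into words at each delimiter; empty words are skipped
    // (an assumption made for this sketch).
    bool parse( const std::string &in )
    {
        std::string word;
        for ( std::string::size_type i = 0; i < in.length(); ++i )
        {
            if ( in[i] == delim_ )
            {
                if ( !word.empty() )
                    words_.push_back( word );
                word.clear();
            }
            else
            {
                word += in[i];
            }
        }
        if ( !word.empty() )
            words_.push_back( word );
        return true;
    }

    int num() const { return static_cast<int>( words_.size() ); }

    // Return the word at idx, or an empty string if idx is out of range.
    std::string word( int idx ) const
    {
        if ( idx < 0 || idx >= num() )
            return std::string();
        return words_[idx];
    }

private:
    char delim_;
    std::vector<std::string> words_;
};
```

Since the class takes a string rather than a file, a single component can be checked in isolation, for example by constructing Parser p(':') and calling p.parse("alpha:beta") without touching the disk.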

Restructuring

Restructuring (also known as refactoring) is the process of going back through code and eliminating redundancy and duplicated effort. For example, let’s consider the following snippet of code (not from our example, just a generalized piece of code):

int ret = get_a_line();
if ( ret == ERROR )
    throw "Error in get_a_line!";    // 4
ret = get_words_from_line();
if ( ret == ERROR )
    throw "Error in get_words_from_line!";
ret = process_words();
if ( ret == ERROR )
    throw "Error in process_words!";

Technique 71: Reducing the Complexity of Code

This code is prime territory for refactoring. Why? Because the code contains multiple redundant statements, namely the return-code checks and the exception handling (the throw lines, such as the one shown at 4). To refactor it, follow these steps.

1. Examine the code for similar looking statements or processes.

In our case, the code that is similar is the check for the return code and the throwing of the exception string.

2. Extract the redundant code and factor it into a routine of its own.

In this case, we can factor the code into a single routine:

void CheckAndThrowError( int retCode, const char *name )
{
    if ( retCode == ERROR )
        throw name;
}

3. Replace the existing code with the calls into the refactored code.

CheckAndThrowError( get_a_line(), "get_a_line" );
CheckAndThrowError( get_words_from_line(), "get_words_from_line" );
CheckAndThrowError( process_words(), "process_words" );

4. If necessary, after the code is refactored, reexamine it for other similarities.

In this example, we might consider logging the error within the CheckAndThrowError function. This isn’t really a refactoring case, but rather an observation of what might make the code more complete.
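The logging idea from step 4 could be sketched as follows. This is one possible shape, not the book's code: the value of ERROR is invented here because the original snippet never defines it, and the sketch throws std::runtime_error rather than a bare string so that the message travels with the exception.

```cpp
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical error code; the original snippet does not define ERROR.
const int ERROR = -1;

// Checks a return code in one place: logs the failing routine, then throws.
void CheckAndThrowError( int retCode, const char *name )
{
    if ( retCode == ERROR )
    {
        const std::string message = std::string( "Error in " ) + name + "!";
        std::cerr << message << std::endl;    // log before throwing
        throw std::runtime_error( message );
    }
}
```

Because every check now passes through one routine, adding the log line required a change in a single place instead of at each call site, which is the practical payoff of the refactoring.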

Specialization

Programmers have a habit of writing code that is generalized to the extreme. Why write a routine that can break down a string into four parts at particular boundaries, when you can write a generalized routine that can handle any number of segments, each of any length? Sounds great in theory...

One sad lesson — normally learned when debugging programs — is that generalization is really a pain in the neck. It causes vastly more problems than it solves, and it never turns out that your code is general enough to handle every single case that comes its way. So you hack the code to make it work; it ends up littered with special cases.

Take a look at an example of generalization and how it can get you into trouble. Going back to our original code, assume that your input file has very long strings in it — not really a valid input file at all. Suppose it looked something like this:

This is a really long sentence that doesn’t happen to have a colon in it until it reaches the very end like this: do you think it will work?

If we run our first example program on this input file, it will crash, because we will write past the end of the space allocated for the word. This happens because we generalized the input to handle any sort of file, instead of making it specific to the kind of input we were expecting. We could easily change our code to handle a bigger string, but instead, we should follow the rules of specialization:

Make sure that input files are clearly marked as valid input to the program: In nearly all cases, your program-specific input should contain a version and type identifier. We haven’t added this to this simple example, but it would make sense to modify the ParserFile class to read in a beginning line containing version information.

If your input is fixed-length, check the length before you start loading the data: If you have an input file that is supposed to contain words of no more than 80 characters, then any time you have not encountered a delimiter within 80 characters, you should abort the input process and print out an error message for the user. If the word length is not fixed, then you should never use a fixed-length buffer to store it.


We already addressed this one in the ParserFile class by using a string in place of the fixed-size buffer.

Reject data in any format you do not understand: This precept is easier to understand with an example. Suppose that you are reading in a date from the command line or from a graphical user interface. Dates have so many formats that it is hardly worth enumerating them all. If you are given a date such as 1/1/04, there are several ways to interpret it: it could be in M/D/YY format or in D/M/YY format. For this particular value both readings happen to give January 1, 2004, but for a value such as 2/1/04 they give February 1 and January 2, respectively. There is no reason to keep the ambiguity. Either force the user to enter dates in a single format, or use a control that specifies the format.

This one really has no issue in our ParserFile class, because we aren’t dealing with specific data sizes.
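To illustrate the date guideline, here is a minimal sketch of a parser that accepts exactly one format and rejects everything else. The choice of YYYY-MM-DD, the Date struct, and the function names are assumptions made for this example. The range checks are deliberately simple and do not validate month lengths.

```cpp
#include <string>

struct Date
{
    int year;
    int month;
    int day;
};

// Convert 'count' characters starting at 'pos' to an int.
// The caller must already have verified that they are all digits.
static int DigitsToInt( const std::string &text,
                        std::string::size_type pos,
                        std::string::size_type count )
{
    int value = 0;
    for ( std::string::size_type i = pos; i < pos + count; ++i )
        value = value * 10 + ( text[i] - '0' );
    return value;
}

// Accept only the YYYY-MM-DD form (an assumed format choice); anything
// else, including 1/1/04, is rejected rather than guessed at.
bool ParseIsoDate( const std::string &text, Date &out )
{
    if ( text.length() != 10 || text[4] != '-' || text[7] != '-' )
        return false;
    for ( std::string::size_type i = 0; i < text.length(); ++i )
    {
        if ( i == 4 || i == 7 )
            continue;
        if ( text[i] < '0' || text[i] > '9' )
            return false;
    }
    const int month = DigitsToInt( text, 5, 2 );
    const int day   = DigitsToInt( text, 8, 2 );
    if ( month < 1 || month > 12 || day < 1 || day > 31 )
        return false;
    out.year  = DigitsToInt( text, 0, 4 );
    out.month = month;
    out.day   = day;
    return true;
}
```

The caller gets a plain true or false, so an unrecognized date leads to an error message for the user instead of a silent misreading.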

If you follow these guidelines, you will cut down on the number of bugs you receive — which makes it easier to debug problems that you do encounter in your application.