Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:

You Can Program In C++ (2006) [eng]

.pdf
Скачиваний:
80
Добавлен:
16.08.2013
Размер:
4.33 Mб
Скачать

298

CHAPTER 17

Suitable iterators are associated with each container type. Container types usually provide iterator types as nested types. In general, each container type has four iterator types: iterator, reverse iterator (allowing iteration from back to front), const iterator (providing readonly access to the elements of a container), and const reverse iterator (read-only access in reverse order).

The Standard Library has over 80 function templates, which provide the most common forms of manipulation of the elements of a container. They include sorting for sequences (std::sort, std::stable sort, and std::partial sort), copying, and searching, as well as functions to add all the elements of a container (std::accumulate). If you want to process the elements of a container, always check whether what you want has already been provided as part of the ‘algorithm’ section of the Standard Library.

I do not have room to cover everything provided by the STL, so I am going to use three small demonstration problems to give you a feel for what you can do with the Standard library.

Working with a Set

Problem 1

Read in a text file and print out a list of the words contained in it.

Here is my solution:

1 #include <set>

2 #include <iostream>

3 #include <istream>

4 #include <ostream>

5 #include <fstream>

6

7 int main( ){

8 std::ifstream wordfile("main.cpp");

9std::set<std::string> words;

10std::string word;

11while(true){

12wordfile >> word;

13if(wordfile.eof( )) break;

14words.insert(word);

15}

16std::ofstream wordlist("words.txt");

17wordlist << "main.cpp contains each of the following at least once:\n";

18for(std::set<std::string>::iterator i(words.begin( ));

19i != words.end( ); ++i){

20wordlist << *i << '\n';

21}

22}

I have omitted the exception handling and checking that file streams connect to the specified files, because I want to focus on the essentials of this program.

Lines 1 –5 include all the necessary headers. In practice, several of them will be redundant, because most implementations include <istream> and <ostream> automatically with either <iostream> or <fstream>. <set> provides all the declarations we need if we want to use the std::set<> container.

Lines 8 –10 provide the variables we need. In particular, line 9 provides words as a std::set of std::strings. I initialized wordfile to use main.cpp, because that is what I called the file in which I saved this program. This is a lazy trick that avoids my taking time to create a specific text file. It will also allow me to highlight a couple of problems with the program as it stands.

CONTAINERS, ITERATORS, AND ALGORITHMS

299

Lines 11 –15 constitute a standard ‘forever’ loop with an internal break to terminate it. It simply reads in the source file until the end-of-file marker is read.

Line 16 opens a file into which we can write our results. The code in lines 18 –21 sends the contents of words to the stream wordlist, which results in the data being stored in words.txt. There are several points worth noting. The first is the usage of words.begin( ) and words.end( ). Every standard container type provides a pair of member functions called begin( ) and end( ). The former returns an iterator to the first element of the container and the latter a special value of the iterator type that designates ‘beyond the end of the container’. In the special case that the container is empty, begin( ) and end( ) return the same iterator value.

Look carefully at line 18. I have defined i to be an object of type std::set<std::string>:: iterator and initialized it to locate the first element in the container I have called words. Next, i is checked against the special termination value (so if words is empty, the loop terminates immediately). Then the dereferenced value of i (i.e. *i) is sent to the wordlist stream. i is incremented and the process repeated until we reach the end of words.

Here is the result (if you use whitespace differently when you type the code in, your results may be slightly different):

main.cpp contains each of the following at least once: !=

#include

'\n'

*i

++i){

<<

<fstream>

<iostream>

...

wordfile wordfile("main.cpp"); wordlist wordlist("words.txt"); words.end( ); words.insert(word); words;

}

What do you notice? Well, the first thing is that the program’s concept of a word is not ours. If you check the code, you will see that we applied >> (the extraction operator) to get a std::string value. By default, the extraction operator uses whitespace as a delimiter. Let me modify my program to use something more akin to my notion of a word. I need a function to extract the next word from input – something like this:

std::string get_word(std::istream & in){ std::string word;

while(true){

char const letter(in.get( )); if(in.eof( )) return word;

if(std::isalpha(letter)){ // note this needs <cctype> included word += letter;

break;

}

}

while(true){

char const letter(in.get( ));

300 CHAPTER 17

if(in.eof( )) break;

if(not std::isalpha(letter)) break; word += letter;

}

return word;

}

If you study this code, you will see that it basically skips over the contents of the input stream until it gets a letter (std::isalpha( ) returns true only if its argument is a letter). It then adds that letter to the previously created empty std::string called word, and then enters a second loop to add letters until it gets a non-letter from the input stream. The code also handles the special case of reaching the end of the file, either by returning an empty word or by returning the last word.

After a little reorganization, our previous program becomes

int main( ){

std::ifstream wordfile("main.cpp"); std::set<std::string> words; while(true){

std::string const word(get_word(wordfile)); words.insert(word);

if(wordfile.eof( )) break;

}

std::ofstream wordlist("words.txt");

wordlist << "main.cpp contains each of the following at least once:\n"; for(std::set<std::string>::iterator i(words.begin( ));

i != words.end( ); ++i){ wordlist << *i << '\n';

}

}

and we can build the second version of our solution program. This time the output file is:

main.cpp contains each of the following at least once:

at begin break cctype char const contains cpp

...

this true txt while word

wordfile wordlist words

Now you can see the other characteristics of a std::set<>: the elements are stored in order (alphabetical order by default for std::string objects), and no element repeats. We could have provided a predicate function to give an ordering rule, but the default is to use the < (less-than) operator for the element type.

CONTAINERS, ITERATORS, AND ALGORITHMS

301

Problem 1a

List all the unique words in a file as a list in reverse alphabetical order.

We only need to change a single line of our previous program, but we need to do so carefully because we can no longer use the values provided by begin( ) and end( ). Starting with end( ) is no use, because that does not locate the last element but the actual end of the container. We need to use the reverse iterator

and the two member functions that provide the values needed to walk the container backwards, which are called rbegin( ) (effectively the last element in the container) and rend( ) (a special value to mark that we are off the start of the container).

Replace

for(std::set<std::string>::iterator i(words.begin( )); i != words.end( ); ++i){

with:

for(std::set<std::string>::reverse_iterator i(words.rbegin( )); i != words.rend( ); ++i){

Build and execute the program.

Problem 1b

List all the words in a text file with multiple copies (so each word occurs as many times as it is used).

This is even simpler once we understand the properties of the containers in the STL. Go back to the final program for Problem 1, and declare words as a std::multiset<std::string>. Build and execute the program.

Problem 1c

List all the words in a text file, with each word followed by the number of times it occurs.

We have to make a rather greater change to our program to achieve the desired result. However, once again, by choosing an appropriate container we get most of the work done without having to write explicit code. This time we need a std::map<> (declared in <map>), using words as keys and counts as values for the (key, value) pairs.

1 int main( ){

2 std::ifstream wordfile("main.cpp");

3std::map<std::string, int> words;

4while(true){

5std::string const word(get_word(wordfile));

6++words[word];

7if(wordfile.eof( )) break;

8}

9 std::ofstream wordlist("words.txt");

10wordlist << "main.cpp contains each of the following words:\n";

11for(std::map<std::string, int>::iterator i(words.begin( ));

12i != words.end( ); ++i){

13wordlist << i->first << ", " << i->second << '\n';

14}

15}

302

CHAPTER 17

We create a suitable container at line 3 – one that maps std::string values to numbers. Line 6 uses a special property of std::map<> that allows us to access the value of a key by using the subscript operator (note that operator[ ] is overloaded for std::map<> to allow access by using a key value). In addition, if the index value is not already in the container, it is automatically added as a key and mapped to the default for the value (which is 0 in the case of an int). The first time we find a word in the text file, we add it to the container, the count is set to zero, and that is immediately incremented. Subsequent uses of the same word increment the count.

I have edited line 11 so that we will iterate over the std::map. Line 13 is a good example of the sometimes counterintuitive naming in parts of the Standard Library. We might expect to find key( ) and value( ) member functions to handle the two parts of a mapped element. However, the elements of a map are actually appropriate instances of std::pair<>. That template type from the Standard Library is simply a way of associating a pair of values. In that context, it makes sense to refer to first and second. Remember that iterators are effectively smart pointers and so support operator->.

Problem 1d

List all the words in a text file together with the number of times they occur. The list should be in descending order of frequency, with words having the same frequency listed in alphabetical order.

We can achieve our objective if we can sort the (std::string, int) pairs in descending order of the second value without disturbing the existing ordering for words with the same frequency. We need to apply std::stable sort( ) (declared in <algorithm>) to a suitable container. The problem is that associative containers cannot be sorted, because the elements are always ordered by their keys. We would prefer to keep using a std::map<> to collect the data, but need some form of sequence container to handle the sorting problem.

The Standard Library containers have constructors that support construction of a container from the elements of another. I am going to use a std::vector<> to handle the reorganization problem. I have already mentioned that a std::map contains elements of a std::pair<> type. Putting this together, we get the following definition for moving our collected data into a sequence container:

std::vector<std::pair<std::string, int> > seq_of_words(words.begin( ), words.end( ));

The space between the two > signs used to end template arguments is currently required by the language, to avoid confusion with operator>>.

That statement copies our data from words to seq of words. Next, we want to apply std::stable sort( ) to seq of words. Like most of the algorithms, std::stable sort( ) is provided with default behavior if you do not specify the criterion for sorting. That will not do, so we need a small function to provide this criterion (STL experts often manage to find ways of defining such behavior on the fly, but I am going to keep it simple). The requirement for an ordering function is that it takes two instances of the type concerned and returns a bool value (functions that return bool are called predicates). Here is the one we want:

inline bool compare(std::pair<std::string, int> const & lhs, std::pair<std::string, int> const & rhs){

return lhs.second > rhs.second;

}

We can use that function to sort seq of words with:

std::stable_sort(seq_of_words.begin( ), seq_of_words.end( ), compare);

Notice that we just use the name of the comparison function as the third argument to std::stable sort( ).

CONTAINERS, ITERATORS, AND ALGORITHMS

303

Putting all this together, our main( ) becomes:

int main( ){

std::ifstream wordfile("main.cpp"); std::map<std::string, int> words; while(true){

std::string const word(get_word(wordfile)); ++words[word];

if(wordfile.eof( )) break;

}

std::vector<std::pair<std::string, int> > seq_of_words(words.begin( ), words.end( ));

std::stable_sort(seq_of_words.begin( ), seq_of_words.end( ), compare); std::ofstream wordlist("words.txt");

wordlist << "main.cpp contains each of the following at least once:\n"; for(std::vector<std::pair<std::string, int> >::iterator

i(seq_of_words.begin( )); i != seq_of_words.end( ); ++i){ wordlist << i->first << ", " << i->second << '\n';

}

}

Build and execute the resulting program, and check the output file. Now try changing from std::stable sort( ) to std::sort( ) (just edit out the stable ). Build and execute the result. When you examine the output file, you will see that words with the same frequency are no longer in alphabetical order.

Working with Numeric Algorithms

Problem 2

Given a data file consisting of names and salaries with one name and one salary per line, separated by a colon, compute the average (arithmetic mean) salary, the average for the top 25%, and the average for the bottom 25%.

You will need to create a short text file containing suitable test data (call it salary data.txt) and save it in the chapter 17 folder. Here is the bare bones of a program that solves the problem. You will need

to include various headers. As well as the ones we have seen before, you will need <numeric> to provide a declaration for std::accumulate<> (used in lines 13, 18, and 22) and <functional> for the declaration of std::greater<> (used in line 12).

1 int main( ){

2 std::ifstream input("salary_data");

3 std::vector<double> salaries;

4double salary;

5std::string name;

6while(true){

7 getline(input, name, ':');

8input >> salary;

9if(input.eof( )) break;

10salaries.push_back(salary);

11}

12std::sort(salaries.begin( ), salaries.end( ), std::greater<double>( ));

13double const total(std::accumulate(salaries.begin( ),

14salaries.end( ), 0.0));

304

CHAPTER 17

15std::cout << "\nTotal payroll: " << total;

16std::cout << "\nThe mean salary is: " << total / salaries.size( );

17int const quartile(salaries.size( ) / 4);

18double const top(std::accumulate(salaries.begin( ),

19salaries.begin( ) + quartile, 0.0));

20std::cout << "\nThe mean salary of the top 25% is: "

21<< top / quartile;

22double const bottom(std::accumulate(salaries.end( ) - quartile,

23salaries.end( ), 0.0));

24std::cout << "\nThe mean salary of the top 25% is: "

25<< bottom / quartile << '\n';

26}

I have used the three-argument form of std::getline( ) in line 7 to extract the name part of the data (and effectively discard it). This version of std::getline( ) is often useful when extracting data that has delimiters in it. Line 8 extracts the salary value that follows each name. At line 9, we check that we have not hit the end of the data file, and until we have, line 10 copies the last-read salary to the back of salaries.

Line 12 uses the three-parameter form of std::sort( ) to sort salaries in descending order. Note that when we use such tools as std::greater<>, we have to add a pair of parentheses. Such tools are objects created by using an overloaded operator( ), and so we need to construct a suitable object. std::greater<double>( ) is a constructor for a temporary used to supply the criterion for sorting.

Every Standard Library container type includes a size( ) member function, which reports the number of elements.

The std::accumulate<>( ) numerical algorithm steps through the range of elements provided by the first two arguments and ‘accumulates’ their values into the value provided by the third argument. It returns the value of the third argument at the end. I have placed ‘accumulates’ in quotation marks because it has an optional fourth parameter that can replace the default of addition by some other operation. For example, std::accumulate(salaries.begin( ), salaries.end( ), 0.0, std::multiplies<double>( )) computes and returns the product of all the salaries. So, by replacing lines 13 –16 in the above definition of main( ) with

double const total(std::accumulate(salaries.begin( ), salaries.end( ), 1.0, std::multiplies<double>( )));

std::cout << "\nThe geometric mean salary is " << std::pow(total, 1.0 / salaries.size( ));

we could compute the geometric mean of the company payroll.

Note that the above code assumes that there are at least four entries; otherwise, the value of quartile would be zero and we would get divide-by-zero errors. As a further exercise in writing C++, please edit the code to include suitable checks to ensure that there is enough data for the program to work.

Problem 2a

In the previous version, we ignored the employee’s name by just reading it and discarding it. Use the same data, but output the names of the top 25% and bottom 25% earners.

There are many ways to tackle this problem, but here is a simple solution that builds on what we have covered earlier. First, we define a suitable type to hold the data:

class pay_data{ public:

pay_data(std::istream & in){

CONTAINERS, ITERATORS, AND ALGORITHMS

305

std::getline(in, name_, ':'); in >> salary_;

in.ignore(std::numeric_limits<int>::max( ), '\n');

}

operator double( )const{return salary_;} bool operator<(pay_data const & rhs)const{

return salary_ < rhs.salary_ ;

}

std::string name( )const{return name_;} double salary( )const{return salary_;}

private:

std::string name_; double salary_;

};

Most of that class definition (with everything implemented in the definition to keep this code simple) is straightforward. There is one interesting thing in the pay data constructor, which arose when I was testing the code. The line that starts in.ignore is there because otherwise std::getline( ) picks up the terminating newline character from the previous data item in the file. That line is the idiomatic way to ignore everything left on a line of input. You can ignore everything to the delimiter of your choice, but '\n' is by far the most common.

The definition of a conversion operator is a rather lazy way to make pay data behave like a double (useful for using std::accumulate). Defining operator< for pay data ensures that the sorting algorithms can be called without the need to add an argument specifying how the data is ordered. The sorting algorithms in the Standard Library default to using operator< for the type. If the type does not provide that operator and you do not specify an ordering function, you get a compilation error.

Here is a short program that uses this class definition to solve the problem.

int main( ){

std::ifstream input("salary_data"); std::vector<pay_data> salaries; while(true){

pay_data salary(input); if(input.eof( )) break;

salaries.push_back(salary); // uses compiler-generated copy constructor

}

std::sort(salaries.begin( ), salaries.end( )); int const quartile(salaries.size( ) / 4); std::cout << "\nThe following " << quartile

<< " employees earned in the bottom 25\%:\n"; for(int i(0); i != quartile; ++i){

std::cout << salaries[i].name( ) << " earned " << salaries[i].salary( ) << '\n';

}

std::reverse(salaries.begin( ), salaries.end( )); std::cout << "\nThe following " << quartile

<< " employees earned in the top 25\%:\n"; for(int i(0); i != quartile; ++i){

std::cout << salaries[i].name( ) << " earned " << salaries[i].salary( ) << '\n';

}

306

CHAPTER 17

Note the use of std::reverse( ) to re-sort salaries in descending order. I did not have to do it that way, but it saved me from having to work out exactly where to start the output of the high-salary earners. It also places the higher earners in descending order of salary.

Working with a Multimap

Problem 3

Create a dictionary of anagrams so that we can look up all the words spelled with a given selection of letters. The solution should allow the dictionary to be sent to an output stream (such as the console or a file), and provide for lookup of a word (or collection of letters) with output of a list of all the words that are spelled by rearranging the letters.

I have been rather lazy in my earlier problem solutions, because I have packed rather more code into main( ) than I would consider good design. Let me make amends with this problem and provide a reasonably well designed solution (though many readers may consider they can do better).

First, I am going to provide a couple of typedefs to avoid repeated use of long type names:

typedef std::multimap<std::string, std::string> dictionary;

typedef std::multimap<std::string, std::string>::const_iterator iter;

As you see, I am using a std::multimap<>. I intend that a key will be a string of letters in alphabetical order. Each corresponding value (the second element of a pair) will be a word that can be spelled with those letters. I have chosen to use const iterator because I do not intend to allow changes to entries in the dictionary.

Now let me deal with the three things the problem requires us to do: create a dictionary of anagrams, send it to an output stream, and look up a string of letters.

void make_anagram_dictionary(dictionary & anagrams, std::istream & source){ while(true){

std::string word; source >> word;

if(source.eof( )) return;

if(not std::isalpha(word[0]) return; std::string letters(word); std::sort(letters.begin( ), letters.end( )); anagrams.insert(std::make_pair(letters, word));

}

}

We pass a dictionary by reference (note that the dictionary might already have words in it) and a source of words into this function. It repeatedly extracts the next ‘word’ from source. Before continuing, it checks that it was not at the end of the file and that it did not read a string that began with a non-alphabetic symbol. In either of those cases, it returns to the calling function. As long as it has a legitimate word, it creates a working copy, to which it applies std::sort( ) to arrange the letters in ascending alphabetical order. Finally, it uses std::make pair( ) to compose a single entity of the ordered letters and the original word, which it inserts into the dictionary.

The following function is the simplest solution to the problem of sending the dictionary to an output stream. It prints each entry on a single line – first the sorted letters, then the word they make:

void print_anagram_dictionary(dictionary const & anagrams, std::ostream & out){

for(iter i(anagrams.begin( )); i != anagrams.end( ); ++i){ out << i->first << ", " << i->second << '\n';

}

}

CONTAINERS, ITERATORS, AND ALGORITHMS

307

Here is our lookup function:

void find_anagram(dictionary const & anagrams, std::string const & word, std::ostream & out = std::cout){

// note default argument allowed if definition doubles as a declaration std::string letters(word);

std::sort(letters.begin( ), letters.end( )); iter lower(anagrams.lower_bound(letters)); iter upper(anagrams.upper_bound(letters)); if(lower == upper){

std::cout << "No word using the letters of " << word << " found in the anagram dictionary.\n";

return;

}

std::cout << "The following anagrams of " << word << " were found in the anagram dictionary:\n";

for(iter i(lower); i != upper; ++i){ std::cout << i -> second << ", ";

}

std::cout << '\n'; return;

}

By now, most of the definition of this function should be clear. lower bound( ) and upper bound( ) are member functions of each of the standard associative containers (std::map, std::multimap, std::set, and std::multiset). lower bound( ) returns an iterator value (or const iterator value if the container is const-qualified) to the first element that matches its argument. upper bound( ) returns an iterator value (or const iterator value if the container is const qualified) to the first element after the last element that matches its argument. Both return the value of end( ) if the element is not found. In addition, upper bound( ) returns the value of end( ) if the found element is the last in the container.

Finally, here is a short test program that uses a file provided on the CD called words data as a source to test the functions above:

int main( ){ try{

std::multimap<std::string, std::string> anagrams; std::ifstream input("word_data"); make_anagram_dictionary(anagrams, input); print_anagram_dictionary(anagrams, std::cout); find_anagram(anagrams, "vile"); find_anagram(anagrams, "level");

}

catch(...){

std::cerr << "An exception was thrown.\n";

}

}

Preloading a Container

Sometime we want to create a container with a certain number of values loaded into it at the start. The problem we face is how we can do that. If we use an array as a container, we can often write something such as:

int primes[ ] = {2, 3, 5, 7, 11};