A few days ago I was doing a code review in which (in essence) the following loop occurred:
class Foo;
bool matches(const Foo& inVal);
std::vector<Foo> x;
std::vector<Foo>::iterator i = x.begin();
while (i != x.end())
{
if (matches(*i))
i = x.erase(i);
else
++I;
}
I raised two issues with it: one, that there was a bug in it (it left out the else, meaning that where consecutive members of the vector matched the condition the second would be skipped). The second was that the cost was fairly high; at the time I raised this by asking whether it made sense to replace the vector by a deque (reduces the cost for large sets of data) or a list (reduces the cost for all sets of data, but can raise the cost of other operations significantly), and was told that it would be a short vector and that the overall expense would therefore be low. (The cost is high because every time an element is deleted all the following events have to be shifted left once.)
About 24 hours later I asked about the use of remove_if and erase, shown below:
class Matcher {
public:
bool operator()(const Foo& inVal) const {
return matches(inVal);
}
};
x.erase(std::remove_if(x.begin(), x.end(), Matcher()), x.end());
I received the answer that the use of remove_if required extra work for a minor bit if code and it was left there, as far as that issue went.
But I remained curious. How much extra work was it? So I actually drew up the code bits above and counted lines. If we leave out the setup (first three lines) the STL code snippet is actually one line shorter.
(This had to be workable in a C++03 compiler. A C++11 implementation with the STL algorithm using lambdas would be shorter still.)
What are the benefits of the STL approach? Well, first, it's not fragile; the bug caught on the code review is impossible in the equivalent STL code. The bug is directly related to another minor weakness of the iterator-based code: because of the variable incrementing logic it can't use an idiomatic for loop, but has to use while().
Secondly, it's faster; remove_if is an O(N) operation and the final erase, which only resets the end of the vector, is very cheap indeed.
Third, it is immediately clear what the code is doing. Because of some additional verbiage in the actual code (plus the missing else, which obscured the firm of the loop) I had to look twice to be sure of what was going on.
This is, in a nutshell, an illustration of why, in general, STL algorithms are preferable to hand-rolled loops: more reliable and generally faster. There are some subtle ways of using find_if and transform in unusual ways[1] but remove_if is clear about what it does.
But it's also an illustration of the implicit prejudice I find in developers against using the STL algorithms. The combined use of remove_if and erase is a very well-known idiom; there is nothing difficult about it. But the perceived - but not actually significant - overhead of having to provide a functor to feed to the algorithm [2] seems to constitute a mental barrier.
[1] For example, using find_if for "process every item in this ordered vector below a given value", for which it is admirably suited. Using for_each is less efficient, as find_if will cut off processing once the relevant value is found or surpassed if your test is written correctly. But you throw away the result of the search, which is not normally expected.
[2]Of course, even that is not real. If your test is in a function, as above, you can just pass a function pointer, although it is likely to be slightly less efficient. If it's not in a function, it's a logical unit and either should be in a function or in a functor in any case.[3] (In the actual code the test was actually not a function, but a comparison expression. This actually did mean that the number of lines needed for the STL approach would be a little greater, as the test needed to be encapsulated for the STL where it had been merely dropped into the loop.)
[3] The marginal case being where this test is used only here. If it expresses a meaningful concept in the problem domain, though, it's a good bet that it is not.