jsburbidge | Dec. 21st, 2022

If you (for C++ developers values of you) happen to be in the happy possession of a C++20 compiler, one facility it provides is the range-based filter view which allows for iterating over a range while filtering out certain elements.

If you don't have one, there are several options.

At a simple level, for use with for_ each(), there's simple composition. If you have a filter predicate Pred for Foo:

class Pred
{
public:
bool operator()(const Foo& inVal) const;
};

And a functor that does something with them, Op:

class Op
{
public:
void operator()(const Foo& inVal);
};

you can always create a composite:

class FilteredOp
{
public:

FilteredOp(Op& inOp, const Pred& inPred);

void operator()(const Foo& inVal)
{
if (m_op(inVal))
m_pred(inVal);
}
private:
Op& m_op;
const Pred& m_pred;
};

(We will refer to this as the naive version. This could easily be turned into a template to do more general composition.)

This works just fine - if all you want to invoke is for_each(). But if you want to use, e.g. transform() or rotate_copy(), it won't work. (Some operations provide a filtered option with their _if variants. Many do not. Many of those operate in such a way that a valid return value is expected for every application. In other cases, e.g. sample(), there is no predicate functor to be extended in this way.)

It is also very slightly more elaborate to write

Op op;
FilteredOp fop(op, Pred());

std::for_each(seq.begin(), seq.end(), fop);

than, say,

Op op;

filtered_for_each(seq.begin(), seq.end(), op, Pred());

even if you legitimately want to use for_each(), but only very slightly.

(The same applies a fortiori if FilteredOp is a closure; the difference lies in how closely you have to look at what is happening to discern intent; a closure has no name to assist the maintainer.)

The next alternative, if you have C++11, is to create a temporary filtered copy using copy_if:

typedef Seq std::vector<Foo>;

Op op;
{
Seq tempSeq;
std::copy_if(seq.begin(), seq end(), std::back_inserter(tempSeq), Pred());

std::for_each(tempSeq.begin(), tempSeq.end(), op);
}

This is not a great improvement on the naive version, and costs more. It does avoid multiplying entities. The big downside is that if you are processing the filtered data once, the copying costs in both time and memory.

It has the advantage of bring idiomatic. And, of course, it works for a use of std::sample().

It is a better choice if you want to process the filtered data in any way more than once - by far the best choice, as the costs of subsequent iterations will be cut by the initial filtering, unless you have memory constraints. (Also note that in C++20, you can separate the filtered and unfiltered elements by using remove_copy_if and have two sequences ready for subsequent operations.)

One other advantage of the copy_if approach is that you can change the nature of the collection - you can, for example, iterate over a vector and insert into a set, effectively carrying out a sort on your filtered items at the same time. This may not be as efficient as copying to a vector and then applying a sorting algorithm - but again, a second stage in processing of this type is the sort of thing the copy_if approach enables.

The other alternative is to turn to boost. Boost has a filter iterator, which skips elements satisfying Pred without doing any modifications. Thus:

Op op;

std::for_each(boost::filter_iterator<Pred, Seq::iterator>(tempSeq.begin()), boost::filter_iterator<Pred, Seq:: iterator> (tempSeq.end()), op);

This works generally.

If you need to filter once only, then this is preferable. (It can also be used to emulate copy_if if you have a C++03 compiler but also have boost, by using it with std::copy.) If you need to operate on the filtered set more than once, it is suboptimal, since every iteration has to visit every element in the full sequence each time - unless you are optimizing memory (large sequences) and care less about time; this is the option using the least memory.

You can compose filters if you need to. This gets confusing unless you use typedefs.

Whether it's idiomatic or not depends on how much you consider boost fundamental. The naming does declare exactly what you are doing, though.

The original use case that got me thinking about this was one with a switch driven by context. If a flag was set, we iterate over everything; if not, we iterate over just the subset. In a for_each example, the naive implementation looks like:

Op op;

If (flag)
{
FilteredOp fop(op, Pred());

std::for_each(seq.begin(), seq.end(), fop);
}
else
std::for_each(seq.begin(), seq.end(), op);

The copy_if example looks like:

Op op;

If (flag)
{
Seq tempSeq;
std::copy_if(seq.begin(), seq end(), std::back_inserter(tempSeq), Pred());

std::for_each(tempSeq.begin(), tempSeq.end(), op);
}
else
std::for_each(seq.begin(), seq.end(), op);

This can be simplified by filtering in a function

class Filter
{
public:
const Seq& getFilteredRange(const Seq& inSeq, book inFlag)
{
if (inFlag)
{
std::copy_if(inSeq.begin(), inSeq end(), std::back_inserter(m_temp), Pred());
return m_temp;
}
return inSeq;
}

private:
Seq m_temp;
};

Filter f;
Op op;
const Seq& toProcess = f.getFilteredRange(seq, flag);

std::for_each(toProcess.begin(), toProcess.end(), op);

Effectively the generator replaces the FilteredOp class, so it's a tradeoff in complexity but clearer at the call site.

The functor can avoid using if/else if implemented as a strategy. This is useful if it will be used multiple times, always with the same value of flag (e.g. passed at the command line).

The boost example looks like:

Op op;

If (flag)
std::for_each(boost::filter_iterator<Pred, Seq::iterator>(tempSeq.begin()), boost::filter_iterator<Pred, Seq:: iterator> (tempSeq.end()), op);
else
std::for_each(tempSeq.begin(), tempSeq.end(), op);

What about that notional filtered_for_each I threw in at the beginning?

Well, it can just be implemented by wrapping the boost version in a template function call. I'm not sure that the syntactic cleanup is better than a typedef, though. Once you have

typedef boost::filter_iterator<Pred, Seq::iterator> PredFilteredIterator;

instead of

template<typename T, typename U, typename V> void filtered_for_each(V& begin, V& end, T& inOp, const U& inPred)
{

std::for_each(boost::filter_iterator<U, V>(begin),
boost::filter_iterator<U, V>(end), inOp);
}

(declaration more complex than that, and needing more policies, but you get the picture...)

then

filtered_for_each(seq.begin(), seq.end(), op, Pred());

versus

std::for_each(PredFilteredIterator(tempSeq.begin()), PredFileredIterator(tempSeq.end()), op);

isn't a big improvement in clarity, and involves a lot more finicky work getting the function definition both correct and general.

It's generally a good idea to avoid for_each() as less expressive (and often less efficient) than the more specific algorithms in the STL. And anything that is complex enough that it doesn't fit any more specific algorithms is frequently complex enough that adding the filtering logic internally on a custom basis may make sense. (An example would be something aimed at generating elements in a collection based on another collection, but with a variable number based on the characteristics of the input parameter. This will not work with std::transform or std::generate_n. If you already have selection logic, integrating filtering logic may very well be more efficient on a custom basis inside your functor than doing so via any form, direct or indirect, of composition. Likewise, if you are processing a set of inputs and converting them into database insertions but skipping some, the field access you are doing to build the database insertions can double for filtering.)

In general, too, using a more precisely targetted algorithm instead of for_each() tends to move complexity out of the functor you have to write. In some cases it can move a lot complexity into library code. (Using remove_if() plus erase() is much, much simpler than implementing the behaviour in a general for loop of any sort.) But even using std::transform plus an inserter to fill a container means that you have separated out the code dealing with the target container from the code doing the element generation, even though the container logic remains in the application space.

For all these reasons putting effort into writing an extended for_each is probably always using energy and attention which can be better expended elsewhere.

Matt Wilson's Extended STL has a chapter on the implementation of filtering iterators. It may be worth emphasizing one thing that he notes; a filtered iterator cannot have the semantics of a random access iterator: not only will indexing be an issue, but so will the concept of distance between two iterators; both can in theory be supported but only a considerable expense, and may give unexpected values. (If we apply a filter F to a sequence of length 10, the effective length of the filtered sequence can't be determined without traversing the entire sequence, and an indexing operator might not even be anything but the end of the sequence (at one extreme) depending on how many elements were filtered out). If you need random access semantics, using copy_if to generate a standard sequence is by far a preferable option.

S	M	T	W	T	F	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30

Sunt Lacrimae Rerum

Dec. 21st, 2022

Dec. 21st, 2022

Filtering

Profile

April 2025

Most Popular Tags

Page Summary

Style Credit

Expand Cut Tags