Sentinels and the STL
May. 8th, 2023 09:24 pm

There's an old trick for searching a C-style string that you actually own (i.e. you can't do this inside an implementation of strchr(), which takes a const argument): you can use a sentinel to speed up the search.
Suppose you have a 300-character C-style string that is not in read-only memory and is safe to modify, and you want to find the first instance of some character in it. You can use std::strchr(), but that requires the function to check two conditions at every point in the search: first, is this the character you are looking for, and second, is it the null end-of-string character? Alternatively, you can very temporarily write the character you are looking for over that terminating null and use memchr() instead, restoring the null when the call is over. If the call returns the address of the artificial sentinel, there were no instances in the string.
Note that in this context this is a pure, old-fashioned optimization, of the sort that you don't do unless it's genuinely useful, as it complicates the code and makes it more brittle. That being said, it's a very long-established trick which shouldn't be particularly confusing.
However, there are other domains where using a sentinel can be a win from the design and maintenance as well as the efficiency perspective.
I had a set of conditions, settable at run time by command-line options, which acted as filters on a set of records. These were implicitly ANDed together: if a record didn't match a criterion, it was out. (There was special handling to allow two or more conditions of certain types to be ORed together.)
Writing this as a hardcoded set of if/else choices would have been horrendous, so I implemented the conditions as a family of function objects stored in a vector. They could be set up initially in a fairly straightforward manner, and by placing the most common reasons for filtering out up front, you could reduce the average number of comparisons. The whole check could then be expressed as a single std::any_of call.
However, there was still some work to do even when a record was filtered out; this wasn't a simple case where work happens only when an item is found. So it looked like
if (std::any_of(a.begin(), a.end(), [&](const auto& arg) { /* ... condition ... */ }))
{
    // Do some stuff
    // Return X
}
else
{
    // Do some other stuff
    // Return Y
}
This is ugly. Its maintainability isn't awful, but it's not great, either. And every run has an extra branch after the end of the STL algorithm.
(Branching can mess with pipelining and hurt performance, in addition to branches frequently being maintenance trouble spots. In many cases both clarity and efficiency argue in favour of replacing branching with polymorphism.)
But I had created these function objects. I could do anything I wanted with their interfaces. (If I hadn't, I could have used a wrapper.) So I added an additional execAfterFind() function to the function object, and all of the real criteria for exclusion had a (common, inherited) implementation corresponding to the if part of the test. I then created a new type which *always* matched and placed it at the end of the vector of tests in every case. It, and it alone, had an implementation of the new function corresponding to the else branch.
Now the call looked roughly like this:
auto foo = std::find_if(a.begin(), a.end(), [&](const auto& arg) { /* ... condition ... */ });
(*foo)->execAfterFind();
This is cleaner overall, not only at this site. What about performance?
For the case where the record ends up not being filtered out, there's probably no gain: unless a really good optimizing compiler can eliminate the test on the "always matches" object through a polymorphic call (unlikely), the new object just moves an if/else test around. There might be a small cache benefit, because the test objects and their functions were allocated one after another and we just *might* have improved locality, but I wouldn't count on that, either.
However, most records are expected to be filtered out. Consider a record that gets booted by the first, most likely, test. In the old implementation there were two successive branches, one for the test, one for the branch after the STL algorithm has run its course. Now there is only the one branch. So we probably gained overall; we're certainly not likely to have made anything worse. So we have an improvement in readability / maintainability and efficiency, both at once.
If your concern were strictly time optimization, by the way, and you have the space and a known, small enough set of criteria, this is not the way to go about it. For that, you give every condition its own return value as a different power of 2 and use std::accumulate rather than anything with a test. After running std::accumulate you can use if/else if all you care about is matching at all; otherwise, use an array of 256 (or 128, or whatever best suits your use case; table sizes corresponding to larger sets are probably not ideal unless you really need the speed more than the space[1]) function objects addressed from a known point, and just invoke one with the returned value as the array index. I do not recommend jump tables of this sort as an aid to maintainability, though: they are tremendously fragile in the face of code changes. But they can be extremely fast.
Even a simple array of two functions can be used if every matching test contributes only 1:
std::array<std::unique_ptr<IFoo>, 2> funcs;
// Set up array
...
int val = std::accumulate(... parameters including initial value of 0 ...);
funcs[val]->execAfterFind();
The drawback is that you always visit every test; any_of and find_if truncate your search. You'd have to give very, very careful thought to whether this would actually be a benefit or a cost, and you probably want to do careful profiling over a range of cases. (In the case I had, the majority of records would be screened out quickly; this would not have been an appropriate solution. If most had been retained, that would be another question.) The other drawback is that the table setup is rather more complex and uglier than the preparation for the sentinel approach.
[1] If you have more than 8 tests, the gains from not branching are going to be counterbalanced by the need to run every test rather than short-circuiting as find_if does.