jsburbidge: (Default)
 A few months ago, I ran across a reference by Jo Walton in one of her Tor reading lists to a work by Victoria Goddard, whom I had never encountered before. A quick check around the net revealed a large number of very positive reviews of Goddard's work, so I decided to check her work out.

(Note that as Goddard is self-published, her work is quite reasonable in price if bought as e-books but fairly pricey if bought as hardcopy. E-books are available directly from her website or via various other sites (though not from Google Books).)

Goddard is good, and worth recommending, although she is not quite as good as many of her more enthusiastic reviewers would make her out to be. The discussion below is (of necessity) rather full of spoilers.

Spoilers below... )
jsburbidge: (Default)
 There's an old trick in searching a C-style string that you actually own (i.e. you can't do this in an implementation of strchr(), which requires a const argument): you can use a sentinel to speed up searching.
 
If you have a 300-character C-style string which is not in read-only memory and is safe to modify, and you want to find the first instance of a character in it, you can use std::strchr(); but at every point in the search that function has to check two conditions: first, is this the character you are searching for, and, second, is it the null end-of-string character. Or you can very temporarily assign the value of the character you are looking for to the location of that terminal null and use memchr() instead, changing it back when the call is over. If you get the address of the artificial sentinel back, there were no instances in the string.
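A minimal sketch of the trick; the function name is mine, not a library routine, and it assumes the character searched for is not itself the null character:

```cpp
#include <cassert>
#include <cstring>

// Find the first occurrence of target in a modifiable C-style string.
// Temporarily replace the terminating NUL with the target character so
// memchr() is guaranteed a hit, then restore the terminator.
char* sentinel_strchr(char* s, char target)
{
    std::size_t len = std::strlen(s);   // note: target must not be '\0'
    s[len] = target;                    // plant the sentinel
    char* hit = static_cast<char*>(std::memchr(s, target, len + 1));
    s[len] = '\0';                      // restore the terminator
    return (hit == s + len) ? nullptr : hit;  // sentinel hit => no real match
}
```

The gain is that memchr() tests only one condition per character where strchr() must test two.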
 
Note that in this context this is a pure, old-fashioned optimization, of the sort that you don't do unless it's genuinely useful, as it complicates the code and makes it more brittle. That being said, it's a very long-established trick which shouldn't be particularly confusing.
 
However, there are other domains where using a sentinel can be a win from the design and maintenance as well as the efficiency perspective.
 
I had a set of conditions, settable at run time by command-line options, which set filters on a set of records. These were implicitly anded together - if a record didn't match a criterion, it was out. (There was special handling to allow two or more conditions of certain types to be ored together.)
 
Writing this as a hardcoded set of if/else choices would have been horrendous. So I implemented them as a family of function objects stored in a vector. The conditions could be set up initially in a fairly straightforward manner, and by putting the most common reasons for filtering out up front you could reduce the average number of comparisons. The conditions could then be checked in a simple std::any_of call.
 
However, if there was filtering out, there was still some stuff to do; this wasn't a simple case where you have to do stuff only when an item was found. So it looked like
 
if (std::any_of(a.begin(), a.end(), [&](const auto& arg) { /* ... condition ... */ }))
{
// Do some stuff
// Return X
}
else
{
// Do some other stuff
// Return Y
}
 
This is ugly. Its maintainability isn't awful, but it's not great, either. And every run has an extra branch after the end of the STL algorithm.
 
(Branching can mess with pipelining and slow down performance. Branches are also, frequently, maintenance problem points. In many cases both clarity and efficiency argue in favour of replacing branching with polymorphism.)
 
But I had created these function objects. I could do anything I wanted with their interfaces. (If I hadn't, I could have used a wrapper.) So I added an additional execAfterFind() function to the function object, and all of the real criteria for exclusion had a (common, inherited) implementation corresponding to the if part of the test. I then created a new type which *always* matched and placed it at the end of the vector of tests in every case. It, and it alone, had an implementation of the new function corresponding to the else branch.
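A sketch of what that design might look like. All the names here are hypothetical, and I've made execAfterFind() return a string, rather than void as in the real code, purely so its behaviour is visible:

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct Record { int size; };   // hypothetical record type

// Criterion interface: matches() is the filter test; execAfterFind()
// is what runs on the object find_if stops at.
class Criterion
{
public:
    virtual ~Criterion() = default;
    virtual bool matches(const Record&) const = 0;
    virtual std::string execAfterFind() const { return "filtered"; }
};

class TooLarge : public Criterion            // a real exclusion test
{
public:
    bool matches(const Record& r) const override { return r.size > 100; }
};

class AlwaysMatches : public Criterion       // the sentinel: always last
{
public:
    bool matches(const Record&) const override { return true; }
    std::string execAfterFind() const override { return "retained"; }
};

// With the sentinel at the end of the vector, the call site needs no
// if/else at all: find_if is guaranteed to stop somewhere.
std::string process(const std::vector<std::unique_ptr<Criterion>>& tests,
                    const Record& r)
{
    auto it = std::find_if(tests.begin(), tests.end(),
        [&](const std::unique_ptr<Criterion>& c) { return c->matches(r); });
    return (*it)->execAfterFind();
}
```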
 
Now the call looked roughly like this:
 
auto foo = std::find_if(a.begin(), a.end(), [&](const auto& arg) { /* ... condition ... */ });
foo->execAfterFind();
 
This is cleaner overall, not only at this site. What about performance?
 
For the case where the record ends up not being filtered out, there's probably no gain: unless a really good optimizing compiler optimizes the test away on the "always matches" object through a polymorphic call (unlikely) the new object just moves an if/else test around. There might be a small cache benefit because all the object tests with their functions were allocated one after another and we just *might* have improved some cache locality, but I wouldn't count on that, either.
 
However, most records are expected to be filtered out. Consider a record that gets booted by the first, most likely, test. In the old implementation there were two successive branches, one for the test, one for the branch after the STL algorithm has run its course. Now there is only the one branch. So we probably gained overall; we're certainly not likely to have made anything worse. So we have an improvement in readability / maintainability and efficiency, both at once.
 
If your concern were strictly time optimization, and you have the space, and a known, small enough set of criteria, this is not, by the way, the way to go about it. For that you give every condition its own return value as a distinct power of 2 and use std::accumulate rather than anything with a test. After running std::accumulate you can use if/else if all you care about is matching at all; otherwise, use an array of 256 (or 128, or whatever best suits your use case; table sizes corresponding to larger sets are probably not ideal unless you really need the speed over the space[1]) function objects addressed from a known point and just invoke the one selected by the returned value as the array index. I do not recommend jump tables of this sort as an approach supporting maintainability, though: they are tremendously fragile in the face of code changes. But they can be extremely fast.
 
Even a simple array of two functions can be used if the accumulated value can only be 0 or 1:
 
std::array<std::unique_ptr<IFoo>, 2> funcs;
// Set up array
...
int val = std::accumulate(/* ... parameters, including an initial value of 0 ... */);
funcs[val]->execAfterFind();
 
The drawback is that you always visit every test; any_of and find_if truncate your search. You'd have to give very, very careful thought to whether this would actually be a benefit or a cost, and you probably want to do careful profiling over a range of cases. (In the case I had, the majority of records would be screened out quickly; this would not have been an appropriate solution. If most had been retained, that would be another question.) The other drawback is that the table setup is rather more complex and uglier than the preparation for the sentinel approach.
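Concretely, the power-of-two scheme might be sketched like this (the record type and the conditions are invented for illustration): each test contributes its own bit, so the accumulated sum is a bitmask saying exactly which tests matched, usable directly as an index into a dispatch table.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

struct Record { int size; bool expired; };   // hypothetical record type

// Each test returns its own power of two on a match, 0 otherwise, so
// the sum identifies the full set of matching tests.
int classify(const Record& r)
{
    static const std::vector<std::function<int(const Record&)>> tests = {
        [](const Record& rec) { return rec.size > 100 ? 1 : 0; },
        [](const Record& rec) { return rec.expired    ? 2 : 0; },
    };
    return std::accumulate(tests.begin(), tests.end(), 0,
        [&](int acc, const auto& t) { return acc + t(r); });
}
// An array of 4 handlers indexed by classify(r) would then dispatch
// with no test at all: handlers[classify(r)]->execAfterFind();
```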
 
[1] If you have more than 8 tests then the gains from not branching are going to be counterbalanced by the need to evaluate every test rather than short-circuiting as find_if does.
jsburbidge: (Default)
 The shepherds sing; and shall I silent be?
My God, no hymn for thee?
My soul’s a shepherd too; a flock it feeds
Of thoughts, and words, and deeds.

The pasture is thy word: the streams, thy grace
Enriching all the place.
Shepherd and flock shall sing, and all my powers
Out-sing the day-light houres.

Then we will chide the sunne for letting night
Take up his place and right:
We sing one common Lord; wherefore he should
Himself the candle hold.

I will go searching, till I finde a sunne
Shall stay, till we have done;
A willing shiner, that shall shine as gladly,
As frost-nipt sunnes look sadly.

Then we will sing, shine all our own day,
And one another pay:
His beams shall cheer my breast, and both so twine,
Till ev’n his beams sing, and my musick shine.

-- George Herbert

Filtering

Dec. 21st, 2022 09:55 am
jsburbidge: (Default)
 If you (for C++-developer values of "you") happen to be in the happy possession of a C++20 compiler, one facility it provides is the range-based filter view, which allows iterating over a range while filtering out certain elements.

If you don't have one, there are several options.

At a simple level, for use with for_each(), there's simple composition. If you have a filter predicate Pred for Foo:

class Pred
{
public:
bool operator()(const Foo& inVal) const;
};

And a functor that does something with them, Op:

class Op
{
public:
void operator()(const Foo& inVal);
};

you can always create a composite:

class FilteredOp
{
public:

FilteredOp(Op& inOp, const Pred& inPred);

void operator()(const Foo& inVal)
{
if (m_pred(inVal))
m_op(inVal);
}
private:
Op& m_op;
const Pred& m_pred;
};

(We will refer to this as the naive version. This could easily be turned into a template to do more general composition.)
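Such a template might look like this (a sketch, on the assumption that both the operation and the predicate are callables taking the element type; the names are mine):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Templated composition: apply inOp only to elements accepted by inPred.
template <typename OpT, typename PredT>
class FilteredOpT
{
public:
    FilteredOpT(OpT& inOp, const PredT& inPred)
        : m_op(inOp), m_pred(inPred) {}

    template <typename T>
    void operator()(const T& inVal)
    {
        if (m_pred(inVal))
            m_op(inVal);
    }
private:
    OpT& m_op;            // held by reference, so copies share state
    const PredT& m_pred;
};
```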

This works just fine - if all you want to invoke is for_each(). But if you want to use, e.g. transform() or rotate_copy(), it won't work. (Some operations provide a filtered option with their _if variants. Many do not. Many of those operate in such a way that a valid return value is expected for every application. In other cases, e.g. sample(), there is no predicate functor to be extended in this way.)

It is also very slightly more elaborate to write

Op op;
FilteredOp fop(op, Pred());

std::for_each(seq.begin(), seq.end(), fop);

than, say,

Op op;

filtered_for_each(seq.begin(), seq.end(), op, Pred());

even if you legitimately want to use for_each(), but only very slightly.

(The same applies a fortiori if FilteredOp is a closure; the difference lies in how closely you have to look at what is happening to discern intent; a closure has no name to assist the maintainer.)

The next alternative, if you have C++11, is to create a temporary filtered copy using copy_if:

typedef std::vector<Foo> Seq;

Op op;
{
Seq tempSeq;
std::copy_if(seq.begin(), seq.end(), std::back_inserter(tempSeq), Pred());

std::for_each(tempSeq.begin(), tempSeq.end(), op);
}

This is not a great improvement on the naive version, and costs more. It does avoid multiplying entities. The big downside is that if you are processing the filtered data once, the copying costs in both time and memory.

It has the advantage of being idiomatic. And, of course, it works for a use of std::sample().

It is a better choice if you want to process the filtered data in any way more than once - by far the best choice, as the costs of subsequent iterations will be cut by the initial filtering, unless you have memory constraints. (Also note that you can separate the matching and non-matching elements in a single pass by using std::partition_copy, available since C++11, and have two sequences ready for subsequent operations.)

One other advantage of the copy_if approach is that you can change the nature of the collection - you can, for example, iterate over a vector and insert into a set, effectively carrying out a sort on your filtered items at the same time. This may not be as efficient as copying to a vector and then applying a sorting algorithm - but again, a second stage in processing of this type is the sort of thing the copy_if approach enables.
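A minimal sketch of that variation (the function name and the predicate are invented for illustration): copying from a vector into a std::set sorts, and as a side effect de-duplicates, the surviving elements in the same pass.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <vector>

// Filter and sort in one pass by changing the target container type.
std::set<int> filtered_sorted(const std::vector<int>& v)
{
    std::set<int> out;
    std::copy_if(v.begin(), v.end(), std::inserter(out, out.end()),
                 [](int n) { return n > 10; });
    return out;
}
```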

The other alternative is to turn to boost. Boost has a filter iterator, which skips elements that fail Pred without doing any modifications. Thus:

Op op;

std::for_each(boost::filter_iterator<Pred, Seq::iterator>(seq.begin(), seq.end()), boost::filter_iterator<Pred, Seq::iterator>(seq.end(), seq.end()), op);

This works generally.

If you need to filter once only, then this is preferable. (It can also be used to emulate copy_if if you have a C++03 compiler but also have boost, by using it with std::copy.) If you need to operate on the filtered set more than once, it is suboptimal, since every iteration has to visit every element in the full sequence each time - unless you are optimizing memory (large sequences) and care less about time; this is the option using the least memory.

You can compose filters if you need to. This gets confusing unless you use typedefs.

Whether it's idiomatic or not depends on how much you consider boost fundamental. The naming does declare exactly what you are doing, though.

The original use case that got me thinking about this was one with a switch driven by context. If a flag was set, we iterate over just the filtered subset; if not, we iterate over everything. In a for_each example, the naive implementation looks like:

Op op;

if (flag)
{
FilteredOp fop(op, Pred());

std::for_each(seq.begin(), seq.end(), fop);
}
else
std::for_each(seq.begin(), seq.end(), op);

The copy_if example looks like:

Op op;

if (flag)
{
Seq tempSeq;
std::copy_if(seq.begin(), seq.end(), std::back_inserter(tempSeq), Pred());

std::for_each(tempSeq.begin(), tempSeq.end(), op);
}
else
std::for_each(seq.begin(), seq.end(), op);

This can be simplified by filtering in a function:

class Filter
{
public:
const Seq& getFilteredRange(const Seq& inSeq, bool inFlag)
{
if (inFlag)
{
std::copy_if(inSeq.begin(), inSeq.end(), std::back_inserter(m_temp), Pred());
return m_temp;
}
return inSeq;
}

private:
Seq m_temp;
};

Filter f;
Op op;
const Seq& toProcess = f.getFilteredRange(seq, flag);

std::for_each(toProcess.begin(), toProcess.end(), op);

Effectively the generator replaces the FilteredOp class, so it's a tradeoff in complexity but clearer at the call site.

The functor can avoid using if/else if implemented as a strategy. This is useful if it will be used multiple times, always with the same value of flag (e.g. passed at the command line).
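A sketch of that strategy version (the types and the summing operation are invented for illustration): the branch on the flag happens once, when the strategy is chosen, rather than on every call.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

// Choose the traversal strategy once, when the flag is known (e.g. at
// option parsing), and invoke it thereafter with no per-call branch.
struct Traversal
{
    virtual ~Traversal() = default;
    virtual int sum(const std::vector<int>& seq) const = 0;
};

struct SumAll : Traversal                 // the "no flag" strategy
{
    int sum(const std::vector<int>& seq) const override
    {
        int total = 0;
        std::for_each(seq.begin(), seq.end(), [&](int n) { total += n; });
        return total;
    }
};

struct SumFiltered : Traversal            // the "flag set" strategy
{
    int sum(const std::vector<int>& seq) const override
    {
        int total = 0;
        std::for_each(seq.begin(), seq.end(),
                      [&](int n) { if (n > 0) total += n; });
        return total;
    }
};

std::unique_ptr<Traversal> makeTraversal(bool flag)
{
    if (flag) return std::make_unique<SumFiltered>();
    return std::make_unique<SumAll>();
}
```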

The boost example looks like:

Op op;

if (flag)
std::for_each(boost::filter_iterator<Pred, Seq::iterator>(seq.begin(), seq.end()), boost::filter_iterator<Pred, Seq::iterator>(seq.end(), seq.end()), op);
else
std::for_each(seq.begin(), seq.end(), op);

What about that notional filtered_for_each I threw in at the beginning?

Well, it can just be implemented by wrapping the boost version in a template function call. I'm not sure that the syntactic cleanup is better than a typedef, though. Once you have

typedef boost::filter_iterator<Pred, Seq::iterator> PredFilteredIterator;

instead of

template<typename T, typename U, typename V> void filtered_for_each(V begin, V end, T& inOp, const U& inPred)
{
std::for_each(boost::filter_iterator<U, V>(inPred, begin, end),
boost::filter_iterator<U, V>(inPred, end, end), inOp);
}

(the real declaration would be more complex than that, and need more policies, but you get the picture...)

then

filtered_for_each(seq.begin(), seq.end(), op, Pred());

versus

std::for_each(PredFilteredIterator(seq.begin(), seq.end()), PredFilteredIterator(seq.end(), seq.end()), op);

isn't a big improvement in clarity, and involves a lot more finicky work getting the function definition both correct and general.

It's generally a good idea to avoid for_each() as less expressive (and often less efficient) than the more specific algorithms in the STL. And anything that is complex enough that it doesn't fit any more specific algorithms is frequently complex enough that adding the filtering logic internally on a custom basis may make sense. (An example would be something aimed at generating elements in a collection based on another collection, but with a variable number based on the characteristics of the input parameter. This will not work with std::transform or std::generate_n. If you already have selection logic, integrating filtering logic may very well be more efficient on a custom basis inside your functor than doing so via any form, direct or indirect, of composition. Likewise, if you are processing a set of inputs and converting them into database insertions but skipping some, the field access you are doing to build the database insertions can double for filtering.)

In general, too, using a more precisely targeted algorithm instead of for_each() tends to move complexity out of the functor you have to write. In some cases it can move a lot of complexity into library code. (Using remove_if() plus erase() is much, much simpler than implementing the behaviour in a general for loop of any sort.) But even using std::transform plus an inserter to fill a container means that you have separated out the code dealing with the target container from the code doing the element generation, even though the container logic remains in the application space.
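The remove_if() plus erase() pairing is the standard erase-remove idiom; a minimal sketch (the function name and predicate are mine):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// The erase-remove idiom: remove_if compacts the survivors to the
// front of the vector, and erase trims the leftover tail in one call.
void drop_negatives(std::vector<int>& v)
{
    v.erase(std::remove_if(v.begin(), v.end(),
                           [](int n) { return n < 0; }),
            v.end());
}
```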

For all these reasons putting effort into writing an extended for_each is probably always using energy and attention which can be better expended elsewhere.

Matthew Wilson's Extended STL has a chapter on the implementation of filtering iterators. It may be worth emphasizing one thing that he notes: a filtered iterator cannot have the semantics of a random access iterator. Not only is indexing an issue, but so is the concept of distance between two iterators; both can in theory be supported, but only at considerable expense, and they may give unexpected values. (If we apply a filter F to a sequence of length 10, the effective length of the filtered sequence cannot be determined without traversing the entire sequence, and an indexing operation might yield nothing but the end of the sequence, at one extreme, depending on how many elements were filtered out.) If you need random access semantics, using copy_if to generate a standard sequence is by far the preferable option.

Editing

Dec. 4th, 2022 05:29 pm
jsburbidge: (Default)
When I was a student, back in the day, I had an Elite portable typewriter which had belonged to my grandfather. (The fact that it was Elite meant I got an extra couple of lines of text per page: N-page essay requirements always assumed Pica, which had fewer lines per page.)

My practice throughout my university days was, invariably, to write out every paper in longhand, edit the draft heavily, and then transfer the edited draft to the typewritten form for handing in, doing a second edit as I went.

A period in publishing as an editor taught me the standard markup formats, which I had not bothered with up until that point. But by that time computers were coming in and editing tended to become a continuous process onscreen; I rarely had the opportunity to use paper editing on my own texts.

(I may add, in passing, and as qualification in what follows, that I still think that printing out and editing a text is the only really effective way to end up with a good text. I have occasionally edited source code in this way. It is the best way, bar none, to attain to brevity.)

These days all my work is onscreen; I don't even have a printer. I do, however, find that the discipline of separating writing and editing remains critical.

I will regularly make a first draft of a class, sleep on it, and decide, on sleeping on it, that the design needs significant changing. This is too close to the original composition to be refactoring; it is, fundamentally, part of the original design process, with no re- about it.

Much of the time this leads to simplification; when it does not, it is because it leads to generalization, more complex in one place, less complex overall.

(I might as well throw in a note about "emergent design". I tend to agree with James Coplien that design is something which has to happen as its own discipline, and can't just emerge from work with concrete classes plus some general pattern principles. When I work on a feature, or on a bug once it has been analyzed, I never work without hewing to an explicit design, even when that design is not written down. But from that perspective design is not so much part of composition as its prerequisite: I couldn't have done that longhand writing without a good sense of what my overall structure was to begin with.)

A lot of the code that I see in production looks as though it was produced by developers who left off as soon as they got the first draft that actually worked. It's verbose and full of copy-and-paste antipatterns. Boolean flags are used instead of strategies and in some extreme cases independent access to global variables is used instead of parameter passing.

It uses idioms which were learned early but are not optimal. For example, most developers started out writing loops using for and while; but in C++ maintainability, clarity, concision, and speed of execution are better served by using the STL algorithms. One might draft out one's thoughts using a for loop, but finished code should have the additional thought put into it of using an appropriate algorithm.

In all cases these are less clear and less maintainable and in most cases also less efficient at runtime. But it's a first cut that's left in that state because it works and people won't edit.
jsburbidge: (Sky)
 I see headlines talking about risks to democracy after the election of a far-right party in Italy. I do not see headlines suggesting that that election shows the weaknesses of democracy in action.

Populist leaders tend not to be antidemocratic, at least by inclination. Many of the distinctive policies of the right-wing populist parties have heavy popular support - though they are policies which tend not to be supported by the major parties, and are frequently policies which run directly into constitutional limits in countries which have such limits (sorry, UK).

A simple example is the death penalty for murder.  In Canada, for years if not decades after the death penalty was abolished, it had broad general support in the population. None of the major political parties supported it (partly because it's very hard to find lawyers who support it - they are too much aware of the possibilities of miscarriages of justice) and so it remained off the agenda of Parliament.

The policies of the current Quebec government under Legault regarding dress and religious symbols, and restricting language choice, have broad support in the province as a whole - so much so that the various federal parties are unwilling to oppose them publicly - but would run directly into Charter challenges were it not for the use of the Notwithstanding clause.

Anti-immigrant policies play well with general populations almost everywhere. Opposition tends to come from an odd alliance of progressives and business groups (who need the labour pool).

The recent experience of COVID, and the current rush back by a majority of the population to "normalcy", including not wearing masks in public (which is, when you consider it, a pretty minimal-cost step), isn't just driven by oligarchic leaders (however much they want people back in the office [1]) but comes up from the grassroots. It does lead to a lack of confidence in the judgement of the people as a whole on other issues.

Populations in general are covering their ears regarding appropriate steps to take on climate change. Acceptance of anthropogenic causes has become general, but willingness to take steps with any immediate cost is present in only a tiny segment of the population.

Much general discourse treats democracy as an end in itself. It isn't. To begin with, "representative democracy" is not, at least as practiced, democracy; it's a way of selecting between governments generally made up of representatives of much smaller slices of the population, generally in the top quintile of income.  This is further tempered in many jurisdictions by permanent civil services (ENArques in France, at an extreme) who represent a broad professional consensus of what policies are acceptable.

Secondly, most jurisdictions constrain political rulemaking by constitutional bills of rights.  These provisions regularly get applied. In some cases (the US Second Amendment, for example) there may be serious issues around the nature of the constraints, but most such rights are unambiguously "good" in principle. Consider the regular striking down of things like minimum sentencing provisions under the Charter, or rulings providing immigrants with some rights of review of immigration board decisions.

Democracies have typically worked better than other choices because they impose more constraints on the arbitrary exercise of power. These constraints are intermittent - Liz Truss is essentially an unelected dictator until the next general election (unless she falls to internal party revolt) - but they do exist.

I, at least, do not as such want a democratic government so much as I want a just, prescient, and wise government. Unfortunately, nobody has ever devised a method to select for justice, prescience, and wisdom in the rulers.

Churchill's aphorism applies to this. The ideal government may very well be a truly enlightened despot, but it's difficult to find good monarchs, let alone genuinely enlightened ones.[2] Democracy has been the least bad model we have.

Democracies seem to have worked at their best when rising tides are lifting all boats. But if one current factor in the failure of governments generally to confront issues such as climate change is the failure to counter the pressure of money in politics, a more fundamental failure is the visible strong tendency of populations as a whole, when insecure, to prefer easy but obviously wrong nostrums peddled by populists to realistic but more challenging fixes.

So we have figures like Johnson and Truss, in England, or Poilievre and Smith, in Canada, or Trump and DeSantis in the US, or Meloni in Italy, who peddle long-term poison not despite, but because of, the broad wishes of the population.

The problem, as always, is finding a better solution. There is no obvious practical one - i.e. one reachable from here - on the horizon. And any path which could reach a different structural model would likely have to wade through a fair amount of blood to get there.

[1] Going by the messaging of my own employer's higher echelons, I think that they would be happy to see the offices full of employees all wearing masks, especially as the latter reduces the incidence of sick leave. Instead, what they are getting is sparse attendance, but almost everyone who shows up is not wearing a mask.

[2] Most monarchs historically were not unconstrained despots; they did a careful balancing act between competing groups of nobles. If the nobility as a whole turned against you, you were gone, or at least in deep trouble (John, Edward II, Richard II, Henry VI, at least, in England).
jsburbidge: (Default)
 I am not, in general, a great defender of As You Like It. The two themes in the air of the time which it made fun of - pastoral and melancholy - have long passed out of the common ken; it has even less plot than most Shakespeare comedies; it has remained popular primarily because of the character of Rosalind.

That being said, it has its points.  It is full of clever speech; it is, I believe, the first Shakespeare play with a Robert Armin fool rather than a Will Kemp fool, and so the first of his philosophic fools. It has a really clear distinction between the comic and everyday worlds which makes it a sort of concentrated template for Shakespeare's other festive comedies. ("O, how full of briers is this working-day world" over against Arden.)

The performance by the Canadian Stage Company at the Dream in High Park mangled the play so badly that none of its virtues survived. This wasn't just the usual cutting in the interests of length, though it involved liberal cutting. It also involved adding extended amounts of slapstick, not just where the text might call for it, but in many places where it could get in only by beating the text over the head. It had the most distracting costumes; I gather, after the fact, that for some reason the production presented all the characters as flowers. The court looked just as bizarre as Arden.

Much of Jaques and Touchstone was mangled or dropped, much to the detriment of both. At least they were not among the players who simply shouted their lines, or abbreviated versions of their lines.

It was, in short, an appalling production. I am disinclined to see any of their other productions.
jsburbidge: (Default)
There is a post on Charlie Stross's blog regarding a pledge by Rishi Sunak to eliminate degrees which lead to less well-paying jobs. Aside from noting that such a programme would likely lead to the abolition of Greats, the degree held by the current PM - as a rule, degrees in classics are not roads to riches - this would seem to be irrelevant, as Tory party members seem to give Liz Truss a pronounced edge (not because she's any brighter, but because she is more in their image).

But what is the value of a university degree? In the STEM area, generally, the "useful" (engineering end of the scale) degrees apparently now have a genuinely useful life of about five years. If you have a degree in pure math, it doesn't age at all, but it is about as useful as a degree in philosophy (which also doesn't age at all, at least if it covered core subjects).

On the other hand, my current employer not only wanted proof of my degrees from the early 1980s in an unrelated field (well, two unrelated fields) but apparently had the same demand of a colleague whose degree is from the late 1970s. It didn't care what they were in - experience rendered that irrelevant - but they certainly wanted proof of graduation.

Whatever Sunak believes, most degrees which are not specialized professional degrees have about the same value: employers want "a degree" for a vast number of middle-class office jobs and don't particularly care what in. For all that university calendars pitch the practical application of the most abstract of disciplines to students, a course of studies spent studying Peter Abelard, Guido da Montefeltro, and Dante is generally just as useful as a credential as one spent studying the most "relevant" of subjects.

Not that the former is very likely, these days. Many if not most smaller universities have abandoned anything even loosely related to the kinds of education which would satisfy anyone with a real appetite for systematic or eccentric knowledge. (Larger universities retain them because they need to support schools of graduate studies across a full range of disciplines.)

I have three degrees, each with its own lesson in later years.

The first was a BA from Trent University. In those days - which are now, I gather, considered part of the "early days" despite my very clear sense that I was nearly a decade after the real early days - it was a reasonable place to go for a small-class humanities degree, even if its tutorial system did not approach real Oxbridge tutorials. I did a major in English Literature and minors in Mathematics and Classics. What I did would now be impossible; the calendar no longer supports the courses I took.

My second was an MA taken with the course work from a doctoral programme at The Johns Hopkins University. I got out because I disagreed with where the discipline (and the humanities in general) were going. I cannot say in retrospect that my assessment was mistaken.

I then proceeded to a law degree at the University of Toronto. I was really the only student in my year who approached it out of an interest in law as such, and got the greatest amount out of courses in jurisprudence and legal history, including a directed research course in legal history. I did not get an offer to article at any of the firms I interviewed at. However, the degree did give me the one actual "practical" use of any of my degrees: it gave me a foothold as a legal editor at a Toronto publishing firm.

While there, I eventually shifted function and became a software developer, which is a long and complex story in itself. By the end of the 1990s I was experienced enough to get a place at a dot com startup, and went from there into development in the financial sector. At no time from that time on did anyone ever show any interest whatsoever in what I had studied at university, or what my grades were.

In retrospect what I "should" have done from a professional point of view was take the Descartes scholarship the University of Waterloo was happy to offer me and, instead of pure math (which was my then-current interest), take a course in math and computer science. I would have taken a short cut of nearly 15 years to the same career with better credentials and a better choice of employers. I'm not sure that would have been my best choice otherwise; my collections of classics and mediaevalia argue otherwise. (Though there's certainly an argument to be made that taking a second bachelor's degree in Computer Science rather than going to law school would have been a better idea.)

So what was the economic value of the degrees I have? Relatively limited; indeed, a single four-year degree that I did not take would have almost certainly had a bigger impact than the three degrees I did take. Their benefit was not at the vocational level but at a purely intellectual level. Most of the skills I have I had when I graduated from high school, although with less practice (with the exception of software development, which I did not take up until after I had finished university entirely).

There is a frequently made case for abstract knowledge that it eventually turns out to be more useful than practically-directed research (a classic example is the applicability of Lie algebras to particle physics; or, a level down, of understanding of particle physics and quantum mechanics to the use of semiconductors in computing). I am more inclined to make the argument that abstract knowledge is a value in itself, and that the willingness to support the extension of abstract knowledge is one of the things society is judged on.
jsburbidge: (Default)
 When I work from home, I work online, but I use Bell for internet, Telus for phone, and have an employer who seems not to rely on Rogers at all.[1] So although I knew that some fellow employees were having to use public WiFi sites because their Rogers connections were down, I gave little thought to the outage until a planned release was deferred on account of it, and even then only because it affected the availability of support staff. Only when my daughter phoned me to tell me that debit in general was down did I find out that the failure of a single vendor had essentially brought whole blocks of commerce (plus services like 911) to a screeching halt.[2]

Aside from noting that both Rogers and other large services should be looking very carefully at their architectures for redundancy - the easy fix is probably for vendors like Interac, which ought to be able to shift to parallel vendors providing load-balanced access to communications; Ghu knows what Rogers' architecture is like, and they are not being very clear - I see that there are calls for steps to be taken to provide more vendors and less dominance by a few. (Essentially two: Bell and Telus share much of the same backbone system.)

This is not a new idea, although usually the reason has been the concern at limited commercial competition, not system reliability. The previous attempts to provide for more vendors have not been successful, at least from the point of view of stability and robustness of the economy as a whole. (The smaller vendors use the large vendors' hardware and rent access in blocks.) This is because the substantial cost of building another backbone is a sizeable barrier to entry.

If the government wants to have another active competitor in the market, it would either have to provide massive subsidies to a startup (this would not fly, politically and perhaps legally) or enter the market itself under a Crown corporation. (Note that the aim of such a corporation would not be to provide monopoly services, as Bell used to or as the LCBO and Ontario Hydro do; that would defeat the purpose. The aim would be to provide a greater diversity of vendors.) For practical purposes this also means that prices would effectively be set by the government, not just regulated by the CRTC as they are now. (Whatever price was charged by such a Crown corporation would become the de facto ceiling for basic internet services.) It would also see considerable reductions in planned growth for the telcos and probable actual shrinkage of their markets.

Would the mandate of such a company cover all, or most, residents, or would it be confined to the areas more critical to general commerce? Practicality would argue for the former, but politics would probably demand the latter. Costs are higher as a result.

The new system itself would have to provide at every level for a high degree of redundancy and have significant overcapacity in order to handle unexpected eventualities. (Consider an existing vendor choosing to exit the market and its customers moving to the new Crown corporation; and unexpected eventualities are exactly the driving reason for such a system.)

So it would be an expensive, highly contentious, and lengthy initiative which would have to last through multiple governments. (Cheap alternatives like heterodyning IP over the power supply are most useful at the final delivery stage, and do not address the problem of providing for redundancy in the backbone.)

Do I expect this to happen? Not on the basis of a 24-hour incident - though it would be wisest to consider it a shot by the future across our bows.

[1] To the level that my employer was the only one of the major banks whose debit system was unaffected by the outage. Which didn't help them much: as Interac was affected, they were up in principle but connections from merchants were down.

[2] I realize that credit was not affected. In some ways that's worse, because the cost of the outage would have fallen disproportionately on the poor, who are less likely to have credit.
jsburbidge: (Default)
 A few days ago, one of my co-workers contacted me about a possible bug in some code he had been going over. Part of the code went back to the original check-in when it was migrated from a system called Harvest about four years ago (losing all of its history in the process), and a number of lines had my name on them in git blame.

A little bit of checking showed that the algorithm was essentially as it had been four years ago. Several lines were marked as mine because I had converted the macro TRUE to the boolean value true on a couple of lines, and one because I had taken a freestanding C function and turned it into a member of the class in which it operated.  For practical purposes, the code was the same as it had always been - but my name was all over it. In addition, the problem would be expected to take the form of a line having dropped out, and there is no blame tracking attached to deletions.

In actual fact, the conclusion to be drawn was that the code was legacy code. Minor tweaks obscured that fact.

Git blame operates on a per-line basis. But any change to the line - tweaking parentheses, for example, or converting a C-style cast to a C++ cast - makes you the owner of the line.

On a greenfield project where responsibility is doled out in blocks it might be useful, but on a legacy project it's worse than useless.

By coincidence, I had been looking at the blame record for a makefile the day before. The makefile had an if-else block where both branches had the same statements. (In other words, there should have been no if-else block, but just the list of statements.) Blame shows five different names associated with the block of code (all of whom, except (I think) the oldest one, have some passive responsibility for the poor structure) but not in any coherent manner.

When I look at a block of code and want to see its history, I want to see how the algorithm evolved, not how different lines were tweaked while retaining the same algorithm.

You can't avoid this problem as long as your algorithms are line-based. It's a whole different level of difficulty to provide a tool which divides a program into logical chunks and applies that analysis to the raw record; or (worse) to determine when apparently minor changes create new logic but skip over better implementations using the same logic. (A for loop and a find_if statement may be formally equivalent, but don't expect any automated help to know that.)

So I will continue to avoid git blame. If I really need to look at the history of code, I'll look up diffs from the history of the codebase and look at them as integral wholes.

Jubilee

Jun. 2nd, 2022 06:24 am
jsburbidge: (Default)
 From Clee to heaven the beacon burns,
      The shires have seen it plain,
From north and south the sign returns
      And beacons burn again.

Look left, look right, the hills are bright,
      The dales are light between,
Because 'tis fifty years to-night
      That God has saved the Queen.

Now, when the flame they watch not towers
      About the soil they trod,
Lads, we'll remember friends of ours
      Who shared the work with God.

To skies that knit their heartstrings right,
      To fields that bred them brave,
The saviours come not home to-night:
      Themselves they could not save.

It dawns in Asia, tombstones show
      And Shropshire names are read;
And the Nile spills his overflow
      Beside the Severn's dead.

We pledge in peace by farm and town
      The Queen they served in war,
And fire the beacons up and down
      The land they perished for.

"God save the Queen" we living sing,
      From height to height 'tis heard;
And with the rest your voices ring,
      Lads of the Fifty-third.

Oh, God will save her, fear you not:
      Be you the men you've been,
Get you the sons your fathers got,
      And God will save the Queen.

- A. E. Housman

God of our fathers, known of old,
   Lord of our far-flung battle-line,
Beneath whose awful Hand we hold
   Dominion over palm and pine—
Lord God of Hosts, be with us yet,
Lest we forget—lest we forget!

The tumult and the shouting dies;
   The Captains and the Kings depart:
Still stands Thine ancient sacrifice,
   An humble and a contrite heart.
Lord God of Hosts, be with us yet,
Lest we forget—lest we forget!

Far-called, our navies melt away;
   On dune and headland sinks the fire:
Lo, all our pomp of yesterday
   Is one with Nineveh and Tyre!
Judge of the Nations, spare us yet,
Lest we forget—lest we forget!

If, drunk with sight of power, we loose
   Wild tongues that have not Thee in awe,
Such boastings as the Gentiles use,
   Or lesser breeds without the Law—
Lord God of Hosts, be with us yet,
Lest we forget—lest we forget!

For heathen heart that puts her trust
   In reeking tube and iron shard,
All valiant dust that builds on dust,
   And guarding, calls not Thee to guard,
For frantic boast and foolish word—
Thy mercy on Thy People, Lord!

- Rudyard Kipling

The Housman poem is from 1887, reflecting the actual meaning of "jubilee" (a fifty-year festival) going back to the Mosaic law. The Kipling poem is from 1897, the Diamond Jubilee: Victoria was the first English monarch to pass the sixty-year mark, and only Henry III and George III had previously passed fifty (Edward III just managed 50 years). Elizabeth has passed both.

The current monarchy is a bit of a paradox: the great advantage of a monarchy in a parliamentary democracy is that the head of state is not appointed by the government and is in no way beholden to it, providing an independent check on extreme misuse of power. (The risks of a weak head of state can be seen in the facedown of Michaëlle Jean by Stephen Harper.) But we want that check only in extremis; in day to day life we want the monarch to be a figurehead only.

Remove

May. 21st, 2022 10:36 am
jsburbidge: (Default)
 A few days ago I was doing a code review in which (in essence) the following loop occurred:

class Foo;

bool matches(const Foo& inVal);

std::vector<Foo> x;

std::vector<Foo>::iterator i = x.begin();
while (i != x.end())
{
    if (matches(*i))
        i = x.erase(i);
    else
        ++i;
}

I raised two issues with it: one, that there was a bug in it (the code as reviewed left out the else, meaning that where consecutive members of the vector matched the condition the second would be skipped). The second was that the cost was fairly high; at the time I raised this by asking whether it made sense to replace the vector by a deque (reduces the cost for large sets of data) or a list (reduces the cost for all sets of data, but can raise the cost of other operations significantly), and was told that it would be a short vector and that the overall expense would therefore be low. (The cost is high because every time an element is deleted all the following elements have to be shifted left by one.)

About 24 hours later I asked about the use of remove_if and erase, shown below:

class Matcher {
public:
     bool operator()(const Foo& inVal) const {
         return matches(inVal);
     }
 };

x.erase(std::remove_if(x.begin(), x.end(), Matcher()), x.end());

I received the answer that the use of remove_if required extra work for a minor bit of code, and it was left there, as far as that issue went.

But I remained curious.  How much extra work was it? So I actually drew up the code bits above and counted lines. If we leave out the setup (first three lines) the STL code snippet is actually one line shorter.

(This had to be workable in a C++03 compiler. A C++11 implementation with the STL algorithm using lambdas would be shorter still.)

What are the benefits of the STL approach? Well, first, it's not fragile; the bug caught on the code review is impossible in the equivalent STL code. The bug is directly related to another minor weakness of the iterator-based code: because of the variable incrementing logic it can't use an idiomatic for loop, but has to use while().

Secondly, it's faster; remove_if is an O(N) operation and the final erase, which only resets the end of the vector, is very cheap indeed.

Third, it is immediately clear what the code is doing. Because of some additional verbiage in the actual code (plus the missing else, which obscured the form of the loop) I had to look twice to be sure of what was going on.

This is, in a nutshell, an illustration of why, in general, STL algorithms are preferable to hand-rolled loops: more reliable and generally faster. There are some subtle ways of using find_if and transform in unusual ways[1] but remove_if is clear about what it does.

But it's also an illustration of the implicit prejudice I find in developers against using the STL algorithms. The combined use of remove_if and erase is a very well-known idiom; there is nothing difficult about it. But the perceived - but not actually significant - overhead of having to provide a functor to feed to the algorithm [2] seems to constitute a mental barrier.


[1] For example, using find_if for "process every item in this ordered vector below a given value", for which it is admirably suited. Using for_each is less efficient, as find_if will cut off processing once the relevant value is found or surpassed if your test is written correctly. But you throw away the result of the search, which is not normally expected.

[2] Of course, even that is not real. If your test is in a function, as above, you can just pass a function pointer, although it is likely to be slightly less efficient. If it's not in a function, it's a logical unit and either should be in a function or in a functor in any case.[3] (In the actual code the test was not a function but a comparison expression. This did mean that the number of lines needed for the STL approach would be a little greater, as the test needed to be encapsulated for the STL where it had been merely dropped into the loop.)

[3] The marginal case being where this test is used only here. If it expresses a meaningful concept in the problem domain, though, it's a good bet that it is not.
jsburbidge: (Default)
 Well, not precisely. But its method of play would count as cheating for a human, and its measure of skill is based on a "cheating" algorithm.

The Wordle Bot internalises the finite set of all the answers set for the game. Note - and this is important - that this is not a complete set of the five-letter words in English, nor even a complete set of the common five-letter words: it does not, for example, include slats or thine.

At any stage it analyses the finite set for the pattern which will eliminate the most possibilities by what it will include/exclude. (If aeiou were a word, guessing it would allow you to cross off all the words which use letters which it does not use and all the words which do not have green letters in the same place.) Given the size of the set, it can brute-force this; I suspect that it actually internalises the number of words left after each valid word as a starting guess at the beginning of the day, so that it can provide that feedback to the user quickly.

The fact that it always recommends "crane" as an opening word reflects a brute-force analysis. A guess merely based on letter frequencies would omit c and probably include t. Later and rants would be better guesses based on general letter frequencies.

After each guess, it narrows the finite set to the subset which matches all the known conditions. It calculates the number of words that would remain if each candidate were chosen next, and selects the choice with the smallest value.

If you have guessed a valid English word which is not in the complete set of Wordle answers it assigns that guess a skill of zero.  What it really means is that the user has a bigger vocabulary (which should be a plus, not a minus, in analyzing skill).

There are two points to make:

1) This is effectively cheating. By doing a brute-force analysis against the internal list kept by the game, it performs the equivalent of looking in the back of the textbook for the answer.

2) This is also not playing a game. As with any other game, just using brute force to calculate one's moves - what it effectively recommends as strategy - is not in any meaningful sense (other than the von Neumann and Morgenstern one) playing a game.

Expected play behaviour is for players to bring their general prior knowledge, in this case a general knowledge of the English language and not of the arbitrary subdomain which is "the set of all Wordle answers".

To approach it in a fair manner it should use as a basis all the five-letter words in the OED. Heuristics allowing solutions would have to take into account the probability that a word is current enough to be considered. (Thine I use every week; I haven't seen hight ("named", not misspelled height) in the wild in current use ever; lossy is a technical term with limited use, etc.)

Plunder

May. 14th, 2022 07:19 am
jsburbidge: (Default)
 Odd how things come up together...

I had just finished West's book on Indo-European myth and in particular noted at the end of it his identification of the deep roots of the comitatus in Indo-European culture - the king as the leader of a band of warriors whom he attached to himself by, essentially, handing out booty or providing the opportunity to pillage.

The next day I was reading a book dealing with (inter alia) the conversion of Anglo-Saxon England, and read: "Seventh-century English kings did not 'govern' in any sense that we should recognize today. Their primary business was predatory warfare and the exaction of tribute from those they defeated. The spoils of successful war - treasure, weapons, horses, slaves, cattle - were distributed to their retainers as payment for past and lien upon future loyalty."

So the pattern described above has deep roots in Indo-European culture. Traditional poetry, whether about the Trojan War or a successful cattle-raid, reflects this.

In Europe generally, it is the Twelfth and Thirteenth Centuries which see the movement away from this pattern, at least at the local leader level. Prior to that point it was common for leaders well below the level of monarch to carry out raids and low-level warfare against their neighbours; it is in this period that the state begins to exert its centralizing powers to curtail this activity.  In areas where national borders were involved it persisted for rather longer (such as the Scots Borders). The general basis for lordship becomes, not the distribution of booty, but the conferring and defending of rights to land (or patents, or other privileges) which can generate a continuing stream of revenue for the holders.

(Booty doesn't entirely go away. Soldiers continued to be given the implicit, and sometimes explicit, permission to pillage on campaign, and even after it became entirely frowned upon (consider Wellington's army in the Peninsular War) remained (and remains) a problem. (How many Allied homes have "souvenirs", like the Roman bust discovered recently in Texas, picked up by soldiers in the Second World War?) But it was no longer the systematic basis of lordship.)
jsburbidge: (Default)
 From a Guardian article on archaeological finds under Notre Dame Cathedral:

"The find included several ancient tombs from the middle ages ..." 

No, it included mediaeval tombs from the middle ages. Ancient tombs would have to go back to the days of Lutetia. These are 13th to 14th Century, not ancient.
jsburbidge: (Default)
 It is worth noting that the decisions to drop vaccination requirements and mask mandates are based, insofar as they are data-driven at all, not on an evaluation of how many people may contract COVID-19, but on an evaluation of how many people will end up in hospital, and, in particular, in ICUs.

Put more bluntly, the government doesn't care about people getting sick, they care about hospital overcrowding. (This has been visible and explicit since the start of the pandemic.) They also badly want the whole thing to be "over" by the time of the June election.

It is also important to recognize that the primary benefit of masks - except at the high end, which involves respirators proper - is (1) at a population level and (2) that they protect other people from the mask-wearer more than vice versa. So saying that people can still choose to wear masks misses the point; the people who choose not to wear masks are likely to skew less cautious in other ways and are therefore at higher risk.

Masking is a low-cost high-benefit practice if generally adopted in public places. Dropping a general mask mandate does not just verge on the irresponsible but goes well into that territory.

At an individual level the obvious strategy to take is to limit going to places where there are significant numbers of individuals one does not know and to wear a respirator, not just a cloth mask, when inside public places. Shopping online is still a better option in most cases; if one chooses to shop in brick and mortar locations, relatively smaller locations with better ventilation are better choices. Voting against the government in June is also a good idea - assuming that the opposition parties are willing to back continued restrictions. (I hold no real hope of this.)
jsburbidge: (Default)
 There's a not uncommon use case in programming where you want to do something once in a specific context: the typical example being where you want to log an event the first time, and only the first time, it happens (and it's not part of the program's startup, and typically may never happen at all, but if it happens at all may occur many, many times, which is why you want to limit the logging).

There's a common idiom for this:

static bool logged = false;
if (!logged)
{
    DoLogging(message);
    logged = true;
}

We will assume for the sake of discussion that this is not a routine which can be called by multiple threads at once, which would complicate the issue somewhat.

There are two small problems with this. The first is that it's an ugly little block, especially if it's used multiple times. (You can't extract it into a function called in multiple contexts because that static variable has to be unique for each context.) The second is that it's inefficient: we have to check the flag every time through. That means branching (always inefficient) and, worse, because it's a static variable it will almost certainly not be in the memory cache, making for a slow load.

So what can we do? We need a mechanism which chooses one path once, another path after that, and which neither branches nor uses static data on subsequent runs.

If we think of it in terms of typical one-time activities, we might think of a constructor. We can do the following:

class OneTimeLogger
{
    public:
    OneTimeLogger(const std::string& inMessage)
    {
        DoLogging(inMessage);
    }
};

In context:

//...do stuff
static OneTimeLogger logger(message);
//...do other stuff

This looks attractive, but actually it solves nothing. First, because it's not a standard idiom it's going to confuse the hell out of a maintenance programmer. Any line which requires a detailed comment saying "do not delete this apparent no-op" is a bad thing. Secondly, it actually hides an expensive if/else switch. The compiler has to emit code checking for a first time on initializing a static object, and, worse, at least post-C++11 it has to make that check, and the initialization, thread-safe. (If this *is* a multi-threaded situation with potential contention, this might mean that the maintenance cost is worth it; it's the simplest thread-safe way of doing this I know. In that case, you might want to add a no-op log() function to the class and call it in every pass, so that it looks normal to a maintainer, although then you have to explain the no-op function where it's defined. The next alternative involves replacing the unique_ptr in the solution below with a shared_ptr or putting a mutex around the point where the smart pointer is updated.)

The expensive part of the test is one-time, but the test is still there, and it's going to be on static data. All that we've done is hidden the if/else test.

The other way out is polymorphism. Assuming that the calling context is in a class and not a freestanding function, we can do the following:

class MyClass
{
    public:

    class IThisEventLogger
    {
        public:
        virtual ~IThisEventLogger() { }
        virtual void log() = 0;
    };

    class NullThisEventLogger : public IThisEventLogger
    {
        public:
        virtual void log() { }
    };

    class OneTimeEventLogger : public IThisEventLogger
    {
        public:
        OneTimeEventLogger(std::unique_ptr<IThisEventLogger>& inParent,
                           const std::string& inMessage):
            m_parent(inParent), m_message(inMessage)
        { }

        virtual void log()
        {
            DoLogging(m_message);
            m_parent.reset(new NullThisEventLogger());
        }
        private:
        std::unique_ptr<IThisEventLogger>& m_parent;
        std::string m_message;
    };

    MyClass(): m_logger(new OneTimeEventLogger(m_logger, "Message to be logged"))
    { }

    void doSomething()
    {
        //...stuff
        m_logger->log();
        //... more stuff
    }

    private:
    std::unique_ptr<IThisEventLogger> m_logger;
};

The trick is that the reset call in the logger is a fancy way of doing "delete this" (i.e. committing suicide), which is entirely legal in C++ so long as no members are touched afterwards. (Also, passing a reference to the owning pointer while it is still being initialized is fine, because nothing happens with it until the logger is fully constructed.) We choose to pass the message in the constructor because it reduces the cost of the no-op call to a bare minimum.

The first time log() is called, the message gets printed, and then the execution path for all future calls is changed to a no-op function. We still have a virtual call, but that should be on an object in the cache, and virtual calls are typically cheaper than branching. The call site is simplified; the only time complexity appears is when looking into the very detailed implementation, well away from the business logic in doSomething().

If the logging logic is sufficiently generic, the machinery for managing this can be extracted and reused so that it doesn't clog up the interface of the calling class. If the logic is complicated internally then the logger and the interface will have to be created locally. (If we have to log four variables, two of them integers and one floating-point as well as a string, we want to put the expense of generating the message to log into the one-time call as well, so a generic "pass a string to log as a parameter" may not be a good match, and pre-generating the message in the constructor, as above, may be impossible -- though usually something logged once and for all will have a very generic message).

The downside is that you need a separate logger for every message you want to print once - a classic trade of space for time. Of course, if your parent class is well-designed, it will have limited responsibilities, and the number of instances it will need will be correspondingly small. And the logger itself is cheap to construct - not much larger or costlier than the text of the message it logs; if it builds a custom message its content model is even simpler and its associated costs are even smaller.

ETA: you can make this more generic and more encapsulated by making the smart pointer a custom class which hides the delegation entirely and implements the logging interface (and, naturally, holds a smart pointer as a delegate).
 
jsburbidge: (Default)
 ... will Justin have the guts to follow his old man and say "Just watch me."?
jsburbidge: (Default)
 1) On checking my spam folder, I see that I have received two invitations to join the Illuminati, one in Italian.
 
If the AISB is going to contact anyone it will not be by cleartext e-mails. They will use proper tradecraft.
 
2) It is remarkable just how awful protesters' historical education is. The position of the Prime Minister has always, since the time of Walpole, been determined by the House of Commons. The Crown has no power to dismiss the PM and only very limited powers to prorogue Parliament (essentially, when the PM has lost the confidence of the house, or at the request of the PM). This was firmly established in 1649 and 1688, with tweaks in the 18th Century as the office of the Prime Minister developed.
 
3) I am getting tired of public health officials who are reported in the media as talking about masks and vaccines as though they were purely about individual risk rather than looking at the impact in populations of general adoption/dropping of particular activities. A 30 year old with two doses of vaccine who goes out without a mask is at a low risk of contracting symptomatic Covid and at very low risk of serious disease. But if 30-year olds in general do that, there will be a calculable increase in the spread of COVID to other parts of the population. Wearing a mask or being vaccinated is not principally about personal risk, in many cases; it is about being a responsible member of the body politic and of society.
 
4) Seen in real code, names slightly adjusted: 
 
class XKey
{
    public:
    XKey(const int inIdA, const int inIdB, const std::string& inName):
        m_idA(inIdA), m_idB(inIdB), m_name(inName)
    { }
 
    bool operator<(const XKey& inOther) const
    {
         return (m_idA < inOther.m_idA) && 
             (m_idB < inOther.m_idB) &&
             (m_name < inOther.m_name);
     }
     private:
     const int m_idA;
     const int m_idB;
     const std::string m_name;
 };
 
Surprising things will happen when you try to use a map with that as a key.
 
Don't do this.
jsburbidge: (Default)
 There's an old adage about being very careful about how you align a system to match its professed goals: if the goals are out of alignment, even though the intent may be clear, at least some actors will try to game the system.

(A classic example in software development is to measure productivity in lines of code, and make compensation dependent on productivity. At best, this provides no incentive for clean, concise code; frequently, it actively encourages coders to write deliberately verbose code to boost their lines of code.)

I have recently observed a specific example of this in another domain, namely, the TTC.

I regularly use a bus route which is relatively short and which is essentially a direct line from the station for most of the route, but splits into a loop at the end. I catch it on the leg going back to the station.

During rush hour, the predictions provided by the feed based on real-time data from the TTC are fairly accurate. Mid-day, they regularly would result in missing the bus.

The route is scheduled with busses N minutes apart, depending on the number of busses on the route at the time: every 15, 20, or 30 minutes (10 at rush hour).

To maintain the schedule, the drivers are supposed to get to the far point of the loop where there is a stop they can wait at and then come back at a scheduled time.

Because the apps available for TTC predictions track busses in near-real time, I can see what happens: busses wait at the wait point until they are three minutes early, and then leave. This means that what was a ten-minute prediction when I looked at it, and a six-minute prediction at the point they start to move, suddenly collapses by a significant amount.

What is going on? The problem lies with the TTC's metrics. They count a bus as being on time if it is within a three-minute window on either side of the schedule. This is meant to allow for problems with heavy traffic or construction which drivers can do little about, or for random occasions when fewer people than the statistical average are waiting at stops, speeding up the bus by reducing the number of stops it makes. It is not meant to encourage drivers to start moving as soon as they are technically "on time", but that is what systematically happens.

They start moving early because they will get to the station "on time", but really three minutes early, and then have a three-minute longer break. (The same drivers tend to arrive on one side of the station, let off their passengers, and then sit there until just before they have to leave, proceeding to their loading bay only at the last moment, even in very unpleasant weather. The TTC consistently conveys a sense of being run for the benefit of its employees rather than its customers.)

At rush hour there are no delays built into the schedule; the only divergences from longer-term live predictions will be as a result of heavy traffic, which is rare on this route.

This leaves users with no recourse. The drivers have committed no formal infraction. The TTC could analyze the collected route data and penalize this behaviour if a driver engages in it systematically, but I'm willing to bet that the likelihood of the union grieving such a change in the rules is a barrier to such a step.
