It's worse than you think. This was citation data going back to
pre-Confederation times, and it had originally been input by low-paid temp
data entry sub-editors, not trained compositors.
That project - the one where the input took place - had been run by a prime
example of Dunning-Kruger syndrome, and the data from it was utterly
useless, for reasons I won't go into here. So we had to go for the tapes
from the compositors, which held classic format-only markup.
I used flex as a front end (with some frightening regexps) with a C++
engine behind it, and a ton of lookups to regularise to known data.
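Here is a minimal sketch of the general "regexps plus lookups" idea. It uses Python's `re` module instead of flex and C++, and it is not the original code. A regular expression finds candidate citations in format-only text, and a lookup table regularises the report-series abbreviation to a known form. Everything in it is hypothetical and chosen for illustration: the citation pattern, the `SERIES` table, the `regularise_citations` name and the sample strings. The real system's rules aren't described here.

```python
import re

# Hypothetical lookup table: variant spellings of a report series -> canonical form.
SERIES = {
    "SCR": "S.C.R.",
    "S.C.R": "S.C.R.",
    "S.C.R.": "S.C.R.",
    "DLR": "D.L.R.",
    "D.L.R.": "D.L.R.",
}

# Illustrative pattern only: "[year] volume SERIES first-page".
CITATION = re.compile(r"\[(\d{4})\]\s+(\d+)\s+([A-Z][A-Za-z.]*)\s+(\d+)")


def regularise_citations(text):
    """Return canonical citation strings found in format-only text.

    Matches whose series is not in the lookup table are skipped, which
    stands in for "flag for manual review" in this sketch.
    """
    found = []
    for m in CITATION.finditer(text):
        year, volume, series, page = m.groups()
        canonical = SERIES.get(series)
        if canonical is None:
            continue  # unknown series: not regularised
        found.append(f"[{year}] {volume} {canonical} {page}")
    return found


# Invented sample text, not real citations.
print(regularise_citations("See [1995] 2 SCR 100 and [1990] 1 S.C.R 200."))
# ['[1995] 2 S.C.R. 100', '[1990] 1 S.C.R. 200']
```

Keeping the recognition (pattern) separate from the normalisation (table) means new spelling variants can be handled by extending the table rather than rewriting the pattern.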
A good block of the core regexp logic ended up being reused for an entirely
different purpose: marking up citations, via pattern recognition and
database lookups, in report data input in the Philippines and, later still,
in data coming in from the courts. For all I know it is still in use. The
last I heard of it was a few years ago, when I ran into an old friend of my
sister's. She had met someone from Carswell and, when she mentioned my name,
found out that Carswell was basically running my code but had nobody with
the skills to maintain it, as they had moved away from C and C++ to OmniMark
and other 4GL tools.