One-time conversions are a different beast; I spent a chunk of time at Carswell doing that sort of thing back in the 90s. It's reasonable to use iterative approximations to deal with the Pareto 90/10 distribution of difficulty, and reuse tends to be only very basic parts of the toolkit.
I recall particularly converting ten volumes of typesetting files into SGML so that the content could be edited for semantic correctness and then uploaded into an Oracle database, doing as much regularisation of content automatically as possible at the same time, e.g. standardising court abbreviations.
no subject
Date: 2020-01-24 12:49 pm (UTC)I recall particularly converting ten volumes of typesetting files into SGML so that the content could be edited for semantic correctness and then uploaded into an Oracle database, doing as much regularisation of content automatically as possible at the same time, e.g. standardising court abbreviations.