regex RULES

Regex stands for regular expressions. Don’t you hate it when you look up a word in a digital dictionary and the entry says, “[meaning], ib. 58”? Or, “Id., 207e”? It forces you to look at the preceding entry to figure out exactly which citation is meant. Of course, because of the way LSJ was printed, that previous entry was often only a line away, so in the print version, all was well. Often, also, the LSJ files were processed so that the indication “ib. 58” was in fact enough to formulate a valid URN (the magic that gets you to the right citation in Diogenes or philolog.us, and in the PhiloLogic4 load of LSJ as well, for the texts that we actually have). But sometimes the vagueness of ‘ib.’, especially, led the scripts astray. When a friend mentioned seeing an entry like that, I complained that the right search for such incomplete citations at the beginning of entries was just too complex. That turned out to be a good thing to say. Next morning, there was a regex in my inbox, and I used a few variations on it. The final one (for now😱):

\<div2[^\>]+?\>(?!\<\/title|author\>)(.(?!\<\/title|author\>))+(ib|Ib|Id|ibid)\.

(Translation: in an entry, find me an ib. or Id. that is not preceded by the end of a citation of an author or work.)

OK, there were some false positives (PHib), but LSJ is now a tiny bit bulkier and, I hope, easier to use.
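For anyone who wants to try this at home, here is a minimal sketch in Python of how the pattern behaves. The three entries are invented, heavily simplified stand-ins for LSJ markup (not real entries), and the `STRICT` variant with a word boundary is only my assumption about one way to dodge the PHib false positive, not something that was actually run on the files. One thing worth knowing: as written, the alternation inside the lookahead reads as “`</title` or `author>`”, so an opening `<author>` tag stops the scan as well.

```python
import re

# The pattern from the post, copied verbatim.
PATTERN = re.compile(
    r"\<div2[^\>]+?\>(?!\<\/title|author\>)(.(?!\<\/title|author\>))+(ib|Ib|Id|ibid)\."
)

# Hypothetical variant (an assumption, not from the post): a word boundary
# before the abbreviation, so that "ib." inside "PHib." no longer counts.
STRICT = re.compile(
    r"\<div2[^\>]+?\>(?!\<\/title|author\>)(.(?!\<\/title|author\>))+\b(ib|Ib|Id|ibid)\."
)

# Invented, simplified entries; real LSJ markup is considerably richer.
ENTRIES = {
    # Starts with a bare "ib." and no author or title before it.
    "dangling": '<div2 id="x1" key="foo"><sense>meaning, <bibl>ib. 58</bibl></sense></div2>',
    # A full citation comes first, so the later "ib." has something to lean on.
    "anchored": '<div2 id="x2" key="bar"><sense>meaning, <bibl><author>Pl.</author> '
                '<title>R.</title> 207e</bibl>; other, <bibl>ib. 58</bibl></sense></div2>',
    # "ib." only occurs as the tail of the abbreviation "PHib.".
    "papyrus": '<div2 id="x3" key="baz"><sense>meaning, <bibl>PHib. 1.27</bibl></sense></div2>',
}


def flagged(pattern, entries):
    """Return the sorted names of the entries in which the pattern matches."""
    return sorted(name for name, xml in entries.items() if pattern.search(xml))


print(flagged(PATTERN, ENTRIES))  # ['dangling', 'papyrus']
print(flagged(STRICT, ENTRIES))   # ['dangling']
```

The original pattern flags both the genuinely dangling “ib.” and the PHib look-alike; the word-boundary variant keeps only the first. Whether that variant would miss anything in the real data is something I have not checked.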

Necessary illustration here: https://xkcd.com/208/
