It’s been over a year since Lee and I cleaned up ghost cat data-hairballs with webscraper.io and OpenRefine. In retrospect, it’s easy to recognize that book for what it was: 10,000 words of early-pandemic stress energy. While everyone else was buying up all the yeast and flour and baking bread with it, Lee and I were scraping a fan wiki and making the ultimate metadata sheet for our corpus. We raced forward into April 2020, adopting translation corpus-building as a pandemic hobby. Lee arranged for a used bookseller in Quebec to ship every one of their Les Baby-Sitters books to my house. Cut off from the flatbed scanner I’d used at work due to the lockdown, I inquired if my department could buy me a document camera / scanner that I’d seen on an Instagram ad, of all places. The Czur ET 16 plus showed up on my 35th birthday and was the best present I could imagine; many thanks to Cécile Alduy, my department chair, for making it happen. (Her daughters are fans of the French graphic novels.)

But the pandemic wore on and on. Some days, scanning and then proofreading the OCR seemed attainable. Other days, I shuddered at the prospect of fixing apostrophes and characters with accents on the clunky Windows laptop running ABBYY FineReader. Scanning and OCRing the whole original English BSC corpus was one thing in fall 2019 (when, in retrospect, all things felt attainable); slogging through so many French translations during 2020 was another matter.

At the beginning of the fall, Lee and I took a look at the French translations of the graphic novels that we’d explored in DSC #5: The DSC and the Impossible TEI Quandaries for a talk called “Layered Adaptation and Warped Nostalgia: Francophone Translations of the Baby-Sitters Club Graphic Novels” at virtual Flyover Comics. We used some of the methods from DSC #9: Text-Comparison-Algorithm-Crazy Quinn to compare the graphic novels vs. the books they were based on, in English and French.