The Dust-Heap of the Database and the Specters of the Spectator
In 2014, networks launched some 1,715 new television series, a staggering number that prompted many articles to declare variations on the theme “there are too many shows to watch.” Same story, different medium, I say. Franco Moretti, a contemporary literary scholar, writes that while twenty-first-century Victorianists may (may) read around two hundred Victorian titles, that barely counts as a drop in the bucket of the 40,000 titles published in the nineteenth century. And the other 39,800 novels? The short version: gone. The longer version: maybe not.
The plethora of “lost” Victorian novels challenges any sweeping claims about Victorian society based on the fourteen or so (depends on how you count) full-length novels of Charles Dickens. But it becomes even more daunting if one’s studies include explorations of Victorian popular magazines and journals. The Waterloo Directory of English Newspapers and Periodicals 1800-1900 lists 50,000 titles. If each of those titles published a single, twenty-page issue—and certainly they published more—that alone would amount to 1,000,000 pages to read.
The imbalance between what we read, what we could read, and what we can’t read makes Victorian studies (and, I suspect, other historical studies) a strange beast. Any decent Victorianist monograph will address the familiar tunes (Dickens, the Brontës, Eliot, etc.), but it will probably do so through ephemera and periodicals that maybe only the author has read thanks to hours of archival digging. The internet makes the strange Victorian studies beast even stranger. It changes how I do history, and not only because I can do most of my archival work from the back corner of Mello Velo (the local coffee shop, to which I will owe my doctorate, whenever I finally defend). Historical research online changes academic reading practices, the kinds of arguments we can make, and, finally, how we teach historical reading in the classroom. Internet archives make available texts virtually nobody has read, and they make it possible to discover such a text without leaving the comfort of your favorite coffee shop table. Electronic archives offer the chance to reinvigorate the dust-heap of forgotten novels, although with the change in what we can read comes an inevitable and sometimes ineffable change in how we read.
And yet, when I say a text nobody has read, this isn’t quite true. These texts do not simply appear on one’s screen. These historical documents already bear the marks of their nineteenth-century readers, but they now bear the marks of my search terms, the database algorithms and tags, scanners, computer processing, and somewhere in a basement, other people who plugged this material into the database. These extra, mostly ineffable hands mark the text like the fingerprint of electronic ghosts—and these spectral hands can sometimes offer us bizarre, fortuitous accidents.
I’m sorry, Peter. I’m afraid I can’t read that.
Here’s an example. My dissertation is in part about Charles Dickens, because of course it is. I’m also heavily invested in Victorian literary criticism; that is, as opposed to Victorianist literary criticism of the twentieth and twenty-first centuries, I gravitate toward the theories and ideas the Victorians themselves used to analyze their own work. I’m specifically interested in Dickens’s serial publications (stories told in installments, like a modern television show), and I wanted to see what the Victorians thought about serialization.
So, off I go to sundry databases and metadatabases, where I search terms like “serial,” “part,” “periodical,” “novel,” and “publication.” As part of my search, I examined the Spectator Archive (1.5 million pages, by the way), where I found this priceless artefact: “Doe’s Oliver Twist.”
Wait, didn’t Dickens write Oliver Twist? you ask. Who on earth is “Doe”?
Welcome, Dear Reader, to the dust-heap of the archival database. Archives like the Spectator Archive use something called Optical Character Recognition (OCR), which is the process by which a computer converts scanned images of pages from something like an 1838 edition of a magazine into searchable text. It’s built in part by programs like reCAPTCHA, the obnoxious text you have to enter before buying or registering at some websites to prove that you’re a human, because only humans scream obscenities at their computers after the thirtieth failed entry. It’s pretty incredible, when you think about it.
And it’s also terrible, as proven by the title: the Spectator Archive’s OCR rendered “Boz” as “Doe.” Wait, didn’t Dickens—
Yes, Dickens wrote Oliver Twist. But before that, he published Sketches by Boz, a series of wonderfully liberal musings on life in London. And so, when Dickens began to serialize Oliver in Bentley’s Miscellany in 1837, the author’s name was “Boz.” But the Spectator Archive doesn’t know that. In fact, it doesn’t know anything. It’s a scanner and a computer that runs OCR software, tags its garbled production, and then throws it into the ether for some random grad student to stumble across. And behind that, someone (probably a random grad student or intern) in the basement of the Spectator building on Old Queen Street could have read this article, because someone had to put the page on the scanner and press “go.” Behind the Spectator is a series of spectral readers: the Victorians who may have read the article in 1838, the person who scanned the article, the scanner, the computer, the series of algorithms and programs that brought me from Google to the Archive and to that article.
“Doe’s Oliver Twist” is a gold mine for Victorian theories of reading, serial publication, and distinctions between common readers and academic readers. But in order to find it, one has to enter the right search terms, and (here’s the real punchline) those search terms may abound in a document and still not show up in the search results because the OCR is wrong. But there’s one final twist, and it isn’t Oliver.
No, it’s not that, either.
In fact, “Doe’s” showed up in my search results because something was OCR’d incorrectly. The software thought it recognized one of my search terms, but that term does not actually appear anywhere in the document.
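For the curious, both failures (the real match that goes missing and the false match that turns up) can be sketched in a few lines of Python. Everything here is invented for illustration: the two snippets of “printed” text, their garbled “OCR” versions, and the function names. It is not how the Spectator Archive actually works, only a toy model of keyword search over mis-transcribed text.

```python
# Toy model of keyword search over OCR output.
# All text and names below are hypothetical, not taken from any real archive.

# What was actually printed on the page.
printed = {
    "serial_review": "boz has published the first part of his new novel",
    "shipping_news": "the ship left the port at dawn",
}

# What the (imaginary) OCR software thought it saw.
ocr = {
    "serial_review": "doe has published the first port of his new novel",
    "shipping_news": "the ship left the part at dawn",
}


def search(pages, term):
    """Return the names of pages whose text contains the term as a whole word."""
    term = term.lower()
    return sorted(name for name, text in pages.items()
                  if term in text.lower().split())


def compare(printed_pages, ocr_pages, term):
    """Return (missed, spurious): real matches the OCR search loses,
    and matches that exist only because of an OCR error."""
    truth = set(search(printed_pages, term))
    found = set(search(ocr_pages, term))
    return sorted(truth - found), sorted(found - truth)


missed, spurious = compare(printed, ocr, "part")
print("missed:", missed)      # pages that really contain the term
print("spurious:", spurious)  # pages that only seem to contain it
```

In this made-up case a search for “part” loses the review that really uses the word and returns the shipping notice that never did, which is roughly the situation that put “Doe’s” in front of me.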
Internet archives allow scholars to dive into the dust-heap of history. In their clunky, unintuitive ways, they cough up garbage and leave us to sort the mess. And as I will argue in future posts, they fundamentally alter the ways we perform these readings. Welcome to twenty-first-century history: a tangled heap of trashed treasures and treasured trash.
Cover image: Stone, Marcus and Dalziel. The Bibliomania of the Golden Dustman. Scanned by Phillip V. Allingham. Victorian Web.
Peter Katz is a fifth-year Ph.D. student in Victorian Literature and Culture. His dissertation focuses on sensation fiction, the history of science, and the history of the novel.
Very enjoyable read. How fascinating that there are so many Victorian-era titles out there that are now becoming discoverable. It’s exciting and overwhelming to think about, and I wonder if the further reading of these forgotten texts may bring about redefinitions of the Victorian era and of its literature.