The Dust-Heap of the Database and the Specters of the Spectator

In 2014, networks launched some 1,715 new television series, a staggering number that prompted many articles to declare variations on the theme “there are too many shows to watch.” Same story, different medium, I say. Franco Moretti, a contemporary literary scholar, writes that while twenty-first-century Victorianists may (may) read around two hundred Victorian titles, that barely counts as a drop in the bucket of the 40,000 titles published in the nineteenth century. And the other 39,800 novels? The short version: gone. The longer version: maybe not.

The plethora of “lost” Victorian novels challenges any sweeping claims about Victorian society based on the fourteen or so (depends on how you count) full-length novels of Charles Dickens. But it becomes even more daunting if one’s studies include explorations of Victorian popular magazines and journals. The Waterloo Directory of English Newspapers and Periodicals 1800-1900 lists 50,000 titles. If each of those titles published a single, twenty-page issue—and certainly they published more—that alone would amount to 1,000,000 pages to read.

The imbalance between what we read, what we could read, and what we can’t read makes Victorian studies (and, I suspect, other historical studies) a strange beast. Any decent Victorianist monograph will address the familiar tunes (Dickens, the Brontës, Eliot, etc.), but it will probably do so through ephemera and periodicals that maybe only the author has read, thanks to hours of archival digging. The internet makes the strange Victorian studies beast even stranger. It changes how I do history, and not only because I can do most of my archival work from the back corner of Mello Velo (the local coffee shop, to which I owe my doctorate, whenever I finally defend). Historical research online changes academic reading practices, the kinds of arguments we can make, and finally, how we teach historical reading in the classroom. Internet archives make available texts virtually nobody has read, so anyone can discover one without leaving the comfort of a favorite coffee shop table. Electronic archives offer the chance to reinvigorate the dust-heap of forgotten novels, although with the change in what we can read, there comes an inevitable and sometimes ineffable change in how we read.

And yet, when I say a text nobody has read, this isn’t quite true. These texts do not simply appear on one’s screen. These historical documents already bear the marks of their nineteenth-century readers, but they now bear the marks of my search terms, the database algorithms and tags, scanners, computer processing, and somewhere in a basement, other people who plugged this material into the database. These extra, mostly ineffable hands mark the text like the fingerprint of electronic ghosts—and these spectral hands can sometimes offer us bizarre, fortuitous accidents.

I’m sorry, Peter. I’m afraid you can’t read that.

Here’s an example. My dissertation is in part about Charles Dickens, because of course it is. I’m also heavily invested in Victorian literary criticism; that is, as opposed to Victorianist literary criticism of the twentieth and twenty-first centuries, I gravitate toward the theories and ideas the Victorians themselves used to analyze their own work. I’m specifically interested in Dickens’s serial publications (stories told in installments, like a modern television show), and I wanted to see what the Victorians thought about serialization.

So, off I go to sundry databases and metadatabases, where I search terms like “serial,” “part,” “periodical,” “novel,” and “publication.” As part of my search, I examined the Spectator Archive (1.5 million pages, by the way), where I found this priceless artefact: “Doe’s Oliver Twist.”

Wait, didn’t Dickens write Oliver Twist? you ask. Who on earth is “Doe”?

Welcome, Dear Reader, to the dust-heap of the archival database. Archives like the Spectator Archive use something called Optical Character Recognition (OCR), which is the process by which a computer converts scanned images of pages from something like an 1838 edition of a magazine into searchable text. It’s built in part by programs like reCAPTCHA, the obnoxious text you have to enter before buying something or registering at some websites to prove that you’re a human, because only humans scream obscenities at their computers after the thirtieth failed entry. It’s pretty incredible, when you think about it.

And it’s also terrible, as proven by the title: the Spectator Archive’s OCR rendered “Boz” as “Doe.” Wait, didn’t Dickens—

Yes, Dickens wrote Oliver Twist. But before that, he published Sketches by Boz, a series of wonderfully liberal musings on life in London. And so, when Dickens began to serialize Oliver in Bentley’s Miscellany in 1837, the author’s name was “Boz.” But the Spectator Archive doesn’t know that. In fact, it doesn’t know anything. It’s a scanner, and a computer that runs OCR software, tags its garbled production, and then throws it into the ether for some random grad student to stumble across. And behind that, someone (probably a random grad student or intern in the basement of the Spectator building on Old Queen Street) could have read this article, because someone had to put the page on the scanner and press “go.” Behind the Spectator is a series of spectral readers: the Victorians who may have read the article in 1838, the person who scanned the article, the scanner, the computer, and the series of algorithms and programs that brought me from Google to the Archive and to that article.

“Doe’s Oliver Twist” is a gold mine for Victorian theories of reading, serial publication, and distinctions between common readers and academic readers. But in order to find it, one has to enter the right search terms, and (here’s the real punchline) those search terms may abound in a document and still not show up in the search results because the OCR is wrong. But there’s one final twist, and it isn’t Oliver.

deadpeople

No, it’s not that, either.

In fact, “Doe’s” showed up in my search results because something was OCR’d incorrectly. The OCR thought it recognized one of my search terms, but that term does not actually appear in the document.
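For the technically curious, here is a minimal sketch in Python of the two ways a misread page can fool a keyword search. It is only an illustration under loud assumptions: the “Boz”/“Doe” pair comes from the example above, but the “port”/“part” line is invented (I have not said which of my terms was misrecognized), the helper names are hypothetical, and a real archive’s search is surely more elaborate than a word lookup.

```python
import re


def words(text):
    """Lowercase a passage and return the set of alphabetic words in it."""
    return set(re.findall(r"[a-z]+", text.lower()))


def matches(term, text):
    """Return True if the search term appears as a whole word in the text."""
    return term.lower() in words(text)


# What was printed versus what the OCR produced (pair taken from the post).
printed_title = "Boz's Oliver Twist"
ocr_title = "Doe's Oliver Twist"

# Invented pair, purely for illustration: a misread vowel creates a search term.
printed_line = "the port of London"
ocr_line = "the part of London"

# False negative: the term is on the printed page, but the search misses it.
false_negative = matches("boz", printed_title) and not matches("boz", ocr_title)

# False positive: the term is not on the printed page, but the search finds it.
false_positive = matches("part", ocr_line) and not matches("part", printed_line)

print("false negative:", false_negative)
print("false positive:", false_positive)
```

Both checks come out true for these toy strings: a single misread letter is enough to hide a page from one search and hand it to another.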

Internet archives allow scholars to dive into the dust-heap of history. In their clunky, unintuitive ways, they cough up garbage and leave us to sort the mess. And as I will argue in future posts, they fundamentally alter the ways we perform these readings. Welcome to twenty-first-century history: a tangled heap of trashed treasures and treasured trash.


 

Cover image: Stone, Marcus and Dalziel. The Bibliomania of the Golden Dustman. Scanned by Phillip V. Allingham. Victorian Web.

Peter Katz is a fifth-year Ph.D. student in Victorian Literature and Culture. His dissertation focuses on sensation fiction, the history of science, and the history of the novel.
