# Recovering the London Stage Information Bank: Lessons from an Early Humanities Computing Project

## Abstract

This paper traces the little-known history of the London Stage Information Bank, a digital initiative that ran from 1970 to 1978 under the direction of Professor Ben R. Schneider, Jr. at Lawrence University. With support from the National Endowment for the Humanities, the American Council of Learned Societies, and the Mellon Foundation, Schneider’s team produced a database from the multi-volume reference work The London Stage 1660-1800 (Southern Illinois University Press, 1960-68). Today, however, most of the project’s outputs are lost or damaged, and its history has been largely forgotten in both theater studies and eighteenth-century studies. This essay traces the history of the Information Bank and my efforts to recover its damaged data and code, offering the project as an object lesson in questions of access, preservation, and institutional memory that digital humanities practitioners continue to confront in 2017. I argue that the project faded into obscurity, not only because of technological obsolescence, but also because the development team was unable to promote the kinds of research questions and behaviors that would enable their tool's widespread adoption and survival. The indifference of literary and theater scholars to the Information Bank throughout the late 1970s and early 1980s demonstrates how vital it is that digital and computational humanities work articulate its meaningfulness within existing intellectual and disciplinary traditions. While digital scholars build new avenues for inquiry that expand and transform humanities research, the survival of these approaches depends on their relationship to current humanities questions, methods, commitments, and epistemologies.

The year is 1970. Scholars of British theater are celebrating the recent publication of The London Stage, 1660-1800, an 11-book, 8000-page calendar of theatrical performances in London over a 140-year period. After more than a decade conducting the archival and editorial work necessary to produce this reference work, the advisory board seeks outside help constructing an index, which they hope will allow researchers to cross-reference information about a given play title or actor across volumes. The theater scholar they choose for the job, a dabbler in the new field of humanities computing, is confident that a program can perform this task better than a person. He assembles an international team of humanities scholars, technologists, and students, garners hundreds of thousands of dollars in private and federal funding, and spends the better part of a decade producing a database and a suite of tools for accessing and manipulating it. In the end, however, his team’s work is lost to technological obsolescence, as well as to the indifference of a scholarly community unconvinced of the need for such digital and computational approaches to humanities research.
This is the story of the London Stage Information Bank, an early humanities computing project that presents irresistible parallels to many issues facing digital humanities today. The project ran from 1970 to 1978 under the direction of Professor Ben Ross Schneider, Jr. at Lawrence University in Appleton, Wisconsin, and it was enabled by grants from the National Endowment for the Humanities, the American Council of Learned Societies, the Mellon Foundation, and other major funders. Today, however, most of the project’s outputs are lost or damaged, its history largely forgotten. This essay presents a study of the archival record and material artifacts of this project, positioning the Information Bank as an object lesson in issues of access, preservation, and institutional memory that digital scholars continue to confront in 2017.
More than a mere cautionary tale about data loss, however, the history of the Information Bank illuminates the need for researchers to articulate the relationship of digital research methods to existing intellectual and disciplinary traditions. I will argue that the project faded into obscurity, not only because it was ahead of its time or because the rapid advancement of computing technology took the team by surprise, but also because Schneider made key assumptions about the intellectual dispositions of his user base. Proceeding from those assumptions, his team did not promote the kinds of research practices that would ensure their tool’s widespread adoption and survival. Their presentations and publications exhibited the capabilities of the database but largely assumed that its usefulness would be self-evident–a tendency that continues today in many demos of digital humanities datasets and tools. While digital scholars can and should build new avenues for inquiry that expand and transform humanities research, the survival of these approaches depends on their relationship to current humanities questions, methods, commitments, and epistemologies. Digital humanities practitioners must model the modes of inquiry our work enables and demonstrate that these modes allow us to produce new, exciting knowledge that is legible to humanists who do not identify as digital scholars.
This essay first presents a history of the London Stage Information Bank and my ongoing effort to recover it, then synthesizes the lessons that can be learned from the project and suggests future applications of those lessons. In doing so, it contributes to the growing body of work that recognizes the importance of a reciprocal exchange of perspectives between theater and performance studies and digital humanities. As Debra Caplan has recently pointed out, the historical lack of interaction between the two communities has led to serious issues of accessibility and preservation in digital theater studies: “Without disciplinary-wide best practices for creating, funding, disseminating, or reviewing digital humanities work, projects are often idiosyncratic in the technologies and approaches they choose, and projects are not always equally accessible”  [Caplan 2015, 358–59]. At the same time, as Sarah Bay-Cheng has argued and as Caplan echoes, theater studies researchers also have much to offer to the digital humanities, particularly in their long tradition of attention to the ephemerality of interactions–a problem that is inherent both to performance history and to digital media and culture [Bay-Cheng 2012].[1] This interplay between ephemerality and durability, dramatized in the ability of a fleeting performance event to leave traces and create impacts that ripple out across time, resonates with recent media archaeological work within digital humanities. Jussi Parikka reminds us that digitally encoded information, often imagined in terms of its “immaterial virtuality,” is in fact dependent on “hardware, software, and other material contexts” that are “prone to deterioration;” for Parikka, “[t]he digital is not eternal, nor is it simply ephemeral”  [Parikka 2012, 118–19]. 
Matthew Kirschenbaum similarly highlights the tension between the mutability and resilience of digital inscriptions on physical media in his book Mechanisms: New Media and the Forensic Imagination [Kirschenbaum 2008], his emphasis often falling on the curious survival of seemingly evanescent digital texts.[2] This is a tension that accords especially well with theater studies, where an interest in ephemerality must be balanced with an awareness of the mediated afterlife of performance.
In both theater studies and media archaeology, then, poststructuralist notions of the “always already disappearing” object of study are confronted by a growing awareness of the stubborn material residue of those objects’ transmission [Bloom et al. 2013, 167]. The London Stage Information Bank exemplifies this tension: it was lost to technological change, yet it survives in at least partially recoverable forms. Furthermore, notions of residue and resonance drive us to recognize not only how the material stuff of hardware, software, and data persist, but also how more abstract artifacts–data models, for instance, that reflect particular orientations toward the objects under consideration–are passed down and inherited from collection to collection, from tool to tool.[3] My attempt to recover both the history and the material artifacts of this forgotten project illustrates the productive interplay between the theoretical orientations of theater history, performance studies, and media archaeological approaches within digital humanities.

## History of the Information Bank

The eleven books of The London Stage, 1660-1800: A Calendar of Plays, Entertainments & Afterpieces, Together with Casts, Box-Receipts and Contemporary Comment. Compiled from the Playbills, Newspapers and Theatrical Diaries of the Period were published by Southern Illinois University Press between 1960 and 1968. These books contain extensive information about nearly all recorded performances thought to have taken place in London over the course of the long eighteenth century, based on archival evidence held in libraries throughout the U.S. and the U.K. The volumes are organized by theatrical season, with a typical entry representing a performance at a specific theater on a specific evening. It usually gives the date or approximate date of the performance, the title of the main play staged, the cast list if known, and any other entertainments that accompanied the main attraction. Some entries specify the amount of money that the theater made that evening, mention prominent audience members, or detail the provenance of the information in the entry (Figure 1).
The London Stage is clearly a valuable resource for theater history, but it also has serious limitations. One major issue is that its indices are volume-specific, making it difficult to trace particular actors or play titles across decades. This obstacle has been partially redressed by the recent release of searchable full-text scans of the reference books through HathiTrust, but one must still run a separate keyword search in each of the eleven books.[4] Importantly, neither the reference books nor their digitized counterparts can readily support complex queries that go beyond indexing to describe relationships among objects and persons, a limitation that has been recognized from the beginning. Schneider and his programmer partner, Will Daland, detailed the books’ limited operability in 1971, in an article that appeared in the journal Computers and the Humanities: “If, for example, one wished to determine how many times actor X and actress Y performed in the same play together during their careers, it might be necessary to scan a period of fifteen or twenty years (possibly 800 to 1,000 pages) to exhaust all the possibilities of intersection” [Schneider and Daland 1971, 209]. The previous year, Schneider had been approached by the editorial board of the series to create a computerized index of all of the volumes.[5] By 1979, Schneider’s team had published The Index to The London Stage, which contained entries for the entities they thought researchers would be most interested in, such as actor names; in the process, they created the Information Bank. Schneider’s Index, now available on HathiTrust, can provide some guidance for researchers, but it answers only a fraction of the questions one could ask of the database.
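The kind of intersection query that Schneider and Daland describe, counting how often two performers shared a bill, is straightforward once performance entries exist as structured records. The following is a minimal sketch in Python, not a reconstruction of any Information Bank program. The record layout, the function names, and the placeholder names such as "Actor X" are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample records; dates, titles, and names are placeholders,
# not data drawn from The London Stage.
performances = [
    {"date": "1750-01-01", "title": "Play A",
     "cast": {"Role 1": "Actor X", "Role 2": "Actress Y"}},
    {"date": "1750-01-03", "title": "Play A",
     "cast": {"Role 1": "Actor X", "Role 2": "Actress Y", "Role 3": "Actor Z"}},
    {"date": "1750-01-05", "title": "Play B",
     "cast": {"Role 1": "Actor Z", "Role 2": "Actress Y"}},
]


def count_shared_performances(records):
    """Count, for every pair of performers, the performances listing both."""
    pair_counts = Counter()
    for record in records:
        # Sort the names so each pair is always stored in the same order.
        performers = sorted(set(record["cast"].values()))
        pair_counts.update(combinations(performers, 2))
    return pair_counts


def shared(pair_counts, performer_a, performer_b):
    """Look up one pair regardless of the order in which the names are given."""
    return pair_counts[tuple(sorted((performer_a, performer_b)))]


pair_counts = count_shared_performances(performances)
print(shared(pair_counts, "Actor X", "Actress Y"))
```

A single pass over the records answers the question for every pair at once, which is the scale of intersection that the printed volumes made impractical.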
It is possible to reconstruct the workflow involved in the transformation of the reference book into data, using the introduction to the Index along with a memoir Schneider wrote about the process titled Travels in Computerland [Schneider 1974]. Schneider recruited a small group of editors, mostly PhD students in eighteenth-century British theater, who were promised first use of the new tool in exchange for their contributions to the database. The editors, along with Schneider, created a marked-up copy of the printed reference book, using colored pencils to denote items that were to be coded in specific ways, such as performance headers, cast lists, or extraneous text that should be delimited from the main entry.[6] The hand-annotated pages were then shipped to China Data Systems in Hong Kong, where professional typists transcribed the marked-up text. They simultaneously standardized elements like punctuation according to rules defined by Schneider, and they also coded the editorial markups using a pre-defined custom schema.
The transcribed and coded text was sent to Information Control Incorporated in Kansas City, where it underwent Optical Character Recognition (OCR). The results were stored on magnetic tapes that were returned to Appleton, where an interactive program called ICIFIX, created by Daland, was used to perform additional correction and standardization.[7] Daland also developed a suite of programs, using the PL/1 language, to perform a variety of tasks: translating the markup used by the typists into human-readable tags, expanding the cast lists that were abbreviated by the reference book editors to save on production costs, and sorting or querying the data according to predetermined fields (date, theatre, title, role, actor, type of act) as well as tagged named entities (historical people and places, textual sources).[8] These programs were combined into a system called GWSJR1 (after George Winchester Stone, Jr., one of the original London Stage editors and patron of the Information Bank project) and stored on an IBM 2311 disk.
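To make the cast-expansion step concrete, here is a small Python sketch of the general idea. It does not reproduce Daland's PL/1 programs or the reference work's actual abbreviation conventions. It assumes, purely for illustration, that an abbreviated entry points back to an earlier performance of the same title at the same theatre and may list a few substitutions; the field names `cast_as` and `changes` are hypothetical.

```python
# Hypothetical entries in chronological order. An entry carries either a full
# "cast" or a "cast_as" back-reference to an earlier date, optionally with
# "changes" listing substituted performers. All names are placeholders.
entries = [
    {"date": "1750-01-01", "theatre": "Theatre A", "title": "Play A",
     "cast": {"Role 1": "Actor X", "Role 2": "Actress Y"}},
    {"date": "1750-01-03", "theatre": "Theatre A", "title": "Play A",
     "cast_as": "1750-01-01", "changes": {"Role 2": "Actor Z"}},
    {"date": "1750-01-05", "theatre": "Theatre A", "title": "Play A",
     "cast_as": "1750-01-03"},
]


def expand_casts(entries):
    """Return entries in which every back-reference is replaced by a full cast."""
    seen = {}      # (theatre, title, date) -> fully expanded cast
    expanded = []
    for entry in entries:
        if "cast_as" in entry:
            key = (entry["theatre"], entry["title"], entry["cast_as"])
            cast = dict(seen[key])                 # copy the earlier cast
            cast.update(entry.get("changes", {}))  # apply any substitutions
        else:
            cast = dict(entry["cast"])
        seen[(entry["theatre"], entry["title"], entry["date"])] = cast
        expanded.append({"date": entry["date"], "theatre": entry["theatre"],
                         "title": entry["title"], "cast": cast})
    return expanded


expanded = expand_casts(entries)
for entry in expanded:
    print(entry["date"], entry["cast"])
```

Because each expanded cast is stored before the next entry is read, a chain of back-references resolves in one pass, provided the entries arrive in chronological order.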
During the project’s second phase–outlined in a second memoir by Schneider, titled My Personal Computer and Other Family Crises [Schneider 1984]–a programmer named Reid Watts developed a word-processing and concordance program called SITAR specifically for the project. Completed in 1974, SITAR was then used by as many as eighteen undergraduate student assistants to iteratively correct errors and inconsistencies in the underlying data before it was run through GWSJR1 to produce the Index[9] (Figure 2).
The Index was completed in 1978 and published in 1979, signaling the end of the grant-funded phases of the project. However, Schneider continued to try to establish an ongoing maintenance and preservation plan for its various products. The underlying data for the Information Bank was stored on tapes that Lawrence intended to host in perpetuity, fulfilling scholarly inquiries at the base cost of computing power, in accordance with NEH guidelines (Figure 3). However, at the end of the 1970s, Lawrence stopped paying to time-share the computer they were using at a nearby research institute. In 1980, approaching retirement, Schneider tried to find a new home for the database where it would have ongoing technical support. Unfortunately, no one wanted to host it–in part because the technology was becoming increasingly obsolete, and in part because there had turned out to be very limited scholarly interest in the Information Bank.[10] In 1983, the tapes were transferred to the Harvard Theatre Collection, as a storage repository rather than as a new base for operations. Shortly thereafter, the curators lent out the tapes to be migrated onto a new medium, at which point they appear to have been lost.[11]
As this brief sketch of the London Stage Project’s history suggests, it was a highly ambitious endeavor involving dozens of personnel, including professional typists, OCR experts, programmers, graduate student editors, and numerous undergraduate assistants. It was awarded a total of $200,000 in funding over eight years, the equivalent of roughly $750,000 today.[12] It was considered a success insofar as it produced a number of significant deliverables. One of these was a flat-file database of the original reference book that categorized the data within the performance entries. Other products included a system (GWSJR1) that expanded the abbreviated cast lists and allowed the data to be queried based on pre-defined categories; an interactive program (SITAR) that allowed the data to be edited and updated iteratively by non-programming specialists, and that also enabled concordance-style sorting and searching of the data; and the printed Index to The London Stage, which contains about 500,000 references to over 25,000 items.
Despite the successes of the project, however, there are serious material and technological barriers to accessing it today. Not only were the data tapes lost after being transferred to the Harvard Theatre Collection, but even if they could be found, and even if an appropriate machine could be located to read them, it is likely that they would be materially degraded and might be damaged by being run through a machine. Given these realities, it is possible to see the London Stage Information Bank as a cautionary tale of a near-decade of work lost. The next section explores in more detail why the scholarly community was uninterested in the Information Bank and what lessons the project has to offer for digital humanities work today.

## Lessons from the London Stage Project

In order to discover why the Information Bank was lost to history, I turned to Schneider’s own writings about the project–the two memoirs mentioned above, along with a series of articles published in scholarly journals and edited collections–as well as to the records of the project, now housed in the Lawrence University Archives. Taken together, these documents reveal several factors that contributed to the Information Bank’s fate: the difficulty of developing tools that speak to current humanities research questions while also opening up new avenues of inquiry; the need to work within the incentive structures of academe, which are sometimes ill-suited to new forms of intellectual productivity; the tendency of scholar-programmers to develop custom software and datasets from scratch rather than consulting the past or the community for models and precedents; and the relentless pace of advancement in computing technology, which all but ensures that projects developed on the time scales of humanities scholarship will be obsolete by the time they are complete. This list is likely to strike a chord with digital humanities practitioners today grappling with many of the same issues.
The archive of the London Stage Project immediately reveals a central tension between the novelty of what computers could do with humanities information and the assumption that computers were self-evidently useful tools for pursuing existing humanities research questions. In Travels in Computerland, Schneider extols the virtues of the Information Bank, boasting that it will enable researchers to see new kinds of patterns in theater history:

We can study trends (the rise of pantomime; the interest in Shakespeare; the rise and fall of theatres; the decline of the drama); we can look for patterns (In what ways is one season like another? What is a typical stage career like? To what extent do actors specialize? What is the effect of the repertory system?). There’s too much information about 18th century theatre; without computer help we can’t see the forest for the trees.  [Schneider 1974, 230]

Yet despite all the new questions a database of The London Stage could potentially raise and answer, Schneider’s Information Bank garnered little attention from eighteenth-century scholars. Five years after his optimistic assessment in Travels in Computerland, he wrote an essay for an edited collection on Data Bases in the Humanities and Social Sciences lamenting the lack of interest. As the article explains, the Information Bank was publicized widely to the research community as a search service; in the database’s first three years of availability to the public, it was advertised in seven newsletters sent to 1000 subscribers, as well as fourteen scholarly journals. In that time Schneider received 126 requests for information, including 34 requests for price quotes, but by 1979 not one researcher had followed through and paid for the results of his or her queries [Schneider 1980, 31–34].
While Schneider states bitterly and with only minimal irony that “the failure of the world to beat a path to my door is truly a mystery to me,” he does offer a provisional explanation of his colleagues’ indifference to the new tool:

Most of the research that goes on in theatre history today is precisely the kind of thing one can do just as well without the computer. At the point where a good computer printout of the repertoires of all the actors who played Shylock might reveal a great deal about the staging of The Merchant of Venice, it would never occur to a writer on that subject (or to his reader) that the question deserved further research in the form of a complete list of all the roles of all the actors of Shylock. The kind of thing you can do now by computer is not the kind of thing that anyone ever did, or felt the need to do.  [Schneider 1980, 34]

This assertion that the database can answer unprecedented, unintuitive questions for humanities researchers is somewhat at odds with Schneider’s earlier insistence that the database was of obvious relevance to current scholarship. Indeed, the questions about generic trends, acting careers, and theater finances that Schneider listed off in 1974 were relevant to theater historians in the mid-twentieth century, and they remain relevant today; the database altered the scope and speed of inquiry into these topics, but not necessarily the range of possible queries. Yet by 1980, Schneider had also identified a mismatch between the kinds of questions that interested scholars at the time and the kinds of questions a computer could answer. For Schneider, the fundamental issue was that his database was ahead of its time–and certainly, it was an uphill battle to ask humanities researchers to think quantitatively or at scale. Then as now, scholars of both literature and theater tended to focus a given study on a limited number of texts or performances, sometimes even a single work or writer. On the other hand, there is a way in which the Information Bank could be seen in the 1980s as behind the times. The potentially new possibilities for inquiry that the database offered were simultaneously aligned with old-fashioned forms of theater history characterized by the search for factual information about past performances and by the urge to count them. In the early decades of the “cultural turn,” applications of poststructuralist theories represented the leading edge of humanities research–and stood firmly against the arguably positivistic orientation of the Information Bank.
As this account suggests, the Information Bank struggled to define itself simultaneously as part of an existing intellectual tradition and as the next wave of scholarship, a balancing act that continues to challenge digital humanities practitioners. The archive furthermore indicates that Schneider and his team missed key opportunities to model the new kinds of inquiry that the Information Bank offered and to articulate these techniques’ role within ongoing conversations in the field. The climax of Travels in Computerland revolves around the last-minute rush to process queries for graduate student editors Leonard Leff and Muriel Friedman to use in their presentations at the MLA 1971 annual meeting in Chicago.[13] These results, originally expected in late summer, were delivered in mid-December, about two weeks before the conference. This lag occurred because the technical side of the operation was focused on error-correction while the research side wanted only good-enough data to produce sample results as a proof of concept for fellow scholars [Schneider 1974, 143, 202, 212–13]. The final narrative chapter of Travels ends with an unplanned trans-Atlantic flight and an eleventh-hour triumph over programmatic errors, leading to the successful production of the output needed for the MLA seminar; unfortunately, help came too late, as Schneider is forced to admit in the memoir’s postscript:

The reader may still be wondering how the seminar turned out at the Modern Language Association….Well, although Muriel brought her printout to the meeting, she did not treat it specifically, and Leonard, not having time to study his, left it at home and gave a theoretical discussion of the subject. Two months later I heard from a scholar who’d been there that he’d gotten the distinct impression that the project had fallen short of its goals.  [Schneider 1974, 244]

The team was unable to demonstrate the kinds of results that could be obtained from querying the new database and the ways that those results could shed new light on current scholarly problems–such as, in Leff’s case, the casting of Richard Brinsley Sheridan plays over time and the relationship of anti-Irish sentiment to the casting of controversial roles [Schneider 1974, 194–95]. Furthermore, this anecdote points to the misfit between the London Stage Project’s outputs and the incentive structures within which its personnel operated. The graduate student editors who dedicated their time to the project were not able to translate that work into concrete findings that could be used for their dissertations, an issue that the scholarly community continues to work through today as we debate the best ways to articulate the value of data curation and digital tool development for hiring, promotion, and tenure–particularly when that labor leads to productive failures rather than peer-reviewed products like articles and books.
In another parallel to the present day, Schneider found himself up against other academics’ impulse to reinvent the wheel. While offering Information Bank queries at cost to researchers, Schneider also offered his programs for purchase. Yet his 1980 essay indicates that there, too, interest fell short of expectations. He found that new humanities computing projects were unwilling to invest in prefabricated programs that could be lightly tailored to their purposes: “For some reason, almost everyone would rather write software from scratch than get it at a fraction of the cost ready-made, thoroughly tested and debugged.…It does not seem to occur to scholars embarked for the first time on computer projects that what they want to do has ever been done before” [Schneider 1980, 33]. The unnecessary duplication of effort continues in the digital humanities community today, as granting agencies have historically tended to fund projects that are building something new, rather than those that are drawing on or sustaining existing resources. This funding situation creates incentives to make new tools instead of adapting, maintaining, or updating ones that have already been made. The result is the disappearance of many projects that might otherwise form the foundation for subsequent work: Robin Camille Davis has found that nearly half of the projects presented at the 2005 Digital Humanities conference were no longer online a decade later [Davis 2015].
As the imperative to incentivize sustainability and avoid duplication of effort has become more visible, sites like DHCommons and the Mellon-funded DiRT Directory have emerged to help people with similar interests find pre-existing tools as well as collaborators. The NEH’s Office of Digital Humanities has taken the important step of extending its Advancement Grants to projects “revitalizing and/or recovering existing digital projects,” rather than only incentivizing the creation of brand new ones. Likewise, the ACLS Digital Extension Grants are aimed at “enhancing established digital projects and extending their reach to new communities of users,” rather than providing startup funding for nascent projects. In addition, many digital humanities research centers are beginning to take a tiered approach, differentiating between researchers whose projects can be accomplished using out-of-the-box solutions and those who truly need to build their projects from the ground up. These signs point to a growing commitment to raise the survival and adoption rates of digital humanities projects, ending the cycle of reinvention and reduplication.
In his efforts to market his tools, Schneider not only discovered that his humanities computing colleagues preferred to build their own software from the ground up; he also learned that his products were considered obsolete, making them increasingly difficult to market to those who might adopt or adapt them to other purposes. In response to this growing threat, Schneider was defiant. The Lawrence University archive houses a poignant 1976 letter in which Schneider responds to a potential funder’s concerns about his technology being out of date:

Our system cannot become obsolete from advances in hardware, because our programs are written in BASIC and PL/1, assiduously supported by Digital Equipment Corporation and IBM, the two leading computer manufacturers: there is no chance that either will build computers that are incompatible with these programming languages, or that Lawrence would be so foolhardy as to buy computers incompatible with ten years programming work.[14]

Such a statement may sound hubristic, but four decades later, digital humanists still tend to underestimate the speed with which our projects will become outdated. Digital preservation specialists continue to warn humanities researchers and digital content producers that the apparently durable file formats and access mechanisms of the present day are less stable than they may appear [Conway 2010] [Library of Congress 2013]. Schneider’s story stands as a warning: if we wish for future projects to be able to build on our tools and datasets without having to start from scratch, then we must ensure that our outputs are designed to be accessible for years and decades to come.

## Recovery Efforts and Directions Forward

While it is true that much of Schneider’s work was lost, in recent years I have unearthed the project’s partial remains and, with the help of many collaborators, begun recovering their functionality. Erin Dix, Archivist at Lawrence University, helped me retrieve not only the paper files from the project, but a set of 3.5" floppy discs labeled “LSP_data” containing plain-text ASCII files and metadata suggesting they were written in 1990.[15] The data appears to represent a large majority of the performances from the original reference book, although some gaps have been identified; notably, several seasons from the 1730s and the 1780s are missing. It is unclear whether the data represents the edits made over the years by research assistants, or whether it represents the raw data as it arrived from Information Control Incorporated on magnetic tapes; it may also represent some intermediate stage.[16] What is clear, however, is that it has not been run through the programs that parsed the data for querying and expanded the cast lists; equally clear is that it has been converted from EBCDIC to ASCII, with some unintended consequences. The underlying hexadecimal code has been shifted such that most of the performance dates are represented as special characters rather than numerals.[17] Derek Miller has developed a script that corrects the hex; it is important to note, however, that the data itself cannot be forensically reverted, so programmatically corrected versions will always represent an approximation of the original.[18]
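The general mechanism by which EBCDIC numerals can surface as special characters can be illustrated with Python's standard codecs. This sketch is not Miller's script and makes no claim about the exact fault in the recovered files. It assumes, for illustration only, that the original bytes used the `cp037` EBCDIC code page and were later read under an ASCII-compatible code page; the helper `redecode` is hypothetical.

```python
# In EBCDIC the digits 0-9 occupy bytes 0xF0-0xF9, whereas in ASCII they
# occupy 0x30-0x39. These four bytes spell the year "1730" in EBCDIC.
ebcdic_bytes = bytes([0xF1, 0xF7, 0xF3, 0xF0])

as_ebcdic = ebcdic_bytes.decode("cp037")    # read with an EBCDIC code page
as_latin1 = ebcdic_bytes.decode("latin-1")  # same bytes, ASCII-compatible reading

print(as_ebcdic)        # the intended numerals
print(repr(as_latin1))  # accented letters and symbols instead of digits


def redecode(text, wrong="latin-1", right="cp037"):
    """Undo a wrong-codec read by re-encoding the text and decoding it again.

    This only works when the faulty conversion was lossless, which the
    recovered data may not satisfy.
    """
    return text.encode(wrong).decode(right)


print(redecode(as_latin1))
```

The round trip succeeds here because every byte survives the faulty reading intact. Where a conversion has collapsed or dropped bytes, as may be the case in the recovered files, any repair can only approximate the original.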
The Lawrence University Archives preserved this data as well as the grant applications, correspondence, and press from the London Stage Project referenced throughout this essay; they did not, however, preserve Daland’s code base, perhaps because it was thought to be archived at Harvard.[19] Fortunately, Daland kept printouts of the programs and their documentation in his personal files, and he scanned and emailed these papers to me as a combination of PDF and TIF files. Although we experimented extensively with scan settings and image processing techniques to optimize the images for OCR, we ultimately found them to be resistant to character recognition and resorted to hand-transcription of sample sections of the code. Working with a mainframe computer at the University of Wisconsin-Madison, we tried to compile and run a hand-transcribed and -corrected version of STRUCTUR, the main parsing program. After exhaustive efforts to reconstruct and compile the programs in their original form, we determined that doing so would be impractical; it would force us to reproduce and deal with numerous constraints around memory, character sets, and encoding scheme conversion that need not be factors in a modern computing environment. As a result, I am currently collaborating with Todd Hugie (Director of Library Information Technology, Utah State University) to re-engineer the code base in Python, creating a new parser for the recovered flat-file data based on the principles represented in Daland’s code. From there, we plan to transform the data into XML and JSON formats for preservation and sharing, then import the data into a relational database such as MySQL or MariaDB.[20]
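The planned pipeline from parsed records to JSON, XML, and a relational database can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual parser or schema. The record fields follow the categories named in the project documentation (date, theatre, title, role, actor), but the element names, table names, and sample values are hypothetical, and Python's built-in `sqlite3` stands in for MySQL or MariaDB.

```python
import json
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical parsed records; all values are placeholders.
records = [
    {"date": "1750-01-01", "theatre": "Theatre A", "title": "Play A",
     "cast": [{"role": "Role 1", "actor": "Actor X"},
              {"role": "Role 2", "actor": "Actress Y"}]},
    {"date": "1750-01-08", "theatre": "Theatre A", "title": "Play B",
     "cast": [{"role": "Role 3", "actor": "Actor X"}]},
]


def to_json(records):
    """Serialize the records as JSON for preservation and sharing."""
    return json.dumps(records, indent=2, ensure_ascii=False)


def to_xml(records):
    """Serialize the records as a simple, ad hoc XML document."""
    root = ET.Element("performances")
    for rec in records:
        perf = ET.SubElement(root, "performance",
                             date=rec["date"], theatre=rec["theatre"])
        ET.SubElement(perf, "title").text = rec["title"]
        for part in rec["cast"]:
            ET.SubElement(perf, "part", role=part["role"]).text = part["actor"]
    return ET.tostring(root, encoding="unicode")


def to_database(records):
    """Load the records into two related tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE performance (id INTEGER PRIMARY KEY, "
                 "date TEXT, theatre TEXT, title TEXT)")
    conn.execute("CREATE TABLE part (performance_id INTEGER "
                 "REFERENCES performance(id), role TEXT, actor TEXT)")
    for rec in records:
        cur = conn.execute(
            "INSERT INTO performance (date, theatre, title) VALUES (?, ?, ?)",
            (rec["date"], rec["theatre"], rec["title"]))
        conn.executemany(
            "INSERT INTO part VALUES (?, ?, ?)",
            [(cur.lastrowid, p["role"], p["actor"]) for p in rec["cast"]])
    return conn


def roles_of_actors_who_played(conn, role):
    """List every role taken by any actor who ever played the given role."""
    sql = """
        SELECT DISTINCT p.actor, p.role
        FROM part AS p
        WHERE p.actor IN (SELECT actor FROM part WHERE role = ?)
        ORDER BY p.actor, p.role
    """
    return conn.execute(sql, (role,)).fetchall()


conn = to_database(records)
print(roles_of_actors_who_played(conn, "Role 1"))
```

Separating performances from parts is what makes relational questions possible: the final query is the same shape as Schneider's example of listing all the roles of every actor who played Shylock.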
Our work to restore the London Stage Information Bank responds to several needs in the scholarly community, including a fundamental need for a database of the performance records in The London Stage. The reference books remain one of the most frequently consulted resources in eighteenth-century studies, and while they are available in searchable form through HathiTrust, keyword search remains an unsatisfactory method of querying the rich, relational information contained in those pages. Scholars continue to perform hand-counts of performances of interest, as Elaine McGirr does in her monograph on playwright, actor, and theater manager Colley Cibber [McGirr 2016], and as the Cambridge Ben Jonson project did in constructing their searchable “Performance Archive” of stagings of Jonson plays from the seventeenth century to the present.[21] In 2016, publisher Adam Matthew released a new primary source collection, Eighteenth Century Drama: Censorship, Society and the Stage, which is based on the Huntington Library’s Larpent Collection and includes a searchable database based on The London Stage. This release generated significant interest in the eighteenth-century and theater studies communities, but the limitations quickly became apparent: the collection carries a high subscription fee and only institutions, not individual scholars, are permitted to subscribe, making the collection inaccessible to all but members of the wealthiest institutions. Furthermore, the database itself is designed to permit queries only along specific parameters such as title or date, rather than any kind of exploratory statistical analysis of the full dataset. For these reasons, an open-access, open-source database remains desirable to many researchers in these fields.
As I and my collaborators undertake to re-engineer and revitalize the London Stage Information Bank, we are mindful of the obvious lessons in preservation and sustainability offered by the history sketched out above; any new version of the database will need to adhere to current best practices for sustainability, developed in consultation with librarians and archivists who have expertise in this area. However, we are also aware that one of the best ways to ensure preservation is to attract and maintain a large and engaged user base. In order to avoid the disconnect explored above between the original tool’s affordances and the dispositions and concerns of its target users, the new London Stage database would need to be adapted in several key ways to meet the needs of today’s humanities scholars: it would need to acknowledge its genealogy by building on the past iterations; it would need to accommodate and even highlight the ambiguity and messiness of the data; and it would need to contribute to current efforts to develop data ontologies that make sense for theater studies. The rest of this essay considers how these features would enable the database to speak to current questions and debates in humanities scholarship.
In the case of the London Stage Information Bank, we can already begin to suggest the orientation towards humanities data that it represents, based on the query results published as the Index to the London Stage. Domain-specialist users of the Index have long recognized in its entries an orientation towards historical data that is ill suited to the ambiguity surrounding much of this material. For instance, a recent case study of eighteenth-century London playbills shows that the many theatrical adaptations of Aphra Behn’s Oroonoko, which often go by the same name in The London Stage, actually represent a variety of responses to and interpretations of the Oroonoko legend [Vareschi and Burkert 2016]. This kind of uncertainty in the historical record, a reminder of our mediated access to the past, is not a concern built into the architecture of the Information Bank. Instead, Schneider’s database takes an equivalent string of characters (play title, personal name, theater location, etc.) to signify a self-same historical entity.
The Information Bank also inherited many of the problems built into its source material–such as the lack of reliable premiere dates for plays that debuted before the leading theaters began running daily newspaper advertisements around 1705. Prior to that time, most performance dates in The London Stage are conjectures based on publication dates and references in published editions of plays [Milhous and Hume 1974]. This kind of guessing results in entries like one for a presumed February 1697 performance of Timoleon: “It is not certain what company produced this play, if it was acted; and it may not have been staged” [Avery et al. 1960–1968, 473]. Schneider’s database has no way of representing this kind of ambiguity, which is central to how scholars today approach theater history and culture; McGirr, for example, devotes several pages of her book on Cibber to a discussion of the limitations of the available data about performances of his plays as represented in The London Stage [McGirr 2016, 10–13]. The Information Bank’s approach is therefore out of step with today’s theories about quantitative inquiry in the humanities, which aim to acknowledge the provenance, limitations, and fuzziness of the data. While 1970s-era computing may have required researchers to sacrifice complexity in order to conserve memory and storage resources, today we have the capacity to encode more information about the provenance and limitations of data, and many digital humanities projects are actively invested in finding ways to register and visualize uncertainty.
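One way such uncertainty might be registered, offered here only as a sketch and not as the data model of the Information Bank or of the re-engineered database, is to store alongside each performance an explicit statement of how well attested it is. In the Python fragment below, the class name, field names, and certainty labels are all hypothetical, and the “attested” label on the second record is assumed for the sake of contrast.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Performance:
    """A performance record that carries its own uncertainty (hypothetical schema)."""
    title: str
    date_text: str                    # the date as given or conjectured
    date_certainty: str               # "attested" or "conjectured"
    company: Optional[str] = None     # None means the company is unknown
    staged: Optional[bool] = None     # None means we do not know if it was acted
    notes: list[str] = field(default_factory=list)
    source: str = ""


timoleon = Performance(
    title="Timoleon",
    date_text="February 1697",
    date_certainty="conjectured",
    notes=[
        "It is not certain what company produced this play, if it was "
        "acted; and it may not have been staged"
    ],
    source="Avery et al. 1960–1968, 473",
)

# The certainty label on this record is assumed for illustration.
constant_couple = Performance(
    title="The Constant Couple",
    date_text="25 June 1708",
    date_certainty="attested",
    company="Drury Lane",
    staged=True,
    source="Avery et al. 1960–1968, 172",
)


def attested_only(records: list[Performance]) -> list[Performance]:
    """Keep records whose date is attested and that are known to have been staged."""
    return [r for r in records if r.date_certainty == "attested" and r.staged]


records = [timoleon, constant_couple]
confident = attested_only(records)
```

With fields like these, a researcher can choose whether to include conjectural entries in a count, and a visualization can mark them differently instead of silently treating every row as equally solid.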
The acknowledgment of ambiguity in the data takes on a particular importance for theater studies, which, as Caplan notes, takes as its objects “incomplete records of performance events rather than the live event itself.” For Caplan, databases of theatrical records “tackle a recurring and significant challenge in our field–the ephemerality of our medium and the dispersal of theatrical ephemera that may shed light on a performance event” [Caplan 2015, 356–57]. One prominent example is AusStage, an ambitious database of “programs, ticket stubs, newspaper clippings and so on” that seeks to document all dramatic performances across Australia from 2001 to the present, as well as some additional historical performance events.[24] As Miller points out, projects like AusStage “force us to define performance’s ontology,” articulating foundational premises such as “the relationship between performances and works, sex and gender identities, and how our contemporary vocabulary for performance translates that of previous eras” [Miller forthcoming]. Building on his experience as a research coordinator for AusStage, Jonathan Bollen has recently published an essay surveying the convergences and divergences of data models for theater history; the comparison raises critical questions about such fundamental issues as the basic unit of inquiry (A single evening’s performance? An event lasting several evenings but having a common title and cast? A production that might span multiple locations and years or even decades of runs?). As more theater databases are developed, each must decide whether to articulate its own unique ontology or to strive for interoperability with other databases by adopting shared structures and assumptions about the objects of study [Bollen 2016].
Our revitalization of the London Stage Information Bank would likewise need to engage in such critical self-reflection, deciding where to maintain the existing data model and where to refine it, as a contributor to ongoing efforts to develop data models tailored to theater studies.
Ultimately, eighteenth-century literary, theatrical, and historical research would be best served by a new London Stage database that finds ways to foreground the situated, captured nature of the data and the layered history of its transmissions and remediations; this is the direction we hope to take the project in the future. At the same time, however, it is important to realize that even our recognition of these gaps in the data is mediated by the ways scholars have accessed and thought about these archival records in the past. As Johanna Drucker and others have reminded us, the history of the data’s collection and transmission is an intrinsic part of the object of study; when we query a database of theater records, we actually query a database of composite objects that encode the history of how performance has been seen, recorded, preserved, cataloged and studied.[25] If we aim to acknowledge the messiness that lies behind the apparent solidity of digital or numerical representation, a necessary step is to make visible the projects that have helped to reveal or occlude that messiness over the years, and to unpack the ways they worked to do so. Only then can we design, make, and analyze in ways that resist reproducing unchecked assumptions that went into the collection and curation of existing cultural archives and their past digital remediations.
The reengineered Information Bank will not only embody a cautionary tale about digital preservation and provide a revitalized, open-access resource for exploring data about eighteenth-century theatrical performances; just as importantly, it will model and enable a critical approach to the architecture of that data. In doing so, it will align with the interests and research dispositions of today’s humanities scholars, who often seek to harness the possibilities of quantitative methods while maintaining a critical stance towards the digital condition. In reflecting these current concerns, the new resource will aim to reach a wider and more invested audience than Schneider’s Information Bank was able to. Like an eighteenth-century adaptation of an Elizabethan play recorded in the pages of The London Stage, this database project performs an act of recovery and revival that is simultaneously an act of re-imagination–one that pushes against the temptation to view the current moment as new and unprecedented, and instead invites its audience to a more layered understanding of its place in larger cultural and historical processes.

## Acknowledgments

I wish to thank Mark Vareschi for planting the seed of this project by pointing me to Schneider and Daland’s 1971 Computers and the Humanities essay; Erin Dix at the Lawrence University Archives for her assistance locating and navigating the records of the London Stage Information Bank; members of Schneider’s original team, including Cindy Serikaku and Nick Schneider, for helping to reconstruct the history of the project, and especially Will Daland for his involvement with the recovery effort and his careful review of the accuracy of my findings; Derek Miller for writing a script to repair underlying hexadecimal errors in the recovered data; Todd Hugie for his ongoing effort to reengineer Will’s code base; Brad Pasanek for giving me the opportunity to share this work in its early stages at the 2015 MLA annual meeting; Brianna Marshall and the Research Data Services community at the University of Wisconsin-Madison, as well as Bronwen Maseman and her graduate students, for their feedback on this work; Susan Barribeau and Betty Rozum for connecting me with the right people to keep the project going at key moments; Dorothea Salo, Cal Lee, Kam Woods, and Carl Stahmer for patiently answering my media forensics questions; Steven Dast for his help with image processing for OCR; Jack Keel for his assistance compiling and debugging PL/1 code; and Steel Wagstaff, Irene Zimmerman, Angela Moore-Swafford, and Angelina Zaytsev for their help making The London Stage available to the public through HathiTrust. This work has been partially supported by funding from the Department of English at Utah State University.

## Notes

[1]  Theatre Journal has recognized and begun attempting to bridge this divide. In 2016, the journal published two special issues on “Digital ‘Issues’: Rethinking Media in/and/as Performance” (68.3, Sept. 2016) and “Theatre, the Digital and the Analysis and Documentation of Performance” (68.4, Dec. 2016).
[2]  Against the view of software as immaterial, Kirschenbaum emphasizes the “material circumstances that leave material (read: forensic) traces–in corporate archives, on whiteboards and legal pads, in countless iterations of alpha versions and beta versions and patches and upgrades” [Kirschenbaum 2008, 15]. These are precisely the kinds of forensic traces I bring to bear on my study of the London Stage Information Bank and my efforts to recover its layers of history and inscription.
[3]  Jonathan Bollen describes vividly the persistence of data models even as the data is transformed for use in other projects: “datasets have the fluidity of plastic; they flow when prodded, pushed, or pressed, when the heat is turned up and when placed under stress. In the vocabulary of project management, datasets are dumped, scanned, stretched, squeezed, shoved, massaged, cleansed, and washed–in plainer terms, refactored and reformatted–and when all else fails, rekeyed…Inevitably, to migrate data is to manipulate: exports reveal imperfections, imports throw exceptions. And in the process of manipulation, one feels the resistance of the data model. Like a habit to be retrained or plastic’s retention of its form, a dataset retains the memory of its model even as it is transformed” [Bollen 2016, 619]. Bollen draws here on his experience coordinating research for AusStage (see Note 24), an ambitious theatrical database that absorbed data from several precursors developed in the 1970s and 1980s.
[4]  In collaboration with Southern Illinois University Press and HathiTrust, I was able to negotiate the release of Google’s scans of The London Stage reference books to the public. You can now search, view, and download all eleven books at https://catalog.hathitrust.org/Record/000200105, as well as the Index at https://catalog.hathitrust.org/Record/000299859. See Burkert 2014 for more information.
[5]  The compilers and editors of the original reference series who served as the Advisory Board of the London Stage Information Bank were William Van Lennep, Emmett L. Avery, Arthur H. Scouten, George Winchester Stone, Jr., and Charles Beecher Hogan. Additional Advisory Board Members included Allardyce Nicoll, Sybil Rosenfeld, Cecil Price, Philip Highfill, Kalman Burnim, Carl Stratman, John Robinson, and William Armstrong.
[6]  The graduate student markup editors were Leonard Leff, Marcia Heinemann, Muriel Friedman, and Mark Auburn. Additional non-specialist markup editors included Devon Schneider, Ben Schneider III, and Dorothy Church.
[7]  This workflow was reviewed for accuracy by Will Daland in 2015.
[8]  The so-called “ladder system” of abbreviating cast lists was developed by the editors to conserve ink and paper, but it remains a perpetual frustration to users of the reference books. An illustrative example is offered by an entry for Tuesday, June 25, 1708. The Constant Couple was performed at the Theatre Royal – Drury Lane with a cast “As at Queen’s, 20 Oct. 1707, but Clincher Sr – Pinkethman; Lady Lurewell – Mrs Knight; Parly – Mrs Moor” [Avery et al. 1960–1968, 172]. The researcher turns back 17 pages to the entry for that date and finds a fuller cast list: “Sir Harry – Wilks; Col. Standard – Mills; Smugler – Johnson; Vizard – Husband; Clincher Sr – Bowen; Clincher Jr – Bullock; Dicky – Norris; Lady Lurewell – Mrs Oldfield; Lady Darling – Mrs Powell; Angelica – Mrs Bradshaw”. Combining the information from these two entries, it is possible to determine that the cast list for the June 25 performance was as follows: Sir Harry – Wilks; Col. Standard – Mills; Smugler – Johnson; Vizard – Husband; Clincher Sr – Pinkethman; Clincher Jr – Bullock; Dicky – Norris; Lady Lurewell – Mrs Knight; Lady Darling – Mrs Powell; Angelica – Mrs Bradshaw; Parly – Mrs Moor. This illustration involves only two entries for the sake of simplicity. However, in many instances, a researcher must follow the trail back through numerous dates and many actor-role substitutions in order to determine the cast list for a performance of interest.
[9]  “Final report on Phase 2 (1972-1975),” retrieved from Lawrence University Archives. The sixteen editors whose names I have been able to gather are Catherine Boggs, Catherine Steiner, Marc Weinberger, Joseph Jacobs, Ruth Steiner, Connie Hansen, Sarah Larsen, Laurie Johnson, Sue Kock, Peter Pretkel, Lynn Seifert, Louise Freiberg, Elizabeth O’Brien, Jan Surkamp, Mark Burrows, and Kathy Rosner. Accounts of the project, including Schneider’s books as well as archival documents, vary as to whether the total number of student editors was seventeen or eighteen.
[11]  Private correspondence with curator Susan Pyzynski in May 2014 indicated that the tapes may have been lent to a faculty member at Harvard; according to a February 2015 correspondence between Derek Miller and curator Micah Hoggatt, the tapes were lent to the IT department.
[12]  The full list of funders includes the National Endowment for the Humanities, the American Council of Learned Societies, the American Philosophical Society, the Andrew Mellon Foundation, the United States Steel Foundation, the Billy Rose Foundation, Lawrence University, and individual gifts from Mrs. John A. Logan, Charles Beecher Hogan, Faith Bradford, Dr. and Mrs. J. Merrill Knapp Jr., and an anonymous Friend of Lawrence University.
[13]  The session in question is listed in the MLA 1971 program as Seminar 28, “The Future and Expansion of The London Stage 1660-1800: Computerized Information Bank,” led by George Winchester Stone, Jr. on Tuesday, December 28. PMLA 86.6 (Nov. 1971): 1131.
[15]  No documentation of the specific provenance of these discs accompanied them, nor did a search through the IT department or Archive logs turn up evidence of the chain of transmission. It is unclear when or how the data was migrated forward from magnetic tapes to floppy.
[16]  Daland explained that the student editors on the project would have corrected the raw source data, after it had been run through the correcting program (ICISCAN) but before it had been run through the parsing program (STRUCTR) and the cast-list-expansion program (LADDR). He compared this to correcting the master of a tape, rather than the copies. For that reason, it remains entirely possible that the data files recovered from the floppy disks at Lawrence contain the data that continued to be corrected throughout the 1970s and was queried to produce the Index to the London Stage [Schneider 1979]. (Phone correspondence with Daland, April 2015).
[17]  Forensic investigation of disk images of the floppies (using BitCurator) was inconclusive but suggested the hexadecimal errors were not introduced through the process of accessing or copying the data from the floppy disks. I consulted with Carl Stahmer for an outside opinion, and he concurred that the damage most likely happened during the EBCDIC to ASCII conversion that produced the data on the disks, rather than during the process of accessing the disks at a later date (correspondence with Carl Stahmer, April 2016).
[18]  Correspondence with Derek Miller, July 2015.
[19]  They did, however, keep copies of SITAR and SORTSIT, programs developed after Daland left the project that enabled the data to be continuously updated and sorted (correspondence with Erin Dix, March 2016).
[20]  All of the data and programs recovered from Lawrence, along with their documentation, have been deposited with permission in MINDS@UW, the secure supported repository of the University of Wisconsin-Madison. In addition, the scans Daland produced of the printed code base were deposited in the same location, along with all relevant documentation. The collection can be accessed at https://minds.wisconsin.edu/handle/1793/71768. This represents an attempt to preserve the project’s outputs against the kind of total loss that they nearly suffered once before.
[21]  The Performance Archive, based in part on hand counts from The London Stage, is available at http://universitypublishingonline.org/cambridge/benjonson/reference/k/browse/performance. It is part of the interactive online companion to The Cambridge Edition of the Works of Ben Jonson Online (2015).
[22]  For evidence of the growing recognition that we need to understand the formative moments of digital humanities, when many projects were begun that exert an invisible but powerful influence on digital humanities work today, see the recent special issue of Digital Humanities Quarterly on “Hidden Histories: Computing and the Humanities c. 1965-1985.” The editors of that issue, Julianne Nyhan and Andrew Flinn, recently published a related book titled Computation and the Humanities: Towards an Oral History of Digital Humanities [Nyhan and Flinn 2016].
[23]  For a case study of the dangers of failing to recognize these biases in the process leading to a dataset’s creation, see Pechenick, Danforth, and Dodds on the limitations of the Google Books Corpus [Pechenick et al. 2015].
[25]  Here I refer to Johanna Drucker’s influential characterization of humanities data as “capta,” a term that “acknowledges the situated, partial, and constitutive character of knowledge production, the recognition that knowledge is constructed, taken, not simply given as a natural representation of a pre-existing fact” [Drucker 2011]. Christof Schöch rejects the idea that a new term is needed, but aligns with Drucker in his definition of data within humanities inquiry: “a digital, selectively constructed, machine-actionable abstraction representing some aspects of a given object of humanistic inquiry.” This definition importantly draws attention to the added “layer of mediation” created by transforming cultural artifacts into discrete units of digital information [Schöch 2013]. Along similar lines, see Gitelman and Jackson [Gitelman and Jackson 2013].

