The systematic study and analysis of literature dates back to the beginnings of literary "text production"; even the earliest forms of oral literature were practiced in a context of descriptive and prescriptive aesthetics. With the rise of written literature emerged a canon of rules that could be applied to text in order to evaluate its adherence to poetic norms and values, and very soon quantitative and qualitative methods of text analysis were applied in textual exegesis. But the analysis of literature is traditionally seen as a subjective procedure. Objectivity, based on empirical evidence, does not seem to figure prominently in studies that elucidate meaning from literary texts. In most studies, however, some kind of exemplary textual sampling does take place, and scholars occasionally arrive at value judgments that are based on the observation of frequent occurrences or the absence of certain textual features. The exact number of occurrences and/or their distribution in long texts is difficult to establish, because the sheer length of literary texts, novels in particular, makes a thorough analysis of every single word or sentence almost impossible. Empirical evidence that is truly representative of the whole text is extremely difficult to come by, and mainstream literary scholarship has come to accept this limitation as a given fact:
A simultaneous possession by the reader of all the words and images of Middlemarch, À la recherche du temps perdu, or Ulysses may be posited as an ideal, but such an ideal manifestly cannot be realized. It is impossible to hold so many details in the mind at once. (Miller 1968: 23)
The first computer-assisted studies of literature of the 1960s and 1970s used the potential of electronic media for precisely these purposes – the identification of strings and patterns in electronic texts. Word lists and concordances, initially printed as books but later made available in electronic format, too, helped scholars come to terms with all occurrences of observable textual features. The "many details", the complete sets of textual data of some few works of literature, suddenly became available to every scholar. It was no longer acceptable, as John Burrows pointed out, to ignore the potential of electronic media and to continue with textual criticism based on small sets of examples only, as was common usage in traditional literary criticism: "It is a truth not generally acknowledged that, in most discussions of works of English fiction, we proceed as if a third, two-fifths, a half of our material were not really there" (Burrows 1987: 1).
Literary computing was seen to remedy this shortcoming, and it has provided substantial insights into some questions of style and literary theory. Most studies on patterns and themes that were published in the last twenty years question concepts of text and method, and by investigating literature with the help of a powerful tool these studies situate themselves in a context of meta-discourse: the question of "method" remains at the heart of most electronic analysis of literature.
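The word lists and concordances mentioned above can be illustrated with a minimal sketch. It is not a reconstruction of any historical program: the tokenization rule, the function names, and the sample sentence are assumptions chosen for illustration only.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude rule)."""
    return re.findall(r"[a-z']+", text.lower())


def word_list(text):
    """Return (word, frequency) pairs, most frequent first."""
    return Counter(tokenize(text)).most_common()


def concordance(text, keyword, width=2):
    """Return every occurrence of keyword with `width` words of context on each side."""
    tokens = tokenize(text)
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


# Hypothetical sample text, invented for this illustration.
sample = "The sea was calm. The sea was grey, and the sky above the sea was grey."

print(word_list(sample)[:3])   # [('the', 4), ('sea', 3), ('was', 3)]
for left, key, right in concordance(sample, "sea"):
    print(f"{left:>12} [{key}] {right}")
```

Even this small example returns every occurrence of the keyword rather than a sample of them, which is the kind of completeness Burrows demands. What counts as a "word" is already an editorial decision, made here by the regular expression in `tokenize`.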
Seen as a mere tool without any inherent analytical power of its own, the computer in literary studies enhances the critic's powers of memory electronically, thereby providing a complete database of findings that meet all predefined patterns or search criteria. As error-prone manual sampling becomes obsolete, textual analysis as well as the ensuing interpretation of a text as a whole can be based on a complete survey of all passages that promise results, no matter how long the text is. Comparative approaches spanning large literary corpora have become possible, and the proliferation of primary texts in electronic form has contributed significantly to the corpus of available digital texts. In order to be successful, literary computing needs to use techniques and procedures commonly associated with the natural sciences and fuse them with humanities research, thereby bringing into contact the Two Cultures: "What we need is a principal use of technology and criticism to form a new kind of literary study absolutely comfortable with scientific methods yet completely suffused with the values of the humanities" (Potter 1989: xxix).
The history of literary computing, however, shows that only a limited number of textual phenomena can be analyzed profitably in the context of quantitative and qualitative computer-based analyses of style. These phenomena have to be linked to some surface features that can be identified by electronic means, usually by some form of pattern matching. Computers are exceptionally well suited for this kind of analysis, and only human intuition and insight, in combination with the raw computing power of machines programmed to act as highly specialized electronic tools, can make some texts or textual problems accessible to scholars. As Susan Hockey writes:
In the most useful studies, researchers have used the computer to find features of interest and then examined these instances individually, discarding those that are not relevant, and perhaps refining the search terms in order to find more instances. They have also situated their project within the broader sphere of criticism on their author or text, and reflected critically on the methodology used to interpret the results. (Hockey 2000: 84)
The methodological implications of such approaches to literary texts accommodate computer-based and computer-assisted studies within the theoretical framework of literary-linguistic stylistics. In this context, texts are seen as aesthetic constructs that achieve a certain effect (on the reader) by stylistic features on the surface structure of the literary text. These features are sometimes minute details that the reader does not normally recognize individually but that nevertheless influence the overall impression of the text. The presence or absence of such features can only be traced efficiently by electronic means, and while the reader may be left with a feeling of having been manipulated by the text without really knowing how, the computer can work out distribution patterns that may help understand how a particular effect is achieved. "[U]nexpectedly high or low frequencies or occurrences of a feature or some atypical tendency of co-occurrence are, in their very unexpectedness or atypicality, noteworthy", Michael Toolan maintains, but then continues that "[e]laborate statistical computations are unlikely to be illuminating in these matters of subtle textual effect" (1990: 71). This view, frequently expounded by scholars who see literary computing more critically, points at one of the central shortcomings of the discipline: in order to be acknowledged by mainstream criticism, computer-based literary studies need to clarify that the computer is a tool used for a specific result in the initial phases of literary analysis. No final result, let alone an "interpretation" of a text, can be obtained by computing power alone; human interpretation is indispensable to arrive at meaningful results. And in particular the aim of the investigation needs to be clarified; every "computation into criticism", to use Burrows's term, has to provide results that transcend the narrow confines of stylo-statistical exercises.
As for studies of punctuation, sentence length, word length, vocabulary distribution curves, etc., the numbers have been crunched for about twenty years now. It is clearly established that the distribution of such features is not random, or normal in the statistical sense. The extent of such variance from the models has been measured with great precision. But since no one ever claimed that a literary text was a random phenomenon, or a statistically normal distribution, it is difficult to see the point of the exercise. (Fortier 1991: 193)
Statistics, in conjunction with quantifiable data and a (supposedly) positivistic attitude toward textual phenomena, have contributed to the image of computer-based literary analysis as a "difficult" or "marginal" pursuit. And in the context of a shift away from close reading toward a more theory-oriented approach to literary texts, new models of textuality seemed to suggest that literary computing was occupied with fixed meanings that could be elucidated by counting words and phrases. This image was further enhanced by references to this procedure in literature itself, as David Lodge shows in his novel Small World:
"What's the use? Let's show him, Josh." And he passed the canister to the other guy, who takes out a spool of tape and fits it on to one of the machines. "Come over here", says Dempsey, and sits me down in front of a kind of typewriter with a TV screen attached. "With that tape", he said, "we can request the computer to supply us with any information we like about your ideolect." "Come again?" I said. "Your own special, distinctive, unique way of using the English language. What's your favourite word?" "My favourite word. I don't have one." "Oh yes you do!" he said. "The word you use most frequently."
The simplistic view of computer-based studies as "counting words" has shaped the reception of many later studies, which were seen in this light. Contrary to received opinion, studies of literature that use electronic means are mostly concerned with questions of theory and method. In particular, the notion of what constitutes a "text", and how a given theory of text therefore influences the procedures of analysis and interpretation, forms the basis of every literary analysis.
A literary text, interpreted as an aesthetic construct that achieves a certain effect through the distribution of words and images, works on various levels. Without highly elaborate thematic – and therefore by definition interpretative – markup, only surface features of texts can be analyzed. These surface features are read as significant in that they influence the reader's understanding and interpretation of the text. This has a number of theoretical implications: if a literary text carries meaning that can be detected by a method of close reading, then computer-assisted studies have to be seen as a practical extension of the theories of text that assume that "a" meaning, trapped in certain words and images and only waiting to be elicited by the informed reader, exists in literature. By focusing primarily on empirical textual data, computer studies of literature tend to treat text in a way that some literary critics see as a reapplication of dated theoretical models:
One might argue that the computer is simply amplifying the critic's powers of perception and recall in concert with conventional perspectives. This is true, and some applications of the concept can be viewed as a lateral extension of Formalism, New Criticism, Structuralism, and so forth. (Smith 1989: 14)
Most studies, both quantitative and qualitative, published in the context of literary humanities computing after powerful desktop computers became available, tend to prioritize empirical data, either in the form of automatically extracted stylistic features, or as encoded thematic units that are then quantified, mapped, and interpreted.
Most suitable for this kind of literary analysis are studies of repeated structures in texts. These are usually characters, syllables, words, or phrases that reappear throughout a text or a collection of texts. These repetitions are frequently recognized by readers as structural devices that help segment a text, or link passages in texts. Chapters, characters, locations, thematic units, etc., may thus be connected, parallels can be established, and a systematic study of textual properties, such as echoes, contributes substantially to the understanding of the intricate setup of (literary) texts. This type of analysis is closely linked to theoretical models of intertextuality used in non-computer-based literary studies, and here the impact of electronic procedures is felt most acutely. Repetitions and echoes can be traced throughout a text in a consistent fashion; it takes, however, a sound theoretical model that allows one first to identify, and then to isolate, common formal properties of these textual units. The criterion of reliability and verifiability of results and findings is all-important in studies of repeated structures, and maps of distribution and significant presences and absences of textual features are used as the basis for a more detailed analysis. In this area computer-assisted approaches have contributed substantially to the understanding of literary texts, and electronic studies of literary texts have provided empirical evidence for the analysis of a broad range of intertextual features.
The methodological problems connected with this kind of approach feature prominently in nearly all electronic studies of literary texts: how does traditional criticism deal with formal properties of text, and where do electronic studies deviate from and/or enhance established techniques? Most computer-assisted studies of literature published in the last twenty years examine their own theoretical position and the impact of formal(ized) procedures on literary studies very critically. Nearly all come to the conclusion that rigorous procedures of textual analysis are greatly enhanced by electronic means, and that the basis for scholarly work with literary texts in areas that can be formalized is best provided by studies that compile textual evidence on an empirical basis.
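The idea of tracing repeated structures and mapping their distribution can be sketched as follows. This is a minimal illustration under stated assumptions, not a method taken from any of the studies cited: the segments stand in for chapters, the repeated unit is simply a sequence of n adjacent words, and the function names and the sample passages are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude rule)."""
    return re.findall(r"[a-z']+", text.lower())


def repeated_phrases(segments, n=3):
    """Map each n-word phrase that occurs more than once to its per-segment counts.

    Phrases are built from adjacent tokens and may cross sentence boundaries
    within a segment; a stricter model would split on sentences first.
    """
    counts = defaultdict(lambda: [0] * len(segments))
    for index, segment in enumerate(segments):
        tokens = tokenize(segment)
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            counts[phrase][index] += 1
    return {phrase: c for phrase, c in counts.items() if sum(c) > 1}


# Hypothetical "chapters", invented for this illustration.
segments = [
    "The old house stood empty. The old house was silent.",
    "Nobody spoke of the garden.",
    "At last the old house was sold.",
]

distribution = repeated_phrases(segments, n=3)
for phrase, per_segment in distribution.items():
    print(f"{phrase:<16} {per_segment}")
# the old house    [2, 0, 1]
# old house was    [1, 0, 1]
```

The zero in the middle column records an absence as reliably as the other columns record presences. Deciding whether either is significant remains the critic's task.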
The concept of rigorous testing, ideally unbiased by personal preferences or interpretation by the critic, relies on the assumption that textual properties can be identified and isolated by automatic means. If automatic procedures cannot be applied, stringent procedures for the preparation of texts have to be designed. It has been, and still is, one of the particular strengths of most electronic studies of literature that the criteria used in the process of analysis are situated in a theoretical model of textuality that is based on a critical examination of the role of the critic and the specific properties of the text.
These textual properties often need to be set off against the rest of the text, and here markup as the most obvious form of "external intervention" plays a leading role. The importance of markup for literary studies of electronic texts cannot be overestimated, because the ambiguity of meaning in literature requires at least some interpretative process by the critic even prior to the analysis proper. Words as discrete strings of characters, sentences, lines, and paragraphs serve as "natural" but by no means value-free textual segments. Any other instance of disambiguation in the form of thematic markup is a direct result of a critic's reading of a text, which by definition influences the course of the analysis. As many computer-based studies have shown, laying open one's criteria for encoding certain textual features is of prime importance to any procedure that aspires to produce quantifiable results. The empirical nature of the data extracted from the electronic text and then submitted to further analysis allows for a far more detailed interpretation that is indeed based on procedures of close reading, and this "new critical analyse de texte, as well as the more recent concepts of inter-textuality or Riffaterrian micro-contexts can lead to defensible interpretations only with the addition of the rigour and precision provided by computer analysis" (Fortier 1991: 194).
As the majority of literary critics still seem reluctant to embrace electronic media as a means of scholarly analysis, literary computing has, right from the very beginning, never really made an impact on mainstream scholarship. Electronic scholarly editions, by contrast, are readily accepted in the academic community and they are rightly seen as indispensable tools for both teaching and research. But even the proliferation of electronic texts, some available with highly elaborate markup, did not lead to an increasing number of computer-based studies.
This can no longer be attributed to the lack of user-friendly, sophisticated software specifically designed for the analysis of literature. If early versions of TACT, WordCruncher, OCP, or TuStep required considerable computing expertise, modern versions of these software tools allow for easy-to-use routines that literary scholars without previous exposure to humanities computing can master. In addition, dedicated scholarly software has become very flexible and allows the user to dictate the terms of analysis, rather than superimpose certain routines (word lists, concordances, limited pattern matching) that would prejudice the analysis.
Early computer-based studies suffered greatly from hardware and software constraints, and as a result software tools were developed that addressed the specific requirements of scholarly computing. Although these tools proved remarkably effective and efficient given that the hardware available for humanities computing was rather slow and basic, it still took considerable expertise to prepare electronic texts and convert them into machine-readable form. As no standardized form of encoding existed until the Text Encoding Initiative (TEI) was formed, most scholars adopted some system of markup that reflected their particular research needs. Initially, these systems were non-standardized; later the majority of studies used COCOA tags for markup, but even these systems for the scholarly encoding of literary texts needed to be adjusted to the specific software requirements of the programs used for the analysis. Accessing the results of computer-assisted studies in the form of printouts was equally cumbersome, and any statistical evaluation that extended the range of predefined options of standard software would have to be designed specifically for every individual application. Visualization, the plotting of graphs, or the formatting of tables required considerable expertise and expensive equipment and was thus mostly unavailable to one-person projects.
In the light of these technical difficulties it seemed that once hardware limitations no longer existed and the computing infrastructure was up to the demands of scholarly computing, the electronic analysis of literature would become a major field of research. Methodological problems addressed in studies that wanted to but could not, for technical reasons, attempt more demanding tasks that required large sets of data, access to a multitude of different texts and enough computing power to scan long texts for strings, for example, seemed a direct result of technical limitations.
But the three basic requirements, seen since the 1960s and 1970s as imperative for eventually putting literary computing on the map of mainstream scholarship, have by now been met:
• virtually unlimited access to high-quality electronic texts;
• sophisticated software that lets the user define the terms of analysis rather than vice versa;
• powerful computing equipment that supplies unlimited computing power and storage capacity.
Despite impressive advances in both hardware and software development, and although electronic texts with markup based on the TEI guidelines have become available on the net, literary computing still remains a marginal pursuit. Scholarly results are presented at international conferences organized by the Association for Literary and Linguistic Computing (ALLC) and the Association for Computers and the Humanities (ACH) that are designed to inform humanists with a background in the discipline. The results are published in scholarly journals (L&LC, Literary and Linguistic Computing, and CHum, Computers and the Humanities) but rarely make an impact on mainstream scholarship. This dilemma has been commented on repeatedly: Thomas Corns, Rosanne Potter, Mark Olsen, and Paul Fortier show that even the most sophisticated electronic studies of canonical works of literature failed to be seen as contributions to the discourse of literary theory and method. Computer-based literary criticism has not "escaped from the ghetto of specialist periodicals to the mainstream of literary periodicals", Corns writes, and continues that the "tables and graphs and scattergrams and word lists that are so characteristic of computer-based investigation are entirely absent from mainstream periodicals" (Corns 1991: 127).
One reason for this, apart from a general aversion to all things electronic in traditional literary criticism, is described by Jerome McGann as the notion of relevance, because
the general field of humanities education and scholarship will not take the use of digital technology seriously until one demonstrates how its tools improve the ways we explore and explain aesthetic works – until, that is, they expand our interpretational procedures. (McGann 2001: xii)
It is important that computer-assisted studies position themselves in the field of recent scholarship, take up the theoretical issues of text and textuality, and convey to the field of non-experts that the results merit closer inspection. Computers are not used for the sake of using new tools, but computers can supplement the critic's work with information that would normally be unavailable to a human reader. Speed, accuracy, unlimited memory, and the instantaneous access to virtually all textual features constitute the strength of the electronic tool. By tapping into the ever-growing pool of knowledge bases and by linking texts in ways that allow them to be used as huge repositories of textual material to draw on, traditional literary criticism can profit substantially from the knowledge and expertise accumulated in the search for a more rigorous analysis of literature as practiced in computer-based studies.
By looking at the history of literary computing, however, one cannot fail to see that most contributions add significant insight in a very narrow spectrum of literary analysis – in the area of stylistic studies that focus on textual features. The input of computing in these studies is limited to the preparation and preparatory analysis of the material under consideration. No immediate result, of course, can be obtained by the computer, but data are collected that allow for and require further analysis and interpretation by the researcher. The results, however, are impressive. Numerous studies of individual texts and of collections of texts show that empirical evidence can be used productively for literary analysis. The history of literary computing shows that the field itself is changing. Stylo-statistical studies of isolated textual phenomena have become more common, even if the computing aspect does not always figure prominently. More and more scholars use electronic texts and techniques designed for computing purposes, but the resulting studies are embedded in the respective areas of traditional research. The methods, tools, and techniques have thus begun to influence literary criticism indirectly.
Right from the very beginning, humanities computing has always maintained its multi-dimensional character as far as the literary genre, socio-cultural context, and historic-geographical provenance of literary texts are concerned. Studies have focused on poetry, drama, and narrative from antiquity to the present day. Although an emphasis on literature in English can be observed, texts in other languages have also been analyzed. The variety of approaches used to come to terms with heterogeneous textual objects, the multitude of theoretical backgrounds and models of literature brought to bear on studies that share as a common denominator neither one single technique nor one "school of thought", but the application of a common tool, are the strong points of studies of literature carried out with the help of the computer. Discussions of literary theory, textuality, and the interdisciplinary nature of computer-assisted literary analysis feature prominently in modern studies. In this respect, mainstream literary criticism is most open to contributions from a field that is, by its very nature, acutely aware of its own theoretical position. In the future, the discourse of meta-criticism, however, may be fused with innovative approaches to literary texts. As Jerome McGann points out:
A new level of computer-assisted textual analysis may be achieved through programs that randomly but systematically deform the texts they search and that submit those deformations to human consideration. Computers are no more able to "decode" rich imaginative texts than human beings are. What they can be made to do, however, is expose textual features that lie outside the usual purview of human readers. (McGann 2001: 190–1)
References for Further Reading
Ball, C. N. (1994). Automated Text Analysis: Cautionary Tales. Literary and Linguistic Computing 9: 293–302.
Burrows, J. F. (1987). A Computation into Criticism. A Study of Jane Austen's Novels and an Experiment in Method. Oxford: Oxford University Press.
Burrows, J. F. (1992). Computers and the Study of Literature. In C. S. Butler (ed.), Computers and Written Texts (pp. 167–204). Oxford: Blackwell.
Busa, R. (1992). Half a Century of Literary Computing: Towards a "New" Philology. Literary and Linguistic Computing 7: 69–73.
Corns, T. N. (1991). Computers in the Humanities: Methods and Applications in the Study of English Literature. Literary and Linguistic Computing 6: 127–30.
Feldmann, D., F.-W. Neumann, and T. Rommel, (eds.) (1997). Anglistik im Internet. Proceedings of the 1996 Erfurt Conference on Computing in the Humanities. Heidelberg: Carl Winter.
Finneran, R. J., (ed.) (1996). The Literary Text in the Digital Age. Ann Arbor: University of Michigan Press.
Fortier, P. A. (1991). Theory, Methods and Applications: Some Examples in French Literature. Literary and Linguistic Computing 6: 192–6.
Fortier, P. A., (ed.) (1993–4). A New Direction for Literary Studies? Computers and the Humanities 27 (special double issue).
Hockey, S. (1980). A Guide to Computer Applications in the Humanities. London: Duckworth.
Hockey, S. (2000). Electronic Texts in the Humanities. Principles and Practice. Oxford: Oxford University Press.
Landow, G. P. and P. Delany, (eds.) (1993). The Digital Word: Text-Based Computing in the Humanities. Cambridge, MA: MIT Press.
McGann, J. (2001). Radiant Textuality: Literature After the World Wide Web. New York: Palgrave.
Miall, D. S., (ed.) (1990). Humanities and the Computer: New Directions. Oxford: Oxford University Press.
Miller, J. H. (1968). Three Problems of Fictional Form: First-person Narration in David Copperfield and Huckleberry Finn. In R. H. Pearce (ed.), Experience in the Novel: Selected Papers from the English Institute (pp. 21–48). New York: Columbia University Press.
Opas, L. L. and T. Rommel, (eds.) (1995). New Approaches to Computer Applications in Literary Studies. Literary and Linguistic Computing 10: 4.
Ott, W. (1978). Metrische Analysen zu Vergil, Bucolica. Tübingen: Niemeyer.
Potter, R. G., (ed.) (1989). Literary Computing and Literary Criticism: Theoretical and Practical Essays on Theme and Rhetoric. Philadelphia: University of Pennsylvania Press.
Renear, A. (1997). Out of Praxis: Three (Meta) Theories of Textuality. In K. Sutherland (ed.), Electronic Textuality: Investigations in Method and Theory (pp. 107–26). Oxford: Oxford University Press.
Robey, D. (1999). Counting Syllables in the Divine Comedy: A Computer Analysis. Modern Language Review 94: 61–86.
Rommel, T. (1995). "And Trace It in This Poem Every Line." Methoden und Verfahren computerunterstützter Textanalyse am Beispiel von Lord Byrons Don Juan. Tübingen: Narr.
Smedt, K. et al., (eds.) (1999). Computing in Humanities Education: A European Perspective. ACO*HUM Report. Bergen: University of Bergen HIT Center.
Smith, J. B. (1989). Computer Criticism. In R. G. Potter (ed.), Literary Computing and Literary Criticism: Theoretical and Practical Essays on Theme and Rhetoric (pp. 13–44). Philadelphia: University of Pennsylvania Press.
Sutherland, K., (ed.) (1997). Electronic Textuality: Investigations in Method and Theory. Oxford: Oxford University Press.
Toolan, M. (1990). The Stylistics of Fiction: A Literary-Linguistic Approach. London: Routledge.