Christopher Blackwell is an Associate Professor of Classics at Furman University in Greenville, South Carolina. He holds a B.A. from Marlboro College and a Ph.D. from Duke University. He has published on historical topics for scholarly audiences and general readers, and works on a variety of projects in digital humanities.
Gregory Crane, Professor of Classics and Winnick Family Chair of Technology and Entrepreneurship at Tufts University, is the editor in chief of the Perseus Project. He has a broad interest in and has published extensively on the interaction between intellectual practice and technological infrastructure in the humanities.
Authored for DHQ; migrated from original DHQauthor format
We can already begin to envision research projects that were scarcely, if at all, feasible in print culture. The papers in this collection allow us as well to enumerate the services and publication types on which emerging scholarship depends. We also need models for publication that meet the needs and realize the potential of the digital media and we describe here the Scaife Digital Library, a concrete example of true digital publication.
Cyberinfrastructure for emerging scholarship
I look upon the discontent of the literary class, as a mere announcement of the fact, that they find themselves not in the state of mind of their fathers, and regret the coming state of mind as untried, as a boy dreads the water before he has learned that he can swim. If there is any period one would desire to be born in, is it not the age of Revolution; when the old and the new stand side by side, and admit of being compared; when the energies of all men are searched by fear and by hope; when the historic glories of the old, can be compensated by the rich possibilities of the new era? This time, like all times, is a very good one, if we but know what to do with it.
Emerson,
Every human individuality is an idea rooted in actuality, and this idea shines forth so brilliantly from some individuals that it seems to have assumed the form of an individual merely to use it as a vehicle for expressing itself. When one traces human activity, after all its determining causes have been subtracted there remains something original which transforms these influences instead of being suffocated by them; in this very element there is an incessantly active drive to give outward shape to its inner, unique nature.
Wilhelm von Humboldt,
When Emerson addressed Harvard’s Phi Beta Kappa Society in 1837, slavery was still an established institution. Those in Massachusetts who favored its abolition, such as William Lloyd Garrison, were the dangerous radicals of their day, and those who, like the author Lydia Maria Child, suggested racial equality found the doors of polite society slamming shut in their faces. Many twenty-first century readers will note the linguistic assumption that scholars are boys, fathers and men. Revolution has its own logic, and revolutionaries should never forget that the critical pose which they apply to the present and the past will turn itself upon them when they have themselves passed into history (if, of course, they are so fortunate as to touch the historical memory of succeeding generations). If, in decades and generations to come, students of the ancient world read these words, we cannot now say where they may pause to wonder at how prescient the members of this early generation had been or where they may cringe and squirm. But all of those who contributed to this collection have dedicated their lives to a love for the past, and that love allows us to embrace the future. The authors of this collection cannot predict what course events will assume or how they will appear to those who follow, but they have recognized the revolution of their own time and all have taken action to carry this revolution forward.
Emerson does not really define the title of his talk, but for those of us who contributed to this collection, whether we happen to live in the United States or not, Ross Scaife embodied the best qualities that a phrase such as the
A generation from now, the course that classical studies and the humanities in general have taken may seem to have been a natural outgrowth of the early twenty-first century. And, indeed, we cannot say to what extent the larger forces at work within society may constrain the shape that our field will assume. But those of us who knew Ross also saw a man who anticipated far ahead of his fellows the importance of making our ideas accessible to the widest possible audience. The original proposal that secured funding to the Stoa called for a new generation of publications that would be designed from the start to be intellectually as well as physically accessible to an audience far beyond the narrow channels of twentieth century academic discourse. Blackwell and Martin in this collection articulate how this vision was, in fact, realized: Stoa publications such as Blackwell’s
Ross was among the first to recognize the importance of making our publications fully open — it is not enough to provide a single perspective via a single web site with primary and secondary sources. We need to make the source materials accessible — others need to be able to download what we produce, apply their own analytical methods, and even build new derivative works on what others have done. It is already difficult for us to remember how radical and far-sighted Ross was years ago. He had the vision to see what was obviously wrong at the time but would become obviously correct in the future. Ross embodied that profound originality that Humboldt describes in those who produce the times of which we are all products.
In this conclusion, we synthesize some of the themes outlined and work described in the previous papers. We recall the categories of ePhilology and eClassics, first discussed in the introduction, and use these two categories to characterize two fundamental advances now becoming possible: our ability to begin increasingly complex intellectual projects with greater command of the underlying data and to answer finally the challenge, articulated in Plato’s
Hesiod,
And one day they taught Hesiod glorious song while he was shepherding his lambs under holy Helicon, and this word first the goddesses said to me – the Muses of Olympus, daughters of Zeus who holds the aegis:
Shepherds of the wilderness, wretched things of shame, mere bellies, we know how to speak many false things as though they were true; but we know, when we will, to utter true things.
So said the ready-voiced daughters of great Zeus, and they plucked and gave me a staff, a shoot of sturdy laurel, a marvelous thing, and breathed into me a divine voice to celebrate things that shall be and things that were before; and they bade me sing of the race of the blessed gods that are eternally, but ever to sing of themselves both first and last.
The Muses gave Hesiod a staff, and for the poet that is enough — few, if any, have produced poetry that has exerted such a spell over so many people from so many periods of time and disparate cultures as have the works of Hesiod and the Homeric Epics. All of us who live the life of the mind, whether we are poets or professors, follow our Muses. The staff that we have now taken into our hands is still rough and we are learning its balance and heft, but already we can begin to glimpse the stories that we will be able to see when the inspiration of our new muses takes full hold.
The introduction to this collection distinguished two goals within a digital world. On the one hand, ePhilology emphasizes the role of the linguistic record in producing and organizing ideas and information about the ancient world. We use eClassics, by contrast, to describe Greek and Latin languages and literatures, wherever and whenever produced, as they live within our physical brains, touch our less tangible hearts and shape our actions in the world around us. We return now to these topics, suggesting how a Cyberinfrastructure, including both comprehensive collections and advanced, domain optimized services, can advance each of these goals. Memographies allow philologists to explore vast topics far too large for individual scholars in print culture. Plato’s challenge allows us to appreciate the magnitude of the opportunities before us now, as we can finally begin to address a critique of the static written word that is more than two thousand years old.
My mother Thetis tells me that there are two ways in which I may meet my end. If I stay here and fight, I shall lose my safe homecoming but I will have a glory that is unwilting: whereas if I go home my glory will die, but it will be a long time before the outcome of death shall take me.
Achilles’ choice, Homer,
It is easy to see how we can, in a digital environment, pursue our research topics more extensively than was previously possible. We have also described how we can make the sources of antiquity intellectually accessible to new audiences. We now turn to the question of what research questions we can pursue that would not have been feasible without collections that are, if not exhaustive, at least large enough to be representative of the published record available in print.
Consider a monolingual printed corpus such as English language newspapers in the 19th century United States. The 1869
Clearly we can begin to pursue topics that require analysis of much more data than
any human being can see, much less contemplate. We can begin to trace topics that
have a life in human tradition that goes beyond any single period or immediate
context. Such topics have lives of their own. We can now write histories or (to
pursue the metaphor of living things) biographies of these topics. The geneticist
Richard Dawkins coined the term “meme” to cover thoughts, ideas, theories, gestures,
practices, fashions, habits, songs and dances.
While the term
The biological Plato, likewise, vanished more than two thousand years ago but his
writings have been copied ever since and the historical Plato continues to exist
as the topic of discourse. Scholars could, in print culture before the advent of
searchable texts, laboriously track down many Platonic testimonia, e.g., the
explicit quotations and most obvious allusions to particular passages in Plato.
German classicists have begun to apply text mining algorithms to search for
quotations and allusions that previous generations missed.
In an age of very large collections, we can, however, begin to design systems that will provide automatic visualizations of topics such as Plato and Plato’s works.
Each of the above and similar processes is analogous to the sensors by which scientists track data in the material world. Each of the above processes will produce noise as well as a usable signal. The results will not, of course, be scholarship, but rather data within which patterns can emerge to stimulate scholarship — in the end, human beings will have to contemplate what the systems have found. They will refine the questions that they ask, contemplate the results again, and then repeat their analysis in an iterative process. But, despite all the noise within the system, we will quickly start to see patterns about who has said what at various times about which passages of Plato in a variety of languages.
If we consider established genres of reference work such as lexica, grammars,
encyclopedias and editions, we can see that a wide range of topics constitute
memes that we could now begin to study.
No one will ever be able to see, much less read and contemplate over time,
the primary sources underlying broad topics such as the history of Latin over two
thousand years or even the reception of Plato. Of course, this is hardly new: no
living humanist publishing on major canonical authors such as Homer or Shakespeare
can claim to have read and pondered more than a subset of conventional published
scholarship in the conventional languages of European and American scholarship.
But the rise of large collections and emergent systems with which to analyze those
collections allows us to shift our stance away from the limits of what we can read
with our two eyes and towards the challenges of working with machines that can
scan large bodies of material and then (as we will see through the discussion of
Plato’s challenge below) allow us to focus in detail on passages in more languages
and from more contexts than was possible before.
A memography contains elements that are deeply traditional in form and general purpose, even if it represents an engagement between author, reader and source materials so quantitatively broader in scope as to constitute a radical change.
A memography, in effect, applies the same principles to even larger topics and immediately requires automated methods.
books (the amount that a papyrus scroll could conveniently store). Print technology allowed us to refine these citations so that we could describe precise variations between multiple editions of a single work.
Characteristics of a memography include:
Whether we are producing or reading (or both), most memographies will
force us to interrogate primary materials from more contexts, linguistic, cultural
or both, than we can expect to have studied in detail — the most powerful memes
will work their way across time, genre, language and culture and it is this very
quality that leaves a trail too long and complex for any single human mind. We
must look to machines which can find and preprocess material relevant to a given
meme through immense bodies of data.
The heterogeneity of background knowledge brings us again to the need for a
Cyberinfrastructure. The German-US Archimedes Project was able to assemble the
machine readable dictionaries, on-line source texts, morphological analyzers,
annotation systems and other resources needed to explore the history of mechanics.
Scholars without training in Arabic were, for example, able to work effectively
with materials in Arabic. Almost two decades ago, a formal evaluation of students
using the first generation of Perseus reading tools had already demonstrated that
students with no knowledge of Greek could produce analyses of Greek texts that, in
the view of external evaluators, matched the performance of students with advanced
training in the language.
New technologies can help us locate relevant currents in the vast oceans of source material but we will still need to descend from our overview and think carefully about some subset of the sources. While we will never be able to read everything, it becomes all the more important for us to ponder a few things, carefully selected, in great detail. In the past, practical issues such as language were fundamental barriers: if we found a text in a language that we could not read and did not have a human informant or translation, then we could literally do nothing. That condition has begun to change. This leads us to the topic of eClassics and Plato’s Challenge.
Plato,
Socrates: Writing, Phaedrus, has this strange quality, and is very like painting; for the creatures of painting stand like living beings, but if one asks them a question, they preserve a solemn silence. And so it is with written words; you might think they spoke as if they had intelligence, but if you question them, wishing to know about their sayings, they always say only one and the same thing.
Plato,
Socrates: When one says “iron” or “silver,” we all understand the same thing, do we not?
Phaedrus: Surely.
Socrates: What if he says “justice” or “goodness”? Do we not part company, and disagree with each other and with ourselves?
Phaedrus: Certainly.
In a famous paper, published in 1950, Alan Turing proposed what has been since
called the Turing test: a machine demonstrates intelligence when we cannot tell
whether we are conversing with a human or a machine.
Named entity identification answers questions such as “to which Alexander does this particular passage refer?” and enables services such as plotting the right Alexandria for a given passage on a map for the relevant chronological period. Simple dictionary look-up tools answer questions such as “what does this word mean?” Word sense disambiguation systems allow us to determine the probability of a particular word sense in a given context (e.g., “speech” vs. “prayer” for a single Latin word). Text mining systems elicit key words and phrases by which documents can begin to describe what they are about. We may be a long way from a meaningful answer to the Turing test, but even relatively simple technologies have allowed us to make progress against the challenge that Plato leveled against information technology two and a half millennia ago.
Addressing Plato’s challenge has important implications for the problems that humanists choose to address. In 1972, Jacques Derrida published an essay, translated into English in 1981 as
If we address the
In addressing Plato’s challenge, we focus less upon the 2% of instances where we
cannot readily determine to which Antonius an author refers than upon the other
98% where any reader, familiar with the context, can determine the intended
referent. To address Plato’s challenge, we need to maximize a machine’s ability to
recognize the dizzying number of simple referents that expert readers understand
without conscious effort. We shift from pondering the un-decidable to representing
deceptively simple operations in machine actionable form that we can apply
billions and billions of times. While we will continue to ponder the meaning of
concepts such as “justice” and “goodness,” we now need systems that can reliably
distinguish “iron” as metal from the verb by which we press clothing. In classics,
we could use a lexicon with more up-to-date information on the various meanings of
the Greek word for “beginning” or “empire.”
The introduction to this collection has already called for a Cyberinfrastructure, including both collections and services, that can make an ever increasing body of knowledge about the Greco-Roman world intellectually, as well as physically, accessible to an ever widening global audience, supporting many languages and cultural backgrounds. To accomplish this goal, we need not only clever software and well-curated knowledge sources but vast collections from which we can harvest increasingly larger amounts of machine actionable knowledge.
The articles in this collection document a range of efforts, each of which is farther along today because of Ross Scaife’s patient and indeed loving support. We see no other field within the humanities that has either made the same material progress towards, or, even more important, fostered a community to develop and then use, the infrastructure on which all of the humanities must depend in a digital world. In this section, we outline a plan forward and argue that any Cyberinfrastructure for the humanities as a whole should begin with classics.
The center of gravity for intellectual life in every developed or developing society
is now digital and humanity has already begun to arrange an infrastructure around
that new center. The term Cyberinfrastructure, however, emerges from the National
Science Foundation (NSF) of the United States and it was the NSF that funded the
workshop from which this collection emerges (Atkins Report: Atkins, Daniel E., et al., 2003).
Within this larger context Greco-Roman antiquity provides a logical starting point for development. Several reasons stand out:
First, Greco-Roman antiquity provides a cultural heritage that is fundamentally international. The Greco-Roman world physically stretched from Ukraine to Spain, from Morocco to Iraq, and from England to the Sahara. Intellectually, the Greco-Roman world provides a foundation for the entire Western Hemisphere. The two largest entities within this space, the United States and the European Union, must collaborate with each other and with every other group that can contribute. A focus upon Greco-Roman antiquity can thus balance the focus upon cultural heritages for which particular nation states must take responsibility. In the United States, we run the risk of replicating in our cultural infrastructure the Anglophone, geographically isolated, culturally leveling tendencies of our history and not preparing for the multi-lingual, physically interconnected, culturally complex world in which we actually live. Any Cyberinfrastructure for classics should draw seamlessly and naturally upon resources scattered across the globe.
Second, though this collection has focused primarily upon the textual record, the vast body and variety of data about the ancient world come from archaeology. The study of the Greco-Roman world demands new international practices with which to produce and share information. The next great advances in our understanding of the ancient world will come from mining and visualizing the full record, textual as well as material, that survives from or talks about every corner of the ancient world. Individual nations will be best able to document the physical remains within their borders by integrating locally produced data in international networks of interoperable data. Cyberinfrastructure for Greco-Roman antiquity provides strong, constructive motives for individual ministries of culture and similar institutions to think globally as well as locally.
Third, beyond the influence of any one nation there exists today a finite textual corpus that has exerted and continues to exert, directly and indirectly, an immense influence upon human life. Much of this textual corpus and an increasing body of machine actionable knowledge associated with it is already available under open licenses.
Fourth, Greco-Roman antiquity demands a general architecture for many historical languages. Even if we focus upon Greek and Latin, once we begin to contextualize these languages, we will find that we need to work with materials about the ancient Near East, of which Greece was one component, and thus with languages such as Sumerian, Akkadian, Hittite, Old Persian, Coptic and Hebrew. As we consider the reception and influence of Greco-Roman culture, we must work with Syriac and Arabic, as well as with every language of Europe. To work with so many historical languages, we must develop an architecture that can integrate language specific content and services with general services. While we may focus initially on the languages and cultures of the Mediterranean and the Near East, these subjects, daunting as they may be, provide only a component of an environment that must include the historical languages and cultures of the Indian subcontinent, Asia and the rest of the world.
Fifth, contemporary classical scholarship is multilingual. Many scientific
disciplines manage the language problem by concentrating their publications in
English. North American and European classicists alike are conventionally responsible
for anything written in, as a minimum, English, French, German and Italian, while
classical scholarship appears in Spanish, Modern Greek, Russian, Croatian, Dutch, and
any other language spoken by classical scholars. Technologies such as cross language
information retrieval (CLIR) are well-established and would be essential in a field
such as classics, where scholars want to pose queries in one language to retrieve
results in at least four modern languages for which they are officially
responsible.
Sixth, our knowledge of the Greco-Roman world casts light upon residents of areas that were at some point part of the Greco-Roman world who are not professional academics. We have natural audiences who speak not only every language of Europe but Arabic, Farsi and Turkish. We must address the challenges not only of professional academics with extensive linguistic training in a handful of languages but of general audiences as well.
Seventh, classical scholarship begins the continuous tradition of European literature and continues through the present. Classicists have in recent years led projects on topics such as the history and topography of London, multitexts of Marlowe and Shakespeare, the history of science, 19th century newspapers, and the American Civil War. These have provided us with tangible grounds to argue that the problems of classical studies raise a superset of issues that appear in the humanities before the rise of time-based media such as films and sound. An infrastructure that provides advanced services for primary and secondary sources on classical Greek and Latin includes inscriptions, papyri, medieval manuscripts, early modern printed books, and mature editions and reference works of the 19th and 20th centuries. Even if we restrict ourselves to textual sources, those textual sources provide heterogeneous data about the ancient world. If we include the material record, then we need to manage videos and sound about the ancient world as well. A major classics development project should have allied projects, sharing the same infrastructure in representative domains (e.g., the History of Science, early modern studies, 19th century Anglo-American history and literature).
Eighth, classicists have already devoted a generation to developing collections and services. They need a more robust environment and are ready to convert project-based efforts into a shared, permanent infrastructure. They have begun to outgrow the physical systems which they can, as projects, reasonably support. We thus shift discussion to the collections and the services that have already been developed to describe what is now feasible in this field.
Services define what we can accomplish. We develop collections in conjunction with services — even if that service consists solely of a mechanical lookup (e.g., call up a particular passage by chapter and verse). We cannot call up Homer,
to do, make), then we can query
The following list offers a minimal set of services, each of which can be built with the technologies available today and each of which addresses established problems relevant to classicists in particular and to many humanists in general. The services below largely address the problem of classification, i.e., applying a set of criteria to find and/or to label materials. Different annotation tasks admit of different levels of certainty: human readers can identify the correct transcription for print on a modern page but lexicographers will disagree on the senses of a given word. Nevertheless, these services aim at more or less deterministic, right-or-wrong answers. We do not include below clustering and other techniques that can detect patterns that require new categories. The services below reflect basic tools on which more open-ended research depends.
Canonical text services allow us to call up canonical texts by standard chapter/verse citation schemes. Christopher Blackwell and Neel Smith, working in conjunction with Harvard’s
Transcription captures the keystrokes. Page layout analysis captures the
logical structures implicit in the page.
Morphological analysis takes an inflected form and identifies the dictionary
entry from which it derives (e.g., the entry glossed “to do, make”). David Packard
developed the first morphological analyzer for classical Greek,
Syntactic analysis identifies the syntactic relationships between words in a
sentence; it allows us to provide quantitative data about lexicography (e.g.,
which nouns are the subjects and objects of particular verbs), word usage
(e.g., which verbs take dative indirect objects? where do we have indirect
discourse using the infinitive vs. a participle vs. a conjunction?), style
(e.g., hyperbaton, periodic composition), and linguistics (e.g., changes from
SOV to SVO word order). Even relatively coarse syntactic analysis can yield
valuable results when applied to a large corpus: working with our morphological
analyzer and a tiny Latin Treebank of 30,000 words with which to train a
syntactic analyzer, we were able to tag only 54% of the untagged words correctly,
but the correct analyses provided a strong enough signal for us to detect
larger lexical patterns.
Word sense discovery automatically identifies distinctive word usage in
electronic corpora. Even without syntactic analysis, collocation analysis can
reveal words that are closely associated (e.g., phrases such as the English
“ham and eggs”) and thus identify idiomatic expressions. The entry for the
Latin word ira (“anger”) provides:
[Table: Some Words that Regularly Appear with ira (source)]
“oration” but in other instances to English “prayer.” At Perseus, we have been experimenting with this technique since 2005 and have begun a project, funded by the NEH Research and Development Program, to explore methods for a
Named entity identification provides semantic classification (e.g., is Salamis
a place or a Greek nymph by that name?) and then associates names with
particular entities in the real world (e.g., if Salamis is a place, is it the
Salamis near Athens, Salamis in Cyprus or some other Salamis?).
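The two steps of named entity identification can be illustrated with a minimal sketch, which is not a description of any existing Perseus component. It assumes a hand-built gazetteer (the `GAZETTEER` table, its identifiers and its cue words are all hypothetical) and ranks the candidate referents for a name by how many cue words appear in the surrounding context.

```python
# Toy gazetteer: every identifier and cue word below is an illustrative assumption.
GAZETTEER = {
    "Salamis": [
        {"id": "salamis-attica", "type": "place",
         "cues": {"athens", "xerxes", "themistocles", "persians"}},
        {"id": "salamis-cyprus", "type": "place",
         "cues": {"cyprus", "evagoras", "teucer"}},
        {"id": "salamis-nymph", "type": "person",
         "cues": {"asopus", "poseidon", "daughter"}},
    ],
}


def disambiguate(name, context):
    """Return the candidate entity whose cue words best overlap the context."""
    words = {w.strip(".,;:!?").lower() for w in context.split()}
    candidates = GAZETTEER.get(name, [])
    scored = [(len(c["cues"] & words), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Zero overlap means the context offered no evidence, so return nothing.
    if not scored or scored[0][0] == 0:
        return None
    return scored[0][1]
```

A call such as `disambiguate("Salamis", "Evagoras ruled the city in Cyprus.")` selects the Cypriot entry, and the `type` field of the result answers the classification question (place or person). A production system would replace the cue-word overlap with a trained statistical model.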
Metrical analysis both discovers and analyzes the underlying metrical forms of
digital texts. Metrical analysis provides information about vowel quantity that
can improve performance of morphological, syntactic and named entity analysis.
Metrical analysis is particularly important for areas such as post-classical
Latin, which have very large bodies of poetic materials that will never receive
the manual analysis applied to Homer, the Athenian Dramatists, Vergil and other
canonical authors.
Translation support aims at fluent translation of full text but can provide
useful results at a much earlier stage of development. Thus, word sense
disambiguation, a component within machine translation, helps translate words
and phrases: e.g., given an instance of a Latin word, it can suggest “oration,”
“prayer” or some other English word or phrase. It can also support queries such
as “list all Latin words that correspond to the English word ‘prayer’ in
particular contexts.”
Cross language information retrieval (CLIR) allows users to pose a query in one
language (e.g., English) and retrieve results in other languages (e.g., Arabic
or Chinese). For classics, CLIR is an extremely important technology because
classicists are expected to work with materials not only in Greek and Latin
but, at a minimum, in English, French, German and Italian. CLIR is a mature
technology where the cross language queries in some competitions perform better
than the monolingual baseline systems (e.g., you get better results searching
Arabic with an English query than if you searched with Arabic).
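The simplest form of CLIR, dictionary-based query translation, can be sketched as follows. This is a toy under stated assumptions: the `LEXICON` and `DOCUMENTS` tables are invented for illustration, and the competition systems mentioned above rely on far larger lexica and statistical models.

```python
# Tiny bilingual lexicon and document set, both illustrative assumptions.
LEXICON = {
    "prayer": {"de": ["gebet"], "fr": ["prière"], "it": ["preghiera"]},
    "speech": {"de": ["rede"], "fr": ["discours"], "it": ["discorso"]},
}

DOCUMENTS = [
    ("de-1", "de", "Das Gebet des Chryses im ersten Buch der Ilias"),
    ("fr-1", "fr", "La prière dans la religion romaine"),
    ("it-1", "it", "Il discorso di Pericle in Tucidide"),
]


def cross_language_search(query, documents=DOCUMENTS, lexicon=LEXICON):
    """Return ids of documents containing a translation of any query term."""
    hits = []
    for doc_id, lang, text in documents:
        tokens = set(text.lower().split())
        for term in query.lower().split():
            # Translate the English term into the document's own language.
            translations = lexicon.get(term, {}).get(lang, [])
            if any(t in tokens for t in translations):
                hits.append(doc_id)
                break
    return hits
```

With these tables, the English query `"prayer"` retrieves the German and French documents although the query string itself appears in neither.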
Citation identification is a particular case of named entity identification
that focuses on recognizing particular citations: e.g., determining whether the
string “Th. 1.33” refers to book 1, chapter 33 of Thucydides, line 33 of the
first Idyll of Theocritus or something else. Are numbers floating in the text
such as “333” or “1.33” partial citations and, if so, what are the full
citations? Primary source citations tend to be shorter and more variable in form
than the bibliographic citations found in scientific publications. Perseus has,
over the course of more than twenty years, extracted millions of citations from
thousands of documents but the citation extractors tend to be ad hoc systems
tuned for the subtly different formats by which publications represent these
already brief and cryptic abbreviations. In the million book world, we need
citation extractors that can recognize the underlying citation conventions of
arbitrary documents and then match them to known citations on the fly (e.g.,
observe numerous references to Thucydides and then infer that strings such as
“T. 1,33” describe Thucydides, Book 1, Chapter 33).
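The inference described here (observe which author a document mentions, then resolve its ambiguous abbreviations accordingly) can be sketched in a few lines. The abbreviation table `AMBIGUOUS` and the pattern `CITATION` are hypothetical simplifications introduced for illustration; they are not the extractors that Perseus actually uses.

```python
import re
from collections import Counter

# Hypothetical table: abbreviations that may stand for more than one author.
AMBIGUOUS = {
    "Th.": ["Thucydides", "Theocritus"],
    "T.": ["Thucydides", "Theocritus"],
}

# An abbreviation followed by two numbers separated by "." or ",".
CITATION = re.compile(r"\b([A-Z][a-z]{0,3}\.)\s*(\d+)[.,](\d+)")


def extract_citations(text):
    """Return (author, first_number, second_number) for each abbreviated citation."""
    # Count capitalized words so that full author names act as evidence.
    mentions = Counter(re.findall(r"[A-Z][a-z]+", text))
    results = []
    for abbrev, first, second in CITATION.findall(text):
        candidates = AMBIGUOUS.get(abbrev)
        if not candidates:
            continue
        best = max(candidates, key=lambda name: mentions[name])
        # If no candidate author is ever named, leave the citation unresolved.
        author = best if mentions[best] > 0 else None
        results.append((author, int(first), int(second)))
    return results
```

The two numbers are returned uninterpreted, since their meaning (book and chapter, or poem and line) depends on which author is finally chosen.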
Quotation identification can recognize where one text quotes another, either
precisely or with small modifications, even when there is no explicit machine
actionable citation information: e.g., it can recognize “arma virumque cano”
as a quotation from the first line of the
Translation identification builds on both CLIR and quotation identification to
identify translations, primarily but not exclusively, of Greek and Latin texts
that are on-line in large digital collections.
Text alignment services most commonly align translations with their source
texts and are components of word sense disambiguation systems.
Version analysis services can collate transcriptions of manuscript sources or
of different printed editions of the same work.
Markup projection services, implicit in many of the services above, automatically associate machine actionable data from one source with the same passage in another source. Thus, an index might state that a reference to Salamis in passage A describes Salamis near Athens but that the reference in passage B is to Salamis of Cyprus. Markup projection services would associate those statements with all references to Salamis in various versions of passages A or B, including not only full scholarly editions but also quotations of those passages that appear in journal articles or monographs.
The fifteen basic services described above provide mechanisms whereby human beings can think about the ancient world. Services are dynamic processes that depend upon the algorithmic processing of pre-existing materials. Google and similar comprehensive organizations succeed insofar as they have identified very general algorithms that can generate useful results over thousands of domains to millions of users. Algorithms are the core of computer science. Computer scientists seek to maximize what can be computed and to minimize the pre-existing knowledge that a system needs. In this context, if we can associate 90% of the geographic names in 90% of the English language internet with the locations to which they refer, we may decide that the problem has been solved. Much of the work underway focuses upon such first order approximations which are good enough for many people in many contexts.
The remaining 10% or 5% or even 1% may, however, be the space in which the most interesting intellectual work takes place and thus the locus of that value which a digital environment can offer. First, we may be most interested in finding the uncommon instances that are much harder to find. Thus, it is easy to score well on an ambiguous name such as Washington if we are looking for George Washington or Washington state but much harder if we are looking for Washington, MA, or Washington, GA. Second, we need to consider the issues of context. The patterns that we find in English language documents from India and South Africa will, of course, differ from those that we find produced in the US and the UK. If we remain focused on the United States, the 1855
Scholarship has always begun where obvious conclusions are not available or, on deeper inspection, prove inadequate. In most cases, readers within a scholarly community can automatically identify the people and places cited by a text but in a small percentage of instances, these references are unclear. Scholars have spent generations trying to decide to which Antonius a particular text refers or which variant reading among the manuscripts (if any) most probably reflects what Aeschylus composed. We may well be able to identify what texts of Plato people have read in dozens of languages over thousands of years and see in a form that we can understand the sorts of things that people have said about Plato as a whole, a particular work of Plato or a particular passage. But such automated analyses and visualizations provide only the starting point for meaningful interpretation.
In this digital age, a major (and indeed, perhaps the most important) portion of our work must center on the space between where the machines can bring us and where our intellectual aspirations lead. As technology advances, some scholarly tasks become wholly automated and are thus obsolete as effective instruments of scholarship. We may print the results of word searches as keywords in context but the production of print concordances is at best a problematic activity: we are better off creating an electronic text and then shuffling the words via various algorithms. If we want to create more sophisticated visualizations, we are better served marking the source text (e.g., identifying each dictionary entry) to create a particular view of that data (e.g., a dictionary organized by dictionary entry rather than inflected form).
The following categories of document provide some, though by no means necessarily all, of the foundational data on which we base our work with primary sources. Each constitutes a structured environment through which we human authors communicate with other authors and with automated systems. Each category of document can play the following roles:
bank in passages x, y, z corresponds to a financial institution, but to a river bank in passages a, b, c). Part of each training set is set aside to serve as a gold standard: we test various learning algorithms by training each on one part of the training set and then comparing how well it performs on the part that we set aside. Training data thus does not have to be perfect to be useful; in fact, perfection is not a relevant category. In reality, training sets include at least some ambiguous examples and a mature environment must be able to distinguish levels of certainty/community agreement.
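The held-out evaluation described above can be sketched as follows. This is a toy illustration, not a description of any system we use: the learner simply remembers the most frequent sense seen with each context word, and all function names and the example data are our own assumptions.

```python
import random
from collections import Counter, defaultdict


def split_gold_standard(examples, held_out_fraction=0.2, seed=0):
    """Shuffle the examples and set a fraction aside as the gold standard."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * held_out_fraction)
    return shuffled[cut:], shuffled[:cut]  # (training part, held-out part)


def train(training):
    """Toy learner: remember the most frequent sense for each context word."""
    by_context = defaultdict(Counter)
    overall = Counter()
    for context, sense in training:
        by_context[context][sense] += 1
        overall[sense] += 1
    default = overall.most_common(1)[0][0]
    table = {c: counts.most_common(1)[0][0] for c, counts in by_context.items()}
    return lambda context: table.get(context, default)


def accuracy(classifier, gold):
    """Share of held-out examples on which the classifier matches the label."""
    correct = sum(1 for context, sense in gold if classifier(context) == sense)
    return correct / len(gold)
```

Different learning algorithms can be compared by passing each trained classifier to `accuracy` with the same held-out part; the labels themselves can be imperfect so long as both parts come from the same annotated set.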
The following describe some of the document types that we need in a digital
environment. To some extent they all reflect components of comprehensive digital
editions and each contributes to the roles that textual data can play in a digital
environment.
The contribution of Dué and Ebbott in this collection outlines the concept of a multitext. We use the term multitexts here to describe methods to track multiple versions of a text across time. The term multitext does not mean that editors cannot produce their best attempt to reconstruct a source text no longer available to us — we can represent a multitext as a network of versions with a single, reconstructed root. We may well find that the new linguistic and analytical resources at our disposal — especially resources such as treebanks and other categories of linguistic annotation — will allow editors to place old questions on a fundamentally new foundation and to provide new insights into the editions that classical authors produced of their works.
The term multitext does, however, insist upon our ability to track and compare versions over time. In many cases, the original words of an author are as relevant as the Hubble telescope was to Galileo. Petrarch and Machiavelli did not read Teubner Editions or Oxford Classical Texts. We are in a position to begin modeling the texts of our authors as they appeared at different points of time and even the textual universes in which different actors worked. Scholars in early modern studies, for example, need systems that can show us at a glance how various sixteenth and seventeenth century editions of classical authors differ from the modern editions that they have laboriously read.
First, digital editions are designed from the start to include images of the
manuscripts, inscriptions, papyri and other source materials, not only those
available when the editor is at work but those which become available even
after active work on the edition has ceased.
Second, multitexts are versioned: they encode not only one reconstructed
edition produced by one editor but are designed from the start to represent
multiple editions.
Third, multitexts include multiple apparatus critici, but these apparatus critici are machine actionable. Machine actionable means that textual comments are encoded in such a way that readers can compare the text with readings from MS A vs. MS B and/or select their own readings. While there can be multiple apparatus critici, each apparatus criticus must build upon the same set of common identifiers: a machine must be able to determine that B in one apparatus criticus corresponds to V in another.
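The requirement of common identifiers can be illustrated with a short Python sketch. The edition names, sigla and manuscript identifiers below are invented for the example; the point is only that each apparatus criticus resolves its own sigla to a shared identifier before readings are compared.

```python
# Hypothetical sigla tables: each edition maps its own sigla to shared identifiers.
SIGLA = {
    "edition_1": {"A": "ms:0001", "B": "ms:0002"},
    "edition_2": {"A": "ms:0001", "V": "ms:0002"},
}


def same_witness(edition_a, siglum_a, edition_b, siglum_b):
    """True if two sigla from two editions denote the same manuscript."""
    return SIGLA[edition_a][siglum_a] == SIGLA[edition_b][siglum_b]


def readings_by_witness(entries):
    """Group (edition, siglum, reading) entries under shared manuscript identifiers."""
    grouped = {}
    for edition, siglum, reading in entries:
        grouped.setdefault(SIGLA[edition][siglum], set()).add(reading)
    return grouped
```

With such a table a machine can determine that B in one apparatus corresponds to V in another, and a reader can ask for every reading attributed to a given manuscript regardless of which editor reported it.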
The multitext as described above only covers versions of a text within a single language. In many cases, however, literary texts have exerted their influence in translations that were one or more languages removed from the original. Shakespeare worked with Thomas North’s translation of Plutarch, but Thomas North translated Jacques Amyot’s French translation of Plutarch, rather than Plutarch’s Greek. We have to remember that many Greek texts exerted much of their influence when they circulated in Latin or Arabic translation. We need parallel texts of multiple linguistic versions.
The contribution of Bamman and Crane to this collection introduces the concept of parallel texts and their application to lexicography. Parallel texts can include a single edition and translation (like the Loeb and Budé series) but can also include multiple translations in multiple languages aligned with multiple editions (e.g., an Italian translation of Aeschylus that contains variant translations for a number of major editions). Parallel texts assume some level of common citation schemes: e.g., chapter 86 of book one of Thucydides in an English translation roughly corresponds to the Greek in chapter 86 of book one of Thucydides in standard editions. The more numbered sections, the more precisely citation schemes can align source texts and translations. Parallel text analysis and automatic alignment software can, however, discover many instances where words in the translation correspond to words in the source text. Even if we restrict ourselves to high probability correspondences, we can align our texts far more closely than any traditional citation system. Put another way, once we have page sized chunks of text and translation aligned, automatic alignment can do a better job than manually added structures such as section markers. Such section markers are probably most useful for human readers who want to extract logical chunks. Automatic alignments would be familiar to those who work with Plato and Aristotle, where editions use the page breaks and page sections of particular editions rather than the logical structure of the text itself.
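The coarse, citation-based stage of alignment can be sketched in Python as follows. We assume, purely for illustration, that each version of a text is available as a mapping from citation strings such as "1.86" to chunks of text; word-level alignment within each pair would be a separate, statistical step that this sketch does not attempt.

```python
def align_by_citation(source, translation):
    """Pair chunks that share a citation; report citations found on one side only."""
    aligned = [(c, source[c], translation[c]) for c in source if c in translation]
    unmatched = sorted(set(source) ^ set(translation))
    return aligned, unmatched
```

The list of unmatched citations is a useful by-product: it flags places where an edition and a translation divide the text differently and where a human editor may need to intervene.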
Once we have established the correspondences between different linguistic versions of the text, we need automated methods to help identify likely locations where those versions diverge, whether because a translator misunderstood the original or because the idea of translation was looser than that of later periods. Finally, we need methods whereby scholars can annotate these differences according to the patterns which they determine are significant.
The contribution of Bamman and
Crane in this collection also introduced some of the possibilities for
dynamic lexicography in a digital environment. WordNet and EuroWordNet are
pragmatic examples of semantic networks, associating words with similar
meanings into hierarchical classes.
Treebanks are databases that label the syntactic role of each word in a set of
sentences. These syntactic tags constitute parse trees (hence the name) that
can be used to analyze lexical, syntactic and even rhetorical patterns (e.g., ut followed by a subjunctive).
Syntax is important but by no means the only subject of linguistic annotation.
Co-reference annotation maps pronouns to their referents (e.g., "he" in
passage X refers to Julius Caesar). Annotation languages have emerged to
capture higher level semantic phenomena such as temporal expressions (TimeML).
We use machine actionable grammars to describe resources comparable to print
grammars. These may have hundreds or thousands of observations, each roughly
corresponding to the numbered paragraphs of their print predecessors. But in a
machine-actionable grammar, each paragraph would include not only citations but
a set of patterns (e.g., ut heading a subordinate clause
followed by the subjunctive) and some indication of the precision (how many
false hits the pattern would retrieve) and recall (how many correct hits the
pattern would miss). The machine-actionable grammar would thus build on the
treebank. Where the treebank would stress use of a smaller number of categories
to describe the relations of individual words, machine readable grammars would
suggest an open-ended set of more complex phenomena inferred from the corpus.
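A minimal sketch, in Python, of how a machine-actionable grammar entry might report the precision and recall of its pattern against a treebank. The clause records, field names and labels are invented for the example, and we use the standard definitions: precision is the share of retrieved hits that are correct, and recall is the share of correct instances that the pattern retrieves.

```python
# Hypothetical treebank extract: one record per subordinate clause.
# "gold" marks whether an annotator judged the clause an instance of the
# phenomenon the grammar paragraph describes.
CLAUSES = [
    {"conjunction": "ut", "mood": "subjunctive", "gold": True},
    {"conjunction": "ut", "mood": "subjunctive", "gold": True},
    {"conjunction": "ut", "mood": "subjunctive", "gold": False},
    {"conjunction": "ne", "mood": "subjunctive", "gold": True},
    {"conjunction": "ut", "mood": "indicative", "gold": False},
]


def ut_with_subjunctive(clause):
    """The pattern: ut heading a subordinate clause followed by the subjunctive."""
    return clause["conjunction"] == "ut" and clause["mood"] == "subjunctive"


def precision_recall(pattern, clauses):
    """Score a pattern against the annotators' judgements."""
    true_pos = sum(1 for c in clauses if pattern(c) and c["gold"])
    false_pos = sum(1 for c in clauses if pattern(c) and not c["gold"])
    false_neg = sum(1 for c in clauses if not pattern(c) and c["gold"])
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall
```

On this toy sample the pattern retrieves three clauses, two of them correct, and misses one correct clause, so both precision and recall are two thirds.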
The contribution by Elliott and Gillies in this collection outlines the major issues surrounding geographic information in classical studies. We also need to represent information about people, organizations, technical/scientific terms and other entities with regular features.
The underlying principle of machine actionable indices is the same as that of
their print antecedents. Machine actionable indices differ in at least two
ways. First, the structure of the index entries is explicit: we can extract
headwords, hierarchical structures (e.g., "Athens, (1) Religion … (2)
Government …"), descriptive labels (e.g., "born at X",
"stood for consul in Y"), and associated citations. Second, index
headwords contain the most general possible identifiers. Thus, we don’t simply
cite Athens, Greece, or Thucydides the Historian, but add the identifiers such
as the numbers for Athens (TGN 7001393) and Thucydides (TLG 0003) in the
Propositional knowledge includes standard database fields: e.g., author=Thucydides + Title=History-of-the-Peloponnesian-War in effect states that Thucydides is the author of the
Such propositional reasoning rapidly becomes computationally complex. More significantly, the underlying propositions rapidly become idiosyncratic, as each observer creates slightly different categories and our propositional knowledge becomes internally inconsistent — as soon as computer scientists began converting print reference works such as the
The Historical Event Markup and Linking (HEML) which Bruce Robertson describes in
his contribution to this collection illustrates the measured use of an ontology
to do a great deal but not too much: HEML did much to shape the newest
extensions in the Text Encoding Initiative (TEI) methods for representing
names, dates, people and places.
A has property B: the string "Arma virumque cano, Troiae qui primus ab oris"
has-language Latin and "fecit" has-morphological-analysis; archê-in-passage-X has-sense empire.
A treebank contains compound propositional statements such as agricola is-a noun and agricola is-subject-of
fecit. We include propositional knowledge as a
separate category to emphasize categories not included above. Thus, the
CIDOC-CRM ontology includes a wide range of categories for art and
archaeological objects and HEML provides a vocabulary for describing people,
places and events in time.
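Statements of the form A has property B can be held as simple subject, predicate, object triples. The Python sketch below reuses the examples given above; the tuple layout and the `query` function are our own illustrative assumptions and do not follow the vocabulary of HEML, the CIDOC-CRM or any other ontology.

```python
# Each statement is a (subject, predicate, object) triple.
TRIPLES = [
    ("Arma virumque cano, Troiae qui primus ab oris", "has-language", "Latin"),
    ("archê-in-passage-X", "has-sense", "empire"),
    ("agricola", "is-a", "noun"),
    ("agricola", "is-subject-of", "fecit"),
]


def query(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching every field that is not None."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]
```

A call such as `query(TRIPLES, subject="agricola")` returns both statements about agricola, which together make up the compound statement that a treebank records.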
A true digital commentary must build judiciously upon all of the tools listed above. Full commentaries should include annotations identifying every phenomenon of interest to their intended audience: every word should be morphologically disambiguated, every sentence should have its syntactic data encoded; every major variant should be labeled; every person and place should have at least one identifier from a general work or a label indicating that this is a place/person/institution not yet in available reference works and a new identifier. Put another way, if scholars have developed a widely recognized classification scheme (word senses in a lexicon, numbered paragraphs in a standard grammar, metrical analyses), then fully commented texts will have categorized every instance of each relevant phenomenon in a text. And, of course, commentaries must from the start allow commentators to include variant explanations for the same phenomenon (e.g., prosopographic disputes about which Antonius is meant, textual arguments about which reading is correct).
An Athenian citizen does not neglect the state because he takes care of his own household; and even those of us who are engaged in business have a very fair idea of politics. We alone regard a man who takes no interest in public affairs, not as a harmless, but as a useless character; and if few of us are originators, we are all sound judges of a policy. The great impediment to action is, in our opinion, not discussion, but the want of that knowledge which is gained by discussion preparatory to action.
Thuc. 2.40.2, after Crawley
For us, public affairs go beyond the individual decisions of a particular government and extend to all discussion. We may be professional academics, privileged to earn a living by working on the subjects to which we have dedicated our lives, but we enjoy that privilege because we serve the broader interests of humanity. Our work within the academy is only a means towards the greater goal of supporting intellectual life and the general understanding of the past.
Before discussing some of the essential features that characterize true publication in a digital age, we distinguish, in the context of this discussion, archives and libraries. For our purposes, libraries provide the foundation on which public discourse takes place. Libraries constitute the most advanced and efficient space with which society is able to conduct discourse that extends across time and space and that depends upon preservation of, and access to, the terms of discussion.
He had also, says he, such a library of ancient Greek books, as to exceed in that respect all those who are remarkable for such collections; such as Polycrates of Samos, and Pisistratus who was tyrant of Athens, and Euclides who was himself also an Athenian, and Nicorrates the Samian, and even the kings of Pergamos, and Euripides the poet, and Aristotle the philosopher, and Nelius his librarian; from whom they say that our countryman Ptolemaeus, surnamed Philadelphus, bought them all, and transported them with all those which he had collected at Athens and at Rhodes to his own beautiful Alexandria.
Athenaeus,
Our varied conceptions of a library are both descriptive and prescriptive: these
conceptions shift as material culture changes the methods with which we can manage
information. In the Greco-Roman world, Alexandria had the most famous library and
every lover of Greek literature sighs to think of the tragedies of Aeschylus,
Sophocles, and Euripides, the poems of Sappho and the other works that once lay
among its holdings and are now lost. The library at Alexandria was based upon
miraculous technologies such as papyrus production and sea-borne travel as well as writing.
Popular conceptions of institutions such as libraries evolve along with the
capabilities of their enabling technologies. The ancient library at Alexandria was
not the instantiation of a Platonic ideal but the best use of the most advanced
methods of the time. The library at Alexandria brought texts from around the Greek
world into a single location. In the industrialized world, we have used
industrialized print technologies to create hundreds of large libraries around the
world, in effect protecting long-term access by maintaining multiple copies of the
same work in widely separate locations. In the digital world, not only can we
create far more numerous copies and greater redundancy, but our libraries are no
longer inherently limited to physical locations.
The passage quoted attributes to an intellectual of the second century CE the
claim that he had assembled an unparalleled collection of ancient Greek books. Two
features from the underlying Greek are worth noting. First, no word corresponding
to "library" actually appears: the Greek phrase means "possession of books" and
does not designate either a place or an organization. Second, the passage above speaks
in terms of individuals and collectors. The one exception, Nelius, is not a
librarian: the Greek text probably includes an error but the term applied to
Nelius (
A collection of hand-written documents, however, did not fit the dominant
conceptions of libraries that took shape in print culture. We still call the
ancient manuscript collections of Europe libraries because they bore this name,
but in the massive libraries that emerged in the 19th century, manuscripts,
pamphlets and everything that did not fit the exacting demands of academic
publication were preserved in special collections and archives. There, these
documents would await the scholar who would cull them for information or create
printed editions of them that could circulate and play an active role in the
mainstream of intellectual life. For each surviving ancient text of Greek and
Latin the editio princeps, the first printed edition, no
matter how problematic its contents, represented a milestone and a new birth,
marking the transition from handwritten manuscript into the new technology of
print. Works still available only in manuscript were, in print culture, the
material for published editions and printed facsimiles. They had not yet been
published in print and thus were not yet a part of the citable record upon which
general human discourse could depend.
In the past decade, the academic library system has quietly shifted again. The print libraries of the 19th and 20th century have, in effect, become the archives of the 21st century, as publication and discourse in the most heavily supported disciplines have shifted entirely to a digital medium. The debate about print and digital information may continue but the infrastructure of mainstream intellectual discourse is now digital. The hotter the scientific discipline, the shorter the half-life of its publications — the last five or ten years of published material is enough to support many and probably most cutting edge research projects. Biologists studying changes in flora and fauna need access to as much historical data as possible — for them observations from the 18th century provide foundational data. The
But just because information is on-line does not mean that that information has
exploited the full potential of the digital medium. The debate has shifted instead
to the question of open vs. closed access. The extraordinary cost increases for
scientific journals have done more than anything else to drive the principle of
open access — roughly one quarter of the entire acquisition budget for the Tufts
University library in 2007, for example, went to a single scientific publisher,
which does not invest any significant sums in the research that it publishes.
The "serials crisis" is sufficiently well-established that it has spawned a
Wikipedia entry (http://en.wikipedia.org/wiki/Serials_crisis).
Recipients of funding from the National Institutes of Health (NIH) should
be aware of a new reporting requirement (http://grants.nih.gov/grants/guide/notice-files/NOT-OD-08-033.html)
that went into effect on April 7, 2008. Principal investigators must ensure
that electronic versions of any peer-reviewed manuscripts arising from NIH
funding and accepted for publication after that date are deposited in PubMed
Central (PMC), NIH's digital archive of biomedical and life sciences journal
literature. Full text of the articles will then be made freely available to
the public no later than 12 months after publication. The requirement
applies to any NIH direct funding, including grants, contracts, training
grants, subcontracts, etc. In addition, beginning May 25, 2008, anyone
submitting an application, proposal, or progress report to NIH must include
the PMC or NIH Manuscript Submission Reference Number when citing applicable
articles that arise from their NIH-funded research.
The massive library collections at Harvard University have been a magnet for
scholars and the university has traditionally been quite conscious of the
investment it has made and the advantages which that investment confers upon it —
the Boston Library Consortium is often described as everyone but Harvard.
Nevertheless, Harvard University surprised many observers by taking a dramatic
stance in favor of open access. The Faculty of Arts and Sciences at Harvard
University voted in February 2008 to give the
University a worldwide license to make each faculty member's scholarly articles
available and to exercise the copyright in the articles, provided that the
articles are not sold for a profit. Faculty may "request a waiver of the license
for particular articles where this is preferable"; faculty cannot,
according to the language of the press release, simply exempt themselves
but must request waivers on a case by case basis. Steven E. Hyman, Provost at
Harvard University, framed the new policy in terms of responsibility: "The goal of
university research is the creation,
dissemination, and preservation of knowledge. At Harvard, where so much of our
research is of global significance, we have an essential responsibility to
distribute the fruits of our scholarship as widely as possible." Harvard
is, of course, only a single institution but the actions of its faculty and
administration provide a powerful example of how conventional thought has begun to
shift.
Google may ultimately solve the problem of access to the earlier print record.
Through its Google Books project, Google has already digitized millions of books
(and a striking amount of 19th century classical scholarship).
number over 7 million volumes, covering
thousands of years of civilization, from papyri to reports of the latest
advances in science and medicine.
"searching free to the public"
that asserts that Michigan content be made available at no direct cost to end users:
Searching Free to the Public: Google agrees that to the extent that it or
its successors make Digitized Available Content searchable via the Internet,
it shall provide an interface for both searching and a display of search
results that shall have no direct cost to end users. Violations of this
subsection, 4.3, not cured within thirty days of notification by U of M
shall terminate U of M's obligations under section 4.4.
Classicists have already begun taking steps to make their core primary materials
available in the interoperable formats and open licenses needed for teaching and
research in a digital world. The Perseus Digital Library released the
TEI-compliant XML source files for all of its primary sources and accompanying
translations in March 2006 under a Creative Commons license. Harvard’s Center for
Hellenic Studies (CHS) has also undertaken to extend this effort and announced in
August 2008 a plan to create a digital library of new TEI-compliant XML editions
for the first thousand years of classical Greek, including at least one version of every Greek text known to us from
manuscript transmission from the beginning of alphabetic writing in Greece
through roughly the third century CE.
If we are to understand what form we would like our libraries to assume, we must first consider what we expect from the publications that will populate these libraries.
Plato,
Socrates: And every word, when once it is written, is bandied about, alike among those who understand and those who have no interest in it, and it does not know with whom to speak or not to speak; when ill-treated or unjustly reviled it always needs its father to help it; for it has no power to protect or help itself.
Phaedrus: You are quite right about that, too.
Scholars have written about the ancient world since antiquity itself, and we build upon more than half a millennium of the scholarship that print made possible. A great deal of material about the Greco-Roman world exists in digital form, but only a small subset of that material can fulfill its potential in a digital world. The essential criteria for true publication are different in the digital world because the digital world supports services that are not feasible in print and can reach audiences millions of times larger than academic print publications could reach. The fact that a resource exists in a digital format is a necessary but not sufficient condition: just because an object of potential relevance to classics is digital does not mean that it is useful.
Not only the print volumes that sit upon our library shelves but the digitized publications to which commercial entities sell access have all become, within the digital world, archival materials, tied to a few discrete points on the earth and membership in specialized organizations. Whatever the merits of their content, these essays are important because, despite the vast body of existing scholarship, these essays are among the first original works of classical scholarship to meet the minimal criteria for publication in a digital age.
Scholarly publication in a digital age must satisfy at least the following four conditions. These four conditions overlap, of course, with those familiar from five centuries of print culture, but they must also adapt to the digital foundation on which all shared intellectual expression already depends.
First, the content must be of interest to someone other than its producers. In
academia, we have developed peer review as an instrument to assert that a
particular intellectual production has sufficient value to warrant a permanent
place in the scholarly record and we used traditional peer review in this
collection as well. Peer review is, of course, no guarantee — and readers will
come to their own conclusions about what is published here, as they do about
everything that they read. Other models exist to achieve the same goal and we
should not confuse the instrument of peer review with its purpose.
Second, the content must be in a format that we can preserve and use for long periods of time. Print culture developed for the organization of books and articles conventions that have proven so successful as to become almost invisible: we take tables of contents, chapters, footnotes, indices, bibliographies and other conventions for granted. In a digital environment, machines are the first and essential readers of all published materials — where more is written than any one person can digest, we depend upon what machines can extract to identify those few objects on which we can focus the limited attention and intellectual capacity of the human brain. The articles in this collection express their basic structures in a standardized format that machines can understand. More sophisticated documents will surely emerge but these are likely to enhance, rather than abandon, the structures within this collection. By investing in the XML markup we have conformed to the best practices of the present so that the digital librarians in future generations can manage these articles within their digital collections.
Third, the content must have at least one reliable long-term home. In print
publication, authors needed publishers to put their work into circulation.
Publishers committed, however, only to provide very short-term access.
Preservation in print culture has always been the task of libraries. Even if war
or natural catastrophe destroyed one library, other libraries preserved separate
copies of each work and these could be reprinted or reproduced with increasing
facility. In a digital age, distribution is trivial — any web page could in early
2008 reach more than half a billion machines.
Fourth, the content must be able to circulate freely: it is, indeed, truly public and thus
published. A decade ago, this idea was radical and unnerving to many of us, but
the Stoa Publishing Consortium has supported open access since its creation in
1997. In the quotation that opens this section, Plato’s Socrates expresses anxiety
that information, once represented in a physical medium, is separate from its
producers and begins a life of its own. In the end, we have overwhelming reasons
to leave these anxieties behind. First, we need both our primary and secondary
sources to be open for analysis by as many systems as possible if we are to
exploit the full power of the digital world and to fulfill our professional
obligations as scholars. Second, each scholar, department, discipline, college,
and university is, at some level, locked in a Hobbesian war of all against all.
College and university web sites are very expensive to produce and maintain but
they are freely accessible because each institution is competing for exposure.
Subscription revenues do not pay for scholarship. Third, we have plenty of money
in the system to pay the costs. During 2005, the 123 members of the Association of
Research Libraries invested more than 1.1 billion dollars in their
collections.
Peer review, the
The advice of Themistocles had prevailed on a previous occasion. The revenues from the mines at Laurium had brought great wealth into the Athenians' treasury, and when each man was to receive ten drachmae for his share, Themistocles persuaded the Athenians to make no such division but to use the money to build two hundred ships for the war, that is, for the war with Aegina. This was in fact the war the outbreak of which saved Hellas by compelling the Athenians to become seamen. The ships were not used for the purpose for which they were built, but later came to serve Hellas in her need.
Herodotus 7.144, tr. Godley
Themistocles somehow convinced his fellow citizens to forego a windfall payment and to invest instead in a navy. Even then, the nominal object of the navy — a war with the nearby island of Aegina — masked the vastly greater, but inconceivably distant, Persian threat. Aegina looms as a presence visible from the Acropolis. Herodotus elsewhere (Hdt. 5.53) reports that the Persian capital at Susa was a three-month journey from Ephesus on the West coast of modern Turkey.
While most of us remained focused upon publishing our own work under our own name and
building digital resources that would serve our own projects, Ross Scaife early
realized that there were bigger issues at stake than a few drachmas of scarce
prestige in a small academic field. The idea behind the Scaife Digital Library (SDL)
reflected Ross’s own long-term interests: a 1997 grant from the Fund for the
Improvement of Postsecondary Education helped Ross Scaife found the Stoa Publishing
Consortium to pioneer new models of publication to enhance learning and intellectual life.
The SDL is a new, virtual collection designed to support the digital publications
that meet the four criteria outlined above. The first plans for the SDL were
presented at the beginning of a two-day workshop on "What do you do with a million
books?" at Humboldt University in Berlin on March 17, 2008, two days after Ross
Scaife died in Kentucky. On August 6, 2008, the Institute for the Study of the
Ancient World, based at New York University, funded a planning meeting hosted at
Harvard’s CHS in Washington, DC. The first release of the SDL was announced on
November 6 of the same year, at the TEI Annual Meeting at King’s College London.
The SDL contains durable digital objects that satisfy the four criteria of digital
publication outlined above.
The SDL is simultaneously an idea, a concrete collection, and an organization
to produce new content. Any digital objects that satisfy the four criteria of
publication automatically belong to the SDL — thus every article already published by
the DHQ can be treated as part of the SDL because each DHQ article satisfies all four
criteria. Ross Scaife was a classicist and classics offers the initial center of
gravity for the SDL, but we exclude nothing relevant to the humanities.
The SDL is also a concrete collection: it includes a catalogue of known objects and the information needed for automated services to collect each digital object from its home repository. We hope to see objects from the SDL in a range of locations and organizations: with Internet giants such as Google, at particular computational and storage Grids, and on local computing clusters.
Finally, the SDL is an organization designed to produce new content. The production of new SDL content can be a simple decision that any digital object produced by a particular third party (e.g.,
The SDL does not, however, provide services for end users. The SDL may include the
code for those services that only humanists can be expected to provide (e.g., an
advanced morphological analyzer for classical Greek) but the SDL does not plan to
provide those services. The SDL provides a long term home for the objects which
others can analyze or make accessible in various systems. We require that each object
have an approved format so that as many groups as possible will develop the largest
possible number of services with which to make SDL objects useful to the widest
possible audience. In addition, we require that each object have a long-term home,
which, in effect, states that we have entrusted libraries to apply their traditional
functions of preservation and access for SDL objects. The requirement that each
object have an open license reduces our dependence on any one institution: we hope
that there will be many copies of each object from the SDL, both under formal
preservation systems (such as LOCKSS) and in thousands of informal
collections.
Lots of Copies Keeps Stuff Safe, a program based at Stanford University Libraries,
is an international community initiative that provides libraries with digital
preservation tools and support so that they can easily and inexpensively collect
and preserve their own copies of authorized e-content.
Retrieved from http://www.lockss.org/lockss/Home/.
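The three requirements named above (an approved format, a long-term home, an open license) can be pictured as fields in a catalogue record that an automated service could check before collecting an object. The following is only a minimal sketch: the record layout, the field names, and the lists of acceptable formats and licenses are all hypothetical and are not drawn from any actual SDL specification. It also does not model the remaining criterion of publication.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical whitelists. The text names no specific formats or licenses
# beyond XML markup, so these values are illustrative assumptions only.
APPROVED_FORMATS = {"TEI XML"}
OPEN_LICENSES = {"CC BY", "CC BY-SA"}


@dataclass(frozen=True)
class CatalogueRecord:
    """One entry in a hypothetical SDL-style catalogue."""
    identifier: str       # stable name for the digital object
    fmt: str              # format in which the object is published
    license: str          # license under which the object circulates
    home_repository: str  # URL of the long-term institutional home


def unmet_requirements(record: CatalogueRecord) -> List[str]:
    """Return the requirements this record fails (empty list if none)."""
    problems = []
    if record.fmt not in APPROVED_FORMATS:
        problems.append("approved format")
    if not record.home_repository:
        problems.append("long-term home")
    if record.license not in OPEN_LICENSES:
        problems.append("open license")
    return problems
```

A harvesting service built on such records would need nothing more than the `home_repository` address to fetch a copy, which is the sense in which the catalogue, rather than any single server, defines the collection.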
The SDL thus answers questions of production and preservation but questions remain. The digital environment allows us to rethink not only publication but who can publish and how we divide labor in the scholarly world.
Euripides,
Theban Herald: Who is the despot of this land? To whom must I announce the message of Creon who rules over the land of Cadmus, since Eteocles was slain by the hand of his brother Polyneices, at the sevenfold gates of Thebes.
Theseus: You have made a false beginning to your speech, stranger, in seeking a despot here. For this city is not ruled by one man, but is free. The people rule in succession year by year, allowing no preference to wealth, but the poor man shares equally with the rich.
Master Tyndale happened to be in the company of a certain divine, recounted for a learned man, and, in communing and disputing with him, he drove him to that issue, that the said great doctor burst out into these blasphemous words, "We were better to be without God's laws than the pope's." Master Tyndale, hearing this, full of godly zeal, and not bearing that blasphemous saying, replied, "I defy the pope, and all his laws," and added, "If God spared him life, ere many years he would cause a boy that driveth the plough to know more of the Scripture than he did."
The papers in this collection have focused upon the practices of scholarship. In this section we consider the work of scholarship and the associated division of labor. The center of gravity for intellectual life has not only shifted, decisively and forever, to a digital medium but the relative position of professional humanists has changed as well. To some extent, that division of labor has already begun to shift. The scholarly practices to which we award PhDs, tenure and promotion may have remained largely unchanged but new practices of intellectual life have exploded onto the scene. Most of us like to think of ourselves as a progressive force, but we, in the eyes of many, more closely resemble the bullying Theban Herald of Euripides’
Professional academics have played, insofar as we can tell, almost no direct role
within this historic movement. The authors of this conclusion do not know of any
academic who has included Wikipedia along with their conventional publications in
their yearly reviews. We do know that, as of the end of August 2008, Wikipedia
contains more than two and one half million entries. And we know that this resource
has proven astonishingly useful, its flaws real but, when systematically analyzed, no
worse than those of conventional, centralized reference works.
No one knows how much labor the various language versions of Wikipedia have absorbed
— in part because volunteers have contributed the vast majority of the labor and
volunteers do not track billable hours. Wikipedia does cost money — the 2005 budget
for Wikipedia was $739,200, while the overall Wikimedia Foundation reported a budget
of 4.6 million dollars for 2007-2008.
Scholarly publications incorporate a great deal of accumulated labor. In classics, the language barriers make such embedded labor relatively easy to identify — classicists need expertise in the Greek and Latin languages, familiarity with the ancient core texts of at least one of these languages, and enough knowledge to work comfortably with book-length studies in English, French, German and Italian. If we consider four years of undergraduate education and six years of doctoral studies as one model of scholarly apprenticeship, each scholarly publication represents years of embedded labor. When a faculty member devotes a month or two in the summer to a new publication, we thus need to consider not only the hundreds of hours invested during that summer but all the years of work on which that scholar is drawing.
Wikipedia and other forms of community-driven intellectual production ultimately increase the audience for — and thus the realizable value of — advanced scholarship. Professional academics need to decide how they wish to respond to this vast audience. Many of us are products of a print culture in which our publications simply could not reach beyond a few hundred or, at best, thousand research libraries. We had no reason to write for audiences that our publications would never reach. Furthermore, the professionalized incentives of academia rewarded us for producing work that would impress our colleagues and facilitate tenure, promotion, and other signs of academic success. We now have, however, radically new technologies and social practices with which to advance the intellectual life of humanity as a whole.
Twentieth-century print culture produced scholarship that required a great deal of
training to produce and almost as much training to understand, much less appreciate.
We now see a world emerging with much lower barriers to entry.
When given a particular set of tags and relationships, most readers will
agree on the syntactic relationships between most words in most texts.
Some Greek sentences, however, support multiple interpretations, whether
because we are not sure what the author originally wrote or because the text
that we have reconstructed is fundamentally ambiguous. Ultimately, the
syntactic analysis for some words in our surviving texts remains an object
of research. Other tasks that are in most cases straightforward can be the object of
research as well: in some cases we cannot determine to which Antonius or
Cleopatra a particular passage alludes and we depend upon skilled
prosopographical analysis to rank the possibilities. We find place names
where we do not know for sure the original location. Word sense
disambiguation depends upon the senses that we ascribe to a word and thus
upon semantic analysis that can become complex for common words. We thus see a gradient of tasks. In many cases, students and undergraduate
classes can improve upon the results of automated processes and/or provide
the initial training data from which, in turn, automated methods can analyze
much larger bodies of material. In some cases, the answers to conceptually
simple questions (e.g., who is the Antonius in this passage? What is the
structure of this sentence?) are not immediately clear and have historically
provided scope for some of the most skillful classical scholarship. The
patterns visible from the many passages that are not controversial will,
when aggregated and analyzed, allow us to place discussions of ambiguous
instances on a more explicit and quantified footing. We may even find
scholarly consensus advancing as new scholarly instruments, developed in
large measure by students and the general public, allow us to shed light
upon old problems. Thus, we have a space that provides ample room for
contributors at various levels of expertise.
New sources of data open up possible research topics to which our advanced
undergraduates can realistically aspire. The Homer Multitext Project, for
example, has published high resolution images of the most important
manuscript of Homer, the 10th century Venetus A, making visually accessible
scholia and readings that have never been published, much less translated.
Students are well able to produce initial diplomatic editions with basic
contextual information and English translation. Published in standard
formats under open licenses and in long term institutional repositories,
such works can provide the foundation for a new generation of editions.
Generations of students can productively provide the intellectual apparatus
needed to understand the detailed page images already being produced in
Europe and North America for manuscripts of Homer and other classical
authors, fundamentally changing the role that these source materials can
play in intellectual life. Likewise, the creation of treebanks allows us to
see patterns of word usage, linguistic practice and individual style. Even
now, as we develop large automated treebanks, students can create treebanks
for individual works and control samples to produce original research: thus,
given a treebank and the ability to find Greek words corresponding to
English, students could undertake valuable systematic studies that were not
practical before (e.g., the semantics of words for "power" in Herodotus
and Thucydides). The results of their research can be published through our
university repositories, connected to every passage on which they shed
light, and preserved, as permanent contributions, long after their youthful
authors have passed from the scene.
It would be hard to overstate the possible opportunities of practical
undergraduate research for classics and the humanities in general.
Tangible contributions. Automated methods can do an
immense amount but they benefit as well from very large amounts of skilled
human labor. Many basic tasks reflect the strengths of human intelligence
and provide opportunities for students and non-professionals to contribute
tangibly to the infrastructure on which the study of classics depends. The
essays by Blackwell and Martin and by Elliott and Gillies document areas in
which students can quickly begin contributing tangibly to our understanding
of the ancient world. Bamman and Crane describe
the emerging role of syntactic databases (treebanks) for the study of
classical Greek and Latin. Even if we have a treebank with millions of words
already analyzed to serve as a training set for an automated syntactic
analyzer, the best automated systems do not, at present, provide more than
87 or 88% accuracy: enough for many analytical purposes but not perfect.
Greek classes at Brandeis, Tufts, Furman and elsewhere have already begun to
integrate the production of syntactic data into their curricula. The method
is straightforward. Treebanks use their tags and methodologies but, in
essence, the production of treebanks depends upon ancient practices of
reading: we need to identify the main verb, its subjects, objects, etc. Two
students can, for example, analyze each sentence; the class can then discuss
the points at which they differ and produce carefully analyzed sentences
that may include variant interpretations.
Undergraduate research. Once we have large databases of
information we can begin to see patterns that were not visible before. We
rely upon automated methods of analysis to direct our attention to
interesting patterns and thus to serve as the starting point for, rather
than a conclusion to, analysis. It is important to emphasize that we do not
need perfect data to identify major patterns: a recent study conducted by
David Bamman showed that even when automated syntactic analysis generated
results that were as low as 50% accurate, some significant linguistic
patterns were visible despite the noise of a 50% error rate.
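The classroom method described in this section, in which two students analyze each sentence and the class then discusses the points at which they differ, can be pictured as a simple comparison of two dependency analyses. The sketch below is illustrative only: the representation (each word position mapped to a head position and a relation label) and the label names are assumptions made for the example, not the tag set of any particular treebank.

```python
from typing import Dict, List, Tuple

# An analysis maps each word position to (head position, relation label).
# Position 0 stands for the root of the sentence. Labels are illustrative.
Analysis = Dict[int, Tuple[int, str]]


def disagreements(first: Analysis, second: Analysis) -> List[int]:
    """Return, in order, the word positions where two analyses differ."""
    positions = sorted(set(first) | set(second))
    return [p for p in positions if first.get(p) != second.get(p)]


def agreement_rate(first: Analysis, second: Analysis) -> float:
    """Share of word positions on which the two analyses agree."""
    positions = set(first) | set(second)
    if not positions:
        return 1.0
    return 1 - len(disagreements(first, second)) / len(positions)


# Two hypothetical student analyses of a three-word sentence. They agree on
# the subject and the main verb but label the third word differently.
student_a: Analysis = {1: (2, "SBJ"), 2: (0, "PRED"), 3: (2, "OBJ")}
student_b: Analysis = {1: (2, "SBJ"), 2: (0, "PRED"), 3: (2, "ADV")}

points_for_discussion = disagreements(student_a, student_b)
```

Here `points_for_discussion` contains only the third word, which is where class discussion would begin. Where the disagreement reflects a genuine ambiguity, both analyses can be kept as variant interpretations rather than forcing a single answer.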
The field of classics — and, indeed, every field within the humanities — needs to adapt itself to the challenges and opportunities, some realized, others emergent though visible in outline, that this digital environment has thrust upon us.
First, all classicists are digital classicists. Insofar as the practices of our work
advance research projects imagined within the limitations and for the tiny academic
audiences of print culture, we are antiquarians. We may not believe in particular
ideas such as "the judgment of history," but we do believe in conventional ideas
and are confident that the implicit assumptions about what constituted scholarship in
the twentieth century will give way to new conventional ideas. Each of us working now
for an audience in the future is making bets about what those conventional ideas will
assume. The authors of this conclusion are not so sanguine as to believe that the
culture and languages of ancient Greece and Rome will inevitably flow outwards into
the hearts and minds of humanity as vigorously as we hope. Technology constrains and
enables the space within which we move. How well and how quickly we in classics and
the humanities adapt to the niches within this space depends upon the decisions that
we make (however unpredictable the outcomes of those decisions may be).
We do not know yet what common technological knowledge classicists must share. We cannot all be accredited system administrators or application programmers. On the other hand, it is hard to accept complaints that the TEI Guidelines or the underlying structures of treebanks are too complicated for scholars who work with six languages. The services outlined above can use textual and syntactic markup to enable new forms of scholarship and of reading support but such data structures are, fundamentally, surface expressions of traditional ideas. Habits from the past and anxiety about the future are the major barriers. Those who have succeeded in the traditional tasks of classical philology will, if they can muster the necessary labor, find themselves in a world that allows them to pursue their traditional tasks more fully. If they can read Pericles’ Funeral Oration in the original Greek, they are well able to master any general technological system.
Classicists need only to exploit the analytical tools and conventions of intellectual discourse available to them to achieve their goals. For us, the blogs, wikis, assorted web pages and other digital tools simply challenge us to adapt the complementary goals of rhetorical power and intellectual discipline. We hope that others will more fully realize these goals than has been possible so far for us.
Second, classicists need some scholars who have more advanced knowledge of the technology. We do not have the resources to sustain a subfield such as bioinformatics, but the broadening textual collections and treebanks now starting to emerge for Greek and Latin build upon many of the same techniques used to find patterns in the human genome. The most important philologists now at work may well be the classicists who have joined the field of computer science and are now laying the foundations on which all philological research will depend. Rising scholars such as David Smith, David Mimno, Ryan Gabbard, and Gabriel Weaver, originally trained as classicists, have been able to conduct work in machine translation, text mining and general natural language processing that is foundational for classical studies. We may not be able to imagine the shape that our field will assume in the centuries to come, but future change does not absolve us from the obligation to understand what is already possible. None of the PhD programs with which we are familiar has addressed the challenge of producing and supporting those scholars who can show us how to pursue the ancient goals of our field in the rapidly shifting technological spaces within which we live.
Third, we need new institutions to provide access to the results of our work. Neither the libraries nor the publishers of the early twenty-first century serve the needs that emerge in this collection. While libraries may survive and indeed flourish as an institution, they will do so by subsuming and transforming the functions that we entrusted to publishers in print culture. We need a small number of library-publishers that can help classicists produce new content and then maintain that content over time. And that content must include not only relatively static documents but, at least for now, a minimal set of executable code: every discipline will probably need at least some services that only experts in the field can create and that are part of the field’s core infrastructure. Morphological analysis and lemmatization, mentioned above, are fundamental processes that should be applied automatically to every digital word of Greek and Latin. Classicists may need to develop these systems, but the systems, once developed, need to be preserved as active services along with the XML texts, 2d images, GIS datasets and stable collections.
The seeds of these new organizations are visible in the Digital Knowledge Center in
the Johns Hopkins University Library system and the California Digital Library, but
we do not yet see in operation a mature model that can serve our needs in the present
and expand over time. The Perseus Digital Library thus still finds itself compelled
to maintain its own servers as best it can, maintaining services that were innovative
a decade ago but are still beyond the capacity of any systems with which we are
familiar. Google is moving very quickly in this vacuum. The academic library system
failed to address the legal, technical, and financial challenges of converting its
retrospective print holdings into digital form. Google Books is rapidly filling the
vacuum of collections and services that libraries left. Perhaps it was impossible for
our library system, rich in the aggregate, to organize itself. If so, libraries may
evolve into a handful of repositories, acting as wholesalers to provide the content
by which the Googles, Microsofts, Yahoos and their brethren support the intellectual
life of humanity. If the commercial world can generate revenue by providing access to
content that anyone can download, then the market may function well enough to provide
universal access.
As part of this fact-finding mission, Larry Page reaches out to the University of
Michigan, his alma mater and a pioneer in library digitization efforts including
JSTOR and Making of America. When he learns that the current estimate for scanning
the university library's seven million volumes is 1,000 years, he tells university
president Mary Sue Coleman he believes Google can help make it happen in
six.
If the source for this figure had imagined the ARL libraries alone
dedicating 1% of the $1,000,000,000 collections budget to digital conversion,
the $10,000,000 would pay for roughly 300,000 books per year, or roughly 16 years
for 5,000,000 volumes, with the Open Content Alliance workflow. The library
community simply did not think that its retrospective collections were worth the
technical, political, and legislative trouble. It will be interesting to see how
many observers, a generation from now, will view the leadership of the early
twenty-first-century libraries with sympathy, much less admiration.
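The arithmetic behind this estimate can be laid out explicitly. The sketch below uses only the figures given in the note; the cost per book is not stated there and is derived here as an implied value, and the variable names are ours.

```python
# Figures taken from the note above.
collections_budget = 1_000_000_000  # dollars per year across ARL libraries
percent_for_conversion = 1          # share of that budget, in percent
books_per_year = 300_000            # rough yearly throughput at that spend
volumes_to_convert = 5_000_000      # size of the collection considered

# Yearly spend on digital conversion (integer arithmetic, exact).
annual_spend = collections_budget * percent_for_conversion // 100

# Implied cost per book: derived from the two figures above, not stated
# in the note itself.
implied_cost_per_book = annual_spend / books_per_year

# Years needed to convert the whole collection at that rate.
years_needed = volumes_to_convert / books_per_year

print(f"Annual spend: ${annual_spend:,}")
print(f"Implied cost per book: ${implied_cost_per_book:.2f}")
print(f"Years needed: {years_needed:.1f}")
```

On these numbers the conversion takes between sixteen and seventeen years, the same order of magnitude as the note's rough figure and far from the 1,000 years of the original estimate.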
Odysseus and Teiresias, Homer
I see here the spirit of my dead mother; she sits in silence near the blood, and does not look upon the face of her own son or speak to him. Tell me, prince, how she may recognize that I am he. [145] So I spoke, and he straightway answered, and said:
Easy is the word that I shall say and put in your mind. Whomsoever of those that are dead and gone you shall allow to draw near the blood, he will tell you true things; but whoever you refuse, he surely will go back again. [150] So saying the spirit of the prince, Teiresias, went back into the house of Hades, when he had declared his prophecies; but I remained there steadfastly until my mother came up and drank the dark blood. At once then she knew me.
Priam and Achilles, Homer
Fifty sons I had, when the sons of the Achaeans came; nineteen were born to me of the self-same womb, and the others women of the palace bore. Of these, many as they were, furious Ares hath loosed the knees, and he that alone was left me, that by himself guarded the city and the men, him you slew, just now as he fought for his country, even Hector. For his sake have I now come to the ships of the Achaeans to win him back from you, and I bear with me ransom past counting. Nay, have awe of the gods, Achilles, and take pity on me, remembering your own father. See, I am more piteous far than he, and have endured what no other mortal on the face of earth hath yet endured, to reach forth my hand to the face of him that hath slain my sons. So he spoke, and in Achilles he roused desire to weep for his father; and he took the old man by the hand, and gently put him from him. So the two thought of their dead, and wept.
A new, digital infrastructure provides the explicit subject for this collection of
essays. We can now create collections that are larger than any Ptolemy or Cleopatra
could have imagined for their Alexandria. We have ever more sophisticated services
that can analyze and combine these collections in new ways and even generate the
stuff of new knowledge. And the material systems on which these services are based
simply did not exist half a century ago and cost 100,000 times less now than they did
a quarter century ago.
But if everything that we use as a tool is different, nothing that we truly value is new. Like Odysseus in the Underworld, we bring blood to the shades and seek, insofar as possible, to let those who have gone before us converse with us in their own words. All of us who have studied literature in the academy understand that we can never fully understand our subjects: the very notion of understanding implies a fixity that does not suit complex human beings, filled with contradictory impulses and defined as much by their changing potential for actions as by anything they have done in the past. Priam and Achilles communicate in a single language and understand the cultural backgrounds from which each comes but each of them crosses a gulf as great as that which any mere quantity of time and space can pose. Their moment together has no material effect upon the great events around them. Each will soon suffer a violent death. Troy will fall and a massacre will follow. But the moment above has been powerful for many audiences over the course of almost three millennia, perhaps all the more powerful for the violence that surrounds it.
The future of the past has never been brighter. The digital medium offers new methods with which to make Greco-Roman culture and classical Greek and Latin physically and intellectually accessible to audiences vastly larger and more diverse than was ever feasible in print. The culture of the Greco-Roman world and the languages of classical Greek and Latin can play a fuller role in the cultural memory of all mankind than ever before. The ideas and actions of those who lived in the Greco-Roman world and expressed themselves in Greek and Latin can begin to quicken hearts and fire minds that dream in Chinese, Hindi, and, in the end, every language of humanity.
Each of us brings to bear the skills that we have acquired during the time that we have on this earth. Those skills and periods of time vary. Generations pass. Technologies change. Nations rise and fall. Languages fade away and transform themselves beyond recognition. But the memory of classical antiquity has endured over the millennia. All of us who have dedicated our lives to this field — whether we struggle with new technologies or contemplate the record of the past in more traditional ways — are privileged in the subject that we have chosen. We composed these essays in sadness at the loss of our friend, Allen Ross Scaife, but we send them forth in hope as we contemplate the future that Ross helped to make possible.