Abstract
“Textual Communities” is a new system for managing and performing all aspects of
an online collaborative scholarly editing project. It permits mounting of
document images and offers page-by-page transcription and display, with the
facility for project leaders to recruit and manage transcribers and other
contributors, allocating and reviewing transcription work as it is done. Most
distinctively, Textual Communities is built on a comprehensive model of
scholarly editing, enabling both “document” (page-by-page) and “work”
(intellectual structure, or “entity”) views of the texts edited. Accordingly,
multiple texts of a single work, or part of a work (an entity) may be extracted
and compared, using an embedded installation of CollateX. While completely
conformant with Text Encoding Initiative guidelines, Textual Communities goes
beyond TEI and XML in its ability to handle multiple overlapping hierarchies
within texts. This paper will outline the thinking behind the development of
Textual Communities, and show examples of its use by several major projects.
Introduction: scholarly editions in the digital age
While one may dispute how “revolutionary” scholarly editions in digital form may
be as compared to their print counterparts [
Robinson forthcoming], we can agree that the onset of digital methods has considerably broadened
the ways in which editions may be made and distributed. We[1] can now contemplate
editions made by wide collaborations of editors, transcribers, indexers,
commentators and annotators, all working across the internet. We are now
accustomed to seeing editions providing multiple interfaces, with distribution
ranging from strictly-controlled paid-for access to access open to anyone with
an internet connection.
These possibilities require us to ask new questions. In terms of the edition as
product, as something made: how can we most usefully characterize the
fundamental intellectual components of a scholarly edition in the digital
landscape? In this article, I consider this question in terms familiar to
scholarly editors: the concepts of document, text and work. In terms of the
edition as a process, as something we make and use: who are “we”? how may we
relate to each other, both as creators and as readers? Indeed, digital methods
open up a yet more radical possibility: that readers may become creators, as the
edition becomes an ever-open field of interaction, with readers contributing to
its continuing remaking. In what follows, I express these two perspectives as
“axes”.
This is not simply a matter of describing editions and how they may be made. We
have choices, more than ever, in the digital age, as to where we locate our
edition against these axes. An edition can be severely limited, or richly
substantive, both in terms of what it contains and who may use it and how they
may use it. In the latter part of the article, I describe an editorial
environment, “Textual Communities”, designed with these axes in mind. It goes
without saying that Textual Communities, like the editions it might make, is
subject to never-ending transmutation.
The axes of scholarly editions
Every scholarly edition may be placed on two axes, representing (as it were) the
longitude and latitude of editing. The first axis is along the familiar
continuum of document/text/work: is the edition devoted to a particular document
(as for a modern genetic edition of an authorial manuscript)? Or is it oriented
towards presentation of a work found in many documents (as for an edition of the
Greek New Testament, or a medieval work found in many manuscripts)? The second
axis is along the range of relationships between the editor and the edition’s
audience. Is the edition made by a specialist scholar and intended for a narrow
and specialist audience, not to be read but to be used as a resource for further
scholarly work? Or is it made by a non-specialist and intended for the broadest
possible audience, to be read rather than studied? Does its design and
implementation permit its endless re-use, so that readers may in turn become
editors, or other editors may take it and reforge it for their own purposes and
audiences? Note too that as in geographical coordinates, the place of an edition
on one axis is independent of its place on the other axis. An edition intended
for the general reader may be based on all the documents, or on just one; an
edition intended for a specialist reader may also be based on all the documents,
or on just one. An edition based on a single document, or on many, might be
designed to permit other editors to take and repurpose what is made in ways
unforeseen by its original makers.
There is no novelty about these two axes. Editions made long before the advent of
digital methods may be referenced along these axes. However, digital methods
have widened the range of choices along each axis, and also altered the balance
between these choices, as they favour one choice over another. For example:
digital imaging has made it possible to make full-colour facsimiles for a very
low cost (as little as a few cents a page) and to distribute them over the
internet to anyone with an internet connection and a computer, at no cost to
the reader. The emergence of successful methods for encoding a document
page-by-page combined with the availability of high-quality digital images has
favoured the making of a particular kind of edition, oriented towards the
document rather than the work. Along the other axis: the editorial decision
about the audience of the edition has been complicated by the emergence of
funding agencies as the primary sponsors of editions, and by the emergence of
centres substantially funded by these same agencies as the primary place of
making editions. In these circumstances, the primary motivation of an edition is
not to reach an audience, whether narrow or broad, but to satisfy the funder.
This has also led to the emergence of specialists in computer methods as leaders
in the making of editions, rather than specialists in (for example) the texts
themselves.
One of the most influential commentators on scholarly editing in the digital age,
Jerome McGann, has frequently cited William Morris’s observation that (in
McGann’s wording) “you can’t have art without resistance in the material.”
[2] There is danger in failing to resist: in
editing, as in art, to do what is easiest may not be to do what is best.
Following the scenario sketched in the last paragraph, we have seen the making
of numerous “digital documentary editions” [
Pierazzo 2011],
typically created with considerable funding and support from a digital
humanities centre [
Sutherland 2010] [
Maryland 2014]. The coincidence of
cheap digital images, powerful encoding, and significant funding has resulted in
several remarkable editions. However, there is a danger of imbalance, if many
editions are made in a narrow band along the document/text/work axis, and for an
ill-defined audience. Resistance in the materials entails not doing what is
easiest. In scholarly editing terms: resistance means deciding where on the
document/text/work continuum your edition should be located, not simply placing
it where the technology makes it easiest, and identifying what audience you wish
to serve, not simply satisfying your funder and your own inclinations.
Documents, texts and communicative acts
Accordingly, the first task of any editor is to know what is meant by the terms
document, text and work, and how the edition may be located with reference to
these terms. Over the last decades, several scholars (notably Peter
Shillingsburg, Paul Eggert and Hans Gabler; see footnote 1) have debated the
valency of these terms, and I have summarized their arguments and presented my
own definitions in two articles [
Robinson 2013a], [
Robinson 2013b]. In summary: a
document is the material object
upon which marks are inscribed: a manuscript, a book, a clay shard. A
text is
the communicative linguistic act which the reader deduces as present in the
document, represented by these marks. A
work is a set of texts which are
hypothesized as related, in terms of the communicative acts which they
present.
[3] Thus: the
Hengwrt manuscript is a document preserved in the National Library of Wales,
comprising some 248 folios bound into 32 quires. This manuscript contains a text
which we recognize as an instance of Geoffrey Chaucer’s
Canterbury
Tales. Further, we know of another eighty-three manuscripts and four
print editions dating from before 1500 containing texts of the
Tales, and (of course) many editions, adaptations and
translations dating from after 1500. We speak of the “work” as all these
together. In our common thought, we conceive the work as something beyond all
these physical documents: as the creative act conceived and executed by Geoffrey
Chaucer in the last decades of the fourteenth century.
These definitions have many implications. First: there is a clear division
between document and text. The text is not simply the marks in the document. It
is the communicative act that I, the reader, identify as represented by those
marks. The difference may seem slight. It is critical. When we record the text
of the document, we are not simply saying: that mark is an “i”, this next mark
is a “t”; we have “it”. We see first a set of potentially meaningful marks in
the document. We hypothesize: these marks may represent a communicative act.
[4]
They are not (say) marks left by grubs crawling under the bark of a tree [
Eggert 2010]. Someone made these marks to communicate something.
We then identify the language and writing system to which these marks belong. We
identify these marks as letters, composing the word “it”. As we examine the
marks, the communicative act takes shape in our minds: the words resolve into
sentences, into verse lines, paragraphs, as part of something we know as the
General Prologue of the
Canterbury Tales. This communicative act
has a double aspect. One aspect is the disposition of the marks in the document:
exactly where on the page they appear; the combination of strokes which compose
each letter. The other aspect is that of the components of the communicative
act: for prose, as constituted by a sequence of sentences within paragraphs; for
verse, as lines within stanzas. It is not simply a sequence of words: it is
structured, capable of division and labeling.
Normally, these processes of recognition happen so quickly, so instinctively,
that they do not appear like thought at all. We see a text on a page and we read
it. We think the marks on the page are the text, and hence that recording the
text is a mechanical act of transposing those marks from one medium (the page)
to another (now, usually, an electronic file in a computer). Accordingly one
sees statements implying (or indeed asserting) that a transcription can somehow
not be “interpretive”, and hence aspire to some kind of “objective” status. As
an example: in a discussion on the Text Encoding Initiative list in February
2014, several participants routinely described transcription of the
communicative act, in terms of paragraphs, sentences, identification of names
and places within the text, as “interpretive” (or “interpretative”), while
recording exactly where the text appears on the document page was described as
“non-interpretive”. The Text Encoding Initiative even has distinct elements for
the two types of transcription: “interpretive” transcripts are held in
<text> elements; “non-interpretive” transcripts are held in
<sourceDoc> elements.
[5] This distinction led Barbara Bordalejo to ask tartly in the
course of that discussion: “are you suggesting there are transcriptions that are
not interpretive? How do you distinguish them?”
Encoding text as document and as communicative act
In the definition of text here offered, all is interpretation: there is no such
thing as a “non-interpretive” transcript. Further, this definition stipulates
that text has two aspects: it is both marks upon paper (corresponding to TEI
<sourceDoc>) and it is the components of a communicative act
(corresponding to TEI <text>). Both aspects may be expressed as
hierarchies. The document hierarchy consists of the book, divided into quires,
divided into pages, divided into writing spaces: columns, lines, margins. The
components of the communicative act may also be expressed as chapters divided
into paragraphs divided into sentences, or poems divided into stanzas divided
into lines. The two hierarchies are completely independent of each other. The
General Prologue may be written across several quires, or contained in only one;
it may spread across as few as six folios, or as many as sixteen. Of course, the
hierarchies overlap. Paragraphs continue across page breaks, lines of verse
across line breaks. In the world of documents, this is no problem. The text of
the communicative act runs across quires, pages, line breaks in an orderly and
straightforward manner. We are so used to this that we do not notice it. We
skip from page to page, across footnotes, past catchwords, page numbers, running
heads. Sometimes, the two hierarchies coincide. The book opens a new story, a
new chapter opens on a new page, a new section a new volume, before once more
the hierarchies diverge, and each runs its separate course, to the end of the
book and the end of the story. The physical book, whether in scroll, manuscript or print
form, is superbly fitted to carry the text of communicative acts. A New
Testament gospel might fit neatly in a single small codex; the whole New
Testament in a larger one, or split across several codices. This overlapping,
this sliding of one hierarchy across another, this disposition of this printing
across many volumes in one instance, or in just one in another instance, is so
common that an editor might note it briefly, and move on.
But while it is straightforward to represent a communicative act in a document,
it is not at all straightforward to represent both the components of a
communicative act and of a document in a single electronic representation – and
particularly not in a single electronic representation which conforms to the
norms of the Text Encoding Initiative, the gold standard of encoding for
humanities texts. The XML (“eXtensible Markup Language”) specification requires
that content objects within an XML document conform to a single hierarchy.
Accordingly, it is a simple matter to represent either the document hierarchy
(books, quires, pages, lines) or the communicative act components hierarchy
(poem, stanzas, lines; story, chapters, paragraphs). But it is not at all simple
to represent both hierarchies in a single XML document.
[6] Over
the twenty years of encoding of texts of primary sources using the TEI
guidelines, scholars have used various devices to circumvent this problem. In
the “P3” version of the guidelines, the chapter on encoding of primary sources
suggests that one should represent the communicative act component hierarchy
exactly and fully, by identifying each part of the communicative act (each
paragraph, each verse line) with a discrete segment of the TEI document (thus, a
<p> or
<l> element), and then nesting the segments within other
segments, so that
<p> elements are contained within
<div> elements,
just as paragraphs are contained within chapters.
[7] XML (like its predecessor, SGML) is optimized for representing
a single “ordered hierarchy of content objects”: but it does also have a means
of recording other information about the encoded text, in the form of “empty
elements”, otherwise known as “milestones”. Accordingly, in a TEI document one
might record the communicative act hierarchy as the primary hierarchy, and then
represent the document hierarchy as a sequence of milestone elements:
<pb/> and
<lb/> elements for pages and lines. In this ordering, the
<pb/> and
<lb/> elements, unlike the
<div> and
<p>
elements, hold no content: they state where page breaks and line ends are
relative to the text of the communicative act within the document. The result is
that the components of the communicative act are represented completely, and one
may readily use all the tools available in the XML community, optimized for
dealing with hierarchies of content objects, to manipulate the document.
However, the material document which holds the text is represented far less
adequately. One might record the larger features of the document – the number of
pages within it, the number of lines within each page – and record too exactly
the page and line breaks that occur within the text. But it will be difficult to
represent more complex phenomena, such as a single page which contains multiple
writing spaces, each containing text in a complex relation with texts in other
writing spaces. Further processing of the document according to this second
hierarchy is complex, and often impractical.
The P3 fundamental logic – that one identifies the components of the
communicative act (sometimes referred to as “intellectual” or “logical”
structure) as the primary hierarchy of its XML representation, and records
document features as milestone elements – is used in countless TEI-based
encodings of primary textual materials, including several scholarly editions in
digital form (such as those made by myself or in which I was involved, e.g. [
Robinson 2004] and [
Shaw 2010]). This logic
prioritizes representation of the components of the communicative act over the
physical document, and so is well-suited to situations where the disposition of
the text in the document is either straightforward, as in many medieval
manuscripts or printed books, so that it may be adequately captured through
sequences of page and line-breaks alone, or is perceived as relatively
unimportant. However, there is an important class of documents where the
disposition of the text on the page is both complex and significant. This
applies particularly to authorial manuscripts, where authorial revision is
expressed through multiple acts of writing within a page, from which editors
must construct a text or texts by decryption of the sequence of revisions
embedded in these multiple writings. In these cases, the P3 system is
inadequate. Further, continuing from the last decades of the last century,
scholarly editors have become increasingly interested in the “material text”,
following the ground-breaking writings of Donald McKenzie [
McKenzie 1999] and Jerome McGann [McGann 1983], and continuing through many others.
[8] Thus, it became increasingly
important to many scholars to represent as exactly as possible the document page
and the text upon it, with a fullness and precision which the P3 system could
not achieve. In response to this need, the TEI convened a working group to
prepare encodings for the making of editions where representation of the
document was paramount. This resulted in a new Section 11.2.2, first issued in
“version 2.0.0” of the P5 Guidelines in December 2011. This section introduced a
new high-level
<sourceDoc> element, specifically to carry the “embedded
transcription” described in this section. This “embedded transcription” is
described as “one in which words and other written traces are encoded as
subcomponents of elements representing the physical surfaces carrying them
rather than independently of them”. The examples and the accompanying
documentation make very clear exactly what is meant by this: that the marks upon
the page are interpreted as words completely independent of any sense of their
being part of a communicative act. Thus, the letters and words of the page are
placed within the page hierarchy, in a series of elements which may be nested
within one another: the page as
<surface>, which might contain a
<zone> (a column, a writing area), itself containing
<line> and
<seg> elements, which might contain the words themselves. There is no
place here at all for recording information about the text as a structured
communicative act. Instead, the Guidelines suggest that information about the
text as communicative act should be recorded in a separate
<text>
element, parallel to the
<sourceDoc> elements. In theory, this is a better
solution than the rather makeshift procedure adopted by P3. In practice, it is
extremely difficult to maintain two distinct transcriptions, and to maintain the
complex sets of links between the two.
[9]
The Shelley-Godwin archive shows the power of document-based encoding [
Maryland 2014]. A feature of this new encoding is that it
provides for explicit statement of the revisions within each document page and
their sequence. One can see (for example) exactly what Mary Shelley wrote, and
what Percy Shelley wrote. One can read the transcription in parallel to each
page facsimile, with each element of the transcription mirrored in transcript
and page: a considerable technical feat. However, what is excellent for these
materials – a classic instance of “genetic texts”, through which one may see the
authors (in this case) forging the text a phrase at a time – may not be
appropriate for other editions. While the TEI guidelines recommend that a
parallel encoding of the text-as-structured-communicative-act be made alongside
the encoding of the text-as-document, in practice editions may not follow this
advice, and indeed the Shelley-Godwin archive does not do this.
[10] The result is that
while one can see precisely the changes within each page, the failure to encode
the components of the communicative act within each document makes it extremely
difficult to see the changes between one document and another. Indeed, the rigid
segmentation of the document into pages makes it impossible to record a change
which spans across a page boundary. For example: one finds on fol. 5v of Fair
Copy Notebook C1 (c. 58) and on fol. 73r of Draft Notebook B (c. 57) versions of
the end of chapter 22. But to locate these passages one is reduced to using the
search engine to discover parallel texts: not a very efficient procedure.
Representation of both work and document: the DET system
This defect brings us to the third element of the document/text/work triumvirate:
the work. The definition of work given above – that a work is a set of texts
which are hypothesized as organically related, in terms of the communicative
acts which they present – depends on identification of an instance of the
communicative act and its components in any one document (e.g. this is the
General Prologue in the Hengwrt manuscript) and then identification of other
instances of related communicative acts in other documents (this is the General
Prologue in the Ellesmere manuscript, in the Caxton printings, in the Riverside
Chaucer). Because we identify the components of the communicative act in any one
document, we can compare its instantiation in that document with its
instantiation in any other. If we reduce our notion of text to simply words in
documents, we have no means of asserting relations between documents apart from
the happenstance of some words recurring in different documents (as I was able
to use the Shelley-Godwin archive search engine to discover that folios in
different notebooks had similar words to the end of Chapter 22 in various print
editions). This is unsatisfactory, to put it mildly. It provides no means of
linking translations, or radical rewritings. We can assert (to give an extreme
example) that the many hundred manuscripts of the medieval
Visio
Pauli are related, through many recensions in many languages, many of
which have not a single word in common, because they share structure, subject,
theme, motifs and details, and because we can trace the historical growth of the
tradition across time and space and from document to document.
[11] I can assert that both the Sion and the
Merthyr manuscripts contain versions of the
Canterbury Tales,
even though there is not one line of the Tales in common to both (Sion holds the
sequence Clerk’s Tale-Summoner’s Tale, Merthyr has part of the Nun’s Priest’s
Tale and link), as surely as Darwin can assert that two multi-segmented
organisms are both barnacles, even though they have not a single segment in
common. Darwin can assert this by showing that both organisms descend from an
ancestor which had both sets of segments. I can show that both manuscripts
descend from other manuscripts which had both sets of tales.
According to these definitions, then, a fundamental requirement of scholarly
editing is that both aspects of a text are recognized: both the text as marks
upon a document, and the text as structured communicative act. To put it at its
simplest: we need to be able to say that the words “Whan that Auerill with his
shoures sote” are found in a particular space on the first folio of the Hengwrt
manuscript, Peniarth 392D in the National Library of Wales, and that these words
are also the first line of the General Prologue of the works we know as Geoffrey
Chaucer’s
Canterbury Tales. Over the last years, with the help of
many people, I have been developing a formal system for describing documents,
texts and works to enable just this. In essence: we need a scheme for labeling
all three elements that will allow us to identify every part of each element,
in every document, text and work, and assert too how they relate to each other.
[12]
We call this scheme “documents, entities and texts” (DET), using the term
“entity” to refer to the unique labels we give each component of a communicative
act.
[13] The labeling system we employ is based on the
Kahn/Wilensky architecture [
Kahn 2006]. Like Kahn/Wilensky, we
use the familiar uniform resource name (“urn”) notation to hold each label.
Following Kahn/Wilensky, we separate the label into two parts: a naming
authority, and the name given by that naming authority to the object. Thus, in
Kahn/Wilensky the handle “berkeley.cs/csd-93-712” gives the naming authority as
“berkeley.cs”, and “csd-93-712” is the name given by that authority to a
particular object. In full urn form, this is expressed as
<URN:ASCII:ELIB-v.2.0:berkeley.cs/csd-93-712>
In our system, we adopt
the use of “/” to separate the naming authority from the name, and we further
specify that the name must be composed of at least one of the key words ‘entity’
and ‘document’ and of one or more key value pairs, separated by the “:”
delimiter.
Applying this to a document:
TC:USask:CTP2/document=Hengwrt – indicates that the naming authority TC:USask:CTP2 has given this document the name “Hengwrt”
…/document=Hengwrt:Folio=1r – indicates Folio 1r of the Hengwrt manuscript
…/document=Hengwrt:Folio=1r:line=2 – indicates line 2 of Folio 1r of the Hengwrt manuscript
Applying this to an entity, that is to a named component of a communicative act:
TC:USask:CTP2/entity=Canterbury Tales – indicates that the
naming authority TC:USask:CTP2 has given this entity the name “Canterbury Tales”
…/entity=Canterbury Tales:Section=General Prologue – the
General Prologue of the Canterbury Tales
…/entity=Canterbury Tales:Section=General Prologue:line=1 – line 1 of the General
Prologue of the Canterbury Tales
In full urn notation, the document would be
<urn:DET:TC:USask:CTP2/document=Hengwrt>; the entity would be
<urn:DET:TC:USask:CTP2/entity=Canterbury Tales>
We have defined a “text” as a communicative act, which comprises both document
(the material upon which it is inscribed) and entity (the components into which
it might be divided). Accordingly, a text of any one communicative act in any
one document is the collocation of the entities and of the document for that
text. Thus, for the text of the Canterbury Tales in the Hengwrt manuscript:
TC:USask:CTP2/document=Hengwrt:entity=Canterbury Tales
For the text of the first line of the General Prologue on the second line of
folio 1r of the Hengwrt Manuscript:
…/document=Hengwrt:Folio=1r:line=2:entity=Canterbury Tales:Section=General Prologue:line=1
The power of this system should be immediately apparent. We can, from this naming
alone, identify all manuscripts which contain the
Canterbury
Tales; all manuscripts which contain the General Prologue; all
manuscripts which contain the first line of the General Prologue. Or, in
reverse: we can say, for any one manuscript, exactly what parts of the
Canterbury Tales it contains; we can say, for any page in any
manuscript, what lines of what part of the Tales it contains; we can say, for
any line or space in any page in any manuscript exactly what words of what part
of the Tales it contains. Note that the document and entity naming is completely
hierarchical: each successive object within the sequence of name/value pairs
must be contained within the preceding object. Line one is part of the General
Prologue, which is part of the
Canterbury Tales; the second line
is on Folio 1r which is part of the Hengwrt manuscript. Note too that the system
can cope with prose and other texts where communicative acts span across lines
and pages. Paragraph 162 of the Parson’s Tale in the Corpus Christi 198
manuscript of the
Tales begins on line 36 of folio 272r, and
continues on the first two lines of folio 272v. This can be represented as
follows:
…/document=Corpus:Folio=272r:line=36:entity=Canterbury Tales:Section=Parson’s Tale:Segment=162
…/document=Corpus:Folio=272v:line=1:entity=Canterbury Tales:Section=Parson’s Tale:Segment=162
…/document=Corpus:Folio=272v:line=2:entity=Canterbury Tales:Section=Parson’s Tale:Segment=162
Implementing DET: Textual Communities
Theory is one thing; implementation is another. The basic outline of this scheme
was prepared by myself, with advice and help from Federico Meschini and Zeth
Green, in 2008-2009, and presented first by myself and Green at a symposium on
Collaborative Scholarly Editing in Birmingham in 2009, and then by Meschini and
myself in a paper presented to the ADHO conference in London in 2010. Following
suggestions at a meeting of the InterEdition project in Pisa in 2009, we first
experimented with expressing this scheme in the form of an ontology. This was
successful, as proof of concept: we could indeed connect documents, entities and
texts via RDF classes and properties.
[14]
This helped persuade us that the concept was fundamentally sound. However,
implementation of even a basic working prototype would take considerable effort
and resources. In late 2010 I moved from the University of Birmingham, UK, to
the University of Saskatchewan, Canada, and a considerable motive was the
prospect of adequate funding to create a real editing system, based on these
concepts. Such a system was needed also to support my own editorial work,
particularly on Geoffrey Chaucer’s
Canterbury Tales.
With funding initially from the University of Saskatchewan (2010-2011), then from
the Canada Foundation for Innovation (2011-2014) and now from the Canadian
Social Sciences and Humanities Research Council (2014-), we have made a
collaborative editing environment, “Textual Communities”, built on the
documents, entities and texts definitions here explained. A full technical
description of the components of the Textual Communities environment is beyond
the scope of this article. In brief:
- Although the DET system is designed to support full hierarchies for both
document and communicative act, and has no difficulties with overlapping
hierarchies, we based the system on TEI encodings which cannot support
overlapping hierarchies. Partly, this was because we had many thousands of
pages of transcription already in TEI encoding. Also, we knew from years of
experience that use of the P3 primary text model, encoding the communicative
act hierarchy as the main hierarchy and recording the document hierarchy
through milestones, could yield useful results.
- Our first intent was to work through RDF, and so create an RDF repository
of materials accessible via SPARQL and other RDF tools. Very quickly, we
realized that the RDF tools then available could not support our aim of a
large real-time editing environment. (This may change as new RDF tools are
developed.) The tools were immature and did not scale well, and we had
significant performance issues even with small amounts of text. Hence, we
moved to use of a relational database for back-end storage of all data.
Decades of development have made relational databases robust, responsive and
fast, with a multitude of tools for management and for web server
interfaces. We are currently moving from a relational database to a MongoDB
system. JSON (JavaScript Object Notation) has become our central internal
representation of data, and the optimization of MongoDB for JSON objects
maps well to our data.[15]
- The core of our implementation of the DET scheme with a database backend
is the use of the TEI <refsDecl> element to map any TEI document to a
DET scheme. Here is a fragment from a refsDecl declaration for the Canterbury
Tales project:
<cRefPattern matchPattern="urn:det:TC:USask:CTP2/entity=(.+)" replacementPattern="#xpath(//body/div[@n='$1'])"></cRefPattern>
- Here, the “replacementPattern” attribute declares that every top level
<div> element within the document is to be mapped to an entity. The
entity will be given the name of the ‘n’ attribute: thus
<div n="General Prologue">
is associated with the entity name “General Prologue”. The
matchPattern attribute declares exactly what entity this ‘n’ attribute will
be associated with: here, the entity itself. Taken together, the system now
understands that when it sees <div n="General Prologue"> as a top
level <div>, it associates that <div> and its contents with the
entity “urn:det:TC:USask:CTP2/entity=General Prologue”. In essence: these
refsDecl expressions are used to slice the whole document into entity and
document chunks, associating each chunk with a document and entity name,
so that each chunk of text in the document is linked to its document and
entity and this information is stored in the database (a simplified sketch of
this slicing follows this list).[16]
Through this system, we have now been able to store some 40,000 pages of
manuscript transcription and images in the Textual Communities implementation at
the University of Saskatchewan: see www.textualcommunities.usask.ca
(particularly, for the Canterbury Tales project, see
http://www.textualcommunities.usask.ca/web/canterbury-tales/viewer). This is now
being used routinely by editors and transcribers in six substantial
projects.
[17] We are currently testing and refining the system
before full public launch.
In its current form, Textual Communities does not (and cannot) go as far as we
want towards a full representation of both aspects of the text, as document and
as communicative act. This limitation arises from our use of the TEI as the base
form for text representation. While the structure of each communicative act can
be fully represented in XML, and hence in Textual Communities, a single XML
document can represent only one primary hierarchy. Because our documents in
Textual Communities make the structure of the communicative act the primary
hierarchy, we are limited in our
representation of the document hierarchy. We use (as do most TEI projects
representing documents) sequences of the omnipresent
<pb/>,
<cb/>
and
<lb/> elements to represent the document hierarchy, and for the great
majority of our documents and our purposes, this gives satisfactory
results.
[18] Note that the
limitation is not in the DET scheme, and indeed the current move of Textual
Communities to a JSON-based architecture will also remove this limitation within
the system “backend”. The problem then will lie only in the XML structures we are
currently using in the editorial and display interfaces. However, despite this,
Textual Communities goes further than any other online collaborative editing
environment known to me in its support for both the document and structured
communicative act aspects of the text. Every one of the more than thirty editing
systems listed by Ben Brumfield at
http://tinyurl.com/TranscriptionToolGDoc is
either very limited in its support for recording communicative act components,
or supports page-based transcription only.
[19]
The second axis of scholarly editions: editors and readers
The definitions of document, text and work here offered, as the foundation of the
DET system, are valid for any text of any period, and might have been offered by
any scholar at any period. Documents vary in form, from inscription on stone to
computer file, but these definitions and these relations hold, no matter what
the medium. The digital age and the stringent mandates of computing systems
require that they be defined more precisely than before: but if the concepts are
valid, they were valid long before the invention of the computer. As earlier
remarked, the advent of digital methods has favoured the making of some kinds of
edition over others, but the fundamentals have not changed. However, this is not
the case for the second axis of scholarly editions: the range of relationships
between the makers of editions and their audiences. This has not just altered
the balance between one kind of edition and another. It has created many more
kinds of relationship between editor, edition and audience, enabling radically
new kinds of edition.
It is now fundamental to the web that every reader may be a writer. The rise of
social media means that communication across the web happens in every direction:
now, routinely, every newspaper article on the web comes with a comments
section, often more interesting than the article itself (though, mostly, not).
The rise of “crowd-sourcing” leverages individual activity into collective
movements: crowd-sourcing has produced remarkable results in areas as diverse as
investigating the expense claims of politicians and transcribing museum
labels. There have been several ventures into the use of crowd-sourcing for
editorial purposes, notably the Transcribe Bentham and Easter 1916 projects
[
Causer 2012]; [
Trinity College Dublin 2014].
Crowdsourcing raises the possibility of editions which are not made by a single
editor, or even a group of editors, but by many people who may have no formal
relationship with each other, and indeed nothing in common except a shared
interest in a particular text. Further along the editor/audience axis, the ways
in which an edition may be distributed to its readers and used by them are also
vastly changed. While many readers may want just to read a text, others may wish
to take the text of a document, add information to it, alter it, enrich it,
correct it, combine it with other texts, and then republish it. Others in turn
might take up this republished text and alter it still further, in a
never-ending chain.
It has to be said that the scholarly editing community, up to now, has been very
slow in responding to these new potentials. With respect to crowdsourcing: the
Bentham and Easter 1916 enterprises are not “crowdsourced” editions. Rather, the
framework of each edition, the flow of work, and all significant decisions
concerning transcription systems and the distribution of the product of the
editions are made by a small group of academic editors, as has always been the
case for scholarly editions. Any reader is invited to contribute transcriptions,
and the Easter 1916 system allows readers to go further, and contribute their
own documents. But the reader’s role is strictly limited, and the Bentham
project even prevents the reader changing a transcript he or she has made after
it has been “locked” by an editorial supervisor. T-Pen and other systems do
offer much more freedom to the editor, but at the cost of a very limited
encoding of the structure of communicative acts. Further, almost all scholarly
editing digital projects severely restrict how their output might be used. The
Jane Austen Fiction Manuscripts project [
Sutherland 2010] has a
whole page bristling with restrictions: the site and everything on it is
protected by copyright, no derivative works are allowed, the editor asserts her
“moral right to be recognized as author and editor of aspects of this work” (it
is not explained what “aspects” means), “individual, non-commercial” use is
permitted, but “All other use is prohibited without the express written consent
of the editor. Any requests to use the work in a way not covered by this
permission should be directed to the editor.” Indeed, most projects, while not
going so far as the Jane Austen project, do invoke the “non-commercial” clause
of the Creative Commons license. The effect of the “non-commercial” restriction,
as many have observed [
Möller 2005], is not just to restrict the
republication of online materials by commercial publishers: it is actually to
make it nearly impossible for anyone, commercial or non-commercial, individual
or corporate, to republish those materials on the web. The problem is the
ambiguity of what is “commercial”, what is “non-commercial”, on the web. If you
publish your edition in a university website, which also sells university
services, then it might be deemed commercial. If you publish it in your own
website and this happens to provide links to other sites which sell anything, or
even just belong to a commercial entity, then it might be deemed commercial –
even if you and your edition did not create those links. For these reasons, many
people will not touch any materials covered by the “non-commercial” license, in
any circumstances. Indeed, among the many online digital materials which have
been created by humanities scholars very few are both free of the non-commercial
restriction, and actually made readily available for free re-use and
re-publication.
Editions by everyone, for everyone
Full realization of the possibilities of readerly involvement in the making and
use of editions depends on the materials being available for re-use and
re-publication without restriction. This applies at both ends of the edition
spectrum. One cannot reasonably expect that people who contribute transcriptions
and other materials to an edition will be willing to do so if they cannot make
use of their own transcriptions — or indeed, if they see that the editors are
limiting the ways the transcriber’s work may be used, for the benefit of the
editors. And of course, if you cannot be sure that you can freely distribute
your own work on materials derived from an edition, then you will likely look
elsewhere. For this reason, Textual Communities mandates that any materials
created on the site must be made available under the Creative Commons
Attribution-ShareAlike licence (CC BY-SA); all software created by the project
is also available as open source at
https://github.com/DigitalResearchCentre.
[20] The “attribution” requirement mandates that anyone
who worked on the creation of the materials must be acknowledged, at every point
along the publication chain. The “share-alike” requirement mandates that the
materials, no matter how altered, must be made available under the same terms as
they came to the user. This does not prevent a commercial publisher taking the
materials, altering them, and then making them available as part of a paid-for
publication: just that somehow, the publisher must make those altered materials
available, for free (for example, by deposit on a public webserver). Further,
Textual Communities provides an open API (Application Programmer’s Interface)
that makes it possible for anyone with reasonable computer skills to extract any
texts from editions within the Textual Communities system in just a few lines of
code (see
http://www.textualcommunities.usask.ca/web/textual-community/wiki/-/wiki/Main/The+API+Basics).
However, requiring that materials be made freely available is pointless if those
materials cannot be made in the first place. Some two decades into the digital
revolution, and people routinely send emails, create Word documents,
spreadsheets and Facebook pages. But paradoxically, it is no easier (and indeed,
arguably much harder) to create a scholarly edition than it was two decades ago.
It is certainly harder if one takes the TEI advice, to create two transcriptions
corresponding to the two aspects of text as document and text as structured
communicative act and link them together. Even in cases where one is not going
to make two parallel transcripts: the renowned complexity of the TEI guidelines,
and the webs of software and hardware needed to turn a TEI document into an
online publication continue to require that anyone who wants to make a scholarly
edition in digital form must both master a range of special skills and have
access to considerable technical resources. The effect has been to limit
drastically the number of people who can make scholarly editions in digital
form: in effect, to relatively few people typically at a few digital humanities
centres. Hence, a key aim of Textual Communities is to make it possible for
scholars to make digital editions without having to master more of the TEI than
is necessary for their particular task, and with no need for specialized
technical support. Further (perhaps over-ambitiously) we would like it to be
possible for an edition made with this system to be placed anywhere along the
range of relationships between the makers of editions and their readers. This
means that, as well as making the system as easy to use as we can, we need to
support this whole range. Thus, in Textual Communities one can create a community where
everyone is an editor, everyone can freely change what everyone else does, and
everyone can take whatever is done and use it in any way they wish. Or, an
editor can allow only the people he or she invites to collaborate in making the
edition, and can insist that every page published on the edition must be
approved by an editor before publication.
In order to support this range of roles, and to encourage community building and
partnerships, Textual Communities is based on social media software,
specifically on the Liferay implementation of the OpenSocial software suite,
itself used by Google as a foundation of its “Google Plus” social
network.
[21] This allows every community
within Textual Communities to have its own Wiki, Blog, Bulletin Board and Chat
facility. The screenshot below shows how Textual Communities appears to a user
within the Canterbury Tales community:
At the top of the screen, the “Wiki”, “Blog” and “Bulletin Board” links take the
reader to the wiki, etc., for this community. To the right, we see an image
of the manuscript page; below that, the transcription of this page, in the
last version saved by the transcriber. Notice that the “Compare with” tool
allows the transcriber to compare different revisions of the transcription. This
system does not attempt to hide the XML: we think it helpful for the editor and
transcribers to see exactly what encoding is being applied to the document. Nor
have we had any difficulty with transcribers at any level, including
undergraduates, understanding and using the XML we use in these page transcripts
(observe that, as a fully-compliant TEI implementation, any valid XML may occur
within the transcripts). Note too the use of explicit <lb/> elements at
the beginning of every manuscript line to structure the document. The buttons at
the base permit the transcriber to preview the document, showing it without the
XML and formatted for ease of reading, to save the transcript, and carry out
various other editorial activities (including, “Link Pages”, which allows the
editor to connect text which flows across the page boundaries).
At the left of the screen, you can see the table of contents, showing the
document page by page. This table of contents is generated directly by Textual
Communities from the XML, following the schema for document elements set out in
the <refsDecl> element. If you click on the “By Item” tab, the table of
contents changes:
Now, for the document Hg, we can access its contents by entity: that is, by the
components of the communicative act. Thus, it first contains the General
Prologue, which itself contains a sequence of line entities: first the initial
rubric (“IRE”) and then the first and following lines. Again, the information
about the entities is generated directly by Textual Communities from the XML,
following the schema set out for the textual entities in the <refsDecl>
element. Finally, the “Collations” interface allows the reader to see the
collation of the text in all the documents.
The collation here is built on the CollateX system, here extended by the addition
of regularization and other facilities for adjusting the collation.
Conclusion
Textual Communities, like any computing system, is in constant development. We
have not yet announced it publicly, and will not do so until we are fully
satisfied as to its robustness and usability. We are also aware that the demands
of comparatively few users (six major communities, some one hundred and fifty
active transcribers and editors) already place considerable strain on the
current installation, on a virtual server at the University of Saskatchewan. We
are both translating the system to the MongoDB backend and moving it to the
Compute Canada cloud (one of the first Digital Humanities projects to be hosted
on this service, hitherto devoted to “hard science” data). However, it is not at
all our aim that the whole world, or even a significant part of the whole
scholarly editing community, should use Textual Communities. We are more
concerned that the concepts behind Textual Communities should be promulgated.
Firstly, we argue that scholars should understand the reasoning behind the
documents/entities/texts division, with its insistence on the double aspect of
the communicative act present in any textual document. Secondly, we argue that
it should be possible for any textual scholar to make an edition, with a minimum
of specialist computing and encoding knowledge and technical support. This
requires that computing systems respond to the needs of scholars, rather
than that scholars restrict their editions to what computer systems can support.
Thirdly, scholarly editing, perhaps more than any other area of the humanities,
is uniquely positioned to profit from the social web. Scholars and readers may
engage together in understanding documents, the texts they contain, and the
complex histories of works which they compose. If we are able to move our
discipline a small way in those directions, we will have done well.
Notes
[1] Although I am the sole author of this paper, at several points I speak of “we”.
Many people have contributed to the thinking behind this paper, and to the Textual
Communities system which seeks to implement that thinking. To name a few: the
discussion of text, document and work is deeply indebted to many discussions over
many years with Peter Shillingsburg, Paul Eggert, Hans Walter Gabler, David Parker
and Barbara Bordalejo, among others (see, for example, [Shillingsburg 2007]; [Eggert 2009]; [Gabler 2007]; [Parker 1997]; and particularly the
collection of essays in [Bordalejo 2013]). This does not mean that any
of these scholars agree with the definitions I offer: the best I can hope for, is
that I have learned enough from them that they might disagree with me less vehemently
now than they would have a few years ago. Here, “I” means “I”: those scholars are
not responsible for my opinions. However, the creation of Textual Communities has
been fully collaborative. The main implementation at Saskatchewan since 2010 has
been the responsibility of Xiaohan Zhang, with some parts written by myself and Erin
Szigaly. Throughout, we have consulted with Zeth Green (Birmingham) and Troy
Griffitts (Münster), who have contributed key insights and questions. The major
projects using Textual Communities, named in footnote 17, have been an invaluable
testing ground for Textual Communities. I am embarrassed for their long suffering
over the last years, and grateful for their patience, support and encouragement.
Finally, this article has benefitted significantly from the comments of many readers
on a draft version posted on GoogleDocs in May 2015. I have indicated each place
where I introduced a change suggested by one of these readers, and acknowledged the
person. I also thank Torsten Schassan for instigating the discussion, and for his
many corrections. [2] So quoted by McGann [McGann 1983]. For the
source of the Morris citation, and the relationship of McGann’s formulation
to Morris’s words, see [Nowviskie 2013]. Nowviskie traces the
quotation to Sparling 1924. [3] Compare the well-known FRBR (Functional Requirements for
Bibliographic Records) Group 1 entities: “work, expression, manifestation,
item” [IFLA 1998]; [Tillett 2004]. “Work” as I
define it maps broadly to FRBR “work” (but might in different circumstances
map to expression or manifestation); a document maps to an item. There is no
equivalent in FRBR to “text” as I explain it. The system here presented
extends, rather than replaces, FRBR. Particularly, this system enables works
(and hence texts) to be seen as structured objects, readily susceptible to
fine-grained manipulation, in ways that FRBR does not. [4] This discussion of the stages of recognition of the text as a
communicative act is indebted to the description by Barbara Bordalejo of the
encoding system in the Prue Shaw edition of the Commedia, thus: “in this
article, I use the phrase the ‘text of the document’ to refer to the
sequence of marks present in the document, independently of whether these
represent a complete, meaningful text. That is: the reader sees a sequence
of letters, occurring in various places in relation to each other (perhaps
between the lines or within the margins) and carrying various markings
(perhaps underdottings or strikethroughs). These make up what I here refer
to as the text of the document” [Bordalejo 2010]. [5] For example: an email by James Cummings to the
TEI-L discussion on 5 February 2014 which speaks of encoding “both the
interpretative <text> view and non-interpretative <sourceDoc>
view” (http://permalink.gmane.org/gmane.text.tei.general/16892). Compare the
prioritization of “document-based editing” over other kinds of editing
argued by Hans Gabler [Gabler 2007], while noting that
Gabler has always argued consistently that editors must also present the
work (e.g., [Gabler 1984], [Gabler 1990]. [6] Among many
contributions to the discussion of overlapping hierarchies in mark-up
languages: see the original statement of “the OHCO thesis” in DeRose et al.
[DeRose 1990] and its restatement and complication in
Renear et al. [Renear 1993]. See too footnote 15. [7] The author, while not
formally a member of the workgroup (“TR9”) on “Manuscripts and codicology”
which was charged with drafting the chapter on representation of primary
sources in the “P3” guidelines (first published in 1994), wrote most of the
draft of what became Chapter 18 “Transcription of Primary Sources” in those
guidelines (see http://www.tei-c.org/Vault/GL/P3/index.htm). This chapter
persisted in revised form into the “P4” version, first published in 2002,
before finally being replaced by Chapter 11 of the first “P5” version in
2007. [8] To name a few: [O’Keefe 2006]; [Bornstein 2001]; and a whole
European Society for Textual Scholarship conference on “Textual Scholarship
and the Material Book: Comparative Approaches” in London in 2006. The
emergence of the document as the locus of scholarly attention now has a
name: “material philology”. See too [Nichols 1990], and
Matthew J. Driscoll, “The Words on the Page”, distilling talks given by him
around 2005-2007 and available at
http://www.driscoll.dk/docs/words.html. [9] I know of only one substantial
project that attempts this parallel encoding: the Goethe Faust project [Brüning 2013]. [10] The
Shelley-Godwin archive creators were fully aware of the arguments for
encoding both “document” and “text”, and canvass these in [Muñoz 2013], while confessing themselves unable to
implement satisfactory encoding of both aspects. [11] For the
complexities of the Visio Pauli tradition see [Robinson 1996]. [12] I was aware, at an early stage, of the work of the Canonical Text
Services (CTS) group, and studied their system closely while devising the
system here described [Blackwell 2014]. Briefly, this system
is highly compatible with CTS, in that every CTS reference may also be
expressed with no loss of information. However, the reverse is not true.
This system includes completely hierarchical information for both document
and communicative act, permitting full specification of both document space
(a line within a page within a volume) and communicative act component (a
word within a sentence within a paragraph within a chapter) to a degree that
CTS does not enable. Further, CTS does not use the key/value pair
architecture, nor does it specify the naming authority. It is in essence a
labelling scheme, with some hierarchical elements, and relies on external
index files to correlate (for example) text segments with the manuscripts in
which they appear. For example:
https://github.com/homermultitext/hmt-archive/blob/master/cite/collections/scholiainventory.csv
includes the line
"urn:cite:hmt:scholia.379","urn:cts:greekLit:tlg5026.msA.hmt:1.6","6th main
scholia of Iliad
1","urn:cite:hmt:chsimg.VA012RN-0013@0.57311951,0.24827982,0.22451317,0.04644495","urn:cite:hmt:msA.12r".
It appears that this links the “6th main scholia of Iliad 1” with the urn
“urn:cts:greekLit:tlg5026.msA.hmt:1.6”, and further with the manuscript
urn:cite:hmt:msA.12r, presumably page 12r of “MsA”. In contrast, the system
here described would yoke these statements together into a single URL, such
as “…/document=MsA/folio=12r/entity=Scholia/Book=1/n=6”, from which the full
page and communicative act hierarchies could be deduced.
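By way of illustration only, the following sketch in Python (not the system’s actual code) shows how such a combined reference might be decomposed into its two hierarchies. The rule assumed here, that keys before “entity” describe the document space and keys from “entity” onward describe the communicative act, is inferred from the example above.

    # A minimal sketch, not the Textual Communities implementation: split a
    # combined reference into its document and communicative-act hierarchies.
    def split_reference(ref):
        pairs = [seg.split("=", 1) for seg in ref.strip("/").split("/") if "=" in seg]
        document, entity = [], []
        target = document
        for key, value in pairs:
            if key == "entity":  # assumed switch-point between the two hierarchies
                target = entity
            target.append((key, value))
        return document, entity

    doc, ent = split_reference("document=MsA/folio=12r/entity=Scholia/Book=1/n=6")
    # doc -> [('document', 'MsA'), ('folio', '12r')]
    # ent -> [('entity', 'Scholia'), ('Book', '1'), ('n', '6')]

From the two lists, the full page hierarchy and communicative act hierarchy of the reference can be reconstructed.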
[13] The term “entity” is used here in preference to “work” for several
reasons. Firstly, as the examples show, the term entity may be applied to a
structured object of a communicative act at any level: a single line of
Hamlet may be an entity; so too are a single scene, an act,
and the whole play itself. Secondly, the term “work”,
hotly contested in textual scholarship [Robinson 2013b],
comes with many connotations which might not be helpful in understanding the
system here proposed: “entity” has the advantage of neutrality. “Entity” is
also familiar from FRBR, which is built on the categorization of
relationships among “entities”: distinct intellectual objects, analogous to
the distinct components into which an act of communication may be
structured. [15] As part of the move to MongoDB and JSON storage,
Zeth Green, Xiaohan Zhang and I reviewed how the document and
entity hierarchies relate to the text formed from the collocation of the
two hierarchies. We realized that it was possible, using a JSON-based
architecture, to support not just two hierarchies for any text, but any
number of hierarchies, thereby avoiding any difficulties with
overlapping hierarchies. As of November 2016, we are still developing
this new architecture. Recent articles by computer scientists
demonstrate increasing awareness of the need to move beyond a “document
paradigm”, with its reliance on hierarchical content models, to systems
which will natively support the multiply overlapping information schemes
we find in actual texts and their physical instances: thus [Schmidt 2009] and, especially, [Schloen 2014]. At present and for the near future, however, XML in the TEI
implementation remains crucial to our work, both because it keeps us
close to a wide community of scholars working with digital editions and
because of its sophisticated validation facilities.
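A minimal sketch of what such a record might look like, with invented field names and values: each token of text carries its place in any number of named hierarchies, so that no single tree need contain the others.

    # Sketch only: a hypothetical JSON record (written here as a Python dict)
    # in which one token of text belongs to several independent hierarchies.
    token = {
        "text": "Whan",
        "hierarchies": {
            "document": {"page": "1r", "column": "a", "line": 1},
            "entity": {"work": "CT", "part": "GP", "line": 1},
            # further hierarchies (quires, scribal stints, and so on) can
            # be added without disturbing those already present
            "physical": {"quire": 1, "leaf": 1},
        },
    }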
[16] This paragraph describes the procedure used for designating document and entity parts
in the first version of the Textual Communities system. In the second
version, we drastically simplified this: now, any TEI/XML element with an
“n” attribute becomes an element in either the document or entity
hierarchy.
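A minimal sketch of this rule, with an invented TEI fragment and an assumed division of element names between the two hierarchies:

    # Sketch of the rule described above: any element bearing an "n" attribute
    # becomes a unit in one of the two hierarchies. The fragment and the
    # classification of element names are invented for illustration.
    import xml.etree.ElementTree as ET

    SAMPLE = """<TEI><text>
      <pb n="1r"/><div n="GP"><lb n="1"/><l n="1">Whan that Aprill ...</l></div>
    </text></TEI>"""

    DOCUMENT_ELEMENTS = {"pb", "cb", "lb"}  # assumed mapping

    doc_units, entity_units = [], []
    for el in ET.fromstring(SAMPLE).iter():
        n = el.get("n")
        if n is not None:
            (doc_units if el.tag in DOCUMENT_ELEMENTS else entity_units).append((el.tag, n))

    # doc_units    -> [('pb', '1r'), ('lb', '1')]
    # entity_units -> [('div', 'GP'), ('l', '1')]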
[17] The six projects are: at the University of Saskatchewan, the
Canterbury Tales Project (with KUL, Belgium), led by myself and Barbara
Bordalejo of KUL, 30,000 pages and 30 active transcribers; the John Donne
Digital Prose Project, led by Brent Nelson, 1800 pages and 25 active
transcribers; the Recipes Project, led by Lisa Smith, 1500 pages and 30
active transcribers; the Incantation Magic Project, led by Frank Klaassen,
400 pages and 10 active transcribers; at the University of Birmingham, UK,
the Estoria de Espanna project, 3500 pages and 20 active transcribers; at
City University New York, the Teseida Project, led by Bill Coleman and
Edvige Agostinelli, 600 pages and four transcribers. Numerous other projects
are also using the Textual Communities system, although it has not been
publicly launched.
[18] This use of <pb/>, <cb/> and <lb/> cannot
cope with instances where the flow of lines on a page is disrupted by, for
example, multi-line marginalia or annotations. We could use
<milestone/> elements to mark out such instances.
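One possible shape for such an encoding, offered purely as a sketch (the milestone values are invented and are not a Textual Communities convention):

    # Sketch only: milestone elements (invented values) fence off a block of
    # multi-line marginalia so that it does not disturb the main line flow.
    import xml.etree.ElementTree as ET

    SAMPLE = """<div>
      <lb n="1"/>first line of the main text
      <milestone unit="marginStart"/>
      <lb n="m1"/>a marginal note
      <lb n="m2"/>running over two lines
      <milestone unit="marginEnd"/>
      <lb n="2"/>second line of the main text
    </div>"""

    main_flow, margin, in_margin = [], [], False
    for el in ET.fromstring(SAMPLE).iter():
        if el.tag == "milestone":
            in_margin = el.get("unit") == "marginStart"
        elif el.tag == "lb":
            # the text following an empty <lb/> is its XML "tail"
            (margin if in_margin else main_flow).append((el.tail or "").strip())

    # main_flow -> ['first line of the main text', 'second line of the main text']
    # margin    -> ['a marginal note', 'running over two lines']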
[19] Brumfield does not mention two
powerful systems for the creation of fully TEI-compliant XML documents in
collaborative environments: TextGrid (www.textgrid.de/en/) and eLaborate
(www.elaborate.huygens.knaw.nl/login). Both support the making of the same
complex TEI-XML documents as Textual Communities, but neither offers
comparable native support for both page-based and “text-based” transcription.
Transcription by page, the best way to apportion the transcription of
complete manuscripts among many transcribers, is particularly crucial, and is
fully supported by Textual Communities.
[20] This sentence provoked a
lively discussion in the Google Docs forum on the draft. Three commentators,
Andrew Dunning, Hugh Cayless and Laurent Romary, questioned the need for the
‘SA’ condition. The nub of the problem is the expression of the ‘SA’
condition in Creative Commons and other ‘copyleft’ licenses, which insists
that all further redistribution be under the same terms as the original grant.
This leads to a problem when a site wishes to mix together materials
licensed under different share-alike flavours: this cannot be done. Thus,
although SA is conceived as a guarantee of continued open access, in
practice it has become a very real restriction, inhibiting the free re-use
we seek [Wiley 2007]. Accordingly, many recent open-access
sites have dropped the SA requirement, and Textual Communities is likely to
follow this lead. [21] The second version of Textual Communities has abandoned the
LifeRay environment here described: in practice, LifeRay has considerable
difficulties, not least its vulnerability to “spam bots”, which routinely
dump spam within LifeRay documents.
Works Cited
Bordalejo 2010 Bordalejo, Barbara. “Appendix C:
The Encoding System”, In Prue Shaw (ed.) Dante Alighieri. Commedia. A
Digital Edition. Scholarly Digital Editions, Birmingham and Sismel,
Florence (2010).
Bordalejo 2013 Bordalejo, Barbara (ed.)
Work, Text and Document in the Digital Age,
Ecdotica, 10 (2013).
Bornstein 2001 Bornstein, George.
Material Modernism: The Politics of the Page. Cambridge
University Press, Cambridge (2001).
Brüning 2013 Brüning, G., Henzel, K., and
Pravida, D. “Multiple Encoding in Genetic Editions: The Case of ‘Faust’”.
Journal of the Text Encoding Initiative [Online] (2013). URL:
http://jtei.revues.org/697
Causer 2012 Causer, T., Tonra, J. and Wallace, V.
“Transcription maximized; expense minimized? Crowdsourcing and editing The
Collected Works of Jeremy Bentham”, Literary and Linguistic
Computing 27, pp. 119-137 (2012).
DeRose 1990 De Rose, Steven, Durand, David,
Mylonas, Elli, Renear, Allen. “What is Text, Really?”, Journal of
Computing in Higher Education 1(2), pp. 3-26 (1990).
Eggert 2009 Eggert, Paul. Securing the Past:
Conservation in Art, Architecture and Literature. Cambridge
University Press, Cambridge (2009).
Eggert 2010 Eggert, Paul. “Text as Algorithm and
as Process”. In W. McCarty (ed.) Text and Genre in Reconstruction: Effects
of Digitalization on Ideas, Behaviours, Products and Institutions.
Open Book Publishers, Cambridge, pp. 183-202 (2010).
Gabler 1984 Gabler, Hans Walter. “The Synchrony
and Diachrony of Texts: Practice and Theory of the Critical Edition of James
Joyce’s Ulysses”. Text, 1, pp. 305–26 (1984).
Gabler 1990 Gabler, Hans Walter. “Textual Studies
and Criticism”, The Library Chronicle, The University of Texas at
Austin, pp. 151–65 (1990).
Gabler 2007 Gabler, Hans Walter. “The Primacy of
the Document in Editing”, Ecdotica, 4, pp. 197–207 (2007).
IFLA 1998 Functional Requirements for
Bibliographic Records. IFLA Series on Bibliographic Control. K. G. Saur,
Munich (1998).
Kahn 2006 Kahn, Robert and Wilensky, Robert. “A
Framework for Distributed Digital Object Services”, International Journal
on Digital Libraries 6(2), pp. 115–123 (2006).
McGann 1983 McGann, Jerome J. A Critique of
Modern Textual Criticism. University of Chicago Press, Chicago
(1983).
McKenzie 1999 McKenzie, Donald F.
Bibliography and the Sociology of Texts (The Panizzi Lectures,
1985). Cambridge University Press, Cambridge (1999).
Nichols 1990 Nichols, Stephen (ed.) “The
New Philology”, special issue of Speculum: A Journal of Medieval
Studies, LXV (1990).
O’Keefe 2006 O’Keefe, Katherine O’Brien.
Visible Song: Transitional Literacy in Old English Verse.
Cambridge University Press, Cambridge (2006).
Parker 1997 Parker, David C. The Living Text
of the Gospels. Cambridge University Press, Cambridge (1997).
Pierazzo 2011 Pierazzo, Elena. “A Rationale of
Digital Documentary Editions”, Literary and Linguistic Computing,
26, pp. 463-477 (2011).
Robinson 1996 Robinson, Peter M. W. “Is there a
text in these variants?”, In R. Finneran (ed.) The Literary Text in the
Digital Age. University of Michigan Press, Ann Arbor, pp. 99-115
(1996).
Robinson 2004 Robinson, Peter M. W. (ed.)
Geoffrey Chaucer. The Miller’s Tale on CD-ROM. Scholarly
Digital Editions, Leicester (2004).
Robinson 2013a Robinson, Peter M. W. “Towards A
Theory of Digital Editions”, Variants, 10, pp. 105-132
(2013).
Robinson 2013b Robinson, Peter M. W. “The
Concept of the Work in the Digital Age”, In Barbara, Bordalejo (ed.) Work,
Text and Document in the Digital Age, Ecdotica, 10, pp.
13-41 (2013).
Robinson forthcoming Robinson, Peter M.
W. “The Digital Revolution in Scholarly Editing”, Ars Edendi Lecture
Series, IV, Stockholm University, Stockholm.
Schloen 2014 Schloen, David, Schloen, Sandra.
“Beyond Gutenberg: Transcending the Document Paradigm in Digital Humanities”,
Digital Humanities Quarterly, 8(4) (2014).
Schmidt 2009 Schmidt, D. and Colomb, R. “A Data
Structure for Representing Multi-version Texts Online”, International
Journal of Human-Computer Studies, 67, pp. 497-514 (2009).
Shaw 2010 Shaw, Prue. Dante Alighieri. Commedia. A Digital Edition. Scholarly Digital Editions,
Birmingham and Sismel, Florence (2010).
Shillingsburg 2007 Shillingsburg, Peter.
From Gutenberg to Google. Cambridge University Press,
Cambridge (2007).
Sparling 1924 Sparling, Henry H. The
Kelmscott Press and William Morris, Master Craftsman. London (1924).
Tillett 2004 Tillett, Barbara. FRBR: A
Conceptual Model for the Bibliographic Universe. Library of Congress
Cataloging Distribution Service (2004).