Peter Johnston completed his Ph.D. on J.M. Coetzee and mathematics in 2013, at Royal Holloway, University of London. With wider interests in the intersection between literature, mathematical philosophy, and digital humanities, he is currently working on a stylometric analysis of the works of David Foster Wallace.
Though J. M. Coetzee is better known for the novels that eventually led to his being awarded the Nobel Prize for Literature in 2003, there is a strong but less widely recognised case for claiming him as a significant figure in the early development of digital humanities. In light of the recent renaissance of computer-assisted statistical approaches to literary analysis, the present article charts in detail a formative period (1969–1973) during which this most important of novelists first adopted, then deconstructed, and eventually rejected one of the discipline’s earliest incarnations.
Coetzee's influence on digital humanities
Thanks in no small part to his first two works of fictionalised autobiography
studying mathematics
assists with first-year tutorials in the mathematics
department
a mathematician to be
to study pure mathematics to the exclusion of
everything else
pure mathematics appears to be the closest approach the
academy affords to the realm of the forms
as noble as mathematics
become like those scientists whose brains solve
problems while they sleep
burn
the moment in history when
Death to reason, death to talk! All that matters is
doing the right thing, whether for the right reason or the wrong reason
or for no reason at all
While the evidence provided by Coetzee's fictionalised autobiographies is of
course rather compromised by its pointed generic instability, it remains
somewhat surprising how little detailed attention the critical discourse
surrounding his work has paid to his clear and profound inheritance from his
engagement with mathematics and, moreover, computer science. Even so, though
some of the more widely cited long-form critical works on Coetzee's writing make
no reference to these subjects at all (see, for instance,
One reason for the lack of focused attention on Coetzee's engagement with these subjects in his fiction might well be the fact that his work in this area now seems rather dated, a relic of the structuralist fervour of the late 1960s against which he had himself quite violently revolted by the time he came to publish his first novel,
refocus the hermeneutical problem away from the nature and limits of computation (which is mostly a matter of methodology)
the artifacts of human culture as radically transformed, reordered, disassembled, and reassembled
Ultimately, though, and putting the minutiae aside, the present-day significance of Coetzee’s contribution depends to a large extent on how successfully the discipline as it stands today negotiates Coetzee’s deeper, more philosophical query: if one can largely eradicate the subjective biases of traditional literary criticism by restricting one's statements to the near-neutral categorical definitions of mathematics, then how can one return from the mere statement of numerical values to draw meaningful natural-language conclusions, without allowing those subjective biases to creep right back in? With this in mind, the principal purpose of the present essay is therefore not to directly assess Coetzee’s work within and around the field of stylostatistics against today’s standards, but rather to provide a narrative account of this work, starting with his first published journal article,
Appearing in a 1969 edition of
Put simply, we can understand Coetzee’s use of the term stylostatistics
to
refer to the branch of stylistics concerned with those features of a text's
style that can be subjected to numerical analysis. The principal aim of the
stylostatistician is to strip away the subjectivity implicit in other types of
literary criticism, leaving only quantitative propositions that, in Coetzee's
words, will not carry the kind of connotative freight that,
for example, the proposition 'A's language is dense' came to carry in
Scrutiny criticism
it is upon a very small minority that the discerning
appreciation of art and literature depends
Difficulty communicate the fact that he is not in the final analysis interested in producing any clear definitions of
difficulty per se: this term, as it is used in natural language, will ultimately remain at least partially obscure. Given this inevitable impediment, Coetzee's opening paragraph clarifies his conception of the role of stylostatistical analysis:
I take it as universally acknowledged that “difficult” in the proposition “A's style is difficult” is a complex word, and hence that the proposition in fact expresses a number of component conclusions, many of them quantitative in nature and therefore capable of being chiselled into numerical form. We may infer, indeed, that these quantifiable components take their origin in some quantitative, cumulative procedure, however loose, that we follow in our minds as we read, and hence that propositions about “difficulty” and perhaps other so-called qualities of style are most simply and logically formulated with their quantitative and nonquantitative components kept distinct.
The stylostatistician, then, limits his or her analysis to those
capable of being chiselled into numerical form
,
considering these as mere indicators of an overall
Within this methodological framework,
the phenomenon in which we really ought to be interested is not the syllable (as Fucks assumes) but the morpheme, since we can give a more precise meaning to the definition of a word of many morphemes as difficult than a word of many syllables (consider Oopsidaisy)
But what, Coetzee asks, does the Fucks Index measure, exactly? And how does the
trace it produces correspond with the natural language notion of
[I]f we propose to describe the style of a text, an
index must remain meaningless until we can specify precisely what it
measures, i.e. with which phenomena in the text it varies
systematically. If the value of the trace is high, as it is for Othello,
what features of the language of Othello would this value enable us to
predict without referring to the text? If its value is low, as it is for
the present essay, what features of the essay are being reflected? Can
we specify the features both simply and informatively?
fullest statement of what the index measures is a
mathematical restatement of the definition of the index
represents the
defeat of any attempt to distinguish between a quality in the text
(
: What we hope for is presumably a compromise: neither the
extreme simplicity but extreme vagueness of words like
cluster
is sufficiently defined – and that this increase ought
to be commensurate with the increase in the trace value.
Having established this basic goal of stylostatistics as exemplified by the Fucks
Index, Coetzee demonstrates by counterexample that the specific formula Fucks
uses to calculate his trace will not always yield results that stand up to this
test: it is possible, he shows, to deliberately construct texts that have either
a high trace and a relatively high number of clusters, or a low trace and a
relatively low number of clusters. Moreover, the trace tends to accord
disproportionate weight to the values generated by words of higher syllabic
length: in other words, a text that has a high trace in respect of its
constituent mono-, di-, and trisyllabic words may find its overall trace value
affected in an exaggerated way by the occurrence of a couple of highly syllabic
words. The second of these two problems he considers as one of categorisation:
if we are prepared to accept a word of three syllables
as
, he
proposes, then we can accept as valid the revised and more
In
space is a structured set of points with fixed definitions regarding the behaviour of the space and the relationship between the points; perhaps the most familiar example is the Euclidean plane. A
probability space is a finite space with an associated probability measure that assigns a value between 0 and 1 to each event within the space, and the value 1 to the space as a whole. An 'outlier' is an observation that is considered not to conform to the general pattern of a given data set.
language en masse exhibits many of the characteristics of chance phenomena, and […] the inverse correlation between the value of the trace and the degree of presence of polysyllables and polysyllable clusters, while not invariable, has a high probability associated with it.
may seem odd at first sight that something which is so largely a matter of design as a literary text should exhibit randomness
kind of breakdown in the trustworthiness of the Fucks trace […] will in practice occur very seldom
[w]e cannot, unfortunately, claim that it will never occur, for then we would have to show that texts like [the counterexamples] are not natural, and would inevitably be reduced to talk about intention
texts [that] are constructed with an eye to the code rather than to the message
While
In the course of a fusillade against what he calls the revolt against reason in present-day humanistic studies, Joshua Whatmough quotes a pronouncement of David Hilbert's from 1918: Everything that can be an object of scientific thought at all, as soon as it is ripe for the formation of a theory, falls into the lap of the axiomatic method and thereby indirectly of mathematics.
In service of his ambitious and interdisciplinary thesis, then, Coetzee
establishes his critical methodology as operating within a theoretical space
delimited by two poles
of thought he discerns
within the existing discourse. The first of these classic definition […], style had indeed become an
object of scientific thought, was ripe for the formation of a theory,
and was falling, not at all indirectly, into the lap of
mathematics
the message carried by the frequency
distributions and the transitional probabilities of its linguistic features,
especially as they differ from those of the same features in the language as
a whole
object of
scientific thought
we automatically circumscribe it in terms of a
propositional content – its message
– that is fully
coextensive, without remainder, with the quantitative description generated by
stylostatistical analysis.
Coetzee's selection of Bloch as the principal representative of his first
The statement of meanings is […] the weak point in language-study, and will remain so until human knowledge advances very far beyond its present state
While Coetzee's selection of Bloch as the representative of this
strongly against any simplification of language […] and
indeed against any abstraction from words as counters in a calculus of
thought to words as counters in the less flexible calculus of
language
At its most fundamental level, then, Coetzee's thesis originates from a
compulsion to explore the sense in which Beckett's rejection of the type of
abstraction routinely performed by structuralist linguists such as Bloch and
Bloomfield constitutes a further denunciation of the apparent ease with which
certain terms and categories from the discourses of statistics and probability
theory had begun to migrate into humanistic studies. The dextrous manner in
which he introduces his negotiation of these terms merits close examination: Between the conceptions of style held by Bloch and
implied by Beckett there are no doubt similarities: Beckett's
terribly arbitrary materiality of the word's
surface
terribly arbitrary materiality of the word's
surface
is a translation from Beckett's original German
of the phrase fürchterlich willkürliche Materialität der
Wortfläche
. See
Even as he establishes the nominal focus of
On the basis of his introduction, Coetzee suggests that the significance of Beckett's attack on
Beckett's description of the
materiality of the word's surface
pictures language as a
wall between objects and their percipients
position on
style
he adopts in the thesis as being plainly closer to Beckett's than to Bloch’s
connotative freight
and, on the other, the prevailing
orthodoxies of contemporary stylistics. In the first case, for example, he
characterises Hugh Kenner's principal approach as constituting an attempt to catch the essence of Beckett's style in a metaphorical
way
: Thus, for example, of the
unique
translucent enumerating style
of It is an austere
prose, not narcissistic, nor baroque. It is not opulent. It moves
with the great aim of some computation, doing a thousand things but
only necessary ones.
has some perceptive pages on the dizziness (vertige) induced in the reader by his mathematical comedy
a tradition of literary criticism in which terms like austere have an agreed meaning, and in which insight into the nature of a style is a partly intuitive act
statements which can be verified by quantitative analysis
general positivism
artistic whole
experience of a work of literature is not necessarily linear in time, and instead tacitly defer to an
analogy of reader to decoding device that he considers
misleading
It is probably not too controversial to state baldly that
like Leibniz's automaton with a spark of life
[s]tanding Bergson on his head, […] something living encrusted on the mechanical
characteristic of Watt that he believes that an empirical question can be solved by logical analysis:
No empirical data are introduced into his chains of speculation. The multiplication of these chains depends on a maneuver in four stages: statement of a question, proposal of a hypothesis, breakdown of the hypothesis into components, and analysis of the implications of the hypothesis and its components. [...] The third stage typically breaks the chain into two or more branches. The only qualification Watt demands of a hypothesis is that it answer the question: his criterion is one of logic rather than of simplicity.
disregard for simplicity and is the
foundation of [Watt's] logical comedy, for simplicity is the only criterion that can put a stop to an endless proliferation of logical speculation. In Watt we regularly, with a sinking feeling, find ourselves at the beginning of infinite series.
the infinite series which automatically spring up must somehow be terminated
terminates in the solipsism that is one of Watt's answers to the infinities of logic: fish that need to rise and fall exist because my naming of them brings them into existence
Watt's consciousness, then, is analogous to the type of deterministic formal
axiomatic system of which the modern computer is perhaps the most familiar
model. Built from a series of axioms or rules for behaviour, the system is set
into motion by the intrusion of an essentially arbitrary piece of empirical
data, which consequently acts as its originary affirmation. As Coetzee explains,
with every passing instance in which Watt initiates an exhaustive combinatorial
analysis in response to a particular set of circumstances, the reader gains a
cumulative sense of the inextricability of his condition; the attempt to understand the nature of the simplest
sensory perceptions
Without the means to make qualitative value distinctions beyond the basic logical
tools with which he is endowed, then, Watt is radically unable to determine the
limit-point at which his analysis of each given set of circumstances might be
said to approximate truth to an extent sufficient to justify action. Indeed, the
very idea of cause and effect becomes more and more undermined as Watt's
experience becomes progressively The explosion of logic, epistemology, and ontology takes
Watt into another zone (the asylum) in which he lives a progressively
inverse life. Decline and inversion are reflected in Watt's language, as
reported by the narrator Sam. Decline and inversion constitute what I
call the shape of the telos. What is still lacking is the causal
element. For certain reasons a certain kind of man experiences a call to
a certain kind of situation, and the result is decline and inversion: we
see the results but not the causes, unless we take the step of calling
Watt's whole universe absurd.
[i]f we can justify an initial segmentation of a set into classes X and not-X [...], the whole structure of mathematics will follow as a gigantic footnote
mathematician enough to appreciate that, on the basis of merely one
single sure affirmation, a
whole contingent world […] can, with a little patience, a little diligence, be deduced
Through an analogy with the analysis of
On the other hand, the smallest amplifications of meaning, particularly those which were probably not under the conscious control of the author – for example, the frequencies of the words in the text – show, when quantified, what looks suspiciously like system, i.e. they act like well-behaved mathematical functions. Turning the syllogism upside down, we infer that well-behaved mathematical functions defined on the quantified components of the text define components that belong to the smaller amplifications of meaning.
elevation in diction as a descriptor for certain textual features, other than to provide a misleading
By the time sufficiently many literary works have been described in terms of the same measures, the measures themselves may come to have associative values with different texts. We may find, for example, that a high noun-to-adjective ratio is common to Pliny and Thomas à Kempis, a low ratio to Virgil and Tacitus. The ratio may then become associated with a quality we may call elevation in diction. But ultimately elevation will have to be defined in terms of the noun-to-adjective ratio and other measures. There is no escape from the absolute measure of quantification here.
More troublingly, perhaps, it is not just in descriptive terms of this nature
that we encounter such a problem: the origins of even the most apparently basic
linguistic terminology are equally precarious: a little computation shows us that, whatever definitions
of noun and verb we adopt, their effect on the noun-to-verb ratio, while
greater than the effect introduced by the uncertainties created by
implicit nouns and verbs, is considerably less than the effect that
could be introduced by uncertainties in the classifications
noun
and verb
[…]. It does imply that the potential for disastrous
error is high when we depend on figures not derived from identical and
therefore exhaustive definitions of noun and verb for the purpose of
comparing the nominalism
of different texts and authors.noun
or verb
.
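The sensitivity of such a ratio to classification decisions can be illustrated with a little arithmetic. The following is a minimal sketch only: the counts are invented for the purpose, the function names are hypothetical, and nothing here reproduces Coetzee's own computation. It simply shows how far a noun-to-verb ratio can move when a pool of tokens of uncertain class is assigned wholly to one category or the other.

```python
def noun_verb_ratio(nouns: int, verbs: int) -> float:
    """Return the noun-to-verb ratio for the given counts."""
    if verbs <= 0:
        raise ValueError("verb count must be positive")
    return nouns / verbs


def ratio_bounds(nouns: int, verbs: int, ambiguous: int) -> tuple[float, float]:
    """Lowest and highest ratio obtainable by assigning every
    ambiguous token to a single class."""
    lowest = noun_verb_ratio(nouns, verbs + ambiguous)   # all ambiguous tokens counted as verbs
    highest = noun_verb_ratio(nouns + ambiguous, verbs)  # all ambiguous tokens counted as nouns
    return lowest, highest


# Hypothetical counts, invented for illustration only.
low, high = ratio_bounds(nouns=100, verbs=50, ambiguous=20)
print(f"ratio lies between {low:.2f} and {high:.2f}")
```

With these invented figures, twenty uncertain tokens in a sample of one hundred and seventy are enough to move the ratio from roughly 1.43 to 2.40, which suggests why figures not derived from identical definitions are hazardous to compare.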
Generally, the comparison of the works of any two authors requires strict
definition of the terms of that comparison: the result of this comparison,
however, is destined ultimately to become primarily a comment on the act of
definition that has taken place, rather than on any inherent quality of the
texts or authors themselves. To Coetzee's mind, the only conceivable solution to
this problem that might help to square intuition with mathematics
our only recourse
, he explains, is therefore to assign different numerical weights to
different nouns and verbs, based on such criteria as their rarity, their
degree of compoundness, etc
But now we have opened the floodgates. For we are not
concerned, for example, with absolute rarity (whatever that is) but with
rarity in a context. The position becomes untenable, for no
generalization is possible, and the reason for computing the ratio in the
first place is to have a measure of nominalism in the text, i.e. to have
a generalization about a certain aggregate of particulars.
Ultimately, we are left to conclude that the use of the same index on two
separate occasions is logically counter-intuitive: whereas two words could
previously become equal by being used with the same
frequency
, Coetzee explains, the notion of equality in meaning is tenuous
We are faced, then, with a story in which statistical
analysis of the distribution of vocabulary, classification of the less
neutral diction, and analysis
en masse of
sentence structure, seem at best only to confirm our understanding of
the structure of the work and at worst to remain trapped in their own
terminology.
hold
with throw
and reveal
for example) to
the less robustly delineable association of nouns on a semantic basis
(building
with edifice
and construction
) – are
necessarily echoed in the
Perhaps the most critical of all Coetzee's observations in
inverse relation […] between rank and frequency
describable in mathematically simple terms?
Is it coincidence, or is it one instance of isomorphism between the structure of language and the structure of mathematics? In the first case the Zipf-Mandelbrot law is a useful descriptive fact, loosely a law. In the second case it is indeed tautologous, but the consequences are too immense to bear contemplation.
As Coetzee defines it in his essay on Beckett's Lessness, the Zipf-Mandelbrot Law describes the phenomenon such that in normal discourse each extension of the length of the text adds, though more and more slowly, to the number of different lexical items called on.
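The phenomenon named in the definition above, whereby each extension of a text adds ever more slowly to the stock of different lexical items, can be observed by counting distinct items over successive prefixes of a text. The sketch below is illustrative only: the function name and the sample sentence are assumptions introduced here, and tokenising by whitespace with lower-casing is a simplification rather than any procedure Coetzee describes.

```python
def vocabulary_growth(tokens: list[str]) -> list[int]:
    """For each prefix of the token list, count the distinct items seen so far."""
    seen: set[str] = set()
    growth: list[int] = []
    for token in tokens:
        seen.add(token.lower())   # treat case variants as one lexical item
        growth.append(len(seen))
    return growth


# Invented sample sentence, for illustration only.
sample = "the cat sat on the mat and the dog sat on the cat".split()
growth = vocabulary_growth(sample)
print(growth)
```

Plotted against text length, the resulting sequence never falls and flattens wherever the text repeats itself; on a real corpus one would expect a smooth, decelerating curve.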
Coetzee, then, was unwilling to contemplate in the context of his doctoral thesis
the
Depending on how you view him, he begins,
Wilhelm Fucks is a polymath of refreshing synoptic vision or another of those muscle-men of statistics (Yule, Herdan et al.) to whom a ward of kwashiorkor victims or a page of print is first of all a set of quantifiable phenomena and only secondarily people or literature.
Common in areas experiencing drought and famine, and characterised most visibly by the distension of the sufferer's abdomen, kwashiorkor is a form of malnutrition that results from insufficient intake of protein.
speakable formalized language
as a universal language for the technocratic elite
tie succeeding generations into a twentieth century positivist mythology more tightly than natural languages tie us into the mythologies of the past
The essential purpose of
the theme that the artist, like any other organism, exhibits regularities of behaviour, which can be exposed by statistical analysis
the elegantly formulable mathematical distributions underlying such phenomena as the lengths of sentences in a text and the pitches of note-pairs in a concerto
explain so patiently and with such lavish visual aids his basic procedures
seductive […] a field which many think of as rather arid
emphatically not a handbook
a compendium of investigations into intrinsically interesting stylistic topics
seduce the non-specialist reader, not least by indicating the value of stylostatistical procedures beyond the academy; in a chapter pointedly entitled
inter alia that the Gospel of St. John and Apocalypse are probably not from the same hand
Coetzee's praise is tempered, however, by his enduring conviction that the means
through which stylostatistical analysis enacts its negotiation between the
qualitative and the quantitative is intrinsically flawed. While any reasonable man must be convinced that regularities of
all kinds, regularities of stress, of syntax, of word choice, and so
forth, run through literary compositions, [and] that the set of these
patterns comprises a great deal of what we call style
overwhelming proportion of [stylostatistical indices]
either have no critical application or represent quantitative
restatements of qualitative propositions (
A's verse is more varied
than B’s
)
kind of datum that the statisticians, Fucks included, feel at home with is extremely elementary: word length, sentence length, ictus, grammatical class, depth of subordination
a whole new typology of structures
classify and count in a much more complex way
While the review is essentially concerned with commenting upon Fucks's
contribution to the discipline of stylistics, then, Coetzee's more considered
conclusions refer not strictly to issues of literary criticism, but instead to
an epistemological model that stylostatistical analysis covertly advocates. By
suggesting that it would be fairest to take this book as a work of
propaganda, a work intended to convince the uninitiated first that there
are regularities they had never suspected underlying behaviour
regularities
. Fucks, he explains, has a distaste for the
formal phenomena of the printed text
objective descriptive aesthetics
By the end of the review, then, one is left with the clear message that, whereas
Fucks' propaganda
is aimed towards assuring his
readers that a literary science of exact numerical description is a
good thing
objective analysis
omit a great deal
a
speakable formalized
language
as a universal language for the technocratic
elite
Whorf's thesis that
languages have built-in epistemological biases
In some regards, Coetzee's route from the review of
features not often encountered in connected discourse
most notable of these, Coetzee elaborates, is its
finiteness: in the sense that the text of
It is this fact, Coetzee states,
which suggests a mathematical approach to the text, an approach not only via the mathematics of indeterminacy, namely probability theory […] but also via combinatorial mathematics
From this starting point, Coetzee first establishes by means of Spearman's rank
correlation coefficients
)ranges from 1 (perfect positive correlation), through 0 (no
correlation), to -1 (perfect negative correlation).
with any acceptable degree of certainty
unit of combination in
obtain an unambiguous segmentation of the text into 106
different phrases varying in length from 1 to 12 words and occurring, on
an average, 5.7 times each
no closed subsets of phrases
there is no statistical reason for rejecting the
hypothesis that phrases are distributed randomly over paragraphs
do not fall into any […] elementary patterns
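For readers unfamiliar with the Spearman's rank correlation coefficient invoked above, the following sketch shows one common way of computing it: each sequence is converted to ranks (tied values share the mean of the ranks they span) and the ordinary Pearson correlation of the two rank sequences is then taken. The function names and sample figures are assumptions made for illustration; this is not a reconstruction of Coetzee's own procedure or data.

```python
def ranks(values):
    """Assign 1-based ranks, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman_rho(xs, ys):
    """Spearman's coefficient: the Pearson correlation of the two rank sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("coefficient is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


# Invented figures: identical orderings, then exactly reversed orderings.
print(spearman_rho([1, 2, 3, 4], [10, 20, 30, 40]))
print(spearman_rho([1, 2, 3, 4], [40, 30, 20, 10]))
```

The two calls exercise the extremes of the range: sequences ordered identically yield 1, and sequences ordered in exact reverse yield -1.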
In the sense that it generates a relatively mechanistic and conventional form of analysis, and as such is fairly typical of the discourse of computer-assisted literary criticism as it existed in 1973,
Beckett's most recent fictions, the Residua, of which Lessness is one, portray an existence whose conditions are stripped further and further down
The first level of this consciousness contains a past womb-existence, a set of figments. The second level contains the figments of the new fiction Lessness that the consciousness now inhabits: ruin, sand, body, etc. The third level contains only the pair dawn-dusk, each of which eventually cancels both the other and the figments for which the other is responsible.
In
an infinite series of nested consciousnesses, each dismissing the figments of its predecessor, is presented in the paradigm of a two-component switching mechanism, each of which ultimately
annihilates the figments of the other
there are no determinate principles of ordering among phrases, sentences or paragraphs, yet that all are interdependent and connected and that, consequently, there is
no principle of hierarchy or priority among the components of the work
valid as fiction
Since any fragment can combine with any other fragment, and since the 106 phrasal components are not only formal elements but also pretty irreducible elements of meaning, composition is a combinatorial game played with creations of what I have called the second level of the imagining consciousness – a level whose creations are dismissed as figments – and the upshot of the game is nothing more than what Sam, in Watt, called a pillow of old words.
Ultimately, then, Coetzee proposes that one ought not to take too seriously any
cumulative effects resultant from the essentially arbitrary route taken by the
consciousness enacted through Beckett's fiction, but rather to attend to the ephemeral,
non-linear motions through which it passes within the working-out of its finite
process: The residue of the fiction is not then the final
disposition of the fragments but the motions of the consciousness that
disposes them according to the rules we have traced, and no doubt others
we have failed to trace.
is the plight of consciousness in a void, compelled to
reflect on itself, capable of doing so only by splitting itself and
recombining the fragments in wholes which are never greater than the
sums of their parts
Dusklands, In the Heart of the Country, Foe,
Disgrace, Elizabeth Costello,
and Diary of a Bad Year in particular – are not unlike
Lessness, in the sense that they represent the
motions of a consciousness through an apparently disordered maze of assertions,
appearing to enact a cumulative process as the consciousness experiences,
affirms, and effaces various propositions, often paired in binary oppositions,
before seemingly arriving at fixed conclusions by the novel's end.
While the force of Coetzee’s conclusions in relation to the discipline as it stands today remains very much up for debate, his work during the period under observation here offers a unique perspective not only on the early years of the field of digital humanities, but also on the intellectual development of one of the most significant novelists of the late twentieth and early twenty-first centuries: one rarely emerges from the revolutionary battles of one's youth unmarked, and it might be said that in order to locate those marks in the war stories of a veteran, one needs to know not only that the war happened, but also the detail of each particular battle in which the storyteller fought. As such, the consequences of the present paper are threefold: firstly, critics of Coetzee’s writing ought to approach the thematisation of quantification – particularly in such works as