Abstract
The Tesserae Intertext Service Application Programming Interface (TIS-API) enhances
the machine-accessibility of the intertext discovery capabilities of the Tesserae
software. Instead of requiring inputs through a human-accessible webpage, the TIS-API
accepts inputs according to a web development standard. Two case studies demonstrate
the contributions of the TIS-API to computer-assisted literary criticism,
particularly in increased software development and maintenance flexibility as well as
in easier integration of Tesserae software into research workflows. Those interested
in integrating the TIS-API into their digital projects can find documentation at https://tesserae.caset.buffalo.edu/docs/api/. For exact implementation
details, the source code is available at https://github.com/tesserae/apitess.
Introduction
The term “intertext” is used to name a passage of text that gives evidence of one author
using another author’s words. Scholarly consensus indicates that intertexts exist
([
Juvan 2008] [
Allen 2011] for Latin literature, see [
Edmunds 2001] [
Coffee 2012a] for
Greek literature, see [
Berti 2016]). While there is great variation in the motivation for
the use of an intertext, perhaps the reuse signals an author’s erudition or enhances the
effect of an expression. For example, when Vergil in his epic
Aeneid describes the sound
of a horse galloping across the fields:
quadrupedante putrem sonitu quatit ungula campum (Aeneid 8.596).
The horses’ hooves shake the soft-soiled plain with four-footed thunder.
He is adapting a line from his epic predecessor Ennius:
. . . summo sonitu quatit ungula terram. (Ennius Annals 8.264)
The horses’ hooves shake the earth with great thunder.
With this recollection, Vergil is reusing available poetic materials, demonstrating
familiarity with his predecessor Ennius, and inviting the reader to compare the parallel
passages.
Various attempts have been made at computer-assisted intertext discovery ([
Lee 2007], [
Mastandrea 2011], [
Büchler 2013], [
Chaudhuri 2015])
[1], but none have made a conscious effort to standardize
machine-accessibility of the intertext discovery process and results. This paper
presents the Tesserae Intertext Service Application Programming Interface (TIS-API),
which specifies the rules of machine-accessibility for the intertext discovery tool
known as “Tesserae”.
We argue that machine-accessibility to Tesserae’s capabilities, as provided by the
TIS-API, is an enhancement to the existing Tesserae software. We first demonstrate that
the TIS-API does, in fact, make Tesserae’s capabilities machine-accessible by showing
how the TIS-API is designed on standard principles in the web development community, and
by walking through an example of how the TIS-API can be used to conduct a Tesserae
search in a machine-accessible way. Two case studies further explore how the TIS-API
enhances the Tesserae software. The first case study demonstrates how the TIS-API aids
collaboration among members of the Tesserae project team, particularly in the
implementation of the new Tesserae website interface. The second case study demonstrates
how someone outside of the Tesserae project team sees value in the TIS-API for making
the software easier to integrate into research workflows that aid reproducibility in
literary criticism research.
Tesserae
The Tesserae Project develops software for discovering intertexts, with particular focus
on the intertexts that appear in ancient Greek and Latin literature [
Coffee 2012b]. The
project’s main tool, called Tesserae (
https://tesserae.caset.buffalo.edu/), accomplishes this by analyzing and
comparing two texts for regions of word reuse.
For example, in order to find intertexts between Vergil’s
Aeneid and Lucan’s Bellum
Civile, Tesserae goes through both works line-by-line to find where pairs of words occur
in lines from both works. Because the
Aeneid precedes the Bellum Civile in publication,
we can assume that the
Aeneidserved as a source text from which the later poet could
draw upon for poetic raw material. Accordingly, the results from this Tesserae search
can help us find places where Lucan may have borrowed words from Vergil (see
[
Coffee 2012b] for examples).
Given the highly inflected nature of both ancient Greek and Latin, Tesserae also allows
for matching by stem. This allows for finding examples of Lucan reusing Vergil’s words
in different grammatical and morphological forms. It is also useful to exclude common
words, that is stopwords, from Tesserae searches, on the grounds that the reuse of
common words signals not so much Vergil’s influence on Lucan as much as the common
language available to the poets in composing their respective works. In summary, when
Tesserae is given (1) two texts to compare, (2) a manner in which to compare the words
of the texts (i.e., by exact word match or by stem), and (3) a list of stopwords to
ignore, instances of text reuse can be found, which in turn can serve as evidence of
intertexts.
Originally conceived of as a command line tool, the advent of the Tesserae website
interface in 2009 (
https://tesseraev3.caset.buffalo.edu/) made the delivery of its intertext
discovery capabilities much more human-accessible [
Coffee 2012b]. However, this same
interface made machine-accessibility difficult for two main reasons. First, the
submission of Tesserae searches relied on user input via web forms. Second, the results
of the Tesserae searches were returned as an HTML webpage. While these two reasons did
not make the Tesserae results impossible to access automatically, they did present
difficulties with the automatic submission of search queries and the parsing of results
because there were no guarantees that the input forms or the results webpage would
remain unchanged. Any tools built on the Tesserae site looking and performing in exactly
one way would most likely fail in the face of website updates that changed the input
forms or the layout of the search results. The lack of a machine-accessible design
created a situation in which Tesserae could be difficult to integrate with other digital
projects, short of installing and hosting a standalone instance of Tesserae.
The TIS-API eases the burden of integrating Tesserae’s intertext discovery capabilities
into other projects by permitting queries to Tesserae’s database of texts, submission of
Tesserae search requests, and retrieval of search results — all in a machine-accessible
way. The TIS-API was designed, developed, and deployed at the SUNY University at
Buffalo, in collaboration with developers at the University of Notre Dame, where backend
libraries on which the TIS-API relies were developed. The University of Notre Dame also
developed a web frontend powered by the TIS-API, demonstrating one instance in which the
TIS-API eased the burden of integrating Tesserae’s intertext discovery capabilities into
new software. As an important perspective from outside of the Tesserae Project, a
researcher at the Quantitative Criticism Lab (University of Texas at Austin) has tested
and commented on the TIS-API.
TIS-API Design
Following standard practice for machine-accessible communication across the Internet,
the TIS-API was designed according to the five REpresentational State Transfer (REST)
principles [
Fielding 2000], [
Fielding 2017]. Since the basic rules for communication over
the Internet are already defined by the HyperText Transfer Protocol (HTTP), the five
REST principles provide further guidelines on how to use HTTP for effective
machine-to-machine communication.
The fundamental assumption of the five REST principles is that the only thing a machine
can request from another machine is a resource (Figure 1).
In the original
vision of the five REST principles, a resource typically took the form of a webpage or
some multimedia asset, like an image. For Tesserae’s intertextual discoveries, one
resource would be the discoveries themselves — the results from a Tesserae search. Other
resources, such as text information and search options, aid in the production of those
discoveries.
Starting from these basic assumptions about resources, the first REST principle is to
have a name for every resource (Figure 2).
This allows one machine to make a
request on a specific resource from another machine. For example, the name
http://www.buffalo.edu/ refers to the resource that is the main web page for the
University at Buffalo. The TIS-API specifies naming conventions for the resources
necessary to carry out Tesserae searches. For example, the naming conventions require
that “/texts/” be part of the name for resources referring to texts available in a
Tesserae database. More examples of names and the TIS-API naming conventions will be
shown later in the paper.
The second REST principle is that the way one specific request behaves for one resource
is similar to the behavior of the same request on a different resource (Figure 3).
Within the rules of HTTP, there are various requests that can be made.
One common request is GET, which requests retrieval of the resource. Going back to the
University at Buffalo main web page example, a computer requesting a GET for
http://www.buffalo.edu/ is asking another computer to retrieve the University at
Buffalo main webpage. The TIS-API specifies that requesting a GET for
“<webhost>/texts/” (where “<webhost>” is replaced with the URL referring to
a server on which the TIS-API is installed) will retrieve information about the texts
available in a Tesserae database.
The third REST principle is that modification of a resource is mediated through
instructions, which are called representations (Figure 4).
For the
TIS-API, this principle is highlighted particularly in the way Tesserae searches are
submitted. To submit a Tesserae search, a machine must request a POST (like GET, POST is
one of the requests that can be made through HTTP) on the “<webhost>/parallels/”
resource and provide Tesserae search options in a particular format. The particularly
formatted search options would be a representation to which the third REST principle
refers.
The fourth REST principle is that representations are self-descriptive. Admittedly, the
TIS-API does not conform to this principle directly. However, one way in which the
TIS-API adheres to the spirit of this principle is in the search results resource.
Search results resources are defined to contain the search options used to produce the
results available in the resource. In this way, it would be possible to know exactly
what options produced these results (Figure 5). This also enables caching of
results, which can lead to a better user experience in the form of reduced search
times.
The fifth and final REST principle is that application state information should not be
stored; instead, it should be passed as a representation. To unpack the meaning of this,
it is important to consider another assumption about the REST principles: when one
computer communicates with another computer, the two are in a server-client
relationship. The server is the computer that holds the resource that a client wishes to
make a request on. As a result, the client always initiates contact with the server.
Application state refers to a step in some process that the server and client navigate
together.
A common server-client application is online shopping. The first step is for the client
to submit a set of items to purchase; then, the server needs to calculate various things
(like whether the items are in stock, how much the order will cost, taxes to collect,
and other business-related tasks); then, the client needs to provide authorization to
pay for the purchase; finally, the server should notify the client that the order has
been placed and that the purchase has been billed. According to the fifth REST
principle, neither the client nor the server should store information specifying which
step they are on. Rather, the exchange of information they conduct as part of that
process should specify which step they are on.
The TIS-API exhibits compliance with the fifth REST principle in the way it handles
Tesserae search submissions (Figure 6).
For the Tesserae search application, the
client first submits a Tesserae search; the server then needs to run the search and
store the results; finally, the client should be able to access the results. Since
Tesserae search may take longer than a user is willing to wait, it seemed important to
design the search application in such a way that the client would not be stuck waiting
for the server to respond with search results. Designing the application so that the
server would first respond to the client with the resource name for completed search
results not only freed the client from waiting on the server but also made storing
application state in either the server or the client unnecessary. The client can make a
request on the completed search results, and if the server responds that they aren’t
ready yet, the client can wait to make a request later.
The previous points demonstrate that each of the five REST principles were considered in
certain aspects of the TIS-API. While these examples do not describe all the ways in
which the five REST principles were employed in the design of the TIS-API, these
examples should serve to substantiate the claim that the five REST principles were
important considerations during the design of the TIS-API. Because of this, the TIS-API
provides machine-accessibility to Tesserae’s capabilities.
Tesserae Search via the TIS-API
Though claims of machine-accessibility can be made through the REST principles, it can
be more convincing to show machine-accessibility by walking through the process of
submitting a Tesserae search through the TIS-API (see Figure 7 for a
high-level workflow of the process). For those who prefer directly reading the behavior
prescribed by the TIS-API instead of going through this example, the TIS-API
documentation site is available at
https://tesserae.caset.buffalo.edu/docs/api/. For those who want to know exact
details about how the TIS-API was implemented, source code is available at
https://github.com/tesserae/apitess.
Anatomy of a Tesserae Search Query
To start the discussion of how the TIS-API can be used to perform a Tesserae search, the
component parts of a Tesserae search query should be reviewed. Although the details for
the Tesserae search process have already been described [
Forstall 2014], the focus here
is to have a simplified but strong conceptual grasp on the parts involved in a Tesserae
search. By understanding what the building blocks are for defining a search, it will be
easier to understand how each of these blocks are addressed by the TIS-API. The aspects
of the Tesserae search input discussed in this paper are:
- source text
- target text
- type of feature used to find matches
- stopwords
To frame the discussion of these inputs, it will be helpful to state what the Tesserae
search is supposed to do: the objective of Tesserae search is to find intertexts between
two works. Thinking of Tesserae search in this way, it becomes immediately obvious that
two works need to be specified in the input. Because Tesserae was originally developed
as a tool for intertext detection, a relationship between the two works is assumed:
namely, that one work influenced the other. For this reason, a Tesserae search query is
formulated in terms of a “source text” (i.e., the work that influences the other) and a
“target text” (i.e., the work that was influenced by the other).
Another aspect fundamental to specifying a Tesserae search query is defining what counts
as an intertext. For purposes of the Tesserae search algorithm, an intertext is defined
by the matching of at least two features between two passages, where one passage comes
from the source text, and the other passage comes from the target text. Thus, it becomes
necessary to specify which type of feature to compare when looking for matching
features.
One feature type is the lemma type, which can be thought of as a dictionary headword.
When looking up the definition of a given word in the dictionary, it is sometimes
necessary to change the spelling of the word to find the correct entry. For example, the
definition for the word “love” in “I love to swim” is found in the dictionary under the
lemma “love”. In this example, no spelling change is necessary to find the lemma.
However, the definition for the word “loves” from “He loves swimming” is also found
under the lemma “love”. It is necessary to remove the final “s” from “loves” in order to
find the lemma.
If lemmata (the plural of lemma) are used to search for intertexts, only passages that
share at least two different lemmata will be considered a potential intertext. For
example, suppose we have the following passages (again):
- I love to swim
- He loves swimming
Note that passage 1 contains the words “love” and “swim”, both of which also look
identical to their lemmata. Passage 2 likewise contains the lemmata “love” (from “loves”)
and “swim” (from “swimming”). From this information, it can be seen that a Tesserae
search searching for intertexts based on the lemma feature would find that these two
passages are a potential intertext.
A final aspect to consider in this simplified conception of the inputs to a Tesserae
search is the matter of stopwords. Often, there are features we believe will be
unimportant in determining whether one passage refers to another. For example, we may
find common words like articles and prepositions to be irrelevant. In this case, we
would like a Tesserae search to ignore passages that match only because of articles and
prepositions. To specify features that we think are not interesting, we can use a
stopwords list to prevent those features from being counted when determining whether two
passages are a potential intertext.
Knowing now that a source text, a target text, a feature definition, and a stopwords
list are important options to specify in a Tesserae search, it is clear that these
options will need to be specified to the TIS-API when submitting a Tesserae search
query.
API Concepts
Before discussing how to use the TIS-API to perform Tesserae search, some terminology
and conventions should be noted. First, the meaning of “TIS-API” should be more
carefully stated. The TIS-API, in its strictest sense, is a set of rules that defines
server behavior in response to client requests. An implementation of the TIS-API,
therefore, is code that, when run, behaves in accordance with the rules that were
defined. When this code is running, we can say that a computer is “running an instance
of the TIS-API.” Since the code allows a computer to follow the rules of the TIS-API and
since the deployed software exhibits the behaviors defined by the rules of the TIS-API,
the use of “TIS-API” can slip from its strict sense into a more inclusive (and therefore
ambiguous) sense that may denote either the rules, the code, or the deployed
software.
The angle brackets (“<,” “>”) convention used in resource names should also be
explained explicitly. Just as earlier in the paper, the angle brackets (“<,” “>”)
denote a placeholder; the word(s) within the angle brackets hint at what should replace
this placeholder. For example, as seen earlier, “<webhost>/texts/” specifies a
name for interacting with text resources. However, the name does not literally contain
“<webhost>”; rather, “<webhost>” should be replaced with an identifier to a
running instance of the TIS-API. For example, if
https://tesserae.caset.buffalo.edu/api is the URL to a running instance of the
TIS-API,
https://tesserae.caset.buffalo.edu/api/texts/ (note that “/texts/” has been
added to the URL) is the name referring to the text resources available in that
instance.
Finally, the term “endpoint” should be defined. Note that the initial part of the URL is
largely irrelevant to the defined behavior for the TIS-API. In other words, whether
https://tesserae.caset.buffalo.edu/api or
https://www.tis-api.com is used to work with
texts, the TIS-API will respond in a similar manner — and it would behave exactly the
same if both web hosts had the same text information in their respective databases.
Therefore, the part of the name that distinguishes behavior is what comes after the web
host’s identifier. These distinguishing patterns that come at the end of the URL are
known as “endpoints.” By convention, these endpoints are referred to not by their full
URL; rather, they are referred to by the portion of the URL that comes after the web
host’s URL. For example, “/texts/” is the TIS-API endpoint that deals with texts.
Using the TIS-API to Run a Tesserae Search
With a clear picture of Tesserae search options as well as the vocabulary used to
express APIs, it is possible to consider how the TIS-API can be used to build and submit
a Tesserae search query. The first endpoint to consider is “/parallels/”, which accepts
Tesserae search requests. By performing a POST request on the “/parallels/” endpoint and
providing the Tesserae search options in a particular machine-readable format, a
Tesserae search query can be submitted. Exact details on this particular
machine-readable format can be found by consulting the documentation; for purposes of
this paper, it is sufficient to know that the “/parallels/” endpoint accepts Tesserae
search query submissions.
Ignoring the specifics of the Tesserae search query submission format for the
“/parallels/” endpoint, earlier discussion has shown that a source text, a target text,
a feature to search by, and a stopwords list must be specified. The TIS-API can be
useful in choosing these options.
Since Tesserae can perform intertext discoveries only between texts already in its
database, we can only reference a source text and a target text already in the Tesserae
database when submitting a Tesserae search. How does the TIS-API provide access to
information about texts in the Tesserae database? As mentioned earlier, text information
can be obtained at the “/texts/” endpoint. Again, there are specific details about the
“/texts/” endpoint that may be useful (such as querying the Tesserae database for texts
written by a particular author) which can be found by consulting the documentation. But
for purposes of this paper, it is sufficient to know that the “/texts/” endpoint can be
used to discover what texts are available to be used as a source or target text. From
the information returned by performing a GET request on the “/texts/” endpoint,
identification codes used by the database to refer to specific texts can be obtained.
These identifiers can specify a source and a target text in the information submitted in
a POST to “/parallels/”.
Although the TIS-API does not specify a convenient endpoint for determining which search
features can be specified, the documentation does specify valid options. Furthermore,
the TIS-API defines that when an invalid option is given, the response will include
information indicating invalidity of the request.
The TIS-API, however, does have a “/stopwords/” endpoint that can be useful in choosing
a stopwords list. Recall that stopwords lists are often composed of items that appear
most frequently in the language. Given the appropriate information (as defined in the
documentation), the “/stopwords/” endpoint consults the Tesserae database and returns
the most frequently appearing features (Figure 8).
In other words, the
“/stopwords/” endpoint can be used to collect the top N most frequently appearing
features, where N is a number that can be specified. That list can be used as the
stopwords list given to “/parallels/” when submitting a Tesserae search query.
It has been shown that the TIS-API provides the “/parallels/” endpoint for submitting a
Tesserae search query, as well as the “/texts/” and “/stopwords/” endpoints to aid in
choosing Tesserae search options. To retrieve search results, the TIS-API specifies the
“/parallels/<uuid>/” endpoint, where “<uuid>” is replaced by a search
submission identifier (“UUID” is an acronym for “Universally Unique IDentifier”, which
refers to a string of letters and numbers used to identify something). The identifier
for a particular search can be found in the response data given by the “/parallels/”
endpoint on a successful query submission. In particular, a URL that ends in
“/parallels/<uuid>/” will be specified. In other words, a successful Tesserae
search submission will provide the URL to retrieve the search results. Importantly, that
URL is transmitted in a standard, machine-readable way.
For search queries that involve very small texts, going directly to the URL as soon as
the response from “/parallels/” comes back may permit retrieving the search results
immediately. However, when longer texts are chosen, Tesserae search may take more time
to complete. In that case, results would not immediately appear. Instead, requesting a
GET at the result URL would yield a response indicating that the resource is not
available. To ensure that the search has been submitted, the
“/parallels/<uuid>/status/” endpoint, which retrieves status information on the
search submission (such as whether the search is in process or if it failed), could be
used. When the status indicates that the search is complete, it would be possible to go
to the results URL and find the search results.
Case Study: Running Search on the New Tesserae Website
Having seen a theoretical use case for the TIS-API, it seems worthwhile to consider a
practical use case for the TIS-API as well. To this end, we present the new Tesserae
website, which is available at
https://tesserae.caset.buffalo.edu/ (Figure 9 shows what the
website looks like at the time of writing when displaying results). This example
demonstrates how the TIS-API, though designed for machine-accessibility, can still be
used to simulate the traditional human-accessible experience that Tesserae has offered
and that its user base in classics scholarship (of widely varying technical comfort) has
become familiar with.
In order to perform a Tesserae search, the user first needs to choose source and target
texts. The new website displays the source and target text options in dropdown menus,
found in the top left part of the interface (see Figure 10). Behind the scenes,
the website uses the “/texts/” endpoint to find what texts are available for comparison
from the TIS-API. It then populates the dropdown lists with the information received
from the “/texts/” endpoint. The user can then select source and target texts with these
dropdown menus. In the example image (Figure [uiscreenshot]), the user has selected
Vergil’sAeneidas the source text and Lucan’s Bellum Civile as the target text.
The user might also wish to have stopwords for the Tesserae search. On the new website,
the number of stopwords to be used can be specified under the “Advanced Options” (see
Figure 12). Behind the scenes, the website uses the parameters specified
under “Advanced Options” to calculate a stopwords list using the “/stopwords/” endpoint.
In the example image (Figure 13), the user has left the slider bar under
“Stoplist Size” at the default of 10 stopwords. The example image further shows that the
stopwords will be calculated based on the corpus available to the TIS-API, as shown by
the selection under “Stoplist Basis”.
To submit the Tesserae search for processing, the user simply hits the “SEARCH” button
(see Figure 12). This action causes the website to submit the search parameters
to the “/parallels/” endpoint. Once the search is complete, the website automatically
calls on the “/parallels/<uuid>/” endpoint to retrieve the results of the search
and displays those results on the main part of the interface (see Figure
13).
One tangential benefit of defining the new website’s functionality in terms of the
TIS-API was development flexibility. Because the TIS-API defines expected behavior, the
website developer could work on the website at the same time that the API developer was
implementing the expected behavior. This allowed for important feedback informed by the
website developer’s user experience. That feedback resulted in additional behaviors
(e.g., paging of the search results, more detailed search status information) that
improved the API and website together. As the two efforts came together for deployment
of the new Tesserae website, the integration of the website with the TIS-API
implementation required relatively little work (deployment itself, along with making
sure that the user interface looked right, took most of the development time). This
demonstrates the utility of the TIS-API internal to the Tesserae Project. Other digital
humanities developers may want to consider this benefit when deciding whether they too
can afford to improve their software’s ability to integrate with other software.
Case Study: Harnessing the TIS-API for Research
While the new Tesserae website demonstrates the advantages of the TIS-API for members of
the Tesserae Project, it is also important to establish the advantages conferred to
researchers not directly affiliated with the project. This section describes one such
researcher’s experience with the TIS-API.
For writing his 2017 article
Measuring and Mapping Intergeneric Allusion using
Tesserae for a special issue of the Journal of Data Mining and Digital Humanities on
Computer-Aided Processing of Intertextuality in Ancient Languages [
Burns 2017], one of
the co-authors of this article found himself in the position of downloading the results
of multiple Tesserae web searches in order to analyze not just an intertextual
comparison of two authors, but rather an intertextual comparison of one author (the
Latin epic poet Lucan) against several authors representative of a specific genre (the
Latin love elegists Propertius, Tibullus, and Ovid). The searches had to be conducted
individually and care needed to be taken to ensure that each search used the same
parameters, including stopword lists and cutoff scores, among other settings. Moreover,
the author needed to be sure that this consistency in parameter selection could be
maintained between sessions, with sometimes weeks or even months passing between
searches. At the time, these factors were documented by the author in research notes,
but from testing the TIS-API, it is clear how this process could have been improved had
the new set-up been available: a similar paper written now could document the exact
TIS-API calls for all of the author-to-author comparisons in a script with parameters
specified upfront as constants.
This enhancement to the research process made possible through the TIS-API could be seen
as a variation on software development’s DRY — don’t repeat yourself — principle
[
Hunt 2000]. In programming, DRY argues for a reduction of duplicate code to increase
writing efficiency, reduce opportunities for errors to be introduced, and decrease
maintenance costs. The TIS-API allows for analogous benefits in a research context:
searches can be formalized and then run and rerun without duplicative effort. To
paraphrase Andy Hunt and Dave Thomas’s oft-cited definition of the DRY principle, the
TIS-API makes it possible for every Tesserae search to “have a single, unambiguous,
authoritative representation” [
Hunt 2000, 27] within a study.
Accordingly, code-based calls to the TIS-API should be considered now a best practice
for researchers needing to aggregate the results of multiple Tesserae searches, as for
example papers such as [
Bernstein 2015]. That said, even with respect to research making
use of even a single Tesserae search, the ability to document the request in the form of
an API request is valuable. This is because, no matter the number of searches, being
able to document explicitly decisions made and actions taken in the course of gathering
data plays an important role in making that research reproducible. In other words,
code-based calls to the TIS-API improve researchers’ experience by making decisions
explicitly documented at the time of data-gathering, rather than as an afterthought. It
has been argued that researchers should aim for well-documented and reproducible
workflows as part of a critical digital humanities [
Dobson 2019]. It can also be argued
that academic developers can support researchers in reaching this goal by making it
easier to incorporate their software tools into research workflows. The TIS-API is a
move in this direction.
One final point about using Tesserae searches in research workflows: at present,
running, say, dozens or hundreds of Tesserae searches using the web form could be seen
as a labor-intensive proposition, enough so as to be a disincentive to exploring
research questions demanding this volume of web-form entry. Accessing the TIS-API
through a script, on the other hand, greatly improves the researcher’s experience by
reducing the labor of specifying Tesserae searches, which in turn reduces the risk of
inconsistent querying and related errors that add more labor in data checking and
correction or even having to rerun completed but unusable searches. With this in mind,
one can imagine that availability of the TIS-API, by reducing the difficulty of
collecting and collating multiple searches, will encourage more large-scale
Tesserae-based studies.
Future Work
For all of the benefits made possible through the TIS-API, we recognize that the API
could be improved. One improvement in particular we are considering is a way to document
which version of the software produced a given set of results. This will be important as
software development continues on Tesserae’s core functionality. While such development
aims to improve Tesserae results, those improvements would come at the cost of
reproducibility. It is possible that the development improvement would cause results
obtained through one set of parameters on a given date to be different from the results
obtained through the same set of parameters on a different date. By explicitly
documenting which version of the Tesserae code produces a given set of results,
reproducibility of results could be guaranteed while also permitting incremental
improvements to Tesserae’s core software.
It will also be interesting to see how the TIS-API serves the growing movement within
digital classics to provide better interoperability between collections and tools
emerging from different projects within the field [
Burns 2019]. Such cross-project
collaborations within digital classics hold much promise for enabling new modes of
inquiry across well-known editions and newly digitized works.
Beyond digital classics, we hope the TIS-API will serve as a helpful point of comparison
for other digital humanities projects. A future in digital humanities where larger
questions can be answered through the aggregation of data from multiple sources will be
more easily realized when individual projects provide an API to make their data more
machine-accessible. The considerations made in the design and implementation of the
TIS-API, including the adoption of REST principles and the analysis of Tesserae’s
intertext discovery process, can serve as inspiration for other projects as they make
decisions on how to design and implement their own APIs.
Conclusion
The TIS-API enables machine-accessibility of Tesserae intertext discovery capabilities.
Machine-accessibility is achieved by following the software industry standards known as
the REST principles, which encouraged thinking about Tesserae’s capabilities as
resources and considering how to act upon those resources. In particular, the REST
principles are evident in how Tesserae’s texts are made available, how stopwords are
computed from Tesserae’s text collection, and how Tesserae search is implemented without
saving application state either in the client or the server.
Making Tesserae machine-accessible yields two main benefits. First, it allows for
parallelizing development efforts in the Tesserae software, as evidenced by the
simultaneous development of both the TIS-API implementation and the new Tesserae
website. The second benefit is in the promising possibilities for those not affiliated
with the Tesserae project. Specifically, the ability to document search procedures
exactly should lead to improved scholarly practices in computational literary criticism.
We expect that other digital humanities projects can likewise reap benefits from
upgrading their software with machine-accessibility.
Acknowledgments
This work was made possible by NEH Digital Humanities Advancement Grant
HAA-258767-18. PJB acknowledges the support of the Quantitative Criticism Lab and the
Institute for the Study of the Ancient World while conducting this work.
Works Cited
Allen 2011 Allen, G. Intertextuality. Routledge, London. (2011).
Bernstein 2015 Bernstein, Neil, Kyle Gervais, and Wei Lin. “Comparative rates of text reuse in classical Latin hexameter poetry.” Digital Humanities Quarterly 9.3 (2015).
Burns 2017 Burns, Patrick J. “Measuring and Mapping Intergeneric Allusion in Latin
Poetry using Tesserae.”
Journal of Data Mining and Digital Humanities (2017).
https://jdmdh.episciences.org/3821.
Burns 2019 Burns, Patrick J. “Building a text analysis pipeline for classical
languages.” In Digital Classical Philology: Ancient Greek and Latin in the Digital
Revolution. Berlin: De Gruyter (2019). pp. 159–76.
Büchler 2013 Büchler, M. “Informationstechnische Aspekte des Historical Text Re-use
(English: Computational Aspects of Historical Text Re-use).” Ph.D. thesis, Leipzig
(2013). See also
http://www.etrap.eu/research/tracer/.
Chaudhuri 2015 Chaudhuri, Pramit, Joseph P. Dexter, and Jorge A. Bonilla Lopez.
“Strings, Triangles, and Go-betweens: Intertextual Approaches to Silius’ Carthaginian
Debates.”
Dictynna. Revue de poétique latine 12 (2015). See also
https://www.qcrit.org/filum.
Coffee 2012a Coffee, N. “Intertextuality in Latin Poetry.” Oxford Bibliographies in
Classics. D. Clayman, (ed). Oxford University Press, New York (2012).
Coffee 2012b Coffee, N., J.-P. Koenig, S. Poornima, R. Ossewarde, C. Forstall and S.
Jacobson. “Intertextuality in the Digital Age.” TAPA 142, no. 2 (2012): 381-419.
Dobson 2019 Dobson, J.E. Critical Digital Humanities: The Search for a Methodology.
University of Illinois Press, Champaign, IL (2019).
Edmunds 2001 Edmunds, L. Intertextuality and the Reading of Roman Poetry. Johns Hopkins
University Press, Baltimore (2001).
Fielding 2000 Fielding, Roy T. “Architectural styles and the design of network-based
software architectures.” Vol. 7. Ph.D. thesis, University of California, Irvine
(2000).
Fielding 2017 Fielding, Roy T., Richard N. Taylor, Justin R. Erenkrantz, Michael M.
Gorlick, Jim Whitehead, Rohit Khare, and Peyman Oreizy. “Reflections on the REST
architectural style and principled design of the modern web architecture (impact paper
award).” Proceedings of the 2017 11th Joint Meeting on Foundations of Software
Engineering. ACM (2017).
Forstall 2014 Forstall, Christopher, Neil Coffee, Thomas Buck, Katherine Roache, and
Sarah Jacobson.
Modeling the scholars: Detecting intertextuality through enhanced
word-level n-gram matching. Digital Scholarship in the Humanities 30, no. 4 (2014):
503-515. See also
http://tesserae.caset.buffalo.edu/.
Hunt 2000 Hunt, A. and Thomas, D. The Pragmatic Programmer: From Journeyman to Master.
Addison-Wesley, Boston (2000).
Juvan 2008 Juvan, M.
Towards a History of Intertextuality in Literary and Culture
Studies. CLCweb: Comparative Literature And Culture 10, no. 3 (2008).
https://doi.org/10.7771/1481-4374.1370.
Lee 2007 Lee, John. “A computational model of text reuse in ancient literary texts.”
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics
(2007).
Mastandrea 2011 Mastandrea, Paolo, Massimo Manca, L. Spinazzè, L. Tessarolo, and F.
Boschetti. “Musisque Deoque: Text Retrieval on Critical Editions.”
Journal for Language
Technology and Computational Linguistics 26 (2011): 129-140. See also
http://mizar.unive.it/mqdq/public/.