Language DNA: Visualizing a Language Decomposition
Adam James Bradley, University of Waterloo; Travis Kirton, University of Calgary; Mark Hancock, University of Waterloo; Sheelagh Carpendale, University of Calgary
Abstract
In the Digital Humanities, there is a fast-growing body of research that uses
data visualization to explore the structures of language. While new techniques
are proliferating, they still fall short of supporting experimentation with an
entire language. We provide a mathematical technique that maps words and symbols
to ordered unique numerical values, showing that this mapping is one-to-one and
onto. We demonstrate this technique through linear, planar, and volumetric
visualizations of data sets as large as the Oxford English Dictionary and as
small as a single poem. The visualizations of this space have been designed to
engage the viewer in the analogic practice of comparison already in use by
literary critics but on a scale inaccessible by other means. We studied our
visualization with expert participants from many fields including English
studies, Information Visualization, Human-Computer Interaction, and Computer
Graphics. We present our findings from this study and discuss both the
criticisms and validations of our approach.
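As a rough illustration of what a one-to-one and onto mapping between words and ordered numbers involves, here is a minimal Python sketch. It uses bijective base-26 numeration, a standard technique swapped in for illustration; it is not the authors' published method. The 26-letter lowercase alphabet and the names word_to_number and number_to_word are assumptions. Every string over the alphabet, including the empty string, receives exactly one natural number, and every natural number decodes to exactly one string.

```python
import string

# Assumed alphabet: the 26 lowercase Latin letters (hypothetical choice;
# the article's actual symbol set is not given in the abstract).
ALPHABET = string.ascii_lowercase
BASE = len(ALPHABET)


def word_to_number(word: str) -> int:
    """Map a word to a unique natural number (bijective base-26 numeration).

    Digits run from 1 ('a') to 26 ('z') with no zero digit, so no two
    words share a number. Raises ValueError for characters outside ALPHABET.
    """
    n = 0
    for ch in word:
        n = n * BASE + ALPHABET.index(ch) + 1
    return n


def number_to_word(n: int) -> str:
    """Invert word_to_number: every n >= 0 decodes to exactly one word."""
    chars = []
    while n > 0:
        # Subtract 1 first because digits are 1..26 rather than 0..25.
        n, r = divmod(n - 1, BASE)
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))
```

For example, word_to_number("cat") gives 2074, and number_to_word(2074) returns "cat". The order this produces is shortlex (shorter words first, then alphabetical within each length), not strict dictionary order; a mapping that follows dictionary order exactly would need a different construction.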
The Archive as Repertoire: Transience and Sustainability in Digital Archives
Miguel Escobar Varela, National University of Singapore
Abstract
Digital archives change more quickly than traditional ones: they are adaptable
and transient. This has advantages and disadvantages: digital archives can
disappear from sight almost instantly, but they can also be easily safeguarded
and restored. In the critical vocabulary of performance studies, digital
archives could thus be understood as “repertoires” rather than traditional
archives. By treating digital archives as repertoires, this article explores
different threats and opportunities presented by their volatile nature and makes
policy and technical recommendations on how to ensure their relevance and
sustainability.
Digital library search preferences amongst historians and genealogists: British History Online user survey
Adam Crymble, University of Hertfordshire
Abstract
This paper presents the results of a study of 1,439 users of British History
Online (BHO). BHO is a digital library of key printed primary and secondary
sources for the history of Britain and Ireland, with a principal focus on the
period between 1300 and 1800. The collection currently contains 1,250 volumes
and 120,000 web pages of material. During a website rebuild in 2014, the project
team asked its registered users about their preferences for searching and
browsing the content in the collection. Respondents were asked about their
current search and browsing behaviour, as well as their receptiveness to new
navigation options, including fuzzy searching, proximity searching, limiting
search to a subset of the collection, searching by publication metadata, and
searching entities within the texts such as person names, place names, or
footnotes. The study provides insight into the unique and often converging needs
of the site’s academic and genealogical users, noting that the former tended to
respond in favour of options that gave them greater control over the search
process, whereas the latter generally opted for options to improve the efficacy
of targeted keyword searching. Results and recommendations are offered.
Machine Reading the Primeros Libros
Hannah Alpert-Abrams, University of Texas at Austin
Abstract
Early modern printed books pose particular challenges for automatic
transcription: uneven inking, irregular orthographies, radically multilingual
texts. As a result, modern efforts to transcribe these documents tend to produce
the textual gibberish commonly known as “dirty OCR” (Optical Character
Recognition). This noisy output is most frequently seen as a barrier to access
for scholars interested in the computational analysis or digital display of
transcribed documents. This article, however, proposes that a closer analysis of
dirty OCR can reveal both historical and cultural factors at play in the
practice of automatic transcription. To make this argument, it focuses on tools
developed for the automatic transcription of the Primeros
Libros collection of sixteenth-century Mexican printed books. By
bringing together the history of the collection with that of the OCR tool, it
illustrates how the colonial history of these documents is embedded in, and
transformed by, the statistical models used for automatic transcription. It
argues that automatic transcription, itself a mechanical and practical tool,
also has an interpretive effect on transcribed texts that can have practical
consequences for scholarly work.
Information access in the art history domain: Evaluating a federated search engine for Rembrandt research
Suzan Verberne, Radboud University; Lou Boves, Radboud University; Antal van den Bosch, Radboud University
Abstract
The art history domain is an interesting case for search engines tailored to the
digital humanities, because the domain involves different types of sources
(primary and secondary; text and images). One example of an art history search
engine is RemBench, which provides access to information in four different
databases related to the life and works of Rembrandt van Rijn. In the current
paper, RemBench serves as a case to (1) discover the requirements for a search
engine that is geared towards the art history domain and (2) make
recommendations for the design of user observation studies for evaluating the
usability of a search engine in the art history domain, and in digital
humanities at large.
A user observation study with nine participants confirms that the combination of
different source types is crucial in the art history domain. With respect to the
user interface, both free-text search and facet filtering are actively used by
the observed participants, but we observe strong individual preferences. Our key
recommendation for specialized search engines is the use of faceted search (free
text search combined with filtering) in combination with federated search
(combining multiple resources behind one interface). In addition, the user study
shows that the usability of domain-specific search engines can successfully be
evaluated using a thinking-aloud protocol with a small number of participants.
Attention Ecology: Trend Circulation and the Virality Threshold
Nicholas M Van Horn, Capital University; Aaron Beveridge, University of Florida; Sean Morey, University of Tennessee, Knoxville
Abstract
This article demonstrates the use of data mining methodologies for the study and
research of social media in the digital humanities. Drawing from recent
convergences in writing, rhetoric, and DH research, this article investigates
how trends operate within complex networks. Through a study of trend data mined
from Twitter, this article suggests the possibility of identifying a
virality threshold for Twitter trends, and the possibility that
such a threshold has broader implications for attention ecology research in the
digital humanities. This article builds on the theories of Jacques Derrida,
Richard Lanham, and Sidney Dobrin to suggest new theories and methodologies for
understanding how attention operates within complex media ecologies at a
macroscopic level. While many theories and methods have investigated
writing, rhetoric, and digital media at the microscopic level, this article
contends that a complementary macroscopic approach is needed to further
investigate how attention functions in network culture.
A Macroscope for Global History: Seshat Global History Databank, a methodological overview
Pieter François, University of Hertfordshire, University of Oxford; J.G. Manning, Yale University; Harvey Whitehouse, University of Oxford; Rob Brennan, Trinity College Dublin; Thomas Currie, University of Exeter, Penryn Campus; Kevin Feeney, Trinity College Dublin; Peter Turchin, University of Connecticut
Abstract
This article introduces the “Seshat: Global History”
project, the methodology it is based upon and its potential as a tool for
historians and other humanists. Seshat is a comprehensive dataset covering human
cultural evolution since the Neolithic. The article describes in detail how the
Seshat methodology and platform can be used to tackle big questions that play
out over long time scales whilst allowing users to drill down to the detail and
place every single data point both in its historic and historiographical
context. Seshat thus offers a platform underpinned by a rigorous methodology to
actually do longue durée history, and the article argues for the
need for humanists and social scientists to engage with data-driven longue
durée history. The article argues that Seshat offers a much-needed
infrastructure in which different skill sets and disciplines can come together
to analyze the past using long timescales. In addition to highlighting the
theoretical and methodological underpinnings, Seshat's potential is demonstrated
using three case studies. Each of these case studies is centred around a set of
longstanding questions and historiographical debates, and it is argued that the
introduction of a Seshat approach has the potential to radically alter our
understanding of these questions.
Racial Proxies in Daily News: A Case Study of the Use of Directional Euphemisms
Timothy Messer-Kruse, Bowling Green State University
Abstract
This study examines the extent to which geographic code words are used in place
of racial terms in daily news reporting. It is a case study of the Toledo Blade,
the only daily newspaper in the Midwestern city of Toledo, Ohio. A data set of
981 stories was constructed by searching a nine-year collection of Blade
articles, available in full-text searchable format in a ProQuest database, for
stories that included the most frequently used directional terms and gave
specific street addresses.
Besides bibliographic data, each story was coded for its location and the
general nature of the story. Street addresses were used to compile relevant
census tract information on the proportion of minorities in each area
referenced. These references were then plotted over a street map of Toledo,
revealing geographic distributions that do not correspond to actual cardinal
directions. Population data corresponding to each data point was then analyzed
to show that directional terminology correlates with the concentration of
minority population. Additionally, a comprehensive content analysis of all
21,667 Blade articles published in this period revealed racial differences in
reporting. Such quantified observations are reinforced by an examination of
particular examples of the racialized usage of geographic terms.
Obama’s Sixth Annual Address: Image, Affordance, Flow
Dan Faltesek, Oregon State University
Abstract
Recent State of the Union addresses have included a number of new visual
elements, including a running slide show and interactive social media cards.
This paper proposes a method for collecting and analyzing these new visual
elements and incorporating the results into the study of presidential
rhetoric. This article will (1) situate the enhanced State of the Union within
the study of presidential rhetoric, (2) combine aspects of close and distant
reading for critique of the address, (3) provide the results of the approach to
distant reading taken here, and (4) discuss the implications of the analysis of
this particular visual program for the opportunities and constraints it affords
future annual addresses.
Developing a Qualitative Coding Analysis of Visual Artwork for Humanities Research
Tina Budzise-Weaver, Texas A&M University
Abstract
The humanities have now expanded into a digital environment that challenges
educators and scholars to create, manipulate, and curate data for research and
instruction. The humanities face a digital medium that is changing the way
scholars conduct research. This study encourages the examination of imagery
through qualitative coding, or annotation, to reveal themes and visual stories
that further unravel the layers of a visual object.
Images from the work of 1960s pop artists James Rosenquist and Roy Lichtenstein
were evaluated using ATLAS.ti to determine common themes, visual stories, and
aesthetic differences. Qualitative coding is usually associated with textual
data, but using a software analysis tool such as ATLAS.ti can centralize the
collection of data to efficiently code imagery, text, audio, and video. This
case study will be used to introduce researchers, faculty, and students to
qualitative analysis tools and the usefulness of coding to reveal themes in
imagery. Furthermore, librarians have an opportunity to facilitate the learning
of these tools in combination with the various proprietary and open access image
databases housed in the library.
The Digital Future of Humanities through the Lens of DIY Culture
Henriette Roued-Cunliffe, University of Copenhagen
Abstract
This paper asks the question: Do the humanities by necessity have a digital
future? It argues that the answer to this question is both yes and no. The
argument looks through the lens of DIY culture in an attempt to understand the
future of the humanities in terms of both cultural material and
processes. The argument is made first by examining the case of information
sharing within DIY culture as an expression of present-day cultural material.
Second, it illustrates how traditional humanities scholarship, such as reading
ancient documents, compares to its DIY equivalent within family history
circles, and how both will continue to use digital and non-digital methods.
Complex Modeling and You: A Review of Would-Be Worlds by John L. Casti (New York: John Wiley & Sons, 1997)
Erik Kenneth Shell, University of California Los Angeles
Abstract
In Would-Be Worlds: How Simulation Is Changing the Frontiers
of Science, John Casti lays out the history, methods, and evolution
of simple and complex systems as they exist in the digital world of our
computers and manifest themselves in our daily lives. He does this through a
plethora of examples, ranging from football simulators to internal computer
warfare between lines of code. Casti presents a framework for creating
one’s own complex systems for research purposes, and fosters in his
reader a lasting appreciation of the fundamentals: how such systems behave, what the
best practices are, and how best to think about complex systems.