Topic Modeling Genre: An Exploration of French Classical and Enlightenment Drama
Christof Schöch, University of Würzburg, Germany
Abstract
[en]
The concept of literary genre is highly complex: not only are different genres
frequently defined on several, and not necessarily the same, levels of description,
but the consideration of genres as cognitive, social, or scholarly constructs with a
rich history further complicates the matter. This contribution focuses on thematic
aspects of genre with a quantitative approach, namely Topic Modeling. Topic Modeling
has proven useful for discovering thematic patterns and trends in large collections
of texts, with a view to classifying or browsing them on the basis of their dominant
themes. It has rarely, if ever, been applied to collections of dramatic texts.
In this contribution, Topic Modeling is used to analyze a collection of French Drama of
the Classical Age and the Enlightenment. The general aim of this contribution is to
discover what semantic types of topics are found in this collection, whether different
dramatic subgenres have distinctive dominant topics and plot-related topic patterns,
and, conversely, to what extent clustering methods based on topic scores per play
produce groupings of texts that agree with more conventional genre distinctions. This
contribution shows that interesting topic patterns can be detected which provide new
insights into the thematic, subgenre-related structure of French drama as well as into the
history of French drama of the Classical Age and the Enlightenment.
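The clustering step described in the abstract — grouping plays by their per-play topic scores — can be sketched in miniature. The play names and topic scores below are invented for illustration and are not taken from the corpus; a simple nearest-neighbour comparison by cosine similarity stands in for a full clustering method.

```python
# Illustrative sketch: group plays by their per-play topic scores using
# cosine similarity. Scores, play names, and topic labels are invented.
import math

topic_scores = {
    "tragedy_1": [0.7, 0.2, 0.1],   # hypothetical scores for three topics
    "tragedy_2": [0.6, 0.3, 0.1],
    "comedy_1":  [0.1, 0.2, 0.7],
}

def cosine(u, v):
    """Cosine similarity between two topic-score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nearest(play):
    """The play with the most similar topic profile — a minimal stand-in for clustering."""
    others = [p for p in topic_scores if p != play]
    return max(others, key=lambda p: cosine(topic_scores[play], topic_scores[p]))

closest = nearest("tragedy_1")  # on these toy scores, the other tragedy
```

On such toy data the two tragedies group together and the comedy stands apart, which is the kind of agreement (or disagreement) with conventional genre distinctions the study examines at corpus scale.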
Reconstructing a website’s lost past: Methodological issues concerning the history of Unibo.it
Federico Nanni, University of Mannheim
Abstract
[en]
This paper describes how to deal with the scarcity of born-digital primary
sources while retrieving materials on the recent past of an academic
institution. The case study is an analysis of the first 25 years online of the
University of Bologna. The focus of this work is primarily methodological:
several different issues are presented, starting with the fact that the
University of Bologna website has been excluded for thirteen years from the
Internet Archive's Wayback Machine, and possible solutions are proposed and
applied. Moreover, this study aims at highlighting how web materials could give
us new and distinct insights into the recent past of academic institutions,
thereby becoming the starting point for several new studies.
Some principles for making collaborative scholarly editions in digital form
Peter Robinson, University of Saskatchewan
Abstract
[en]
“Textual Communities” is a new system for managing and performing all aspects of
an online collaborative scholarly editing project. It permits mounting of
document images and offers page-by-page transcription and display, with the
facility for project leaders to recruit and manage transcribers and other
contributors, allocating and reviewing transcription work as it is done. Most
distinctively, Textual Communities is built on a comprehensive model of
scholarly editing, enabling both “document” (page-by-page) and “work”
(intellectual structure, or “entity”) views of the texts edited. Accordingly,
multiple texts of a single work, or part of a work (an entity) may be extracted
and compared, using an embedded installation of CollateX. While completely
conformant with Text Encoding Initiative guidelines, Textual Communities goes
beyond TEI and XML in its ability to handle multiple overlapping hierarchies
within texts. This paper will outline the thinking behind the development of
Textual Communities, and show examples of its use by several major projects.
A Historical Geographic Information System (HGIS) of Nubia Based on the William J. Bankes Archive (1815-1822)
Daniele Salvoldi, Dahlem Research School, Freie Universität Berlin
Abstract
[en]
The William J. Bankes Archive, Dorchester, is an impressive collection of
original material concerning the archaeological, anthropological and natural
heritage of Nubia and was amassed in the years 1815-1822. In the last two
hundred years, many geo-human factors caused radical changes in the region. In a
landscape almost untouched for centuries, the signs of the interactions between
the ancient human communities and the natural environment were much clearer in
Bankes’ times than now. The digital humanities offer powerful tools to manage and
visualize large amounts of data, and GIS in particular is an effective form of
relational database in which every item of data has a position on the earth. This
paper presents the methodology and the preliminary results of a research project
that aims at a draft reconstruction of ancient Nubia based on the Bankes
Archive. Archaeological, historical, natural history and ethnographic
information extracted from the documents will be georeferenced in the GIS.
Original maps, landscape views and epigraphic copies will also be made available
on-line.
Mining for characterising patterns in literature using correspondence analysis: an experiment on French novels
Francesca Frontini, Université Paul-Valéry Montpellier 3 - Praxiling UMR 5267 CNRS - UPVM3; Mohamed Amine Boukhaled, Laboratoire d'Informatique de Paris 6 (LIP6 UPMC) / Labex OBVIL; Jean-Gabriel Ganascia, Laboratoire d'Informatique de Paris 6 (LIP6 UPMC) / Labex OBVIL
Abstract
[en]
This paper presents and describes a bottom-up methodology for the detection of
stylistic traits in the syntax of literary texts. The extraction of syntactic
patterns is performed blindly by a sequential pattern mining algorithm, while
the identification of significant and interesting features is performed at a
later stage by using correspondence analysis and by ranking patterns by
contribution.
Exploratory Search Through Visual Analysis of Topic Models
Patrick Jähnichen, Machine Learning Group, Humboldt-Universität zu Berlin; Patrick Oesterling, Image and Signal Processing Group, Leipzig University, Germany; Gerhard Heyer, Natural Language Processing Group, Leipzig University, Germany; Tom Liebmann, Image and Signal Processing Group, Leipzig University, Germany; Gerik Scheuermann, Image and Signal Processing Group, Leipzig University, Germany; Christoph Kuras, Natural Language Processing Group, Leipzig University, Germany
Abstract
[en]
This paper addresses exploratory search in large collections of historical texts.
By way of example, we apply our method to a collection of documents comprising
dossiers of the former East-German Ministry for State Security, and classical
texts. The basis of our approach is topic models, a class of algorithms that
define and infer themes pervading the corpus as probability distributions over
the vocabulary. Our topic-centered visual metaphor supports exploring the
corpus along an intuitive methodology: first, determine a topic of interest;
second, suggest documents that contain the topic in "sufficient" proportion;
and third, browse iteratively through related topics and documents. Our main
focus lies on providing a suitable bird's-eye view of the data to facilitate
an in-depth analysis of the topics it contains.
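The three-step methodology just described can be sketched in miniature. The document names, topic labels, and proportions below are invented toy values, not output of the actual model or corpus, and `documents_for_topic` and `related_topics` are hypothetical helpers illustrating steps two and three.

```python
# Toy sketch of the topic-centered browse loop: pick a topic, list documents
# whose inferred proportion of that topic exceeds a threshold, then rank
# related topics by how strongly they co-occur in those documents.
# The doc-topic proportions below are invented, not model output.
doc_topics = {
    "doc_a": {"surveillance": 0.6, "travel": 0.3, "family": 0.1},
    "doc_b": {"surveillance": 0.5, "border": 0.4, "family": 0.1},
    "doc_c": {"poetry": 0.7, "family": 0.3},
}

def documents_for_topic(topic, threshold=0.4):
    """Step 2: suggest documents containing the topic in sufficient proportion."""
    return sorted(d for d, t in doc_topics.items() if t.get(topic, 0.0) >= threshold)

def related_topics(topic, threshold=0.4):
    """Step 3: rank other topics by their total mass in the suggested documents."""
    mass = {}
    for d in documents_for_topic(topic, threshold):
        for other, p in doc_topics[d].items():
            if other != topic:
                mass[other] = mass.get(other, 0.0) + p
    return sorted(mass, key=mass.get, reverse=True)

docs = documents_for_topic("surveillance")   # documents dominated by the topic
related = related_topics("surveillance")     # candidate topics to browse next
```

Iterating these two steps — follow a related topic, list its documents, repeat — is the browse loop the visual metaphor supports at scale.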
Diachronic trends in Homeric translations
Yuri Bizzoni, University of Gothenburg; Marianne Reboul, Université Paris-Sorbonne; Angelo Del Grosso, Institute for Computational Linguistics A. Zampolli
Abstract
[en]
In this paper we present a tool we developed for translation studies and use it
to diachronically compare various French translations of the Odyssey.
This field of study is part of the more general “Classical Receptions”
studies, which analyse the influence and adaptation of classical texts in
modern and contemporary literature, theatre, cinema, and many other artistic
fields. While Greek texts have been analysed by scholars for more than two
thousand years, research on classical translations is not yet a well-established
subject. In recent years, however, this theme has attracted growing interest in
the academic community.
We developed a program that can align textual sequences (defined as groups of
words delimited by a specified grammatical pivot, in our case proper nouns)
without the need for prior training. We obtained alignments for many different
kinds of translations, including free translations, a problem that, until
recent studies, was generally not addressed by textual aligners. While
other programs have an upper bound for one-to-many alignments (for example,
a maximum of four translated elements aligned to the same original element), this
algorithm allows an indefinite number of alignments, both for the source
sequences and the target ones. The aligner is based on an implementation of
the Needleman-Wunsch algorithm and on a string-based similarity approach to
textual segments. The aligner uses proper names as anchor words, as they
are a relatively stable feature across different translations and tend to be
similar in several languages.
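A minimal sketch of Needleman-Wunsch global alignment, applied here to short sequences of anchor words rather than full textual segments. The scoring parameters and example names are illustrative assumptions, not the authors' implementation (which additionally uses string-based similarity between segments and supports unbounded one-to-many alignments).

```python
# Minimal Needleman-Wunsch global alignment over token sequences.
# Scores and example tokens are illustrative, not the authors' settings.
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    m, n = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0] = i * gap
    for j in range(1, n + 1):
        score[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = score[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            score[i][j] = max(diag, score[i-1][j] + gap, score[i][j-1] + gap)
    # Trace back to recover the aligned pairs (None marks a gap).
    pairs, i, j = [], m, n
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i-1] == b[j-1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i-1][j-1] + sub:
            pairs.append((a[i-1], b[j-1])); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i-1][j] + gap:
            pairs.append((a[i-1], None)); i -= 1
        else:
            pairs.append((None, b[j-1])); j -= 1
    return score[m][n], pairs[::-1]

# Align two hypothetical sequences of anchor words (proper nouns)
# extracted from two translations.
src = ["Ulysse", "Calypso", "Ithaque"]
tgt = ["Ulysse", "Ithaque"]
best, aligned = needleman_wunsch(src, tgt)
# aligned pairs the shared names and leaves "Calypso" against a gap
```

In the sketch, a name with no counterpart is aligned to a gap rather than forced onto an unrelated element, which is what makes the approach tolerant of free translations that omit or expand material.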
Thanks to the alignments obtained using the program, we can explore translations
in a number of ways. We will illustrate the creation of a graphical interface to
visualize French Homeric translations.
With our tool, it is possible to highlight aligned portions of texts and show
their immediate differences or similarities, both in meaning and in syntactic
distribution.
We will show some resulting syntactic analyses carried out on a small sample of
texts, taken from a corpus of twenty-seven unabridged French translations of the
Odyssey and explore how the study of diachronic translations through algorithms
of computational linguistics can produce interesting results for literary and
linguistic studies.
Comparing Disciplinary Patterns: Exploring the Humanities through the Lens of Scholarly Communication
Daniel Burckhardt, Humboldt-Universität zu Berlin
Abstract
[en]
For the past fifteen years, scholarly communication networks such as H-Soz-Kult –
the German Information Service for Historians – and H-ArtHist – a specialized
discussion and information network for art history based in Germany with an
international reach – have been steadily publishing conference announcements and
reports. Since both services were born digital, starting with the listserv
infrastructure of the Michigan-based H-Net and later supplemented by
database-driven web sites, the archives are easily accessible by electronic
means. The aim of this paper is to demonstrate that the archives of scholarly
communication provide a suitable basis for conducting an assessment of broad
fields such as German historians or German art history, with relatively low
technical effort. For the initial analysis of H-Soz-Kult (first presented at
the Historical Network Research Conference 2014 in Ghent), editorial practices
facilitated the automated extraction of the speakers’ names as a key feature.
But even in cases where no such special markup has been applied, freely
available Web services such as AlchemyAPI (AlchemyAPI Entity Extraction,
http://www.alchemyapi.com/products/alchemylanguage/entity-extraction) provide
methods that can be used to achieve comparable results. (Thanks to Victoria H.
Scott for suggesting analyzing H-ArtHist’s conference announcements in similar
ways.)
Friedrich Kittler's Digital Legacy – PART I - Challenges, Insights and Problem-Solving Approaches in the Editing of Complex Digital Data Collections
Jürgen Enge, FHNW Academy of Art and Design, Basel; Heinz Werner Kramski, German Literature Archive Marbach
Abstract
[en]
Three years after his death Friedrich Kittler’s impact on the Humanities and
Media Studies remains a topic of interest to scholars worldwide. The
intellectual challenges presented by his theoretical work, however, are now
complemented by the practical and archival difficulties of dealing with his
personal digital legacy. How are we to preserve, survey and index the complex
data collection Kittler bequeathed to the German Literature Archive in Marbach
in the shape of old computers and hard drives? How are the Digital Humanities to
handle the archive of one of its most important forefathers? To address these
questions, this paper will first focus on the estate itself and then describe
the design and development of the “Indexer”, a tool
for the initial indexing of technical information. Two especially problematic
aspects are the sheer mass of files (more than 1.5 million) and Kittler's
idiosyncratic organization, both of which serve to make conventional content
evaluation very difficult. Here, the “Indexer” has
proven to be a powerful tool. Finally, a case study using the indexer's web
interface will enable us to tackle the question: When and to what purpose did
Friedrich Kittler acquire a computer?
Friedrich Kittler's Digital Legacy – PART II - Friedrich Kittler and the Digital Humanities: Forerunner, Godfather, Object of Research. An Indexer Model Research
Susanne Holl, Berlin, Germany
Abstract
[en]
Three years after his death Friedrich Kittler’s impact on the Humanities and
Media Studies remains a topic of interest to scholars worldwide. The
intellectual challenges presented by his theoretical work, however, are now
complemented by the practical and archival difficulties of dealing with his
personal digital legacy. How are we to preserve, survey and index the complex
data collection Kittler bequeathed to the German Literature Archive in Marbach
in the shape of old computers and hard drives? How are the Digital Humanities to
handle the archive of one of its most important forefathers? To address these
questions, the presentation will first focus on the estate itself and then
describe the design and development of the "Indexer", a tool for the initial
indexing of technical information. Two especially problematic aspects are the
sheer mass of files (more than 1.5 million) and Kittler's idiosyncratic
organization, both of which serve to make conventional content evaluation very
difficult. Here, the "Indexer" has proven to be a powerful tool. Finally, a case
study using the indexer's web interface will enable us to tackle the question:
When and to what purpose did Friedrich Kittler acquire a computer?
Automated Pattern Analysis in Gesture Research: Similarity Measuring in 3D Motion Capture Models of Communicative Action
Daniel Schüller, Natural Media Lab, Human Technology Centre, RWTH Aachen University; Christian Beecks, University of Münster; Marwan Hassani, Data Management and Exploration Group, RWTH Aachen University; Jennifer Hinnell, Department of Linguistics, University of Alberta; Bela Brenger, Natural Media Lab, Human Technology Centre, RWTH Aachen University; Thomas Seidl, Ludwig Maximilian University of Munich; Irene Mittelberg, Natural Media Lab, Human Technology Centre, RWTH Aachen University
Abstract
[en]
The question of how to model similarity between gestures plays an important role in
current studies in the domain of human communication. Most research into recurrent
patterns in co-verbal gestures – manual communicative movements emerging
spontaneously during conversation – is driven by qualitative analyses relying on
observational comparisons between gestures. Because these kinds of gestures are
not bound to well-formedness conditions, however, we propose a quantitative
approach consisting of a distance-based similarity model for gestures
recorded and represented in motion capture data streams. To this end, we model
gestures by flexible feature representations, namely gesture signatures, which are
then compared via signature-based distance functions such as the Earth Mover's
Distance and the Signature Quadratic Form Distance. Experiments on real
conversational motion capture data demonstrate the suitability of the proposed
approaches in terms of accuracy and efficiency. Our contribution to gesture
similarity research and gesture data analysis allows for new quantitative methods of
identifying patterns of gestural movements in human face-to-face interaction, i.e.,
in complex multimodal data sets.
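To illustrate the "mass transport" idea behind the Earth Mover's Distance, here is a deliberately simplified one-dimensional case: for two equally weighted sample sets of the same size, the Wasserstein-1 distance reduces to matching sorted values. Real gesture signatures are sets of weighted feature centroids in higher-dimensional feature spaces, so this is only a conceptual sketch, not the authors' method; the feature values are invented.

```python
# Simplified 1-D stand-in for a signature-based distance: the Wasserstein-1
# (Earth Mover's) distance between two equally weighted sample sets of the
# same size reduces to pairing off sorted values and averaging the "transport"
# each value needs.
def emd_1d(xs, ys):
    assert len(xs) == len(ys), "equal-size, equal-weight case only"
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)

# Two hypothetical streams of a scalar gesture feature (e.g. wrist height).
g1 = [0.1, 0.4, 0.4, 0.9]
g2 = [0.2, 0.5, 0.5, 1.0]
d = emd_1d(g1, g2)  # every value must move by 0.1, so the distance is 0.1
```

The appeal of such transport-based distances for gesture data is that they compare whole distributions of movement features, remaining meaningful even when two gestures differ in timing or exact trajectory.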