Allison Cooper is Assistant Professor of Romance Languages and Literatures and Cinema Studies at Bowdoin College and project director of Kinolab. Her research explores the relationship between the moving image and computational analysis, focusing in particular on the digital analysis of film language.
Fernando Nascimento is Assistant Professor in Digital and Computational Studies at Bowdoin College, where he teaches courses and conducts research on digital text analysis, the philosophy of technology, and hermeneutics. He is currently lead collaborator of the Kinolab project, co-director of the Digital Ricoeur project, and director of the Society for Ricoeur Studies.
David Francis is Senior Interactive Developer in Academic Technology and Consulting at Bowdoin College. He provides technical support for various digital initiatives for the Bowdoin College Museum of Art, the Arctic Museum, the Bowdoin Library's Special Collections, and faculty initiatives.
This article presents a case study of Kinolab, a digital platform for the analysis of narrative film language. It describes the need for a scholarly database of clips focusing on film language for cinema and media studies faculty and students, highlighting recent technological and legal advances that have created a favorable environment for this kind of digital humanities work. Discussion of the project is situated within the broader context of contemporary developments in moving image annotation and the unique challenges posed by computationally driven moving image analysis. The article also argues for a universally accepted data model for film language to facilitate the academic crowdsourcing of film clips and the sharing of research and resources across the Semantic Web.
Today, decades after the earliest experiments with DH methodologies, scholars hoping to apply DH approaches to the study of audiovisual media continue to find themselves at something of a disadvantage relative to colleagues working with text-based media. Impediments to computationally assisted analysis of moving images have been well documented and are both technological and legal in nature. In recent years, projects like Dartmouth's Media Ecology Project and the University of Richmond's Distant Viewing Lab, among others, have lowered technological barriers by making inroads into moving image annotation and the application of computer vision to moving image analysis. In 2018, the Library of Congress lowered legal barriers in the United States with the most recent round of exemptions to the Digital Millennium Copyright Act (DMCA), granting increased freedom to excerpt short portions of films, television shows, and videos for the purposes of criticism or comment and thereby removing a hurdle to DH-inflected forms of moving image analysis such as videographic criticism. Despite the advances described above, film and media studies scholars are still unable to digitally analyze the moving images that are the subject of their research with anywhere near the ease of DH practitioners working with text or other forms of data.
One illustration of this predicament is the ongoing lack of a database dedicated to
something as seemingly straightforward as the analysis of film language. As Lucy Fischer
and Patrice Petro lamented in their introduction to the 2012 MLA anthology
Why should cinema scholars pursue DH approaches when, seemingly, they are so fraught with
challenges? One answer to the question can be found in the methodology of a groundbreaking
analysis in our field that took place before the first wave of DH scholarship in the 1990s
and early 2000s and led to the definition of the group style known as classical Hollywood
cinema
A related answer to the question of why cinema scholars might seek to incorporate DH methodologies into their work can be found on the IMDb Statistics page (see https://www.imdb.com/pressroom/stats/), which at the time of this writing included over half a million film titles in its database. Lev Manovich (2012) has argued that, before the global expansion of digital media represented by these kinds of numbers, "cultural theorists and historians could generate theories and histories based on small data sets (for instance, 'Italian Renaissance,' 'classical Hollywood cinema,' 'post-modernism', etc.)" but now we face a "fundamentally new situation and a challenge to our normal ways of tracking and studying culture" (250). For the Kinolab project, this new situation presents an opportunity to broaden our understanding of how film language works by creating a platform capable of sorting and clustering hundreds of aspects of film language along multiple dimensions such as region, genre, language, or period, among others.
We anticipate that our DH approach to the analysis of film language will allow researchers to move between different scales of analysis, enabling us, for example, to understand how a particular aspect of film language functions in the work of a single director, in a single genre, or across films from a particular time period or geographical region. We also anticipate that decontextualizing and remixing examples of film language in these ways will enable us to see what we might not have seen previously, following Manovich's assertion that "Being able to examine a set of images along a singular visual dimension is a powerful form of defamiliarization" (276). We argue that the collaborative development of a data model for film language, essential for the creation of a common understanding among cinema and media studies researchers as well as for their collaboration across the Semantic Web, will clarify and extend our knowledge of film language in the process of making its constitutive components and their relationships comprehensible to computers. And, finally, we expect that these efforts, made possible through the adoption of DH methodologies, will enable us to make more confident statements about the field of cinema studies at large.
Our research has found few scholarly, open access projects dedicated to the digital
analysis of film language – a situation likely due at least in part to the technological
and legal barriers indicated above. Among the projects that do exist is the Columbia Film
Language Glossary (FLG) (see https://filmglossary.ccnmtl.columbia.edu/), a teaching tool designed to offer
users illustrated explanations of key film terms and concepts
The efforts described above to make narrative moving image media available digitally for
educational and scholarly purposes are complemented by projects developing promising tools
for the digital analysis of moving images. Estrada et al.
Even as machine learning projects like the MEP and Distant Viewing Lab bring scholars of
moving images closer to the kind of distant reading now being performed on digitized
literary texts, their creators acknowledge an ongoing need for human interpreters to
bridge the semantic gap created when machines attempt to interpret images meaningfully.
Researchers can extract and analyze semantic information such as lighting or shot breaks
from visual materials only after they have established and encoded an interpretive
framework
A frequent topic in digital humanities concerns the balance between data annotation and
machine learning. Manovich
In the field of Natural Language Processing (NLP), annotations of parts of speech have
greatly assisted in the advancement of text mining, analysis, and translation techniques.
Pustejovsky and Stubbs have emphasized the importance of annotation for enhancing the quality
of machine learning results: "machine learning (ML) techniques often work better when
algorithms are provided with pointers to what is relevant about a dataset, rather than
just massive amounts of data"
Even more recent advances in machine learning, especially in the area of neural
networks and deep learning
These advances in digital text analysis seem to point to a trend toward a diminishing need for annotation to achieve results similar or superior to those that were possible in the past with annotated data set training alone. However, despite the many advances we have described so far, there are still higher levels of semantic information (such as complex narrative structures or highly specialized interpretative fields) that require manual annotation to be appropriately analyzed.
From this brief exploration of the relationship between annotation and machine learning
algorithms in the context of text analysis, we highlight three related observations.
First, there has been a continuing and evolving interplay of annotation and machine
learning. Second, recent machine learning algorithms have been reducing the need for
extensive annotation of textual corpora for some interpretative and linguistic analyses.
And third, manual annotation retains a role in higher-level semantic analyses and
still plays an essential part in the training of machine learning models. With these three
observations related to developments in text analysis, we are better positioned to
understand a similar relationship in the context of time-based media. For this purpose, we
take as reference the Distant Viewing framework proposed by Arnold and Tilton, which they
define as "the automated process of making explicit the culturally coded elements of
images" (5). The point, well noted by the authors, is that the code elements of images are
not as clearly identifiable as the code elements of texts, which are organized into
lexical units and relatively well-delimited syntactic structures in each natural language.
Indeed, as Metz
Thus, digital image analysis imposes the need for an additional level of coding – in
Kinolab's case, curatorial annotations – so that the semiotic elements comprising film
language are properly identified. As discussed earlier, Arnold and Tilton highlighted the
semantic gap that exists between "elements contained in the raw image and the extracted
structured information used to digitally represent the image within a database"
Mechanisms to bridge this semantic gap may either be built automatically through
computational tools or by people who create a system of annotations to identify these
semiotic units. Moreover, these semiotic units can be grouped hierarchically into higher
levels of meanings, creating a structure that ranges from basic levels of object
recognition, such as a cake, to more abstract levels of meaning, such as a birthday party.
Such analysis becomes more complex when we consider time-based media, since the temporal
aspect adds a new dimension to potential combinations, introducing possible
interpretations beyond those available to images considered separately. An example taken from
Jonathan Demme's
The Distant Viewing framework proposes an automatic process to analyze and extract
primary semantic elements "followed by the aggregation and visualization of these elements
via techniques from exploratory data analysis"
Kinolab creates a framework for exploring the intermediate levels of this semiotic hierarchy: its annotations define a set of higher-level semiotic units of film language relative to basic units such as the cut or other types of edits, and it allows the description of common categories for understanding the characteristics of time-based media. Such semiotic units form the basis of a film language that describes the formal aspects of this type of digital object.
Kinolab is structured to help researchers reduce the semantic gap in digital film language analysis in three distinct ways. The most basic is a collaborative platform for the consistent identification of semiotic units of film language in film clips, allowing sophisticated searches to draw on those units immediately. The Kinolab software architecture is also designed to integrate distant viewing plugins so that some film language forms can be automatically recognized by machine learning algorithms from the scientific community. Such plugins would also allow subsequent exploratory data analysis based on Kinolab's archive. Finally, Kinolab can serve as a resource for applying, validating, and enhancing new distant viewing techniques, which can use the database's film language information to develop training datasets and improve their results. Given Kinolab's architecture, it can produce a standard machine-readable output that supplies a given clip URL with a set of associated tags, which a machine learning algorithm could integrate as training data to learn examples of higher-level semantic annotations, such as a close-up shot. What Kinolab lacks toward this goal is specific timestamp data about when a certain film language form actually occurs (start/stop), which, combined with automatically extracted basic sign recognition (e.g. objects, faces, lighting), would be extremely valuable for any machine learning process. The existing architecture could be expanded to allow this by extending the clip-tag relationship to include duration information; the larger work, however, would be identifying and inputting this information into the system. One possible way to address this limitation is to integrate a tool like the aforementioned Media Ecology Project's Semantic Annotation Tool (SAT) into Kinolab.
The SAT can facilitate the effort to create more finely grained annotations that bridge the gap between full clips and their respective tags, providing a more refined training dataset.
With these extensions and within this collaborative ecosystem of complementary tools we believe that Kinolab could serve as an ideal platform for exploring the full spectrum of combinations between manual annotations and machine learning techniques that will foster new interpretative possibilities of time-based media in a manner analogous to advances in the area of digital text analysis.
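The machine-readable training output described above, along with the proposed timestamp extension, might be sketched as follows. This is an illustrative example only: the field names, URL, and tags are assumptions, not Kinolab's actual export format.

```python
import json

def make_training_record(clip_url, tags, timed_tags=None):
    """Bundle a clip URL and its film language tags into one
    JSON-serializable record suitable for machine learning pipelines.

    timed_tags optionally maps a tag to (start, stop) offsets in seconds,
    corresponding to the proposed extension of the clip-tag relationship
    with duration information.
    """
    record = {"clip_url": clip_url, "tags": sorted(tags)}
    if timed_tags:
        record["timed_tags"] = [
            {"tag": tag, "start": start, "stop": stop}
            for tag, (start, stop) in sorted(timed_tags.items())
        ]
    return record

# Hypothetical clip record with one time-bounded annotation.
record = make_training_record(
    "https://kinolab.org/clips/1234",
    {"close-up", "shot/reverse shot"},
    timed_tags={"close-up": (12.0, 15.5)},
)
print(json.dumps(record, indent=2))
```

A record of this shape pairs a clip with its higher-level semantic annotations, which is what a distant viewing algorithm would need to treat curated clips as labeled training examples.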
Kinolab is a digital platform for the analysis of narrative film language yet, as
previous discussion has suggested, 'film language' is a fluid concept that requires
defining in relation to the project's objectives. The conceptualization of film as a
language with its own set of governing rules or codes has a rich history that dates back
to the origins of the medium itself. This includes contributions from key figures like
D.W. Griffith, Sergei Eisenstein
Our primary objective in developing Kinolab was to create a rich, DMCA-compliant platform for the analysis of narrative media clips annotated to highlight distinctive use of film language.
The platform we envisioned would facilitate comparisons across clips and, to this end, feature advanced search options that could handle everything from simple keyword searches to searches using filters and Boolean terms. A secondary objective was to develop an easy-to-use contribute function so that users wishing to add their own legally obtained narrative media clips to the collection could do so with relative ease, thereby building into Kinolab the capacity for academic crowdsourcing. Ultimately, the simple design that we settled on invites verified academic users into the collection through four principal entry points accessed via the site's primary navigation (see Figure 3): Films and Series, Directors, Genres, and Tags. The terminus of each of these pathways is the individual clip page, where users can view a clip and its associated film language tags, which link to other clips in the collection sharing the same tag, and, if desired, download the clip for teaching or research purposes. Additional entry points accessed via the primary navigation bar include the Contribute (see Figure 4) and Search (see Figure 5) functions. Users can contribute their own narrative media clips via a simple interface designed to facilitate the curatorial process for project members working in Kinolab's back end. Academic crowdsourcing is standardized via a controlled vocabulary of film language terms (discussed further in Section Five: Working Toward a Data Model for Film Language). The Search function queries all of the fields associated with a clip in Kinolab's database, including informational metadata akin to what one would find in an IMDb film or series episode entry and content metadata supplied by Kinolab curators and contributors. 
Kinolab curators – project faculty, staff, and students – have access to the back end of the Contribute function, where they can evaluate and edit submitted clips and their metadata (informational and content metadata including film language tags) and approve or reject submissions to the collection.
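The advanced search described above, combining keyword filters with Boolean tag terms, can be sketched in miniature as follows. The clip records, field names, and query interface here are illustrative assumptions; Kinolab's actual implementation queries its database back end.

```python
# Toy in-memory clip records standing in for database rows.
clips = [
    {"title": "Clip A", "director": "Demme", "tags": {"close-up", "long take"}},
    {"title": "Clip B", "director": "Varda", "tags": {"tracking shot"}},
    {"title": "Clip C", "director": "Demme", "tags": {"long take", "sequence shot"}},
]

def search(clips, all_tags=(), any_tags=(), exclude_tags=(), **filters):
    """Return titles of clips matching AND (all_tags), OR (any_tags),
    and NOT (exclude_tags) tag terms, plus exact-match metadata filters."""
    results = []
    for clip in clips:
        tags = clip["tags"]
        if not all(t in tags for t in all_tags):
            continue
        if any_tags and not any(t in tags for t in any_tags):
            continue
        if any(t in tags for t in exclude_tags):
            continue
        if any(clip.get(k) != v for k, v in filters.items()):
            continue
        results.append(clip["title"])
    return results

# Boolean query: long take AND NOT close-up.
print(search(clips, all_tags=["long take"], exclude_tags=["close-up"]))  # ['Clip C']
```

The same AND/OR/NOT logic maps directly onto SQL `WHERE` clauses over a clip-tag join table in a relational back end.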
The vast majority of Kinolab's file system overhead goes to storing audiovisual clips. Accordingly, we built the first implementation of Kinolab on a system that could handle most of the media file management for us. Our priority was finding an established content management system that could handle the intricacies of uploading, organizing, annotating, and maintaining digital clips. To meet this goal, we initially adopted Omeka, a widely used and well-respected platform with a proven record for making digital assets available online via an easy-to-use interface (see https://omeka.org/). Built to meet the needs of museums, libraries, and archives seeking to publish digital collections and exhibitions online, Omeka's features made it the most appealing out-of-the-box solution for our first release of Kinolab. These features included: an architecture stipulating that Items belong to Collections, a relationship analogous to clips belonging to films; almost limitless metadata functionality, facilitating deep descriptive applications for film clips; a tagging system that made applying film language identifiers simple and straightforward; a sophisticated search interface capable of performing complex searches; and, finally, a built-in administrative backend capable of handling a significant part of the project's file and database management tasks behind the scenes.
Omeka's ease of use came with some significant restrictions, however. Its functionality for describing Collections through metadata was far more limited than that for Items. This limitation makes sense for the cultural heritage institutions that are Omeka's primary users, which need extensive descriptive metadata for individual items comprising a collection rather than for the collection itself. In Kinolab's case, however, an Omeka 'Collection' was analogous to an individual film, and we struggled with our inability to attach key metadata relevant to a film as a whole at the Collection level (for example, cinematographer, editor, etc.). The constraints of Omeka's model became more pronounced as the project expanded beyond films to include series. This expansion entailed moving from a relatively straightforward Film-Clips relationship to the more complicated Series-Seasons-Episodes-Clips relationship between collections and items, which Omeka's generic model could not represent. The inclusion of series also confounded Omeka's search operation, which could not factor in our increasingly complex taxonomies. As Kinolab grew, so did our need for functionalities that Omeka could not provide, ranging from the ability to select thumbnail images from specific video frames to the ability to specify extra relational concepts. Omeka's rich development community and plugins could have moved us toward some of these goals, but as we continued to add plugins and to customize the core feature set of Omeka, we were forced to recognize that the time and cost of the alterations were outweighing the benefits we gained from a pre-packaged system. Indeed, we had altered the base code so much that we could no longer claim to be using Omeka as most people understood it. That meant that upgrades to Omeka and its plugins could prove problematic as they could potentially affect areas of code we had modified to meet our goals.
Moving away from Omeka gave us the freedom to take the Kinolab concept back to the data modeling phase and define a database backend specifically for our project. We were able to implement the user interface collaboratively, module by module, with all team members, which helped flesh out additional requirements and desirable features in manageable increments. The system we ended up building used many of the same tools as Omeka.
The system requirements for Kinolab read much like those for Omeka and include a Linux operating system, Apache HTTP server, MySQL, and PHP scripting language.
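The Series-Seasons-Episodes-Clips hierarchy that motivated the custom data model can be sketched relationally as below. Table and column names are assumptions for illustration, and SQLite stands in here for the project's MySQL back end.

```python
import sqlite3

# Minimal relational sketch of the Series-Seasons-Episodes-Clips
# hierarchy that Omeka's generic Collection-Item model could not express.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE series   (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE seasons  (id INTEGER PRIMARY KEY, number INTEGER NOT NULL,
                       series_id INTEGER NOT NULL REFERENCES series(id));
CREATE TABLE episodes (id INTEGER PRIMARY KEY, title TEXT NOT NULL,
                       season_id INTEGER NOT NULL REFERENCES seasons(id));
CREATE TABLE clips    (id INTEGER PRIMARY KEY, label TEXT NOT NULL,
                       episode_id INTEGER NOT NULL REFERENCES episodes(id));
""")
conn.execute("INSERT INTO series VALUES (1, 'Example Series')")
conn.execute("INSERT INTO seasons VALUES (1, 1, 1)")
conn.execute("INSERT INTO episodes VALUES (1, 'Pilot', 1)")
conn.execute("INSERT INTO clips VALUES (1, 'Opening sequence', 1)")

# Walk the hierarchy from a clip back up to its series in one join.
row = conn.execute("""
    SELECT series.title, seasons.number, episodes.title, clips.label
    FROM clips
    JOIN episodes ON clips.episode_id = episodes.id
    JOIN seasons  ON episodes.season_id = seasons.id
    JOIN series   ON seasons.series_id  = series.id
""").fetchone()
print(row)  # ('Example Series', 1, 'Pilot', 'Opening sequence')
```

Explicit foreign keys at each level make the four-tier containment queryable in both directions, which is what the generic two-tier Collection-Item model could not do.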
Perhaps the most significant change that we made in the move from Omeka to a platform of our own design concerns metadata collection. In the first, Omeka-based implementation of Kinolab, project curators manually gathered informational metadata for films and series from IMDb.com and physical DVDs, subsequently uploading that metadata into Omeka's back end as part of a labor-intensive curatorial workflow. We eventually understood the project to be less about collecting media data than about aggregating annotations in service of film language analysis. We recognized that, if we were to continue attempting to collect and store all of the significant metadata describing films and series ourselves, we would be spending considerable energy duplicating efforts that existed elsewhere. This realization led us to partner with a third party, TMDb (The Movie Database), to handle the project's general metadata needs. For our new Kinolab implementation, we do store some descriptive data particular to the project in order to seed our search interface, but for the most part we rely on TMDb to be the actual data source and direct our users to that source whenever possible, enabling us to focus more narrowly on clip annotation.
Unlike IMDb, TMDb has a clear message of open access and excellent documentation. In
testing, it offered as much and sometimes more information than one could access on IMDb.
We have concerns about the long-term reliability of a less established source like TMDb
over a recognized entity such as IMDb, but since we only make use of this data
tangentially we decided that it is provisionally the best option. The metadata that TMDb
provides is important for helping to locate and contextualize Kinolab clips, but the
project is not attempting to become a definitive source of information about
the films and series from which those clips are excerpted. Consequently, we simply reference this
kind of metadata via TMDb's APIs or direct Kinolab users to the TMDb site itself. The lack
of an accessible, authoritative scholarly database dedicated to narrative films and series
is an ongoing problem shared by the entire field of media studies
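Referencing TMDb metadata via its documented movie endpoint might look like the sketch below. The request URL follows TMDb's public API; the summarizing function, the chosen fields, and the abridged sample response are illustrative assumptions rather than Kinolab's actual integration code.

```python
from urllib.parse import urlencode

TMDB_BASE = "https://api.themoviedb.org/3"

def movie_url(movie_id, api_key):
    """Build the TMDb request URL for a film's informational metadata."""
    return f"{TMDB_BASE}/movie/{movie_id}?" + urlencode({"api_key": api_key})

def summarize(tmdb_json):
    """Reduce a TMDb movie response to the handful of fields a clip
    platform might reference, leaving TMDb as the source of record."""
    return {
        "title": tmdb_json["title"],
        "year": tmdb_json["release_date"][:4],
        "genres": [g["name"] for g in tmdb_json.get("genres", [])],
    }

# Abridged sample of the kind of JSON TMDb returns for a movie lookup.
sample = {
    "title": "The Silence of the Lambs",
    "release_date": "1991-02-14",
    "genres": [{"id": 80, "name": "Crime"}, {"id": 18, "name": "Drama"}],
}
print(movie_url(274, "YOUR_API_KEY"))
print(summarize(sample))
```

Keeping only a thin summary locally, and linking out to TMDb for everything else, is what allows a project to avoid duplicating an external catalog's curatorial effort.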
Early in Kinolab's development, we confronted a tension between the expansive concept of film language and the need to define it methodically for computational purposes. Clips initially contributed to the project, for example, could illustrate the same cinematographic concept using synonymous but different terms, complicating the indexing and retrieval of clips: a shot in which the camera frame is not level with the horizon was defined differently (and correctly) by contributors as a dutch angle, dutch tilt, or canted angle. Alternatively, a clip might be identified with a single form of film language but not with its parent form. For example, the sequence shot, in which an entire sequence is rendered in a single shot, is a child of the long take, a shot of relatively lengthy duration; identifying the one ought therefore also to identify the other.
Though different in kind, these and other related issues we encountered demonstrated the
need to situate individual film language concepts within a broader, machine-readable model
of film language such as a thesaurus or ontology. The first case cited above, involving
the interchangeability of dutch angle, dutch tilt, or canted angle, is a straightforward
problem of synonymy, resolvable through the adoption of a controlled vocabulary for film
language spelling out preferred and variant terms and including synonym ring lists to
ensure Kinolab's ability to return appropriate clips when queried. The second case cited
above, however, demonstrates the need to conceive of film language hierarchically. Both
problems reveal how Kinolab could benefit from a data modeling approach capable of
explicitly defining the "concepts, properties, relationships, functions, constraints, and
axioms" of film language, akin to those proposed by the Getty Research Institute for art,
architecture and other cultural works
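The two mechanisms just described, a synonym ring that resolves variant terms to a preferred term and a hierarchy in which a child term implies its parent, can be sketched as follows. The terms come from the cases discussed above; the data structures themselves are illustrative, not Kinolab's implementation.

```python
# Synonym ring: variant terms resolve to one preferred term.
SYNONYM_RING = {
    "dutch tilt": "dutch angle",
    "canted angle": "dutch angle",
    "dutch angle": "dutch angle",
}

# Hierarchy: a child form of film language implies its parent form.
PARENT = {
    "sequence shot": "long take",  # every sequence shot is also a long take
}

def normalize(term):
    """Resolve a variant term to its preferred form."""
    term = term.lower()
    return SYNONYM_RING.get(term, term)

def expand(term):
    """Return the preferred term plus all ancestor terms, so that a clip
    tagged with a child form is also retrievable under its parent form."""
    term = normalize(term)
    terms = [term]
    while term in PARENT:
        term = PARENT[term]
        terms.append(term)
    return terms

print(expand("canted angle"))   # ['dutch angle']
print(expand("sequence shot"))  # ['sequence shot', 'long take']
```

Applying `expand` at indexing or query time is one simple way a controlled vocabulary with preferred terms and hierarchical relationships resolves both problems at once.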
Our research revealed the lack of preexisting, authoritative models for film language.
The International Federation of Film Archives (FIAF), for example, offers a "Glossary of
Filmographic Terms" designed to assist film catalogers in the consistent identification
and translation of credit terms, as well as a "Glossary of Technical Terms", for terms
used in film production and the film laboratory, but neither resource could provide the
kind of guidance we sought in organizing and deploying film language consistently. The
Large-Scale Concept Ontology of Multimedia (LSCOM, see http://www.ee.columbia.edu/ln/dvmm/lscom/) is, for now, limited to concepts
related to events, objects, locations, people, and programs, and therefore lacks labels
related to film form. The AdA Ontology for Fine-Grained Semantic Video Annotation (see
https://projectada.github.io/) is
promising for its focus on film-analytical concepts, but remains only partially complete.
This led us to take an exploratory first step in that direction in the form of a
controlled list of film language terms, drawn primarily from the glossaries of two widely
adopted cinema studies textbooks, Timothy Corrigan and Patricia White's
This is a modest solution that notably excludes specialized terms and concepts from more
technical areas of film language such as sound, color, or computer-generated imagery.
Moreover, relying upon authoritative introductory texts like
will eventually allow a user to drill down to
Our experience thus far in developing Kinolab has demonstrated that there is a genuine
need for development of a film language ontology with critical input from scholars and
professionals in film and media studies, information science, computer science, and
digital humanities. Beyond the uses described above, this kind of formalized,
machine-readable conceptualization of how film language works in narrative media is also a
logical information-age extension of the critical work that has already been done on film
language and narrative by the figures cited earlier
A robust, well-researched body of literature exists in support of U.S.-based media
scholars wishing to exercise their right to assert fair use
The Kinolab team authored a comprehensive statement detailing the project's adherence to the principles of fair use as well as its compliance with the DMCA in order to secure critical institutional support for the project, which was granted after vetting by Bowdoin College's copyright officer and legal counsel (see http://kinolab.org/ for Kinolab's Statement on Fair Use and the Digital Millennium Copyright Act). Essential as this kind of work is, it is time-consuming and somewhat peripheral to the project's main goal. Moreover, our confidence about finding ourselves on solid legal footing is tempered by the knowledge that that footing does not extend outside of the United States, where Kinolab would fall under the jurisdiction of diverse and, in some cases, more restrictive copyright codes. For now, we echo colleagues whose work has paved the way for Kinolab when we observe that the right to make fair use of copyrighted materials is a key tool that will only become more vital as audiovisual work in DH increases, and that members of the AVinDH community should continue to exercise this right assertively. For our part, we make Kinolab's work available under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC), which gives users permission to remix, adapt, and build upon our work as long as their new works acknowledge Kinolab and are non-commercial in nature.
This case study highlights several of the challenges and opportunities facing DH
practitioners who work with audiovisual materials: in particular, the recent shift in
digital text analysis (and, to some extent, in moving image analysis) away from annotation
as a basis for data set training in favor of newer forms of machine learning; the ongoing
need for an authoritative data model for film language; and the changing legal terrain for
U.S.-based projects aiming to incorporate AV materials under copyright. The fact that each
of these challenges is simultaneously an opportunity underscores just how dynamic AVinDH
is in 2021. It also explains why this case study describes a project that is still very
much
As of this writing, the Kinolab team is testing its new platform and seeking user feedback on ways to improve it. We are also taking steps to ensure the thoughtful, intentional growth of Kinolab's clip collection and the project's long-term sustainability. These include, among others, 1) expanding the project's advisory board to include members broadly representative of an array of scholarly interests in film language and narrative, including sound, color, and computer-generated imagery (the use of 3D computer graphics for special effects), but also animated media, national and regional cinemas, horror, ecocinema, science fiction, silent cinema, television, queer cinema, classical Hollywood cinema, transnational cinema, and/or issues related to diversity and inclusion; 2) independently developing and/or contributing to existing efforts to create a robust data model for film language; 3) encouraging colleagues to contribute to Kinolab by supporting the ongoing work of clip curation at their home institutions, either by internally funding undergraduate or graduate student clip curation or through student crowdsourcing in their classrooms; 4) testing and implementing where appropriate machine vision technologies such as those in development at the Media Ecology Project and the Distant Viewing Lab; 5) developing relationships with likeminded groups such as Critical Commons, Domitor, the Media History Digital Library, and the Alliance for Networking Visual Culture; and 6) developing national organizational partnerships with the Society for Cinema and Media Studies and/or the University Film and Video Association. Through these and other strategies, we hope to become a genuinely inclusive platform for the analysis of narrative media clips, built from the ground up by the scholars and students using it.