Eef Masson is a senior researcher at the Rathenau Institute (The Hague), a research-for-policy institute concerned with the societal impact of science and technology. Previously, she was an assistant professor of Media Studies at the University of Amsterdam, where she taught courses in film and media history and media archiving and preservation, and published on non-fiction and non-theatrical films, media archives, museum media, and practices in data-driven research and data visualization.
Christian Gosvig Olesen is a postdoctoral researcher at Utrecht University and a lecturer at the University of Amsterdam’s Media Studies Department, where he teaches courses in film studies, media preservation, restoration and digital heritage.
Nanne van Noord is a researcher at the University of Amsterdam and the Netherlands Institute for Sound and Vision. His research focuses on developing new computer vision methods that push the state of the art in a manner that is informed by and relevant to humanities research. He holds a PhD from Tilburg University for his thesis on learning visual representations of style.
Giovanna Fossati is a professor of Film Heritage and Digital Film Culture at the University of Amsterdam and chief curator at Eye Filmmuseum.
In recent years, efforts to unlock digitized moving image collections have focused primarily on the retrieval of collection items through semantic descriptors: keywords or other labels produced either manually, or as (semi-)automatically generated metadata. As a result, access to digital archives is still governed overwhelmingly by a logic of search. In practice, this means that users not only need to know what they are looking for, but are also constrained by the interpretive frameworks informing the materials’ labelling. Arguably, this poses restrictions on what they can find, how they can interrelate collection objects, and ultimately, how they can reuse or reinterpret collections. Taking such issues as its starting point, the SEMIA project explored how computer vision techniques can be used to enable more exploratory, sensory-feature-based forms of access.
In this article, we discuss the project’s rationale and its early results. First, we place SEMIA in a recent history of visual analysis for media scholarly research, specifying how it both builds on and departs from this history (also in the epistemic sense). Subsequently, we provide more details about the project’s approach to image feature extraction and discuss some analysis results. In our conclusions, we confront those results with what we had initially hoped to gain by applying computer vision methods for enabling access to collections.
One senior curator said that some of museum staff [sic] were skeptical of the project at first: ‘We would get an email from Wes asking, Do you have a list of green objects? Could you send us a list of everything you have that is yellow? Our data system does not have these categories.’
Until late April of 2019, visitors to the Kunsthistorisches Museum in Vienna could drop in on an exhibit curated by film director Wes Anderson.
In a more general sense, this holds true also for most moving image archives. Oftentimes, such institutions house collections of many thousands of films or television episodes, composed in turn of millions of discrete images. Typically, the sensory characteristics of those objects barely feature in catalogue descriptions. While some entries contain information, either at the title or the fragment level, about the colour or sound systems used, this information tends to be fragmentary. Moreover, further specifics about the films’ or episodes’ visual features are usually absent.
In recent years, audiovisual heritage institutions have invested much time and
resources into digitising their collections, so as to enable various kinds of reuse.
Yet in spite of this, the above situation is largely unchanged. So far, attempts to
improve usability have focused primarily on the searchability of collections and the
retrieval of collection items through (linked) metadata. Therefore, access to digital
archives is overwhelmingly governed, even today, by a logic of search – one dominant
in practices of information retrieval more generally.
For users, the selectiveness of catalogue descriptions poses two important problems.
On the one hand, it forces them to search collections on the basis of prior
interpretations, and from the perspective of those who catalogued them – rather than
to more freely explore them. On the other, it prevents them from relying in the
process on features that are essential to their experience of heritage objects, but
inadequately captured through verbal description; for example, visual features such
as colour, but also shape or movement. Such characteristics are particularly
significant for historic (moving) images, as those are valued not only for the
information they hold, but also for their look and feel.
The research project The Sensory Moving Image Archive (SEMIA): Boosting Creative
Reuse for Artistic Practice and Research set out to address these problems. Users,
in this context, are
filmmakers or exhibition designers, but also scholars. It has been argued, indeed,
that the work of researchers may benefit from modes of access that do not (solely)
rely on search and retrieval of single items but afford a more explorative form of
browsing.
SEMIA, a two-and-a-half-year project that ran until late January 2020, was a
collaboration between the University of Amsterdam (with contributions from media and
audiovisual heritage scholars as well as computer scientists), the Amsterdam
University of Applied Sciences (specifically, experts in the domain of data
visualisation and interface design), the interaction design company Studio Louter
(experienced in the development of museum presentations) and two audiovisual heritage
institutions: Eye Filmmuseum (focusing on film and cinematography) and the
Netherlands Institute for Sound and Vision (television).
The project consisted of two phases, whose timings partly overlapped: a first,
focused on image feature extraction and analysis, and a second, concerned with the
development of a ‘generous’ interface.
In computer vision, a subdiscipline of AI, models are developed for extracting key
information – so-called visual features
– from images, so that they can
subsequently be cross-referenced. In the analysis process, images are transformed
into descriptions that are used in turn to classify them. In the early years of the
field, methods were developed that required humans to determine which operations
systems had to perform in order to produce the intended analysis results. More
recently, however, methods based on machine learning, whereby computers are trained
with techniques for automatic feature learning, are becoming more popular.
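To make this contrast concrete: a hand-engineered feature of the older kind can be as simple as a colour histogram, computed by rules a human has fully specified in advance. The sketch below is purely illustrative; the pixel format, bin count and sample ‘images’ are our own assumptions, not part of any production pipeline.

```python
# Toy illustration of a hand-engineered visual feature: a coarse colour
# histogram. Bin count and pixel format are simplifying assumptions made
# for demonstration purposes only.

def colour_histogram(pixels, bins=4):
    """Describe an image as a normalised histogram over coarsely
    quantised RGB values. `pixels` is a list of (r, g, b) tuples
    with channel values in 0-255."""
    step = 256 // bins
    hist = [0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each channel to a coarse bin, then flatten to one index.
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1
    total = len(pixels)
    return [count / total for count in hist]

# Two invented 'images' dominated by similar greens end up with most of
# their weight in the same bin, which is what makes such descriptions
# comparable across a collection.
meadow = [(30, 200, 40)] * 90 + [(250, 250, 250)] * 10
forest = [(40, 210, 50)] * 80 + [(240, 240, 240)] * 20
print(colour_histogram(meadow)[12])  # share of green-binned pixels: 0.9
```

Learned features, by contrast, are not specified this way at all: during training, the system arrives at its own internal descriptions of what distinguishes one image from another.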
In SEMIA, we used a combination of both types of methods. In what follows, we explain why this is the case, and elaborate on how the computer scientists in our team aligned their work with our overall objective of enabling new forms of exploration. In doing so, we specifically focus on how we changed the preconditions for archival reuse (the scholarly kind in particular). We are motivated by the observation that reliance on visual features and relations in accessing collections not only opens up new avenues for research, but also helps challenge current understandings of how knowledge is produced – in media and heritage studies (traditional as well as digital) and in the digital humanities more broadly.
In our contribution, we take a funnel approach,
gradually narrowing our focus
to the specific extraction and analysis tasks carried out within the SEMIA project.
First, we provide a broad outline, and discussion, of the landscape
of visual
analysis for media scholarly research, and developments in this area over time. We
pay attention both to the interests and objectives of those active in the field
(along with their epistemic underpinnings) and to their specific approaches or
methods. The purpose of this exercise is twofold: to specify the project’s place
among prior efforts, and to further elucidate our overall motivation in taking it on.
Subsequently, we zoom in on what feature analysis means for SEMIA: first, by looking
at the general principles behind our approach to feature extraction, and then, by
discussing some analysis results. In our conclusions, we confront those results with
our initial intent in exploring the affordances of computer vision for providing
access to collections.
In developing a tool that supports a more unconstrained browsing of media archives
than is currently available, we sought to complement existing approaches to, and
methods for, the visual analysis of moving images. Those approaches and methods have
emerged primarily in the context of stylometric research from the 1970s onwards, and
tend to be tailored to the detection of patterns in specific analytical units. In the
interpretation of data, stylometric research usually adheres to semantic categories
that have traditionally had relevance also for both archives and media historical
research (in particular, the above-mentioned categories of director or creator, or
production time). For the purposes of the SEMIA project, we needed to let go of the
assumptions this implied about what is meaningful
about collection objects.
To achieve this, we followed the line of reasoning of a recent trend in digital film
and media studies scholarship that seeks to reorient visual analysis methods by
drawing on artistic practices of archival moving image appropriation. Such strategies
are not intent on finding patterns in preselected image units, but are geared instead
towards accidental or unanticipated finds that reveal more surprising similarities –
or contrasts – in audiovisual materials. Those pioneering scholars, whose work we
sample below, are convinced that artistic work can inspire users to engage with collections in less predetermined ways.
In order to specify the epistemological underpinnings of our own approach, it is helpful to start off with a brief consideration of the foundational assumptions of stylometry. This will help us to subsequently explain how more recent projects in visual analysis in our field draw on this tradition, while also moving it in different directions. We end the section with some further elaboration on the appropriation-indebted trend in film and media studies, explaining how it was inspirational for us.
In film and media studies, the visual analysis of moving images was developed as part
of the intertwining stylometric research programmes commonly referred to as
statistical style analysis
and cinemetrics,
initiated with the
pioneering work of Barry Salt and Yuri Tsivian respectively. Arguably, these
programmes had their very early roots in film theory and criticism from the 1910s and
1920s, attending to the interrelations between film editing, style and perception,
and gained a foothold in academic institutions in the 1970s (see Buckland [2008] and Olesen
[2017] for more on those historical developments). Their objective was to
discern patterns in audiovisual materials, in a way that resembles the analysis of
linguistic patterns in literary computing (for instance, for the purpose of
authorship attribution, for the dating of films, or for the creation of statistical
profiles of directorial styles, periods or genres and their changes over time). Such
research often took a deductive approach, producing data that supports stylistic
analysis as a more rigorous
alternative, or complement, to traditional
hermeneutic approaches. In its first decades as a scholarly form of research,
stylometry pursued its objectives primarily by manually annotating, coding and
quantifying data on shot lengths and shot types in films and television materials, to
subsequently relate the data thus obtained to known information (for instance
production or release date, production company, genre or director) in an attempt to
interpret significant patterns.
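The core of such quantification is arithmetically simple. As a hedged illustration (the timecodes below are invented, not drawn from any actual annotation), the average shot length at the heart of many statistical style analyses can be computed from annotated shot boundaries as follows:

```python
# Minimal sketch of the kind of quantification early statistical style
# analysis relied on: deriving the average shot length (ASL) of a film
# from manually annotated shot boundaries. The timecodes are invented
# for illustration.

def average_shot_length(boundaries_s):
    """Given shot boundary timecodes in seconds (including the film's
    start and end), return the mean shot duration."""
    durations = [b - a for a, b in zip(boundaries_s, boundaries_s[1:])]
    return sum(durations) / len(durations)

# Hypothetical annotations: a film running 0-60 s, cut into five shots.
boundaries = [0.0, 8.0, 14.0, 30.0, 47.0, 60.0]
print(average_shot_length(boundaries))  # 12.0
```

Figures of this kind would then be related to known metadata (director, period, genre) in search of significant patterns.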
In recent years, as digital humanities methods have proliferated, stylometric
research in media studies has become more complex in its methods, but also more
varied in its interests. In the past, shot length and shot type were key parameters
for analysis; more recently, however, attention is also being paid to colour, motion,
(recurring) objects and aspects of visual composition. Projects such as Digital
Formalism and ACTION exemplify this development.
In this respect, ACTION certainly paved the way for SEMIA. On the one hand, because
the project relies to a considerable extent on techniques developed or used in the
context of previous stylometric research. And on the other, because it likewise
engages in the extraction and quantification of moving image data. In SEMIA, however,
such extraction serves rather different purposes. Data analysis, in this case, is not
done with the objective of authorship attribution or for the establishment of genre
features dominant in a particular corpus or period. As previously explained, the
project is focused rather on enabling exploratory browsing, affording (possibly
incidental) discovery of similarities that do not neatly align with existing
interpretative frameworks. For instance, similarities between collection items that
do not share a director, genre or production period.
To this end, the project draws inspiration from an emerging approach to visual
analysis and data visualisation in digital film and media studies scholarship – an
approach that is indebted in turn to media art practice and experimental filmmaking.
Kevin Ferguson, a proponent of this trend, explains that there is a tradition of
experimental work in media studies that balances between ‘[...] new media art and
digital humanities scholarship’, ‘deforming’ its object of study in ‘a digital
humanities project that is more aleatory and aesthetic than it is formal and
constrained’.
As previously mentioned, film and media scholars who proceed in this way oftentimes
seek inspiration in the work of artists, and specifically, those engaged in practices
of archival appropriation. History has shown that these practitioners in particular
have their own contributions to make to the challenging of preconceptions
underpinning scholarly analysis. At times, they even use the same analytical devices
for this purpose – but in methodologically less rule-bound ways. In the last few
decades, this has led to productive exchanges between academics, archivists and
artists – the constellation Thomas Elsaesser once dubbed the ‘three A’s’.
For instance, in the 1970s, when film historians would use projectors and editing
tables to come up with statistics providing insight into developments in film style,
artists would use those same devices to visually explore archival films in more
idiosyncratic ways. They would focus in the process on particular image details, or
dwell on and contemplate specific temporal units by stretching them. Examples of this
practice are the 1970s structural films of Ken Jacobs, Al Razutis or Ernie Gehr, who
repurposed films from the early 1900s. Their oftentimes rather abstract works
highlighted the different
formal properties of early cinema (compared to the
narrative standard of later years). In bringing those to the fore, they challenged
prevalent assumptions among contemporary historians, who had in fact largely
neglected early cinema in their stylistic accounts to date.
Ultimately, the great merit of such artistic work is that it strips archival films of
the categories and interpretive frameworks with which they have previously been
associated – thus opening up the possibility of applying new ones. Film scholar
Michael Pigott, in this context, has credited the practice with inducing ‘illegibility’.
In his view, this sort of work serves the dual purpose (and double tension) of ‘making the image illegible
(again) and then attempting to read it’.
In setting up SEMIA, the project team, while familiar with the above-mentioned examples, was more directly inspired by the work of Dutch video artist Geert Mul – a long-term collaborator of heritage partner Sound and Vision. Particularly influential for the project’s approach was his database-driven matching work.
To create his work, Mul used large databases of images, for which he extracted a wide
variety of visual features. Those features served in turn as the basis for a matching
of images at different levels of similarity. The first part of this process was
conducted automatically; however, human intervention occurred when the artist
selected approximate rather than identical matches to include in his work. In a
conventional retrieval context, such matches would likely be
considered errors, glitches or mismatches. But in the context of an exploratory
browse through an archival collection, they are precisely the kinds of results that
may yield unexpected connections or patterns, worth investigating further outside of
conventional notions of authorship, genre or period.
The above observations informed our decision, made early on in the SEMIA project, to
radically abandon those kinds of categories, as embedded in archival metadata through
semantic descriptions, and to opt instead for a visual analysis approach. We did this
primarily by way of experiment, and in the assumption that the explorative options it
opened up would eventually prove useful primarily to scholars.
In addition to pursuing a different set of media scholarly objectives, the SEMIA
project team also sought to engender a shift in terms of the techniques used for
visual analysis. In this section, we discuss the rationale behind our choice of
specific feature extraction methods, and why we chose to tweak existing ones in
particular ways. The connecting links between those different choices are, first, our
wish to extract features that would point to unanticipated – rather than predictable
– connections among objects, and second, to do so at a higher level of abstraction
than is currently considered state of the art,
in light
of the overwhelming focus in computer vision on the recognition of meaningful
semantic entities.
To a greater extent than other projects so far – ACTION, for instance, or the
Zürich-based FilmColors – SEMIA set out to explore the affordances of deep learning
techniques for revealing similarity-based patterns in (large) collections of
digitised moving images. We assumed that the patterns thus revealed would open up
new ‘routes’ through the
collections, ‘remixing’
them as it were, and that this would elicit new
questions about the items and their mutual relations. As we previously explained, we
were specifically interested in relations inspired by the material’s visual features
– rather than the sort of filmographic or technical data that make up traditional
metadata categories for film and video.
As it happens, such metadata, in archival collections, are often also fragmentary – and therefore, hardly reliable as a starting point for an inclusive form of collection exploration. Early on in the project, we took this as a key argument for looking into the possibilities of computer vision, and specifically deep learning techniques, for the purpose of feature extraction. This approach would help us generate large quantities of new metadata that would invite, if not a more inclusive kind of exploration, then at least one that could complement approaches to access based on search. After all, a lack of metadata in the form of semantic descriptors as encountered in an institutional catalogue may render the objects in a collection invisible, and therefore unfindable. While an approach relying on visual analysis does not solve this problem – as it can create new invisibilities, which we argue elsewhere (see Masson and Olesen [2020]) – it does challenge existing hierarchies of visibility.
Initially, the choice for a deep learning approach seemed to fit neatly with the
project’s intent to refrain as much as possible from determining in advance the route
a computer might take in order to identify similarities between collection items. In
the alternative scenario, known as feature engineering,
it is humans who
design task-specific algorithms, which are used to extract pre-defined features from
the images in a database (so that they can subsequently be compared). Deep learning,
which relies to an overwhelming extent on the use of Neural Networks (NNs, or
neural nets
for short), involves algorithms trained with techniques for
automatic feature learning (and as such, is a particular brand of machine learning).
As we mentioned in the introduction, this is a more recent approach, and it entails
the learning of specific data representations rather than set analysis tasks. Like
feature engineering, deep learning does to some extent rely on the intervention of
humans; after all, it is people who, at the training and/or retraining stages,
determine which similarities do or do not make sense (see also Masson and van Noord [2019]; in Masson and Olesen [2020], we elaborate on the epistemic
implications for users of our tool). However, it does not require them to decide in
advance which features are to be extracted.
However, we soon decided to only partially rely on such techniques – and the
abovementioned role of human knowledge is certainly one of the reasons why. As a
rule, deep learning is employed for the recognition of semantic classes, and more
specifically, object categories. This is hardly surprising, as the development of
such techniques is oftentimes done for purposes that involve the recognition of
semantic entities: vehicles, people, buildings, and so on. (One might think here of
applications for transportation and traffic control, geolocation, or biometrics; see
e.g. Uçar et al. [2017]; Arandjelović et al. [2018]; Taigman et al. [2014]). Within the SEMIA context,
however, the use of conventional semantic classes does not make sense, as it is the
sensory aspects of collection items – rather than the meanings we may assign to
images, or image sections, on the basis of specific content – that are of interest.
In fact, semantic classes commonly identified by deep learning approaches partially
overlap with the sorts of categories that are used in descriptive metadata for
archival collections, and that are central also to practices of search and retrieve.
In performing feature extraction, we had hoped to be able to work instead with more
abstract visual categories, which, according to computer vision logic, involves
extraction at a lower (‘syntactic’) feature level (a point we elaborate on
further below).
Another reason why exclusive reliance on a deep learning approach ultimately did not make sense, is that its underlying logic clashed with the requirements we had for interfacing. If our objective was to take sensory features as the point of entry into the collections, then it was imperative that our exploration tool allowed users to also take those features as the basis for digging further into the connections between items. For this purpose, we would need to at least minimally categorise, or re-categorise, those features, from the outset. The most logical choice here was to use the same intuitive classes that had also inspired the project: features such as colour, shape and visual complexity, and, for relations across time, movement.
One way of tackling this task with deep learning methods might have been to run
successive analyses, whereby each time, the focus would be on one specific set of
features, while other features would be cancelled out. For example, in order to
extract information about shape, we might have deactivated the neural net’s colour
‘sensitivity’ by temporarily turning all images in the database into black-and-white,
so as to focus its attention in the required direction. This type of approach is
generally associated with a (fairly new) line of research in computer science,
focused on learning so-called disentangled
representations
(see Xiao et al. [2018];
Denton and Birodkar [2017]). So far, however, it
has had limited success.
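The ‘colour deactivation’ scenario sketched above can be illustrated in a few lines. The conversion below uses the standard ITU-R BT.601 luma weights; the pixel format and sample patches are our own assumptions, introduced purely for demonstration.

```python
# Sketch of the 'colour deactivation' idea described above: before
# extracting shape information, collapse each pixel to a single luma
# value so that colour can no longer drive the similarity. Uses the
# standard ITU-R BT.601 luma weights; the input format is an assumption.

def to_greyscale(pixels):
    """Map a list of (r, g, b) pixels to single intensity values."""
    return [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels]

# A saturated red patch and a green patch of matching brightness become
# nearly indistinguishable once colour is removed - the network's
# 'attention' is thereby pushed towards shape instead.
red_patch   = [(200, 0, 0)] * 4
green_patch = [(0, 102, 0)] * 4
print(to_greyscale(red_patch)[0], to_greyscale(green_patch)[0])
```

Disentangled-representation research aims for the same separation of factors without such manual ablations, which is precisely where its difficulty lies.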
A major point of attention was the need to attain a sufficient measure of abstraction
in the results of the computer vision part of the project – results that were used in
the development of a tool for visualising the sensory relations between the films and
fragments in our database (a process we shall discuss elsewhere). We explained that
our objective within SEMIA was to inspire users by revealing potentially significant
relations between database items; in doing so, however, we sought to relegate the act
of assigning such significance – or in Pigott’s terms: of attempting to read
images made illegible, through novel relations – as much as possible to users. For
example, while we may want to draw attention to the circumstance that a specific set
of database objects covers very similar colour schemes, or that they feature
remarkably similar shapes, we leave it to the user to figure out whether, and if so
how, this might be significant (that is, what questions it raises about media and
their histories, or which alternative ways of researching historical film or
television materials it affords). But arguably, we also withhold interpretation at a
more basic level. In the above example, for instance, we leave undetermined whether
similarity in colour or shape derives from the fact that the images concerned
actually feature the same things.
(They might, and they often do – but it is
not necessarily so.) In this respect, what we do is entirely at odds with the
objectives of much machine learning practice in the field of computer vision.
Our search for abstraction is evidenced in a very concrete way by what happened
exactly in the feature extraction process. First, the extraction of image information
along the lines of colour, shape, visual complexity and movement was not followed in
our case by an act of labelling: of placing an image or image section in a particular
(semantic) class (we elaborate on this point in Masson and
van Noord [2020]). The reason, of course, is that we did not actually seek
to identify objects. For the purposes of our SEMIA experiment, the information as
such, and the relations it allowed us to infer, were all we were interested in.
Second, our search for abstraction is also evident from our application of deep
learning methods, which was limited to the extraction of information about shape.
Here, we focus on what computer vision experts call lower-level
features – a
notion that requires some further elaboration.
In computer vision, conceptual distinctions are oftentimes made between image
features at different levels.
From one perspective, these are distinctions in
terms of feature complexity. Levels of complexity range from descriptions relevant to
smaller units (such as pixels in discrete images) to larger spatial segments
(sections of such images, or entire images), whereby the former also serve as
building blocks for the latter. From another, complementary perspective, the
distinction can also be understood as a sliding scale from more syntactic
(and
abstract) to more semantic
features (the latter of which serve the purpose of
object identification). Taking the example of shape-related information, we might
think of a range that extends from unspecified shapes, for instance defined in terms
of their edges (low-level), to more defined spatial segments such as contours or
silhouettes (mid-level), all the way to actual object entities (e.g. things, people,
faces, etc.) or relations between such entities. In SEMIA, we made use of a neural
network trained for making matches at the highest (semantic) level. However, we
scraped information at a slightly lower one, which generally contains descriptions of
object parts. At this level, it recognises shapes, but without relating them to the
objects they are part of (on how neural networks come to ‘understand’ images, see
also Olah et al. [2017]).
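The levels distinction can be made more tangible with a toy sketch: a low-level feature (per-pixel edge responses) pooled into a crude mid-level summary. This is an illustration of the principle only; in SEMIA itself, the analogous descriptions were scraped from an intermediate layer of a trained neural network, not produced by a hand-written filter.

```python
# Toy illustration of the feature-level distinction discussed above.
# Low-level: per-pixel horizontal edge responses (differences between
# neighbouring intensities). Mid-level: those responses pooled into a
# single 'how edgy is this image' summary. A semantic-level feature
# would instead assign the image to an object class.

def edge_responses(image):
    """Low-level feature: absolute horizontal intensity differences.
    `image` is a 2D list of grey values."""
    return [[abs(row[x + 1] - row[x]) for x in range(len(row) - 1)]
            for row in image]

def pooled_edginess(image):
    """Mid-level feature: mean edge response across the image."""
    edges = edge_responses(image)
    values = [v for row in edges for v in row]
    return sum(values) / len(values)

flat    = [[5, 5, 5, 5]] * 3   # uniform patch: no edges
striped = [[0, 9, 0, 9]] * 3   # strong vertical stripes
print(pooled_edginess(flat), pooled_edginess(striped))  # 0.0 9.0
```

Descriptions at these lower levels capture shape-like structure without ever naming the objects that structure belongs to, which is what made them suitable for our purposes.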
Arguably, this approach helped us mitigate a broader issue that the use of computer
vision methods, and machine learning in general, posed for the project: that its
techniques are designed, as Adrian MacKenzie puts it, to mediate ‘future-oriented
decisions’ – and by implication, to predict. Treating accurate identifications at
the semantic level as the highest achievable goal within machine learning also
imposes limitations, in that it renders meaningless all other similarities – and
importantly, dissimilarities – between objects in a database. Anna Munster,
therefore, argues that prediction also ‘takes down potential’
(quoted in Mackenzie [2017, 7]). Within the SEMIA context, we expressly tried
to bring back some of this potential for the user. Sometimes this required us to
deviate from what was ‘state of the art’ in the field of computer vision. Only in
this way, after all, could we leave room for matches that might, within a purely
semantic logic, be considered mistakes but still provide productive starting points
for unrestrained explorations of patterns that perhaps no one had noticed before.
To round off this account, we now look at the results of our feature extraction
efforts, and at what we learnt about the aptness of the approach for our goals. The
classes of features the SEMIA project centred on are embedded in a rich history of
computer vision research, which, as we previously explained, began with a process of
manually designing features for predefined analysis tasks.
As mentioned earlier, we chose to focus on four broad sets of image features,
commonly understood as instances of shape, colour, visual complexity and movement.
Shape, we explained, is the only feature for which we extract information using a
neural net. The net we chose was trained for object recognition, but is commonly
repurposed for other tasks. We extracted information from a layer below the net’s
highest level (where, as we explained in the
previous section, the prediction probabilities for the semantic classes it
was trained on are to be found).
The features used to describe shape, colour, and visual complexity were all extracted
with techniques that are applied to still images. In order to apply them to moving
image material, we extracted feature descriptions from shots taken from the films and
programmes in our corpus. Specifically, we extracted the shape, colour, and visual
complexity features from five frames, evenly spaced throughout the shot, and
aggregated them to create the final feature descriptions. Movement, however, is a
feature specific to moving images. For extracting this kind of information, we relied
on an optical flow method, measuring relative motion between two subsequent frames.
In each case, we applied it to the same sets of five frames.
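The sampling-and-aggregation step described above can be sketched as follows. The frame data and the stand-in `frame_feature` function are hypothetical; any still-image extractor (for colour, shape or visual complexity) could take its place.

```python
# Sketch of the per-shot sampling and aggregation described above: pick
# five evenly spaced frames from a shot, extract a feature vector per
# frame, and average the vectors into one description for the shot.
# `frame_feature` is a hypothetical stand-in for a real extractor.

def evenly_spaced_indices(n_frames, k=5):
    """Indices of k frames spread evenly across a shot of n_frames."""
    if n_frames <= k:
        return list(range(n_frames))
    return [round(i * (n_frames - 1) / (k - 1)) for i in range(k)]

def shot_descriptor(frames, frame_feature):
    """Aggregate per-frame feature vectors into a single shot vector."""
    picked = [frames[i] for i in evenly_spaced_indices(len(frames))]
    vectors = [frame_feature(f) for f in picked]
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

# Usage with a dummy 'feature' (mean intensity) on a fake 100-frame shot.
frames = [[i, i + 1] for i in range(100)]
descriptor = shot_descriptor(frames, lambda f: [sum(f) / len(f)])
print(evenly_spaced_indices(100))  # [0, 25, 50, 74, 99]
```

Movement is the exception: because it is inherently temporal, it was measured between pairs of subsequent frames (in SEMIA, with an optical flow method) rather than from stills.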
For the purpose of the project, we gathered approximately 7,000 videos, which we
subsequently segmented into over 100,000 shots with the help of automatic shot
boundary detection. Each of those shots was subjected to the four feature extraction
algorithms. Altogether, this resulted in four different feature spaces, in which
every shot constitutes a datapoint. By measuring the distance between all points, we
could determine which other shots are most similar to a given one; the points
closest to it are known in this context as its ‘nearest neighbours’.
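The retrieval step itself can be sketched in a few lines: given a feature space in which each shot is a point, the nearest neighbours of a query are simply the points at the smallest distance from it. The shot identifiers and vectors below are invented for illustration.

```python
# Sketch of nearest-neighbour retrieval in one feature space, as
# described above: each shot is a point (feature vector), and the shots
# closest to a query are its nearest neighbours. All data is invented.

import math

def nearest_neighbours(query, space, k=2):
    """Return the ids of the k shots closest to `query` (excluded)."""
    dist = {sid: math.dist(vec, space[query])
            for sid, vec in space.items() if sid != query}
    return sorted(dist, key=dist.get)[:k]

# A toy 'colour' feature space with four shots as 3-d points.
colour_space = {
    "shot_a": [0.9, 0.1, 0.1],   # reddish
    "shot_b": [0.8, 0.2, 0.1],   # also reddish
    "shot_c": [0.1, 0.8, 0.2],   # greenish
    "shot_d": [0.1, 0.7, 0.3],   # also greenish
}
print(nearest_neighbours("shot_a", colour_space))  # ['shot_b', 'shot_d']
```

Running the same query against each of the four feature spaces (shape, colour, visual complexity, movement) yields the four neighbour sets shown in Figure 2.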
In Figure 2, we show three examples, with the ‘query’
shots to the left, represented here by a single still each, and the
16 shots identified as their nearest neighbours in the four different feature spaces
to the right. A first possible observation concerns the diversity between the nearest
neighbours for the three query shots: while all nearest neighbours share sensory
aspects with their respective query image, they are considerably different from those
for the other query shots. This at the very least suggests that they are not randomly
selected. A second observation, furthermore, concerns the visible similarity between
nearest neighbours across the four different feature spaces for each query image.
This last pattern follows logically from the nature of nearest neighbours: shots
that look similar in one sensory aspect are likely to also look similar in others.
The colours in a nature shot (such as the mushroom in the third query shot), for
example, are very distinctive, making it likely that its nearest neighbours in terms
of colour are also nature scenes. Similarly, the movement of leaves swaying in the
wind is very distinctive, making it probable that the nearest neighbours of a shot
with this element, in movement terms, also show leaf-rich scenes.
At the same time, and in spite of other visual similarities, our query images also produce matches that are quite distinct precisely in terms of the semantic entities they feature. The movement feature space for the mushroom query image, for instance, features a standing man (presumably one who moves from left to right, or the other way around, in the same way the mushroom does; further inspection would be required to ascertain this, or to make sense of the pairing). In instances like these, the matching process has arguably yielded more unexpected or surprising results and variations. Moreover, such matches occur more often if we look beyond the closest of the nearest neighbours. A desert scene, for example, is similar to a beach scene in terms of colour, but not in terms of movement; a grassy plain, in contrast, has movement similar to a beach scene's, but differs strongly in colour. Hence, by exploring similarities in multiple feature spaces, we are able to uncover relations that would otherwise remain hidden.
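The point about complementary feature spaces can be made concrete with a toy example: the same query shot can have different nearest neighbours in a colour space and a movement space. All descriptor values below are invented for illustration:

```python
import numpy as np

def nn_index(space, q):
    """Index of the nearest neighbour of item q in one feature space."""
    d = np.linalg.norm(space - space[q], axis=1)
    d[q] = np.inf  # exclude the query itself
    return int(np.argmin(d))

# Invented 2-d descriptors for four shots: beach (query), desert, plain, city.
colour   = np.array([[0.9, 0.8],   # beach: sandy tones
                     [0.9, 0.7],   # desert: similar sandy tones
                     [0.1, 0.9],   # grassy plain: green
                     [0.4, 0.4]])  # city: grey
movement = np.array([[0.7, 0.2],   # beach: waves, wind
                     [0.1, 0.1],   # desert: largely static
                     [0.7, 0.3],   # grassy plain: swaying grass
                     [0.9, 0.9]])  # city: traffic

colour_nn = nn_index(colour, 0)      # closest in colour: the desert
movement_nn = nn_index(movement, 0)  # closest in movement: the grassy plain
```

Browsing each space separately thus surfaces a different set of relations for the same query, which is the behaviour the paragraph above describes.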
In this article, we have argued for a reorientation of existing visual analysis methods, in response to a need for exploratory browsing of media archives. We explained how we took our cue from a recent line of digital scholarship inspired by artistic strategies in (new) media art, and how we also built on the tradition of exchange between film archives, media history and appropriation art. Historically, artists have used the analytical devices of scholars to different ends, thus engendering shifts in the latter’s working assumptions. In a similar vein, the SEMIA project team drew inspiration from the ways in which data artists repurpose existing visual analysis tools. We did so with the specific goal of enabling a transition from searching to browsing large-scale moving image collections. This way, we not only hoped to significantly expand the range of available metadata, but also to allow for the revaluation of the images’ sensory dimensions in the very early stages of research. Ultimately, we think, both approaches to collection access can very well complement each other.
Our goal required that, for the extraction of data, we adhere to the following
general guidelines. In order to reduce the system's reliance on a priori
interpretations, we first of all sought to avoid direct human intervention in the
actual extraction process. As a matter of principle, it should be up to the
algorithm to determine what counts as similar, somewhat similar, or dissimilar
– even if, as we argue elsewhere, algorithms ultimately always rely on knowledge
that originates in humans (see Masson and van Noord [2019]). Furthermore, we tweaked
the algorithm to partly prevent it from recognising (human-taught) semantic units,
so that it could focus on similarities at a more abstract level. Even at this stage,
some human intervention is ultimately unavoidable, as it is the computer scientist
who decides (ideally on the basis of sample testing results) at which feature level
the extraction takes place.
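In a neural-network setting, one common implementation of such feature extraction, the "feature level" decision amounts to choosing which layer's activations serve as the descriptor: earlier levels stay closer to raw sensory patterns, later ones become more abstract and more semantically loaded. A schematic sketch with random weights (not the project's actual network, whose architecture the passage above does not specify) might look as follows:

```python
import numpy as np

def extract(frame, weights, level):
    """Return the activation vector at the chosen feature level.
    Lower levels stay closer to the raw sensory input; higher levels
    yield more abstract (and more 'semantic') descriptors."""
    h = frame.ravel().astype(float)
    for W in weights[:level]:
        h = np.maximum(0.0, W @ h)  # linear layer followed by ReLU
    return h

rng = np.random.default_rng(1)
frame = rng.random((8, 8))                 # stand-in for a video frame
weights = [rng.normal(size=(32, 64)),      # level 1: 64 -> 32
           rng.normal(size=(16, 32)),      # level 2: 32 -> 16
           rng.normal(size=(8, 16))]       # level 3: 16 -> 8

low = extract(frame, weights, level=1)     # shallow, sensory descriptor
high = extract(frame, weights, level=3)    # deep, more abstract descriptor
```

The computer scientist's decision described above corresponds to fixing the `level` argument for the whole collection, ideally after sample testing at each candidate level.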
One conclusion that can be drawn from our review of most-similar results is that
extracting data with a minimum of labelling and human intervention, while also
attending to intermediate similarities, never truly cancels out the detection of
semantic relations and patterns altogether. This is hardly surprising: the relation
between low-level feature representations and objects – one that frames objects in
terms of their facets (in the case of an orange, for instance, its colour and
rounded shape) – was commonly exploited in early work on computer vision to detect
semantic relations and objects. Therefore, some feature combinations are simply too
distinctive not to be detected with our chosen approach – even if we do our best to
block the algorithms' semantic impulse.
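Facet-based object detection of the kind mentioned here can be illustrated with a toy nearest-centroid classifier over two hand-picked low-level facets; the classes, facet choices, and all prototype values below are invented for the example:

```python
import numpy as np

# Each object class described by two low-level facets:
# mean hue (0 = red/orange ... 1 = green) and roundness (0 to 1).
prototypes = {"orange": np.array([0.08, 0.95]),
              "banana": np.array([0.15, 0.20]),
              "lime":   np.array([0.70, 0.90])}

def classify(facets):
    """Assign the class whose facet prototype is closest (nearest centroid)."""
    return min(prototypes, key=lambda c: np.linalg.norm(prototypes[c] - facets))

# An orange-hued, round region is pulled towards the "orange" prototype:
label = classify(np.array([0.10, 0.90]))
```

Because a combination like "orange-coloured and round" is so distinctive, even a feature space deliberately stripped of semantic labels will still cluster such regions together, which is exactly the residual semantic pull the paragraph describes.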
Other matches, in contrast, remain illegible, and therefore invite
further exploration. In this sense, our working method does yield surprising results,
or unexpected variations. In the remainder of our project, which we report on
elsewhere, our intent has been to further stimulate users in exploring those less
obvious connections by extending our interface with the capacity to also browse
The next step, which we expand on in an upcoming piece, is to assess which kinds of
questions and ideas exploratory browsing through the lens of sensory features
ultimately yields, and to evaluate how this furthers the efforts of various user
groups to make serendipitous discoveries.