Iben Have is associate professor in Media Studies, School of Communication and Culture, Aarhus University, Denmark. She holds a PhD in Musicology and has specialized in research on sound and media in different constellations across the fields of reception-, text-, and institutional analysis. She is co-founder and editor in chief of the open access journal
PhD student at the Center for Humanities Computing Aarhus, at Aarhus University. His current research is on multimodal representation learning, with a primary focus on how to combine data sources from disparate domains into meaningful representations. He is especially interested in applied machine learning in the humanities and arts.
Digitization has changed flow music radio. Music streaming services like
Spotify and iTunes have to a large extent outperformed traditional playlist radio, and the
global dissemination of software-generated playlists in public service radio stations in
the 1990s has superseded the passionate music radio host. But digitization has also
changed the way we can do research in radio. In Denmark, the digitization of almost all radio
programming back to 1989 has made it possible to actually listen to the archive to
investigate how radio content has changed historically. This article investigates the
research question:
In the collective research project,
As a part of the Danish research project, Have investigated the hypothesis that there has
been a development from recorded music being the most important content to an increasing
emphasis on spoken words (chattering hosts, news etc.) on Danish public service music
radio. This hypothesis was tested, discussed, and to some extent also confirmed in a
qualitative case study of five programs of the most popular Danish morning music show on
the radio channel P3 in the period 1989-2016.
Digitization is not only changing radio; it is also paving the way for new areas of
radio research. Radio archives are being digitized all over the world as a part of
preservation policies for cultural heritage, and some of these archives allow access
for researchers. Even though archive policies are becoming more user-friendly, there is still
a need for developing tools for engaging with the overwhelming amounts of material in
audio archives
The terms close and distant listening are inspired by Franco Moretti's concept of distant
reading
In sound studies the term
The present study's point of departure is the Danish digital radio archive and digital
infrastructure LARM.fm, which gives access to
almost two million digitized Danish radio programs. No tool has yet been developed for
large-scale analysis of the archive, and one of the ambitions behind the project presented
in this article is to demonstrate a way to do just that. Thus, the aim of the article is
twofold: 1) To describe the methodological process and challenges of developing a model
for large-scale Speech-Music discrimination analysis on audio data to answer the research
question
In the following, we first present the LARM.fm archive that serves as the basis for both studies. After that, we briefly present the analogue, manually coded case study first presented in the article
LARM.fm was originally developed by the research project LARM Audio Research Archive
(2010-2014), which was a collaboration between 10 research and cultural institutions
funded by the Danish Ministry of Higher Education and Science. Since 2015 LARM.fm has
formed part of the national project DIGHUMLAB, www.dighumlab.org. The infrastructure was
then recreated on the present HTML5 platform, and the platform was expanded to also
include TV
Since 1987, all broadcast material in Denmark has been subject to legal deposit at the Royal Danish Library, where it forms part of the Danish Media Collection. From 2005 onwards, this material has been exclusively born-digital, and almost all analogue programs from 1989 to 2005 have been digitized. Due to copyright protection, the Media Collection is only available to the public through on-site computers at the libraries. However, an agreement between the Danish Agency for Culture and the copyright holders allows university faculty and students to stream, but not to download, the material through the library's platform Mediestream or through LARM.fm.
As opposed to Mediestream, LARM.fm also contains older material from the archives of the
Danish Broadcasting Corporation (Danmarks Radio, or DR in the following), dating back to
1925, when national radio was launched in Denmark. The old material is incomplete and
consists mostly of text documents, as many programs were not saved and others have
(still) not been digitized
The amount of available material grows steadily as new Danish radio and TV programs are broadcast. In November 2019, the collection consisted of more than three million radio and TV programs and more than 200,000 OCR-scanned PDF files.
As shown in Figure 1 (left column), the material is arranged according to type: TV programs, Radio programs, Radio news reports, Radio news manuscripts, Program guides, and Radio news guides. Radio and TV programs are available for streaming; the other types of material consist of PDF files containing OCR-scanned documents, some handwritten and others typed or printed.
The oldest radio program in the archive is from 22 May 1931 and lasts just over five
minutes. As illustrated in Figure 2b, there are 10 or fewer
programs from the first five years and between 35 and 61 programs from the remaining years
in the 1930s.
LARM.fm has expanded Danish radio research towards deep and detailed content analysis of
first-hand sources, which has been called for not only in Danish radio research but
internationally within media studies, where radio studies used to be characterized by
institutional analysis and analyses of media systems
The close listening case study of the development of the share of music and speech
focuses on the most popular music radio program
The aim of the case study was to test the hypothesis that from 1989 to 2016
We listened closely to five selected
To ensure that the program from 1992 was representative for the early 1990s and not
affected by the institutional restructuring and formatting of radio channels in DR that
year, registrations from the previous years were included for comparison: 31 October, 1990
and 30 October, 1991. As shown in Figure 3, the distribution
between the categories was rather even, which confirmed that 1992 was not different from
the previous two years. Unfortunately, the program from 2006 appeared to be unusual
because of technical problems in the studio, which led to the use of some
unplanned
The registrations of the content of the five/seven programs point to some stable as well
as evolving elements during the period. The basic format remained fairly stable over the
years, but there were significant changes in the content of
However, a close listening to the programs revealed that the emphasis on music
Some elements of content are identical in all five programs, namely the recorded music, the time announcements and the channel/program adverts, but new, entertaining elements that are not related to music (satire, quizzes, and everyday news and actuality) are gradually introduced. This development, together with the hosts' disregard of the music, pushes the music further into the background.
On the basis of the findings from the case study, it was concluded that
In this study we take the existing study further by zooming out and letting a computer do
the
The corpus for analysis consists of all radio programs broadcast on P3 1989-2019 between 6 a.m.
and 12 noon that are available as audio files in the Danish Media Collection and accessible
through LARM.fm. The data is divided into two corpora, resulting from
different digitization processes: 1) Digitized radio files 1989-2005. 20,660 programs from
1989-01-01 to 2005-06-30 and 2) Born digital radio files 2005-2019. 24,456 programs from
2005-07-20 11:05:00 to 2019-05-28 14:03:00.
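The following is a minimal sketch of how programs could be assigned to the two corpora. It uses the date boundaries given above and keeps only programs that start in the morning window. The function names are hypothetical, and two choices are assumptions rather than documented selection rules for LARM.fm: filtering on start time alone, and reading "2005-06-30" as the end of that day.

```python
from collections import Counter
from datetime import datetime

# Corpus boundaries as stated in the text; reading 2005-06-30 as the whole
# day is an assumption. Programs falling in the gap belong to neither corpus.
DIGITIZED_END = datetime(2005, 6, 30, 23, 59, 59)
BORN_DIGITAL_START = datetime(2005, 7, 20, 11, 5, 0)


def assign_corpus(start: datetime) -> str:
    """Return the corpus a program belongs to, judged by its start time."""
    if start <= DIGITIZED_END:
        return "digitized"
    if start >= BORN_DIGITAL_START:
        return "born_digital"
    return "outside"


def in_morning_window(start: datetime) -> bool:
    """Assumed filter: the program starts between 06:00 and 11:59."""
    return 6 <= start.hour < 12


def count_per_year(starts):
    """Count morning programs per (year, corpus) pair."""
    counts = Counter()
    for start in starts:
        if in_morning_window(start):
            counts[(start.year, assign_corpus(start))] += 1
    return counts
```

Counting per year and per corpus makes it easy to see whether the number of files changes abruptly where the corpora meet.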
First, we repeated the manual coding of the five programs from the case study, this time using the open source multi-track audio editor and recorder Audacity, in order to be able to test the performance of a large-scale analysis. Coding the same material twice enhanced reliability, and a comparison of Figure 4 with Figure 3 shows that the diagrams are quite consistent from 2006 to 2016. However, the comparison also revealed some errors, since there are discrepancies in 1992 and 1998. We therefore went back to LARM.fm to deep-listen to the programs and found that some external program segments, with more speech than is usual in Go' Morgen P3, had been included in the data. Such small inconsistencies further underline the need to combine quantitative analysis with a qualitative approach.
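Audacity can export a label track as plain text, with one tab-separated start time, end time and label per line. The sketch below computes the share of labelled time per category for one program from such an export. It is illustrative, not the authors' code, and the label names "speech", "music" and "speech+music" are hypothetical. Lines beginning with a backslash, which Audacity writes for optional frequency ranges, are skipped.

```python
from collections import defaultdict


def parse_audacity_labels(text: str):
    """Parse 'start<TAB>end<TAB>label' lines into (start, end, label) tuples.

    Lines starting with a backslash carry optional spectral-selection data
    and are ignored, as are blank lines.
    """
    segments = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("\\"):
            continue
        start, end, label = line.split("\t", 2)
        segments.append((float(start), float(end), label.strip()))
    return segments


def category_shares(segments):
    """Share of the total labelled time per category (shares sum to 1)."""
    totals = defaultdict(float)
    for start, end, label in segments:
        totals[label] += end - start
    grand_total = sum(totals.values())
    return {label: dur / grand_total for label, dur in totals.items()}
```

Shares computed this way from two codings of the same program can be compared directly, which is the kind of check that exposed the 1992 and 1998 discrepancies.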
In the next step we utilize the trained model by Papakostas and Giannakopoulos
(2018)
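The image-based approach classifies spectrograms, so the audio must first be turned into a time-frequency image. The NumPy sketch below computes a log-magnitude short-time Fourier transform spectrogram and scales it to 8-bit grey levels. The window and hop lengths are illustrative assumptions, not the settings used by Papakostas and Giannakopoulos (2018), and the CNN itself is not shown.

```python
import numpy as np


def log_spectrogram(signal, sample_rate, win_sec=0.025, hop_sec=0.010):
    """Log-magnitude STFT spectrogram of a mono signal (frequency x time).

    Window and hop lengths are illustrative defaults (assumption).
    """
    win = int(round(win_sec * sample_rate))
    hop = int(round(hop_sec * sample_rate))
    if len(signal) < win:
        raise ValueError("signal is shorter than one analysis window")
    window = np.hanning(win)
    n_frames = 1 + (len(signal) - win) // hop
    frames = np.stack([signal[i * hop:i * hop + win] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Small offset avoids log(0); transpose so rows are frequency bins.
    return np.log(magnitude + 1e-10).T


def to_image(spec):
    """Min-max scale a spectrogram to 0-255 grey levels for an image CNN."""
    lo, hi = spec.min(), spec.max()
    scaled = 255 * (spec - lo) / (hi - lo + 1e-12)
    return np.round(scaled).astype(np.uint8)
```

Because the classifier only ever sees such images, it does not rely on the words being spoken. This may help explain why a model trained on other material can transfer to Danish.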
As seen in Table 2 and Figure 5, these performance levels generalize well to Danish when the model's predictions are compared with our manually coded dataset. For a comparison with an audio-based classifier, we used the support vector machine (SVM) by Giannakopoulos (2015).
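Agreement between model predictions and the manual coding can be summarized as accuracy plus per-class precision, recall and F1. The sketch below computes these metrics generically. It assumes paired label sequences, for example one label per fixed-length window, and it is not the evaluation script behind Table 2.

```python
from collections import Counter


def confusion_and_metrics(y_true, y_pred, labels=("speech", "music")):
    """Confusion counts, accuracy, and per-class precision/recall/F1."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label sequences of equal length")
    confusion = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    per_class = {}
    for c in labels:
        tp = confusion[(c, c)]
        predicted = sum(n for (t, p), n in confusion.items() if p == c)
        actual = sum(n for (t, p), n in confusion.items() if t == c)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[c] = {"precision": precision, "recall": recall, "f1": f1}
    return confusion, accuracy, per_class
```

Passing a third label, such as a mixed speech-and-music category, to `labels` shows how performance changes once non-pure categories are included.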
These performance levels include only the pure categories of speech and music, which are not
representative of the music radio programs, especially not after 2006, when the proportion of the mixed category speech
and music increases (see Figure 4). Including
This level of performance allows for large-scale analysis of the relationship between speech and music. While our approach focuses on speech and music classification in radio, it can be generalized to other audio media and recordings such as podcasts, audiobooks and music performances. It is also possible to change the predicted categories, for instance to include gender and mood, and to allow for more than two outcomes, such as speech, music and jingles.
The results of the analysis are shown in the following four figures, which are discussed further in the next section.
Figure 6 shows the proportion of speech over the thirty years. We notice significant changes around 1992 and 2001 and again in 2005, as well as an upward trend from 2016 to 2019.
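A proportion like the one in Figure 6 can be computed by summing classified time per year. The sketch below assumes the classifier output has already been converted into (year, label, duration) records. This record layout and the function name are hypothetical.

```python
from collections import defaultdict


def share_per_year(records, target="speech"):
    """Per year, the fraction of classified time labelled `target`.

    `records` is an iterable of (year, label, duration_seconds) tuples.
    """
    total = defaultdict(float)
    hit = defaultdict(float)
    for year, label, duration in records:
        total[year] += duration
        if label == target:
            hit[year] += duration
    return {year: hit[year] / total[year]
            for year in sorted(total) if total[year] > 0}
```

Weighting by duration rather than by number of segments keeps many short jingles or time announcements from inflating the speech share.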
Apart from the speech-music discrimination, we are also interested in how the length of
the speech and music sequences might have changed over the years. This interest came from the
findings in the case study showing that the length of musical tracks has been
standardized, especially after the introduction of playlists, rotation systems and music
clock schemes
Because Figure 7 presents mean lengths in seconds, it should be read with the shape of the underlying distributions in mind: the lengths of speech segments resemble a Poisson distribution, while those of music segments resemble a bimodal Poisson distribution (see Figure 8 and Figure 9). Figure 7 shows an increase in the average length of music in 1992 and a decline in both music and speech length from 2011. This seems to indicate shorter segments of both speech and music, which means the two must have alternated more often. Additionally, we observe a rise in average speech length from 1992 until 1999.
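Mean segment lengths such as those in Figure 7 presuppose that window-level predictions are merged into contiguous segments. The minimal sketch below does this with run-length encoding via `itertools.groupby`. It assumes one label per one-second window, which is an assumption and not a documented setting of the study.

```python
from itertools import groupby
from statistics import mean


def frames_to_segments(frame_labels, frame_sec=1.0):
    """Merge consecutive identical window labels into (label, length_sec) runs."""
    return [(label, sum(1 for _ in run) * frame_sec)
            for label, run in groupby(frame_labels)]


def mean_length(segments, label):
    """Mean length in seconds of all runs with `label` (None if there are none)."""
    lengths = [length for lab, length in segments if lab == label]
    return mean(lengths) if lengths else None
```

A single misclassified window inside a long music track splits it into two shorter segments. Frame-level errors therefore bias mean lengths downwards, and in practice some smoothing of the labels may be needed.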
To elaborate on the findings from Figure 7, we grouped the segment lengths into seven-year periods in order to compare the distributions for speech and music separately.
In Figure 8 and Figure 9, we pay attention to the first peak of very short segments of 10-20 seconds and the second peak around 200 seconds, which is especially pronounced for music.
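The grouping into seven-year periods and the search for peaks in Figures 8 and 9 could be sketched as below. Three choices here are illustrative assumptions: counting periods from 1989, truncating the last period at 2019, and using 10-second bins. A peak is taken to be any bin whose count exceeds both neighboring bins.

```python
from collections import Counter, defaultdict

FIRST_YEAR, LAST_YEAR = 1989, 2019  # corpus span; periods counted from 1989


def period_of(year, width=7):
    """Label the seven-year period a year falls in, e.g. '1989-1995'."""
    start = FIRST_YEAR + (year - FIRST_YEAR) // width * width
    end = min(start + width - 1, LAST_YEAR)  # the last period is truncated
    return f"{start}-{end}"


def length_histograms(segments, label, bin_sec=10):
    """Per-period histogram of segment lengths for one label.

    `segments` holds (year, label, length_sec) tuples; each bin is keyed
    by its lower edge in seconds.
    """
    hists = defaultdict(Counter)
    for year, lab, length in segments:
        if lab == label:
            hists[period_of(year)][int(length // bin_sec) * bin_sec] += 1
    return hists


def local_peaks(hist, bin_sec=10):
    """Lower edges of bins whose count exceeds both neighbouring bins."""
    # Counter returns 0 for missing bins without inserting them.
    return sorted(edge for edge, n in hist.items()
                  if n > hist[edge - bin_sec] and n > hist[edge + bin_sec])
```

With real data the histograms will be noisier, so a peak search like this would typically be run on smoothed counts or coarser bins.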
The study has analyzed some general tendencies in the development of the morning programming (6 a.m. to 12 noon) on P3 from 1989 to 2019 by measuring the proportional distribution of music and speech and the length of music and speech segments, respectively. Looking at the most visually striking changes in Figures 6-9, some years stand out across the analysis: 1992, 2001, 2005 and 2016.
Some important institutional and media-political changes must be taken into account to explain the development. The official end of the monopoly in Danish radio came in 1983, when local radio stations were allowed for the first time; however, with regard to nationwide channels, DR's monopoly was not fully broken until 2003. This means that P3 has faced increasing competition during the thirty years, further intensified by music streaming services like Spotify and iTunes from the latter half of the 2000s.
To meet the competition from the local and commercial music radio channels, P3's profile
was strengthened with the formatting of DR's four radio channels in 1992 (P1, P2, P3 and
If we look at Figure 6, we see significant changes around 1992 and 2001 and again in 2005. This generally confirms the case study's finding of less speech in the early 1990s, before the restructuring and formatting of DR's radio channels in 1992, followed by growth peaking around 2011 and a subsequent decrease towards 2016. What is new in the large-scale study is an upward trend in the proportion of speech from 2016 to 2019, reflecting a corresponding decrease in musical content.
The rise in the proportional amount of speech from 2005 to 2011 in Figure 6 can be
explained by the increasing number of hosts in the studio (Figure 10). More hosts entail
more talking. Have (2018) found that there were many
different hosts in the early years but only one host in the studio at a time (1989-2000),
after which there was a relatively stable group of three to four hosts who were all
co-present in the studio (2005-2015). In the intervening years there was a short period
with one or two hosts in the studio (2001-2004)
The changes in 2005 might also be explained by a change in file format in the dataset. That year, the Danish State Media Collection's procedure changed from digitizing the incoming material from the Danish media providers to receiving it as born-digital files, as described above. While this change is positive, it means that the analysis is carried out on files containing individual programs rather than the entire morning block. Some files therefore consist predominantly of music or of speech, which increases uncertainty. To normalize across corpora, each file was further split into segments of approximately 1,000 seconds, which reduces the effect of outliers; however, a noticeable difference between the two corpora remains. As seen in Figure 8 and Figure 9, hardly any sections of music or speech are longer than 1,000 seconds. We must nevertheless take an increase in uncertainty after 2005 into account.
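The text does not specify exactly how files were split into segments of approximately 1,000 seconds. One plausible sketch divides each file into near-equal chunks close to the target length, so that no very short leftover chunk remains at the end. The function name and the rounding rule are assumptions.

```python
def split_into_chunks(duration_sec, target_sec=1000.0):
    """Split a file into near-equal (start, end) chunks of about `target_sec`.

    Using near-equal chunks instead of fixed 1,000 s pieces avoids a very
    short final chunk; files shorter than about half the target stay whole.
    """
    n_chunks = max(1, round(duration_sec / target_sec))
    size = duration_sec / n_chunks
    return [(i * size, (i + 1) * size) for i in range(n_chunks)]
```

For example, a one-hour file becomes four chunks of 900 seconds each. Chunk-level speech and music shares can then be computed in the same way for both corpora.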
In Figure 7, we see an increase in the average length of music from 1992 and a decline in music and speech length from 2011. The shorter segments of both speech and music in the early years might to some extent be explained by the practice of the host's speech overlapping with the music, which makes music and speech switch more often. The shift in 2011 might be explained by a general use of jingles and segments like quizzes and DR's own commercials. However, that does not correspond with Figure 4 showing a more stable category of
In Figure 8 and Figure 9 we
pay attention to the first peak of very short segments of 10-20 seconds, which again can
be explained by the overlapping host talk in the early years, and the second peak around
200 seconds, which is especially pronounced for music. This observation is
interesting because it confirms a standardization in the lengths of musical tracks played
on the radio after the restructuring in 1992. It is an example of what Have (2018) calls a
Apart from the specific results concerning the changes in the Danish radio channel P3, this article is also concerned with the methodological discussion of deep and distant listening. The opportunity to move back and forth between the existing case study and the large-scale study, which might be called a meso-scale listening approach, has made the analysis more solid in relation to the existing findings and has also filled gaps left by each method. The risk of cherry-picking in the qualitative case study has been addressed. Figure 6 added a more nuanced perspective to the case study by showing a quite variable amount of speech over the years, with an upward trend from 2016 to 2019, a period not included in the case study. The large-scale study thus also gave rise to new questions, such as: why do we see an increasing amount of speech from 2016 onwards? An explanation could be the increasingly competitive situation, with competition both from commercial digital music radio stations and from digital music streaming services. In this competitive field of musical content, DR and P3 turn towards one of their core competences as a public service institution: offering professional journalistic content presented by well-known young personas in an entertaining way. That strategy moves P3 even further away from being the music channel it once was, towards an entertaining and communicative radio channel.
Many of the eye-catching changes in Figures 6-9 can be explained by institutional changes in DR, but the significant change in 2005 cannot. This led us to question not the classifier but the data, which changed format in 2005. This is valuable knowledge, not only for this study but for future large-scale studies of the LARM.fm archive. Working with the LARM.fm archive as reverse-engineering humanists has given us insight into some of the challenges of large-scale archive analysis. These challenges call for critical attention to marked fluctuations that are not always anchored in actual changes in the content of the data but in the formats providing this content.
A main aim of this study as a whole has been to analyze whether there has been a development from recorded music being the most important content to an increasing emphasis on spoken words. The close listening approach enabled a study of how the music was presented by the hosts in the program. It confirmed the qualitative change from a host filling the gaps between musical tracks by talking about the music in the early 1990s to music filling the gaps between a group of hosts who do not relate to the music at all. However, the large-scale study enabled us to correct, or at least nuance, these findings by demonstrating that the overall amount of speech has not actually increased but has varied over the 30-year period, as shown in Figure 6. From the large-scale study we learned that the diachronic changes in the share of music and speech are less significant than initially expected. This shows that picking single case studies (even if sampled deliberately) can lead to questionable conclusions if they are not considered against the background of the full radio programming. At this point our study clearly shows the strengths and weaknesses of the two approaches and why combining them generates more valid answers.
This study has compared a case study of five Danish music radio programs (1990-2016) with a large-scale study of the whole morning programming (6 a.m. to 12 noon) of the music channel P3 from 1989 to 2019. Both studies are anchored in data from the Danish digital archive and infrastructure LARM.fm, which offers new paths for radio and media studies by affording deep listening studies as well as large-scale distant listening studies. The study is the first to present a large-scale analysis of the huge amount of data in the Danish radio archive. Inspired by Papakostas and Giannakopoulos, we applied convolutional neural network (CNN) image-based classification to spectrograms, an approach that has obtained state-of-the-art performance. For a comparison with an audio-based classifier, we used the support vector machine (SVM) by Giannakopoulos (2015). As demonstrated in Table 2 and Figure 5, the classifier tools developed by Papakostas and Giannakopoulos performed with high accuracy, and the performance levels generalized well to Danish. This is a useful insight for future studies of automated audio classification in Digital Humanities. It bears, for instance, on how analytical results found in a small amount of data can be generalized to a large amount of data, and on how models of speech-music discrimination can be transferred across languages (Danish and English in this case). We also expect that the model developed in this article can be trained for other tasks such as gender detection or regional accent detection.
The findings of the study confirm that political and institutional changes in Danish music radio leave their imprint on the content of the programs, as seen, for instance, in the more standardized formats and segments registered after the restructuring in 1992. But the most eye-catching feature of Figure 6, the change in 2005, must be explained by a change of format caused by the shift from digitized to born-digital files in the archive. The study thereby also illustrates the importance of reflecting on the constitution of, and possible changes in, the data when doing large-scale analysis.
In general, we can conclude that the combination of a close listening and a distant listening approach has given us a more saturated picture of the development of the morning music radio programming on P3 from 1989 to 2019. The combination not only enables us to give more valid answers to how the distribution of music and talk has changed during the period but also brings forward some strengths and shortcomings of the qualitative case study and the large-scale analysis, respectively. Finally, we hope that this study can encourage Digital Humanities scholars to include more audio content analysis in the field of Media Studies.