What Does A Photograph Sound Like? Digital Image Sonification As Synesthetic AudioVisual Digital Humanities


Computers have the capacity to transpose the pixels, shapes, and other features of visual material into sound. This act of data correlation between the visual and the audial produces a new artifact, a sonic composition created from the visual source. The new artifact, however, correlates precisely to data in the original, thus allowing for fresh ways of perceiving its form, content, and context. Seeming to distort the visual object into an aural one paradoxically allows an observer to observe the visual evidence anew, with more accuracy. A kind of generative, synesthetic criticism becomes possible by cutting across typical boundaries between the visual and the audio, the optic and the aural. Listening to as well as looking at visual artifacts by way of digital transpositions of data enables better close readings, more compelling interpretations, and deeper contextual understandings. Building on my earlier scholarship into image glitching, remixing, and sonification, this essay investigates a photograph of Joan Baez performing at the Greek Amphitheater in Berkeley, California, during the early 1960s. The image comes from my project on the Berkeley Folk Music Festival and the history of the folk music revival on the West Coast of the United States. Here, the use of digital image sonification becomes particularly intriguing. While we cannot magically recover the music being made in the photograph, we can more closely attend to the ghosts of sound within the silent snapshot. Digital image sonification does not recover the music itself, but it does help to amplify issues of gender, power, embodiment, spectacle, hierarchy, and performance in my perceptions of Baez making music in the photograph. Using the ear as well as the eye to scan the image for its multiple levels of meaning leads to unsuspected perceptions, which then support more revealing analysis.
In digital image sonification, a cyborgian dance of data, signal, image, sound, history, and human perception emerges, activating visual materials for renewed scrutiny. In doing so, this mode of AudioVisual DH activates the scholarly imagination in promising new ways.

Figure 1. 
Joan Baez performing at the Hearst Greek Theatre, Berkeley Folk Music Festival Collection, Charles Deering McCormick Library of Special Collections, Northwestern University Libraries, n.d. (possibly 1963), photographer unknown, https://dc.library.northwestern.edu/items/87ac8798-6aaf-456e-9c3e-dc187c796115.
In the photograph (Figure 1), we see the famous folksinger Joan Baez performing at the Greek Amphitheater on the University of California-Berkeley campus, but of course, we cannot hear her. Taken on a summer night during the early 1960s, the image contains a moment when musical sound was being produced, but as an artifact, it remains silent. The photograph visually conveys a grand sense of performative scale. Snapped from high up in the audience by an unknown photographer, the image looks down on Baez from stage right at the 8,500-seat venue. She stands under the stark white spotlight, just a series of blurry shapes far away on stage. You can recognize her iconic blouse and knee-length skirt, her long brunette hair, and her acoustic guitar, but that is about all. She sings in a space modeled after the Ancient Theater of Epidaurus, funded by newspaper magnate William Randolph Hearst, and opened in 1903 in service of the idea that Berkeley was becoming a new fount of democracy, the “Athens of the West.” If you know some of the context for this image, you might think about how you are looking at a young, radical woman, barely out of her teens, who has become a folk music revival celebrity. If you also know Baez's music, you might hear faint echoes of her signature soprano vibrato and fingerpicked acoustic guitar in your mind's ear as you look at the photograph. If you know the San Francisco Bay Area, you might remember the delicious crispness or downright chill of a spring or summer evening in that region, the fog rolling in over the bay toward the Berkeley Hills. Otherwise (or even if you do have those sonic and sensorial resources available), you are left only with a mute image.
Is there more here than meets the eye? If so, can AudioVisual DH help us access it, or at least think more productively about photographs of musical performance or of sound being made? We cannot magically recover the sounds of Joan Baez performing in some miraculous feat of archeological excavation (at least not yet), but the tactic of what I call digital image sonification offers one example of how computation can support better close reading, more compelling interpretation, and deeper contextual understanding of pictures of sound, and indeed of pictures in general. While archeologists and a number of historians involved in digital humanities have been rightfully interested in recovering or modeling acoustic environments from the past, what I refer to as image sonification is drawn more from media studies concerns about what a digital photograph is.[1] When a photograph is optically scanned and transformed into an image file, we can correlate its pixels, shapes, and other features to sound, creating an audio version of the image. As media studies scholar Wolfgang Ernst points out, "For the computer, the difference between sound and image, if it is counted, would count only as the difference between data formats [ital. in original]." The transpositional possibilities become useful for synesthetic modes of audiovisual DH. As Ernst puts it, "Digital memory ignores the aesthetic differences between audio and visual data and makes one interface (to human ears and eyes) emulate another" [Ernst 2013, 66]. There are many aspects of the image we could sonify, and, to be sure, none is necessarily more essential than another. As Taylor Arnold and Lauren Tilton note in their research on what they call "distant viewing" of large visual corpora, "Raw pixel intensities hold no meaningful information out of context" [Arnold 2019, 3].
Nonetheless, even for one image — maybe especially for one image — multiple modes of experiencing data can produce new knowledge through the defamiliarization of the artifact [Shklovsky 2017].
Defamiliarization involves the purposeful alienating of a text in order to glimpse its inner workings and larger implications more perceptively. Originating as a new kind of modernist literary analysis, it is not far removed from Bertolt Brecht's "distancing effect" or tactics in surrealist art-making or Walter Benjamin's efforts to develop new modes of alienated aesthetic analysis in response to the "age of technological reproducibility" [Brecht 1964, 91–99] [Benjamin 1968, 217–252]. Image sonification also joins the long-running digital humanities interest in "deformance," in which strategic acts of altering a text produce new versions with revealing differences to the original — indeed, sometimes questioning what we even mean by the concept of an "original" [McGann 2012] [Sample 2012]. At some level, all digital humanities efforts, even the most committed to positivistic statistical analysis, involve this kind of distortion of the empirical record. As Lisa Gitelman has taught us, "raw data is an oxymoron," and all data are "always already new" [Gitelman 2006] [Gitelman 2013]. To be sure, there are truths to discover, but when it comes to analyses of images and sounds, the truths are often multi-perspectival, multivocal, and rarely self-evident. By actively and critically playing with the representations produced as and through data, using computers to aid in the process of hearing images as well as viewing them, we can seek out fuller, richer interpretations of the materials we study. We can listen more deeply to what we are seeing.[2]
To sonify image data is to flip the contemporary obsession with data visualization, in which data of any kind are converted into what Franco Moretti famously called "graphs, maps, trees" [Moretti 2007]. If we stop privileging the optic, the viewer — now the listener — can pivot between the visual and the audial. This places the two halves of the audiovisual binary into dialogue with each other. After all, if we already visualize sound when creating digitally produced spectrograms of frequencies, why not sonify pixel information? New details and fresh interpretive possibilities, ideas, meanings, and implications emerge when one studies photographs by combining optical and audial perception together in new synesthetic modes of analysis. Understood this way, image sonification offers an underexplored, exploratory audio tactic that can lead to discoveries about visual sources. As Kevin L. Ferguson argued in the 2019 Debates in the Digital Humanities, "Rethinking our viewing practices in a digital age…requires an investment in experimental, theoretical methods that run counter to the rationalist uses of quantitative data often employed in DH work" [Ferguson 2019, 336]. If we shift from the aggrandizement of the statistical as well as the tyranny of the visual to more adventurous digital considerations of artefactual representation — if we embrace Fred Gibbs and Trevor Owens' foundational digital humanities call for a more critically aware and creatively expansive "hermeneutics of data" — we can activate the ear as well as the eye to perceive more accurately and revealingly a fuller interpretive picture of the past [Gibbs 2013].
I have argued elsewhere that digital image sonification offers a provocative method for pursuing a fresh historical understanding of archival photographs. In my earlier essay, I noticed in particular how image glitching and sonification produced new interpretive perspectives on race in the US folk music revival [Kramer 2018b]. In this essay, I turn my eyes and ears to questions of gender, space, performance, and democracy in music performance at the height of the early 1960s folk revival [Kramer 2018a]. Once again, I am struck most of all by the capacity of digital image sonification to amplify the stakes of photographic representation.[3] The visual comes alive in a different register when transformed via its presence as pixels into sound. Hearing what we are viewing asks us to see a photograph anew. The ear aids the eye in perceptual reorientation. New perceptions provide opportunities for more perceptive interpretations.
As an experiment, I placed the Joan Baez image into an application, Photosounder, created by programmer Michel Rouzic.[4] Photosounder reads images left to right and correlates pixel brightness to frequencies of pink noise, with lower, quieter pitches generated by darker pixels and higher, chirpier pitches created by brighter areas of a digital image. There are a few parameters one can manipulate, but generally, it is a very simple image sonification program, only hinting at the variations with which one might experiment. Even so, what resulted was a kind of sonic x-ray of the image that brought out not so much the music Baez was making when the photograph was taken, but rather, to my ears, the dynamics of space and gender in her performance at the Greek Amphitheater.
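For readers who want to experiment, the general kind of mapping described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and should not be taken as Photosounder's actual algorithm: it uses a plain sine tone rather than pink noise, it reduces each column of the image to its average brightness, and the function name `sonify_columns` and all of its parameters (`duration`, `sample_rate`, `f_min`, `f_max`) are hypothetical choices made for the example.

```python
import numpy as np

def sonify_columns(pixels, duration=2.0, sample_rate=8000,
                   f_min=110.0, f_max=1760.0):
    """Scan a grayscale image left to right and return a mono signal.

    pixels: 2-D array (rows x columns) of brightness values in [0, 1].
    Assumption for this sketch: darker columns give lower, quieter tones
    and brighter columns give higher, louder ones.
    """
    pixels = np.asarray(pixels, dtype=float)
    n_cols = pixels.shape[1]
    n_samples = int(duration * sample_rate)

    # One brightness value per column, since the scan moves left to right.
    column_brightness = pixels.mean(axis=0)

    # Which image column the "playhead" is over at each audio sample.
    col_at_sample = np.minimum(
        np.arange(n_samples) * n_cols // n_samples, n_cols - 1)
    brightness = column_brightness[col_at_sample]

    # Brightness sets pitch (interpolated geometrically between f_min and
    # f_max) and also loudness; accumulating phase keeps the tone continuous.
    freq = f_min * (f_max / f_min) ** brightness
    phase = 2 * np.pi * np.cumsum(freq) / sample_rate
    return brightness * np.sin(phase)

# A mostly dark test image with one bright patch near the center.
image = np.zeros((64, 100))
image[20:40, 45:55] = 1.0
audio = sonify_columns(image)
```

With a test image like this one, the signal stays silent while the scan crosses the dark columns and sounds only while it passes over the bright patch. Averaging each column is the crudest possible reduction; a fuller treatment would keep the vertical dimension as well.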
Figure 2. 
Sonification, using the application Photosounder, of the photograph of Joan Baez performing at the Hearst Greek Theatre, Berkeley Folk Music Festival Collection, Charles Deering McCormick Library of Special Collections, Northwestern University Libraries, n.d. (possibly 1963), photographer unknown, https://dc.library.northwestern.edu/items/87ac8798-6aaf-456e-9c3e-dc187c796115. Frequency scale logarithmic base weighted toward the logarithmic, set at 2.0 on a proportional scale of 1 to 2.
Figure 3. 
Sonification, using the application Photosounder, of the photograph of Joan Baez performing at the Hearst Greek Theatre, Berkeley Folk Music Festival Collection, Charles Deering McCormick Library of Special Collections, Northwestern University Libraries, n.d. (possibly 1963), photographer unknown, https://dc.library.northwestern.edu/items/87ac8798-6aaf-456e-9c3e-dc187c796115. Frequency scale logarithmic base weighted toward the linear, set at roughly 1.2 on a proportional scale of 1 to 2.
One fascinating quality of image sonification is that sound transforms space into time, creating an intensification of the spatial relations captured in the photograph by sequencing them temporally. The very dimensions of a two-dimensional photograph now have aural depth and presence.[5] This is a useful way in which to enter into thinking about qualities of audience-performer relationships that the photograph might quite literally flatten or even obscure or only suggest vaguely in visual form. For me, the most productive parameter to play with was the frequency scale logarithmic base. Set to a higher scale, it produced more of a whisper when the sonification reached the figure of Baez center stage. Set to a lower scale, it produced a series of chirps and whistles. Mostly, however, the darkness of the image save for Baez under the spotlight turned out to be intriguingly quiet. Lack of sound can, after all, be part of image sonification and AudioVisual DH analysis too.[6] The sound murmured across the dark areas of the photograph. Only when it arrived at the illuminated figure of Baez did it create a small bubble of sound with the frequency scale logarithmic base set to 2.0 (Figure 2) or burst out with a quick set of windy, echoing whistles with the frequency scale logarithmic base set to roughly 1.2 (Figure 3). As if to announce her spectral presence, far away from the top of the Greek Amphitheater where the photographer was, both sonifications alerted me to the dynamics of her presence on stage, a distant figure under a stark white spotlight, but also the focus of attention.
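To give a rough sense of why a frequency-scale setting can change the character of a sonification so much, here is a small Python sketch of one way a single parameter might slide a brightness-to-pitch mapping between linear and logarithmic spacing. The formula is an assumption made purely for illustration, not Photosounder's definition of its "frequency scale logarithmic base"; the function name `brightness_to_frequency` and the frequency bounds `f_min` and `f_max` are hypothetical. The only feature borrowed from the experiment above is the range of the setting, from 1 (linear) to 2 (logarithmic).

```python
import numpy as np

def brightness_to_frequency(brightness, f_min=110.0, f_max=1760.0, base=2.0):
    """Map brightness in [0, 1] to a frequency in Hz.

    base = 1.0 spaces frequencies linearly between f_min and f_max;
    base = 2.0 spaces them geometrically (equal steps in octaves);
    values in between blend the two behaviors.
    """
    b = np.clip(np.asarray(brightness, dtype=float), 0.0, 1.0)
    if np.isclose(base, 1.0):
        return f_min + (f_max - f_min) * b  # purely linear case
    octaves = np.log2(f_max / f_min)
    # Normalized curve that runs from 0 to 1 as brightness runs from 0 to 1.
    curve = (base ** (octaves * b) - 1.0) / (base ** octaves - 1.0)
    return f_min + (f_max - f_min) * curve

# The same mid-gray pixel lands on different pitches under each setting.
for base in (1.0, 1.2, 2.0):
    print(base, float(brightness_to_frequency(0.5, base=base)))
```

Under these assumptions, a mid-gray value sits two octaves above the lowest pitch when the base is 2.0 but considerably higher when the base approaches 1, so the same pixels are pushed toward the top of the range as the scale becomes more linear.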
Figure 4. 
Audience at Joan Baez concert at the Hearst Greek Amphitheatre, Berkeley Folk Music Festival Collection, Charles Deering McCormick Library of Special Collections, Northwestern University Libraries, n.d. (possibly 1963), photographer unknown, https://dc.library.northwestern.edu/items/1c69c98a-98e0-4923-8730-19d0546480ef.
Figure 5. 
Sonification, using the application Photosounder, of the photograph of the audience at Joan Baez concert at the Hearst Greek Amphitheatre, Berkeley Folk Music Festival Collection, Charles Deering McCormick Library of Special Collections, Northwestern University Libraries, n.d. (possibly 1963), photographer unknown, https://dc.library.northwestern.edu/items/1c69c98a-98e0-4923-8730-19d0546480ef. Frequency scale logarithmic base weighted toward the logarithmic, set at 2.0 on a proportional scale of 1 to 2.
Two themes emerged the more I listened to the sonifications and looked at the photograph. First, despite the ideals of the folk revival as a decentralized, communal, democratic movement, the sonification intensified the distinction between Baez, the emerging celebrity on stage, and the audience. Second, I began to wonder more about the stakes of Baez as a female performer within the revival. On the first theme, the sonification amplified how this was no campfire "Kumbaya" session, but rather a spectacle at scale, with a star on stage who drew everyone's attention and the masses listening to and looking at her under the stark white spotlight. Compare, for instance, the sonification of another image, this one of the audience, probably taken before or after the same concert (Figure 4 and Figure 5). In this one, the sonification was much noisier, picking up the many patterns of heads, shirts, spotlights, and parts of the Greek Amphitheater. Here is scattered attention, the dispersed buzz of people before or after shared concentration on a conventional concert recital. Taken together, the sonifications of the two images signaled how the photographs of Baez at the Greek represent key tensions in the early 1960s folk revival. On the one hand, the revival sought to be anti-commercial and anti-hierarchical, shifting music-making from the power differentials between entertainer and audience to a shared experience of musical communion. On the other hand, its popularity led to increasingly conventional modes of presentation, precisely as entertainment, a commercial undertaking with all the imbalances between a star and passive spectators.
A second theme, about gender, erupted from the image sonification. Baez, who became perhaps the iconic female figure of the revival, was particularly caught up in these tensions. At times, she tried to follow in the footsteps of Pete Seeger, adopting progressive and radical political causes or following in his collective singalong concert tradition; at other times, she embraced the role of a distant icon of the revival. On stage at the Greek, the sonification alerts us to the contradictions she sustained as a performer. She is but a whisper in one sonification, a little burst of whistles in the other. In both, the larger darkness almost swallows her up sonically in the epic space of the Greek Amphitheater. And yet she is also the only part of the image to generate sound, especially in the whistle sonification. This reminds us that it is Baez who focuses the audience's attention and holds forth at the sole microphone we glimpse in the photograph. She is the only figure under the spotlight. Save for a quick whisper of sound from the proscenium of the stage and an illuminated walkway to the side of the stage, we only hear her in the sonifications.
And what of the sound generated from the image itself in this particular experiment? The ghostly, hollow quality of one sonification and the whistle blips of the other caused me to ponder the difference between Baez, the person, and Baez, the performer, how one was always lurking within the other. A spectral absence within a charismatic performance, the actual Baez offers an embodied presentation of authenticity that is oddly disembodied. The sonifications amplified for me how a young woman was thrust into the spotlight of the folk revival at the height of its popularity. She had to negotiate the constraints of gender within the folk revival milieu. On the one hand, Baez was able to hold forth, convene an audience, speak her musical truth, and articulate progressive political ideas publicly. At the same time, Baez was reduced within the revival. One would see this a few years later in her relationship on stage and in the music business to Bob Dylan, with whom she started performing around the time this image was taken. While he pushed forward with the freedom to reinvent himself and claim the mantle of the high modernist artist, Baez was often limited to the gendered roles of girlfriend or harmony singer, or placed in a kind of virginal Madonna stereotype within the folk movement [Baez 2009] [Hadju 2001].
All that just from some pink noise generated by pixel brightness? Some might complain that I am "reading" too much into the sonification. Others might contend the opposite: that all the sonification revealed was what, to a critical eye, the image already visualized. Precisely on both counts. The point of going through the process of sonification is not to get away from the contextualized interpretation of image data, but rather to read the data that constitute the photographic artifact more probingly, carefully, and insightfully (or is it now insoundfully?). By shifting from visual to aural form, one can get closer to the fullness of what the artifact itself contains, what it suggests as some of its meanings and implications. Moving from merely optic investigation to a synesthetic movement between image and sound, between the optic and the aural, allows for far richer analysis, a more adventurous inquiry into even one photograph.
Overall, digitally listening to as well as looking at photographs allows for greater potential access to underlying performative, emotional, spatial, embodied, and contextual dimensions captured in imagery. What art historian Tina M. Campt writes about her non-digital approach to looking at photographs from the African Diaspora holds for digital image sonification too: “When the practice of listening is not just about hearing but an attunement to different levels of photographic audibility,” what emerges is “an attunement to sonic frequencies of affect and impact.” The human sensorium is not neatly divided between eye and ear.[7] It is capable of new perceptions from the synesthetic interplay between the visual and the audial. Listening to images can lead to, as Campt puts it, an “ensemble of seeing, feeling, being affected, contacted, and moved beyond the distance of sight and observer” [Campt 2017, 41–42]. Campt is most interested in discerning subjectivities of Black fugitivity and futurity in the photographs she examines. Image sonification expands the repertoire she outlines by bringing the power of digital computation to this project, using it to treat photographic data not as totalized empirical facts, but rather as sources of multidimensional meanings that bring one more deeply into the images' less discernible aspects.
Digital image sonification, pursued through the method I am proposing, also offers an avenue for crisscrossing between "distant" and "close" reading in digital humanities scholarship. As Alan Liu predicted in 2012, "one of the next frontiers for the digital humanities will be to discover technically and theoretically how to negotiate between distant and close reading" [Liu 2012, 17]. Sound can convey subtle qualities of timbre, tone, interrelationships between different elements, and other kinds of information that visual, textual, or statistical data do not display as effectively [Schedel 2014]. So too, sound presents particular qualities for negotiating the divide between distant and close reading because it represents space as time. This allows for the compacting of large amounts of information into a quick signal, allowing one to listen "at scale." It also, paradoxically, produces the ability to magnify minutiae by rendering data temporally, forcing an observer to take in details more slowly, as I did with the multiple sonifications of the Joan Baez photograph. Capable of going big or small with data, digital image sonification asks us to seek truth without resorting to overly simplistic and reductive assertions of objectivity. In the immersive "acoustic space," as Edmund Carpenter and Marshall McLuhan called it, "sight isolates" while "sound incorporates," as another scholar, Walter Ong, famously contended. 
"Whereas sight situates the observer outside what he views, at a distance," Ong argued, "sound pours into the hearer" [Ong 1982, 70] [Carpenter 1960, 64–69].[8] Scholars such as Emily Thompson, Jonathan Sterne, Veit Erlmann, Alexandra Supper, and Karin Bijsterveld have since complicated simplistic compartmentalization of the senses into sight and sound as discrete entities, which is precisely why thinking synesthetically about the interplay between the two might be most productive as an AudioVisual DH tactic [Erlmann 2014] [Sterne 2003] [Sterne 2012] [Supper 2014] [Supper 2015] [Supper 2016] [Thompson 2002] [Thompson 2013].[9] Digital image sonification brings us into an image's form and content, but because we are doing so synesthetically, digital image sonification also alienates us from what we hear and see, poising us in an intermedial and intermediary space between the visual and the aural — a fruitful perceptual interzone for consideration, interpretation, and intellectual discovery.
Finally, digital image sonification presents an example of how AudioVisual DH can push forward research in digital history and archival thinking as part of digital humanities writ large. By heightening the perception of fleeting feelings and sensations from archives of the past, we might realize Bethany Nowviskie's call for “speculative collections,” lively play with repository holdings to activate the dormant meanings of artifacts [Nowviskie 2016]. To my eye and ear, turning the Baez photograph in which music was being made back into sound summoned forth some of the “archival liveness” that Tom Schofield and colleagues picture digital technologies as capable of accomplishing [Schofield 2015] [Ward 2019]. Rather than merely reproduce the knowledge of the archive flatly, rather than look at the past only through the surface of the archive's gaze, image sonification delves more deeply and precisely into data to transpose archival knowledge into new keys. It opens up history's evidentiary record for what might be present, or absent, from sight alone. We cannot hear the music Joan Baez was making at the Greek Amphitheater that night in the early 1960s, but we can hear the stakes of her music-making more potently when we sonify the photograph of her. Digital image sonification invigorates both the content and the context of the image. Issues of gender, power, embodiment, spectacle, hierarchy, performance, and more emerge. Synesthetically pivoting between the visual and the audio in a cyborgian dance of data, signal, image, sound, history, and human perception, image sonification activates an artifact's data in fresh, audiovisual ways. In doing so, it also activates the scholarly imagination.


[1] On archeological approaches, see [Wall 2014] [Einix 2014] [LaFrance 2006]. This research is particularly inspired by Wolfgang Ernst's theoretical insights, although he might not entirely agree with my turn back to questions of cultural history. See [Ernst 2013][Ernst 2016]. See also, [Bolder 2000] [Gitelman 2014] [Sayers 2018] [Chun 2005]. On the history of photography, see classic studies such as [Sontag 1977] [Barthes 1981].
[2] For a discussion of deep listening and critical play in the digital humanities, literary studies, linguistics, and sound studies, see [Clement 2015, 348–357]. Examples of the sonification of other sorts of data besides images include [Joque 2011] [Laumeister 2015] [Foo 2014] [Schedel 2014].
[3] For overviews of the history of the folk music revival, see [Cantwell 1997] [Filene 2000] [Cohen 2002] [Donaldson 2014] [Wald 2015].
[4] For more information, visit the website for Photosounder.
[5] For more on the social construction of spatial relations handled through digital methods, see [White 2010].
[6] One immediately thinks of John Cage's famous thinking about silence as useful here. See [Cage 1961].
[7] As Jonathan Sterne notes, the separation and classification of the human sensorium into five distinctive senses is a historical outcome of the Enlightenment, not a timeless biological phenomenon. See [Sterne 2003].
[8] With thanks to Robert Cantwell [Cantwell 1997] for reminding me of Ong's theories of sound and its significance.
[9] Many works in sound studies are useful for thinking beyond the optic without fetishizing the ear as a supposedly better sense. See also [Corbin 1998] [Smith 2001] [Rath 2003] [Bull 2004] [Smith 2004] [Novak 2015].

