A Design Methodology for Web-based Sound Archives

Annie Murray  <amurr_at_ucalgary_dot_ca>, University of Calgary


Well-designed digital tools facilitate the creation of new knowledge in the humanities. Good design is user-centered, focused, and needs-driven, all of which depend on a rich understanding of the target audience or end user. Unsworth’s scholarly primitives [Unsworth 2000] and the work of Palmer, Teffeau and Pirmann [Palmer et al. 2009] on scholarly information practices provide a framework for understanding how humanities scholars do their work. We propose applying this framework to the design of a spoken word archive, with the aim of designing a digital tool that is optimized for the documented practices of scholars. We propose that listening and annotation are key activities of humanities scholars performing literary criticism of audio recordings. Taking the SpokenWeb poetry project as an example, we discuss how designing a web-based tool with these key activities in mind could facilitate close and critical engagement with recordings of spoken poetry. We present a methodology for designing a web-based sound archive for literary criticism and we propose features and functionalities that facilitate this criticism.

[P]roviding the collections and tools needed for producing new scholarship is arguably the most important role for cyberinfrastructure and will require a digital resource base "that is developed for specific scholarly purposes".  (Palmer, Teffeau & Pirmann, Scholarly Information Practices in the Online Environment: Themes from the Literature and Implications for Library Service Development  [Palmer et al. 2009])


As more digital libraries and digital humanities projects are developed, it is crucial to ensure that they are designed with the user experience in mind so that they are useful, sustainable, and can help generate new methodologies and knowledge in the humanities. Academic and memory institutions increasingly recognize the importance of acquiring and making available unique or primary source materials that will stimulate new scholarship. As rich scholarly resources are increasingly made available on the Web, scholars and designers need to create user interfaces that take into account how humanities scholars work and behave. Instead of expecting researchers to conform to an interface that does little more than present a collection, we should design scholarly tools that conform and adapt to the detailed and documented practices of scholars. We should create tools to disseminate primary source materials, and these tools should be supportive of the formative stages of the research and writing practices of humanities scholars.
The researchers from the SpokenWeb project, based in Montreal, Canada, are designing a digital spoken word archive of digitized archival recordings of poetry readings given in Montreal from 1966 to 1974 by major North American poets. In doing so, we hope to develop new forms of critical engagement with literary recordings and to develop a comprehensive, modular and adaptable template for the handling of spoken word archives. Eventually, we hope that other scholars and institutions with analogous holdings of literary recordings can adopt and adapt SpokenWeb for their own recordings.
The primary goal of the SpokenWeb project is to create a sound archive that encourages scholars to engage with the sound recordings in ways that facilitate their research. Whereas for many sound archives the focus is accessibility (i.e. simply making a collection of sound recordings web accessible), SpokenWeb’s focus is interactivity and productivity. The aim is to include design features grounded in empirical research on the needs of humanities scholars, in order to facilitate the types of activities and tasks that are the building blocks of their research, and also to experiment with innovative features that help scholars engage with the sound archive in generative ways that they did not realize were possible.
UbuWeb is a web-based educational archive of avant-garde audio, visual and textual material, where the emphasis is strictly on accessibility: sound recordings on the site are accessible via a minimalistic playback interface where the only options are the playback and bookmarking of recordings. SpokenWeb, in contrast, aims to build features that facilitate scholarship directly into the web interface: tools to compare multiple sound recordings or transcripts side-by-side, tools for sound visualization and playback manipulation, and the ability to annotate a poetic work with scholarly commentary.
While many scholars use sites like UbuWeb for their scholarly activity, most of that scholarly activity likely takes place outside the site. Users can download files from UbuWeb, for example, for closer analysis of a given piece. The minimal design of UbuWeb does not leave room for an extensive array of tools to help scholars engage with the site’s content. SpokenWeb’s aim is to design a sound archive that promotes in-site scholarly activity. Thus the goal is to have scholars both access and engage with the content on the web.
On a technical level, SpokenWeb is built using the WordPress content management system (CMS) and incorporates SoundCloud’s media player with its application programming interface (API). While we are evaluating several content management systems, WordPress is the current front-runner based on its large user community and high degree of replicability. This ease of replication is important given our stated goal of creating a sound archive "recipe" that other cultural heritage institutions can follow, in order to open their own collections to scholarly activity. SoundCloud’s robust media player and API are particularly important, as they allow us to experiment with innovative features such as sound visualization, tethering the audio playback to a written transcript, and playback manipulation, features that all depend on an API. Archival copies of the sound recordings are stored offline in WAV format. Access copies are MP3s (encoded at 320 kilobits per second) that are streamed via the SoundCloud player. For copyright reasons, the majority of the sound recordings are password protected, with access limited to selected users for educational purposes. A small number of files are publicly accessible.
Figure 1. 
Screen capture of audio playback on the SpokenWeb site. Among the site’s features are sound visualization, in the form of a waveform display, and tethered playback, where the transcript is hyperlinked to different time points in the playback.
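The tethered playback described in the caption can be illustrated with a minimal sketch, assuming the transcript is stored as passages that each carry a start time. The names `TranscriptSegment`, `segment_at` and `seek_target` are hypothetical and are not part of the SpokenWeb code or of SoundCloud's API; the sketch only shows the two lookups that tethering needs: from a playback position to the passage being heard, and from a clicked passage to a seek time.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TranscriptSegment:
    """One transcript passage and the playback time at which it begins (hypothetical model)."""
    start_seconds: float
    text: str


def segment_at(segments: Sequence[TranscriptSegment],
               position_seconds: float) -> Optional[TranscriptSegment]:
    """Return the passage being heard at position_seconds.

    Assumes segments are sorted by start_seconds. Returns None when the
    position falls before the first passage begins.
    """
    starts = [segment.start_seconds for segment in segments]
    # bisect_right counts the segments that start at or before the position.
    index = bisect_right(starts, position_seconds) - 1
    return segments[index] if index >= 0 else None


def seek_target(segments: Sequence[TranscriptSegment], index: int) -> float:
    """Return the time the player should seek to when passage `index` is clicked."""
    return segments[index].start_seconds
```

In a browser the same two lookups would be driven by the player's position updates and by clicks on the transcript; how those events are delivered depends on the media player in use.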
Using 89 recordings of poetry readings from a dynamic period in North American poetry, the SpokenWeb team of literary scholars, designers, computer scientists and librarians is developing a flexible and modular interface that tries to foster a critical practice of close listening. Close listening is an approach and practice of deep engagement with performed or recorded poems (sometimes referred to as audiotexts) that has been developed and encouraged by poet Charles Bernstein in his various writings and through his co-founding, with Al Filreis, of the PennSound website. Bernstein foregrounds the importance of sounded poetry, arguing that

To be heard, poetry needs to be sounded - whether in a process of active, or interactive, reading of a work or by the poet in performance. Unsounded poetry remains inert marks on a page, waiting to be called into use by saying, or hearing, the words aloud.  [Bernstein 1998, 7]

Bernstein affirms that the sound of poetry contributes to its meaning, and that "poets, especially twentieth-century innovative poets, work with sound as material, where sound is neither arbitrary nor secondary but constitutive"  [Bernstein 1998, 4]. He also cites the poetry reading in particular as

...one of the most important sites for the dissemination of poetic works in North America, yet studies of the distinctive features of the poem-in-performance have been rare (even full-length studies of a poet’s work routinely ignore the audiotext), and readings - no matter how well attended - are rarely reviewed by newspapers or magazines.... A large archive of audio and video documents, dating back to an early recording of Tennyson’s almost inaudible voice, awaits serious study and interpretation.  [Bernstein 1998, 5]

The purpose of this article is to address a specific format (digital audio) used by researchers undertaking a particular humanist activity (literary criticism) when they engage with recordings of poetry readings. Drawing upon the literature of scholarly information use and behavior, we determined that two key activities in literary criticism - reading and notetaking - would likely find their corollary activities of listening and notetaking in the online audio environment. Our desire to develop a sound archive based on the actual behavior of humanities scholars is challenged by limited knowledge of how scholars of any kind conduct research that relies on listening instead of reading. Despite our goal to develop tools appropriate to the literary enterprise, criticism of sounded poems is new and consequently under-documented within the field of information studies. In this absence, we are (somewhat) hopefully adapting known behaviors of literary scholars and imagining how they might be replicated or adapted in an online audio environment. While we fully intend this to be an exercise in evidence-based design, we will rely on an iterative design process that includes user testing to elucidate user needs.
There is a need for research that explores how scholars engage with audio recordings in their work. Ideally, observation of researchers using audio formats would be the basis for a considered ethnography of scholarly listening practices. Historically, audio recordings have presented challenges as a research format. Recordings are often stored on fragile and degrading magnetic media, require specific and possibly obsolete playback equipment, and are hard to manipulate and annotate. As digital objects, sound recordings are still somewhat difficult to navigate, annotate and manipulate. Most media players in sound archives tend to offer very limited playback functionalities. Digital audio presents interesting options for the navigation and annotation of sound; options that are simply not possible in the analogue realm. With well-developed digital audio workstations, we may be able to "unlock" much media content from earlier format limitations.
We explore models of scholarly behavior and investigate the different phases of scholarly research in order to reveal what specific scholarly functions we could and should facilitate in an online sound archive. Taking into account other models of research processes developed by researchers who studied literary critics or music scholars, we speculate about what system requirements or design features might best suit literary criticism of sounded poems. We draw upon cognitive and multimedia studies that ground learning in more than one sense modality at a time (summarized in [Murray and Wiercinski 2012]). For the sake of brevity, we focus on two activities that are central to literary criticism - reading and notetaking - and suggest features to facilitate and enhance these core activities in a web-based sound archive. We are guided by the following idea expressed in Our Cultural Commonwealth that: "providing the collections and tools needed for producing new scholarship is arguably the most important role for cyberinfrastructure and will require a digital resource base 'that is developed for specific scholarly processes' " (cited in [Palmer et al. 2009, 34]). In this spirit, we present a brief design methodology at the paper’s conclusion, with the goal of responding to known scholarly habits and processes.

Scholarly Primitives, Information Activities, and Practices

John Unsworth proposed the notion of scholarly primitives, which are "basic functions common to scholarly activity across disciplines, over time, and independent of theoretical orientation"  [Unsworth 2000]. These intentionally abstract primitives, the atomic elements of scholarly activities and processes, include the following:
  • Discovering
  • Annotating
  • Comparing
  • Referring
  • Sampling
  • Illustrating
  • Representing
Scholarly primitives "are basic to scholarship across eras and across media," and arose from the need to build improved networked tools for humanities scholars. The goal was to better understand the scholarly process and its composite activities in order to develop tools that facilitate them, to "imagine some basic functions of scholarship that might be embodied in tools which, given a common architecture, could be combined to accomplish higher-order (axiomatic) functions"  [Unsworth 2000]. Unsworth’s list was offered in the spirit of starting a conversation, and was not meant to be definitive or exhaustive.
Palmer and her colleagues refined the scholarly primitives concept by emphasizing "the explicit role of information in the conduct of research and production of scholarship"  [Palmer and Cragin 2008] and by "emphasizing a sense of the primitive as something at the base or beginning of a larger process"  [Palmer et al. 2009]. Palmer et al. modify Unsworth’s vocabulary and propose a more elaborate framework that consists of scholarly information activities which include searching, collecting, reading, writing, and collaborating. The activities are in turn composed of more granular activities, which they label as primitives. Further, they propose the idea of "cross-cutting primitives" which are component, granular activities that are not tied to any one particular information activity and which are typically applicable to more than one.
  • 1. Searching
    • 1.1 Direct searching
    • 1.2 Chaining
    • 1.3 Browsing
    • 1.4 Probing
    • 1.5 Accessing
  • 2. Collecting
    • 2.1 Gathering
    • 2.2 Organizing
  • 3. Reading
    • 3.1 Scanning
    • 3.2 Assessing
    • 3.3 Rereading
  • 4. Writing
    • 4.1 Assembling
    • 4.2 Co-authoring
    • 4.3 Disseminating
  • 5. Collaborating
    • 5.1 Coordinating
    • 5.2 Networking
    • 5.3 Consulting
  • 6. Cross-cutting Primitives
    • 6.1 Monitoring
    • 6.2 Notetaking
    • 6.3 Translating
    • 6.4 Data Practices
Whereas Unsworth is deliberately format neutral, Palmer et al. are primarily grounded in a text-based world of information. Their framework includes the activity of reading - a text-based activity - but excludes listening and viewing, activities that are associated with audio, video, or image formats. An audio recording of an interview or a video recording of a news broadcast contains information in much the same way that a text document does. So for our purposes, we remedy this text-centric tendency by supplementing Palmer et al.’s scholarly activity of reading with the additional activities of listening and viewing. With audio formats, the primitives tied to reading (i.e., scanning, assessing, and rereading) also roughly translate to listening, with a slight adjustment. The activity of listening, for example, would include the primitives of scanning, assessing, and re-listening.
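The extension described above can be made concrete with a small sketch that holds the framework as plain data and adds the two new activities. The activity and primitive names come from the list above; the dictionary layout and the function name `extend_for_media` are our own illustrative choices, and the primitives listed under viewing are an assumption made by analogy with listening.

```python
from copy import deepcopy

# Palmer et al.'s activities and their primitives, as listed above.
PALMER_FRAMEWORK = {
    "searching": ["direct searching", "chaining", "browsing", "probing", "accessing"],
    "collecting": ["gathering", "organizing"],
    "reading": ["scanning", "assessing", "rereading"],
    "writing": ["assembling", "co-authoring", "disseminating"],
    "collaborating": ["coordinating", "networking", "consulting"],
    "cross-cutting": ["monitoring", "notetaking", "translating", "data practices"],
}


def extend_for_media(framework):
    """Return a copy of the framework with listening and viewing added.

    Listening follows the text above (scanning, assessing, re-listening).
    Viewing is filled in by analogy and is an assumption of this sketch.
    """
    extended = deepcopy(framework)
    for activity, repeat_primitive in (("listening", "re-listening"),
                                       ("viewing", "re-viewing")):
        extended[activity] = ["scanning", "assessing", repeat_primitive]
    return extended
```

Keeping the original framework untouched and returning an extended copy makes it easy to compare the text-centred list with the media-aware one.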
The multi-institutional and interdisciplinary Project Bamboo strikes a balance between Unsworth’s minimalist, abstract and format-neutral primitives and Palmer et al.’s more detailed but textually-anchored information activities by introducing "themes" of scholarly practice. These "themes" originate from a workshop that "...brought together scholars, IT professionals, and librarians from around the world to chart a direction for cyberinfrastructure development in the humanities"  [Project Bamboo 2010, 1].
Project Bamboo’s framework proposes additional format-neutral activities such as modeling, visualizing, and synthesizing which enrich the contributions of Unsworth and Palmer. Table 1 shows the relationship between the Bamboo scholarly themes, the Unsworth primitives, and Palmer et al.’s scholarly information activities [Project Bamboo 2010].
Bamboo theme of scholarly practice | Unsworth primitive | OCLC scholarly information activity
Gathering / Foraging | Discovery | Searching (direct searching, chaining, browsing, probing, accessing)
Synthesizing / Filtering | Comparing | Collecting (gathering, organizing)
Contextualizing | Referring | Searching (chaining, browsing, probing); Collecting (organizing); Cross-cutting (monitoring)
Refining and Critiquing | | Reading (scanning, assessing, rereading); Cross-cutting (notetaking, translating); Writing (assembling); Collaborating (consulting)
Documenting methods | Representing | Writing (disseminating); Cross-cutting (translating)
Managing data | Discovering | Searching (accessing); Collecting (organizing); Collaborating (coordinating, consulting)
Annotating / | Annotating | Writing (assembling); Cross-cutting (notetaking)
Modeling / visualizing | Illustrating | Cross-cutting (translating); Writing (assembling)
Overlapping teaching and research | Representing | Collaborating (coordinating); Cross-cutting (translating)
Sharing / dissemination / publishing | Representing | Writing (disseminating)
Funding | Suggested parenthetically | No analogue
Collaborating | Common thread throughout scholarly primitives, not listed separately | Writing (co-authoring); Collaborating (coordinating, networking, consulting)
Citation, credit, peer-review | Referring | Reading (assessing); Writing (dissemination); Collaborating (consulting)
Table 1. 
The relationship between the Project Bamboo scholarly themes, Unsworth’s primitives, and Palmer et al.’s scholarly information activities ([Project Bamboo 2010, 2–3], reproduced with permission)
For us, the conceptual framework of the scholarly primitives functions as a type of abstract checklist for building new digital tools. The primitives call attention to the types of activities that scholars need to do. Accordingly, we need to plan and design environments with functionalities that will facilitate the primitives.
To take Unsworth’s primitive of "discovery" as an example, we will ensure that the SpokenWeb sound archive facilitates scholars’ discovery of the content within our archive, and that it promotes discovery of connections and relationships between individual items such as poems, particular recordings, transcriptions, and so forth. This could take the form of search functionality, perhaps in the form of a simple search box, but could also include more sophisticated browsing tools which allow a scholar to explore the recordings without having a fixed starting point (e.g. a particular author, topic, or terminology) in mind. Ideally, our browsing tools should facilitate serendipity.
Unsworth’s primitives are a good starting point for this type of checklist, but by their very nature they are abstract and general. To improve our design checklist, we look at Palmer et al.’s scholarly information activities and the Bamboo project’s themes of scholarly practice. Both of these frameworks are more specific than Unsworth’s list, and are grounded in empirical research that shows, among other things, that the types of activities that scholars engage in will vary as a result of their disciplinary approach. From a design perspective, then, questions about the intended audience begin to emerge. The SpokenWeb project has to take into account the discipline-specific behavior of literary scholars.
Palmer et al.’s Scholarly Information Practices in the Online Environment summarizes scholarly information activities across disciplines. Figure 2 provides a summary of the scholarly primitives associated with the humanities and sciences, as well as those involved in interdisciplinary ventures.
Venn diagram showing three intersecting domains: Humanities, Sciences, and Interdisciplinary.
Figure 2. 
Scholarly primitives associated with disciplinary approach ([Palmer et al. 2009, 35], reproduced with permission)
Not surprisingly, scholars from different disciplines tend to engage in certain activities more than others. For example, browsing and rereading are important for humanities scholars, as we see in Figure 2, whereas direct searching and data sharing are more important for scientists. Our project pays close attention to Palmer et al.’s conclusion that "...humanities scholars and other researchers deeply engaged in interpreting source material rely heavily on browsing, collecting, re-reading and notetaking"  [Palmer et al. 2009, 35]. Accordingly, we are designing the SpokenWeb with features that facilitate these core activities.
In addition to questions of audience we want to pay close attention to the related and important question of format. As previously discussed, Palmer et al. are somewhat biased towards text-based information. Palmer et al. do list the specific formats that are typically used by a given discipline (e.g. in the "Source materials by discipline reported in the RLG reports" table on page 5), but they do not talk about the relationship between scholarly activities and non-text formats. Non-textual formats are increasingly used in scholarship, and they deserve careful attention from designers of tools. Scholarly activities change depending on the information format being used. In our view, activities that need to be supported for the use of audio formats are listening, re-listening, and manipulating the playback of recordings.
As noted, we have by necessity extended the scholarly information activity of "reading" as a placeholder for a slightly broader spectrum of intake or "consumptive" activities extending beyond the world of text into audio and visual realms. Our project supplements the list of core humanities primitives that we will prioritize in the site’s design (i.e., browsing, collecting, re-reading and notetaking) with the additional scholarly activities of listening and viewing, and their corresponding primitives (e.g., re-listening, re-viewing, et cetera).
Unsworth’s primitives thus serve as a useful starting point or checklist for those who are designing digital tools. Palmer et al.’s scholarly information activities and Project Bamboo’s themes of scholarly practice provide empirically-grounded considerations for that checklist and raise important questions of audience and format.
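One way to use the primitives as a working checklist is to record which primitives each planned feature supports and then list the prioritized primitives that no feature covers yet. The sketch below is illustrative only: the feature names and their assignments in `PLANNED_FEATURES` are hypothetical examples, not a description of the SpokenWeb feature set.

```python
# Primitives prioritized in the text above for humanities scholars working with audio.
CORE_PRIMITIVES = {"browsing", "collecting", "re-reading", "re-listening", "notetaking"}

# Hypothetical mapping from planned features to the primitives they support.
PLANNED_FEATURES = {
    "search box": {"direct searching"},
    "browse view": {"browsing"},
    "looped playback": {"re-listening"},
    "tethered transcript": {"re-reading", "re-listening"},
    "time-stamped comments": {"notetaking"},
}


def uncovered(primitives, features):
    """Return, in alphabetical order, the primitives that no feature supports."""
    covered = set().union(*features.values())
    return sorted(primitives - covered)
```

With the example data above, `uncovered(CORE_PRIMITIVES, PLANNED_FEATURES)` reports that collecting has no supporting feature, which is the kind of gap the checklist is meant to expose during design.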

Reading Audio and Annotating Sound

We were unable to find empirical research that provided a detailed account of how scholars work with audio recordings.
Clara Chu’s descriptive model of the research process of literary critics shows the role of information sources at each stage, though audio recordings are not specifically addressed. As she notes, "the body of literature dealing with the information needs and uses of literary scholars is small"  [Chu 1999, 249]. Her study identifies the following phases of literary critical work:
  • Idea Generation
  • Preparation
  • Elaboration
  • Analysis and Writing
  • Dissemination
  • Further Writing and Dissemination
Chu notes that literary criticism is not a "clearly defined step-by-step sequential process." In this model, information use and the reading and re-reading of the text being studied occur extensively in both the Preparation and the Analysis and Writing stages. Naturally, we then ask how literary criticism might occur in a digital audio environment. Does the use of sounded poems yield any variation to the model of literary criticism suggested by Chu? Indeed, she calls for a study of the "way information technologies and the availability of electronic texts may be affecting literary critics’ work, communication, and information seeking"  [Chu 1999, 270].

Reading Audio

We wonder to what extent the practice of careful reading and re-reading of text is replicated when a literary recording, as opposed to a printed text, is the object of study. To facilitate the practice Charles Bernstein has termed "close listening," we are building SpokenWeb to explore how scholarly reading practices might inform scholarly listening practices. We assume that critics will need to listen repeatedly to particular passages in a web-based audio environment. Thus the chosen media player should facilitate the user’s ability to listen, slow down, pause, loop and repeat particular passages of the recording. Further, there may be a feedback loop between the listening tools or playback device and the process of close listening. Close listening could be a much richer experience in a well-designed web-based sound archive, with features that facilitate sophisticated manipulation of playback, than might be possible with playback on an analog reel-to-reel machine, for example. New tools, with the right features, could facilitate new ways of close listening.
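To make the looping and slowing-down requirement concrete, the following minimal sketch computes where in a recording the listener is after a passage has been looping for some time at a chosen speed. The function name `looped_position` and its parameters are hypothetical; a real player would handle this internally, and the sketch only spells out the arithmetic behind a loop region combined with a playback rate.

```python
def looped_position(loop_start: float, loop_end: float,
                    elapsed_seconds: float, rate: float = 1.0) -> float:
    """Return the position in the recording (in seconds) while looping a passage.

    loop_start, loop_end: bounds of the passage in the recording.
    elapsed_seconds: wall-clock time since the loop began playing from loop_start.
    rate: playback speed, where 1.0 is normal and 0.5 is half speed.
    """
    if loop_end <= loop_start:
        raise ValueError("loop_end must be greater than loop_start")
    if rate <= 0:
        raise ValueError("rate must be positive")
    passage_length = loop_end - loop_start
    # Audio consumed so far, wrapped around the passage length.
    return loop_start + (elapsed_seconds * rate) % passage_length
```

For a ten-second passage starting at 0:30, twelve seconds of listening at normal speed places the listener two seconds into the second pass, while the same twelve seconds at half speed covers only the first six seconds of the passage.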
Given the absence of a body of literature which explains how scholars approach a sounded as opposed to a printed poem, we provisionally suggest that the attentiveness to the text, signaled by careful and repeated reading of a poem, will have strong carryover to the way a scholar would listen critically to a performed poem. To facilitate the reading of poems with both the eyes and the ears, the SpokenWeb has integrated transcripts of the audio recordings into the web interface. It is possible for scholars to listen to and/or read the content of the poetry readings. Since much learning we do is multimodal in nature, we recognize that other modalities inform listening [Murray and Wiercinski 2012]. We believe having the option to read words as one listens to poems could benefit the researcher by enhancing comprehension and analysis of the recording.
The ways in which scholars read are varied. In her work interviewing humanities scholars about their practices, Oya Rieger noted:

Reading, which is a critical process in research, is a multiply nuanced concept, with each type serving a unique purpose. The scholars I interviewed referred to multiple types of reading such as deep reading, close reading, skimming, or eyeballing. They emphasized that re-reading was a significant part of interpretive work and involved periodic interactions with selected texts.  [Rieger 2010, 87]

Chu suggests that reading is a process that occurs repeatedly throughout the different phases of a literary critic’s work [Chu 1999]. Humanities scholars in particular tend to read and re-read texts, and as Palmer, Teffeau and Pirmann suggest, "Re-reading is one of the primary reasons that scholars build personal collections. For humanities scholars, rereading a work is a significant part of interpretation and analysis"  [Palmer et al. 2009, 21]. Basic audio players featured in most web-based collections of literary recordings are not conducive to repeated listening of poems, as they do not allow the listener certain basic controls over the recording. In the SpokenWeb interface, we strive to allow the listener more control over the recording so that the reading/listening process can be conducted as much as possible on the scholar’s terms. For this reason, we have adopted SoundCloud’s media player, designed for electronic musicians and DJs, because it allows for more user control of the recording. Many libraries and archives are limited in digitization resources and could become focused on the digitization process itself and the attendant storage or metadata creation costs to the extent that the usability of the digitized content understandably becomes a low priority. For this reason, we hope to contribute an adaptable and user-friendly platform for providing access to digitized or born-digital audio recordings.
Perhaps the work habits of music scholars can shed light on how a group of researchers works with recordings and textual sources simultaneously, since a composer or music historian might be listening to a performance and studying a score at the same time in the way a literary critic might sit and listen to a recording and consult a print book or a transcription at the same time.
According to Brown, the most commonly used information resources in music research are print, audio, and video sources. She notes that "[t]he relatively high ranking of recordings shows the great importance of listening in the research process"  [Brown 2002, 82]. While the study did not provide details about the scholarly listening process, one participant in the study remarked that "one of the things that is so basic to this kind of work is that you are constantly working back and forth from the example [a music recording] to the text [the criticism being written], back to the example, back to the text"  [Brown 2002, 86]. She also notes "this constant going back and forth is a real feature of writing about music"  [Brown 2002, 86]. We suspect that this is how criticism of sounded poems might be performed as well: the critic is constantly listening and re-playing parts of the recording as she writes. Therefore, it is essential to make this kind of switching between activities (listening, reading, and writing) as natural and seamless as possible in a web-based environment.
It is often noted how Web browsers and online research materials tend to lead to a more shallow kind of reading, or at least change the nature of the reading experience [Nicholas and Clark 2012] [Cull 2011] [Carr 2010]. These cultural critics tend to mourn the gradual reduction of sustained reading practices that largely characterized reading in the print world. We thus wonder how the practice of literary criticism will evolve with audiotexts and the increasing use of online resources in general. Not only are scholars engaging with an audio performance of a poem, but they are also doing so through a web-based tool. What does this mean for the critical practice of a literary scholar? This type of literary study will bring with it the need to explore how listening online compares to reading in print; the new habits that would emerge from such a practice; and a consideration of what it means to work with poetry in auditory instead of visual and typographic terms. Finally, what are the implications of our interactions with sound being mediated through a visual interface on a computer screen?

Annotating Sound

Having established the primacy of reading for humanities scholars, and the emerging importance of reading and listening for scholars of sounded poetry, we can now address a related scholarly primitive, annotation, and the related information practice of notetaking. Brockman et al. (as cited in [Palmer et al. 2009]) and Palmer & Neumann [Palmer and Neumann 2002] establish reading and writing as fundamental activities for humanities scholars. Scholarly notetaking is an activity that enhances reading and forms a bridge towards a later finished work of written criticism. As a scholarly practice, notetaking is critical to both the reading and writing phases of literary work.
Hillesund’s [Hillesund 2010] study of a group of humanities and social science researchers sheds light on scholarly work processes in its effort to describe reading behavior, while acknowledging that "reading is a most familiar activity, solidly packed and sedimented. It is one of those deep and complex phenomena that are so close to the mundane that their basic traits are hard to discover and talk about." What becomes clear is how physical and material the act of reading is, how tied to the body, space and objects. According to Hillesund, annotation improves comprehension of text, slows the pace of reading, and helps scholars to record and remember points. The "annotation habit is probably a way of processing information, giving it time to fit into schemas in long-term memory and provide time and space for reflection and discovery of inferences." Given the "kinaesthesia, and motor control (dexterity) as well as tactile and visual perception" that characterize scholarly reading and writing, we can see how important and challenging it is to develop ways of delivering digital content that work in keeping with deeply embedded physical and mental processes associated with reading and writing. Thus, reading and writing are deeply intertwined in the scholarly process, and notetaking is a central behavior to both. Most digital tools have a long way to go in facilitating this most critical activity in scholarship. Again, we note that tools for text annotation and manipulation have received much more development and critical attention in the Digital Humanities, while tools for annotation of audio formats instead take their cues from web-based entertainment sites, e-learning environments and online music delivery and discovery tools.
The sheer volume of valuable but fragile analogue audio material in libraries and archives suggests that many institutions will face the challenge of not only offering digitized recordings, but delivering them in such a way as to make them useful to scholars.
Annotating a digital source such as a recording may come less naturally to a researcher, however. Bradley and Vetch [Bradley and Vetch 2007] and Hillesund [Hillesund 2010] have observed that digital tools are not as likely to support annotation. This is something to strive for in interface design, since notes represent the beginnings of written criticism. As Audenaert and Furuta suggest, having notetaking capabilities in digital tools would greatly support scholarship:

Notetaking, however, is not merely the semi-formal representation of facts. Instead it is an integrated part of the iterative writing process. Scholars think of their notes specifically in terms of how that information will be represented in the published form of their work and organize them accordingly. To be successful, support for notetaking should be designed with this in mind and should provide a clear path to transition from information notes to a final publishable manuscript.  [Audenaert and Furuta 2010, 290]

They also suggest the importance of being able to export notes from a given system into writing software such as Microsoft Word [Audenaert and Furuta 2010, 289]. Therefore, if skillfully deployed, annotation features in a digital environment would support both scholarly reading and writing, offering a more natural and seamless feel to a possibly unfamiliar digital workspace. Moreover, there is no need to assume that annotations will take the form of written or typed comments. In a web-based sound archive, researchers may wish to record a spoken comment or annotation that responds to a particular passage. Similarly, an annotation might take the form of an image or a video. Our first step toward letting users annotate recordings in SpokenWeb has been to adopt SoundCloud’s commenting feature, which allows users to make comments at any point in a recording. A future phase in development would establish a means for exporting the time-stamped comments into the scholar’s preferred writing environment.
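To make the idea of exporting time-stamped comments concrete, the following is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of SpokenWeb's implementation or of SoundCloud's API: the `TimedComment` structure, its field names, and the plain-text output format are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TimedComment:
    """A hypothetical time-stamped note attached to a recording."""
    offset_seconds: int  # position in the recording the note responds to
    author: str
    text: str


def format_timestamp(offset_seconds: int) -> str:
    """Render an offset in seconds as H:MM:SS."""
    hours, remainder = divmod(offset_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


def export_notes(recording_title: str, comments: list[TimedComment]) -> str:
    """Return the comments as plain text, ordered by position in the recording."""
    lines = [f"Notes on: {recording_title}", ""]
    for comment in sorted(comments, key=lambda c: c.offset_seconds):
        stamp = format_timestamp(comment.offset_seconds)
        lines.append(f"[{stamp}] {comment.author}: {comment.text}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Invented sample notes, entered out of order on purpose.
    notes = [
        TimedComment(754, "Reader A", "Audience laughter here."),
        TimedComment(62, "Reader A", "Poet introduces the first poem."),
    ]
    print(export_notes("Sample reading", notes))
```

Plain text is used here only because it can be pasted into any word processor. A real export path would need to follow whatever formats scholars actually use, which is a question for user testing.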


In recent years, considerable work has addressed how a humanities scholar’s "workbench" or digital research environment might look and function [Toms and O’Brien 2008] [Toms and Flora 2005] [Palmer and Neumann 2002] [Rieger 2010] [Audenaert and Furuta 2010] [Project Bamboo 2010]. Similarly, work has been undertaken to determine user requirements for the humanities [Bowman et al. 2007].
We wish to direct attention specifically to how digitized or born digital audio recordings might best be optimized in an online work environment. Despite a lack of research on the information behavior of scholars who work with audio formats, we establish a checklist of features and system requirements for a web-based tool for analyzing literary recordings. Drawing upon an analysis of humanities scholarship, we suggest that the following features (see Figure 3) would serve the various activities, primitives and overall scholarly behavior central to the humanities scholar’s work.
Figure 3. From Primitives to Design Features. This table maps some of Unsworth’s scholarly primitives to Palmer et al.’s scholarly information activities, then proposes equivalent activities specific to a sound archive. We suggest features that could enhance and support these audio-based activities and provide examples when possible.
Below is a list of web sites and audio tools listed in Figure 3:
As Palmer and Neumann have noted,

interface design and human-computer interaction has, for the most part, ignored the needs of the humanities scholar. When the interface is considered, it is limited to discussions of items such as the Boolean logic problem and the vocabulary mismatch between system provision and user understanding ... Despite the profound impact of technology on this scholarly community, little is known about how computers have affected humanities scholars’ work flow, unless it is to say that scholars adopt technologies when they augment established research practices  [Palmer and Neumann 2002, 104–5]

By exploring how researchers use a particular type of primary source, we hope to help fill this format-based gap in user behavior information.
Palmer et al.’s study suggests that "collecting, reading, writing and collaborating, and especially the cross-cutting primitives, are much more sparsely supported online and often only as a byproduct of existing systems rather than as a deliberately designed feature"  [Palmer et al. 2009, 42]. Deliberately designed features are precisely what we want to bring to web-based audio interfaces.
Part of the design phase of the project will examine which activities scholars carry over into the digital realm. We are intrigued by the possibility of developing a tool that can offer features to support existing habits, but also possibly enable new activities to help scholars. As Rimmer et al. note, "the media used (paper, books, shelves, etc. or screens, keyboards, mice) afford different kinds of interactions"  [Rimmer et al. 2008, 1389]. Since humanities scholars often function with habits based in both the print and electronic world, we hope to facilitate a smooth overlap between these worlds, and develop an interface that allows for seamless integration with normal scholarly activities.
Our evidence-based design methodology is the means by which we can build a tool humanities scholars will use and exploit as readily as their traditional textual sources. Rieger notes that "multimodal environments require distinct modes of engagement from writers and readers"  [Rieger 2010, 112]. While there is a long history of literary critics working with printed texts, the history and methods of scholars working with audio recordings remain underdocumented or unknown. As such, it is much more difficult to anticipate user behavior within such environments. We do not yet know how scholars use web-based sound recordings in their work. We do not know what interesting habits could emerge. Thus we have begun the design of SpokenWeb with the goal of delivering a flexible and modular user experience. The use of web-based audio files for literary research will be inherently multimodal since the recordings will be embedded in a graphical user interface and the listener will have the option of consulting transcripts of the recording or focusing solely on listening.
The LAIRAH project analyzed 21 digital humanities projects in order to elucidate the factors that led to their use and uptake in scholarly communities. Noting that only two projects carried out formal user testing in the early stages of project design, the researchers observe:

User consultation was relatively rarely undertaken, despite the fact that it helps projects to design effective resources, and to avoid developing in ways that users may find over complicated or confusing. However, user testing, like disseminating information, is a skill that most humanities scholars have not acquired. It is therefore important that digital projects should be willing to work with those who already have expertise in this area, for example, researchers from Human Computer Interaction, Library and Information Studies, or practitioner librarians.  [Warwick et al. 2008, 93]

Indeed, user awareness, engagement and testing was one of LAIRAH’s six concluding recommendations for good practice in the construction of digital humanities projects [Warwick et al. 2008, 394]. We hope that in both observation and user testing we can glean information that could be useful to other digital projects intended for scholars who make extensive use of audio recordings. As Palmer and Cragin note, "directly engaging domain scholars as collaborators or partners in research design and interpretation of results is important for reducing the chain of inference required to determine implications for the design and development of technologies for specific research communities"  [Palmer and Cragin 2008, 198].
As much as we believe digital humanities projects should be more oriented around user experience and informed by well-conceived user testing, we simultaneously see the merit of the viewpoint offered by Gay and Hembrooke in their book Activity-Centered Design: An Ecological Approach to Designing Smart Tools and Usable Systems:

User-centered methods also fail to identify future uses, needs, and problems that users and developers might not independently envision. This is especially important for nascent technologies, which people will inevitably view in the relatively constrictive terms of old technologies (such as using a digital hand-held machine to replace the old portable audiotape guide system in a museum).  [Gay and Hembrooke 2004, 19]

We want to ensure that our sound archive performs in a way that reflects scholarly practice, but since web-based audio is relatively new, we look forward to anticipating and designing around new or unexpected behaviors. To this end, we believe that the Unsworth primitives, Palmer et al.’s scholarly activities, and the Bamboo themes can function as a checklist of the types of scholarly activities that we need to design for; they are empirically based and remind us to design for known needs and requirements. But the primitives and themes can also be helpful in generating new features that take advantage of the opportunities afforded by new technologies. As stated above, user-centered methods can be limiting in that they are inherently conservative, backward-looking design methods. The Unsworth primitives can be especially helpful in this regard. They are abstract in nature and are not tied to any particular discipline or format, and can therefore help us to think creatively and envision new features that take full advantage of the potential afforded by new technologies while minimizing the impact of the "constrictive terms of old technologies"  [Gay and Hembrooke 2004, 19]. The Unsworth primitives can, in a sense, function as a type of thought recipe whereby a designer is reminded of a general scholarly activity (e.g. discovery) that will need to be designed for, but with nothing further: the designer is forced to improvise and experiment with different ways to realize this primitive in a new digital substratum.
Ideally, then, the interface design will provide the more familiar and comfortable features that facilitate the types of activities that scholars know that they need to do, but will also be generative and experimental in taking advantage of the opportunities provided by new technology and will develop new features that scholars will find advantageous, and will come to need, but of which they are currently unaware.

An Evidence-based Design Methodology

  1. Become familiar with Unsworth primitives, Palmer et al.’s scholarly information activities, and the Bamboo themes of scholarly practice.
  2. Consult available research (e.g. [Palmer et al. 2009]) to help identify the primitives and practices that are most relevant to the project’s target user community. (For example, our project involves literary scholars, so we have chosen reading and annotation as key activities to support.)
  3. Identify the information sources and format types specific to your project.
  4. Using information gleaned from steps 1-3, adopt, adapt, or create features that would facilitate relevant primitives and practices. [1]
  5. Conduct user testing to evaluate the proposed features. Redesign or modify as needed.
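One way to keep track of the steps above is to treat the primitives and practices as a working checklist. The Python sketch below is only an illustration: the `ChecklistEntry` structure and its field names are hypothetical, and the three sample entries are drawn loosely from activities discussed in this article rather than from any finished specification.

```python
from dataclasses import dataclass, field


@dataclass
class ChecklistEntry:
    """One scholarly activity the interface should support (hypothetical)."""
    activity: str                                       # primitive or practice
    features: list[str] = field(default_factory=list)   # candidate features
    user_tested: bool = False                           # evaluated with users?


def needs_feature(entries: list[ChecklistEntry]) -> list[str]:
    """Activities with no candidate feature yet: prompts for experimentation."""
    return [e.activity for e in entries if not e.features]


def awaiting_user_testing(entries: list[ChecklistEntry]) -> list[str]:
    """Activities that have candidate features not yet evaluated with users."""
    return [e.activity for e in entries if e.features and not e.user_tested]


if __name__ == "__main__":
    checklist = [
        ChecklistEntry("annotation / notetaking",
                       ["time-stamped comments on a recording"]),
        ChecklistEntry("reading and listening",
                       ["audio playback alongside a transcript"]),
        ChecklistEntry("discovery"),  # no feature proposed yet
    ]
    print("Needs a feature:", needs_feature(checklist))
    print("Awaiting user testing:", awaiting_user_testing(checklist))
```

Entries without features correspond to the more generative use of the primitives, where the designer must improvise. Entries with untested features mark where user testing is still owed.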


We have explored the applicability of various models and conceptions of scholarly behavior, particularly of humanities scholars, to interface design for a digital spoken word archive. Scholarly primitives [Unsworth 2000], scholarly information practices [Palmer et al. 2009] and the Bamboo themes of scholarly practice (n.d.) have all informed our sense of which scholarly behaviors need to be supported and facilitated in a sound archive.
The question of format looms large in our project, since audio formats present basic challenges of accessibility, usability and annotatability. Since our design project involves delivering and providing a workspace for analyzing a challenging format, we have chosen to focus attention on two scholarly habits, notetaking and deep/close reading, that are especially relevant for literary critics. We have suggested ways in which a web-based tool can better facilitate these core activities so intrinsic to the research and writing process of the literary critic.
Surprisingly little information is available that reports on how scholars use audio recordings. This is striking given the historical and current importance of audio recordings in fields such as music, law, anthropology, oral history and communication studies, not to mention the ubiquity of sound recordings on the web today. Most studies of scholarly behavior analyze how scholars use texts, and thus the models and frameworks to describe scholarly activities are biased towards text and print-based information resources. We have tried to imagine how scholars use audio recordings based on well-established, even empirical, observations of scholarly behaviors. But since not much is known specifically about how researchers use web-based audio recordings, it has been necessary to draw upon evidence-based design principles developed by scholars of e-learning and multimedia learning. Thus, in order to begin to articulate design principles for web-based spoken word archives, our project is assembling evidence and principles from cognitive studies and web design and is investigating how they could apply to a humanities research tool.
The potential of deeper scholarly engagement with born digital or digitized sound recordings is promising. We see a fortuitous convergence of scholarly and technical factors, all of which suggest that web-based research using audio recordings will be made easier. We can benefit from useful models of scholarly behavior, the growing availability of archival media on the web, the deepening understanding of multimodal learning, the growth of performance studies, the ever-expanding array of functionalities in web-based applications and the potential for a new mode of literary analysis. It is our hope that we will contribute to an emerging and evidence-based list of design criteria that is based on scholarly behavior. Ideally, we can contribute to the formulation of functional requirements for scholarly software for humanities scholars who use audio recordings. Web-based tools need to fit in with established behaviors and practices in literary criticism, but we should also actively facilitate the emergence of sophisticated and responsive tools which can contribute to new or enhanced practices in literary scholarship. The creation of new knowledge in the humanities depends not only on better understanding the role of sound in the work of humanities scholars, but also incorporating this knowledge into the design of sound archives.


[1] The mapping from primitives and practices to interface features may be relatively straightforward. For example, in our project designed for humanities scholars, the scholarly practice of notetaking maps directly into a feature that facilitates audio annotation. Alternatively, the primitives and practices can function more abstractly, generating thinking and questions that could lead to experimental features. For example, Unsworth’s discovery primitive challenges designers to create new features that facilitate such a rich primitive in the unique context of their project.

