Adam Bradley is a PhD candidate in both the department of English Language and Literature and Systems Design Engineering. He is interested in the intersections between technology and traditional literary studies with a focus on early 20th century poetics. His current work focuses on digital tool design for literary criticism and investigations into how philology can still function within a technological context.
Currently director of the interaction design lab Logic&Form, Travis is a creative engineer, designer, and researcher. He holds an MA in Interface Culture and a BSc in Interactive Arts, with a specialization in Interaction Design.
Mark Hancock is an Associate Professor of Management Sciences in the Faculty of Engineering at the University of Waterloo and Associate Director of the Games Institute. His research is in the field of Human-Computer Interaction and includes the design and development of interfaces and interaction techniques for digital surfaces. His research also focuses on the science of games and interaction: using concepts from game design, such as engagement, immersion, and enjoyment, to inform the design of more motivating and compelling novel interfaces.
Sheelagh Carpendale is a Professor in the Department of Computer Science at the University of Calgary where she holds a Canada Research Chair in Information Visualization and NSERC/AITF/SMART Technologies Industrial Research Chair in Interactive Technologies. Her research on information visualization, large interactive displays, and new media draws on her background in Computer Science, Art and Design.
In the Digital Humanities, there is a fast-growing body of research that uses data visualization to explore the structures of language. While new techniques are proliferating, they still fall short of offering whole-language experimentation. We provide a mathematical technique that maps words and symbols to ordered, unique numerical values, showing that this mapping is one-to-one and onto. We demonstrate this technique through linear, planar, and volumetric visualizations of data sets as large as the Oxford English Dictionary and as small as a single poem. The visualizations of this space have been designed to engage the viewer in the analogic practice of comparison already in use by literary critics, but on a scale inaccessible by other means. We studied our visualization with expert participants from many fields, including English studies, Information Visualization, Human-Computer Interaction, and Computer Graphics. We present our findings from this study and discuss both the criticisms and validations of our approach.
A new, comprehensive technique for mapping language.
One of the goals of the literary critic is to analyze language and its embedded complexity. For example, when literary critics examine a poem’s form, they consider many characteristics of the words it contains, including the similarities and differences in orthography, sound, the visible pattern it produces, rhythmic structure, and countless others. Nearly all of this information is available through visual inspection of the poem and contained in what may already be our greatest visualization technique — the written word. However, this same inspection carries with it the biases introduced by the semantic meanings of the words themselves: it is difficult to pay attention to the structural parts of the word “apple” without imagining the fruit it represents.
One possible method of aiding the process of literary criticism is to provide an alternate representation. There have been a variety of examples in the digital humanities and information visualization disciplines that provide alternate representations of language, such as Word Clouds.
In this paper, we present a design study of an information visualization that is a recoverable representation of language. Specifically, we present the design of a visualization we have named Language DNA (L-DNA) that visually encodes any symbol system, in our case the letters and phonemes of words in the English language, and a qualitative evaluation with several literary critics and designers to investigate the need and use of such a visualization.
Our study is an examination of how to visualize language in ways that can build ontologies of words based on the needs of the literary critic. Much work has been done in the area of text visualization but none have yet approached a level that can produce whole language interactions. Our work builds on previous text visualization techniques while expanding on the scope of information that can be shown in our system, Language DNA.
Many previous text visualizations have focused on some form of text; for instance, there are visualizations of documents.
Within the domain of digital humanities, there are three main streams of visualization that tend to pervade the literature. All three are based in text analysis but approach the problem from different directions. The first is the GIS-type tools for organizing spatial data in the humanities. The second category involves tools used to augment reading: text analysis tools that highlight relationships in texts, often with visualizations. This category includes work such as Clement's distant reading of Gertrude Stein. The third category is the straight visualization projects, which are of two types: the first are projects that use toolkits such as Voyant Tools; the second are projects whose outputs, smashed together, produce effects like those seen in images from particle accelerators.
The development of our technique was motivated by the fact that existing visualizations and text analysis tools, while usually aesthetically pleasing, cannot easily be used for the types of analyses expected or desirable for literary criticism or linguistics. Our Language DNA is an attempt to build a system that can handle multiple levels of information to display complex structural and content-related relationships within texts.
In his book, the question is posed: what if criticism is a science as well as an art?
The visualization algorithm we present can accommodate as much or as little information as a critic could want, giving the possibility of visualizing as little as a single letter or as much as entire corpora. If we are to imagine a space where the literary critic or linguist can experiment using a language-based mathematics, it must have these three characteristics:
First, a visualization algorithm is needed to encode language such that we can create a space that is both consistent and reversible. In mathematical terms, this would be referred to as one-to-one and onto. The need for consistency and reversibility arises from the requirement in the analysis process of preserving the ability for human interpretation of the words. In order for the visualization to be “readable” by a critic, each word must be consistently mapped and that mapping must be reversible into the original work. Most existing visualization techniques distort the original texts without providing an avenue for reconstituting them. This is problematic when studying things like poetics where the spatial component of the text is integral to its meaning.
Second, it is important that the algorithm uses a plotting space that is infinite. Imagine that we were to approach the problem of metaphor: in theory, we do not know whether the chain of meaning created by metaphoric relations between words is finite. It stands to reason, then, that without an understanding of the full requirements of the system, an infinite space is a safe decision. If we are to start to approach these types of questions, our lack of knowledge should not limit the space of possibilities. Specifically, the critic must be able to analyze words and literature that are perhaps not known to the visualization designer. With our technique, we can encode anywhere from one letter to an entire language, to the entire literatures of one language, and even multiple languages, in a space where each point belongs to one individual piece of original information. The infinite space means that in creating experiments the literary critic is not limited to our present understanding of language.
The third requirement is the need for the ability to layer symbols in order to make comparisons. For instance, it should be possible to overlay a poem within the context of the entire language or other poetry. The need for layering arises from the analogic basis of most types of literary critical and linguistic inquiries. This should extend to any types of symbols, as comparisons are not always rooted in the Latin alphabet. Based on what we recognize as the possible requirements of such a system, we designed our Language DNA visualization with the three characteristics of consistency and reversibility, infinite plotting space, and layering in mind.
An important property of our mapping is that the words be recoverable from the visual space. Mathematically, this requires that the mapping be a bijection (i.e., that words both map to a unique place in the visual space, and that each point in the visual space maps back to a word). As an example of the technique we introduce a mathematical translation of words to numbers that relies on the lexicographical ordering of letters. This is essentially a mapping of alphabetical order and is one of possibly infinite ways to group the data. We have chosen this technique as a first demonstration because we are all familiar with the way we order a dictionary, but we must stress that we can map the data many different ways. We define the mapping 𝑔 so that each letter is mapped to its position in the alphabet, as follows:
𝑔 ∶ 𝐴 → ℤ
where 𝐴 is the set of alphabetical characters {𝑎, 𝑏 … , 𝑧} and:
𝑔(𝑎) = 1, 𝑔(𝑏) = 2, … , 𝑔(𝑧) = 26
Note that this mapping is currently written using base 10 numbers for the integers (1 to 26, with an implied 0 for no character), but our mapping requires a base 27 representation (or more generally base N+1, where N is the number of characters in the language), which for convenience we will symbolically represent as follows:
1₁₀ = 𝑎₂₇ (i.e., 1 in base 10 is represented as ‘a’ in base 27)
2₁₀ = 𝑏₂₇
...
26₁₀ = 𝑧₂₇
Thus, we can define our mapping of words to a one-dimensional number line as follows:
𝑓: 𝑊 → (0,1)
where 𝑊 is the set of alphabetical words (e.g., apple, dog, the, etc.) such that for each 𝑤 ∈ 𝑊,
𝑤 = 𝑥₁𝑥₂ ... 𝑥ₙ,
and:
𝑓(𝑤) = 0.𝑔(𝑥₁)𝑔(𝑥₂) ... 𝑔(𝑥ₙ)
For example, for 𝑤 = “dog”, we have 𝑥₁ = ‘d’, 𝑥₂ = ‘o’, 𝑥₃ = ‘g’, therefore:
𝑓("dog") = 0.𝑑𝑜𝑔₂₇
Note that this is a base 27 number, but it could be converted to base 10:
0.𝑑𝑜𝑔₂₇ = 4 × 27⁻¹ + 15 × 27⁻² + 7 × 27⁻³ = 0.1690799₁₀
If we relax the restriction that each word needs to end (i.e., we allow words to have an infinite sequence of letters), it becomes clear that 𝑓 is a bijection, since every word generates a unique base 27 representation (one-to-one: the property that if two words map to the same number, they must be the same word) and each number between 0 and 1 can be converted to base 27 to recover the sequence of letters (onto: the property that every number has a word that can map onto it).
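To make the construction concrete, the following is a minimal sketch of 𝑓 and its inverse in Python. This is our own illustrative code, not the authors' implementation; exact rational arithmetic (`fractions.Fraction`) is used so that recovering the letters is free of floating-point error.

```python
from fractions import Fraction

def g(ch):
    # g maps each letter to its alphabet position: g('a') = 1, ..., g('z') = 26
    return ord(ch) - ord('a') + 1

def f(word):
    # f maps a word to a rational in (0, 1): the base-27 value 0.g(x1)g(x2)...g(xn)
    return sum((Fraction(g(ch), 27 ** i) for i, ch in enumerate(word, start=1)),
               Fraction(0))

def f_inverse(value):
    # Read off base-27 digits one at a time; exact arithmetic guarantees
    # the loop terminates for any finite word
    letters = []
    while value > 0:
        value *= 27
        digit = int(value)            # integer part = next letter's position
        letters.append(chr(ord('a') + digit - 1))
        value -= digit
    return ''.join(letters)

print(float(f('dog')))       # ≈ 0.1690799, matching the example above
print(f_inverse(f('dog')))   # 'dog' — the mapping is reversible
```

Because the mapping is a bijection on finite words, round-tripping any word through `f` and `f_inverse` returns it unchanged.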
The mapping above describes how an arbitrary word can be mapped onto a number line, which already allows the visual mapping of words onto an axis in 1D space (similar to a lexicographical axis). Here, we describe a method, inspired by Cantor’s Diagonalization to map an individual word onto 2D space directly (and more generally onto n-dimensional space).
We can split the word in two by considering every other character; for instance:
𝑓₂: 𝑊 → (0,1)²
𝑓₂(𝑤) = 𝑓₂(𝑥₁𝑥₂ ... 𝑥₂ₙ) = (𝑓(𝑥₁𝑥₃ ... 𝑥₂ₙ₋₁), 𝑓(𝑥₂𝑥₄ ... 𝑥₂ₙ))
For example, if our word is applesauce (Fig. 3):
𝑓₂("applesauce") = (0.𝑎𝑝𝑒𝑎𝑐₂₇, 0.𝑝𝑙𝑠𝑢𝑒₂₇)
which, in base 10, would be:
𝑓₂("applesauce") = (0.0592410, 0.6100587)
This mapping can easily be extended to n dimensions by taking every nth character of the base-27 representation of 𝑓(𝑤). 𝑓₂ is also clearly a bijection, because every word can be split into alternating characters to generate two base-27 representations (one-to-one), and each pair of numbers between 0 and 1 can be converted to base 27 to recover the two parts of the word, which can then be reassembled (onto). Thus, every word in the English language can be mapped onto a 2D space using 𝑓₂, and every 2D point can be mapped to a word, where a word is a sequence of possibly infinite letters which may well not have associated semantics. Note that this mapping does not account for things like homonyms, but with a simple addition to the mapping we could easily differentiate words by any number of their ontological characteristics.
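A sketch of the 2D version, again ours for illustration rather than the authors' implementation: Python's slice notation performs the alternating-character split, and the inverse interleaves the two recovered halves.

```python
from fractions import Fraction

def g(ch):
    return ord(ch) - ord('a') + 1

def f(word):
    # 1D mapping: the base-27 value 0.g(x1)g(x2)...g(xn), as an exact rational
    return sum((Fraction(g(ch), 27 ** i) for i, ch in enumerate(word, start=1)),
               Fraction(0))

def f2(word):
    # Odd-position letters feed the x coordinate, even-position letters the y
    return f(word[0::2]), f(word[1::2])

def f2_inverse(x, y):
    # Recover each half with the 1D inverse, then interleave the letters
    def half(value):
        letters = []
        while value > 0:
            value *= 27
            digit = int(value)
            letters.append(chr(ord('a') + digit - 1))
            value -= digit
        return letters
    a, b = half(x), half(y)
    word = []
    for i in range(max(len(a), len(b))):
        if i < len(a):
            word.append(a[i])
        if i < len(b):
            word.append(b[i])
    return ''.join(word)

x, y = f2('applesauce')
print(round(float(x), 7), round(float(y), 7))  # → 0.059241 0.6100587
print(f2_inverse(x, y))                        # → applesauce
```

The printed coordinates match the applesauce example above, and the round trip recovers the original word, demonstrating that 𝑓₂ is reversible.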
Since whole natural languages are immense, it is important to discuss scale, both of what is being visualized and of the size of the resulting visualization. We can base a visualization size calculation on the number of words being visualized, and then determine the length of a 1D L-DNA visualization that draws at a density of a single pixel for each word or unit (it is important to remember that these calculations are for orthography; they will change depending on the symbol system used). Two measures are needed to accomplish this: the smallest and the largest distance between two words. Since our algorithm already normalizes words in 1D to be between 0 and 1, we can assume that the difference between the largest and smallest word values is approximately 1.0 (with the words ‘a’ and ‘zygote’, this is already correct to 1 decimal place). In our analysis of words from the Oxford English Dictionary (OED), the two closest words under our algorithm are “abandoner” and “abandoning”, with the first seven letters in common and the next letters very close in the alphabet. The difference in values from our algorithm for these two words is:
0.abandoning₂₇ − 0.abandoner₂₇ ≈ 1.37 × 10⁻¹¹
Thus, to present a number line from 0 to 1 with numbers only 1.37 × 10⁻¹¹ apart represented as different pixels would require:
1.0 ÷ (1.37 × 10⁻¹¹) = 73.1 billion pixels
Note that in 2D, our algorithm fares far better. This same pair of words would be broken down into two pairs of coordinates:
(0.aadnn₂₇, 0.bnoig₂₇) and (0.aadnr₂₇, 0.bnoe₂₇)
which have at most four letters in common in each dimension and would require only:
1.0 ÷ (0.aadnr₂₇ − 0.aadnn₂₇) = 1.0 ÷ (2.79 × 10⁻⁷) = 3.6 million pixels
To put this into perspective, a 1D visualization using our algorithm would require the width of 38.1 million 1080p screens (1920 × 1080 pixels) placed side-by-side, and a 2D visualization would require 6.2 million 1080p screens arranged to form a rectangle.
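The scale arithmetic above can be reproduced directly from the mapping; this snippet (our own, for illustration) recomputes the word gaps and the screen counts quoted in the text.

```python
from fractions import Fraction

def f(word):
    # 1D base-27 mapping from the earlier section, as an exact rational
    return sum((Fraction(ord(ch) - ord('a') + 1, 27 ** i)
                for i, ch in enumerate(word, start=1)), Fraction(0))

# 1D: the closest OED pair identified in the text
gap_1d = f('abandoning') - f('abandoner')
pixels_1d = 1.0 / float(gap_1d)
print(f'{float(gap_1d):.3g}')   # → 1.37e-11
print(f'{pixels_1d:.3g}')       # → 7.31e+10, i.e. 73.1 billion pixels

# 2D: only the odd-position halves (aadnn vs. aadnr) need separating
gap_2d = f('aadnr') - f('aadnn')
pixels_2d = 1.0 / float(gap_2d)
print(f'{pixels_2d:.3g}')       # → 3.59e+06, i.e. roughly 3.6 million pixels

# 1080p-screen equivalents quoted in the text
print(round(pixels_1d / 1920 / 1e6, 1))                # → 38.1 (million screens side by side)
print(round(pixels_2d ** 2 / (1920 * 1080) / 1e6, 1))  # → 6.2 (million screens in a rectangle)
```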
We start with three examples to demonstrate how L-DNA can be used to visualize language. The first is a dictionary mapping for the English language, the second is a view of multiple languages, and the third is a mapping of English phonemes to illustrate the applicability of this approach to any set of symbols.
Fig. 4 shows all 370,624 words of the Oxford English Dictionary, parsed with criteria that eliminate diacritics and punctuation, mapped with 𝑓₂. The result is a mapping that privileges the first two letters of each word. That is, the x-axis can be read as an alphabetical ordering of the first letters of words, and the y-axis can be read as an alphabetical ordering of the second letters. This makes the top left box AA, where you would find words such as aardvark (note that few words in English begin with two A’s, which is why this box is quite sparse). This property is recursive, so that within each box, the third and fourth letters are similarly privileged. For example, in the BA box, there is an NA box, which has another NA box that contains the word BANANA. This initial visualization shows how we can start to understand where each word belongs within the 2D whole.
Because our algorithm privileges the spelling of words, this 2D representation can be thought of as a form of 2D orthography (specifically, spelling rules). It is essentially a two-dimensional layout of alphabetical order. This version of L-DNA reveals a bird’s-eye view of the language that was not previously available to the literary critic, linguist, or lexicographer; a critic could previously flip through a dictionary’s pages or even a list of ordered English words, but this visualization instead provides a new 2D spatial location for each word in the dictionary.
Our second example compares multiple languages (English, French, German, and Spanish). Fig. 5 shows these four languages each represented in 1D on the 0 to 1 number line using our algorithm, stacked for comparison. Visual inspection reveals a similar sparseness in the ‘Q’ portion of the line for all languages (i.e., few words in any of these languages begin with ‘Q’ and any letter other than ‘U’), but additional sparseness in French, German, and Spanish exists near the end of the alphabet (‘W’, ‘X’, ‘Y’).
Fig. 6 also shows a side-by-side comparison of multiple languages in 2D, and Fig. 7 overlays these four languages. Fig. 8 shows a close-up of the AL region of the overlaid image. These side-by-side comparisons or overlays allow for elementary analogic comparisons and can be expanded on with more complex symbol encoding.
The above images were generated with the constraint that we only had access to open source dictionaries.
We chose the next example (Fig. 9), English phonemes, to demonstrate the robustness of the technique to arbitrary symbolic representations of language, and to create an analogue between the spellings of words and the sounds of words. The mapping is organized in like sound units: vowels (e.g., AA, AE), semivowels (e.g., W, Y), stops (e.g., B, D, K), affricates (e.g., CH, JH), fricatives (e.g., DH, SH, V), aspirates (e.g., HH), liquids (e.g., L, R), and nasals (e.g., M, N, NG).
By organizing words into phonemes, some interesting observations can be made. It appears clear that a portion of the phonemes are used primarily for the first syllable and another distinct set is used primarily for the second syllable. This can be observed through the densely populated top and right columns, with the majority of the bottom-left part of the image containing almost no English words. In addition, the top-right corner is mostly empty, with the exception of a few very dense groups, representing the few phonemes that are used for both the first and second syllables.
The final example that we created was to insert a single poem into the space that we created for English words and phonemes. This is a first step in being able to use these spaces for analogic comparisons.
Some interesting patterns can be observed in the poem through visualizing it in this manner. First, in terms of orthography (Fig. 10), it becomes possible to identify visual rhymes by cluster groups within the image. In Fig. 11, the phoneme visualization can be used to identify rhyming patterns within a poem. As the phonemes group together it is possible to see the types of sounds being repeated in the piece. Although this is easy to do with a 16-line sonnet, it becomes much more difficult with a poem of any significant length (e.g., Milton’s 10,000+ lines of verse) or with larger macro-structures in poetry. Each diagram below is laid out in two dimensions. This is a decision made during the encoding process, and the layout can be n-dimensional based on the amount of information you wish to build into the model. In this case we have chosen to show 2D representations for simplicity. In Fig. 10 we present what we label our alphabetical-order visualization, where we represent words by their spellings. A visualization of spelling alone may not lead to many insights, but it is useful for simple demonstrations. One area where this simple encoding could be used would be to visually compare irregular spellings in Elizabethan drama: it would provide quick visual access to the places in texts that differed and needed an editor’s attention. The real power of this method comes from being able to encode as many connections as desired. Work has been done in the digital humanities and computer science in the last few years on word-embedding models, and the consistency of our method could aid in the process of detecting connections in texts by using vectors. In Fig. 11, we have graphed phonemes in two dimensions. Any highlights that form vertical lines show alliteration in the poem. An example is in this line from Donne’s poem:
Or like a thiefe, which till deaths doome be read.

The words deaths and doome line up vertically to indicate alliteration in this particular encoding. If we wanted to visualize rhymes, we would simply encode the phonemes in reverse, privileging the final phoneme, and we would generate a similar graph. The n-dimensional nature of the models allows for as much or as little data coding as needed, including relations between words. The only limitation on the questions that can be asked is the imagination of the analyst.
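As a sketch of that reversed encoding for rhyme, the snippet below is our illustration only: the phoneme table and pronunciations are assumed ARPAbet-style stand-ins, not the paper's actual inventory or ordering.

```python
from fractions import Fraction

# Illustrative stand-in phoneme alphabet; the real inventory and ordering
# would be substituted here
PHONEMES = ['AA', 'D', 'K', 'L', 'M', 'R', 'UW']
INDEX = {p: i + 1 for i, p in enumerate(PHONEMES)}
BASE = len(PHONEMES) + 1   # base N+1 for an N-symbol alphabet

def f_phonemes(phones):
    # Same construction as f, over the phoneme alphabet instead of letters
    return sum((Fraction(INDEX[p], BASE ** i)
                for i, p in enumerate(phones, start=1)), Fraction(0))

def rhyme_value(phones):
    # Reversing the sequence privileges the final phoneme, so rhyming
    # words land near each other on the number line
    return f_phonemes(list(reversed(phones)))

# Assumed pronunciations, for illustration only
doom = ['D', 'UW', 'M']
loom = ['L', 'UW', 'M']
dark = ['D', 'AA', 'R', 'K']

# doom and loom share their final phonemes, so their rhyme values are
# close together; dark lands far away
print(float(rhyme_value(doom)), float(rhyme_value(loom)), float(rhyme_value(dark)))
```

Under this encoding, the distance between the rhyming pair is much smaller than the distance to the non-rhyming word, which is exactly the clustering behavior the reversed mapping is meant to produce.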
We wanted to discuss this project with a cross-section of scholars to develop a better understanding of how people understood L-DNA and whether or not they saw potential uses for their research. Our goal was to gain insight into whether this technique could inspire reactions and possibly spark interest in the approach.
We intentionally sought participants from a variety of disciplines. We had 14 participants which included 1 visual artist, 3 literary critics, 1 rhetorician, 2 digital media critics, 1 database programmer, 1 business analyst, 1 linguist, 2 interaction designers, 1 graphic designer, and 1 marketing specialist.
Each participant took part in a thirty-minute interview and was first shown L-DNA visualizations that we had intentionally left void of any legends or axis labels, so that we could ask questions about their initial interpretations. After showing the image in Fig. 4, the mapping of the OED in two dimensions, we asked what they thought the image might be. We then showed participants Fig. 7, four languages plotted in the same space, and the interviewer gave a thorough explanation of how the algorithm works and what they were seeing. We took time to make sure they were comfortable with the explanation and asked about their understanding. We then showed them Fig. 8 to further explain L-DNA and asked questions based on the participants’ understanding. We also asked participants to complete a post-interview questionnaire. Five questions were asked on a 5-point scale, with an opportunity to provide free-form answers:
Q1. Once explained, to what extent is the visualization readable?
Q2. Do you think the white space has meaning?
Q3. Is the white space necessary?
Q4. Does representing languages by colour and words as points work well?
Q5. Does this spatial representation of language trigger new ideas?
The following three questions asked for free form answers only:
Q6. What are your initial interpretations of this visualization?
Q7. Can you imagine a more suitable or readable structure?
Q8. Please provide any criticism you have about this visualization.
The scale questions were answered as follows. For Q1, 6 out of 14 participants told us the visualization was clear after the explanation (5 out of 5 on the scale). For Q2, 9 out of 14 people said that the white space carried meaning to them (5 out of 5), and for Q3, 6 participants thought that this white space was completely necessary. For Q4, 7 out of 14 people ranked the visualization approach 5 out of 5. For Q5, 10 out of 14 participants said that the visualization triggered new ideas for research (5 out of 5).
We have formulated our discussion around the free-form questions. Since our participants were experts from a variety of fields and domains, the similarities in answers in some cases are particularly interesting. In other places it is the difference in answers that encourages us as researchers in terms of the potential of L-DNA as a tool for approaching questions about language. In this discussion we include the questions and a discussion of the general themes that arose in the responses.
Interestingly, when shown the images without any labels, participants tended to engage in metaphoric comparisons of what they were seeing. The omission of a legend led each participant to find something that was cognitively analogous to what they were being shown. For example, some responses were:
In response to this process, five out of nineteen (26%) of our participants from varying backgrounds and fields, with no explanation of what they were looking at, described Fig. 4 as appearing like DNA — the inspiration for the name of our technique. This result also demonstrates the power of representation held within L-DNA. Some of the responses received from participants included references to stamped or fading paper, city grids, abstract art, and digital clock faces. Because we were trying to create a space that could handle an ontology of words, the DNA metaphor was highly applicable based on the implications of describing parts of a long chain of information.
In our interpretation of the study data, the questions about whitespace (Q2-Q3) produced perhaps the most interesting results. It was during these questions that most of the participants began to hypothesize about the space in the languages they use every day. Essentially, the parts of the visualization with an absence of marks inspired thought, because they were in stark contrast to the actual dots drawn on the screen. This result strongly indicates the analogic or comparative possibilities of this technique: the literary critic can begin to understand what makes a word English by recognizing what is not a word, or by investigating what words poets or writers use that push the boundaries of language. This result is encouraging for the domain-specific problems of literary criticism. One of our requirements is a space for analogues, and even with the whitespace in this simple mapping there is a significant effect. The response of our participants to the relationship between the whitespace and the space occupied by words gives insight into which sets of symbols we use and which we do not use in our language. The sheer size of the whitespace in comparison forces an understanding of how few of the possible arrangements of letters we actually use. With further work we think it is possible to show that more complex mappings will produce more complex analogues.
It was generally agreed upon that it was the relation of the empty space to the marked space that created meaning and inspired insight into what the visualizations were showing and what they could show.
It was in the whitespace, or the lack of spellings, that our participants saw the potential for growth in the language, or commented on the enormous range of letter combinations that we do not use in English. This is encouraging because as we build “meaning” into these visualizations we expect the response to be comparative, and we expect new interpretations will result from these comparisons. Initial reactions were as follows:
For the researchers this result was surprising, but explainable. Without meaning being built into this version, we were simply showing a part of orthography (spelling) and, in this stripped-down version of the possibilities of the space, it was the comparison between what was empty and what was marked that sparked the interest of our participants. This is the exact response we were interested in, and we anticipate that with more complex representations we will be able to see more complex analogies. Some of the responses to this whitespace analogy are as follows:
These responses were typical of all our participants and are very promising for future work.
When asked if the images they were seeing inspired any new thought processes, they proved to be exciting to our participants and the answers that follow show the breadth of their thinking:
Interestingly, the 1 person out of 14 who said no to being inspired (included below) touched directly on the fact that the simplicity of our representation of orthography was limiting, but suggested in his negative response that it could be interesting for literary criticism if we could find a way to include meaning. This is fruitful ground for future work and is possible within the three criteria we laid out above for the development of the technique:
Our participants were in some cases critical of the design of L-DNA. In particular, the most common criticism centered on making the technique interactive, which is a clear (and previously planned) next step in our iterative design process. Another criticism was that the encoding we used was arbitrary, and its meaning was not immediately clear:
We see this interactivity as the next steps in developing an application for these types of interactions and it is in that interactivity that the objections to the arbitrariness of the design will be addressed. We hypothesize that being able to investigate the space dynamically, and by defining other symbol systems to encode, the literary critic will be able to explore the types of meaning and associations being looked for by our participants.
From our qualitative study, the overwhelming result was the importance of the white space in our visualizations to the entire group that was interviewed. This reaction has influenced the direction of our future work and demonstrates that this space can be used for the types of comparisons that we are interested in, namely those that lead to interpretive possibilities. We recognize that we have presented a simplified form, but the technique itself allows for infinite complexity. Some of our participants talked about the need to include information that gives meaning to relationships and that is the next step in developing mappings that can solve the original problem of creating a space that can be experimented in with language and literary theory. Our study confirmed the idea that this technique has the potential to answer these much more complex questions as they relate to the domain in question. The sheer breadth of answers to our question in relation to inspiring new ideas is extremely promising and we take it as a success in developing the type of space that can inspire investigation.
In response to the discussion with participants about the interest in whitespace and the lack of density in certain regions, we produced a density and inverted density map to highlight the white spaces, shown in Fig. 12 and Fig. 13.
This was in direct response to comments from our participants such as:
In this iteration of our design, instead of rendering each word in the language, we cluster groups of words into the boxes representing letter pairs (AA, AB, etc.), and render each box using a transparency that corresponds to the density of words therein. The inverted version of this mapping highlights all of the non-English letter combinations, which were clearly of interest to participants.
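A minimal sketch of that density computation (our own, assuming a plain word list as input): count the words per leading two-letter box and turn the count into an opacity, which can then be inverted to foreground the whitespace.

```python
from collections import Counter

def density_boxes(words):
    # Count words per leading-letter-pair box (AA, AB, ...) and scale the
    # counts to opacities in [0, 1]
    counts = Counter(w[:2] for w in words if len(w) >= 2)
    peak = max(counts.values())
    return {box: count / peak for box, count in counts.items()}

def inverted(alphas):
    # 1 - alpha foregrounds the sparse boxes, i.e. the whitespace
    return {box: 1.0 - a for box, a in alphas.items()}

alphas = density_boxes(['aardvark', 'banana', 'bandana', 'dog'])
print(alphas['ba'])             # → 1.0, the densest box in this toy list
print(inverted(alphas)['aa'])   # → 0.5
```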
We have discussed how the blank spaces are compelling and how the absence of words in these spaces generally seems to intrigue people.
It is interesting to note that this space is, and has been, filled in many interesting ways. For example, E. E. Cummings’ poem ygUDuh does not contain a single English word, yet it can be read as English, where the words take on the sonic characteristics of English when read aloud. For example, the first few lines of the poem are:

That, when read aloud, becomes a phonetic map for a type of early 20th-century urban slang exemplified by the poem:
In Fig. 14 we have plotted ygUDuh overlaid on the OED grid. Note how these non-words exist largely on the edge of the word spaces and the white spaces. This may be because, while they are not English words — hence the white-space proximity — they have similar vowel and consonant structures to English words. By seeing these words overlaid on the whole language, we can see a visual representation of Cummings’ craft, of the attempt to make non-English words that sound like English when read aloud. The fact that all of the non-words are situated on the edges of heavily populated space tells us that these arrangements of letters that we try to make into words when reading the poem are closer to English words than we might think, at least in terms of spelling.
Another example where new types of words, or at least English communications, are evolving is in instant messaging, text messaging, and social media. It seems that for ease and speed, we can give up many letters — chiefly the vowels — in words and still retain meaning. Fig. 15 shows MSN words overlaid on the OED visualization. It is interesting to note that many of the new words (marked with red dots) fall in the spaces where few or no words exist. This demonstrates that even in a type of shorthand like the one used in instant messaging (e.g., btw, lol, ttyl, etc.), many of the newly created words are spelled with letter combinations that simply don’t exist in the language. This is partially a result of the volume of acronyms used in instant messaging, but it becomes obvious by reading the image that many of the words used fall on the top line of each row (such as with row A and row I), suggesting that many of these words and acronyms begin with those letters. In this way our technique produces visuals that allow us to ask further questions about the organization of our data.
We have also begun to integrate interactive elements into our visualization, some of which were planned prior to our qualitative study, and some of which were inspired by our results. In particular, we have already created a version of L-DNA that incorporates a brushing technique presenting the words beneath the cursor, both when dragging across words and when dragging across the whitespace. We have also created a version of the density map (Section 9.1) that allows zooming into the recursive letter pairs. For example, it is possible to click or tap on the BA square, then the NA square, then another NA square to see the word BANANA, as shown in the static image of Fig. 3.
In this paper we have introduced L-DNA and presented the findings of a qualitative study of its design. In L-DNA, we have developed a mapping of symbol systems to visual space, which we have demonstrated using language. Our formulation has several properties that are valuable for the analysis of language and are not available in some other common visualizations of language. L-DNA has the following important features:
We have mathematized language to make exploring and experimenting with language easier, but the results of said experiments need to have the possibility of being reversed out of the space to be able to assign meaning once again to the language. We have also presented a qualitative study, which provided encouraging results that indicate the power of the type of representation provided by L-DNA, the benefit of the whitespace that it generates, and its possibilities to provide inspiration (even if reluctantly), as well as some useful criticisms that led to iterations in our design.