Chelsea R. Canon is a PhD candidate studying the communication challenges posed by changing climates and climate hazards. She is interested in how academic research about climate communication informs the practice of climate communication, and vice versa. Her dissertation applies network analysis to model and visualize these knowledge flows.
Douglas P. Boyle is Professor and Chair in the Department of Geography at the University of Nevada, Reno. He is the former Nevada State Climatologist and the former Director of the Nevada Water Resources Institute. His research uses integrated computer-based modeling of hydrologic processes to understand the impacts of historic and future climate on water resources in arid and semi-arid environments, using paleoclimate information, global climate model output, and instrumental ground-based information.
K.J. Hepworth is an information design practitioner-researcher, employed as Senior Lecturer in Communication Design at the University of South Australia in Adelaide, Australia. Her current research is on ethical visualization of lived experience perspectives.
Knowledge mapping combines network analysis and data visualization to summarize research domains and illustrate their structure. In this paper, we present a framework for ethical and effective visualization of knowledge networks, which we developed while building a knowledge map of climate communication research. Using the climate communication knowledge map as an example, we highlight the practical and ethical challenges encountered in creating such visualizations and show how they can be navigated in ways that produce more trustworthy and more useful products. Our recommendations balance tensions between qualitative and quantitative and objective and subjective aspects of knowledge mapping. They demonstrate the importance of critical practices in the development of knowledge maps, illustrate the intertwined nature of analysis and results in such projects, and emphasize the constructedness of the resulting visualization. We argue that the only way to produce an effective knowledge map is to produce an ethical one, which requires attention to the ways trust and accountability can be produced at every step of analysis and production. This extends the literature on ethical visualization in digital humanities projects by offering a clear example of the utility of a critical approach for a traditional, science-oriented knowledge mapping project.
A framework of critical practices for ethical and effective visualization of knowledge networks.
Scholars have long been captivated by the idea that knowledge generation is outpacing
their ability to keep up, necessitating creative techniques and solutions to organize,
manage, and access information. Proposed solutions stretch from scientist Vannevar Bush's memex, an imagined device for storing and linking a personal research library, to today's digital tools for search and synthesis.
Knowledge mapping is one such survey and synthesis technique. It combines data
visualization and network analysis to depict large collections of information spatially,
as if they were a landscape viewed from above. Knowledge mapping is a general term for an
analysis informed by and applied in multiple areas of scholarship, including information
science, bibliometrics, and the digital humanities.
Ethical visualization is the practice of acknowledging and mitigating the potential for
harm that is inherent to particular visualizations.
In this paper, we demonstrate how principles of ethical visualization and data feminism
can be applied in the development of a knowledge map to produce a more trustworthy and
more useful product. We do this by providing a detailed account of the path we followed to
build a knowledge map of climate communication research, highlighting practical challenges
encountered in creating such a visualization and how ideas from ethical visualization
literature helped navigate them. Specifically, we: (1) demonstrate the importance of
critical practices in the development of a complex type of chart, (2) illustrate the
intertwined nature of analysis and results in big data projects, and (3) take a first step
toward a feminist bibliometrics informed by critical work in digital humanities. This
addresses a previously identified need for clear guidance on the visual communication of
knowledge networks.
Ultimately, we emphasize the constructedness of these types of visualizations, showing why a purely quantitative, purely objective depiction of a knowledge network is neither possible nor particularly useful. We argue that the best way to produce an effective knowledge map is to produce an ethical one, which requires attention to the ways trust and accountability can be produced at every step of analysis and production.
Knowledge maps are powerful tools for planning, collaborating, teaching, and
communicating, because they depict metaknowledge about how topics and problems are
structured. The view from above offered by a knowledge map may evoke the god trick of
seeing everything from nowhere, but it can in fact be understood as a concept
gesturing at the positionality and situatedness of all knowledge, and the necessity of
combining multiple partial perspectives to understand a system. Metaknowledge enables
critical scrutiny of what is known, how, and by whom. We suggest that we might best
understand what metaknowledge is by thinking about what we, as individual scholars
with particular trainings, career paths, and expertise, think about as we read a
journal article in order to index a broader context of ideas, scholars, and
disciplines – the kind of knowledge gained from being a participant in a system
through time.
While it may be preferable to gain metaknowledge the traditional way, by participating in
a knowledge system over a lifetime, this is not always practical or possible. Knowledge
maps act as tools which – just like regular maps of landscapes – are useful for sharing
information, for planning where to go and how to get there, for noticing patterns or
pathways that might not have been apparent to boots on the ground, and for coordinating
collective effort in the face of complexity. They support switching between general
knowledge and specific pieces of information and evidence, an ability which has been
identified as a fundamental practice of critical scholarship: charting a path between the simple and false, and the complex and unusable.
When constructed in partnership with residents from the mapped research domain, knowledge
maps can prompt identification and reassessment of taken-for-granted assumptions and
highlight forces shaping problem definition in a field. In fact, examples of this work in
climate domains are what initially inspired our attempt to map the climate communication
research landscape.
These projects are notable because they leverage knowledge maps as communication and collaboration tools while being sensitive to audience needs and effects on those depicted in the visually powerful network charts. Rather than viewing analysis as an endpoint, these studies distill the information spaces into clear visualizations for interested audiences. Knowledge maps should be built from this perspective, to be both useful and usable and to assist viewers in extracting meaning from information spaces and analyses that may otherwise be esoterically complex.
The climate communication knowledge mapping project was initially situated in the
tradition of science mapping, a kind of knowledge mapping undertaken by computer
scientists and network analysts. Science mapping uses bibliometric data to reveal the
structure of a knowledge domain by creating networks of papers or authors linked by
citations or collaborations. However, despite having domain expertise in climate
communication and fluency in network analysis techniques (often considered the ideal for
this type of project), we immediately ran into ethical and practical challenges that were
not well addressed in the science mapping literature, perhaps because too much emphasis is
placed on the objectivity of network maps, the assumed impartiality of algorithmic
curation, and the potential of data speaking for itself.
Perhaps because of the impact of scientific approaches to visualization on knowledge
mapping, a significant amount of data in knowledge mapping projects is not pre-processed
before it is visualized. It is thought that network analysis itself controls for any
errors or ambiguities in the underlying data, because strong structural patterns dominate
weak ones (as in the rich-get-richer phenomenon: in a citation network, certain papers
accumulate high numbers of citations while others are never cited).
In this section, we undertake the important data feminist approach of showing our work.
Data for the knowledge map came from Clarivate's Web of Science database, selected for
its accessibility despite known gaps in its coverage, and because previous work suggests
that it entails about 50% as much cleaning and curation time as Scopus and about 5% as
much as Google Scholar. We retrieved records using the topic search
TS = (climat* NEAR chang* AND communicat*), adapted from a previous systematic review of
climate communication conducted by an expert in the field.
One of the reasons data is not generally cleaned before analysis is the expectation
that selecting and extracting the giant component (the largest connected cluster of the
network) will naturally exclude irrelevant records. However, the ubiquity of the terms
climate, communication, and change in nearly every academic discipline meant that
the giant component included many irrelevant items when constructed with the complete
Web of Science data pull (Figure 1).
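The giant component referred to here is simply the largest connected component of the network. As a minimal sketch (with a hypothetical toy edge list, not our actual data), it can be extracted like this:

```python
from collections import defaultdict

def connected_components(edges):
    """Group nodes into connected components via breadth-first search."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, frontier = {start}, [start]
        while frontier:
            node = frontier.pop()
            for nbr in adj[node]:
                if nbr not in comp:
                    comp.add(nbr)
                    frontier.append(nbr)
        seen |= comp
        components.append(comp)
    return components

# Hypothetical toy edge list: a relevant cluster plus an unrelated pair.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("X", "Y")]
giant = max(connected_components(edges), key=len)
print(sorted(giant))  # ['A', 'B', 'C']
```

Note that any irrelevant record sharing even one link with the relevant cluster joins the giant component, which is why lexical noise in the data pull inflated it.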
We explored programmatic filtering strategies to prune the irrelevant data, but these all failed, to the point that it seemed much more time would be spent fine-tuning an algorithm specifically suited to disambiguating this one dataset than it would take to disambiguate it manually. Manual (human) review resulted in the removal of ~50% of the returned records (2,997 retained; 2,937 removed). Though in some ways it seemed alarming to remove half of the returned data, it also seemed misleading not to do so, because clearly irrelevant items (for example, papers about inter-model communication of climate signals, effects of climate change on insect pheromonal communication, and changes in workplace communication climates) distorted the visualized network (Figure 1). This highlights the impossibility of achieving an objective delineation of a particular research area from an unprocessed dataset and the need to rely on human judgment in data visualizations even at these early stages. This process of data preparation opens a knowledge mapping analysis up to perpetuating harm in two ways: first, that whether an algorithm or a human applies inclusion/exclusion criteria, some error is likely to occur, and second, that the underlying data does not adequately or equally represent the breadth of scholarship in a particular knowledge domain. To mitigate this, we added annotations to our final product (Figure 6) intended to guard against interpretation of the knowledge map as a complete picture of every possible research output in the climate communication space.
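For illustration, a naive lexical filter of the kind we experimented with might look like the following (hypothetical titles). All three titles pass the same keyword test, which is exactly why such rules failed to separate relevant from irrelevant records:

```python
import re

titles = [
    "Framing climate change communication for public audiences",  # relevant
    "Changes in workplace communication climate after a merger",  # irrelevant
    "Inter-model communication of climate change signals",        # irrelevant
]

# Keep records mentioning both a climate term and a communication term.
climate = re.compile(r"climat\w*", re.IGNORECASE)
kept = [t for t in titles if climate.search(t) and "communicat" in t.lower()]
print(len(kept))  # 3: the rule keeps every title, relevant or not
```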
Because we wanted to produce a knowledge map of collaboration patterns within climate
communication scholarship, we projected the bibliometric data from Web of Science into a
co-authorship network, where nodes are individual scholars and links are formed between
them when they write papers together. To present a clear co-authorship network, it is
necessary to disambiguate author names.
Author network studies wishing to disambiguate name data face a choice: implement
algorithmic disambiguation strategies of varying complexity or undertake the
time-consuming process of manually disambiguating the data. Simple algorithmic
disambiguation is the norm, partly due to research suggesting that more computationally
or time-intensive approaches don't improve results.
The simplest algorithmic approaches are first initial disambiguation and all initials
disambiguation. A pattern of initials is selected and all names are converted to match
this pattern (for example, a full author name being converted to first initial and full
last name). First initial disambiguation is the most common but also the most fraught:
split individuals change network statistics less overall than do merged authors, and
first initial disambiguation's weakness is that it merges many distinct authors into
single nodes.
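As a sketch (hypothetical names, and naive whitespace parsing that ignores complications like compound surnames), the two schemes might be implemented like this:

```python
def first_initial(name):
    """'Jane Q. Smith' -> 'J Smith': first initial plus last name."""
    parts = name.replace(".", "").split()
    return f"{parts[0][0]} {parts[-1]}"

def all_initials(name):
    """'Jane Q. Smith' -> 'J Q Smith': every initial plus last name."""
    parts = name.replace(".", "").split()
    return " ".join([p[0] for p in parts[:-1]] + [parts[-1]])

# Two distinct (hypothetical) authors collapse under first initial...
print(first_initial("Jane Q. Smith") == first_initial("John Smith"))  # True
# ...but remain distinct under all initials.
print(all_initials("Jane Q. Smith"), "|", all_initials("John Smith"))  # J Q Smith | J Smith
```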
Very different pictures emerged when each of the four different disambiguation
approaches was applied (Figure 2). Though the overall network is
smaller, many more nodes are included in the First Initial
giant component than
in Full Name,
with All Initials
falling somewhere in the middle. Table 1 provides quantitative support for these visual impressions. The
number of authors contained in the giant component and the number of links connecting
them changes significantly, by hundreds of individuals in the overall network and by
over a thousand individuals in the associated giant components. Most notable in Figure 2 is the fact that a salient feature of the manually disambiguated
network, the bridge that spans from the left to the right of the giant component without
traveling through the core, is not as clear in any of the networks generated from
algorithmic disambiguation. But each of the networks appears individually reasonable,
both from a network standpoint and a climate communication one. This makes it clear how
researchers could (wittingly or unintentionally) select a disambiguation strategy that
conveys a particular message. In this case, the First Initial
network suggests
thriving exchange and dense connection, All Initials
offers better-defined
clusters, and Full Name
presents neat divisions between communities with just a
few bridges between. It’s therefore important to be attentive to how interpretations may
be influenced by visualization choices.
Studies do not agree about which algorithmic disambiguation method is most effective at
approximating the true network, though it has been shown that between 8 and 39% of
individuals in a dataset can be affected by merging or splitting. Some studies treat
First Initial disambiguation as the lower limit of the range of error on the actual
number of authors in the database and All Initials as the upper limit, but they do not
usually visualize the resulting differences to see how more qualitative structural
conclusions may be affected. Clearly, the success of a disambiguation strategy depends
on meta-characteristics of the dataset itself. The most common example of this is the
near-impossibility of using initials-based disambiguation on datasets with high
participation from individuals with Chinese or Korean last names, where a small number
of very common surnames are shared by many authors.
Even if it were possible to estimate the average distortion effects from different approaches to algorithmic disambiguation, it would be hard to predict which authors would be the most affected, meaning that analysis choices may have differential effects on different communities of scholars. Even if only a small percentage of error is introduced overall, that error may be distributed unevenly across the network, raising real concerns about whose and what types of contributions may be emphasized or obscured. Figure 3 shows how different authors’ network positions are affected by different disambiguation strategies: node size changes (e.g., the node for S. Dessai), as does community membership (e.g., the node for S. Lewandowsky) and the visibility of underlying data errors (e.g., two nodes for E. Maibach linked together). The relative importance of nodes acting as brokers (large nodes here have high betweenness, one measure of a brokering role) changes too, and lumped or split authors open or close network pathways and affect community delineation. Table 2 shows how overall rankings of authors change order as nodes are merged or split.
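To make the ranking effect concrete, here is a hypothetical sketch (a toy edge list, with degree standing in for the centrality measures used in the figures) of how merging one author's split records reorders a ranking:

```python
from collections import Counter

# Hypothetical co-authorship edges; one author appears under two split records.
edges = [("E Maibach", "A"), ("E Maibach", "B"),
         ("E W Maibach", "C"), ("E W Maibach", "D"),
         ("S Dessai", "A"), ("S Dessai", "B"), ("S Dessai", "C")]

def degree_ranking(edges):
    """Rank nodes by degree (number of incident links), highest first."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return [name for name, _ in deg.most_common()]

print(degree_ranking(edges)[0])  # 'S Dessai' leads while the records are split

merged = [tuple("E Maibach" if n == "E W Maibach" else n for n in edge)
          for edge in edges]
print(degree_ranking(merged)[0])  # 'E Maibach' leads once the records are merged
```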
If the goal of the network analysis is to be useful to the people represented, this sort of error can’t be waved away as acceptable distortion, and the only way to understand how such error affects an analysis is to visualize it and see. Though author disambiguation is rarely framed as an ethical issue, it clearly is one, as disambiguation strategies affect not only author rankings but the structural roles of authors in a network and the apparent structural function of the knowledge domain as a whole. In this case, we mitigated potential harm caused by inaccurate evaluations by building our final visualization on the manually disambiguated author dataset (though it’s important to note, there are almost certainly errors even in this carefully curated dataset). However, most knowledge mapping projects will not have the benefit of a manually disambiguated dataset to compare to. In those cases, the best thing to do is understand the potential consequences of choosing different disambiguation strategies and avoid over-interpreting the results.
Several strategies have been developed for transforming networks into 2D visualizations.
Both researchers and viewers can be misled by network layouts that suggest patterns
where there are none.
For the climate communication knowledge map, we selected a layout produced by the Force Atlas algorithm and performed a small amount of editing to space out overlapping nodes and make community size more interpretable to the human eye. In this case, our decision was less about mitigating potential harm and more about increasing understanding for viewers of the knowledge map while portraying the communities as legible groups of individuals. However, Figure 5 makes it clear that some layouts will emphasize connectivity and cohesiveness while others emphasize distance, even though they are views of the same set of nodes and relationships. This illustrates that there is no option to choose an objective view of a network dataset, so the onus is on the researcher to make layout choices intentionally, as these choices make implicit arguments that can have consequences for the individuals and communities depicted. It also shows that, just like datasets, tools for visualizing data can make implicit arguments which must be considered carefully.
After the network layout is complete, it is customary to divide the nodes into
communities using algorithms such as Louvain community detection, a modularity
maximization algorithm that identifies communities by finding groups of nodes sharing
frequent connections within their group, but as few connections as possible to other
communities.
This illustrates the conundrum of categorization that is well-recognized in ethical visualization and data feminism: by placing nodes into groups, the knowledge map makes a strong argument that there is some inherent quality of those nodes that justifies their grouping, despite that the underlying data is not always so decisive. This is especially so when the network has been filtered to represent connections above a certain strength, such as authors that have collaborated at least twice, or papers that have been cited together at least ten times. This common practice eases the computational task of creating and analyzing the networks, but it also makes it easier to identify communities that seem much less interconnected than they actually are (or, conversely, give too much weight to connections which are strongly present based on the visualized logic, but which may be essentially meaningless in the real world). Figure 5 demonstrates how clear communities become at different levels of filtering, offering another example of how seemingly technical choices made in knowledge network design make implicit arguments. Another way of saying this is that determinations of exactly what is signal and what is noise in a network dataset are subjective considerations.
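The filtering practice described above can be sketched as follows (hypothetical weighted edges, where weight is the number of co-authored papers). Raising the threshold makes the "communities" look far more separate than the full data suggests:

```python
def components_after_filter(weighted_edges, min_weight):
    """Drop edges below min_weight, then count connected components via union-find."""
    nodes = {n for a, b, _ in weighted_edges for n in (a, b)}
    parent = {n: n for n in nodes}
    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n
    for a, b, w in weighted_edges:
        if w >= min_weight:
            parent[find(a)] = find(b)
    return len({find(n) for n in nodes})

# Hypothetical edges: (author, author, number of papers together).
edges = [("A", "B", 5), ("B", "C", 1), ("C", "D", 4), ("D", "E", 1), ("E", "F", 3)]
print(components_after_filter(edges, 1))  # 1: keeping all ties, everyone connects
print(components_after_filter(edges, 2))  # 3: weak ties dropped, three separate groups
```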
In the climate communication knowledge map, we attempted to moderate the visual force
of the communities by providing extensive annotations on their composition, and by
resisting giving the three core
communities particular names that were simply not
justified by their heterogeneous compositions.
Each of the previous sections has emphasized the constructedness of knowledge maps, demonstrating that every step in the production of a knowledge map entails subjective and value-laden judgments which foreground or obscure elements of the depicted knowledge domain and therefore run the risk of perpetuating different types of harm. Issues of data selection, disambiguation, layout, and community detection processes combine to make it difficult to validate or ground truth a network map. But it’s worth asking, given the constructedness and subjectivity of a knowledge map, whether a ground truth would even be meaningful.
The traditional sense of a ground truth – going to a particular spot on a landscape and confirming that reality matches modeled values – is probably going to fail for knowledge maps. Though they should absolutely be intelligible to members of the depicted community, it is unlikely that every scholar would agree with the version of reality presented in the chart. Knowledge maps approximate a consensus about knowledge structure based on authorship and citation data, but this is knowledge on average. Individual experts may or may not agree, and the knowledge map may not be a good representation of an individual’s subjective experience of the knowledge domain, or even of the specific connections depicted based on their existing co-author relationships, since a network represents a diversity of connections as a single type of link. In the absence of a ground truth, perhaps a more important question to ask is how a network visualization can become trustworthy.
While building the climate communication knowledge map, we learned that the only way to produce trustworthy maps is to attend to ethical dimensions and potentials for harm at every step of the design process. This can be accomplished by staying vigilant for implicit arguments made by customary analysis choices and tools, by including annotations and caveats in the visualization, and by remaining constantly aware that any purely data-driven view of a knowledge domain will be partial. The key utility of ethical visualization perspectives in the development of the climate communication knowledge map was to push back against our worries that we were somehow biasing our results in an inappropriate way by making these human-driven choices surrounding data curation and presentation. In the next section, we offer suggestions for how to apply these perspectives in future knowledge mapping projects.
Ethical and effective visualizations of knowledge networks must inspire trust in two key
groups: people depicted in the knowledge map, and people using the map to acquire
knowledge or make decisions. We draw on our experience in the climate communication
knowledge mapping project and on previous work on ethical visualization in the digital
humanities to offer ethical visualization strategies for network data and knowledge maps
specifically. This represents an important step in downscaling
recommendations
from ethical visualization and data feminism to demonstrate their application and utility
in specific projects. We encourage readers to consult other literature on ethical
visualization for complementary guidance.
Our recommendations are intended to assist researchers in balancing tensions between
quantitative and qualitative, and objective and subjective, aspects of knowledge mapping.
We advocate a humanistic approach to visualization design, recognizing that working with
this type of data successfully requires coupling quantitative fluency with the more
humanities-oriented practice of contextualizing findings, and tolerance for and
transparency around the trial-and-error approach necessary to create a useful product.
Knowledge maps have a dual purpose: they are a tool used in studying knowledge domains,
and a visual artifact representing those domains. The first focuses primarily on
creating knowledge and insight for the immediate researchers involved and can be best
understood as a process. The second focuses primarily on assisting others in gaining
insight into the mapped domain, and can be best understood as a product. However, as the
climate communication knowledge map makes clear, these two types of knowledge mapping
are generally impossible to separate: knowledge mapping proceeds in an iterative and
exploratory fashion, where impressions gained in experimenting with different analyses
and layouts inform the final version of the map, and where the layout of the map informs
a researcher's conclusions. Franco Moretti described this as a "heterogeneity of
problem and solution" in distant reading; we argue that the same applies to knowledge
mapping.
The intertwining of product and process does not guarantee that a good product will follow from a sound process (or that an apparently good product was generated with a sound process). For example, we often observe knowledge maps developed primarily to facilitate analysis being included in research reports. In these cases, researchers have generally extracted meaning from the visualization to support their interpretations. However, without adjustment, these visualizations often remain incomprehensible to those outside the research team. We suggest that ethical and effective knowledge mappers should remain aware of how process and product intertwine in a knowledge mapping project, and how this can both enhance and trouble the process of producing a knowledge map for use beyond the research team.
Anecdotally, we learned while developing and sharing the climate communication knowledge maps that despite their visual and rhetorical force, viewers are often unsure how to understand these charts. We also learned that, once viewers felt they understood what the nodes and connections and color scheme meant, they quickly formed and held onto inaccurate impressions about what the chart conveyed. While it is not possible to moderate the rhetorical force of a visualization entirely, it is possible to embed guidelines and caveats for interpretation that guide viewers towards the interpretations intended by the visualization designer. This does not preclude the viewer reaching their own conclusions; rather, it fulfills the promise of a knowledge map as a communication tool. Ethical and effective knowledge mappers should take the opportunity to translate and annotate a knowledge map, asking themselves which insights have the most rhetorical force, and which may need to be moderated with caveats. This is why the knowledge maps presented in this paper include textual information about their genesis and interpretation.
A common practice in data feminism is to disclose the positionality of the researchers, so that viewers of the resulting data visualizations might understand how researchers’ perspectives and life experiences may influence their understandings of the phenomena portrayed. We encourage knowledge mappers to consider not just their own positionality, but the positionality of network theory, tools, and data sources when drawing conclusions from a knowledge mapping project.
First, it is necessary to consider the implicit arguments made by networks themselves.
A network chart necessarily conveys a message of connection, especially since
unconnected elements are generally omitted from analysis. Network thinking is
fundamentally structuralist, meaning that it privileges relationships over attributes
and affords the system a level of agency that it may or may not have.
Subsequently, it is important to disclose the tools used for analysis and, especially,
for visualization. Because the visualization tools actively shape the conclusions of the
analysis (as discussed in 4.1), it is important to be transparent about how this may
have occurred, even if the effects are not apparent to the research team.
Knowledge networks are often assumed to be exempt from privacy concerns because they
are constructed from published works. It’s important to remember that a network
(especially the co-author networks presented in this paper) is made up of individual
people. Certain nodes may assume standout roles that wouldn’t be apparent in a list of
search results or even in tabular data, and certain patterns of connection may occur
that do not represent current relationships and collaborations. Because there is no
objective way to characterize node roles or performance, or even which nodes truly
belong in a network, caution is warranted in calling out specific people depicted, or
in associating merit with network position alone.
On the other hand, it’s precisely the transparency and associated lack of anonymity
that can make knowledge network tools useful. Laura Kati Corlew and her team relied on
this in building a network mapping and knowledge discovery tool for climate workers in
the Pacific Islands.
As knowledge maps proliferate, they may be used increasingly for evaluation and analysis, and so an ethical and effective knowledge mapper should consider the potential ramifications of such a map having a life of its own after publication, and the possible agendas users might bring to interpretation of these visualizations, especially when choosing to use personally identifying data in a knowledge map.
The final strategy for ethical and effective visualization of knowledge networks is
inspired by data feminism's maxim of showing one's work.
The first way to put your cards on the table is to make analysis decisions transparent, as we have done in this article. This may be done in scholarly publications, or in metadata and annotations accompanying published visualizations. Like disclosing positionalities, articulating the strategies used to create visualization products reveals implicit impressions the designer may not have been aware of, but which could be important to viewers. Sharing this information safeguards transparency by aiding other researchers in creating similar maps, should they wish to.
The second way to put your cards on the table is to tell the story you as the researcher see in the visualization. What insight were you seeking that led you to build the knowledge map? Did you find evidence for it, or against it? Narratives explicate network structure and concepts in an intuitive way, guiding viewers to reasonable scales of interpretation and guarding against out-of-context presentations of the knowledge network. They can also shed light on what the system diagram on its own might not reveal clearly, such as a possible function of or reason for absent links between authors or communities.
There is a fine line to walk here, given that the point of visual communication is simplicity. Over-annotating a visualization might make it harder to engage with. We attempted to address this in our final knowledge map (Figure 6) by providing a below-the-fold presentation of narrative context for the network. Other network projects, especially interactive ones, may discover additional ways to support the juxtaposition of narrative and network.
We began this paper with the observation that scholars have long been captivated by the quest to develop high-tech tools for information management and knowledge synthesis. From the vantage point of the 21st century, we recognize that the success of these tools is not simply a matter of technological progress, but one that requires ethical thought. Just as a network mapping platform like Gephi provides the technical tools for working with big network data, an ethical visualization framework provides the ethical tools for reducing the harm and increasing the impact of knowledge maps – essentially, for making them deliver on their promise of augmenting our perceptual abilities when faced with large and complex information spaces. In this paper, we have drawn on lessons learned in the climate communication knowledge mapping project to offer strategies for ethical and effective visualizations of knowledge networks. This approach acknowledges the potential for harm that comes with such visualizations and works actively to mitigate it. Taking an ethical approach to knowledge mapping is vital because the choices made in such an analysis determine whose and which contributions are ultimately legible.
We have attempted to provide guidelines for knowledge mappers seeking to support users in synthesizing and selecting meaningful insights from the mapped domain. Our recommendations balance tensions between qualitative and quantitative and objective and subjective aspects of knowledge mapping. They demonstrate the importance of critical practices in the development of knowledge maps, illustrate the intertwined nature of analysis and results in such projects, and emphasize the constructedness of the visualization. We show that the only way to produce an effective knowledge map is to produce an ethical one, which requires attention to the ways trust and accountability can be produced at every step of analysis and production.
This framework for ethical visualization of knowledge networks contributes to early conversations about a feminist bibliometrics and demonstrates the direct relevance of digital humanities tools to work carried out across diverse other fields. Connecting bibliometrics literature with digital humanities discourse invites more qualitative and interpretive perspectives into bibliometric knowledge mapping, demonstrating that the data-centric and science-focused techniques frequently applied to knowledge network visualization may fail to produce a useful and impactful portrait of a knowledge domain without attending to these ethical concerns. A central purpose of this paper is to offer a clear example of the utility of a critical approach to knowledge mapping in a traditional knowledge mapping project.
The idea of separating wheat from chaff is a common trope in knowledge mapping. While knowledge mapping can absolutely facilitate discovery of relevant or applicable knowledge, assuming this is its key strength overlooks its real potential. Essentially, a system-level depiction may provoke a reassessment of what is wheat and what is chaff, and recognition that these designations are situation-dependent. As knowledge production increases and we continue to rely on quantitative and visual tools to help us navigate information spaces, it may be best to abandon the threshing trope and instead adopt a more kaleidoscopic model of knowledge mapping’s goals. Knowledge maps provide meaningful and multipurpose ways for individuals to explore a knowledge domain, and different individuals may need to turn the kaleidoscope in unique ways to find useful or compelling information and explanations which gel with, and usefully complement, their own experiences of a knowledge network.