Volume 17 Number 2

# How to Do Things with Deep Learning Code

## Abstract

The premise of this article is that a basic understanding of the composition and functioning of large language models is critically urgent. To that end, we extract a representational map of OpenAI’s GPT-2 with what we articulate as two classes of deep learning code, that which pertains to the model and that which underwrites applications built around the model. We then verify this map through case studies of two popular GPT-2 applications: the text adventure game, AI Dungeon, and the language art project, This Word Does Not Exist. Such an exercise allows us to test the potential of Critical Code Studies when the object of study is deep learning code and to demonstrate the validity of code as an analytical focus for researchers in the subfields of Critical Artificial Intelligence and Critical Machine Learning Studies. More broadly, however, our work draws attention to the means by which ordinary users might interact with, and even direct, the behavior of deep learning systems, and by extension works toward demystifying some of the auratic mystery of “AI.” What is at stake is the possibility of achieving an informed sociotechnical consensus about the responsible applications of large language models, as well as a more expansive sense of their creative capabilities — indeed, understanding how and where engagement occurs allows all of us to become more active participants in the development of machine learning systems.

# 1. Overview

*programmed* code: words, characters, and symbols arranged according to rules and in a form that might be understood as textual. CCS, in other words, has traditionally grappled with artifacts that, while certainly mobile and changeable, have enough of a static quality to allow both for individual study and shared understanding. What, though, can this method do with machine learning systems that include code in this ordinary sense as well as statistical parameters and operations, and that are, in other words, less lexical than they are mathematical? While we had previously expressed skepticism about the efficacy and utility of taking a CCS approach to a language model such as GPT-2 [Hua and Raley 2020], we were challenged for this special issue to do precisely that: to consider the extent to which CCS might illuminate a system, either in terms of intent or functioning, that comprises not just code but also training data, model architecture, and mathematical transformations — to consider, then, whether its methodology might be adapted for the study of an interactive system that is not strictly algorithmic.

*model* as the unit of analysis. Even if not self-consciously presented as a field articulation, academic studies of machine learning have collectively shifted the emphasis away from code and toward the model, with a particular emphasis on vectorization, probabilitization, and generalization. Adrian Mackenzie, for example, asserts that “code alone cannot fully diagram how machine learners make programs or how they combine knowledge with data” [Mackenzie 2017, p. 22]. And in what may well become a foundational document for CAIS as such, in a so-termed “incursion” into the field, researcher attention is redirected from “an analytical world of the *algorithm* to the world of the *model*, a relatively inert, sequential, and/or recurrent structure of matrices and vectors” [Roberge and Castelle 2021, p. 5]. What this means more plainly is that CAIS concerns itself not with the code that implements a particular machine learning model, but rather with its mathematical definition; not with “symbolic logical diagrams” but rather with “statistical algorithmic diagrams” [Mackenzie 2017, p. 23]. Deep learning code is thus positioned as merely one component of machine learning systems and not by itself granted priority. In contrast, we proceed from the observation that deep learning code in fact represents the myriad possible *implementations* of a machine learning model and its auxiliary programs. On this basis, we contend that the design choices and biases inherent in each implementation extend their significance beyond mathematical formulae and warrant a closer look.

While the *model* has tended to serve as the unit of analysis, the broader concern of CAIS has been to situate these models in their social, historical, and cultural contexts [Mackenzie 2015] [Burrell 2016] [Mackenzie 2017] [Underwood 2020] [Roberge and Castelle 2021] [Offert 2021]. Linking model architecture to context — whether that be use case, domain of implementation, or institutional setting — has thus allowed CAIS to pose crucial questions about the ethics and politics of machine learning systems. So too CCS has endeavored to engage both the socio-historical dimensions and effects of programming code, also with structural transformation as a hoped-for endgame [Marino 2020]. What remains to be done, and what is in part the purpose of our essay, is to test both the feasibility and the critical potential of CCS as a method when the object of study is deep learning code.

# 2. Mapping Deep Learning Code

*core deep learning code* (CDLC), kernel code that defines the deep learning model, and *ancillary deep learning code* (ADLC), ancillary or application code that ordinary developers can replace.

*Core* deep learning code, of which model.py is the obvious primary instance, has an ordinary and domain-specific meaning: it produces machine learning predictions by implementing deep learning operations.[5] Hence, *core* encompasses core data structures and functions that directly execute core deep learning tasks such as classification or regression. In contrast, the remaining files in the source code folder operationalize the model’s outputs rather than directly contribute to its predictive functioning. They are in this respect *ancillary*, querying or indirectly supporting the model from an external location and, in the case of GPT-2, acting as interfaces that mediate between user requests and deep learning predictions (e.g. specifying and wrangling behaviors, or filtering outputs, as with the aggregating of individual predictions into the composite form of a news article, poem, recipe, or adventure game).
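The core/ancillary division can be made concrete with a toy example. In the hypothetical sketch below, `generate_story`, `generate_tokens`, and the banned-word list are invented names for illustration only; in a real system the stubbed model would be a call into core code such as GPT-2’s model.py, while the filtering and assembly logic is precisely the kind of ancillary behavior an ordinary developer can replace.

```python
# Hypothetical ADLC sketch: it does not implement the model, but mediates
# between a user request and the model's raw predictions.
def generate_story(prompt, generate_tokens, banned_words=("violence",)):
    raw = generate_tokens(prompt)                             # query the model (CDLC)
    kept = [w for w in raw if w.lower() not in banned_words]  # filter outputs
    return prompt + " " + " ".join(kept)                      # aggregate into a composite form

# Stub standing in for the model; a real application would sample from GPT-2.
fake_model = lambda prompt: ["the", "dragon", "slept", "violence", "peacefully"]
print(generate_story("Once upon a time,", fake_model))
# → Once upon a time, the dragon slept peacefully
```

Note that swapping in a different filter or aggregator changes the application (news article, poem, adventure game) without touching the model’s predictive core.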

*abstraction* to our analysis. Because fundamental deep learning operations such as vectorization, learning, and gradient descent exist in the mathematical abstract without programming implementation, it is possible to talk about them without citing the underlying code.[8] One could make the argument that all computer code subtends abstract math, but the challenge of understanding CDLC has meant that abstraction has almost necessarily been at the core of the study of deep learning operations. For example, research on machine learning models has thus far focused on the diagram [Mackenzie 2017], vectorization [Mackenzie 2015] [Parrish 2018], learning [Roberge and Castelle 2021, pp. 79–115], and pattern recognition [Mackenzie 2015] [Roberge and Castelle 2021, pp. 31–78]. In contrast, the abstract counterpart of ADLC is not prominently featured in analyses of deep learning. Put another way, the possibility of analyzing model.py and its operations analogically and metaphorically has already been proven by extensive qualitative engagement with deep learning. While we by no means wish to dismiss or even contest the project of CAIS — abstracting deep learning operations is after all foundational to the work of interpretation — we can nonetheless ask what is missed if researchers only focus on training data, model architecture, and the abstraction of deep learning code.[9]

*which* and the *when* — all-important delimiters for a language model that came to fame in part because of its staged release [Solaiman et al 2019]. We have hinted at the reconstruction required to look back and analyze the experience of what it was like to interact with GPT-2 in 2019, and can now be more explicit about the archaeological work one has to do in order to analyze the model in the present. What we have studied thus far is an almost-exact snapshot of the ADLC as it appeared in the first release in 2019, a picture that represents OpenAI’s initial vision and its open invitation to developers to explore the limits of GPT-2’s behavior. The intervening years have seen a number of changes, not only to GPT-2 but also to its dependencies. TensorFlow, for example, has evolved to such an extent that much of the code of GPT-2 will not work unless updated to align with TensorFlow’s new protocols. Even more important to the running of the original GPT-2 model are the weights, which must be downloaded via download_model.py in order for the model to execute. Thus, one could envision a future CCS scholar, or even an ordinary user, attempting to read GPT-2 without the weights and finding the analytical exercise to be radically limited, akin to excavating a machine without an energy source from the graveyard of “dead media.”
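For readers attempting this reconstruction themselves, the setup below follows our reading of OpenAI’s gpt-2 repository as of its public release; the model name `124M` is one of the released checkpoints, and the commands should be treated as a sketch of the 2019-era workflow rather than a guaranteed-current recipe (the pinned dependencies, notably TensorFlow 1.x, may no longer install cleanly on modern systems).

```shell
# Fetch the source code (ADLC and CDLC alike).
git clone https://github.com/openai/gpt-2.git
cd gpt-2

# Install the era-specific dependencies pinned by the repository.
pip3 install -r requirements.txt

# Download the weights, without which the model cannot execute.
python3 download_model.py 124M
```

Without that final step, one is left reading the code as inert text: precisely the “dead media” scenario described above.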

# 3. Applications of GPT-2

# 4. Conclusion

## Notes

*x* will be most similar to the class of other instances that are nearby in Euclidean distance [Mitchell 1997]. Thus, the inductive bias of a learning algorithm loosely corresponds to its interpretation of the training data to make decisions about new data. Using this formulation, one might begin to see how an interpretation of deep learning decision-making might be possible. In contrast, Tom Mitchell observes that it

> is difficult to characterize precisely the inductive bias of BACKPROPAGATION learning, because it depends on the interplay between the gradient descent search and the way in which the weight space spans the space of representable functions. However, one can roughly characterize it as smooth interpolation between data points [Mitchell 1997]

Inductive bias, then, is not the answer to the methodological challenge to which we are responding. The problems of interpretability and explainability are also crucial for deep learning, but not within the immediate purview of this paper. For an overview see [Gilpin et al. 2018] [Lipton 2018].
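The nearest-neighbor inductive bias described above can be stated in a few lines of code. The sketch below is a minimal 1-nearest-neighbor classifier with invented data, not code from any system discussed in this article: a new instance is assigned the class of the training instance closest to it in Euclidean distance.

```python
import math

def nearest_neighbor(x, training_data):
    """Return the label of the training point nearest to x (1-NN)."""
    def dist(a, b):
        # Euclidean distance between two points given as tuples
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    _, label = min(training_data, key=lambda pair: dist(x, pair[0]))
    return label

# Invented training set: (point, label) pairs.
data = [((0.0, 0.0), "A"), ((1.0, 1.0), "A"), ((5.0, 5.0), "B")]
print(nearest_neighbor((0.5, 0.5), data))  # → A
print(nearest_neighbor((4.0, 4.5), data))  # → B
```

Here the bias is fully legible in the code; Mitchell’s point is that no comparably crisp statement is available for backpropagation.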

“creative words”: `1 - (num_blacklisted / max(num_succeeded_match, 1))`

“nonconforming”: `num_failed_match / num_generated`
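The two metrics quoted above can be read directly as code. The sketch below wraps them in functions; the identifiers follow the note, while the wrapping functions and example counts are our own illustration, with the counts understood as tallies accumulated during generation.

```python
def creative_words(num_blacklisted, num_succeeded_match):
    # Share of successfully matched words not found in the blacklist;
    # max(..., 1) guards against division by zero when nothing matched.
    return 1 - (num_blacklisted / max(num_succeeded_match, 1))

def nonconforming(num_failed_match, num_generated):
    # Share of generations that failed to match the expected format.
    return num_failed_match / num_generated

print(creative_words(5, 50))   # → 0.9
print(nonconforming(10, 100))  # → 0.1
```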