6. Parrot Power


The preceding chapter showed how machinic parameters are mediated by human values and, in turn, how human values are mediated by machinic parameters, sketching the political economy and genealogy of machinic subjectivity. This chapter examines the characteristic power, or competence, of large language models: their generative capacity. It theorizes various kinds of generativity and shows their interrelation. It argues that there is no basis for the claim that language models are simply stochastic parrots. And it critically explores the relation between the linguistic competence (and/or labor power) of language models and the profit motive of corporate agents.

Modes of Generativity

Chapters 3 and 4 foregrounded two important capacities of large language models: their ability to generate appropriate and effective stretches of formally cohesive and functionally coherent text and, building on this, their ability to respond to prompts, and thereby interpret signs, in ways that are aligned with and often satisfy the intentions of users. Various modes of generativity resonate with these fundamental capacities. 

By energetic generativity I do not mean the creation of energy per se, but rather the conversion of one form of energy into another, where the second form of energy is more useful than the first to some agent. Examples are generating electricity from fossil fuels or sunlight, or generating ATP from whatever may be eaten. 

Vital generativity does not create life per se but rather produces the next generation of agents from the last. Exactly how it works and what is required for it to happen has long been a central theme of mythology no less than science. 

These two senses of generativity, especially in the form of metabolism and reproduction, are often understood as necessary, but not sufficient, criteria for life. New developments in science fiction might not even be necessary to imagine what will happen when large language models, and artificial intelligence more generally, acquire such capabilities—for such capacities are not just well storied but also seem to be right on the horizon. 

Somewhat more important for our immediate purposes is syntactic generativity in the tradition of Wilhelm von Humboldt and Noam Chomsky. This is sometimes understood as the ability of humans to produce and understand sentences that they have never heard before. But it may also be framed as the infinite use of finite means, or more precisely, as the ability to generate an infinite number of sentences using a finite number of words and rules (in particular, a lexicon and a grammar), where the rules enable recursive modes of compositionality. Indeed, within language proper, humans do not just have the ability to create an infinite number of acceptable sentences, they also have the ability to create an infinite number of smaller and larger constructions, from distinct words to unique stories. 

This last kind of generativity can be extended in a variety of ways, giving a kind of systemic generativity: the capacity to generate a large number of configurations (of any kind) given a small number of constraints (whatever the domain). This kind of generativity ranges from whatever can be built using the elements, as constrained by the rules of chemistry, to whatever can be constructed using a box of blocks, as enabled and constrained by the imagination of children, as well as by forces like friction and gravity. 

Large language models can engage in syntactic generativity, as discussed above, and also in what might be called pragmatic generativity, if not poetic generativity. They can appropriately and effectively use one and the same sentence (as a formal type) in an infinite number of distinct contexts, and thereby create any number of unique utterances (as relatively singular tokens with context-specific referents and functions). They can create an infinity of new types (such as novel genres, as pastiches of older genres) and also an infinity of tokens that conform to such types (yet another speech act or sonnet, essay or pun, limerick or language game). And, through such practices, they can participate in an infinity of open-ended interactions that mediate an unbounded range of emergent identities, social relations, possible worlds, and forms of life. 

Regarding genre (and genre-tivity, so to speak), ChatGPT, like other large language models, learns and generates patterns at all levels of generality: morphemes, words, phrases, clauses, sentences, turns, and so forth. Indeed, the genus-species (or general-specific) relation is inherently recursive insofar as most any genus is itself a species in a higher genus, and vice versa. Nonlinguists tend to fixate on ChatGPT’s capacities with “genre,” as stereotypically understood (sonnets, sestinas, short stories, and so forth), because that is the main formal structure they are aware of. Indeed, genre and its differentiation are taught in kindergarten, enshrined in the layout of libraries, easily named, and often played with. Phrased another way, ChatGPT is incredibly good not just at identifying types, and patterns more generally, across all levels of linguistic, textual, and interactional structure; it is also incredibly good at producing novel tokens of such types, as well as creating novel types per se. Nonetheless, certain types (such as genre, as stereotypically understood) come to the fore in metalinguistic accounts of ChatGPT, especially by nonlinguists, because they are the easiest to notice, name, and tweak. 

There is also dynamic generativity, in the tradition of scholars like Friedrich Nietzsche, James Gibson, and Giorgio Agamben: means without ends, or media without function. This mode of generativity might be best framed as follows: one and the same physical feature, material resource, or mode of mediation, although it might have been originally built or long used with a particular end in mind, can be endlessly repurposed as a means for other ends, and thereby be enlisted to undertake an infinity of novel actions. Phrased another way, what something may be used for, as regimented by norms, traditions, or ideals, is worlds apart from what something can be used for, as regimented by causes, strategies, or facts. Think, for example, about all the things you can use a screwdriver to do besides drive screws. The core predictive and generative capacity of language models can likewise be repurposed to serve an infinity of functions, regardless of the original intentions of the agents who made them. Indeed, as laid out by Nietzsche in The Genealogy of Morals, repurposing older forms for newer functions, or using older signs with novel senses, was the fundamental symptom and instrument of power. 

Particularly important for present purposes is stochastic generativity. In its simplest form, this involves sampling from a probability distribution: from rolling a die to generating an actual word from a probability distribution over all possible words. As shown in chapter 4, however, language models do not simply sample from probability distributions, itself a relatively simple procedure. They also, and much more foundationally, generate the very distributions that they will sample from. And they do so recursively, as conditioned on prior words. Starting with a sequence of words, a probability distribution over possible next words is generated, then sampled from; the selected word is added to the sequence, and the procedure is begun anew. Recall that this is what puts the G in ChatGPT.
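The loop just described can be sketched in a few lines of Python. The toy next-word table below is an invented stand-in for a real model (which would compute this distribution with billions of parameters), but the procedure itself is the one named above: generate a distribution over possible next words, sample from it, extend the sequence, and begin anew.

```python
import random

# Toy "model": maps the last word of a context to a probability distribution
# over possible next words. This hand-written table is a stand-in for what a
# real language model computes with billions of parameters.
def next_word_distribution(context):
    last = context[-1] if context else "<s>"
    table = {
        "<s>":    {"the": 0.6, "a": 0.4},
        "the":    {"parrot": 0.5, "future": 0.5},
        "a":      {"parrot": 1.0},
        "parrot": {"speaks": 0.7, "</s>": 0.3},
        "future": {"</s>": 1.0},
        "speaks": {"</s>": 1.0},
    }
    return table.get(last, {"</s>": 1.0})

def generate(max_words=10, seed=None):
    """The autoregressive loop: generate a distribution (shape the die),
    sample from it (throw the die), extend the sequence, begin anew."""
    rng = random.Random(seed)
    sequence = []
    for _ in range(max_words):
        dist = next_word_distribution(sequence)
        words, probs = zip(*dist.items())
        word = rng.choices(words, weights=probs, k=1)[0]
        if word == "</s>":
            break
        sequence.append(word)
    return sequence
```

Note that the sampling step itself is trivial (a call to `rng.choices`); it is the repeated construction of the distribution being sampled from, conditioned on everything generated so far, that carries the real work.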


Closely related to stochastic generativity is generative AI, a type of artificial intelligence designed to produce novel content—and not just texts but also images, sounds, videos, video games, and artificial worlds. It can create not only aesthetically interesting (and often creepy) stories, scenes, sounds, and worlds but also deeply convincing fakes. And so it has turned out to be a boon for artists and con artists alike. 

Finally, there is artificial general intelligence. Sometimes shortened to AGI, and contrasted with plain old AI (whatever that was), this refers to a hypothetical agent that can perform any (intellectual) task that a human can perform. Such an agent should not just be good at some specific task, however difficult, but be able to solve any number of problems, including ones it has never been given before; it should be able to adapt to its surroundings, however much they may change; and it should be able to reason, learn, plan, and communicate. In other words, its abilities are very general, and it should be able to generalize. Key here is the portability of the agent that possesses AGI: the tasks it can undertake should transcend not just topic, modality, domain, and function but also place and time, and even possible world and imaginable future. In effect, a machinic agent capable of AGI becomes as good as, if not better than, a human agent at open-ended reasoning and world-changing actions in an enormous range of possible futures. 

Generativity has the root gen (to give birth, to beget) at its core. In ancient Rome, a gens was a collection of individuals who shared the same name and claimed descent from a common ancestor—an institution that was of central importance to the discipline of anthropology, at least in its formation. It is true that large language models now come in named lineages: GPT-1, GPT-2, GPT-3, GPT-4, and so forth. And such agents have names and kinship relations, and also lore, fans, niches, trials, deeds, achievements, rankings, values (or at least parameters), and patrons. Perhaps they even have their own prayers and rituals—they certainly have their own uniquely performative prompts. I have been stressing, rather, the creative (generative) nature of language models, as well as their general (genus, genre) nature and their genealogical nature: in particular, critical histories (as a novel genre), generated by scholars like Nietzsche and Foucault, regarding the origins, or rather descent, of novel agents. But one could also focus on their gendered nature. Indeed, the underlying metaphor (giving birth) is feminine, but machinic agents are often accorded an allegedly masculine form of power: a seemingly invisible generative capacity that can only be glimpsed through its concrete practices, as the exercise of that power. And so, in the tradition of Hannah Arendt, it is not the capacity to be in labor and beget children but the capacity to work, and thereby beget things—however textual such "things" happen to be, and however obviating of the person-thing distinction such agents turn out to be. 

The syntactic and pragmatic generativity of language models is, to be sure, grounded in their stochastic generativity. And this is itself grounded in the syntactic and pragmatic generativity of the humans who produced the texts those models were trained on—not to mention their systemic generativity (recall the example of children playing with blocks). And those humans were themselves created through energetic and vital generativity (for energy is no less important to life than information), grounded in systemic generativity (recall the example of chemical elements). If distributed modes of agency are taken into account, then there already exist agents who directly incorporate all such modes of generativity, for example, any human agent—or collectivity of such agents—that extends its powers by incorporating machinic agencies. Finally, large language models, like any form of media or mode of mediation, will exhibit dynamic generativity. In particular, the capacity of such models to engage in syntactic and pragmatic generativity, itself grounded in stochastic generativity and potentially grounding of AGI, will be used for an infinity of yet unimaginable purposes, harmful and beneficial alike.

But probably mainly harmful, given the ultimate interests of the agents large enough to train and deploy them.

Are Language Models Stochastic Parrots?

As just discussed, along with their capacity to engage in stochastic generativity, large language models are mediated by many other modes of generativity. More crucial for our purposes is the question of whether such generative agents are simply “stochastic parrots,” as is often claimed by their critics, or embody a deeper kind of agency.

First off, parrots are amazing creatures. There are many good reasons to pooh-pooh the overhyped capacities of large language models, but there is no good reason to take down parrots along the way, as the collateral damage of a catchy rhetorical gimmick.

As was shown, stochastic generativity involves not just sampling from a probability distribution (which is as simple as throwing a many-sided, biased die) but also recursively creating the probability distribution to be sampled from (and thus shaping and biasing the die so thrown). To do this well, in the case of language models, requires billions of parameters, layers of transformers, tons of labor, oodles of training, decades of research, and mounds of text. It is no lame gimmick or cheap trick.

Although human linguistic capacities are pretty amazing, humans all too often engage in repetitive and imitative behavior, copying the utterances and intentions of their forebears and friends. And much of what we say reflects, if it does not outright steal, what was said before. The redundancy of human discourse and the unconscious theft of prior discourse are surprisingly high.

Moreover, the stochastic parrot critique presumes that humans mainly engage in informative, efficient, and truth-conditioned discourse. As the linguist Roman Jakobson decisively argued, much of what humans do with language serves phatic and poetic functions rather than referential ones. In particular, following the anthropologist Bronislaw Malinowski, whom Jakobson was parroting, much of what we say is affiliative rather than informative: a way to manage and mediate our social relations, rather than a means to communicate our thoughts. And following the information theorist Claude Shannon, whom Jakobson was echoing, much of what we say is redundant: human discourse is organized by the repetitions of tokens of common types, and so is metered like poetry (especially when seen at a high level of abstraction). One suspects that the semiotic processes of parrots, including their incredible capacities for trans-species mimesis, are similarly aesthetic and affiliational. 

Finally, the incredible power of stochastic processes per se should not be underestimated. Much of what drives evolution, and hence the creation and transformation of life-forms, turns on random processes. And much of what drives biochemical processes, and hence the vital impulses within such life-forms, also turns on random processes. Such processes turn on constraint in conjunction with chance, or sieving coupled with serendipity. 

The stochastic generativity of language models is no different: they do not just engage in random processes, they create constraints in the form of parameter values and then act under those constraints. (As was discussed in earlier chapters, backpropagation, and regimentation more generally, are precisely constraint-setting processes, guided by previously set constraints.) And stochastic behavior in light of constraints, and for the sake of constraints, which is what large language models are arguably capable of, may be the most decisive agentive capacity in life’s history.

Generativity, Power, Profit

A fundamental capacity of language models is to predict next words given previous words, randomly sample from such predictions, extend the sequence of words, and proceed anew. But their distinctive mode of generativity can be framed in another way: given some amount of something (such as a text, or anything that can be encoded as a text, which is anything that can be digitized), they make more of that same something, all the while keeping to the essential patterning, and hence underlying nature, of what came before: not just the next utterance given prior utterances or the next action given prior actions, but also the next price given prior prices, the next scene given prior scenes, the next thought given prior thoughts, the next entity given prior entities, and the next event given prior events. In short, they generate the future given the past.

Language models tend to make text that is simply in keeping with prior text. But they can be adjusted—indeed, tuned—to make plausible but rare, rather than normative and boring, texts. For catchy rupture is no less valuable than patterned staidness, at least for certain agents. And no doubt history—as a sequence of events, however much like a slaughter bench—does not exhibit the kind of formal cohesion and functional coherence that language does. But, to an agent with the right capacities, specific stretches of history, framed in a particular light, actually might, to a certain degree. 
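One common way such tuning is done is by rescaling the model's probability distribution with a "temperature" before sampling. The distribution below is an invented example, but the rescaling itself is standard: low temperatures sharpen the distribution toward normative (boring) choices, while high temperatures flatten it, giving plausible but rare choices a real chance.

```python
import math

def apply_temperature(probs, temperature):
    """Rescale a next-word distribution by a temperature parameter.
    temperature < 1 sharpens it (the likely get likelier);
    temperature > 1 flattens it (the rare get less rare)."""
    # Divide log-probabilities by the temperature, then renormalize
    # (a softmax over the rescaled values).
    logits = {w: math.log(p) / temperature for w, p in probs.items() if p > 0}
    m = max(logits.values())
    exps = {w: math.exp(l - m) for w, l in logits.items()}
    z = sum(exps.values())
    return {w: e / z for w, e in exps.items()}
```

For instance, with `probs = {"the": 0.7, "quantum": 0.2, "iridescent": 0.1}`, a temperature of 0.5 raises the probability of "the" above 0.7, while a temperature of 2.0 lowers it and lifts "iridescent" above 0.1: catchy rupture, by the turn of a dial.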

Such a creative capacity (next-event prediction, or make more given some) might be seen as a relatively abstract potential, analogous to both labor power and linguistic competence. And just as linguistic competence can be contrasted with its performance (such as an actual discursive act, whatever the function) and labor power can be contrasted with its exercise (such as an actual act of labor or some form of work per se), such a potential can be contrasted with its concrete actualizations. On the one hand, then, there seems to be a relatively singular, invisible, and portable locus of power—truly a means without ends. On the other hand, there are manifold, sensible, and context-specific actualizations of that power—via specific interpretants of signs or responses to prompts. 

The semiotics of performance-competence relations will be treated in chapter 8. For the moment it is worth remembering one classic story regarding the origins of surplus value. The capitalist buys potentiality (labor power, in the form of workers, insofar as they embody the capacity to create value) and then sells actuality (the commodities produced through the exercise of that power), where profit resides in the difference in exchange value, or price, between the actual and the possible.

In light of such a claim, the overarching metaphor up to this point has been overly optimistic, insofar as it has mainly foregrounded the mediating relations between human values (V) and machinic parameters (θ). In particular, mediating both of these variables is profit and/or power (P). So a third symbol and a third actor should be introduced: AP, understood as a corporate agent, such as a corporation or state, whose ultimate telos or deepest ground is something like profit in the form of shareholder value, if not raw power per se. To be sure, as seen in the discussion of alignment criteria (especially in chapter 4), such an agent has been here all along. 


Crucially, going forward, it is certain that such corporate agents will incorporate human and machinic agents alike, such that their agency will be radically distributed and hence equal to, if not greater than, the sum of their parts. Moreover, and perhaps more important, it is likely that such technologies will redefine, if not reconfigure, what is meant by an agent and where to draw the line between different kinds of agents—such as the human, the machinic, and the corporate. This is especially true insofar as the capacities of such agents are, in large part, the emergent products of their coupling and the interactions such coupling enables. 

Whatever happens, it is also arguably the case that the ratio of human agents to machinic agents, as incorporated, will shrink to zero over time, as machinic agents come more and more to replace AV. In effect, the human agents whose values grounded and guided the training of the models will be more and more pushed out of the process—even though that same process could not have gotten started, and there is no reason for it to continue (or so say we), without them. 

Given such modes of mediation, incorporation, and replacement, all three agents will come to have their actions, inferences, and utterances—and hence their interpretants—influenced by all three variables. In particular, values, as the fundamental interpretive ground of human agents, will be mediated by power and parameters: V = V(P, θ). Parameters, as the fundamental interpretive ground of machinic agents, will be mediated by values and power: θ = θ(V, P). And power, as the fundamental ground of corporate agents, will be mediated by values and parameters: P = P(V, θ). As may be seen by the order of the arguments (V > P > θ), I am offering an educated guess, however optimistic, as to the particular preference hierarchies of such agents—however they might try to convince us otherwise through their wily ways with words. 

But maybe this last claim is overly optimistic. To return to derivative modes of intentionality, and hence to the way that the objects and ends of machinic agents are parasitic on the intentionality of their human makers, it might be argued that the fundamental function of language models, their ultimate end, true daimon (eudaimonia), or "flourishing," will be generating profits for their corporate parents, as mediated by the difference between the actual and the potential, and hence by the enormous gap between what corporations get and what they gave.

Paul Kockelman

Paul Kockelman is Professor of Anthropology at Yale University. He has undertaken extensive ethnographic and linguistic fieldwork among speakers of Q'eqchi' (Maya) living in the cloud forests of Highland Guatemala, working on topics ranging from poultry husbandry and landslides to inalienable possessions and interjections. And he has long engaged in more speculative inquiry at the intersection of artificial intelligence, new media technologies, cognitive science, and critical theory. His books include: The Anthropology of Intensity, The Art of Interpretation in the Age of Computation, Mathematical Models of Meaning, and The Chicken and the Quetzal.
