10. The Problem with Alignment

This chapter returns to some of the issues raised in the introduction, while reviewing and reworking key themes of this essay. I review various senses of alignment, as this term applies to semiotic processes, and compare them with the alignment of machinic parameters and human values. I then discuss and critique the alignment problem—the idea that it may be difficult, if not impossible, to make machinic behavior align with human values in truly important ways, given the open-endedness of the world and the uncertainty of the future.

Will the systems that we design and train ultimately do what we want and expect? And how do we ensure that they behave accordingly in an open-ended world and uncertain future?

In everyday English the term alignment has many related senses. It may refer to the proper adjustment of components (in a system) for appropriate functioning (of that system). It can refer to an agreement, or alliance, between two or more parties. It can refer to the ground plan of a railroad or highway system (as opposed to the profile). And, of course, it can refer to entities being in a line, or, perhaps more frequently, to entities being in the appropriate relative positions, such that, wherever they happen to be, they face the same direction. In other words, it often refers to the orientation or stance of agents as opposed to their position. Are they directed to the same objects? Are they guided by the same ends?

Alignment

With all this in mind, it is easy to envision various kinds of semiotic alignment. This may turn on representations, such as beliefs and assertions, aligning with the world (insofar as they are true). It may turn on the world coming to align with performative utterances, insofar as the latter are felicitous—and hence not just appropriate in context but also transformative of context. In other words, just as signs can come into alignment with objects, objects can come into alignment with signs. This may turn on interpretant-object relations corresponding to or coming into alignment with sign-object relations. For example, does the interpreter come to look in the direction that the signer is pointing? Does the addressee come to believe—or not—what the speaker is saying? Is some kind of intersubjective agreement between agents achieved? And this may turn on semiotic agents having, or at least coming to have, the same values, interpretive grounds, or guiding principles. For example, do the agents share, or come to share, a set of conventions or a model of causal relations? Are their ontologies and partonomies in agreement? Do they have similar preference hierarchies or evaluative standards? Do they agree on what constitutes good alignment criteria? In short, semiotic processes and their conditions and consequences are easily framed in terms of different modes of alignment: between objects and signs; between sign-object and interpretant-object relations; and between the values of signifying and interpreting agents.

These ideas, in a slightly extended sense, have already been used to frame the relation between different kinds of semiotic agents. Recall figure 5, which showed how machinic interpretants (or “responses”) can be brought into alignment with human signs (or “prompts”), insofar as machine parameters (θ) are brought into alignment with human values (V). It is also relatively easy, however anxiety-provoking, to add corporate agents, be they corporations or states, into the mix, for values and parameters can also be made to align with—as well as counter-align against—power and profit (P). 
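
To make the schema of figure 5 slightly more concrete, here is a toy sketch, in Python, of what it can mean to bring machine parameters (θ) into alignment with human values (V). Everything in it (the feature vectors, the stipulated valuations, the squared-error notion of misalignment) is a hypothetical stand-in of my own, not the training procedure of any actual system.

```python
# A toy, purely illustrative sketch of the schema in figure 5: machine
# parameters (theta) nudged into alignment with human values (V).
# Every name and number here is a hypothetical stand-in; no actual
# system is trained this way.
import numpy as np

rng = np.random.default_rng(0)

# Three candidate "responses," each represented by a two-feature vector.
responses = np.array([[1.0, 0.2],
                      [0.3, 0.9],
                      [0.6, 0.6]])
V = np.array([0.1, 1.0, 0.5])      # stipulated human valuation of each response
theta = rng.normal(size=2)         # machine parameters, initially arbitrary

def score(theta, responses):
    """The machine's own valuation of each response under its current parameters."""
    return responses @ theta

# Gradient descent on squared misalignment: theta is pulled toward whatever
# setting makes the machine's scores approximate V.
for _ in range(500):
    error = score(theta, responses) - V           # per-response misalignment
    grad = responses.T @ error / len(responses)   # gradient of mean squared error
    theta -= 0.1 * grad

print(theta, score(theta, responses))             # scores now track V
```

The point of the sketch is only that alignment, so construed, names an optimization: the parameters have no access to human values except through whatever proxy (here, V itself) is placed in front of them.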

Problems with Alignment

The alignment of machine behavior with human interests is so important among AI ethicists that it has its own name: the alignment problem. More carefully, such a problem might be formulated as follows: Will the systems that we design and train ultimately do what we want and expect? And how do we ensure that they behave accordingly in an open-ended world and uncertain future? Phrased another way: How do we make sure that machine behavior (and thus the parameters that guide it, as well as the algorithms that underlie it) aligns with human values? And not just for now, but for all time, come what may? 

Many futurists, ethicists, and experts in artificial intelligence have pondered these questions. And the extended discussion in chapter 4 regarding reinforcement learning from human feedback and the training of reward models showed one way that this challenge is being met, at least in a relatively circumscribed domain of satisfying users’ intentions and corporate interests when responding to prompts. Rather than delve further into the large, speculative, and unresolved literature around this topic, I pose five other problems—tangentially related to the alignment problem—that are, if not more pressing, at least more in line with the arguments of this essay. 
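
For readers who want the mechanics behind the reward-model discussion just mentioned, the following is a minimal sketch of the pairwise-preference objective commonly used in such training. The data, the linear reward function, and the update rule are all toy assumptions on my part, not the procedure described in chapter 4.

```python
# A minimal sketch of a pairwise-preference (Bradley-Terry style) objective
# of the kind commonly used to train reward models for RLHF. The features,
# the linear reward function, and the learning rate are toy assumptions,
# not the procedure described in chapter 4.
import numpy as np

rng = np.random.default_rng(1)

# Each row pairs the features of a response a human labeler preferred
# with the features of the response they rejected.
preferred = rng.normal(size=(8, 3)) + 0.5
rejected = rng.normal(size=(8, 3))
w = np.zeros(3)  # reward-model parameters

def reward(w, x):
    """Scalar reward assigned to each response under the current parameters."""
    return x @ w

# Minimize -log sigma(r(preferred) - r(rejected)): the reward model learns
# to score the human-preferred response above its alternative.
for _ in range(1000):
    margin = reward(w, preferred) - reward(w, rejected)
    p = 1.0 / (1.0 + np.exp(-margin))   # probability the model ranks each pair correctly
    grad = ((p - 1.0)[:, None] * (preferred - rejected)).mean(axis=0)
    w -= 0.05 * grad

print(w)  # parameters now encode the labelers' preferences
```

In a full RLHF pipeline, it is this learned reward, rather than human judgment directly, that the language model's parameters are subsequently optimized against.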

First is what might best be called the de-alignment problem: with the advent of large language models and more and more sophisticated forms of artificial intelligence, the capacity of humans to create, share, and improve their values—which is perhaps the true generative endowment of the human species—may be weakened. And this diminishment is due, at least in part, to interference by noise and interception by enemies insofar as public discourse and private conversations come to be more and more mediated, and thus affected and directed, by what are essentially weaponized chatbots. 

Second, and closely related to the first, is the realignment problem: human values may come to be more and more mediated by machine parameters (rather than vice versa), which may themselves be mediated by corporate agents with dubious and selfish, if not outright malicious, values and interests. 

Third is the provincial-value problem: just as not all human voices are in the training corpus, not all human values determine machine parameters. So whose values were machines aligned with in the first place? And who gets to determine whose values machine parameters will align with in the future? In other words, never mind whether machinic parameters will align with human values; the question is which values, to what ultimate end, how we could know, and who should decide. 

Fourth is the posthuman problem: machines might easily be trained to align with human values; the problem is that human values, whichever collectivity they happen to come from, may not be all that great to begin with—at least when the ultimate repercussions of value-guided semiotic processes (and hence human-specific modes of attention, inference, and action) are examined on larger scales. 

Fifth is the Pandora problem, which requires some explanation. As was shown in the discussion of dynamic generativity, how a tool can be used (dynamically) and how a tool may be used (deontically) are worlds apart. In other words, everything is reducible to its affordances: what can physically be done with it, rather than what may normatively be done with it. This means that to rein in any machinic agent, a particularly powerful tool, we have to rein in all human and corporate agents from now on, insofar as they might be prone to abuse that agent’s generative potential. In effect, every powerful new technology needs to be policed forever after to ensure appropriate usage—and such modes of surveillance and control may ultimately be worse for our collective existence than the modes of misuse they were designed to stop.

Finally, there is the real alignment problem: unchecked wealth inequality and resource extraction—and thus social hierarchies and environmental degradation—have already pulled humans out of alignment with each other and with the earth (and most other living kinds). In other words, AV is wildly out of alignment not just with itself but also with its true generative matrix, AE, understood as the mother ship of all agency. So all this attention to large language models and generative AI is a massive distraction from the really pressing issues that currently harm life on earth. Indeed, given the resources they consume, the conversations and debates they degrade, and the social relations they elide and strain, they are only adding fuel to the fire.

To return to the introduction, ChatGPT and the like are what we might call hyperagents: agents imbued with excessive hype relative to other agents. Such unjustified attention may be due to the fact that large language models seem to be coming for the jobs of the writing—or at least chattering—classes. Such people are precisely the ones who currently—but perhaps not for long—write articles, books, blog posts, screenplays, and tweets. So the simplest interpretant of this essay is that it is nothing but a symptom of the author’s anxiety in the face of his own obsolescence. 

But hopefully my arguments have offered more than that, thereby affording a wider range of interpretations. By focusing on the conditions for and consequences of techno-horizons, I have tried to cut through some of the bullshit that is espoused by large language models (and their makers, masters, and marketers), however eloquent and enchanting they may seem. And by bringing to light some of the more subterranean ways that values, parameters, and profits—as guiding principles—ground inference, action, intuition, and affect, I have sketched some of the ways that semiosis and sociality may be radically realigned in the facelessness of our new interlocutors.

Paul Kockelman

Paul Kockelman is Professor of Anthropology at Yale University. He has undertaken extensive ethnographic and linguistic fieldwork among speakers of Q'eqchi' (Maya) living in the cloud forests of Highland Guatemala, working on topics ranging from poultry husbandry and landslides to inalienable possessions and interjections. And he has long engaged in more speculative inquiry at the intersection of artificial intelligence, new media technologies, cognitive science, and critical theory. His books include: The Anthropology of Intensity, The Art of Interpretation in the Age of Computation, Mathematical Models of Meaning, and The Chicken and the Quetzal.
