7. Language Without Mind or World
As powerful as next-word prediction and response generation are, and notwithstanding my spirited defense of parrots, the capacities of large language models are about as far from genuine linguistic competence, much less human sentience and sapience, as can be. In light of all the hype surrounding language models, it is useful to discuss not just their limits but also the conditions of possibility for our limited awareness of those limits.
The Absence of Real Objects
Recall that the decisive slash for language models, as that which separates relatively immediate signs from relatively mediated objects, divides earlier parts of a text from later parts of the text. This is because the main task asked of such models, especially in the context of pretraining, is to predict next words conditioned on previous words. This should be contrasted with the decisive slashes of human agents: either the slash that separates relatively public representations (such as speech acts) from relatively private representations (such as mental states) or the slash that separates such representations, public or private, from the world per se (as that which is represented).
In other words, language models in the strict sense are not designed to represent states of the world or patterned relations among such states in ways that are truthful (or at least useful). They are designed to represent words in texts, and/or patterned relations among such words, in ways that are useful (or at least profitable). In a certain sense, their key capacity (next-word prediction) is their main limitation: worldlessness.
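To make that decisive slash concrete, here is a minimal sketch of the pretraining task, assuming the Hugging Face transformers library and the publicly released GPT-2 weights (any causal language model would illustrate the same point): everything the model conditions on lies to the left of the position being predicted, and what it outputs is a distribution over possible next words, not a claim about the world.

```python
# A minimal sketch of next-word prediction, assuming the Hugging Face
# `transformers` library and the publicly released GPT-2 checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# The model conditions only on earlier parts of the text: the distribution
# over the *next* token is read off the final position.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(idx)):>12}  {prob.item():.3f}")
```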
The same idea may be formulated in many other ways, each of which adds its own distinctive emphasis. What language models model is essentially horizontal relations among words, as opposed to vertical relations between words and worlds. Language models, for all the wonders of their word embeddings, model sense but not reference: they relate concepts to other concepts, but not concepts to things, nor propositions to truth values. As already discussed, even though language models model cotext (understood as co-occurring text) as opposed to context, those who theorize, engineer, and train such models constantly conflate context with cotext. Thus they lexically shield themselves from everything outside of the lexicon. Cognitively sophisticated agents build representations of their environment that go beyond the experientially given, and they use such representations to flexibly act on their environments, insofar as such representations allow them to determine favorable courses of action. Language models are certainly not agents in this cognitively sophisticated sense.
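The horizontal/vertical contrast can be made concrete with a toy sketch (the vectors below are made up for the example, not real embeddings): the only relation an embedding space computes is similarity between word vectors, and nothing in the computation points from a word to the thing it names.

```python
# A toy illustration of the "horizontal" relations that embeddings capture.
# The vectors are invented for the example; real embeddings are learned.
import numpy as np

embeddings = {
    "dog":   np.array([0.9, 0.1, 0.3]),
    "cane":  np.array([0.8, 0.2, 0.3]),  # Italian word for "dog"
    "truth": np.array([0.1, 0.9, 0.4]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# "dog" and "cane" sit close together in the space of words (sense),
# but nothing here points from either vector to any actual dog (reference).
print(cosine(embeddings["dog"], embeddings["cane"]))
print(cosine(embeddings["dog"], embeddings["truth"]))
```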
Finally, and perhaps most generally, language models lack a reality principle—the shock that arises when one’s representation of the world fails to correspond to the world per se, and the search for a better representation that is set in motion by that shock.
No doubt most of us are so ideologically cocooned by our social networks, media outlets, personal beliefs, and cultural values that we too are cushioned from such shocks—at least in the short term. And no doubt many consumers will explicitly demand that companies respect their values, and thereby make language models that reflect their worldviews rather than the world, and hence reflect their faith as opposed to facts. There will be Christian language models as well as Muslim language models, conservative language models and liberal ones too. And so language models, and their users, will be multiply cocooned from such indexical confrontations. Some more than others.
Some of these limitations will be remedied soon enough. So-called “entity embeddings,” in addition to word embeddings, have already been introduced. And referent and experience embeddings, and thus what might best be called world embeddings as opposed to word embeddings, are not far off. Moreover, once language models can take multimodal experiences of reality as inputs (in addition to their usual textual inputs), as indexed to specific positions, times, and events in their environment, they will build better models of reality. And once language models are embodied enough (say, in robots) that they can have physical actions (if not “deeds”) as well as discursive actions as their outputs, they will be more and more forced to contend with the hard edges of reality. When such capacities are added to language models, their loss functions will no longer be measured simply in terms of their predictive power (via cross-entropy loss) or reward (via their satisfaction of alignment criteria) but also in terms of how much they—or their overlords—gain or lose, succeed or suffer, given the consequences of their actions, in light of their inferences from such experiences, and in reference to some utility function or set of existential values. And such forms of data, styles of training, and modes of evaluation are, arguably, not too far off either. In short, a deeper respect for reality can be engineered for such machinic agents—at least to a certain degree, and if only as shaped and softened by corporate interests.
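For reference, here is a minimal sketch of the cross-entropy loss mentioned above, with random tensors standing in for a real model: the quantity being minimized measures fit to the corpus (how much probability the model assigned to the words that actually came next), not fit to the world.

```python
# A minimal sketch of the cross-entropy loss used in pretraining.
# Random tensors stand in for a real model's predictions and a real corpus.
import torch
import torch.nn.functional as F

vocab_size, seq_len = 50_000, 8
logits = torch.randn(seq_len, vocab_size)            # predictions at each position
targets = torch.randint(0, vocab_size, (seq_len,))   # the words that actually came next

# The loss rewards assigning high probability to the observed next word;
# nothing in it measures correspondence to the world, only to the text.
loss = F.cross_entropy(logits, targets)
print(loss.item())
```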
It is sometimes argued that if a language model, or machinic agent more generally, can carry out conversations in such a way that its human interlocutor cannot tell the difference between it and a human, then the language model must have learned what the average human knows about the world and so “know” what a human knows. This is a plausible claim. Simply by learning how to predict next words, language models learn oodles of information about the world. By learning what predicates typically apply to what subjects and what consequents (or then-clauses) typically follow what antecedents (or if-clauses), language models learn a lot of substantive content (in the form of correlations): part-whole relations, cause-effect relations, species-genus relations, spatial and temporal relations, agent-action relations, sign-object relations, premise-conclusion relations, and far beyond. And language models not only learn general knowledge (e.g., penguins are birds, humans have arms, fire causes smoke, cane is the Italian word for “dog,” and so forth), they also learn singular facts: where Napoleon was born, what he did, and why he died. It should be no surprise that they do well on a wide range of standardized tests, for such propositions constitute a large part of human values.
However, language models are best at learning the kind of knowledge that is talked about, and thereby made explicit in texts, as preserved in various corpora. And through such texts, they learn fictional claims as much as factual ones and are exposed to empty talk as much as sincere convictions, and so are likely to make false connections as well as true ones. Indeed, they only learn to predict the correct predicate of a subject if that predicate occurs with that subject frequently enough in the model’s training corpus. If something is said enough, whatever its truth value, a language model will tend to offer it as output.
It is often claimed that such models are prone to “hallucinate,” meaning that they make false assertions with great confidence or simply make up facts per se. The truth is that all that language models, in the strict sense (and so prior to fine-tuning via human reinforcement learning), ever do is engage in next-word prediction. This often leads them to say true things, but it also leads them to say false things. In other words, the models themselves always relate to reality the same way (tenuously); it is only users who perceive and label their outputs as perceptions when they get something right and hallucinations otherwise.
Finally, as much as language models learn from texts, they do not necessarily learn tacit knowledge, embodied intuitions, deep presuppositions, and the like. In particular, they have a harder time learning all the things that cannot be articulated, written down, or made explicit. In other words, even though machinic agents can learn quite a lot just by “reading about” the world, there is so much they cannot learn and/or so many claims they cannot competently weigh in on, insofar as they do not yet reside in the world—as sensing, acting, and feeling agents, with bodies, habits, memories, relatively singular biographies, and group-specific histories.
To be sure, and looking ahead to the next section, large language models currently know so little about their own limitations, fine-tuning aside, that they will do their best to bullshit, if not machine-splain, their way through any question they are given.
Minding Language
The preceding section focused on the limited knowledge that language models have of the world. Closely related, but not equivalent, is their limited capacity to undertake logical arguments, engage in evidence-based reasoning, or offer novel and helpful hypotheses.
In addition to their pretraining through next-word prediction, some language models are also fine-tuned on logical relations. For example, given two sentences, a model can be trained to determine whether the second sentence is entailed by the first, in the sense that whenever the first sentence is true, the second sentence is true as well. Such training gives language models some ability to engage in logical operations, and far more sophisticated training methods are already underway. And, as just discussed, simply by learning to predict next words, language models learn lots of facts. This gives them the ability to answer a wide range of questions, which may constitute a sign of intelligence to a casual observer. But these abilities are not evidence of some deep capacity to reason. They are, rather, symptomatic of a huge training corpus, an enormous number of parameters, massive computational resources, and human ingenuity and labor.
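As an illustration of the entailment task just described, here is a minimal sketch, assuming the Hugging Face transformers library and a publicly released model fine-tuned on natural language inference (roberta-large-mnli is one such checkpoint): given a premise and a hypothesis, the model assigns probabilities to entailment, neutrality, and contradiction.

```python
# A sketch of the entailment task, assuming the Hugging Face `transformers`
# library and a publicly released NLI checkpoint (here roberta-large-mnli).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")
model.eval()

premise = "All penguins are birds, and Pingu is a penguin."
hypothesis = "Pingu is a bird."

inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)[0]

# Print the model's probability for each label (contradiction, neutral, entailment).
for label, p in zip(model.config.id2label.values(), probs):
    print(f"{label}: {p.item():.3f}")
```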
To be sure, computational systems are already incredibly good at deductive logic: given true propositions as premises, they can generate true propositions as conclusions. And they are also incredibly good at inductive logic; indeed, large language models, and machine learning algorithms more generally, are essentially induction machines. But such an ability does not mean that language models, in the midst of interaction, while they undertake your commands and answer your questions, are good at induction and deduction per se. They are not yet good logicians in interactional time, only in computational time. Finally, and perhaps most importantly, a real test of intelligence—or at least creative intelligence—is arguably abduction, or hypothesis formulation, in the tradition of Charles Sanders Peirce: given a remarkable pattern or a surprising event (in light of deep-seated expectations), construct a creative, plausible, and testable explanation for it—one that goes far beyond all that has been said already. And large language models can hardly do this at all.
Moreover, language models do not yet weigh sources of evidence, assess the strength of citational chains, or otherwise judge the plausibility of claims. This does not mean, as was discussed in chapter 4, that they cannot be fine-tuned to align with criteria like “truthfulness.” But that just involves satisfying preferences during training, not actually checking sources before responding. Nor does it mean that they will not claim to do so: they will often tell you the reasons for their claims. But such reason giving is usually just more text generation based on next-word prediction. In other words, insofar as reason giving, as a discursive pattern, is found in the texts that a language model was trained on, the model will give reasons for its claims. It is merely going through the motions of reasoning.
Augmented language models will certainly be trained to assess the truth value of their claims: the sources of evidence for their assertions and how numerous, coherent, and credible they are. Much of the value of these models, for individual users and corporate agents alike, will turn on generating true assertions (relative to the worldview of users), not sentences per se. For search engines, scientists, and schoolchildren, no less than screenplay writers and science fiction authors, will utilize language models more and more—however much they disavow them. But for this to happen, language models will need to reference, or otherwise remember, their sources, so they will have to come clean about whose works they have used, exploited, or appropriated. And so there will be a tension between fully disclosing sources and providing strong evidence for assertions, and hence a tension between exchange value and truth value. Credible sources should be not just recognized but also remunerated.
One could go on in this fashion. The list of things that language models do not yet have, or cannot do, is enormous: no body, no self, no real use of conventions, no nonderivative intentionality, no consciousness, no ostensive-inferential communication, and so forth. Also enormous is the list of capacities that language models will soon acquire, or so we are promised, through up-and-coming techniques like scaffolding and chain-of-thought prompting, as well as through increases in computational power, improvements in algorithms, and greater access to context. (And given their rapid progress and broad powers, I would not bet against them on just about any task in the long run.) Rather than go down the rabbit hole of listing their limitations, or making predictions, I will focus on a more pressing issue: that which mediates our sense of what such agents can and cannot do.
A key rhetorical strategy of corporate agents, as used to attach us to large language models as their flashiest new app, is this: on the one hand, stoke the hype (and stem the fears) regarding what the future of large language models will bring; on the other hand, lower expectations regarding the current abilities of their products.
Framed another way, the hype surrounding large language models arguably turns on several closely related slashes, understood as semiotic horizons:
• the slash that separates the present capacities of such models from their future potential;
• the slash that separates the actual performance of any such model from its underlying competence;
• the slash that separates text (and cotext) from context, and hence strings of words from the world per se;
• the slash that separates worldviews from worlds, or maps from terrains;
• the slash that separates the inputs and outputs of language models (as experienced by users) from their inner workings (as understood by experts);
• the slash that separates whatever immediate enjoyment or utility the models provide from the more mediated exploitation their existence presupposes, as well as the more mediated suffering and risk their adoption entails.
All these slashes are related to two other slashes long ago explored by critical theorists: the slash that separates what goes on in the market (or realm of exchange) from what goes on in the factory (or realm of production) and, perhaps most generally, the slash that separates subjective experience from objective reality and/or existing social relations.
In effect, such slashes separate the worldlines of machinic agents (that is, all the conditions for and consequences of the existence and nature of such agents) from the horizons of human agents (that is, what such agents are aware of and/or can reason about).
For slashes are, in a certain sense, symptoms of the presence of experiential horizons. If they did not exist, we would not need semiosis: for signs mediate our understanding of what is on the other side of slashes. Yet slashes themselves are closely related to barriers and obstacles. Indeed, their presence is often the trace of an unequal social relation: those who have relatively unmediated access to certain objects and events versus those who require additional signs to otherwise experience such entities; those who build barriers and impose limits versus those who suffer their existence or find ways to transcend them.
As argued early on by critical theorists, certain slashes lead to the systematic misrecognition of the origins of value, and this misrecognition leads to two distortions: we tend to see current conditions as the only conditions (what is intersubjectively believed is treated as objectively real); and we grant too much agency to nonhuman agents (be they machines or corporations, fantasies or deities) and too little agency to human agents.
One need not commit to such a worldview, however penetrating it may seem. Nonetheless, it is but a few short steps from value (as grounded in praxis) to values (as grounded in and grounding of semiotic practices) to parameters (as mediated by values) to profits (as mediated by the coupling of values and parameters). Moreover, it may be that human agents have ceded so much agency to corporate agents that such a particular worldview has become the world, or at least remade the world in its own image. In any case, I want to follow a related—but slightly less restrictive—line of thought.
In the spirit of the anthropologist Alfred Gell, it could be said that large language models really are magic, at least in the following ways. Such models seem to be labor-saving devices. Indeed, they seem to make the (marginal) cost of labor, at least to produce certain items, go to zero. For most of us, it is very difficult to imagine how such models generate the products (responses, or texts) that they do. It is truly beyond our understanding. Such magical abilities become a testament to, or constitute strong evidence for, their creator’s abilities. Just check the change in the valuation of companies like OpenAI after ChatGPT was originally introduced. The abilities of one model, compared to others, may show how weak other magicians are: Google’s reputation sank in the face of ChatGPT when all it had to show was Bard. Finally, the magical abilities of such models capture our attention and thereby distract us from more pressing concerns: climate change and environmental degradation, wealth inequality, corporate intrusion and dispossession, interference and noise, surveillance and censorship, damage to open discourse and the public sphere, and even real advances in artificial intelligence.
To be sure, I have at times in this essay engaged in a bait-and-switch similar to the one introduced above: talk up—if only to vilify—what machinic agents will soon be capable of; also continuously point out their current limitations. In other words, fixate on their current limits while fetishizing their immanent potential. That said, I have also worked hard to carry readers across such slashes to the origins—and effects—of the slashes themselves.