Random thoughts on random topics

On ChatGPT and LLMs in general

An intuitive response to the phenomenon of ChatGPT is that, in a sense, it redefines the concept of understanding. But one thing to notice right away is that it is not the existence of the program as such but the interactions that humans have with it that are responsible for that. That might seem a mere detail, but it is not.

In many of the discussions around ChatGPT and other LLMs one can discern echoes of arguments and observations that have surrounded AI ever since its origins in the early sixties of the last century. Those debates largely centred around the relation between human intelligence and artificial intelligence. And many of the arguments were of a largely conceptual (or, if you will, terminological) nature. (I think it was Douglas Hofstadter who once said: ‘Intelligence is the next thing that AI cannot do.’)

Now, as then, there seems to be a silent assumption at work that informs the reactions of both those who claim that LLMs understand language and those who disagree. That assumption is that linguistic competence, and understanding more generally, is an individual property that in some way or other resides in the individual as such, as a particular property of the individual brain, as a mental state. And then the crucial question is whether that property or state, or at least a property or state that is sufficiently analogous, can be ascribed to an LLM.

That assumption seems to be based on a conception of understanding that is, at best, narrow, but that can perhaps better be described as wrong to begin with. If we look at when and to what we ascribe understanding, or a lack thereof, we see that understanding is a concept that is intimately tied to a practice, to an ability of an individual to function in certain ways within the wider context of a community. And ascription of understanding is then subject to a range of considerations, some of which derive, not from the individual, but from the community. We can see this if we look at, e.g., shifts over time, both at the level of the community and at the individual level. When we say of a high school student that they understand calculus we employ different criteria than when we are dealing with math students, or math professors, or with civil engineers. Likewise, we would probably say, I venture, that our understanding of calculus now is different from, and in a certain sense deeper than, that of seventeenth-century scientists. And there is also a matter of application: for certain purposes a certain measure, and even a certain type, of understanding is required that might not be appropriate or sufficient for other purposes.

So understanding is a travelling concept: it differs from individual to individual (pupil, expert), it develops historically, it is tied to particular applications, … As a result the criteria for ascribing (or denying) understanding are also not strict and uniform. And, most importantly, in almost all cases the criteria we employ to ascribe understanding have a practical component. Understanding calculus is intrinsically tied to an ability to do something, viz., solving calculus problems, making calculations to determine the trajectory of a spacecraft, … There seems to be no understanding that is not tied to (the ability to perform) actions.

From this perspective, it seems indeed appropriate to say that the development of a new technology changes the concept of understanding with respect to a particular subject matter. But note that it is not the technology as such, but its introduction in a wider context, i.e., its application to problems and tasks that humans also apply their understanding to, and the interactions between the technology and humans, that is the crucial driver here. For example, the development of technologies (PET, fMRI, …) for scanning the brain hasn’t had the same kind of effects, because scanning the brain is not something that humans can do without that technology. (Which is not to say that the development of neuroscience has not led to conceptual changes, it has, but these have arguably proceeded in different, more indirect ways.) Whether such changes will catch on is hard to predict, but it seems intimately connected with the role the new technology is given to play in our everyday lives. So in a certain sense it is us who decide: accepting AI (in the form of LLMs, robots, …) in our everyday lives will subtly but inevitably prompt us to change our views on these ‘new fellow beings’, change the way in which we use concepts such as understanding, feeling, thinking, imagining, etc., so as to maintain a certain measure of coherence in how and with what/whom we live our lives.

That being said, there is the question as to whether LLMs understand language in a more narrow sense, and that seems to be the sense in which the matter is being discussed by and large. Here the key observation seems to be that as a matter of fact we have no insight whatsoever into how LLMs work. And apparently, that lack of insight is for many people sufficient reason not to attribute understanding to an LLM: insight into how things work is an essential element of our concept of understanding, and a specific one at that. For, of course, we know how an LLM works (well, to a certain extent …) because we built it ourselves, but apparently that is not what we are after: we want a particular type of understanding, one that allows us to generalise beyond the concrete phenomenon that we are dealing with. Here logic comes to the fore as a way (one of the ways, surely) to obtain that kind of understanding: discerning logical patterns, logical connections, in the many layers of data that are created, manipulated, and applied in the building and application of an LLM.

LLMs and linguistics. It seems clear that an LLM is not a linguistic theory, and arguably it is not a theory in the ordinary sense of the word at all. The kinds of questions that linguists are interested in, such as how the morphosyntactic structures in a particular language developed over time, how language contact gives rise to so-called pidgins and creoles, or whether there are upper limits on embedding structures and, if so, what determines them: all these are not questions that an LLM answers. It does not even make sense, it seems, to formulate them in the context of an LLM in the first place.

The questions mentioned above are typically questions that are raised in descriptive and in theoretical linguistics. When it comes to psycholinguistics things might be different, as the questions that one tries to answer there might indeed have counterparts with respect to LLMs. For example, some questions about language acquisition, such as those concerning the nature and amount of data, or the role of correction and explicit instruction, could have counterparts when it comes to the construction of LLMs. Likewise, one could imagine that certain language pathologies have counterparts in malfunctioning LLMs. This is because the underlying material substrates (the brain in the case of neurolinguistics, and neural networks in the case of an LLM) are more aligned, which makes the supervenient concepts more akin.

Martin Stokhof
from: Aantekeningen/Notes
date: 19/02/2023