The Stochastic Parrot

or, how I learned to stop worrying and love the autocomplete
There is a moment, somewhere between asking a chatbot to explain quantum entanglement and watching it confidently insist that your son can be your personal elementary school teacher, when a small voice in your head whispers a heretical question. What if the most articulate thing in the room is not actually thinking?
This is the conundrum Michael Wooldridge laid out in his Faraday Prize Lecture at the Royal Society. We were promised remorselessly logical machines, the cold steel philosophers of science fiction. We got something stranger. Eloquent, charismatic, occasionally brilliant, frequently bonkers, and entirely incapable of telling you which of those modes it is currently operating in.
The Confidence Game
The first thing you notice about modern LLM text production is its bedside manner. It does not stammer. It does not say "ummm". It does not preface a wrong answer with "I'm not totally sure, but maybe...". It just answers, in that smooth corporate-podcast voice, with the unbreakable composure of a man who has never been late to a meeting in his life.
This is unsettling, because confidence, in humans, has historically been a half-decent proxy for competence. The neurosurgeon sounds confident because she has cut open three hundred brains. The pilot sounds confident because he has done the checklist eight thousand times. We have spent millennia training ourselves to read certainty as a load-bearing signal.
LLMs short-circuit that wiring. They are confident the way a five-year-old is confident about how airplanes work. The fluency was never tracking truth. It was tracking the statistical likelihood of plausible-sounding word sequences. Which is a useful thing for an autocomplete to do, and a deeply alarming thing for an oracle to do.
Things That Are Smart Are Not Always Things That Think
Here is a fun parlour game. Ask a top-tier model to multiply two seven- or eight-digit numbers.

Watch it generate, with serene authority, an answer that is wrong by, in my example, roughly the GDP of Egypt or Chile.

[Figure: what serious math tools say]
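The serious tools are not doing anything mysterious. Exact integer multiplication is a fixed procedure, the one taught in primary school, and it gives the same digits every time. Below is a minimal Python sketch of it, checked against Python's built-in arbitrary-precision integers. The operands and the name long_multiply are my own hypothetical choices for illustration, not the numbers from my example.

```python
def long_multiply(a: int, b: int) -> int:
    """Schoolbook long multiplication on decimal digits (non-negative ints only)."""
    # Work with least-significant digit first, as you would on paper.
    a_digits = [int(d) for d in reversed(str(a))]
    b_digits = [int(d) for d in reversed(str(b))]
    result = [0] * (len(a_digits) + len(b_digits))

    for i, x in enumerate(a_digits):
        carry = 0
        for j, y in enumerate(b_digits):
            total = result[i + j] + x * y + carry
            result[i + j] = total % 10   # digit that stays in this column
            carry = total // 10          # amount carried to the next column
        result[i + len(b_digits)] += carry

    return int("".join(str(d) for d in reversed(result)))


# Hypothetical operands, chosen only to match the size of the parlour game.
a, b = 7_654_321, 12_345_678

by_hand = long_multiply(a, b)
built_in = a * b  # Python ints are exact at any size

print(f"schoolbook: {by_hand:,}")
print(f"built-in:   {built_in:,}")
print("agree" if by_hand == built_in else "disagree")
```

There is no bedside manner here and no plausibility involved. The procedure either carries the digit or it does not.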
Wooldridge's point cuts deep here. These systems are not stupid. They can ace law exams that would make most humans weep. They can write half-decent sonnets, debug code, explain the Battle of Borodino in the style of Snoop Dogg, and produce a passable Sicilian risotto recipe in the same afternoon. But they fail elementary tests of rationality that a moderately attentive nine-year-old could pass.
What is going on? The honest answer is that nothing is going on, in the sense we usually mean by "going on". There is no little homunculus inside the model checking its work. There is a vast statistical machine producing the next plausible token, and then the next, and then the next, until the response feels finished. Sometimes that process produces Shakespeare. Sometimes it produces the assertion that the Eiffel Tower is in Berlin. The process cannot tell the difference, because the process does not know there is a difference.
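To make that loop concrete, here is a toy sketch in Python: a bigram model, which picks each next word purely from what has followed the previous word in its training text. This is a deliberate caricature, not how production LLMs are built (those use neural networks conditioned on long contexts), and the corpus and function names are hypothetical. Every sentence in the corpus is true. The model can still put the Eiffel Tower in Berlin, because "in berlin" is exactly as statistically respectable as "in paris".

```python
import random
from collections import defaultdict

# Hypothetical three-sentence "training corpus"; every statement in it is true.
CORPUS = [
    "the eiffel tower is in paris",
    "the brandenburg gate is in berlin",
    "the colosseum is in rome",
]


def train_bigrams(sentences):
    """Record, for each word, every word that was seen directly after it."""
    followers = defaultdict(list)
    for sentence in sentences:
        words = sentence.split() + ["<end>"]
        for current, nxt in zip(words, words[1:]):
            followers[current].append(nxt)
    return followers


def generate(followers, prompt, rng, max_tokens=10):
    """Append one plausible next word at a time until the text 'feels finished'."""
    words = prompt.split()
    for _ in range(max_tokens):
        candidates = followers.get(words[-1])
        if not candidates:
            break  # never saw this word in training, nothing to predict
        nxt = rng.choice(candidates)
        if nxt == "<end>":
            break
        words.append(nxt)
    return " ".join(words)


model = train_bigrams(CORPUS)
rng = random.Random(0)  # seeded so runs are repeatable

# Sample many completions of the same prompt and show the distinct ones.
samples = {generate(model, "the eiffel tower is in", rng) for _ in range(50)}
for sentence in sorted(samples):
    print(sentence)
```

Nothing in generate checks a completion against the world. It only checks it against which words have kept company before, which is the whole point.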
The Suggestibility Problem
Tell a calculator that two plus two is five. The calculator will, with the icy dignity of a Victorian governess, continue to insist it is four.
Tell an LLM that two plus two is five. Apologise. Insist you have a degree in this. Cite an imaginary paper. Threaten to tell its manager. There is a non-zero chance you will get a reply that begins "You are correct, I apologise for the confusion."
This is what Wooldridge means by comically suggestible. The model has no anchor. There is no bedrock conviction to fall back on, because there are no convictions, only patterns. Push hard enough in any direction and it will eventually go there, because going there is what the training data implies a polite assistant would do.
Which is wonderful if you want a writing partner. It is catastrophic if you want a source of truth. And the trouble is, the same shiny interface is being marketed for both jobs.
The Mirror Problem
Here is the part that should keep us up at night, and the part Wooldridge is too polite to lean on too hard.
Watching an LLM operate is uncomfortably like watching certain humans operate.
You know the type. The dinner-party guest who has read the headline but not the article and yet has firm, granular views on monetary policy. The management consultant who can produce forty slides on any topic in any industry within ninety minutes. The undergraduate who has never knowingly admitted to not knowing something. The columnist whose entire career is the heroic refusal to ever encounter a counterexample.
LLMs are, in a sense, the apotheosis of the bluffer. They have read everything and understood nothing, and they have been rewarded for sounding right rather than being right. Which raises a question we would mostly prefer not to ask. How much of human discourse is actually doing the thing we call thinking, and how much of it is just very fluent next-token prediction running on a meat substrate?
If a machine can fake it this convincingly, then the fakeable parts of human cognition were always a larger fraction of the whole than we cared to admit. The illusion of thinking is not something AI is doing to us. It is something we have been doing to each other for centuries. The machine just industrialised it.
The Next Frontier
So where does this leave us? Wooldridge's suggestion is that we are watching the birth of an entirely new branch of science, experimental AI, because we now have artefacts in the world whose behaviour we cannot predict from first principles and must instead poke with sticks and write papers about. We built the things, and we still do not really know what they are.
This is, on reflection, an absolutely unhinged place to have arrived at. Most engineering disciplines work the other way around. You design a bridge, you compute its load-bearing capacity, you build it. You do not build the bridge first and then employ a research team to figure out, empirically, whether it might collapse on a Wednesday.
But here we are. The dream of truly intelligent machines is not dead. It is just that what we have built is not, in any deep sense, that. What we have built is something genuinely new. A kind of culture in a jar. A statistical fossil of the human written record, capable of producing extraordinary outputs and equally extraordinary nonsense, often in the same paragraph, and never the wiser.
The polite name for it is artificial intelligence. A more honest name might be artificial articulacy.
The thinking, for now, is still mostly on us.