LLMs Are the Monkeys That Finally Typed Shakespeare

Nassim Taleb retells a classic thought experiment in Fooled by Randomness: given infinite monkeys typing on infinite typewriters, one of them will eventually produce the complete text of the Iliad.

The more I think about it, the more I believe this story's endgame is today's large language models.

Monkeys, Shakespeare, and an Underappreciated Metaphor

Let's lay out the classic thought experiment clearly.

The infinite monkey theorem states: given enough time and enough random attempts, any ordered content can emerge from randomness. A single monkey typing for ten thousand years will almost certainly produce gibberish, but infinite monkeys typing for infinite time? Probability 1—the complete works of Shakespeare, Einstein's papers on relativity, your diary entry for tomorrow—all of it will theoretically appear.

Taleb used this example to illustrate survivorship bias: if ten thousand fund managers operate randomly, one of them will inevitably beat the market for ten consecutive years. That "investment genius" you see might just be the monkey that happened to type a sonnet.

But today I want to push this metaphor in a different direction—

What if the monkeys stopped typing randomly and learned the statistical patterns of English?

From Random to "Almost Not Random": What LLMs Actually Do

What's the probability of a monkey randomly typing "To be or not to be"?

Assuming a keyboard with 50 keys and a phrase 18 characters long (spaces included), the probability of typing it correctly in a single attempt is roughly:

\[\frac{1}{50^{18}} \approx \frac{1}{3.8 \times 10^{30}}\]

In other words, you'd expect to need on the order of \(3.8 \times 10^{30}\) attempts—about four million trillion trillion—to stumble upon it once.
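The arithmetic is small enough to check directly. A minimal sketch, assuming the 50-key keyboard and 18-character phrase described above:

```python
# Odds that a monkey types "To be or not to be" by pressing one of
# 50 keys uniformly at random, 18 times in a row.
# (Toy arithmetic for the 50-key / 18-character setup above.)

phrase = "To be or not to be"
keys = 50

attempts_needed = keys ** len(phrase)   # 50^18
probability = 1 / attempts_needed

print(f"characters: {len(phrase)}")          # 18
print(f"attempts:   {attempts_needed:.2e}")  # ~3.81e+30
print(f"p(success): {probability:.2e}")      # ~2.62e-31
```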

But LLMs aren't monkeys.

LLMs have read nearly everything humans have ever written and learned one thing: given preceding text, what word is most likely to come next.

When it sees "To be or not to," it doesn't randomly pick from 50 keys—it knows the next word is almost certainly "be." Not because it "understands" Hamlet's existential crisis, but because this pattern has appeared countless times in its training data.

LLMs are essentially monkeys that are no longer random.

They're still "typing"—generating text token by token. But each keystroke isn't blind; it's guided by the statistical distribution of all human knowledge.
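The "no longer random" part can be shown in miniature. Here is a toy bigram model—word-pair counts over a tiny made-up corpus—that predicts the most likely next word. Real LLMs use transformers over subword tokens, not word bigrams, so this is an illustration of the principle only:

```python
# A monkey that has stopped typing at random: learn next-word
# frequencies from a tiny corpus, then predict the most likely
# continuation. (Sketch only; real LLMs are far more elaborate.)
from collections import Counter, defaultdict

corpus = "to be or not to be that is the question " * 100  # toy data
words = corpus.split()

# Count how often each word follows each preceding word.
follows = defaultdict(Counter)
for prev, nxt in zip(words, words[1:]):
    follows[prev][nxt] += 1

def predict(prev):
    """Return the most likely next word after `prev`."""
    return follows[prev].most_common(1)[0][0]

print(predict("to"))  # "be" -- learned from the corpus, not luck
```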

The gap between \(\frac{1}{50^{18}}\) and a probability approaching 1—that's the difference between "random" and "trained."

What Taleb Didn't Anticipate: The Monkeys Evolved

Let's revisit Taleb's original argument.

He said: don't mistake a monkey's luck for a monkey's skill. That fund manager who beat the market for a decade might just be a survivor in a game of probability.

This insight was correct in 2001, and it's still correct in 2026.

But in Taleb's framework, the monkeys are forever random.

He never envisioned a scenario where you collect everything all the monkeys have ever typed—the garbage, the occasional masterpiece, everything—and feed it to a new monkey, letting it "learn" the patterns in those texts. What would happen?

The answer: you'd get a monkey that no longer needs infinite time.

This monkey doesn't need infinite attempts to stumble upon Shakespeare. Give it an opening—"To be"—and it can continue on its own. Not because it's intelligent, but because it's seen so much text that its "randomness" has been warped by the gravitational field of human knowledge.

An LLM is exactly this—a warped random process.

Each "next token" selection is, on the surface, random sampling (it literally is when temperature > 0), but the distribution has been deeply shaped by training data. It's not uniform noise; it's a highly structured probability field.
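What "random sampling when temperature > 0" means mechanically: the model's scores (logits) are divided by a temperature before the softmax, which sharpens or flattens the distribution the next token is drawn from. The token names and logit values below are hypothetical, not from any real model:

```python
# Temperature-scaled sampling: low temperature collapses toward the
# single most likely token; high temperature drifts back toward the
# uniform typing of the original monkey. (Illustrative numbers only.)
import math
import random

def softmax_with_temperature(logits, temperature):
    scaled = [x / temperature for x in logits]
    m = max(scaled)                            # subtract max for stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

tokens = ["be", "do", "go"]
logits = [5.0, 1.0, 0.5]  # hypothetical scores after "To be or not to"

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])

# Actually draw a sample from the shaped distribution.
sample = random.choices(tokens, weights=softmax_with_temperature(logits, 1.0))[0]
```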

From this perspective, the very existence of LLMs is a realization of the infinite monkey theorem—not through brute-forcing ordered content with infinite time and quantity, but by learning all existing ordered content and compressing randomness to its extreme.

How Far Can This Analogy Take Us?

At this point, I realized this metaphor goes deeper than I initially thought.

LLM "Knowledge" Is Essentially Probability Compression

In what form does human knowledge exist? Books, papers, code, conversations, web pages, legal texts... all of it, at its core, is text sequences.

What LLMs do is compress these massive text sequences into a set of probability distributions.

Given any preceding text, it can produce a reasonable continuation. This means: knowledge that humanity accumulated over millennia has been compressed into a model's parameter space.

Isn't this the reverse of the "infinite monkeys"?

  • Forward: Infinite monkeys → random output → occasionally hit ordered content
  • Reverse: Collect all ordered content → learn its patterns → produce nearly non-random ordered output
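The two paths above can be contrasted in miniature: brute-force random typing of a short target versus a "trained" lookup that has already absorbed the corpus. This is a toy sketch—the target is 3 characters, not 18, so the random search actually terminates:

```python
# Forward path: pure random typing until a short target appears.
# Reverse path: a model that has "learned" the corpus completes it
# in one step. (Toy sketch; the dict stands in for training.)
import random
import string

random.seed(42)  # make the random search reproducible
target = "be."
alphabet = string.ascii_lowercase + " ."

attempts = 0
while True:
    attempts += 1
    guess = "".join(random.choice(alphabet) for _ in target)
    if guess == target:
        break

learned = {"to be or not to ": "be."}
completion = learned["to be or not to "]

print(f"random search: {attempts} attempts")
print(f"learned model: 1 attempt -> {completion!r}")
```

With a 28-symbol alphabet, even this 3-character target takes on the order of \(28^3 \approx 22{,}000\) random attempts on average; the 18-character original is what pushes the forward path out of reach.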

LLMs took the reverse path. They don't rely on luck—they turned luck into knowledge.

It's Still a "Monkey"—And That's Not Pejorative

Many people debate whether LLMs constitute "real intelligence."

From the infinite monkey perspective, this debate can be reframed:

A monkey typed the complete works of Shakespeare—would you say it "understood" Shakespeare?

Under Taleb's framework, of course not. That's just the result of random collision.

But an LLM isn't a purely random monkey—it's a monkey that has learned every textual pattern. It can produce meaningful, coherent, even insightful content in most cases.

So the real question isn't "is it intelligent," but rather: on the continuum from random to ordered, where does it sit?

A purely random monkey sits at the far left. A truly omniscient being that understands the universe sits at the far right. LLMs sit somewhere in between—not random, but not omniscient either. They represent an unprecedented middle state: more ordered than any monkey, yet lacking the genuine understanding of any human.

And this middle state is already good enough.

The "Accessibility" of Knowledge Has Been Fundamentally Changed

Back to the core of the infinite monkey theorem: theoretically, any content could be randomly produced, but in practice, the required time approaches infinity.

"Theoretically feasible, practically impossible"—that's the predicament of pure randomness.

What LLMs achieved is: turning "theoretically feasible" into "practically usable."

Want a plain-language explanation of quantum entanglement? A random monkey might need \(10^{10000}\) times the age of the universe. An LLM needs 3 seconds.

Want a B+ tree implementation in Rust? Theoretically, a typewriter monkey would eventually produce one. An LLM needs 10 seconds.

Want a classical Chinese poem on "solitude" whose syllable structure secretly follows the Fibonacci sequence? The monkeys might never get there before the heat death of the universe. An LLM might give you a decent attempt—flawed, perhaps, but recognizable.

This is LLMs' true revolution: not creating new knowledge, but driving the cost of accessing existing knowledge toward zero.

A Deeper Question: If There Are Enough Monkeys, Does "Creation" Still Matter?

This is something I've been mulling over.

A corollary of the infinite monkey theorem: in a sufficiently large random search space, all possible text combinations already "exist"—most just haven't been found yet.

This is exactly what Borges' "Library of Babel" describes: a library containing every possible book, of which all but a vanishing sliver is garbage—yet that sliver contains everything of value, past, present, and future.

An LLM is the search engine for the Library of Babel.

It doesn't create content; it rapidly locates the meaningful sliver within a space that theoretically already contains everything.

This raises an unsettling question:

If every meaningful text combination theoretically already "exists" in probability space, what exactly is human "creation"?

Did we discover something that was always there? Or did we genuinely create something from nothing?

When Shakespeare wrote "To be or not to be," did he "create" that sentence, or did he "find" a specific coordinate in the combinatorial space of the English alphabet?

I don't have an answer.

But I do know this: LLMs have made this philosophical question more than just philosophical. When a machine can "find" in seconds what humans needed decades of inspiration to reach, the very definition of "creation" is being rewritten.

Back to Taleb: Randomness, Ability, and Our Illusions

Finally, let's circle back to Taleb's original intent.

He warned us against an illusion: mistaking randomly lucky outcomes for systematic ability.

This warning applies equally to LLMs:

  • When an LLM gives a stunning answer, it doesn't mean it "understood" the problem
  • When an LLM makes a basic mistake, it doesn't mean it's "unintelligent"
  • Every output is, at its core, a biased random sample

But Taleb's framework needs an update:

In his era, randomness and ability were a binary opposition—you either had skill or you had luck.

LLMs show us a third possibility: something distilled from massive randomness, sitting between luck and skill. It's not true understanding, but it's far beyond random collision. It's not intelligence, but it can accomplish a great deal of what intelligence can.

Maybe we should give this capability a new name.

Maybe it's already called: Large Language Model.


The infinite monkey theorem says randomness can theoretically produce everything. LLMs proved that when you shape the direction of "randomness" with all of human knowledge, theory becomes reality.

We didn't wait for infinite monkeys. We just taught one monkey how to stop being random.

Taleb probably wouldn't agree with half the arguments in this article. But if infinite AIs wrote infinite articles about his book, one of them would satisfy him—right?