Act I — What it is
You see words.
It sees debris.
Before the model can process text, it breaks it into smaller pieces called tokens.
A typical tokenizer uses a vocabulary of roughly 50,000 text pieces: whole words, subwords, punctuation, and spacing patterns.
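A minimal sketch of the idea, assuming a tiny hand-made vocabulary (the pieces and IDs below are invented for illustration; real tokenizers such as byte-pair encoding learn their ~50,000 pieces from data):

```python
# Toy longest-match tokenizer over a hand-made vocabulary.
# Entries and IDs are invented for illustration only.
VOCAB = {"hello": 15339, " world": 1917, "hel": 101, "lo": 102, "wor": 103, "ld": 104}

def tokenize(text):
    tokens = []
    i = 0
    while i < len(text):
        # greedily take the longest vocabulary piece that matches here
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                tokens.append(VOCAB[text[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary piece matches at position {i}")
    return tokens

print(tokenize("hello world"))  # → [15339, 1917]
```

Two tokens for two "words" here, but the greedy fallback pieces show why unusual strings can shatter into many fragments.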
Under the hood:
a transformer.
Your tokens pass through a pipeline. Each stage transforms them — building understanding, layer by layer.
"hello"→[15339]
✂️
Tokenize
[0.23, -0.7, 1.1…]
📐
Embed
Q·Kᵀ / √d
👁
Attention
ReLU(W·x+b)
Feed-Fwd
×96 layers
🔁
Repeat
P("world")=.31
🎯
Softmax
// at each layer, every neuron does:
output = activate( w₁·x₁ + w₂·x₂ + … + bias )

// multiply. add. repeat. billions of times.
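The pseudocode above as a runnable sketch. The inputs, weights, and bias are arbitrary example numbers, just to show the arithmetic:

```python
def neuron(inputs, weights, bias):
    # weighted sum of inputs, plus bias, through a ReLU nonlinearity
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return max(0.0, s)

# arbitrary example numbers: 0.5*1.0 + 0.25*(-2.0) + 0.1
print(neuron([1.0, -2.0], weights=[0.5, 0.25], bias=0.1))  # → 0.1
```

A real model runs this multiply-add, in parallel, across billions of weights per token.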
No magic. Just math at inhuman speed.
How does it know which words matter most?
The breakthrough:
Attention.
In 2017, a paper crystallized the modern breakthrough: let the model learn which other tokens matter for the current one.
Tap a word to see what it pays attention to
In this toy example, tap “it”. You will see strong attention to “cat” — the kind of pattern real models learn at much larger scale.
Vaswani et al., "Attention Is All You Need," NeurIPS 2017
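Scaled dot-product attention, sketched with plain lists. The 2-d vectors below are made-up stand-ins for "the", "cat", and "it"; real models use learned high-dimensional projections and many attention heads:

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def attention(query, keys, values):
    # score = Q·K / sqrt(d), softmax the scores, then weighted-sum the values
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

# toy vectors standing in for "the", "cat", "it"
keys   = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query  = [2.0, 1.0]   # "it", whose query happens to align with "cat"'s key
print(attention(query, keys, values))
```

The output leans heavily toward the "cat" value vector, because that key's dot product with the query dominates the softmax: the toy version of "it" attending to "cat".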
So how does it generate a response?
Watch it think.
One token at a time.
At each step, it scores the next-token options, selects or samples one, appends it, then repeats.
The meaning of life is
It does not store a hidden definition of "the meaning of life."
It has learned which continuations are statistically plausible in context.
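That step-by-step loop in code. Here `next_token_probs` is a stand-in for the real model's forward pass: any function that maps the context so far to a probability per candidate token:

```python
import random

def generate(prompt, next_token_probs, steps, seed=0):
    """Autoregressive decoding: score options, sample one, append, repeat."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(steps):
        probs = next_token_probs(tokens)          # e.g. {"world": 0.31, ...}
        candidates, weights = zip(*probs.items())
        tokens.append(rng.choices(candidates, weights=weights)[0])
    return tokens

# toy "model": always predicts the same continuation
print(generate(["The", "meaning", "of", "life", "is"],
               lambda ctx: {"42": 1.0}, steps=2))
```

Swap the toy lambda for a trained network and this loop is, structurally, how every response you've ever read from a chatbot was produced.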
What decides whether it plays it safe… or gets creative?
Same brain.
Different temperature.
Temperature reshapes the probability distribution. Drag the slider. Watch the output transform.
0.1
Focused 0.0 2.0 Wild
> a question that has been debated by philosophers for centuries.
Same weights. Same architecture. The only difference is how sharply it favors its highest-probability guesses.
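What the slider does, numerically: divide every logit by the temperature before the softmax. Low temperature sharpens the distribution toward the top choice; high temperature flattens it. The logits below are made-up example scores:

```python
import math

def apply_temperature(logits, temperature):
    # T < 1 sharpens the distribution, T > 1 flattens it
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    es = [math.exp(s - m) for s in scaled]
    total = sum(es)
    return [e / total for e in es]

logits = [2.0, 1.0, 0.1]                  # made-up next-token scores
print(apply_temperature(logits, 0.1))     # nearly all mass on the top option
print(apply_temperature(logits, 2.0))     # much flatter: "creative" sampling
```

Note the logits never change, only how sharply the softmax converts their gaps into probability gaps.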
How many numbers does it take to simulate understanding?
The weight of
knowledge.
Every "parameter" is a learned number. Frontier models are built from hundreds of billions of them.
GPT-2 (2019)
1.5B
GPT-3 (2020)
175B
Llama 3 (2024)
405B
Human brain
100T synapses
Public parameter counts exist for some open models. Many frontier closed models do not publish them. Hover the grid below — each cell is a toy weight.
How does a machine "learn"?
Learning by
getting it wrong.
Predict the next token. Measure the error. Adjust the weights. Over time, gradient descent pushes the model toward settings that make better predictions.
Start (random) → Optimal weights
Epoch: 0 — Loss: 4.20
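The same loop, shrunk to a single weight: predict, measure squared error, nudge the weight against the gradient. The data, learning rate, and epoch count are toy values chosen so it converges quickly:

```python
# Fit w so that w*x approximates y, by gradient descent on squared error.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # true relation: y = 2x
w = 0.0          # start from a bad guess
lr = 0.05        # learning rate

for epoch in range(100):
    # d/dw of mean((w*x - y)^2) = mean(2*(w*x - y)*x)
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad   # step downhill

print(round(w, 3))  # → 2.0
```

A frontier model does exactly this, except the "weight" is hundreds of billions of numbers and the "error" is next-token prediction loss over trillions of tokens.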
After pretraining, you can fine-tune or specialize a model on narrower data. Medical notes → medical assistant. Code → coding assistant. Same base machinery. Narrower expertise.
Then human preference training pushes it toward responses people judge more useful, safer, or better aligned.
Adapt.
Act II — What it changes
The displacement question.
Research suggests most knowledge workers will see some tasks affected: one large study estimated that around 80% of US workers could have at least 10% of their tasks touched by LLMs, while roughly one in five could see half or more exposed.
Eloundou et al., arXiv:2303.10130, Mar 2023
These bars are illustrative task exposure, not layoff probabilities.
Translators
very high
Tax preparers
very high
Copy editors
high
Legal assistants
high
Programmers
medium
Teachers
partial
Plumbers
low
Notice the pattern?
The first pressure lands on routine cognitive work: tasks that are symbolic, repetitive, and easy to evaluate.
The plumber's job is safer than the programmer's.
Let that sit.
Not replacement.
Reconfiguration.
ATMs didn't kill bank tellers; teller employment actually grew. But what tellers did changed.
Bessen, "How Computer Automation Affects Occupations," BU, 2016
An MIT experiment found that workers using AI completed tasks 37% faster with 20% higher quality. The gap between top and average performers shrank.
Noy & Zhang, Science, 2023
In many fields, the nearer-term risk is not total replacement. It is a widening gap between people who use AI well and people who do not.
Large institutions expect both job creation and job displacement. The transition may be net positive in aggregate while still being brutal for people caught in the middle of it.
World Economic Forum, "Future of Jobs Report 2023"
The question isn't whether your job changes.
It's whether you change with it.
Act III — What comes next
Where is all this heading?
The coming flood.
The deeper change may be volume. If AI becomes cheap, embedded, and ambient, machine-generated language stops feeling like chatbot output and starts feeling like infrastructure.
billions
of tokens generated daily already
thousands
of AI words a heavy user may see in a day
pages
of machine text many people already consume
cents
or less for a single model response
Not a forecast. A plausible adoption curve.
2026 — Now
AI is still a tool you invoke on purpose. You type. It answers.
2030
It could draft your email, calendar, and code in the background.
2035
Agents could start planning and executing longer chains of work.
2045
For many people, AI could feel less like an app and more like infrastructure.
2060
Machine-to-machine language may outweigh what humans read directly.
2076
A lifetime that starts today could see an extraordinary amount of machine-generated language become ordinary infrastructure.
What makes a human
valuable?
For centuries, we defined value through productivity. What you could make, calculate, write.
If a machine can write a legal brief in 3 seconds, diagnose a scan in 200ms, compose a symphony in 12 seconds,
then the things that make you you aren't the things you produce.
Whether that counts as understanding is still debated.
What models clearly do is predict extremely well.
They do not feel, ache, or fear in the human sense.
One more thing.
Some of the prose you just read was written with AI assistance.
Some by a human alone.
Maybe you could tell. Maybe not.
That uncertainty is part of the point of this page.
The most interesting thing about artificial intelligence
is what it reveals about natural intelligence.
Now go ask it something impossible.
And notice what you feel when it answers.