Act I — What it is
You see words.
It sees debris.
Before the model can process text, it breaks it into smaller pieces called tokens.
A typical tokenizer uses a vocabulary of roughly 50,000 text pieces: whole words, subwords, punctuation, and spacing patterns.
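A minimal sketch of the idea, assuming a tiny hand-made vocabulary (the pieces and IDs below are invented for illustration; real tokenizers such as byte-pair encoding learn their ~50,000 pieces from data):

```python
# Toy longest-match tokenizer over a hand-made vocabulary.
# Entries and IDs are invented for illustration only.
VOCAB = {"hello": 15339, " world": 1917, "hel": 101, "lo": 102, "wor": 103, "ld": 104}

def tokenize(text):
    tokens = []
    i = 0
    while i < len(text):
        # greedily take the longest vocabulary piece that matches here
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                tokens.append(VOCAB[text[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary piece matches at position {i}")
    return tokens

print(tokenize("hello world"))  # → [15339, 1917]
```

Two tokens for two "words" here, but the greedy fallback pieces show why unusual strings can shatter into many fragments.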
Under the hood:
a transformer.
Your tokens pass through a pipeline. Each stage transforms them — building understanding, layer by layer.
"hello"→[15339]
✂️
Tokenize
[0.23, -0.7, 1.1…]
📐
Embed
Q·Kᵀ / √d
👁
Attention
ReLU(W·x+b)
Feed-Fwd
×96 layers
🔁
Repeat
P("world")=.31
🎯
Softmax
// at each layer, every neuron does:
output = activate( w₁·x₁ + w₂·x₂ + … + bias )

// multiply. add. repeat. billions of times.
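The pseudocode above as a runnable sketch. The inputs, weights, and bias are arbitrary example numbers, just to show the arithmetic:

```python
def neuron(inputs, weights, bias):
    # weighted sum of inputs, plus bias, through a ReLU nonlinearity
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return max(0.0, s)

# arbitrary example numbers: 0.5*1.0 + 0.25*(-2.0) + 0.1
print(neuron([1.0, -2.0], weights=[0.5, 0.25], bias=0.1))  # → 0.1
```

A real model runs this multiply-add, in parallel, across billions of weights per token.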
No magic. Just math at inhuman speed.
How does it know which words matter most?
The breakthrough:
Attention.
In 2017, a paper crystallized the modern breakthrough: let the model learn which other tokens matter for the current one.
Tap a word to see what it pays attention to
In this toy example, tap “it”. You will see strong attention to “cat” — the kind of pattern real models learn at much larger scale.
Vaswani et al., "Attention Is All You Need," NeurIPS 2017
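Scaled dot-product attention, sketched with plain lists. The 2-d vectors below are made-up stand-ins for "the", "cat", and "it"; real models use learned high-dimensional projections and many attention heads:

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def attention(query, keys, values):
    # score = Q·K / sqrt(d), softmax the scores, then weighted-sum the values
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

# toy vectors standing in for "the", "cat", "it"
keys   = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query  = [2.0, 1.0]   # "it", whose query happens to align with "cat"'s key
print(attention(query, keys, values))
```

The output leans heavily toward the "cat" value vector, because that key's dot product with the query dominates the softmax: the toy version of "it" attending to "cat".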
So how does it generate a response?
Watch it think.
One token at a time.
At each step, it scores the next-token options, selects or samples one, appends it, then repeats.
The meaning of life is
It does not store a hidden definition of "the meaning of life."
It has learned which continuations are statistically plausible in context.
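That step-by-step loop in code. Here `next_token_probs` is a stand-in for the real model's forward pass: any function that maps the context so far to a probability per candidate token:

```python
import random

def generate(prompt, next_token_probs, steps, seed=0):
    """Autoregressive decoding: score options, sample one, append, repeat."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(steps):
        probs = next_token_probs(tokens)          # e.g. {"world": 0.31, ...}
        candidates, weights = zip(*probs.items())
        tokens.append(rng.choices(candidates, weights=weights)[0])
    return tokens

# toy "model": always predicts the same continuation
print(generate(["The", "meaning", "of", "life", "is"],
               lambda ctx: {"42": 1.0}, steps=2))
```

Swap the toy lambda for a trained network and this loop is, structurally, how every response you've ever read from a chatbot was produced.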
What decides whether it plays it safe… or gets creative?
Same brain.
Different temperature.
Temperature reshapes the probability distribution. Drag the slider. Watch the output transform.
0.1
Focused 0.0 2.0 Wild
> a question that has been debated by philosophers for centuries.
Same weights. Same architecture. The only difference is how sharply it favors its highest-probability guesses.
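What the slider does, numerically: divide every logit by the temperature before the softmax. Low temperature sharpens the distribution toward the top choice; high temperature flattens it. The logits below are made-up example scores:

```python
import math

def apply_temperature(logits, temperature):
    # T < 1 sharpens the distribution, T > 1 flattens it
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    es = [math.exp(s - m) for s in scaled]
    total = sum(es)
    return [e / total for e in es]

logits = [2.0, 1.0, 0.1]                  # made-up next-token scores
print(apply_temperature(logits, 0.1))     # nearly all mass on the top option
print(apply_temperature(logits, 2.0))     # much flatter: "creative" sampling
```

Note the logits never change, only how sharply the softmax converts their gaps into probability gaps.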
How many numbers does it take to simulate understanding?
The weight of
knowledge.
Every "parameter" is a learned number. Frontier models are built from hundreds of billions of them.
GPT-2 (2019)
1.5B
GPT-3 (2020)
175B
Llama 3 (2024)
405B
Human brain
100T synapses
Public parameter counts exist for some open models. Many frontier closed models do not publish them. Hover the grid below — each cell is a toy weight.
How does a machine "learn"?
Learning by
getting it wrong.
Predict the next token. Measure the error. Adjust the weights. Over time, gradient descent pushes the model toward settings that make better predictions.
Start (random) → Optimal weights
Epoch: 0 — Loss: 4.20
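The same loop, shrunk to a single weight: predict, measure squared error, nudge the weight against the gradient. The data, learning rate, and epoch count are toy values chosen so it converges quickly:

```python
# Fit w so that w*x approximates y, by gradient descent on squared error.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # true relation: y = 2x
w = 0.0          # start from a bad guess
lr = 0.05        # learning rate

for epoch in range(100):
    # d/dw of mean((w*x - y)^2) = mean(2*(w*x - y)*x)
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad   # step downhill

print(round(w, 3))  # → 2.0
```

A frontier model does exactly this, except the "weight" is hundreds of billions of numbers and the "error" is next-token prediction loss over trillions of tokens.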
After pretraining, you can fine-tune or specialize a model on narrower data. Medical notes → medical assistant. Code → coding assistant. Same base machinery. Narrower expertise.
Then human preference training pushes it toward responses people judge more useful, safer, or better aligned.
Adapt.
Act II — What it changes
The displacement question.
Research suggests most knowledge workers will see some tasks affected: one large study estimated that around 80% of US workers could have at least 10% of their tasks touched by LLMs, while roughly one in five could see half or more exposed.
Eloundou et al., arXiv:2303.10130, Mar 2023
These bars are illustrative task exposure, not layoff probabilities.
Translators
very high
Tax preparers
very high
Copy editors
high
Legal assistants
high
Programmers
medium
Teachers
partial
Plumbers
low
Notice the pattern?
The first pressure lands on routine cognitive work: tasks that are symbolic, repetitive, and easy to evaluate.
The plumber's job is safer than the programmer's.
Let that sit.
Not replacement.
Reconfiguration.
ATMs didn't kill bank tellers; teller employment actually grew. But what tellers did changed.
Bessen, "How Computer Automation Affects Occupations," BU, 2016
An MIT experiment found that workers using AI completed tasks 37% faster with 20% higher quality. The gap between top and average performers shrank.
Noy & Zhang, Science, 2023
In many fields, the nearer-term risk is not total replacement. It is a widening gap between people who use AI well and people who do not.
Large institutions expect both job creation and job displacement. The transition may be net positive in aggregate while still being brutal for people caught in the middle of it.
World Economic Forum, "Future of Jobs Report 2023"
The question isn't whether your job changes.
It's whether you change with it.
Act III — What comes next
Where is all this heading?
The coming flood.
The deeper change may be volume. If AI becomes cheap, embedded, and ambient, machine-generated language stops feeling like chatbot output and starts feeling like infrastructure.
billions
of tokens generated daily already
thousands
of AI words a heavy user may see in a day
pages
of machine text many people already consume
cents
or less for a single model response
Not a forecast. A plausible adoption curve.
2026 — Now
AI is still a tool you invoke on purpose. You type. It answers.
2030
It could draft your email, calendar, and code in the background.
2035
Agents could start planning and executing longer chains of work.
2045
For many people, AI could feel less like an app and more like infrastructure.
2060
Machine-to-machine language may outweigh what humans read directly.
2076
A lifetime that starts today could see an extraordinary amount of machine-generated language become ordinary infrastructure.
What makes a human
valuable?
For centuries, we defined value through productivity. What you could make, calculate, write.
If a machine can write a legal brief in 3 seconds, diagnose a scan in 200ms, compose a symphony in 12 seconds,
then the things that make you you aren't the things you produce.
Whether that counts as understanding is still debated.
What models clearly do is predict extremely well.
They do not feel, ache, or fear in the human sense.
One more thing.
Some of the prose you just read was written with AI assistance.
Some by a human alone.
Maybe you could tell. Maybe not.
That uncertainty is part of the point of this page.
The most interesting thing about artificial intelligence
is what it reveals about natural intelligence.
Now go ask it something impossible.
And notice what you feel when it answers.