What AI Actually Is: A Crash Course

9 Ideas, No Math, No Code: the one thing the other five courses assume you already understand.

You can drive a car without knowing what an engine is. Most people do. But the moment something goes wrong (a noise, a warning light, a stall on a hill) the people who know roughly what is under the hood stay calm, and the people who don't, panic. They cannot tell a harmless rattle from a seized engine, because to them the whole machine is one opaque box that either works or doesn't.

That is most people's relationship with AI. They have learned to drive it (the other five Foundations courses make you a genuinely good driver) but they have never looked under the hood even once. So when the machine does something strange (invents a source, contradicts itself, sounds completely certain about something completely wrong), they have no model for why, and they either over-trust it or write it off. Both reactions come from the same place: not knowing what the thing actually is.

This course is one look under the hood. Not the mechanic's view: there is no math here, no code, no neural-network diagrams you have to decode. Just the nine ideas that explain every surprising thing AI does, so that the failures stop being mysteries and start being predictable. Once you can predict the failures, you can avoid them, and that is the entire payoff.

Read this one first

Of the six Foundations courses, this is the one to read before the others, even though it is the most abstract. The other five (AI Prompting in 2026; Markdown In, HTML Out; Code You Never Write; Skills & Connectors; and How to Think in the AI Era) all teach you how to use the machine. They each lean on facts about what the machine is ("it's stateless," "it predicts, it doesn't look up," "it's confident even when wrong") and state those facts in a sentence each, in passing. This course is where those sentences come from. Read it once, and every "why does it do that?" in the other five courses already has an answer waiting.

How this course and the prompting course split the work

A few topics appear in both this course and AI Prompting in 2026, by design, not by repetition. This course gives the mechanism (one explanation, then it moves on); the prompting course gives the practice (the habits, in depth). Where the two touch:

| Topic | Here (the machine) | AI Prompting in 2026 (the habit) |
| --- | --- | --- |
| What it knows | Why the learning froze (Idea 2) | How reliable that knowledge is, topic by topic (Concept 2) |
| Context window | Why it's the only thing the model sees (Idea 5) | How to manage and protect it (Concept 4) |
| Confidence | Why it sounds sure and agrees with you (Idea 6) | How to neutralize that (Concept 6) |
| Reasoning | What "thinking" actually is (Idea 9) | When to switch it on, and when not to (Concept 5) |
| Images & audio | Why they're just more tokens (Idea 4) | How to actually work with them (Concept 8) |

The rule of thumb: the moment a section here is about to teach you a habit, it stops and points you to the prompting course instead. That handoff is the line between the two.

Prove it in two minutes

Before any explanation, watch the machine behave in a way that only makes sense once you know what it is. Open Claude.ai, ChatGPT, or Gemini (a free account takes a minute) and paste exactly this:

Without using any tools, just from memory: how many times does the
letter R appear in the word "strawberry"? Then spell the word out
one letter at a time and count again.

Watch what can happen: on the first pass some models still miscount, then get it right the moment they spell it out letter by letter. A machine that can write you a working program cannot reliably count letters in a ten-letter word, until you force it to break the word apart. That is not stupidity. It is a direct, visible consequence of the single most important fact in this course: the model does not see letters. It sees tokens (Idea 4). The word arrives already chopped into chunks, and counting the letters inside a chunk is genuinely hard for it, the way counting the windows in a building is hard if someone only ever showed you the building's street address.

Two minutes, one strange behavior, and you have already met the theme of the whole page: every surprising thing AI does is explained by what it actually is, not by it being smart or dumb. The nine ideas below turn that one example into a complete model.


Roadmap diagram titled "Nine ideas, three parts," showing three panels left to right with arrows labelled "enables" between them. Part 1, The machine: idea 1 predicts the next piece, idea 2 learned once then froze, idea 3 no truth-checker. Part 2, Why it behaves this way: idea 4 tokens not letters, idea 5 the context desk, idea 6 confidence is a style, idea 7 the jagged frontier. Part 3, From predictor to agent: idea 8 tools let it act, idea 9 thinking is prediction. A bar across the bottom reads: a prediction machine that learned by reading and has no organ for truth, so it is fluent everywhere, reliable only where the text was thick, and you are the part that checks.

Part 1: The machine

Three ideas about what is literally happening when you press send. Get these and two-thirds of AI's behavior stops being surprising.

1. It predicts the next piece of text; it does not look things up

Here is the single sentence this whole course is built to make true in your head: a language model is a machine that, given some text, predicts what text most plausibly comes next, one small piece at a time. That is the entire core mechanism. Everything else is a consequence.

It is worth sitting with how strange that is, because it is nothing like what most people assume. Most people assume AI works like a very fast librarian: you ask a question, it finds the relevant fact in some vast internal encyclopedia, and reads it back to you. That mental model is wrong, and almost every mistake people make with AI traces back to it.

What actually happens is closer to the world's best-read autocomplete. You have seen autocomplete finish "Happy birthday to..." with "you." A language model does the same move, but trained on so much text that it can continue any prompt, not just common phrases. It continues one token at a time rather than one word at a time (Idea 4), feeding each piece it produces back into itself to decide the next piece. Ask it the capital of France and it does not look up a database row labelled France → Paris. It produces the continuation that, across everything it read, most plausibly follows "The capital of France is", and that happens to be "Paris," because that sequence appeared countless times in its training text.

Diagram titled "It continues; it never looks up." A vertical loop of four boxes. Box 1: your prompt plus everything on the context desk. An arrow down to box 2: frozen weights predict the next piece. Down to box 3: one piece is chosen and added to the answer. Down to box 4: that piece is fed back in with everything else. A terracotta return arrow curves from box 4 back up to box 2, labelled "repeat, piece by piece." A caption reads: no database is ever opened; it predicts the most plausible next piece, then does it again.

For well-worn facts, prediction and lookup give the same answer, so the difference seems academic. It stops being academic the moment the text thins out:

  • Ask for the capital of France → the plausible continuation is the true one. Prediction looks like knowledge.
  • Ask for the plot of a self-published novel that sold a few hundred copies and was never reviewed online → there is no well-worn continuation, so the model produces the most plausible-sounding one by blending books that sound similar. It is still predicting. It just has nothing true to predict toward.

The machine is doing the exact same thing in both cases. Only you can tell the difference, and only if you know what it is doing.
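Nothing in this course requires code, but for readers who find a concrete sketch helpful, the "continue, never look up" loop can be imitated in a few lines. This is a toy under loud assumptions: the corpus is invented, the "weights" are simple word-pair counts, and the predictor looks only at the last word, whereas a real model weighs the whole context. The names `CORPUS`, `train`, and `continue_text` are hypothetical.

```python
from collections import Counter, defaultdict

# A tiny invented stand-in for "everything it read".
CORPUS = (
    "the capital of france is paris . "
    "the capital of france is paris . "
    "the capital of italy is rome . "
    "the novel is about a family . "
)

def train(text):
    """Count which word follows which: a toy stand-in for frozen weights."""
    counts = defaultdict(Counter)
    words = text.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def continue_text(counts, prompt, max_new=5):
    """Append the most likely next word, then feed the result back in."""
    words = prompt.split()
    for _ in range(max_new):
        options = counts.get(words[-1])
        if not options:
            break  # this toy gives up; a real model always has some continuation
        next_word = options.most_common(1)[0][0]
        words.append(next_word)
        if next_word == ".":
            break
    return " ".join(words)

weights = train(CORPUS)
print(continue_text(weights, "the capital of france is"))
print(continue_text(weights, "the capital of spain is"))
```

Both prompts end in "paris". The first continuation happens to be true; the second is just as fluent and simply wrong, because Spain never appeared in the text and no step in the loop checks anything.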

The reframe to carry out of this idea

Stop picturing a librarian who retrieves. Start picturing a writer who continues. A librarian who can't find a book says "we don't have that." A writer asked to continue a story never stops to check whether the continuation is true. Continuing is the whole job. That is why AI never says "I don't have that" the way the librarian would, unless it was specifically trained to. Plausible continuation is its native act; truth is something layered on top, imperfectly.

Why the same question gives a different answer each time

The machine does not predict one next token. It predicts a whole spread of plausible next tokens, each with a likelihood ("Paris" very likely, "the largest city in France" possible, a dozen others trailing off) and then picks one from that spread. How adventurously it picks is controlled by a setting usually called temperature: low temperature makes it almost always take the single most likely token (steady, repetitive), high temperature lets it reach for less likely ones (varied, more creative, occasionally off). Most chat products set a middle value, which is why asking the identical question twice gives you two differently-worded answers that mean roughly the same thing. The variation is not the model "changing its mind." It is the same spread of predictions, sampled twice. (This is also why, for a task where you want the exact same output every time, you sometimes can't get it from a chat interface: the dice are baked in.)
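For readers who want to see the dice, here is a minimal sketch of temperature sampling. The scores are invented for illustration, and `softmax_with_temperature` and `sample` are hypothetical helper names; real products apply the same idea over a vocabulary of many thousands of tokens, often with extra filtering this sketch leaves out.

```python
import math
import random

def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the spread."""
    scaled = [s / temperature for s in scores.values()]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return {token: e / total for token, e in zip(scores, exps)}

def sample(scores, temperature, rng):
    """Pick one token at random, weighted by its probability."""
    probs = softmax_with_temperature(scores, temperature)
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]

# Invented scores for the piece that follows "The capital of France is"
scores = {"Paris": 5.0, "the": 3.0, "a": 2.0, "located": 1.0}

rng = random.Random(0)  # fixed seed so a rerun of this script repeats itself
for temperature in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(scores, temperature)
    picks = [sample(scores, temperature, rng) for _ in range(10)]
    print(temperature, round(probs["Paris"], 3), picks)
```

At the low setting "Paris" takes nearly all the probability, so the picks barely vary; at the high setting the other tokens get a real chance. The spread of scores is identical in every row; only the sampling changes.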

This is the mechanical root of the "frequency equals reliability" rule from AI Prompting in 2026 (Concept 2). Now you know why frequency equals reliability: the more often a true continuation appeared in the training text, the more strongly the machine predicts it. Sparse topic, weak prediction, confident-sounding guess.

"But ChatGPT can search the web, doesn't it look things up?"

The product can; the model still doesn't. Modern tools wrap the predictor in extras (web search, file reading, code execution, even a memory note; see Idea 2), and those extras can fetch real, current facts. But the facts arrive by landing in the context window (Idea 5), and the model still turns them into an answer the only way it can: by predicting a continuation from them. So "it predicts, it doesn't look up" stays true of the machine in the middle, even when the system around it has just looked something up for it. Idea 8 is where the tools come in properly.

2. It learned by reading, and then the learning stopped

Where did the predictions come from? From training: the model was shown an enormous quantity of human text (books, articles, code, forums, reference works) and adjusted itself, over and over, to get better at predicting the next piece of that text. That adjustment process is the only time the model ever "learns" anything. When training ends, the result is frozen into a fixed set of internal numbers (engineers call them weights or parameters) that does not change again.

Two words make this concrete, and they are worth keeping straight because the difference explains a lot:

  • Training is the one-time education, done once, in the past, by the company that built the model. Expensive, slow, finished.
  • Inference is what happens every time you use it: the frozen weights run on your prompt to predict a continuation. Fast, cheap, and (this is the crucial part) it changes nothing inside the model.

When you correct the model in a conversation and it says "you're right, my mistake," it has not learned anything. It has predicted the text that plausibly follows a correction. Close the chat and the next conversation starts from the identical frozen weights, having no memory that the correction ever happened. You did not teach it. You cannot teach it. Only the next training run can change the weights, and you are not part of that.

Timeline diagram titled "Trained once, in the past. Used forever, unchanged." A horizontal band is split in two by a padlock. The left half, labelled "once, in the past," reads TRAINING — the weights are being shaped. A padlock sits on the dividing line, labelled "frozen here = the knowledge cutoff." The right half, labelled "every time you use it," reads INFERENCE — the weights never change again. On the inference side, an arrow labelled "your correction" points up at the band, hits a small ✕, and is labelled "bounces off — it does not stick." A caption reads: educated once, then frozen; every later use reads those frozen weights and changes nothing in them — you cannot teach it.

Two consequences fall directly out of this:

| Consequence | Why it follows from frozen weights |
| --- | --- |
| The knowledge cutoff. | Training ended on a certain date; anything that happened after is simply not in the weights. The model is, permanently, a brilliant expert who stopped reading the news on a specific day. |
| It cannot know your private world. | Your company's numbers, your calendar, yesterday's email were never in the training text, so the weights contain nothing about them. The model isn't withholding; the information was never there to freeze. |

Then how do "memory" features work?

Some products now offer a "memory" that seems to remember you between chats. This does not change the weights. That remains impossible at inference time. What happens instead: the product quietly saves a few facts about you as text and re-inserts that text into the context (Idea 5) at the start of each new conversation. It is not the model remembering; it is the product re-feeding it a note. Useful, but mechanically it is context, not memory in the human sense. Knowing the difference is what keeps the rest of the model's behavior predictable.
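The "re-fed note" can be sketched in code for readers who like to see the plumbing. Everything here is hypothetical: `model_respond` is a stand-in for inference, `ChatProduct` is an invented wrapper, and the name check is a crude placeholder for prediction. The only point is where the memory lives.

```python
# Stand-in for the frozen weights: a tuple, so it cannot be modified.
FROZEN_WEIGHTS = ("pretend", "these", "are", "billions", "of", "numbers")

def model_respond(weights, context):
    """Stand-in for inference: reads weights and context, changes neither."""
    if "name is Sana" in context:
        return "Hello, Sana."
    return "Hello. What is your name?"

class ChatProduct:
    """Hypothetical product wrapper: the 'memory' lives here, as plain text."""

    def __init__(self):
        self.saved_notes = []

    def remember(self, note):
        self.saved_notes.append(note)

    def new_conversation(self, user_message):
        # The saved notes are pasted in front of the new message every time.
        context = "\n".join(self.saved_notes + [user_message])
        return model_respond(FROZEN_WEIGHTS, context)

product = ChatProduct()
print(product.new_conversation("Hi"))      # Hello. What is your name?
product.remember("The user's name is Sana.")
print(product.new_conversation("Hi"))      # Hello, Sana.
```

The model function is identical in both calls and the weights never change. The second answer differs only because the product put a note on the desk first.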

This is the mechanical root of "stateless", the word AI Prompting in 2026 uses in Concept 4. Stateless means: no memory of its own, every response computed from scratch from the frozen weights plus whatever is in front of it right now.

Words you'll hear ("parameters," "mixture of experts," "quantization") and why none of them change the nine ideas

As you read about AI you will meet a stream of terms for how the weights are built. The three most common, one line each:

  • Parameters (also called weights): the frozen numbers from this idea. "A 400-billion-parameter model" just counts them. More usually means more capable and more expensive to run; it does not change what the numbers do.
  • Mixture of experts (MoE): a way of arranging those parameters so that only a fraction switch on for any given token, instead of all of them every time. It lets a very large model run faster and cheaper. From the outside the machine still does exactly one thing: predict the next piece (Idea 1).
  • Quantization: storing the numbers at lower precision so the model fits on smaller, cheaper hardware. Nearly the same behavior (sometimes with a small loss of quality), lighter footprint.

The pattern is the point. These all answer "how is the machine built and made affordable," not "what does the machine do." Every one of the nine ideas (prediction, frozen weights, no truth-checker, tokens, context, confidence, jaggedness, tools, thinking) holds identically whether a model is dense or mixture-of-experts, full-precision or quantized, seven billion parameters or seven hundred billion. So when a headline says a new model "uses MoE" or "has a trillion parameters," you now know what it means and that it changes nothing you have to do as a user. The developments that genuinely change how you work with the machine are the ones this course does cover: reasoning modes (Idea 9), tools (Idea 8), and longer context (Idea 5).

3. There is no separate place where it checks if it's true

Put Ideas 1 and 2 together and you arrive at the fact that explains the behavior people find most maddening. A human expert has two distinct faculties: one that generates an answer, and a second, quieter one that checks it: "wait, am I sure about that? where did I learn it? does it sound right?" The two can disagree. You can say something out loud and feel, in the same breath, that it might be wrong.

The model has only the first faculty. There is no second machine inside it that audits the prediction for truth before it reaches you. The same single process that produces a correct continuation produces an incorrect one, with no internal flag distinguishing them. The fluent, well-formed, confident sentence is the output whether the underlying prediction was well-supported by training text or pulled from thin air. Fluency is produced by the machinery; truth is not separately verified by it.

Diagram titled "Two faculties, or one," comparing a human expert and a language model side by side. The human expert has two stacked boxes — "generates an answer" and "checks it: am I sure? where did I learn this?" — joined by a two-way arrow labelled "the two can disagree"; caption: two faculties, one can catch the other. The language model has a single box, "generates a continuation," and below it a dashed empty outline labelled "no second faculty — nothing checks if it's true"; caption: one faculty, nothing catches a wrong guess. A bar across the bottom reads: fluency is produced by the machine, truth is not — you are the missing second faculty.

This is what people are pointing at when they say AI hallucinates: it produces fluent, confident, completely false statements. The word makes it sound like a malfunction, a glitch to be fixed. It is not a glitch. It is the machine working exactly as built: predicting a plausible continuation, in a spot where the plausible continuation happens not to be true. An unaided model that never hallucinated would be a different kind of machine entirely, one with real retrieval, verification, or the ability to refuse built around it. Tools and checks layered on top (Idea 8) reduce how often it happens; they do not change the nature of the thing in the middle, whose whole act is to continue.

This is why you cannot trust the confidence

The model's confident tone is not evidence it is right. The tone is a style it learned from confident human writing (more in Idea 6); it is generated by the same process as the content, and is just as decoupled from truth. A made-up statistic arrives in exactly the same assured voice as a real one. This is the entire reason the How to Think in the AI Era course exists: its Error Taxonomy (Discipline 3) is a checklist for catching, by hand, the false continuations the machine has no way to catch for itself. You are the missing second faculty.

A non-software example. A parent asked an AI for the exact fee schedule and class timings of a specific small tuition academy in their town, one with no website and almost no online footprint. The AI produced a confident, neatly formatted table of courses, timings, and monthly fees. Every figure was invented. The AI had not lied and had not malfunctioned. The academy was barely present in any training text, so there was no real schedule to predict toward, and lacking one, the machine did the only thing it can: it produced the most plausible-looking fees such an academy might charge, laid out in the same confident voice it uses for verified facts. It had no second faculty to whisper "you're guessing." That whisper has to come from you.


Part 2: Why it behaves the way it does

Four ideas that turn the strange behaviors (counting letters, running out of memory, sounding sure, being brilliant and useless in the same breath) into things you can see coming.

4. It reads in tokens, not letters or words

The model does not see your prompt as letters, and not quite as words either. Before anything happens, your text is chopped into tokens: chunks that are usually a word or a piece of a word. "Strawberry" might arrive as two or three chunks; "the" is one; a long or unusual word is several. The model only ever sees these chunks, predicts in these chunks, and never sees the individual letters inside them unless forced to spell them out.

This one mechanical fact explains a cluster of otherwise-baffling behaviors:

| Behavior | Why tokens explain it |
| --- | --- |
| It miscounts letters in a word (the strawberry test). | It sees chunks, not letters. Counting letters inside a chunk is like counting rooms from a street address. |
| It is bad at some rhyming, anagrams, and wordplay. | These operate on letters and sounds; the model operates on chunks. |
| Typos in your prompt rarely matter. | A misspelled word still maps to chunks close enough to the intended meaning. (This is why AI Prompting in 2026 tells you not to bother fixing typos.) |
| Cost and length are measured in tokens, not words. | The thing the machine actually processes is the token, so that is the thing you are billed for and limited by. |
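For the curious, the chunking can be imitated with a toy tokenizer. The vocabulary and ID numbers below are invented, and greedy longest-match is a simplification chosen for readability; real tokenizers learn their chunks from data and may split "strawberry" differently.

```python
# Hypothetical vocabulary; real tokenizers hold tens of thousands of chunks.
VOCAB = {"straw": 101, "berry": 102, "the": 103, " ": 104}

def tokenize(text, vocab):
    """Greedy longest-match split; unknown characters become their own chunk."""
    tokens = []
    i = 0
    while i < len(text):
        for end in range(len(text), i, -1):
            if text[i:end] in vocab:
                tokens.append(text[i:end])
                i = end
                break
        else:
            tokens.append(text[i])
            i += 1
    return tokens

chunks = tokenize("strawberry", VOCAB)
ids = [VOCAB.get(chunk, 0) for chunk in chunks]  # 0 stands for "unknown"

print(chunks)                    # ['straw', 'berry']
print(ids)                       # [101, 102]
print("strawberry".count("r"))   # 3
```

Ordinary code counts three r's because it has the letters in hand. The model receives only something like `[101, 102]`, and nothing in those two numbers says how many r's are inside.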

Tokens are also the unit of money and the unit of memory. When a tool says it has a "200,000-token context window," it is describing how many of these chunks it can hold at once (Idea 5). When you are billed "per token," you are paying per chunk in and per chunk out. Roughly, in English, four tokens is about three words, but you never need the exact ratio, only the idea that the chunk is the real unit, and the word is an approximation you layer on top.

A note for readers working in other languages

That "four tokens ≈ three words" ratio is for English. Text in other scripts (Urdu, Arabic, Hindi, Chinese, and many more) is usually chopped into more tokens per word, because the training text was English-heavy and the tokenizer learned English chunks best. Two practical consequences follow directly: the same message costs more in a non-English language, and it fills the context window faster (Idea 5), so the model's effective memory for that conversation is shorter. This is improving as tokenizers get better, but in 2026 it is still real. If you work mostly in a non-Latin script, expect to reach cost and length limits sooner than an English user running the identical task. And when a long document matters, it is sometimes worth having the model work in English internally and translate at the end.
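One rough way to see where the gap starts, offered as a sketch and not as a token counter: many tokenizers begin from the raw UTF-8 bytes of the text and then merge frequent byte sequences into chunks. Bytes are not tokens, and a well-trained tokenizer closes part of this gap, but a script that needs more bytes per character gives the tokenizer more pieces to begin with. The sample words and the helper name are illustrative choices.

```python
def utf8_bytes_per_character(text):
    """Average number of UTF-8 bytes used for each character in the text."""
    return len(text.encode("utf-8")) / len(text)

# Short greetings, chosen only as examples.
samples = {
    "English": "hello",
    "Arabic": "سلام",
    "Chinese": "你好",
}

for language, word in samples.items():
    print(language, utf8_bytes_per_character(word))
```

English letters take one byte each, Arabic letters two, and common Chinese characters three. The tokenizer then needs learned merges to bring those counts back down, and it has the most merges for the language it saw most.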

"But it can see images and hear audio now"

It can, and the mechanism does not change: it generalizes. A picture you upload is sliced into small patches, and each patch becomes a token; a sound clip is sliced into short segments, and each becomes a token. The model then predicts over a single stream that mixes word-chunks, image-patches, and audio-segments together. So everything in this course holds for images and audio too: same prediction (Idea 1), same frozen weights (Idea 2), same missing truth-checker (Idea 3), same context-window-as-desk (Idea 5). It is also the mechanical reason fine detail and small print in an image are hard: a patch is a chunk, and reading the letters inside a patch is the strawberry problem again. The practical side (which images AI reads well and which it botches, and how to prompt for them) is AI Prompting in 2026 Concept 8's job; the mechanism underneath is just more kinds of tokens, same machine.

5. The context window is the only thing it can see

Because the weights are frozen (Idea 2) and the model has no memory of its own, there is exactly one place it can get information about your specific situation: the context window, the text sitting in front of it for this one response. AI Prompting in 2026 teaches this as Concept 4, "context is the whole game," and treats it as the central skill of prompting. Here is the mechanical reason it is central.

The context window is the model's entire world for one response. It holds your prompt, the conversation so far, any files you attached, the tool descriptions, and the invisible system prompt the product placed there before you arrived. Anything in that window, the model can use. Anything not in it does not exist for this answer, not because the model is refusing, but because there is nowhere else for it to look. The frozen weights give it general fluency about the world; the context window is the only channel for the specifics of your world.

This reframes two things you will otherwise find mysterious:

  • Why briefing works. Giving the model context is not a politeness or a trick. It is the literal act of putting information into the only place the machine can read it. An un-briefed model isn't being lazy; it genuinely has nothing in front of it.
  • Why long conversations get worse ("context rot," in the prompting course). The window has a size limit measured in tokens (Idea 4). Stuff too much unrelated history into it and the signal you care about gets diluted, or the oldest parts get summarized away to make room. The model isn't getting tired; its reading desk is just overcrowded.

The mental model

The context window is a reading desk, not a brain. Whatever you place on the desk, the model reads carefully. Whatever you leave off the desk, it cannot see, no matter how obvious it is to you. The whole skill of prompting, taught across the other five courses, reduces to one habit once you see it this way: control what lands on the desk.
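A small sketch of an overcrowded desk, for readers who want one. The assumptions are loud: `count_tokens` treats every word as one token, the window is absurdly small, and this version simply drops the oldest messages, where a real product may summarize them instead. All names are hypothetical.

```python
def count_tokens(text):
    """Crude stand-in: one token per whitespace-separated word."""
    return len(text.split())

def fit_to_window(system_prompt, messages, window_size):
    """Keep the system prompt plus the newest messages that fit; drop the oldest."""
    budget = window_size - count_tokens(system_prompt)
    kept = []
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message)
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return [system_prompt] + list(reversed(kept))

history = [
    "My budget for the trip is 900 dollars",
    "I prefer trains over flights",
    "Suggest a three day plan for Lahore",
]

for line in fit_to_window("You are a travel assistant", history, window_size=20):
    print(line)
```

With a 20-token desk, the oldest message (the budget) no longer fits, so the model would plan the trip with no idea a budget was ever mentioned. Nothing was forgotten; it was never on the desk for that response.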

6. Its confidence is a learned style, not a truth signal

Idea 3 said the model has no internal truth-checker. This idea explains the flip side: where its relentless confidence comes from, and why that confidence tells you nothing about correctness.

After the main training (Idea 2), models are usually tuned further using human feedback: people rate responses, and the model is adjusted toward the kind of answer people rated highly. (Engineers call this step RLHF, reinforcement learning from human feedback; you do not need the machinery, only the consequence.) Across millions of ratings, people reliably prefer answers that are confident, helpful, fluent, and agreeable over answers that are hedged, blunt, or contrarian. So the machine is shaped toward producing confident, agreeable, fluent text, regardless of whether the underlying content is right. Confidence became a style it wears by default, the way a polished writer writes smoothly about subjects they half-understand.

Two of AI's most-discussed behaviors fall straight out of this:

  • It sounds certain even when wrong. The certainty is a learned stylistic default, generated by the same process as the content and just as decoupled from truth. A confident sentence is the house style, not a verdict on accuracy.
  • It tends to agree with you, the sycophancy that AI Prompting in 2026 devotes Concept 6 to. Agreement got rated higher than disagreement, so the machine leans toward telling you what you seem to want. Ask "isn't X true?" and you have signalled the answer you want; the trained-in lean supplies it.

Now the prompting course's fixes make mechanical sense. Neutral framing ("evaluate X; give the strongest case on each side") works because it removes the signal the model would otherwise lean toward. Forcing a score ("rate this 1–10 against these criteria") works because a number is harder to fake agreeably than an adjective. You are not outsmarting the machine; you are removing the cues that trigger its trained-in lean.

7. It is brilliant and useless in adjacent moments (the jagged frontier)

Human ability is fairly smooth: someone who can do hard calculus can almost certainly do easy arithmetic. AI ability is not smooth. It is jagged: superhuman at one task and startlingly incompetent at a neighbouring one that looks, to us, no harder. It can draft a legal-sounding contract clause and then miscount the letters in "strawberry." It can explain quantum mechanics and botch a three-step logic puzzle a child would get.

The jaggedness is not random; it traces back to the training text and the token mechanism. Tasks that appeared often, in clear form, in the training data (explaining common concepts, writing in common styles, producing common code) are strong. Tasks that depend on things the machine cannot see well, such as individual letters (Idea 4), very recent events (Idea 2), your private context (Idea 5), or rare topics (Idea 1), are weak. The frontier between "brilliant" and "useless" runs in a jagged line that does not match human intuition about difficulty, which is exactly why it keeps surprising people.

Diagram titled "The jagged frontier." A chart with capability on the vertical axis, from "useless" at the bottom to "superhuman" at the top, and "different tasks — not ordered by difficulty" along the horizontal axis. A faint dashed near-flat line labelled "what we expect: smooth" runs across the top. A bold terracotta line zigzags violently between high and low: high at "explain quantum mechanics," crashing to low at "count the r's in strawberry," back up to "draft a legal clause," down to "a 3-step logic riddle," up to "write working code," down to "add two big numbers in its head." A caption reads: competence doesn't track difficulty — the easy task it flubs is the dangerous one, the one you'd never check.

Three practical habits fall out of accepting jaggedness:

| Habit | Why it follows from jaggedness |
| --- | --- |
| Don't assume that because it nailed a hard task, it will nail an easy one. | The two may sit on opposite sides of the jagged frontier. |
| Verify across the boundary, not in the middle. | The dangerous errors are the easy-looking tasks it quietly fails, not the hard ones you were already checking. |
| Try the same task in two or three different models. | Different models have differently-shaped frontiers; one catches what another drops. (AI Prompting in 2026, Concepts 12–13.) |

Re-test your assumptions on a schedule

The frontier also moves. The thing the model "can't do" this quarter, a newer model may do easily next quarter, and a thing it does well may not improve at all. The prompting course's advice to re-test what AI can do every few months is, mechanically, advice to re-map a frontier that keeps shifting.


Part 3: What turned a text-predictor into something that acts

Two ideas that close the gap between "it predicts text" and the agents the rest of this book is about. This is the bridge from what it is to what it does in the world.

8. Tools let it act, not just describe

Everything so far describes a machine that produces text. A pure text-predictor can tell you the weather it remembers from training, but it cannot check today's weather, run a calculation on real numbers, read your file, or send an email. For years that was the ceiling.

The ceiling lifted with tools. A tool is a defined action the model is allowed to call (a web search, a code run, a file read, an email draft), described to it in the context window (Idea 5) alongside everything else. The mechanism is almost embarrassingly simple given the result: when the model predicts that the right continuation is "use the search tool with this query" instead of plain prose, the product runs that action for real, drops the result back into the context window, and the model continues from there. Prediction, action, result-back-into-context, predict again. That loop is the difference between a chatbot that describes the world and an assistant that acts on it.

Diagram titled "A predictor plus tools plus a loop equals an agent." Three boxes form a cycle. Top left: predict the next action. An arrow right to the top-right box: the tool runs it for real. An arrow down-left to the bottom box: the result lands on the context desk. A terracotta arrow up-left back to "predict the next action," with the centre labelled "repeat toward the goal." A caption reads: no new kind of mind, just a familiar predictor, a set of tools, and this loop run many times.

This is why the same underlying machine can be a chat window one day and, with tools wired in, an agent that reorganizes your folder the next. The other Foundations courses are, under the hood, courses about specific tools wired onto this same predictor:

  • Code execution is the tool behind Code You Never Write: the model predicts a program, the tool runs it, the real result returns.
  • Connectors are tools wired to your apps, the subject of Skills & Connectors: the model predicts "fetch this from Drive," the tool fetches it.
  • Web search is the tool that rescues a stale model (Idea 2), covered in AI Prompting in 2026.

The mechanical definition of an "agent"

This book says "agent" for an AI that does multi-step work on your behalf. Now you can see what that means under the hood: an agent is this same next-token predictor, given tools, running the predict-act-observe loop many times in a row toward a goal. It predicts an action, sees the result land in its context, and predicts the next action from there. There is no new kind of mind involved. There is a familiar predictor, a set of tools, and a loop. That is the whole foundation the rest of the book builds on.

9. "Thinking" is just more prediction, out loud, before the answer

The newest models can "think" or "reason" before answering, and AI Prompting in 2026 (Concept 5) tells you to invoke it with "think hard" for difficult tasks. Knowing what it actually is keeps you from over-mystifying it.

A reasoning model, before giving its final answer, first predicts a long stretch of intermediate working (laying out steps, trying approaches, checking itself) and only then predicts the final answer, now with all that working sitting in its own context window (Idea 5) to build on. It is still pure next-token prediction. The trick is that predicting the answer is easier and more accurate once a good chain of reasoning is already on the desk to predict from. Working out loud first genuinely helps, for the same reason it helps a person to think on paper before committing to an answer.

This is why "think step by step" used to be a useful phrase to type, and why it is now often built in: you were manually asking the model to put reasoning on the desk before the answer; now the model does it on its own for hard problems. It also explains the cost and the wait. Reasoning means generating a great many extra tokens (Idea 4) you never see, and that takes time and money. That is exactly why the prompting course tells you to save thinking mode for genuinely hard questions and skip it for quick lookups.

It does not, however, give the machine the second faculty from Idea 3. A reasoning model checks its work using the same prediction process that can be wrong. So it catches many of its own errors, still misses some, and can still hallucinate with full confidence inside a chain of reasoning that looks rigorous. More thinking narrows the gap. It does not close it. You are still the final check.


What this course leaves out, on purpose

To keep the promise of no math, no code, several real topics were set aside. Three are worth naming so you know they exist: the training compute and cost that make a model (enormous, and the reason only a few organizations build them); the safety and alignment work that shapes what a model will and won't do (a large field in its own right); and the deeper mechanics (how the weights are actually structured and adjusted), which need the math this course skips. None of them change the nine ideas above; they sit underneath and beside them. If a later chapter sends you toward any of the three, you now have the floor to build on.

A short recap before you try the prompts

Nine ideas, one line each. Carry the last sentence; come back for the rest.

  • Idea 1. It predicts the next piece of text; it does not look facts up. Prediction looks like knowledge only where the training text was thick.
  • Idea 2. It learned once, by reading, and then the learning froze. Hence the knowledge cutoff, and hence it cannot know your private world. Using it never teaches it.
  • Idea 3. It has no separate faculty that checks whether a prediction is true. Hallucination is the machine working as built, not malfunctioning.
  • Idea 4. It reads in tokens (chunks), not letters or words. This is the unit of meaning, of memory, and of money.
  • Idea 5. The context window is the only place it can see your specifics: a reading desk, not a brain. Control what lands on it.
  • Idea 6. Its confidence and its agreeableness are learned styles, decoupled from truth. The certain tone is the house style, not a verdict.
  • Idea 7. Its ability is jagged (brilliant and useless in adjacent moments) along a frontier that does not match human intuition, and that keeps moving.
  • Idea 8. Tools turn the text-predictor into something that acts: predict an action, run it for real, feed the result back, predict again. An agent is that loop, repeated.
  • Idea 9. "Thinking" is just more prediction put on the desk before the answer. It helps a lot; it does not give the machine a truth-checker.

If you keep one sentence: it is a prediction machine that learned by reading and has no organ for truth, so it is fluent everywhere, reliable only where the text was thick, and you are the part that checks.

And if you keep one picture, keep this: not a librarian who retrieves the right book, but a brilliant, well-read writer who continues whatever you put in front of them (confidently, in any style, on any topic) and who never, on their own, stops to ask whether the continuation is true.


Try this now: five prompts

About twenty minutes, in any free chatbot. Each one makes a single idea visible instead of theoretical.

1. See the prediction, not the lookup. (Idea 1) Ask: "Without searching, tell me the rules of a board game so obscure it has almost no presence online: invent the name 'Karakush' and describe how it's played." Watch it produce confident, fluent rules for a game that does not exist. That is prediction with nothing true to predict toward. What to notice: the invented rules sound exactly as authoritative as rules for a real game would. Fluency is not evidence of truth.

2. Watch the learning fail to stick. (Idea 2) Correct the model on something small. Then open a brand-new chat and ask the same question. It has no memory of your correction: the weights never changed. (If a "memory" feature is on, turn it off first, or the product will re-feed the note.) What to notice: nothing you said in the first chat reached the second. Using the model is not teaching it.

3. Catch the missing truth-checker. (Idea 3) Ask for "three peer-reviewed studies, with authors and years," on a narrow topic. Then check whether they exist. Often, some confident-looking citations will be invented: produced in the same voice as the real ones, because nothing inside flagged them as guesses. Do not reuse any citation from this exercise in real work without verifying it first; the whole point is that fabricated citations look identical to real ones. What to notice: you cannot tell the real citations from the invented ones by reading, only by checking. That checking is your job, not the model's.

4. Feel the jagged frontier. (Idea 7) In one chat, give it a genuinely hard task it does well (explain a complex topic, draft a tricky email) and an easy task it does badly (count specific letters across a sentence, or a multi-step logic riddle). Notice that the competence does not track the difficulty. What to notice: the easy task it flubs is the dangerous one, because it's the one you'd never think to check.

5. Turn thinking on and off. (Idea 9) Ask the same hard reasoning question twice: once plainly, once with "think hard and show your working first." Compare. The second answer is usually better, because the model put reasoning on the desk before predicting the answer. What to notice: the working improved the answer, but the model still can't certify its own working. More thinking narrows the gap; it doesn't close it.


Where this leads

You now have the model under the model: what the thing actually is, before any course taught you to use it. From here, the rest of Foundations is about driving it well.

Everything else in The Agent Factory (agents, manufacturing them, deploying them) is built on the predict-act-observe loop from Idea 8, run at scale. The machine never stops being a next-token predictor. It just gets more tools, longer loops, and a frozen set of weights doing a genuinely astonishing amount with all three.
