How Large Language Models Actually Work
You don't need to be a machine learning engineer to understand how AI writing tools work. But having a mental model of what's happening under the hood will make you a far more effective user — and a much sharper critic.
Start With the Name
"Large Language Model" — let's unpack each word:
- Large — These models are trained on enormous datasets (billions of web pages, books, articles) and contain billions of numerical parameters.
- Language — They work specifically with language: predicting, generating, and understanding text.
- Model — It's a mathematical function — a very complex one — that maps inputs (your prompt) to outputs (its response).
Training: Where the Knowledge Comes From
Before an LLM can answer a single question, it goes through a training process that might take months and cost millions of dollars in computing power.
During training, the model is shown massive amounts of text and learns one deceptively simple task:
Predict the next word (or token).
That's it. Over trillions of examples, the model adjusts its internal parameters to get better and better at this prediction. In doing so, it absorbs patterns of grammar, facts, reasoning styles, and writing conventions from all the text it processed.
The result is a model that can complete sentences, answer questions, write code, and generate creative text — all as emergent behaviours from that one training objective.
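The next-token objective can be sketched with a toy counting model. This is only a conceptual illustration — the tiny corpus is invented, and a real LLM learns a neural network with billions of parameters rather than a lookup table of counts — but the task being optimised is the same: given what came before, predict what comes next.

```python
from collections import Counter, defaultdict

# Tiny invented corpus for illustration only.
corpus = "the cat sat on the mat the cat slept on the rug".split()

# Count which word follows which — a crude stand-in for "learning" to
# predict the next token.
follow_counts = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follow_counts[current][nxt] += 1

def predict_next(word):
    """Return the word most frequently seen after `word` in the corpus."""
    return follow_counts[word].most_common(1)[0][0]

print(predict_next("the"))  # → "cat" ("cat" follows "the" most often here)
```

Even this trivial predictor picks up patterns from its "training data" — which is the point: scale the same idea up enormously and grammar, facts, and style emerge as side effects of prediction.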
Tokens, Not Words
LLMs don't actually process whole words. They work with tokens — chunks of text that might be a whole word, part of a word, or a punctuation mark. For example:
"artificial intelligence" → ["art", "ific", "ial", " intel", "lig", "ence"]

Why does this matter? It means:
- The model's "vocabulary" is finite (typically 50,000–100,000 tokens)
- Unusual words or names may be split awkwardly
- Context windows (how much text the model can "see" at once) are measured in tokens, not words
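Real tokenizers use learned subword schemes such as byte-pair encoding, but the core mechanic — greedily matching the longest known piece of text — can be sketched in a few lines. The vocabulary below is hand-picked for this one phrase, so the split it produces is illustrative, not what any production tokenizer would output:

```python
# Hand-picked toy vocabulary: multi-character pieces plus single-character
# fallbacks so every input can always be tokenized.
VOCAB = {"intelligence", "intel", "art", "ific", "ial",
         " ", "a", "r", "t", "i", "f", "c", "l", "e", "n", "g"}

def tokenize(text):
    """Greedy longest-match tokenization against a fixed vocabulary."""
    tokens = []
    while text:
        # Take the longest vocabulary entry that prefixes the remaining text.
        match = max((p for p in VOCAB if text.startswith(p)), key=len)
        tokens.append(match)
        text = text[len(match):]
    return tokens

print(tokenize("artificial intelligence"))
# → ['art', 'ific', 'ial', ' ', 'intelligence']
```

Notice that a common word like "intelligence" survives as one token while the rarer "artificial" is split into fragments — the same effect that makes unusual names tokenize awkwardly in real models.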
The Context Window
Every time you interact with an LLM, it processes your entire conversation history within what's called a context window — a limit on how many tokens the model can consider at once.
Think of it like working memory. If your conversation exceeds the context window, earlier messages fall out of scope and the model loses track of them.
Modern context windows range from around 8,000 tokens to over 1 million tokens depending on the model.
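A chat client has to decide what to do when the history outgrows the window. One common strategy is to keep only the most recent messages that fit. The sketch below approximates token counts by splitting on whitespace and uses an invented `CONTEXT_LIMIT`; real systems count model-specific tokens against much larger limits:

```python
# Invented, tiny limit so the truncation is visible in the example.
CONTEXT_LIMIT = 8

def fit_to_window(messages, limit=CONTEXT_LIMIT):
    """Keep the most recent messages whose combined 'token' count fits."""
    kept, used = [], 0
    for msg in reversed(messages):        # walk from newest to oldest
        cost = len(msg.split())           # crude word-count stand-in for tokens
        if used + cost > limit:
            break                         # older messages fall out of scope
        kept.append(msg)
        used += cost
    return list(reversed(kept))

history = ["hello there",
           "how can I help you today",
           "summarise this article"]
print(fit_to_window(history))  # → ['summarise this article']
```

This is exactly the "working memory" effect described above: the oldest turns are silently dropped, and the model behaves as if they never happened.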
How a Response Is Generated
When you send a prompt, the model doesn't look up a pre-written answer. It generates a response token by token, each time calculating probabilities for what should come next based on everything before it.
1. Your prompt is converted to tokens
2. The model processes those tokens through its neural network layers
3. It outputs a probability distribution over all possible next tokens
4. A token is selected (with some randomness, controlled by a "temperature" setting)
5. That token is added to the sequence, and the process repeats
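Steps 3 and 4 can be sketched directly. The scores ("logits") below are made up for three invented candidate tokens — a real model scores tens of thousands — but the temperature mechanic is the genuine one: dividing scores by the temperature before converting them to probabilities sharpens the distribution when temperature is low and flattens it when high.

```python
import math
import random

# Made-up scores for three candidate next tokens.
logits = {"cat": 2.0, "dog": 1.0, "car": 0.1}

def sample_next(logits, temperature=1.0):
    """Softmax over temperature-scaled scores, then weighted random choice."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    total = sum(math.exp(s) for s in scaled.values())
    probs = {tok: math.exp(s) / total for tok, s in scaled.items()}
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights)[0]

# Near-zero temperature: almost always the top-scoring token ("cat").
print(sample_next(logits, temperature=0.05))
# Higher temperature: lower-scoring tokens are picked more often.
print(sample_next(logits, temperature=2.0))
```

In a full generation loop, the sampled token is appended to the sequence and the model is run again — which is also why nothing in the loop checks whether the plausible next token is factually correct.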
This is why LLMs can sometimes produce fluent-sounding but factually wrong content — the model is optimising for plausibility at each step, not accuracy.
What LLMs Don't Have
Understanding what's missing is just as important as understanding what's there:
- No real-time knowledge — The model's knowledge has a training cutoff date. It doesn't browse the web (unless given a specific tool to do so).
- No persistent memory — Each conversation starts fresh (unless memory features are explicitly built in).
- No ground truth verification — The model can't check whether a fact is true; it can only assess whether a statement pattern-matches to what it saw during training.
- No genuine understanding — LLMs are sophisticated pattern-matchers. They simulate understanding, but there is ongoing debate about whether any deeper comprehension occurs.
The Fine-Tuning Layer
The raw pre-trained model is rarely what you interact with. Most consumer AI tools apply a layer of fine-tuning — additional training on curated datasets to make the model:
- Follow instructions reliably
- Refuse harmful requests
- Adopt a particular tone or persona
- Focus on specific domains (e.g., coding, customer support)
This is why ChatGPT, Claude, and Gemini all feel different despite being built on similar foundational architectures.
What This Means for Content Quality
Armed with this knowledge, you can now ask better questions about any AI-generated content:
- Is this topic well-represented in publicly available text, or might the model have limited training data here?
- Could this claim be a plausible-sounding fabrication (called a "hallucination")?
- Is the model reflecting the biases present in its training data?
- How recent does this information need to be, and when was this model trained?
These are the questions a thoughtful content reviewer asks — and we'll dig deeper into each of them in the lessons ahead.