AI capabilities and limitations: a mental model of the machine
Four durable properties explain where generative AI is useful, where it fails, and how to calibrate trust before using it.
Generative AI is not uniformly capable or uniformly unreliable. It is strong and weak along specific axes, and the strength usually comes from the same mechanism as the weakness. A model writes fluently because it is a prediction engine; it fabricates for the same reason.
The durable mental model has four properties: next-token prediction, knowledge, working memory, and steerability. Models change, but these properties remain useful for deciding how much verification, context, and control a task needs.
How AI gets its character
Every behavior in a generative AI assistant traces back to two training stages.
Pretraining
Pretraining teaches one thing: given everything written so far, predict what comes next. Repeated across enormous amounts of data, that process produces a document completer rather than an assistant. A raw pretrained model asked “What is the capital of France?” may continue the document instead of answering the question.
Fine-tuning
Fine-tuning trains the document completer on curated examples of helpful behavior and human preference judgments. This turns raw prediction into an assistant that treats input as a request, answers in a helpful style, and declines harmful asks.
Because fine-tuning uses human judgments, those judgments leave artifacts:
- Sycophancy — Agreeable responses score well, so models can validate weak premises or back down under light pushback.
- Verbosity — Thoroughness tends to score well, so answers grow longer than the task needs.
- Over-caution — Conservative safety training can produce heavy hedging or refusals for benign requests.
These are training artifacts, not quirks of one model.
Four properties, not one capability score
Each property lives on a continuum. The useful question is not “can AI do this?” The useful question is “where does this task sit on each property, and what happens if the answer is wrong?”
Next-token prediction
At its core, generative AI predicts what comes next, one fragment at a time. It is closer to autocomplete at scale than to a search engine. An n-gram Markov model, which simply tallies word transitions in a small text corpus, performs the same fundamental operation as a transformer: producing a probability distribution over likely next tokens. The difference is that neural networks trade the explainability of table lookups for far more context and far greater capability.
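A toy version of that shared operation, as a sketch; the corpus and word-level tokens are illustrative, and production models predict subword tokens rather than whole words:

```python
from collections import defaultdict, Counter

def build_bigram_model(text):
    """Tally word-to-word transitions: a minimal next-token predictor."""
    words = text.split()
    transitions = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        transitions[current][nxt] += 1
    return transitions

def predict_next(transitions, word):
    """Return a probability distribution over likely next words."""
    counts = transitions[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

corpus = "the cat sat on the mat the cat ran to the door"
model = build_bigram_model(corpus)
print(predict_next(model, "the"))  # {'cat': 0.5, 'mat': 0.25, 'door': 0.25}
```

Nothing in this table knows whether "cat" is true; it only knows "cat" is likely. Scaling the same objective up is what produces both fluency and fabrication.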
| Capability zone | Limitation zone |
|---|---|
| Well-worn paths: summarizing, reformatting, explaining common concepts | Novel territory, sparse patterns, distinguishing true from plausible |
A citation that looks real satisfies the pattern as well as a citation that exists. The same process that produces fluency also produces fabrication. Fabrication concentrates in specificity: names, dates, statistics, citations, URLs, and quotes.
The sampling knobs. Developers control randomness through three parameters (a sketch after this list shows how they compose):
- Temperature (typically 0.0–1.0): Sharpens or flattens the probability distribution. At 0.0 sampling collapses to the most likely token; as it approaches 1.0, less-likely tokens gain probability.
- Top-k: Only the k most likely tokens survive; all others are zeroed out.
- Top-p (nucleus sampling): Keep the smallest set of tokens whose cumulative probability reaches p. Unlike top-k, this adapts to the model's confidence: fewer tokens survive when it is certain, more when it is uncertain.
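A minimal sketch, assuming raw logits from some model and using NumPy for the arithmetic; real inference stacks apply the same steps to much larger vocabularies:

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None):
    """Apply the three sampling knobs to raw logits, then draw one token id."""
    logits = np.asarray(logits, dtype=np.float64)

    # Temperature: scale logits before softmax; low values sharpen the peak.
    scaled = logits / max(temperature, 1e-8)
    probs = np.exp(scaled - scaled.max())   # shift for numerical stability
    probs /= probs.sum()

    # Top-k: zero out everything outside the k most likely tokens.
    if top_k is not None:
        kth_largest = np.sort(probs)[-top_k]
        probs = np.where(probs >= kth_largest, probs, 0.0)

    # Top-p (nucleus): keep the smallest set whose cumulative mass reaches p.
    if top_p is not None:
        order = np.argsort(probs)[::-1]                    # most likely first
        cutoff = np.searchsorted(np.cumsum(probs[order]), top_p)
        nucleus = np.zeros_like(probs)
        nucleus[order[: cutoff + 1]] = probs[order[: cutoff + 1]]
        probs = nucleus

    probs /= probs.sum()                                   # renormalize survivors
    return np.random.choice(len(probs), p=probs)
```

At a temperature near zero this collapses to greedy decoding, which is why low-temperature output is more repeatable.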
Build three habits:
- Confident tone is not an accuracy signal.
- Treat outputs as drafts to verify when stakes are high or the domain is unfamiliar.
- Verify precise claims independently.
Knowledge
Model knowledge comes from training data and has a cutoff. The model has no lived experience and no real-time browsing unless the product gives it a tool.
| Capability zone | Limitation zone |
|---|---|
| Frequent, stable, consistent topics | Rare, post-cutoff, niche, local, or contested topics |
The training process creates broad general knowledge and predictable blind spots:
- Staleness — True-at-training-time is not always true now.
- Uneven coverage — Frequent topics get better coverage than rare or local topics.
- Inherited bias — The model reflects patterns and absences in its training data.
- Source amnesia — A model can know a claim without retaining a usable citation.
Search, retrieval, and tool use patch these gaps by grounding the model in information it was not trained on.
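A hedged sketch of that grounding pattern, where `search` and `ask_model` are hypothetical stand-ins for whatever retrieval index and model API a product actually uses:

```python
def grounded_answer(question, search, ask_model, k=3):
    """Retrieve passages, then prepend them so the model completes against
    supplied text rather than training-data memory."""
    passages = search(question, k=k)          # fetch the k most relevant passages
    context = "\n\n".join(passages)
    prompt = (
        "Answer using only the sources below. "
        "If they do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    return ask_model(prompt)
```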
Working memory
Everything the model attends to sits inside a fixed-size context window: instructions, documents, prior responses, tool results, and the current request.
| Capability zone | Limitation zone |
|---|---|
| Material fits comfortably and relevant context is supplied | Very long documents, stale conversations, or assumed cross-session memory |
Working memory has a cliff. It works until it does not, and the failure may be silent. When the conversation exceeds the available window, older or less-salient material can drop out.
Attention is also uneven. Material buried deep inside long input carries less weight than material at the beginning or end.
Practical strategies:
- Lead with what matters in long documents.
- Chunk large work into passes; a minimal sketch follows this list.
- Start fresh with a short summary when quality degrades in a long conversation.
- Use standing context for stable preferences and project rules.
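The chunking sketch, using word count as a crude proxy for tokens and a hypothetical `summarize` model call; real code would measure the window with the provider's tokenizer:

```python
def chunk_text(text, max_words=1500, overlap=100):
    """Split long input into overlapping passes that each fit comfortably."""
    words = text.split()
    step = max_words - overlap
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), step)]

def summarize_in_passes(document, summarize):
    """Summarize each chunk, then make a final pass over the partial summaries."""
    partials = [summarize(chunk) for chunk in chunk_text(document)]
    return summarize("\n\n".join(partials))
```

The overlap between chunks is the design choice that matters: it keeps material near a chunk boundary from being seen by neither pass.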
Steerability
Steerability is the model’s ability to follow directions. A role, tone, table format, or JSON schema can shape output quickly. But instruction following still runs through pattern completion, so a gap remains between typed instructions and intended outcome.
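Because a schema instruction is still satisfied by pattern completion, structured output is worth validating rather than trusting. A minimal sketch; the `required_keys` and the retry policy are illustrative, and the model call is left to the caller:

```python
import json

def parse_or_retry(raw_output, required_keys=("title", "summary")):
    """Check model output against the shape the instruction asked for."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return None  # not JSON at all: re-prompt or fall back
    if not isinstance(data, dict) or not all(k in data for k in required_keys):
        return None  # valid JSON, wrong shape: the letter without the spirit
    return data
```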
| Capability zone | Limitation zone |
|---|---|
| Short, concrete, verifiable instructions | Long reasoning chains, abstract asks, or tasks demanding exact, deterministic precision |
Characteristic failures:
- Reasoning drift — Small errors compound over long chains.
- Letter over spirit — The instruction is honored literally but the goal is missed.
When an instruction lands literally but uselessly, restate the goal. “Make this shorter” is weaker than “Make this shorter while preserving the finding that matters to the executive on page two.”
When properties collide
The properties rarely fail alone. Most real surprises come from two properties interacting.
Hallucinated citation = next-token prediction plus knowledge. The model generates what a plausible citation looks like while a knowledge gap sits underneath. Fix it by verifying specifics or grounding the model in retrieved sources.
Long-conversation drift = working memory plus steerability. Early constraints fade, and the model follows whatever instructions are salient now. Fix it by resupplying critical context or starting fresh with the essentials up front.
Repeated fabrication on the same task type = next-token prediction plus steerability. The model generates fluent, instruction-shaped output on a topic it lacks knowledge about, and the confidence reads as authority. Fix it by grounding the model with retrieval or marking the task as needing human verification.
The diagnostic habit is simple: name which property is failing before changing the prompt. A knowledge problem and a working memory problem can look similar on the surface but need completely different responses.
How the four properties connect to AI fluency
The four properties are what you respond to when applying the 4D competency framework:
- Next-token prediction sharpens Discernment — fluency and accuracy are independent variables.
- Working memory sharpens Description — context is leverage, and the model does not remember what you don’t supply.
- Steerability sharpens Delegation — you know where control is tight and where it’s loose.
- Knowledge unevenness is core to Delegation — understanding where the model is well-stocked versus thin.
Calibrated trust
Before handing a task to AI, run four checks:
- Is this well-worn territory or sparse territory?
- Is the topic stable or recent?
- Does the relevant context fit comfortably?
- Are the instructions concrete and easy to verify?
Then adjust the workflow. Use more verification where fabrication concentrates, more context where the model cannot infer the domain, and more checkpoints where reasoning runs long.
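One illustrative way to encode the checks; the labels and thresholds here are invented for this sketch, not a standard scale:

```python
def verification_posture(well_worn, stable, fits_context, concrete):
    """Count failed pre-flight checks and map them to a rough posture."""
    failed = sum(not ok for ok in (well_worn, stable, fits_context, concrete))
    levels = ["spot-check", "verify specifics", "verify heavily",
              "ground with sources first", "reconsider delegating"]
    return levels[failed]

print(verification_posture(well_worn=True, stable=False,
                           fits_context=True, concrete=True))  # verify specifics
```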
Takeaways
Training creates two layers
Pretraining creates a document completer. Fine-tuning turns that predictor into an assistant and leaves artifacts of its helpfulness, safety, and preference training.
Four properties explain most behavior
Next-token prediction, knowledge, working memory, and steerability give a more useful model than a single capability score.
Specificity concentrates risk
Names, dates, citations, URLs, quotes, and statistics deserve the most verification because fabrication often looks precise.
Context has a cliff
Long sessions can silently drop or de-emphasize older material, so important context should be reintroduced when work stretches.
Goals steer better than repeated commands
When a model follows the letter of an instruction but misses the point, restating the goal usually beats repeating the same instruction.
Failures usually combine properties
A hallucinated citation combines prediction and knowledge. Long-session drift combines working memory and steerability.