Foundational Concepts — AI Base Knowledge

Tokens & tokenization

Models read tokens, not words.

Tokens are sub-word chunks produced by a tokenizer, each mapped to an integer ID. They are the unit of both pricing and context limits — and many odd model behaviors with spelling or math trace straight back to tokenization.

📏

The rule of thumb

~1 token ≈ 4 English characters, or about ¾ of a word.

🌍

Non-English costs more

Other languages often use 1.5–2× the tokens for the same meaning — and you pay per token.

⚠️

Tokenizers change bills

Opus 4.7's tokenizer reportedly produces up to 35% more tokens for the same text — same rate card, higher bill.
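The three cards above reduce to simple arithmetic, sketched here. The function names, the flat rate, and the sample text are illustrative assumptions. The 4-characters-per-token constant is only the rule of thumb, not a real tokenizer, so actual counts will differ.

```python
import math

CHARS_PER_TOKEN = 4  # rule-of-thumb assumption for English text, not a tokenizer


def estimate_tokens(text: str, multiplier: float = 1.0) -> int:
    """Rough token estimate; `multiplier` models a heavier language or tokenizer."""
    return math.ceil(len(text) / CHARS_PER_TOKEN * multiplier)


def estimate_cost(tokens: int, usd_per_million: float) -> float:
    """Cost in USD at a flat per-million-token rate."""
    return tokens / 1_000_000 * usd_per_million


RATE = 10.0           # hypothetical USD per million tokens
text = "x" * 400_000  # stand-in for 400,000 characters of English text

baseline = estimate_cost(estimate_tokens(text), RATE)
# Same text, same rate, but a tokenizer that emits 35% more tokens.
inflated = estimate_cost(estimate_tokens(text, multiplier=1.35), RATE)
print(f"baseline ${baseline:.2f}, with 35% more tokens ${inflated:.2f}")
```

Passing a multiplier of 1.5 to 2.0 gives the same kind of estimate for the non-English case described above.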

Context windows

Bigger windows, but bigger isn't free.

The context window is the maximum tokens (input + output) a model can consider at once. 1M-token windows are common in 2026 — but two effects mean you must curate context, not dump everything in.

Context window sizes (2026)

Approximate maximum tokens, per source material

Model	Max tokens
Claude Opus/Sonnet 4.6+	1M
Gemini 3.x	1M
GPT-5.2	~400K

EFFECT #1

Lost in the middle

Models recall the beginning and end of context better than the middle (Liu et al., 2023; TACL 2024). Put key constraints where they'll be seen.

EFFECT #2

Context rot

Output quality degrades as input grows — even before the window is full (Chroma Research, 2025, across 18 models).
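One way to act on both effects is sketched below. It keeps only the most relevant chunks that fit a token budget, then places the key constraints at the start and again at the end of the prompt. The function names, the relevance scores (assumed to come from some upstream retriever), and the 4-characters-per-token estimate are all illustrative assumptions, not a prescribed method.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token (rule of thumb only)."""
    return max(1, len(text) // 4)


def curate(chunks: list[tuple[float, str]], budget_tokens: int) -> list[str]:
    """Keep the highest-scoring chunks that fit within the token budget.

    `chunks` holds (relevance_score, text) pairs.
    """
    kept, used = [], 0
    for _score, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        cost = estimate_tokens(text)
        if used + cost <= budget_tokens:
            kept.append(text)
            used += cost
    return kept


def build_prompt(constraints: str, chunks: list[tuple[float, str]],
                 question: str, budget_tokens: int) -> str:
    """Put constraints first and repeat them last, with curated context between."""
    parts = [
        f"Constraints:\n{constraints}",
        *curate(chunks, budget_tokens),
        f"Question: {question}",
        f"Reminder of constraints:\n{constraints}",
    ]
    return "\n\n".join(parts)
```

Repeating the constraints costs a few extra tokens, which is usually a small price compared with the context that curation removes.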

Sampling parameters

Temperature dials randomness.

Temperature controls how varied the output is; top-p (nucleus sampling) limits choices to the smallest set of tokens whose probabilities sum to p. The practical rule is simple.

🎯

Low temperature

Near-deterministic and focused. Use it for code, structured output, and anything that must be correct and repeatable.

🎨

High temperature

Creative and varied. Use it for brainstorming, naming, and idea generation where you want range.
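To make the two parameters concrete, here is a minimal sketch of temperature scaling followed by top-p filtering over a toy set of logits. The function name `sample_token` and the toy vocabulary are assumptions for illustration. Production inference stacks implement this over tensors and may order or combine the steps differently.

```python
import math
import random
from typing import Optional


def sample_token(logits: dict[str, float], temperature: float = 1.0,
                 top_p: float = 1.0, rng: Optional[random.Random] = None) -> str:
    """Sample one token using temperature scaling, then top-p (nucleus) filtering."""
    rng = rng or random.Random()
    if temperature <= 0:
        # Treat zero temperature as greedy decoding: always take the top token.
        return max(logits, key=logits.get)

    # Temperature: divide logits before softmax. Lower values sharpen the distribution.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(weights.values())
    probs = {tok: w / total for tok, w in weights.items()}

    # Top-p: keep the smallest set of most likely tokens whose probabilities reach p.
    nucleus, cumulative = [], 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        nucleus.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    tokens, token_probs = zip(*nucleus)
    # random.choices renormalizes the weights of the surviving tokens.
    return rng.choices(tokens, weights=token_probs, k=1)[0]
```

With a low temperature or a small `top_p`, the top token dominates and repeated calls tend to agree. Raising either widens the range of tokens that can be drawn.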

The 2026 landscape

A specialization market.

No single model wins everything. The 2026 best practice is model routing — match the cheapest model that clears your quality bar to each task.

Family	Tiers	Strength
Anthropic Claude	Opus · Sonnet · Haiku	Agentic coding
OpenAI GPT-5.x	Instant · Thinking · Pro	Abstract reasoning
Google Gemini 3.x	Pro · Flash · Deep Think	Long context
Open-weight	Llama · Qwen · DeepSeek · Mistral	Cost & privacy

A 70/20/10 Haiku / Sonnet / Opus split can cut API costs by more than half versus all-Sonnet.

Routing the cheapest sufficient model to each task is the dominant cost+quality pattern.
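A routing rule of this kind can be sketched in a few lines. Everything in the catalogue below is a placeholder: the tier names, prices, and quality scores are hypothetical, and in practice the scores would come from your own evaluations on each task type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_mtok: float  # hypothetical USD per million tokens
    quality: float        # hypothetical eval score for this task type, 0 to 1


# Placeholder catalogue: names, prices, and scores are illustrative only.
CATALOGUE = (
    Model("small", 1.0, 0.70),
    Model("medium", 3.0, 0.85),
    Model("large", 5.0, 0.95),
)


def route(quality_bar: float, catalogue: tuple[Model, ...] = CATALOGUE) -> Model:
    """Return the cheapest model whose quality score clears the bar."""
    eligible = [m for m in catalogue if m.quality >= quality_bar]
    if not eligible:
        raise ValueError(f"no model clears quality bar {quality_bar}")
    return min(eligible, key=lambda m: m.cost_per_mtok)


print(route(0.8).name)
```

The design choice worth noting is that the quality bar is set per task, so a classification job and an architecture review can resolve to different tiers from the same catalogue.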

Reasoning / thinking models

They think before they answer.

Reasoning models spend extra thinking tokens on an internal scratchpad before responding (OpenAI o-series, Claude extended thinking, Gemini Deep Think). Use them for hard, multi-step problems — but know the trade-offs.

🧮

Best for hard problems

Complex debugging, architecture, and math — anything multi-step benefits most.

⏱️

Latency & cost

3–15s before the first visible token, and thinking tokens bill at output rates. A task can cost ~9× the bare answer.

🚫

Skip the step-by-step prompt

Telling a reasoning model to think step by step is redundant — it already thinks internally.
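The cost multiplier follows directly from thinking tokens being billed as output. The sketch below uses hypothetical token counts and a hypothetical rate, chosen so that the thinking budget is eight times the visible answer. Real ratios vary widely by task and model.

```python
def output_cost(answer_tokens: int, thinking_tokens: int,
                usd_per_million_output: float) -> float:
    """Thinking tokens are billed at the output rate, alongside the visible answer."""
    return (answer_tokens + thinking_tokens) / 1_000_000 * usd_per_million_output


RATE = 10.0  # hypothetical USD per million output tokens

bare = output_cost(answer_tokens=500, thinking_tokens=0, usd_per_million_output=RATE)
with_thinking = output_cost(answer_tokens=500, thinking_tokens=4_000,
                            usd_per_million_output=RATE)
multiplier = with_thinking / bare
print(f"bare ${bare:.4f}, with thinking ${with_thinking:.4f}, x{multiplier:.1f}")
```

The rate cancels out of the multiplier, so only the ratio of thinking tokens to answer tokens matters.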

Hallucinations & reliability

Sometimes it sounds right but is wrong.

A hallucination is when the model gives an answer that sounds confident but is actually false or made up. The fix is to give it real sources to work from and ask it to show where each answer came from. One support team cut wrong answers from 19% down to about 2%, then under 1%.

Wrong-answer rate after adding sources

Real support team, from the source material

Setup	Wrong-answer rate
No sources	19%
With sources	~2%
+ source check	<1%

Don't ship an answer as fact in production unless it's backed by a real source.
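A source check can be as simple as refusing any claim whose cited quote does not appear in the cited document. The sketch below is one hypothetical way to do that. The `Claim` structure, the `unsupported_claims` function, and the sample knowledge-base entry are assumptions for illustration, not the method the support team in the example used.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    text: str                 # the statement the model wants to make
    source_id: Optional[str]  # which provided document it cites, if any
    quote: str                # the passage it says supports the statement


def unsupported_claims(claims: list[Claim], sources: dict[str, str]) -> list[Claim]:
    """Return claims with no source, an unknown source, or a quote not found in it."""
    bad = []
    for claim in claims:
        source_text = sources.get(claim.source_id) if claim.source_id else None
        quote = claim.quote.strip().lower()
        if source_text is None or not quote or quote not in source_text.lower():
            bad.append(claim)
    return bad


# Hypothetical knowledge-base entry and model output.
sources = {"kb-1": "Refunds are issued within 14 days of purchase."}
claims = [
    Claim("Refunds take up to 14 days.", "kb-1", "within 14 days of purchase"),
    Claim("Refunds include shipping costs.", "kb-1", "shipping is refunded"),
    Claim("We offer lifetime warranties.", None, ""),
]
for claim in unsupported_claims(claims, sources):
    print("blocked:", claim.text)
```

An exact-substring match is deliberately strict. It catches fabricated quotes but does not verify that the quote actually supports the claim, which still needs a stronger check or human review.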

Knowledge cutoffs

Why models need tools.

A model only has what was in its training data up to its cutoff date — it has no awareness of anything after. That single fact is the core justification for everything that follows on this site.

Tool use

Let the model call functions to do things now.

Web search

Fetch current information at query time.

RAG

Ground answers in your own documents.

MCP

Connect to current, authoritative external data.
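The common mechanism behind all four is that the model emits a structured request and your code executes it. The sketch below shows that dispatch step for a single tool that returns today's date, which the model cannot know from training data. The JSON shape, the `TOOLS` registry, and the `get_today` function are hypothetical. Real provider APIs and MCP define their own message formats.

```python
import json
from datetime import date


def get_today() -> str:
    """Return the current date, something no training cutoff can provide."""
    return date.today().isoformat()


# Hypothetical registry mapping tool names to Python callables.
TOOLS = {"get_today": get_today}


def run_tool_call(raw_call: str) -> str:
    """Execute a tool call of the assumed form {"name": ..., "arguments": {...}}."""
    call = json.loads(raw_call)
    name = call["name"]
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    result = TOOLS[name](**call.get("arguments", {}))
    return json.dumps({"result": result})


# In a real loop this string would come from the model's response.
print(run_tool_call('{"name": "get_today", "arguments": {}}'))
```

The result string is then passed back to the model as context, so its final answer rests on current data rather than on what it memorized before the cutoff.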