The most visceral shift of 2026: tools that plan, act, observe, and iterate until the job is done.
An agent is a model placed inside infrastructure that gives it tools, memory, and an action loop. A long system prompt alone does not make an agent — the loop and tool access do.
It can call functions, run code, query systems.
It carries context across steps within a task.
It repeats until the goal is met — not a one-shot reply.
The canonical cycle is rooted in the Russell & Norvig agent definition (1995) and was made practical for LLMs by the ReAct framework (Yao et al., 2022), which interleaves reasoning and action and reported a 34-point absolute gain in success rate on ALFWorld over imitation and reinforcement learning baselines. Typically one loop iteration = one LLM call + one tool call.
Read the current state and goal.
Reason about the next action.
Call a tool or run code.
Read the result, then repeat.
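The four steps above can be sketched as a small loop. This is a minimal illustration, not any framework's API: `fake_model`, `TOOLS`, `AgentState`, and the `add_one` tool are hypothetical stand-ins, and the "model" is a deterministic function so the example runs without an LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical tool registry: plain Python functions keyed by name.
TOOLS: dict[str, Callable[[str], str]] = {
    "add_one": lambda x: str(int(x) + 1),
}

@dataclass
class AgentState:
    goal: int                                          # value the agent must reach
    value: int = 0                                     # current state of the "world"
    history: list[str] = field(default_factory=list)   # memory carried across steps
    llm_calls: int = 0
    tool_calls: int = 0

def fake_model(state: AgentState) -> Optional[tuple[str, str]]:
    """Stand-in for one LLM call: choose the next action, or None when done."""
    state.llm_calls += 1
    if state.value >= state.goal:
        return None
    return ("add_one", str(state.value))

def run_agent(state: AgentState, max_steps: int = 20) -> AgentState:
    for _ in range(max_steps):              # hard cap so the loop always ends
        action = fake_model(state)          # read state and goal, reason
        if action is None:                  # goal met: stop iterating
            break
        name, arg = action
        result = TOOLS[name](arg)           # call a tool
        state.tool_calls += 1
        state.value = int(result)           # read the result
        state.history.append(f"{name}({arg}) -> {result}")
    return state

final = run_agent(AgentState(goal=3))
print(final.value, final.llm_calls, final.tool_calls)
```

In this sketch each working iteration costs one model call and one tool call, plus one final model call that decides the goal is met. The `max_steps` cap is a design choice assumed here so a confused policy cannot loop forever.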
A spectrum from approve-every-step to fully autonomous. Autonomy should increase as task risk falls and test coverage rises.
You confirm each action. Right for unfamiliar code, destructive operations, or thin test coverage.
Multi-hour tasks spawning sub-agents — e.g. Nubank reportedly used a fleet of Devins to migrate ~6M lines of code.
Cloud agents fix a GitHub issue in a sandbox while you're in meetings; IDEs run multiple agents in parallel (Antigravity; Cursor 2.0 supports up to 8).
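One way to make the autonomy spectrum concrete is an approval gate that decides, per action, whether a human must confirm. The sketch below is an assumption-laden illustration: the `Risk` levels, the `needs_approval` function, and the 0.8 coverage threshold are all hypothetical, not taken from any product named above.

```python
from enum import IntEnum

class Risk(IntEnum):
    NONE = 0      # used only as a ceiling: nothing runs unapproved
    LOW = 1       # e.g. read-only queries
    MEDIUM = 2    # e.g. code edits
    HIGH = 3      # e.g. destructive operations

def needs_approval(
    risk: Risk,
    test_coverage: float,
    max_auto_risk: Risk,
    min_coverage: float = 0.8,   # hypothetical threshold
) -> bool:
    """Return True when a human should confirm the action first."""
    if risk > max_auto_risk:
        return True              # action exceeds the configured autonomy ceiling
    if risk > Risk.LOW and test_coverage < min_coverage:
        return True              # thin tests: do not trust unattended changes
    return False

# Approve-every-step mode: even a low-risk action needs confirmation.
print(needs_approval(Risk.LOW, 1.0, max_auto_risk=Risk.NONE))
# Well-tested code with a MEDIUM ceiling: edits run unattended.
print(needs_approval(Risk.MEDIUM, 0.95, max_auto_risk=Risk.MEDIUM))
```

Raising `max_auto_risk` moves along the spectrum toward full autonomy, while the coverage check keeps thinly tested code on the supervised side regardless of the ceiling.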
A supervisor delegates to planner / executor / verifier agents. More capable on complex workflows — but it multiplies token cost and observability needs.
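The supervisor pattern, and why it raises cost and observability needs, can be sketched with plain functions. Everything here is hypothetical: the `planner`, `executor`, and `verifier` roles are stubs rather than model calls, and `Ledger` counts whitespace-separated words as a crude stand-in for tokens.

```python
from dataclasses import dataclass, field

@dataclass
class Ledger:
    """Tracks an approximate cost and a trace of every delegated call."""
    tokens: int = 0
    trace: list[str] = field(default_factory=list)

    def record(self, role: str, text: str) -> None:
        # Crude token proxy (an assumption): count whitespace-separated words.
        self.tokens += len(text.split())
        self.trace.append(role)

def planner(task: str, ledger: Ledger) -> list[str]:
    steps = [f"step {i} of {task}" for i in (1, 2)]
    ledger.record("planner", " ".join(steps))
    return steps

def executor(step: str, ledger: Ledger) -> str:
    result = f"done: {step}"
    ledger.record("executor", result)
    return result

def verifier(result: str, ledger: Ledger) -> bool:
    ledger.record("verifier", result)
    return result.startswith("done:")

def supervisor(task: str) -> tuple[bool, Ledger]:
    ledger = Ledger()
    # Delegate: plan once, execute each step, then verify each result.
    results = [executor(step, ledger) for step in planner(task, ledger)]
    ok = all([verifier(result, ledger) for result in results])
    return ok, ledger

ok, ledger = supervisor("migrate module")
print(ok, ledger.trace, ledger.tokens)
```

A two-step task already produces five delegated calls in this sketch, each re-reading text another role produced. That is the multiplication in cost, and the trace is the minimum needed to see which role did what.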