An AI agent that can reason, plan, and act across your systems is genuinely useful. An AI agent that can do all of that without constraints is a liability waiting to be reported as an incident. The difference is guardrails — and guardrails are a design discipline, not a feature you switch on at the end.
As agents move from demos into real workflows, the interesting questions stop being about model capability and start being about authority. What is this agent allowed to do? With whose permissions? What happens when it is wrong? This article covers how we answer those questions before an agent touches a production system.
Authority is the thing you are designing
A traditional application does exactly what its code says. An agent decides what to do at runtime. That shift means the central design question is not "what can it do" but "what is it permitted to do" — and those should be very different lists.
We scope every agent along three axes.
Scope of action
The set of tools and operations the agent can invoke. We start this list as small as possible and expand it only with evidence. An agent that drafts replies is far lower risk than one that sends them; an agent that proposes a refund is far lower risk than one that issues it. Many useful agents never need write access at all.
Scope of data
The records the agent can read and modify. An agent should run with the narrowest data access that lets it do its job, and ideally with the permissions of the user it is acting for — not a broad service account that can see everything. If a human could not access a record, the agent acting on their behalf should not either.
Scope of consequence
The blast radius of a single action. Spending money, contacting customers, changing production data, and deleting anything are high-consequence actions. These deserve hard limits — caps, rate limits, and approval steps — regardless of how confident the model is.
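One way to make the three axes concrete is to declare them as data and check them in code. The following is a minimal sketch, not a description of any particular framework: `AgentScope`, `USER_RECORD_ACCESS`, and the tool and record names are all hypothetical, and a real system would look up access in its own permission store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentScope:
    allowed_tools: frozenset  # scope of action: tools the agent may invoke
    acting_user: str          # scope of data: agent inherits this user's access
    max_spend_cents: int      # scope of consequence: hard cap per action


# Hypothetical access table standing in for a real permission store.
USER_RECORD_ACCESS = {
    "alice": {"order-1", "order-2"},
    "bob": {"order-3"},
}


def can_invoke(scope: AgentScope, tool: str) -> bool:
    """Action: only tools on the explicit list are callable."""
    return tool in scope.allowed_tools


def can_read(scope: AgentScope, record_id: str) -> bool:
    """Data: the agent sees a record only if the acting user can."""
    return record_id in USER_RECORD_ACCESS.get(scope.acting_user, set())


def within_consequence(scope: AgentScope, amount_cents: int) -> bool:
    """Consequence: the amount must sit inside the hard cap."""
    return 0 <= amount_cents <= scope.max_spend_cents
```

Starting from a scope such as `AgentScope(frozenset({"draft_reply"}), "alice", 0)` keeps the agent read-only until there is evidence that it needs more.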
Where humans stay in the loop
"Human in the loop" is often used loosely. In practice there are three distinct patterns, and choosing the right one per action matters.
- Human approves — the agent prepares an action and waits for explicit sign-off before executing. Correct for high-consequence, low-volume actions.
- Human monitors — the agent acts immediately, but every action is logged to a review surface a person actually watches. Correct for medium-consequence, higher-volume actions.
- Human audits — the agent acts autonomously and actions are reviewed in aggregate after the fact. Correct only for low-consequence, reversible, high-volume actions.
A well-designed agent uses all three, mapped action by action. The mistake is picking one globally — full approval makes the agent too slow to be useful, full autonomy makes it too risky to trust.
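The action-by-action mapping can be sketched as a lookup table plus a dispatcher. This is an illustrative sketch under assumptions: the action names, `OVERSIGHT_BY_ACTION`, and `dispatch` are hypothetical, and the review queue and audit log are plain lists standing in for whatever surfaces a team actually uses.

```python
from enum import Enum


class Oversight(Enum):
    APPROVE = "human approves"
    MONITOR = "human monitors"
    AUDIT = "human audits"


# Hypothetical mapping; each action gets its own pattern.
OVERSIGHT_BY_ACTION = {
    "issue_refund": Oversight.APPROVE,  # high consequence, low volume
    "send_reply": Oversight.MONITOR,    # medium consequence, higher volume
    "tag_ticket": Oversight.AUDIT,      # low consequence, reversible
}


def oversight_for(action):
    # Unmapped actions fall back to the strictest pattern.
    return OVERSIGHT_BY_ACTION.get(action, Oversight.APPROVE)


def dispatch(action, execute, review_queue, audit_log, approved=False):
    mode = oversight_for(action)
    if mode is Oversight.APPROVE and not approved:
        return "pending_approval"  # nothing has been executed yet
    result = execute()
    if mode is Oversight.MONITOR:
        review_queue.append((action, result))  # surfaced for live review
    audit_log.append((action, mode.value, result))  # kept for aggregate review
    return "executed"
```

Defaulting unknown actions to approval is a design choice in this sketch: a newly added tool then starts under the tightest pattern until someone deliberately relaxes it.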
Guardrails that run, not guidelines that are hoped for
A guardrail written into a prompt is a suggestion. The model will usually follow it and occasionally will not. Anything that genuinely must not happen belongs in code, outside the model, where it is enforced deterministically.
- Validation on every tool call — arguments are checked against hard rules before execution. A refund above a threshold is rejected by the tool, not discouraged in the prompt.
- Allowlists over blocklists — the agent can act only on what is explicitly permitted, rather than on everything except what someone thought to forbid.
- Rate and budget limits — independent counters cap how many actions an agent can take, and how much it can spend, within a given time window, containing both bugs and misuse.
- A reversible default — wherever possible, actions are designed to be undoable, so a wrong decision is a correction rather than a crisis.
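The first three items in the list above can be sketched as a wrapper that every tool call passes through before it executes. This is a minimal sketch under assumptions: `guarded_call`, `WindowBudget`, `GuardrailError`, the tool names, and the threshold values are hypothetical and chosen only for illustration.

```python
import time
from collections import deque


class GuardrailError(Exception):
    """Raised when a tool call breaks a hard rule."""


# Hypothetical policy values, for illustration only.
ALLOWED_TOOLS = {"propose_refund", "issue_refund"}
MAX_REFUND_CENTS = 5_000


class WindowBudget:
    """Caps the action count and total spend within a sliding time window."""

    def __init__(self, max_actions, max_spend_cents, window_seconds,
                 clock=time.monotonic):
        self.max_actions = max_actions
        self.max_spend_cents = max_spend_cents
        self.window_seconds = window_seconds
        self.clock = clock
        self._events = deque()  # (timestamp, spend_cents) per accepted action

    def charge(self, spend_cents):
        now = self.clock()
        # Drop events that have aged out of the window.
        while self._events and now - self._events[0][0] >= self.window_seconds:
            self._events.popleft()
        if len(self._events) >= self.max_actions:
            raise GuardrailError("action limit reached for this window")
        spent = sum(cents for _, cents in self._events)
        if spent + spend_cents > self.max_spend_cents:
            raise GuardrailError("spend limit reached for this window")
        self._events.append((now, spend_cents))


def guarded_call(tool, args, budget, tools):
    """Check allowlist, arguments, and budget, then run the tool."""
    if tool not in ALLOWED_TOOLS:
        raise GuardrailError(f"tool not on allowlist: {tool}")
    amount = args.get("amount_cents")
    if type(amount) is not int or amount <= 0:
        raise GuardrailError("amount_cents must be a positive integer")
    if amount > MAX_REFUND_CENTS:
        raise GuardrailError("refund exceeds the hard cap")
    # Only issuing a refund spends money; proposing one counts as an action.
    budget.charge(amount if tool == "issue_refund" else 0)
    return tools[tool](**args)
```

Because the checks raise before the tool function is reached, a rejected call has no side effects, and the model's confidence never enters the decision.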
Observability is part of the guardrail
You cannot govern what you cannot see. Every agent we ship records its full reasoning trace: the goal it was given, the plan it formed, each tool call with arguments and results, and the final outcome. When something goes wrong — and eventually something will — that trace is the difference between a five-minute diagnosis and a multi-day investigation. It is also what lets you tighten scope intelligently over time, because you can see exactly what the agent actually does.
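A trace like the one described above can be sketched as a small structured record. The field names follow the paragraph (goal, plan, tool calls, outcome), but `AgentTrace`, `ToolCall`, and `tool_usage` are hypothetical names, and the storage backend is left out entirely.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass, field


@dataclass
class ToolCall:
    tool: str
    arguments: dict
    result: str


@dataclass
class AgentTrace:
    goal: str
    plan: list = field(default_factory=list)
    tool_calls: list = field(default_factory=list)
    outcome: str = ""

    def record_call(self, tool, arguments, result):
        self.tool_calls.append(ToolCall(tool, arguments, result))

    def to_json(self):
        # asdict recurses into the nested ToolCall records.
        return json.dumps(asdict(self), sort_keys=True)


def tool_usage(traces):
    """Count how often each tool was called across a set of traces."""
    return Counter(call.tool for trace in traces for call in trace.tool_calls)
```

A count like the one `tool_usage` returns is one input for tightening scope: a tool that is permitted but never appears in any trace is a candidate for removal from the agent's list.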
Guardrails are not the part of agent design that slows you down. They are the part that lets you ship at all. An agent nobody trusts stays in the lab; an agent with well-scoped authority, the right human checkpoints, and enforced limits is one you can put in front of real work.