How should AI teams set approval thresholds for agents?

What matters first

Approval thresholds should be based on consequence, not only model confidence.

The strongest approval trigger is usually some combination of:

action consequence,
reversibility,
authority boundary,
evidence quality,
and trust impact if the action is wrong.

Confidence can help, but it is not a complete control model.

The weak threshold pattern

The weak pattern is saying:

“Anything below 90% confidence needs approval.”

That sounds precise, but it often fails because:

agent confidence may not be calibrated,
different workflows carry different risk,
and a high-confidence wrong action can still be more dangerous than a low-confidence draft.
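
The failure mode above can be shown with a minimal sketch. The `ProposedAction` type, the field names, and the example confidence values are all hypothetical and chosen only for illustration; the 90% cutoff comes from the quoted rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedAction:
    name: str
    confidence: float      # model-reported score; may not be calibrated
    has_side_effect: bool  # True if executing changes anything outside the agent


def confidence_only_gate(action: ProposedAction, cutoff: float = 0.90) -> bool:
    """Return True when the action needs approval under the weak pattern."""
    return action.confidence < cutoff


# Hypothetical examples: a refund with real consequence, and a harmless draft.
refund = ProposedAction("issue_refund", confidence=0.97, has_side_effect=True)
draft = ProposedAction("draft_reply", confidence=0.62, has_side_effect=False)

needs_review = {a.name: confidence_only_gate(a) for a in (refund, draft)}
print(needs_review)
```

Under this gate the refund executes without review while the harmless draft waits in the queue, which is the opposite of what a consequence-based policy would want. The gate never reads `has_side_effect` at all.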

The five factors that matter most

The healthiest approval thresholds usually score actions on:

Consequence: What happens if the action is wrong?
Reversibility: Can the action be undone cheaply?
Authority: Is the agent crossing a permission or policy boundary?
Evidence quality: Is the system acting on strong, consistent evidence?
Trust impact: Would a wrong action surprise or damage the user immediately?

Those five factors are more useful than one generic confidence gate.
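
One way to turn the five factors into a gate is a small scoring function. This is a sketch under stated assumptions, not a prescribed method: the 0 to 3 scale, the field names, and both cutoffs are illustrative and would need tuning against a team's own outcome data.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class ActionRisk:
    # Each factor is scored 0 (no concern) to 3 (severe). The scale is an assumption.
    consequence: int      # what happens if the action is wrong
    irreversibility: int  # how costly it is to undo
    authority: int        # how far it crosses a permission or policy boundary
    evidence_gap: int     # how weak or inconsistent the supporting evidence is
    trust_impact: int     # how badly a wrong action would surprise the user


def needs_approval(risk: ActionRisk, total_cutoff: int = 6, single_cutoff: int = 3) -> bool:
    """Gate when any single factor is severe or the combined score is high."""
    scores = astuple(risk)
    return max(scores) >= single_cutoff or sum(scores) >= total_cutoff


draft_summary = ActionRisk(0, 0, 0, 2, 0)  # weak evidence, but no side effect
refund = ActionRisk(2, 2, 1, 0, 2)         # moderate on several factors
delete_record = ActionRisk(3, 3, 0, 0, 1)  # severe and irreversible
```

The single-factor cutoff matters because a severe, irreversible action should be gated even when every other factor looks safe, so averaging alone would hide it.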

Where approval should trigger early

Approval should usually trigger early for:

money movement,
policy exceptions,
security-sensitive changes,
external communications with consequence,
and actions that alter important records.

These are the places where false autonomy becomes expensive quickly.

Where approval should not dominate

Approval is often overused for:

evidence gathering,
summarization,
internal routing,
low-risk drafts,
or preparation steps that create no real side effect yet.

If approval covers too much low-risk work, the review queue grows faster than the value reviewers add.

The better threshold model

A stronger model separates thresholds by workflow class:

hard gate: cannot proceed without approval,
soft gate: can proceed only when evidence and policy checks pass,
monitor lane: can proceed without approval, but is logged, sampled, and reviewed after the fact,
handoff lane: must escalate rather than seek ordinary approval.

That gives teams more control than one blanket threshold.
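
The four lanes can be sketched as a lookup plus a small routing function. The action class names and their lane assignments below are hypothetical. Two behaviors are also assumptions that the text does not specify: a soft-gated action whose checks fail falls back to approval, and an unknown action class is treated as hard-gated.

```python
from enum import Enum


class Lane(Enum):
    HARD_GATE = "hard_gate"
    SOFT_GATE = "soft_gate"
    MONITOR = "monitor"
    HANDOFF = "handoff"


# Hypothetical assignment of workflow classes to lanes.
LANE_BY_CLASS = {
    "money_movement": Lane.HARD_GATE,
    "security_change": Lane.HARD_GATE,
    "record_update": Lane.SOFT_GATE,
    "summarization": Lane.MONITOR,
    "internal_routing": Lane.MONITOR,
    "out_of_policy_request": Lane.HANDOFF,
}


def decide(action_class: str, checks_passed: bool) -> str:
    """Return 'execute_and_log', 'await_approval', or 'escalate'."""
    # Assumption: unknown classes fail closed into the hard gate.
    lane = LANE_BY_CLASS.get(action_class, Lane.HARD_GATE)
    if lane is Lane.HANDOFF:
        return "escalate"
    if lane is Lane.MONITOR:
        return "execute_and_log"
    if lane is Lane.SOFT_GATE and checks_passed:
        return "execute_and_log"
    # Hard gate, or a soft gate whose evidence or policy checks failed.
    return "await_approval"
```

For example, `decide("record_update", checks_passed=True)` executes and logs, while the same class with failed checks waits for a reviewer.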

Review capacity matters too

An approval policy is broken if reviewers cannot keep up.

Thresholds should reflect:

reviewer availability,
expected volume,
SLA expectations,
and the real cost of delay.

A theoretically safe approval model that creates an endless backlog is still a bad production design.
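
A rough capacity check makes the backlog risk concrete. The sketch below assumes a steady arrival rate and a fixed average review time, which real queues rarely have; the function names and the example numbers are illustrative only.

```python
def review_capacity_per_hour(reviewers: int, minutes_per_review: float) -> float:
    """Approvals the review lane can clear per hour at full attention."""
    return reviewers * 60 / minutes_per_review


def backlog_growth_per_hour(approvals_per_hour: float, reviewers: int,
                            minutes_per_review: float) -> float:
    """Net approvals added to the queue each hour; 0 when reviewers keep up."""
    capacity = review_capacity_per_hour(reviewers, minutes_per_review)
    return max(0.0, approvals_per_hour - capacity)


# Hypothetical load: 40 approval requests per hour, 3 reviewers, 5 minutes each.
capacity = review_capacity_per_hour(reviewers=3, minutes_per_review=5)
growth = backlog_growth_per_hour(40, reviewers=3, minutes_per_review=5)
print(capacity, growth)  # 36.0 approvals/hour of capacity, backlog grows by 4.0/hour
```

When growth stays above zero, no SLA can hold. The options are to move some action classes out of the hard gate, add reviewers, or shorten the review itself.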

The practical rule

Set approval thresholds by asking:

what bad outcome are we trying to prevent,
which action classes create that outcome,
which of those classes truly require human judgment before execution,
and whether the review lane can operate fast enough to stay credible.

That produces a usable threshold system.

Implementation checklist

Your approval thresholds are probably healthy when:

action classes are grouped by consequence;
high-risk actions are gated explicitly;
low-risk steps are not trapped behind universal review;
reviewer capacity and SLA are considered;
and threshold changes can be justified with outcome data instead of instinct alone.
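
As one illustration of "outcome data", a team could track how often reviewers reject requests in each action class. This is a sketch, assuming review outcomes are logged as (action class, outcome) pairs; the log format, class names, and sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def rejection_rate_by_class(reviews: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Share of reviewed actions per class that reviewers rejected."""
    totals: Counter = Counter()
    rejected: Counter = Counter()
    for action_class, outcome in reviews:
        totals[action_class] += 1
        if outcome == "rejected":
            rejected[action_class] += 1
    return {cls: rejected[cls] / totals[cls] for cls in totals}


# Hypothetical review log.
reviews = [
    ("record_update", "approved"),
    ("record_update", "approved"),
    ("record_update", "approved"),
    ("record_update", "rejected"),
    ("money_movement", "approved"),
    ("money_movement", "rejected"),
]
rates = rejection_rate_by_class(reviews)
```

A class that reviewers almost never reject may be a candidate for a lighter lane, while a class with a high rejection rate is evidence for keeping or tightening its gate. Rejection rate alone says nothing about how severe a missed error would be, so it should be read alongside consequence.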

Compare next

Do AI agents need human approval in production? Use this page when the next question is whether approval should exist at all for a given workflow.

Human in the loop vs human on the loop for AI agents: Use this page when the team is deciding between pre-action approval and exception-based oversight.

When should an AI agent ask for confirmation before acting? Use this page when the control question is lighter user confirmation rather than formal approval.

What is a good SLA for an AI agent? Use this page when approval thresholds are now affecting queue design and response expectations.

Reader value check

This page should help a reader decide which repository actions a coding agent should be allowed to take and which gates must protect shared code. The page is not finished if it only explains vocabulary. It should change what the team approves, measures, routes, buys, logs, or refuses to automate.

Before applying the guidance, bring changed files, test results, reviewer queue data, PR outcomes, and examples of bad or reverted agent changes. Those inputs keep the decision anchored in real operating conditions instead of a generic best-practice list.

Check	What the reader should be able to answer
Repository boundary	Does the page separate read, write, review, merge, and deploy risk?
Reviewer load	Does it account for the human time needed to inspect generated work?
Verification	Are tests, static checks, and PR gates tied to the action being approved?
Rollback	Can the team undo or contain the change if the agent is wrong?

Use the page as a working review artifact: compare the current workflow against the table, mark the missing evidence, and assign an owner for the next change. If the page exposes a gap but no one owns that gap, the correct next step is not broader rollout; it is a smaller pilot, a clearer gate, or a better measurement loop.

For coding-agent pages, the reader should be able to turn the guidance into a repo policy, PR checklist, or reviewer queue rule. Broad enthusiasm is not enough when the output enters shared code.