How should AI teams set approval thresholds for agents?

Approval thresholds should be based on consequence, not only model confidence.

The strongest approval trigger is usually some combination of:

  • action consequence,
  • reversibility,
  • authority boundary,
  • evidence quality,
  • and trust impact if the action is wrong.

Confidence can help, but it is not a complete control model.

A weak pattern is a blanket rule such as:

  • “Anything below 90% confidence needs approval.”

That sounds precise, but it often fails because:

  • agent confidence may not be calibrated,
  • different workflows carry different risk,
  • and a high-confidence wrong action can still be more dangerous than a low-confidence draft.

The healthiest approval thresholds usually score actions on:

  1. Consequence: What happens if the action is wrong?
  2. Reversibility: Can the action be undone cheaply?
  3. Authority: Is the agent crossing a permission or policy boundary?
  4. Evidence quality: Is the system acting on strong, consistent evidence?
  5. Trust impact: Would a wrong action surprise or damage the user immediately?

Those five factors are more useful than one generic confidence gate.
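The five factors above can be sketched as a small scoring function. This is a minimal illustration, not a prescribed method: the 0 to 3 rating scale, the field names, and both limits are assumptions chosen for the example, and a real team would tune them against its own outcome data. Note that model confidence is not an input at all.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ActionRisk:
    """Hypothetical ratings, each from 0 (no concern) to 3 (severe concern)."""
    consequence: int         # what happens if the action is wrong
    irreversibility: int     # how costly the action is to undo
    authority_crossing: int  # permission or policy boundary being crossed
    evidence_weakness: int   # how thin or inconsistent the evidence is
    trust_impact: int        # how much a wrong action would surprise the user


def needs_approval(risk: ActionRisk, total_limit: int = 6,
                   single_factor_limit: int = 3) -> bool:
    """Gate when any one factor is severe or the combined rating is high.

    Both limits are illustrative assumptions, not recommended values.
    """
    ratings = list(asdict(risk).values())
    return max(ratings) >= single_factor_limit or sum(ratings) >= total_limit


# A low-risk draft built on somewhat weak evidence: no gate.
draft = ActionRisk(consequence=0, irreversibility=0, authority_crossing=0,
                   evidence_weakness=2, trust_impact=1)

# A refund with strong evidence but severe consequence: gated anyway.
refund = ActionRisk(consequence=3, irreversibility=2, authority_crossing=1,
                    evidence_weakness=0, trust_impact=2)
```

The single-factor limit is the design choice that matters here: one severe factor, such as money movement, triggers approval even when every other rating is low.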

Approval should usually trigger early for:

  • money movement,
  • policy exceptions,
  • security-sensitive changes,
  • external communications with consequence,
  • and actions that alter important records.

These are the places where false autonomy becomes expensive quickly.

Approval is often overused for:

  • evidence gathering,
  • summarization,
  • internal routing,
  • low-risk drafts,
  • or preparation steps that create no real side effect yet.

If approval covers too much low-risk work, the review queue grows faster than reviewers can add value.

A stronger model separates thresholds by workflow class:

  • hard gate: cannot proceed without approval,
  • soft gate: can proceed only when evidence and policy checks pass,
  • monitor lane: can proceed but is sampled, logged, and reviewed through monitoring,
  • handoff lane: must escalate rather than seek ordinary approval.

That gives teams more control than one blanket threshold.
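One way to picture the four lanes is as a lookup from action class to lane, plus a single function that decides whether execution may continue. The action class names and the specific mapping below are hypothetical examples, and `out_of_scope_request` in particular is invented for illustration; only the four lane definitions come from the list above.

```python
from enum import Enum


class Lane(Enum):
    HARD_GATE = "hard gate"
    SOFT_GATE = "soft gate"
    MONITOR = "monitor lane"
    HANDOFF = "handoff lane"


# Hypothetical mapping; a real team would derive this from its own risk review.
LANE_BY_ACTION_CLASS = {
    "money_movement": Lane.HARD_GATE,
    "external_communication": Lane.SOFT_GATE,
    "summarization": Lane.MONITOR,
    "evidence_gathering": Lane.MONITOR,
    "out_of_scope_request": Lane.HANDOFF,
}


def may_proceed(action_class: str, *, approved: bool = False,
                checks_passed: bool = False) -> bool:
    """Return True if the agent may execute the action now."""
    # Assumption: unknown action classes fall back to the hard gate.
    lane = LANE_BY_ACTION_CLASS.get(action_class, Lane.HARD_GATE)
    if lane is Lane.HARD_GATE:
        return approved        # cannot proceed without approval
    if lane is Lane.SOFT_GATE:
        return checks_passed   # proceeds only when evidence and policy checks pass
    if lane is Lane.MONITOR:
        return True            # proceeds; sampling and logging happen elsewhere
    return False               # handoff: never executed by the agent, escalate
```

Defaulting unknown classes to the hard gate is a conservative assumption: a new action type stays blocked until someone classifies it.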

An approval policy is broken if reviewers cannot keep up.

Thresholds should reflect:

  • reviewer availability,
  • expected volume,
  • SLA expectations,
  • and the real cost of delay.

A theoretically safe approval model that creates endless backlog is still a bad production design.
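A rough capacity check makes the backlog risk concrete. The sketch below compares review demand with reviewer time; the function names, the 0.8 utilization ceiling, and the example volumes are all assumptions for illustration, not measured figures.

```python
def reviewer_utilization(approvals_per_day: float, minutes_per_review: float,
                         reviewers: int,
                         review_minutes_per_reviewer_per_day: float) -> float:
    """Fraction of available review time that the approval queue demands."""
    capacity = reviewers * review_minutes_per_reviewer_per_day
    if capacity <= 0:
        raise ValueError("reviewer capacity must be positive")
    demand = approvals_per_day * minutes_per_review
    return demand / capacity


def queue_is_sustainable(utilization: float, ceiling: float = 0.8) -> bool:
    """Above the (assumed) ceiling, the backlog is likely to grow."""
    return utilization <= ceiling


# Hypothetical numbers: 120 gated actions a day at 6 minutes each,
# against 2 reviewers with 240 review minutes each.
blanket = reviewer_utilization(120, 6, 2, 240)   # 720 / 480 = 1.5

# The same team after moving low-risk classes out of the approval queue.
narrowed = reviewer_utilization(40, 6, 2, 240)   # 240 / 480 = 0.5
```

A utilization above 1.0 means the queue cannot drain at all, however safe the policy looks on paper.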

Set approval thresholds by asking:

  1. what bad outcome are we trying to prevent,
  2. which action classes create that outcome,
  3. which of those classes truly require human judgment before execution,
  4. and whether the review lane can operate fast enough to stay credible.

That produces a usable threshold system.

Your approval thresholds are probably healthy when:

  • action classes are grouped by consequence;
  • high-risk actions are gated explicitly;
  • low-risk steps are not trapped behind universal review;
  • reviewer capacity and SLA are considered;
  • and threshold changes can be justified with outcome data instead of instinct alone.

This page should help a reader decide which repository actions a coding agent should be allowed to take and which gates must protect shared code. A page that answers "How should AI teams set approval thresholds for agents?" is not finished if it only explains vocabulary. It should change what the team approves, measures, routes, buys, logs, or refuses to automate.

Before applying the guidance, bring changed files, test results, reviewer queue data, PR outcomes, and examples of bad or reverted agent changes. Those inputs keep the decision anchored in real operating conditions instead of a generic best-practice list.

Check | What the reader should be able to answer
Repository boundary | Does the page separate read, write, review, merge, and deploy risk?
Reviewer load | Does it account for the human time needed to inspect generated work?
Verification | Are tests, static checks, and PR gates tied to the action being approved?
Rollback | Can the team undo or contain the change if the agent is wrong?

Use the page as a working review artifact: compare the current workflow against the table, mark the missing evidence, and assign an owner for the next change. If the page exposes a gap but no one owns that gap, the correct next step is not broader rollout; it is a smaller pilot, a clearer gate, or a better measurement loop.

For coding-agent pages, the reader should be able to turn the guidance into a repo policy, PR checklist, or reviewer queue rule. Broad enthusiasm is not enough when the output enters shared code.
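As one possible shape for such a PR checklist, the sketch below lists the gates an agent-authored change has not yet met. The `AgentPullRequest` fields, the gate names, and the choice to require human review and a rollback plan only for merge and deploy are hypothetical; they stand in for whatever a team's own repo policy specifies.

```python
from dataclasses import dataclass


@dataclass
class AgentPullRequest:
    """Hypothetical record of an agent-authored change awaiting a decision."""
    action: str                  # one of: read, write, review, merge, deploy
    tests_passed: bool
    static_checks_passed: bool
    rollback_documented: bool
    human_reviewed: bool


# Assumption: only actions that change shared code or production need a human.
ACTIONS_REQUIRING_HUMAN_REVIEW = {"merge", "deploy"}


def unmet_gates(pr: AgentPullRequest) -> list:
    """Return the names of gates the change has not satisfied yet."""
    unmet = []
    if not pr.tests_passed:
        unmet.append("tests")
    if not pr.static_checks_passed:
        unmet.append("static checks")
    if pr.action in ACTIONS_REQUIRING_HUMAN_REVIEW:
        if not pr.human_reviewed:
            unmet.append("human review")
        if not pr.rollback_documented:
            unmet.append("rollback plan")
    return unmet


merge_request = AgentPullRequest(action="merge", tests_passed=True,
                                 static_checks_passed=True,
                                 rollback_documented=False,
                                 human_reviewed=False)
# unmet_gates(merge_request) returns ["human review", "rollback plan"]
```

An empty list means the change may proceed under this sketch; anything else is a concrete item for the reviewer queue.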