Agent Systems

This section exists because “AI agents” is now too broad to be useful. Real teams need to decide where tools belong, how autonomy should be constrained, when MCP helps, and what governance is required before agents touch internal systems or customer-facing workflows.

Core paths

Model Context Protocol for enterprise teams
Use this page to decide when MCP is a genuine architecture advantage and when normal function calling is still enough.

Managed agent architecture
Use this page when agent runtime design now requires session isolation, sandboxes, scoped tools, retries, traces, and approval boundaries.

What should an AI agent be allowed to do in production?
Use this page when the first architecture question is permission scope, approval, and on whose behalf the agent is allowed to act.

AI chatbot vs AI agent for business
Use this page when the business is still deciding whether it needs an agent at all or whether a simpler system would serve it better.

Should AI agents have access to customer data?
Use this page when data access is the real boundary shaping trust, review, and deployment scope.

Agent workflows vs autonomous agents
Use this page to separate deterministic workflow design from open-ended autonomy before the team overbuilds.

MCP security and approval boundaries
Use this page to decide how tool permissions, approval gates, and risk classes should work before MCP reaches production.

MCP server security audit checklist
Audit MCP servers for tool authority, prompt injection, SSRF, browser automation, credentials, approvals, traces, and kill-switch design.

Remote MCP servers vs direct tool integrations
Use this page when the team needs to choose between shared MCP infrastructure and simpler app-owned tool wiring.

MCP server tooling and SDK generation
Use this page when generated SDKs, CLIs, or MCP servers are becoming shared connector infrastructure for agents.

A2A vs MCP
Use this page when the team needs to separate tool-access protocols from remote agent interoperability.

Single-agent vs multi-agent systems
Use this page when the architecture debate is really about whether handoffs create clarity or just more moving parts.

Built-in tools vs external integrations
Use this page when the real architecture choice is vendor-managed tools, app-owned internal APIs, or a deliberate hybrid model.

Computer Use API vs browser automation
Use this page when browser-facing agents need a clear rule for UI interpretation, deterministic automation, safety, and cost.

Computer Use API safety checklist
Use this page before a computer-use agent touches real browser sessions, credentials, approvals, or customer-facing actions.

Prompt injection defenses for tool-using agents
Use this page when web pages, files, retrieved chunks, or tool outputs may contain untrusted instructions.

OpenAI Model Spec: tool outputs are untrusted
Use this page when Model Spec guidance needs to be translated into runtime boundaries, approval gates, source trust rules, and safer tool design.

Agents SDK vs app-owned orchestration
Use this page when the framework decision is now shaping workflow ownership, tracing, and runtime control.

User-scoped auth vs service accounts
Use this page when the real boundary is whose authority the agent is using, not just which tool it can call.

Least-privilege tool scopes
Use this page when the team needs narrower read, write, and side-effect boundaries instead of one broad integration scope.

Guardrails vs evals
Use this page when the team keeps mixing runtime safety controls with release-time quality measurement.

Tool timeouts, retries, and idempotency
Use this page when agent reliability and repeated side effects are now control-plane problems rather than prompt problems.

Sandboxing, network permissions, and secrets
Use this page when coding-agent execution boundaries need to separate safe repository work from broader system risk.

OpenAI Codex sandboxing and approvals
Use this page when the Codex desktop app needs a concrete policy for sandboxing, approvals, network access, plugins, MCP, and automations.

OpenAI Codex skills, plugins, and MCP
Use this page when repeated Codex work should become a skill, plugin, connector workflow, or custom MCP tool boundary.

OpenAI Codex computer use and visual QA
Use this page when Codex needs to inspect local web apps, desktop apps, screenshots, UI flows, or visual regressions.

Should AI agents run in a sandbox?
Use this page when the team needs a direct rule for when agent execution should be isolated and when lighter containment is enough.

Model routing
Routing remains essential once the agent stack grows beyond one model lane or one risk class.

Prompt release governance
Agent systems still need release control, rollback logic, and review discipline.
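Several of the entries above (MCP security and approval boundaries, least-privilege tool scopes, what an agent should be allowed to do in production) turn on the same control: grant each session only the tools it needs, classify those tools by risk, and hold the riskiest class for human approval. The sketch below shows that shape in Python, assuming a hypothetical three-class scheme (read, draft, execute) and hypothetical `RiskClass`, `ToolPolicy`, and `AgentSession` types. None of these names come from MCP or any agent SDK; they are illustrative only.

```python
from dataclasses import dataclass, field
from enum import Enum


class RiskClass(Enum):
    # Hypothetical risk classes, not an MCP or SDK concept.
    READ = "read"        # no side effects, e.g. look up an order
    DRAFT = "draft"      # produces output a human reviews before use
    EXECUTE = "execute"  # real side effect, e.g. issue a refund


@dataclass(frozen=True)
class ToolPolicy:
    # Hypothetical policy record attached to each tool.
    name: str
    risk: RiskClass


@dataclass
class AgentSession:
    # Least privilege: a session can only call tools it was explicitly granted.
    granted: dict[str, ToolPolicy] = field(default_factory=dict)

    def grant(self, policy: ToolPolicy) -> None:
        self.granted[policy.name] = policy

    def decide(self, tool_name: str, approved: bool = False) -> str:
        """Return 'deny', 'needs_approval', or 'allow' for a proposed tool call."""
        policy = self.granted.get(tool_name)
        if policy is None:
            return "deny"  # the tool was never granted to this session
        if policy.risk is RiskClass.EXECUTE and not approved:
            return "needs_approval"  # side effects wait for a human decision
        return "allow"


if __name__ == "__main__":
    session = AgentSession()
    session.grant(ToolPolicy("lookup_order", RiskClass.READ))
    session.grant(ToolPolicy("issue_refund", RiskClass.EXECUTE))
    print(session.decide("lookup_order"))                 # allow
    print(session.decide("issue_refund"))                 # needs_approval
    print(session.decide("issue_refund", approved=True))  # allow
    print(session.decide("delete_account"))               # deny
```

The decision lives outside the model: the agent can propose any call, but the runtime checks the grant and the risk class before anything executes, which is the separation the permission and approval pages argue for.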

Questions this section is built to answer

When does an agent actually need tool access, retrieval, or MCP integration?
When should the system stay workflow-first instead of pretending autonomy is the product?
Which parts of an agent design create the real risk: model choice, tool permissions, or approval boundaries?
How should teams stage agent adoption so they gain leverage without creating hidden operating risk?

High-value current cluster

Managed agent architecture
A current architecture page for teams turning managed-agent interest into sandbox, session, tool, credential, trace, and recovery design.

MCP architecture decision
Start here to decide whether MCP belongs in the architecture at all.

MCP security boundary
An implementation-level page on approval, read-versus-write tools, and user-scoped versus system-scoped access.

MCP server audit checklist
A current security page for teams turning MCP adoption into a concrete review of SSRF, prompt injection, browser tools, credentials, and approval evidence.

Remote MCP adoption boundary
A page for the moment when interest in MCP turns into a real decision about a shared-tool platform.

Generated connector lifecycle
A current implementation page for teams turning API specs into generated SDKs, CLIs, MCP servers, tests, approvals, and versioned connector ownership.

A2A versus MCP
A current interoperability page for teams deciding whether they need tool connectivity, remote agent collaboration, or both.

Single-agent versus multi-agent handoffs
A durable architecture page for teams deciding when specialist handoffs create control and quality advantages instead of just adding complexity.

Built-in tools vs owned tool boundaries
An architecture page for teams deciding how much agent capability should be vendor-managed versus internally owned.

Computer use versus browser automation
A current decision page for teams deciding whether browser-facing agents need model-driven UI understanding, deterministic automation, or a hybrid control plane.

Computer use production safety
A practical safety page for teams evaluating sandboxing, allowlists, approvals, traces, and credential boundaries around computer-use agents.

User authority versus system authority
A control page for teams deciding when agents should inherit user scope and when a service identity is more appropriate.

Least-privilege tool scopes
A control-boundary page for teams breaking broad integrations into narrower read, write, and side-effect capabilities.

Guardrails versus evals
A control-boundary page for teams that need to separate runtime prevention from post-run measurement.

Retries, timeouts, and idempotency
A practical reliability page for teams trying to stop retry logic from turning tool failures into repeated side effects.

Sandboxing and secret boundaries
A control page for engineering teams deciding how much filesystem, network, and credential access coding agents should ever receive.

Codex desktop security boundary
A Codex-specific policy page for desktop app sandboxing, approval prompts, network use, tool permissions, secrets, and unattended automations.

Codex skills, plugins, and MCP
A Codex-specific workflow page for reusable skills, installable plugins, app connectors, MCP servers, and permission scopes.

Codex visual QA and computer use
A practical page for using the Codex in-app browser and computer use only when visual evidence is needed.

Should AI agents run in a sandbox?
A broader runtime-isolation page for teams deciding when sandboxing is mandatory and when it is only one control layer among several.

Prompt injection boundary
A security page for teams that now understand that untrusted pages, files, and tool outputs can steer agent behavior when the runtime boundary is weak.

OpenAI Model Spec: tool outputs are untrusted
A direct authority-boundary page for prompt injection defense when tools, files, pages, screenshots, and retrieved data enter agent context.

Agents SDK versus app-owned orchestration
A current architecture page for teams deciding whether framework-managed orchestration is still helping or is now getting in the way of product-owned control.

What should an AI agent be allowed to do in production?
A permissions-first page for teams staging read, draft, approval, and execution boundaries before wider rollout.

AI chatbot vs AI agent for business
A business-decision page for teams that need to justify agent complexity instead of assuming it.

Should AI agents have access to customer data?
A data-boundary page for teams defining exactly what customer context agents should ever be allowed to see.
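The retries, timeouts, and idempotency pages above deal with a specific failure: a tool call applies its side effect, the response is lost or times out, and a naive retry applies it again. One common fix is an idempotency key created once per logical action and reused on every attempt, so the downstream service can return the stored result instead of repeating the action. The Python sketch below assumes a hypothetical `RefundService` that honours such keys and a hypothetical `FlakyTransport` that simulates lost responses; real services differ in how they accept keys, and backoff and real timeouts are left out for brevity.

```python
import uuid
from typing import Optional


class TransientError(Exception):
    """A failure that is safe to retry, e.g. a timeout or dropped response."""


class RefundService:
    # Hypothetical downstream API that honours idempotency keys:
    # a repeated key returns the stored result instead of refunding twice.
    def __init__(self) -> None:
        self.results: dict[str, str] = {}
        self.refunds_issued = 0

    def refund(self, order_id: str, idempotency_key: str) -> str:
        if idempotency_key in self.results:
            return self.results[idempotency_key]
        self.refunds_issued += 1
        result = f"refunded {order_id}"
        self.results[idempotency_key] = result
        return result


class FlakyTransport:
    # Simulates the dangerous case: the refund is applied, then the response
    # is lost, so the caller cannot tell whether the side effect happened.
    def __init__(self, service: RefundService, drop_first: int) -> None:
        self.service = service
        self.drop_first = drop_first

    def call(self, order_id: str, idempotency_key: str) -> str:
        result = self.service.refund(order_id, idempotency_key)
        if self.drop_first > 0:
            self.drop_first -= 1
            raise TransientError("response lost after refund was applied")
        return result


def call_with_retries(transport: FlakyTransport, order_id: str, max_attempts: int = 3) -> str:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    key = str(uuid.uuid4())  # one key per logical action, reused on every attempt
    last_error: Optional[TransientError] = None
    for _ in range(max_attempts):
        try:
            return transport.call(order_id, key)
        except TransientError as err:
            last_error = err  # retrying is safe because the key deduplicates
    assert last_error is not None
    raise last_error


if __name__ == "__main__":
    service = RefundService()
    transport = FlakyTransport(service, drop_first=2)
    print(call_with_retries(transport, "order-42"))  # refunded order-42
    print(service.refunds_issued)                    # 1
```

If each attempt generated a fresh key instead, the same run would issue three refunds. That is why the key belongs to the control plane that owns the retry, not to the prompt or the model.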