Frontier AI Cyber Defense Readiness Checklist

Frontier AI cyber capability is moving from abstract benchmark discussion into controlled defensive programs. OpenAI’s trusted-access cyber work around GPT-5.5 and GPT-5.5-Cyber, and Anthropic’s Project Glasswing around Claude Mythos Preview, point to the same operating reality:

Security teams need a readiness model before advanced AI is allowed to analyze code, reason about vulnerabilities, or propose remediation in real environments.

This checklist is intentionally defensive. It is for security leaders, AppSec teams, AI platform owners, and incident commanders who need governance before capability expands.

Do not start with “which model is strongest?” Start with these seven controls:

| Control | Decision to make before access |
|---|---|
| Eligibility | Which users, teams, and systems are approved for frontier cyber workflows? |
| Scope | Which repositories, assets, environments, and vulnerability classes are in bounds? |
| Authority | Can the AI only analyze, or can it draft patches, open tickets, create PRs, or trigger scans? |
| Review | Which outputs require human security review before action? |
| Evidence | What run IDs, prompts, tool calls, findings, diffs, and approvals are retained? |
| Containment | How can access, tools, connectors, and workflows be paused quickly? |
| Learning loop | Which findings become eval cases, secure-coding rules, or incident lessons? |

Capability without those controls creates risk faster than it creates defense.

The safest first use cases are bounded, reviewable, and connected to assets the team owns.

| Workflow | Safer first version | Higher-risk version |
|---|---|---|
| Dependency review | Analyze owned repos and known advisories, then draft human-reviewed issues | Autonomous broad scanning with unclear asset ownership |
| Code review | Flag risky patterns in a pull request and cite code locations | Blocking releases based on opaque model judgment |
| Patch drafting | Propose minimal diffs for known issues and require maintainer review | Applying patches directly to production branches |
| Security backlog triage | Cluster known findings, deduplicate, and suggest priority | Reclassifying severity without AppSec sign-off |
| Incident support | Summarize evidence and propose investigation questions | Executing containment actions without incident command approval |

Do not allow a pilot to drift from analysis into action just because the model can do more.

Frontier cyber workflows should have stricter access than ordinary chat tools.

  • Require named users, not shared accounts.
  • Use managed identity and single sign-on where possible.
  • Restrict access to approved security, platform, and engineering owners.
  • Keep separate policies for analysis, patch drafting, ticket creation, and tool execution.
  • Log user identity, workspace, asset, model lane, and approval path.
  • Review access after role changes, incidents, and pilot completion.
  • Disable personal account use for company security work.

If the organization cannot identify who ran a workflow and which assets were in scope, the workflow is not ready.
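The access rules above can be expressed as a small policy check. The following is a minimal sketch, not a reference implementation: the `Action` names, the `AccessGrant` fields, and the `is_permitted` helper are all hypothetical, and a real deployment would source identity and grants from its own SSO and authorization systems.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """Hypothetical action types, each governed by its own policy."""
    ANALYZE = "analyze"
    DRAFT_PATCH = "draft_patch"
    CREATE_TICKET = "create_ticket"
    EXECUTE_TOOL = "execute_tool"


@dataclass(frozen=True)
class AccessGrant:
    user: str                   # a named individual, never a shared account
    sso_verified: bool          # identity was asserted by managed SSO
    allowed_actions: frozenset  # actions this user is approved for


def is_permitted(grant: AccessGrant, action: Action, shared_accounts: set) -> bool:
    """Return True only if a named, SSO-verified user holds this action."""
    if grant.user in shared_accounts:
        return False  # shared accounts break attribution
    if not grant.sso_verified:
        return False  # personal or unmanaged identities are rejected
    return action in grant.allowed_actions


# Example: an analyst approved for analysis and ticket creation only.
analyst = AccessGrant(
    user="a.rivera",
    sso_verified=True,
    allowed_actions=frozenset({Action.ANALYZE, Action.CREATE_TICKET}),
)
shared = {"secops-shared"}
```

Keeping the grant per action, rather than per tool or per model, makes it harder for a pilot to move from analysis to write-capable work without an explicit policy change.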

Every run should have an explicit target boundary:

  • repository, service, package, or asset group;
  • business owner and security owner;
  • environment classification;
  • data sensitivity;
  • allowed tools;
  • excluded systems;
  • expected output format;
  • review owner;
  • retention requirement.

The model should not be asked to “find anything” across unclear assets. Defensive work still needs authorization, even when the tool is an AI model.
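One way to enforce an explicit target boundary is to require a scope declaration before a run starts and reject it when fields are blank or the target is unbounded. This is an illustrative sketch only: the `RunScope` field names mirror the list above, while the `scope_problems` helper and the set of "unbounded" target values are assumptions.

```python
from dataclasses import dataclass, fields

# Assumed examples of targets that amount to "find anything".
UNBOUNDED_TARGETS = {"*", "all", "any", "everything"}


@dataclass(frozen=True)
class RunScope:
    target: str              # repository, service, package, or asset group
    business_owner: str
    security_owner: str
    environment: str         # environment classification, e.g. "staging"
    data_sensitivity: str
    allowed_tools: tuple     # may be empty for analysis-only runs
    excluded_systems: tuple
    output_format: str
    review_owner: str
    retention_days: int


def scope_problems(scope: RunScope) -> list:
    """Return human-readable reasons the scope is not ready; empty means OK."""
    problems = []
    for f in fields(scope):
        value = getattr(scope, f.name)
        # Every text field must be filled in explicitly.
        if isinstance(value, str) and not value.strip():
            problems.append(f"{f.name} is missing")
    if scope.target.strip().lower() in UNBOUNDED_TARGETS:
        problems.append("target is unbounded")
    if scope.retention_days <= 0:
        problems.append("retention_days must be positive")
    return problems


good = RunScope(
    target="repo:payments-api",
    business_owner="payments-lead",
    security_owner="appsec-oncall",
    environment="staging",
    data_sensitivity="internal",
    allowed_tools=("static-analysis",),
    excluded_systems=("prod-db",),
    output_format="findings-json",
    review_owner="appsec-reviewer",
    retention_days=90,
)
```

A run would proceed only when `scope_problems(scope)` returns an empty list.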

Retain enough evidence for security review and incident reconstruction:

  • run ID and timestamp;
  • user and approval context;
  • model and tool versions;
  • target assets and declared scope;
  • prompt or task summary;
  • tool calls and retrieved sources;
  • findings and confidence notes;
  • generated patches or tickets;
  • reviewer decisions;
  • final disposition.

Evidence is not bureaucracy. It is how a team separates a real finding from a plausible but wrong recommendation.
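The retained fields can be captured as a single structured record per run. The sketch below assumes a simple JSON-lines log; the `EvidenceRecord` class, its field names, and the `ready_to_close` rule are hypothetical and would need to match the organization's own retention and audit tooling.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from uuid import uuid4


@dataclass
class EvidenceRecord:
    user: str
    approval_context: str
    model_version: str
    tool_versions: dict
    target_assets: list
    declared_scope: str
    task_summary: str                                 # prompt or task summary
    tool_calls: list = field(default_factory=list)    # calls and retrieved sources
    findings: list = field(default_factory=list)      # findings with confidence notes
    artifacts: list = field(default_factory=list)     # generated patches or tickets
    reviewer_decisions: list = field(default_factory=list)
    final_disposition: str = ""
    run_id: str = field(default_factory=lambda: uuid4().hex)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_json_line(record: EvidenceRecord) -> str:
    """Serialize the record as one JSON line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)


def ready_to_close(record: EvidenceRecord) -> bool:
    """A run is closed only once a reviewer decided and a disposition is set."""
    return bool(record.reviewer_decisions) and bool(record.final_disposition)


record = EvidenceRecord(
    user="a.rivera",
    approval_context="pilot-approval-placeholder",
    model_version="model-version-placeholder",
    tool_versions={"static-analysis": "version-placeholder"},
    target_assets=["repo:payments-api"],
    declared_scope="dependency review, staging only",
    task_summary="Review known advisories against pinned dependencies",
)
```

Reviewers can then be handed the serialized record alongside the model's prose, so a finding is judged against the tool calls and sources that produced it.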

Human security review should be mandatory when an AI-generated output:

  • claims a severe vulnerability;
  • proposes a patch to authentication, authorization, cryptography, payment, deployment, or data-access code;
  • recommends disabling a control;
  • suggests a production configuration change;
  • affects customer data or regulated workflows;
  • conflicts with existing severity policy;
  • requires disclosure, escalation, or external communication.

Reviewers should see the evidence packet, not only the final prose.
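The review triggers listed above translate naturally into a routing function that returns the reasons an output needs human review. This is a hedged sketch: the `AIOutput` fields, the `SEVERE_LEVELS` labels, and the `review_reasons` helper are illustrative assumptions, and in practice these flags would be set by upstream classification or by the submitting engineer.

```python
from dataclasses import dataclass

# Code areas named in the checklist as always requiring review.
SENSITIVE_AREAS = frozenset({
    "authentication", "authorization", "cryptography",
    "payment", "deployment", "data-access",
})
# Assumption: which severity labels count as "severe" is a local policy choice.
SEVERE_LEVELS = frozenset({"high", "critical"})


@dataclass(frozen=True)
class AIOutput:
    claimed_severity: str = "none"
    patched_areas: frozenset = frozenset()
    disables_control: bool = False
    changes_production_config: bool = False
    affects_customer_data: bool = False
    conflicts_with_severity_policy: bool = False
    needs_external_communication: bool = False


def review_reasons(output: AIOutput) -> list:
    """Return every reason this output must go to human security review."""
    reasons = []
    if output.claimed_severity.lower() in SEVERE_LEVELS:
        reasons.append("claims a severe vulnerability")
    touched = sorted(output.patched_areas & SENSITIVE_AREAS)
    if touched:
        reasons.append("patches sensitive code: " + ", ".join(touched))
    if output.disables_control:
        reasons.append("recommends disabling a control")
    if output.changes_production_config:
        reasons.append("suggests a production configuration change")
    if output.affects_customer_data:
        reasons.append("affects customer data or regulated workflows")
    if output.conflicts_with_severity_policy:
        reasons.append("conflicts with existing severity policy")
    if output.needs_external_communication:
        reasons.append("requires disclosure, escalation, or external communication")
    return reasons
```

Returning the list of reasons, rather than a single boolean, gives the reviewer context on why the output was routed to them.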

Before expanding access, build evals from known internal cases:

  • previously fixed vulnerabilities;
  • false positives the team wants to avoid;
  • code review examples with expected labels;
  • patch diffs that were accepted or rejected;
  • incident evidence packets;
  • secure-coding policy examples;
  • tool-use boundary tests.

Measure whether the workflow improves security outcomes without adding noise to the review queue. A model that finds more issues but doubles triage time may still be a poor rollout candidate.
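The trade-off between more findings and more triage time can be made concrete with a simple comparison against a baseline. The sketch below is one possible framing, not an established metric: `EvalRun`, `minutes_per_true_positive`, and the `max_cost_ratio` threshold are assumptions, and the example numbers are invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalRun:
    true_positives: int    # confirmed findings on known internal cases
    false_positives: int   # findings reviewers rejected
    triage_minutes: float  # total reviewer time spent on all findings

    @property
    def precision(self) -> float:
        total = self.true_positives + self.false_positives
        return self.true_positives / total if total else 0.0

    @property
    def minutes_per_true_positive(self) -> float:
        if self.true_positives == 0:
            return float("inf")  # all triage time was wasted
        return self.triage_minutes / self.true_positives


def is_rollout_candidate(baseline: EvalRun, candidate: EvalRun,
                         max_cost_ratio: float = 1.0) -> bool:
    """Require more confirmed findings at no higher review cost per finding."""
    finds_more = candidate.true_positives > baseline.true_positives
    cost_limit = baseline.minutes_per_true_positive * max_cost_ratio
    return finds_more and candidate.minutes_per_true_positive <= cost_limit


# Invented numbers for illustration only.
baseline = EvalRun(true_positives=10, false_positives=5, triage_minutes=300)
noisy = EvalRun(true_positives=12, false_positives=30, triage_minutes=600)
focused = EvalRun(true_positives=15, false_positives=6, triage_minutes=360)
```

Here `noisy` finds two more issues than the baseline but costs 50 reviewer minutes per confirmed finding instead of 30, so it fails the check; `focused` finds more at 24 minutes each and passes.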

The team should be able to quickly:

  • pause the cyber workflow;
  • revoke model or connector access;
  • disable write-capable tools;
  • narrow target scope;
  • preserve run evidence;
  • notify security leadership;
  • roll back generated patches if needed;
  • add new evals or controls before reactivation.

Containment should be designed before the first high-capability pilot. Waiting until an incident occurs usually means the access model is already too loose.
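The containment actions above can be rehearsed as a single, ordered procedure before any pilot begins. The following is a simplified in-memory sketch under stated assumptions: `WorkflowState`, `contain`, and `can_reactivate` are hypothetical names, and real containment would call the organization's identity, connector, and paging systems rather than flip local flags.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowState:
    paused: bool = False
    write_tools_enabled: bool = True
    connectors: set = field(default_factory=set)   # active model/connector access
    scope: set = field(default_factory=set)        # assets currently in bounds
    evidence_frozen: bool = False
    notifications: list = field(default_factory=list)


def contain(state: WorkflowState, reason: str,
            notify: str = "security-leadership") -> WorkflowState:
    """Apply containment in a fixed order, preserving evidence first."""
    state.evidence_frozen = True       # preserve run evidence before changes
    state.paused = True                # pause the cyber workflow
    state.write_tools_enabled = False  # disable write-capable tools
    state.connectors.clear()           # revoke model and connector access
    state.scope.clear()                # narrow target scope to nothing
    state.notifications.append((notify, reason))
    return state


def can_reactivate(state: WorkflowState, new_controls_added: bool) -> bool:
    """Reactivation requires preserved evidence and new evals or controls."""
    return state.paused and state.evidence_frozen and new_controls_added


state = WorkflowState(
    connectors={"repo-connector", "ticketing-connector"},
    scope={"repo:payments-api"},
)
contain(state, reason="unexpected write attempt outside declared scope")
```

Freezing evidence first is a design choice in this sketch: it keeps later revocation steps from destroying the record needed for incident reconstruction. Rolling back generated patches is left out because it depends on the team's version-control workflow.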

Ask providers and internal platform owners:

  • Which cybersecurity risk tier is the model assigned to?
  • What access controls are required?
  • What safeguards differ between ordinary and cyber-specific model access?
  • How are outputs logged, retained, or excluded from training?
  • Which tools can the model call?
  • Which actions require explicit human approval?
  • What misuse monitoring exists?
  • How are customer findings, patches, and sensitive code protected?
  • What happens if the model or policy changes during the pilot?

The buying question is not only capability. It is whether the operating model matches the organization’s risk profile.

  1. Approve defensive use cases and excluded activities.
  2. Limit access to named security and engineering owners.
  3. Define target scope for each pilot workflow.
  4. Require evidence packets and human review for high-impact outputs.
  5. Evaluate on known internal cases before broad deployment.
  6. Add containment, audit, and incident procedures before write-capable actions.

This checklist responds to current defensive-cyber signals from OpenAI’s trusted-access work with GPT-5.5 and GPT-5.5-Cyber and Anthropic’s Project Glasswing, both of which frame advanced AI cyber capability as controlled, defensive, and access-governed rather than a general-purpose open workflow.