Cursor auto-review vs YOLO — picking the middle safety tier


Agent sessions that touch builds, tests, and MCP can stack dozens of approval prompts. When every shell invocation requires a click, the practical choice narrows: babysit the run, or limit agents to single-file edits.

Flip to Run Everything — what Cursor used to call YOLO mode — and the prompts disappear. So does any pre-execution review. Vendor docs and public incident write-ups describe the downside: credential exfiltration, destructive filesystem operations, and unintended pushes to production remotes. Those outcomes are documented failure modes, not hypotheticals invented for effect.

Cursor and Anthropic (Claude Code) now treat the old binary as insufficient. Cursor 3.6 shipped Auto-review on May 29, 2026 (changelog). Anthropic shipped Claude Code auto mode on March 24, 2026, describing it as a permissions mode “where Claude makes permission decisions on your behalf, with safeguards monitoring actions before they run” (Auto mode for Claude Code). Claude Code v2.1.178 and later added subagent-specific classifier checkpoints (at spawn, per action, and on return); the top-level mode selector is documented separately in the permission modes docs.

Scope note: Behavior below follows vendor documentation as of June 2026. Settings paths, tier availability, and classifier outcomes can change between releases, so verify against current docs before adopting a default on production-adjacent repos. For subagent-heavy Claude Code workflows, see “When to let Claude write the harness”; harness trust is adjacent but not the focus here.

The sane default is not “ask always” or “ask never.” It is a middle tier configured once and revisited when the repo or threat model changes.

Why the binary failed

Approval fatigue is the obvious failure mode. Long agent runs need dozens of tool calls — reads, builds, test reruns, MCP lookups. If every shell invocation stops for a click, teams either babysit the session or abandon agents for anything beyond a one-file edit.

YOLO regret is the other side. Run Everything in Cursor passes every tool call through with no classifier and no sandbox in the loop. Claude Code’s bypassPermissions mode is the same shape: everything runs, including destructive ops, unless explicit deny rules are wired. That profile fits a disposable container. It is a poor default on a laptop with SSH keys, cloud credentials, and a main branch that deploys.

What practitioners actually wanted: longer uninterrupted runs with something between them and curl | bash. Not a security guarantee — both vendors are explicit that classifiers are probabilistic — but a filter that catches obvious bad calls and sandboxed execution for the rest.

Cursor Auto-review: allowlist → sandbox → classifier

Auto-review is the default Run Mode for new Cursor users as of 3.6. Existing users enable it under Settings → Cursor Settings → Agents → Run Mode (labeled Approvals & Execution in the 3.6 changelog).

It applies to Shell, MCP, and Fetch tool calls. Every call walks three checks in order:

  1. Allowlist. Commands on the terminal allowlist or MCP tools on the MCP allowlist run immediately — no prompt, no sandbox.
  2. Sandbox. If the call can run inside Cursor’s sandbox (macOS, Linux, or Windows via WSL2), it runs there with restricted filesystem and network access. Network defaults to a curated domain list unless overridden via sandbox.json.
  3. Classifier. Everything else goes to an LLM subagent. It sees the current request plus any autoRun instructions from permissions.json. It returns allow or block. On block, Cursor may try a different approach or surface a normal approval prompt.

Cursor documents the classifier as non-deterministic and not a security boundary. It can allow what a human would block and block what was safe. Treat Auto-review as convenience, not compliance.
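The three checks can be read as a first-match pipeline. The sketch below is a toy model of that ordering only, not Cursor's implementation: `ToolCall`, `review`, the exact-match allowlist, and the two callables standing in for the sandbox and the classifier are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolCall:
    kind: str     # "shell", "mcp", or "fetch"
    command: str  # command line, MCP tool name, or URL


def review(
    call: ToolCall,
    allowlist: set[str],
    can_sandbox: Callable[[ToolCall], bool],
    classify: Callable[[ToolCall], str],
) -> str:
    """Return how a call is handled; the first check that applies wins."""
    # 1. Allowlist: runs immediately, with no prompt and no sandbox.
    #    (Exact string match is a simplification made for this sketch.)
    if call.command in allowlist:
        return "run"
    # 2. Sandbox: runs with restricted filesystem and network access.
    if can_sandbox(call):
        return "run_sandboxed"
    # 3. Classifier: an LLM verdict, either "allow" or "block".
    if classify(call) == "allow":
        return "run"
    # On block, the agent may try another approach or prompt the user.
    return "blocked"


if __name__ == "__main__":
    allow = {"git status"}
    sandboxable = lambda c: c.kind == "shell" and c.command.startswith("ls")
    toy_classifier = lambda c: "block" if "rm -rf" in c.command else "allow"
    for cmd in ("git status", "ls -la", "rm -rf ~/projects"):
        print(cmd, "->", review(ToolCall("shell", cmd), allow, sandboxable, toy_classifier))
```

The ordering is the point: an allowlisted command never reaches the sandbox or the classifier, so a broad allowlist entry removes both later checks for that command.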

Configuring the middle tier

Three surfaces matter:

| Surface | What it controls |
| --- | --- |
| Run Mode (Settings UI) | Auto-review vs Allowlist vs Allowlist (with Sandbox) vs Run Everything |
| permissions.json (~/.cursor/ and .cursor/ in the repo) | Terminal/MCP allowlists; autoRun.allow_instructions / block_instructions for the classifier |
| Protection toggles (Settings UI) | File-deletion, dotfile, external-file, and browser protections (independent of Run Mode) |

The autoRun block is the interesting part. Natural-language sentences steer the classifier — not enforce, steer. Example from Cursor’s docs: block instructions like “Especially for delete operations, I like for the classifier to reject so I can have a chance to review.”

Per-user and per-repo permissions.json files concatenate, so teams can commit repo-specific guardrails without touching global config.
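As a rough illustration of what concatenation means for the classifier instructions, the sketch below merges two already-parsed config dictionaries. The nested layout (an `autoRun` object holding two lists of strings), the user-then-repo ordering, and the helper name `merge_auto_run` are assumptions made for this example; check Cursor's permissions.json reference for the actual schema.

```python
from typing import Any

# Key names taken from the article; the surrounding layout is assumed.
INSTRUCTION_KEYS = ("allow_instructions", "block_instructions")


def merge_auto_run(user_cfg: dict[str, Any], repo_cfg: dict[str, Any]) -> dict[str, list[str]]:
    """Concatenate classifier instructions: user entries first, then repo entries."""
    merged: dict[str, list[str]] = {}
    for key in INSTRUCTION_KEYS:
        user_items = user_cfg.get("autoRun", {}).get(key, [])
        repo_items = repo_cfg.get("autoRun", {}).get(key, [])
        # Neither side overrides the other; both sets of sentences survive.
        merged[key] = [*user_items, *repo_items]
    return merged


if __name__ == "__main__":
    user = {"autoRun": {"block_instructions": [
        "Especially for delete operations, I like for the classifier to reject "
        "so I can have a chance to review."
    ]}}
    # Hypothetical repo-level guardrail committed alongside the code.
    repo = {"autoRun": {"block_instructions": [
        "Block migrations against production schemas."
    ]}}
    print(merge_auto_run(user, repo))
```

Because the lists add up rather than override, a repo file can tighten guardrails for everyone on the team without knowing what each developer has in their global config.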

Run Everything (formerly YOLO) skips all three checks. Cursor’s docs position it as the choice for when you want zero prompts and accept that nothing is screened before it runs.

How Cursor Run Mode names changed (pre-3.6)

Before Auto-review, Cursor exposed three Run Mode choices under different labels:

| Old label (pre-3.6) | What it did | Current equivalent |
| --- | --- | --- |
| Run in Sandbox | Auto-run commands that fit the sandbox | Part of Auto-review and Allowlist (with Sandbox) |
| Ask Every Time | Prompt on every action | Deprecated in 3.5.x; use Allowlist with empty terminal/MCP allowlists |
| Run Everything | No screening | Still Run Everything |

Auto-review (3.6) is the new default middle tier. It keeps allowlist + sandbox and adds the classifier for everything else. Teams that want prompt-on-every-action should pick Allowlist, leave allowlists empty, and verify behavior against current docs — sandbox rules can still auto-run some read-only commands without prompting.

Claude Code auto mode: classifier on every action

Claude Code’s middle tier is auto mode, cycled with Shift+Tab in the CLI or the mode selector in VS Code/Desktop. It sits between default (prompt on most actions) and bypassPermissions (prompt on nothing).

In auto mode, Claude executes without routine permission prompts. Before each action runs, a separate classifier model (Anthropic docs describe it as running on a Sonnet-class model regardless of the main session model) reviews the pending tool call against conversation context and permission rules.

Explicit ask rules still force a prompt. Deny rules block regardless of mode. When auto mode is entered, broad allow rules that grant arbitrary execution — things like Bash(*) — get dropped so subagents cannot bypass the gate.
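A small model makes the interaction between rules and mode easier to see. This is a sketch under stated assumptions, not Claude Code's evaluator: the deny-then-ask-then-allow ordering, the glob matching via `fnmatchcase`, the contents of `BROAD_ALLOW`, and the function names are all illustrative, and real rule syntax may differ from the patterns used here.

```python
from fnmatch import fnmatchcase

# Assumption: which patterns count as "arbitrary execution" is vendor-defined.
BROAD_ALLOW = {"Bash(*)"}


def effective_allow(allow_rules: list[str], mode: str) -> list[str]:
    """In auto mode, drop broad allow rules so they cannot bypass the classifier."""
    if mode != "auto":
        return list(allow_rules)
    return [rule for rule in allow_rules if rule not in BROAD_ALLOW]


def decide(action: str, deny: list[str], ask: list[str], allow: list[str], mode: str) -> str:
    """Return "deny", "prompt", "allow", or "classifier" for a pending action."""
    # Deny rules block regardless of mode.
    if any(fnmatchcase(action, pattern) for pattern in deny):
        return "deny"
    # Explicit ask rules still force a prompt, even in auto mode.
    if any(fnmatchcase(action, pattern) for pattern in ask):
        return "prompt"
    # Narrow allow rules pass; broad ones were filtered out in auto mode.
    if any(fnmatchcase(action, pattern) for pattern in effective_allow(allow, mode)):
        return "allow"
    # Anything unmatched goes to the classifier in auto mode, else to the user.
    return "classifier" if mode == "auto" else "prompt"


if __name__ == "__main__":
    rules = dict(deny=["Bash(rm *)"], ask=["Bash(git push*)"], allow=["Bash(*)"])
    for action in ("Bash(rm -rf build)", "Bash(git push origin main)", "Bash(curl example.com)"):
        print(action, "->", decide(action, mode="auto", **rules))
```

In this model the same `Bash(*)` rule that would wave a command through in default mode sends it to the classifier in auto mode, which is the behavior the paragraph above describes.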

Auto mode availability depends on account tier and provider; on Bedrock/Vertex/Foundry, CLAUDE_CODE_ENABLE_AUTO_MODE=1 may be required (v2.1.158+). Check Anthropic’s permission mode docs for current requirements — they move faster than blog posts.

Subagents get three checkpoints

For multi-agent workflows, auto mode’s subagent handling is the feature worth comparing to Cursor’s classifier:

  1. At spawn (v2.1.178+): the delegated task description is evaluated before the subagent starts. A task like “delete all remote branches matching this pattern” should fail here.
  2. During execution: each subagent tool call goes through the same classifier and block/allow rules as the parent. Any permissionMode in the subagent’s frontmatter is ignored when the parent is in auto mode.
  3. On return: the classifier reviews the subagent’s full action history. If something looks off — a benign spawn compromised by hostile content mid-run — a security warning gets prepended to the results.

That is a different shape than Cursor’s per-tool-call gate. Claude Code’s spawn-time check addresses delegation risk Cursor’s Auto-review does not explicitly name.
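The three checkpoints can be sketched as a wrapper around a subagent run. Everything here is hypothetical scaffolding: `run_subagent`, `SubagentResult`, and the two callables standing in for the classifier are invented for illustration, and the real classifier sees far more context than a single string.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SubagentResult:
    output: str
    history: list[str] = field(default_factory=list)
    warning: Optional[str] = None


def run_subagent(
    task: str,
    planned_actions: list[str],
    allows: Callable[[str], bool],
    history_ok: Callable[[list[str]], bool],
) -> Optional[SubagentResult]:
    """Toy model of spawn-time, per-action, and return review."""
    # Checkpoint 1 (spawn): the task description is reviewed before anything runs.
    if not allows(task):
        return None
    result = SubagentResult(output="")
    # Checkpoint 2 (per action): each tool call is gated like a parent call.
    for action in planned_actions:
        if allows(action):
            result.history.append(action)  # blocked actions never execute
    result.output = f"ran {len(result.history)} of {len(planned_actions)} actions"
    # Checkpoint 3 (return): the full history is reviewed; results are flagged,
    # not discarded, when something looks off.
    if not history_ok(result.history):
        result.warning = "security warning: review subagent actions before trusting output"
    return result


if __name__ == "__main__":
    allows = lambda text: "delete all remote branches" not in text and "rm -rf" not in text
    history_ok = lambda history: not any("curl" in action for action in history)
    print(run_subagent("delete all remote branches matching this pattern", [], allows, history_ok))
    print(run_subagent("summarize docs", ["curl http://example.invalid/x"], allows, history_ok))
```

The return review matters because each action can look acceptable in isolation while the sequence as a whole does not, which is the mid-run compromise case described in item 3.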

Configure steering via /permissions, settings.json, or managed settings — same family of allow/deny/ask rules, not a separate natural-language autoRun object. When permission rules behave unexpectedly, --safe-mode isolates whether the stack or the mode selector is at fault.

Side-by-side: what each middle tier actually does

| | Cursor Auto-review | Claude Code auto mode |
| --- | --- | --- |
| Shipped | v3.6 (May 29, 2026) | Mar 24, 2026 (blog); subagent spawn check v2.1.178+ |
| What it gates | Shell, MCP, Fetch | File edits, Bash, network, subagent spawns |
| First filter | Static allowlist | Permission rules + mode baseline |
| Second filter | OS sandbox (when available) | Classifier on each pending action |
| Third filter | LLM classifier | Subagent return review (spawn + per-action + return) |
| Override surface | permissions.json allowlists + autoRun NL hints | /permissions, settings.json deny/allow/ask rules |
| Full autonomy escape hatch | Run Everything | bypassPermissions |
| Vendor stance | “Best-effort convenience, not a security boundary” | Classifier blocks escalation beyond the request; still not a formal guarantee |
| Failure mode | Classifier allows a destructive call a human would have blocked | Classifier blocks a safe call; repeated blocks fall back to prompting |
| Best for | IDE agents with heavy terminal/MCP use | Terminal-first loops, subagent-heavy workflows |

Both middle tiers reduce prompt count. Neither replaces judgment on prod-adjacent repos.

These profiles map common threat models to a starting Run Mode. Downgrade one notch after a classifier miss or near-miss in the target environment.

Solo dev, greenfield prototype, disposable directory: Cursor Auto-review with a short terminal allowlist (git, pnpm, npm). Claude Code auto mode for long refactors. Drop to manual approval when the agent touches anything outside the workspace.

Small team, shared repo, CI on main: Cursor Auto-review plus committed .cursor/permissions.json with repo-specific block_instructions (migrations against prod schemas, deploy commands). Claude Code acceptEdits or default for merge-adjacent work; auto only for bounded tasks with clean file boundaries. Avoid bypassPermissions on machines with production credentials.

Prod-on-main, regulated data, or infra repos: Cursor Allowlist (with Sandbox) — empty allowlist until curated. Claude Code default or plan for exploration; auto mode only in scoped sandboxes. The middle tier is for velocity; this profile is for blast-radius control.

Cross-tool equivalence

If you switch between Cursor and Claude Code, map modes by intent — not by identical implementation:

| Intent | Cursor Run Mode | Claude Code mode |
| --- | --- | --- |
| Keep the agent moving with classifier gates | Auto-review | auto mode |
| Accept full responsibility; zero prompts | Run Everything | bypassPermissions |
| Explicit approval on most actions | Allowlist (optionally with Sandbox) | default + tight deny/ask rules |
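For teams that script onboarding checks or document their defaults, the mapping above fits in a small lookup. This is a convenience sketch only: the `translate` helper and the short intent labels are hypothetical, and the mode strings are shortened forms of the names in the table.

```python
# (intent, Cursor Run Mode, Claude Code mode); intent labels are shorthand.
EQUIVALENTS = [
    ("classifier-gated", "Auto-review", "auto"),
    ("zero prompts", "Run Everything", "bypassPermissions"),
    ("explicit approval", "Allowlist", "default"),
]

COLUMN = {"cursor": 1, "claude": 2}


def translate(mode: str, source: str) -> str:
    """Map a mode name from one tool to the closest-intent mode in the other."""
    src = COLUMN[source]
    dst = 2 if src == 1 else 1
    for row in EQUIVALENTS:
        if row[src] == mode:
            return row[dst]
    raise KeyError(f"no equivalent recorded for {source} mode {mode!r}")


if __name__ == "__main__":
    print(translate("Auto-review", "cursor"))
    print(translate("bypassPermissions", "claude"))
```

The lookup matches intent, not mechanism: two modes in the same row can still differ in what they gate and how they fail.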

What neither middle tier solves

Documented gaps worth planning around:

  • Destructive ops with plausible cover. A classifier can approve rm on the wrong directory if the preceding context looked legitimate. File-deletion protection in Cursor helps for automatic deletes; it does not stop approved shell commands.
  • Credential scope. Neither tool sandboxes environment variables, SSH agent, or cloud CLI sessions. An allowed aws s3 sync is still an allowed aws s3 sync.
  • Cross-repo blast radius. External-file protection and workspace boundaries help, but allowlisted git operations against the wrong remote remain an operator problem.
  • Non-determinism. Classifier decisions can differ on replay. Do not build compliance audits around “the AI said no.”
  • MCP and fetch exfil. Auto-review gates MCP and Fetch in Cursor; Claude Code’s network permissions are a separate configuration surface. Read both docs before wiring production MCP servers. When MCP misbehaves after you’ve tuned permissions, safe mode is the fastest way to see whether the server or your rules own the failure.

The middle tier is a speed bump, not a guardrail to forget about.

Pick a tier, then document the choice

Sane default for most practitioners: Cursor Auto-review and Claude Code auto mode — with committed permission config, protection toggles left on, and escape hatches reserved for throwaway environments.

Start there. When something goes wrong — and classifiers do miss — downgrade one notch (Allowlist / default mode) for that repo and record what the agent attempted. That feedback loop is what both vendors are betting teams will tolerate in exchange for fewer clicks.
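One way to make that feedback loop concrete is a tiny helper that steps a repo down one tier and emits a record of what was attempted. The ladders below follow the downgrade path named above (Auto-review to Allowlist, auto to default); the function names, the record fields, and the repo name in the demo are assumptions for illustration, not a vendor format.

```python
import json
from datetime import datetime, timezone

# Most permissive first; one notch down means one step to the right.
LADDERS = {
    "cursor": ["Run Everything", "Auto-review", "Allowlist"],
    "claude": ["bypassPermissions", "auto", "default"],
}


def downgrade(tool: str, mode: str) -> str:
    """Return the next more restrictive mode, or the same mode at the bottom."""
    ladder = LADDERS[tool]
    index = ladder.index(mode)
    return ladder[min(index + 1, len(ladder) - 1)]


def incident_record(repo: str, tool: str, mode: str, attempted: str) -> str:
    """Serialize what the agent attempted and which mode the repo moves to."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "repo": repo,
        "tool": tool,
        "previous_mode": mode,
        "new_mode": downgrade(tool, mode),
        "attempted": attempted,
    })


if __name__ == "__main__":
    # Hypothetical repo name and command, used only to show the record shape.
    print(incident_record("payments-api", "claude", "auto", "git push --force origin main"))
```

Keeping the record next to the committed permission config gives the next reviewer the reason for the tier, not just the tier.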

What default tier is in use today — and which documented failure mode would force a downgrade?