Jun 23, 2026

Cursor auto-review vs YOLO — picking the middle safety tier

Agent sessions that touch builds, tests, and MCP can stack dozens of approval prompts. When every shell invocation requires a click, the practical choice narrows: babysit the run, or limit agents to single-file edits.

Flip to Run Everything — what Cursor used to call YOLO mode — and the prompts disappear. So does any pre-execution review. Vendor docs and public incident write-ups describe the downside: credential exfiltration, destructive filesystem operations, and unintended pushes to production remotes. Those outcomes are documented failure modes, not hypotheticals invented for effect.

Cursor and Anthropic (Claude Code) now treat the old binary as insufficient. Cursor 3.6 shipped Auto-review on May 29, 2026 (changelog). Anthropic shipped Claude Code auto mode on March 24, 2026 — a permissions mode “where Claude makes permission decisions on your behalf, with safeguards monitoring actions before they run” (Auto mode for Claude Code). Later Claude Code v2.1.178+ releases added subagent-specific classifier checkpoints (spawn-time, per-action, and return review); the top-level mode selector is documented separately in permission modes.

Scope note: Behavior below follows vendor documentation as of June 2026. Settings paths, tier availability, and classifier outcomes can change between releases — verify against current docs before adopting a default on production-adjacent repos. For subagent-heavy Claude Code workflows, see when to let Claude write the harness — harness trust is adjacent but not the focus here.

The sane default is not “ask always” or “ask never.” It is a middle tier configured once and revisited when the repo or threat model changes.

Why the binary failed

Approval fatigue is the obvious failure mode. Long agent runs need dozens of tool calls — reads, builds, test reruns, MCP lookups. If every shell invocation stops for a click, teams either babysit the session or abandon agents for anything beyond a one-file edit.

YOLO regret is the other side. Run Everything in Cursor passes every tool call through with no classifier and no sandbox in the loop. Claude Code’s bypassPermissions mode is the same shape: everything runs, including destructive ops, unless explicit deny rules are wired. That profile fits a disposable container. It is a poor default on a laptop with SSH keys, cloud credentials, and a main branch that deploys.

What practitioners actually wanted: longer uninterrupted runs with something between them and curl | bash. Not a security guarantee — both vendors are explicit that classifiers are probabilistic — but a filter that catches obvious bad calls and sandboxed execution for the rest.

Cursor Auto-review: allowlist → sandbox → classifier

Auto-review is the default Run Mode for new Cursor users as of 3.6. Existing users enable it under Settings → Cursor Settings → Agents → Run Mode (labeled Approvals & Execution in the 3.6 changelog).

It applies to Shell, MCP, and Fetch tool calls. Every call walks three checks in order:

Allowlist. Commands on the terminal allowlist or MCP tools on the MCP allowlist run immediately — no prompt, no sandbox.
Sandbox. If the call can run inside Cursor’s sandbox (macOS, Linux, or Windows via WSL2), it runs there with restricted filesystem and network access. Network defaults to a curated domain list unless overridden via sandbox.json.
Classifier. Everything else goes to an LLM subagent. It sees the current request plus any autoRun instructions from permissions.json. It returns allow or block. On block, Cursor may try a different approach or surface a normal approval prompt.

Cursor documents the classifier as non-deterministic and not a security boundary. It can allow what a human would block and block what was safe. Treat Auto-review as convenience, not compliance.
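
The order of the three checks can be sketched in a few lines of Python. This is an illustration of the flow described above, not Cursor's implementation: `ToolCall`, `Outcome`, `auto_review`, and the two callables are hypothetical names, and allowlist matching is simplified to the first word of the command.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Set


class Outcome(Enum):
    RUN = "allowlisted: run immediately"
    SANDBOXED = "run inside the sandbox"
    ALLOWED = "classifier allowed"
    BLOCKED = "classifier blocked"


@dataclass(frozen=True)
class ToolCall:
    kind: str     # "shell", "mcp", or "fetch"
    command: str  # e.g. "git status"


def auto_review(
    call: ToolCall,
    allowlist: Set[str],
    fits_sandbox: Callable[[ToolCall], bool],
    classifier_allows: Callable[[ToolCall], bool],
) -> Outcome:
    # 1. Allowlist: no prompt, no sandbox.
    #    Assumption: matching is reduced to the command's first word.
    words = call.command.split()
    if words and words[0] in allowlist:
        return Outcome.RUN
    # 2. Sandbox: runs with restricted filesystem and network access.
    if fits_sandbox(call):
        return Outcome.SANDBOXED
    # 3. Classifier: everything else gets an allow/block verdict.
    return Outcome.ALLOWED if classifier_allows(call) else Outcome.BLOCKED


# A call that is not allowlisted and cannot be sandboxed reaches the classifier.
call = ToolCall("shell", "curl https://example.com/install.sh")
print(auto_review(call, {"git", "pnpm"}, lambda c: False, lambda c: False))
```

The ordering is the point: a command on the allowlist never reaches the sandbox or the classifier, so every allowlist entry removes two layers of screening for that command.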

Configuring the middle tier

Three surfaces matter:

Surface	What it controls
Run Mode (Settings UI)	Auto-review vs Allowlist vs Allowlist (with Sandbox) vs Run Everything
`permissions.json` (`~/.cursor/` and `.cursor/` in the repo)	Terminal/MCP allowlists; `autoRun.allow_instructions` / `block_instructions` for the classifier
Protection toggles (Settings UI)	File-deletion, dotfile, external-file, and browser protections — independent of Run Mode

The autoRun block is the interesting part. Natural-language sentences steer the classifier — not enforce, steer. Example from Cursor’s docs: block instructions like “Especially for delete operations, I like for the classifier to reject so I can have a chance to review.”

Per-user and per-repo permissions.json files concatenate, so teams can commit repo-specific guardrails without touching global config.
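
A rough model of that concatenation, for the classifier hints only. The key names `autoRun`, `allow_instructions`, and `block_instructions` come from the table above; treating each as a list of strings, and putting user entries before repo entries, are assumptions made for this sketch. `merge_auto_run` is a hypothetical helper, not part of Cursor.

```python
from typing import Dict, List

INSTRUCTION_KEYS = ("allow_instructions", "block_instructions")


def merge_auto_run(user_cfg: dict, repo_cfg: dict) -> Dict[str, Dict[str, List[str]]]:
    """Concatenate autoRun hints from the per-user and per-repo configs."""
    merged: Dict[str, List[str]] = {}
    for key in INSTRUCTION_KEYS:
        # Assumption: user-level hints come first, repo-level hints follow.
        merged[key] = [
            *user_cfg.get("autoRun", {}).get(key, []),
            *repo_cfg.get("autoRun", {}).get(key, []),
        ]
    return {"autoRun": merged}


user_cfg = {
    "autoRun": {
        "block_instructions": [
            "Especially for delete operations, I like for the classifier "
            "to reject so I can have a chance to review."
        ]
    }
}
repo_cfg = {
    "autoRun": {
        "block_instructions": [
            # Illustrative repo-specific hint, not taken from Cursor's docs.
            "Reject deploy commands and migrations against prod schemas."
        ]
    }
}

effective = merge_auto_run(user_cfg, repo_cfg)
print(len(effective["autoRun"]["block_instructions"]))  # 2
```

Because the lists add up rather than override each other, a committed repo file can only contribute hints in this model; it cannot silently remove a block instruction a developer set globally.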

Run Everything (formerly YOLO) skips all three checks. Cursor’s docs say to pick it when zero prompting is desired and nothing gets screened first.

How Cursor Run Mode names changed (pre-3.6)

Before Auto-review, Cursor exposed three Run Mode choices under different labels:

Old label (pre-3.6)	What it did	Current equivalent
Run in Sandbox	Auto-run commands that fit the sandbox	Part of Auto-review and Allowlist (with Sandbox)
Ask Every Time	Prompt on every action	Deprecated in 3.5.x — use Allowlist with empty terminal/MCP allowlists
Run Everything	No screening	Still Run Everything

Auto-review (3.6) is the new default middle tier. It keeps allowlist + sandbox and adds the classifier for everything else. Teams that want prompt-on-every-action should pick Allowlist, leave allowlists empty, and verify behavior against current docs — sandbox rules can still auto-run some read-only commands without prompting.

Claude Code auto mode: classifier on every action

Claude Code’s middle tier is auto mode, cycled with Shift+Tab in the CLI or the mode selector in VS Code/Desktop. It sits between default (prompt on most actions) and bypassPermissions (prompt on nothing).

In auto mode, Claude executes without routine permission prompts. Before each action runs, a separate classifier model (per Anthropic’s docs, a Sonnet-class model regardless of the main session model) reviews the pending tool call against conversation context and permission rules.

Explicit ask rules still force a prompt. Deny rules block regardless of mode. When auto mode is entered, broad allow rules that grant arbitrary execution — things like Bash(*) — get dropped so subagents cannot bypass the gate.
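
The precedence described above can be sketched as follows. Everything here is a simplified stand-in: `BROAD_ALLOW`, `enter_auto_mode`, and `decide` are hypothetical names, glob matching via `fnmatchcase` approximates real rule matching, and the checking order (deny, then ask, then classifier) is an assumption. How surviving narrow allow rules interact with the classifier is not covered here, so the sketch leaves them out of the decision.

```python
from fnmatch import fnmatchcase
from typing import Callable, List

# Assumption: only the rule named in the text is treated as "broad" here.
BROAD_ALLOW = {"Bash(*)"}


def enter_auto_mode(allow_rules: List[str]) -> List[str]:
    """Drop allow rules that would grant arbitrary execution."""
    return [rule for rule in allow_rules if rule not in BROAD_ALLOW]


def decide(
    action: str,
    deny: List[str],
    ask: List[str],
    classifier: Callable[[str], bool],
) -> str:
    # Deny rules block regardless of mode.
    if any(fnmatchcase(action, rule) for rule in deny):
        return "block"
    # Explicit ask rules still force a prompt.
    if any(fnmatchcase(action, rule) for rule in ask):
        return "prompt"
    # Everything else is reviewed by the classifier before it runs.
    return "run" if classifier(action) else "block"


print(enter_auto_mode(["Bash(*)", "Bash(git status)"]))  # ['Bash(git status)']
print(decide("Bash(git push origin main)", [], ["Bash(git push*)"], lambda a: True))  # prompt
```

In this model a deny rule is the only outcome the classifier can never override, which is why deny rules are the right place for anything that must not run.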

Auto mode availability depends on account tier and provider; on Bedrock/Vertex/Foundry, CLAUDE_CODE_ENABLE_AUTO_MODE=1 may be required (v2.1.158+). Check Anthropic’s permission mode docs for current requirements — they move faster than blog posts.

Subagents get three checkpoints

For multi-agent workflows, auto mode’s subagent handling is the feature worth comparing to Cursor’s classifier:

At spawn (v2.1.178+): the delegated task description is evaluated before the subagent starts. A task like “delete all remote branches matching this pattern” should fail here.
During execution: each subagent tool call goes through the same classifier and block/allow rules as the parent. Any permissionMode in the subagent’s frontmatter is ignored when the parent is in auto mode.
On return: the classifier reviews the subagent’s full action history. If something looks off — a benign spawn compromised by hostile content mid-run — a security warning gets prepended to the results.

That is a different shape than Cursor’s per-tool-call gate. Claude Code’s spawn-time check addresses delegation risk Cursor’s Auto-review does not explicitly name.
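
A minimal sketch of the three checkpoints, assuming each one is a yes/no check. `run_subagent` and the three callables are hypothetical; in Claude Code a single classifier plays all three roles, and the real warning text differs.

```python
from typing import Callable, List


def run_subagent(
    task: str,
    actions: List[str],
    spawn_ok: Callable[[str], bool],
    action_ok: Callable[[str], bool],
    history_ok: Callable[[List[str]], bool],
) -> str:
    # Checkpoint 1, at spawn: evaluate the delegated task description.
    if not spawn_ok(task):
        return "spawn blocked: " + task

    # Checkpoint 2, during execution: every tool call is gated individually.
    history: List[str] = []
    for action in actions:
        if action_ok(action):
            history.append(action)  # a real agent would execute the call here

    result = f"completed {len(history)} of {len(actions)} actions"

    # Checkpoint 3, on return: review the full action history.
    if not history_ok(history):
        result = "SECURITY WARNING: review this subagent's actions\n" + result
    return result


print(run_subagent(
    "delete all remote branches matching this pattern",
    ["git push origin --delete feature/x"],
    spawn_ok=lambda t: "delete all remote branches" not in t,
    action_ok=lambda a: True,
    history_ok=lambda h: True,
))
```

The return review matters because each action can pass on its own while the sequence as a whole looks wrong, which is the case a per-call gate cannot see.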

Configure steering via /permissions, settings.json, or managed settings — same family of allow/deny/ask rules, not a separate natural-language autoRun object. When permission rules behave unexpectedly, --safe-mode isolates whether the stack or the mode selector is at fault.

Side-by-side: what each middle tier actually does

	Cursor Auto-review	Claude Code auto mode
Shipped	v3.6 (May 29, 2026)	Mar 24, 2026 (blog); subagent spawn check v2.1.178+
What it gates	Shell, MCP, Fetch	File edits, Bash, network, subagent spawns
First filter	Static allowlist	Permission rules + mode baseline
Second filter	OS sandbox (when available)	Classifier on each pending action
Third filter	LLM classifier	Subagent checkpoints (spawn, per-action, return)
Override surface	`permissions.json` allowlists + `autoRun` NL hints	`/permissions`, settings.json deny/allow/ask rules
Full autonomy escape hatch	Run Everything	`bypassPermissions`
Vendor stance	“Best-effort convenience, not a security boundary”	Classifier blocks escalation beyond the request; still not a formal guarantee
Failure mode	Classifier allows a destructive call a human would have blocked	Classifier blocks a safe call; repeated blocks fall back to prompting
Best for	IDE agents with heavy terminal/MCP use	Terminal-first loops, subagent-heavy workflows

Both middle tiers reduce prompt count. Neither replaces judgment on prod-adjacent repos.

Recommended defaults by profile

These profiles map common threat models to a starting Run Mode. After a classifier miss or near-miss in the target environment, downgrade one notch to the next stricter tier.

Solo dev, greenfield prototype, disposable directory: Cursor Auto-review with a short terminal allowlist (git, pnpm, npm). Claude Code auto mode for long refactors. Drop to manual approval when the agent touches anything outside the workspace.

Small team, shared repo, CI on main: Cursor Auto-review plus committed .cursor/permissions.json with repo-specific block_instructions (migrations against prod schemas, deploy commands). Claude Code acceptEdits or default for merge-adjacent work; auto only for bounded tasks with clean file boundaries. Avoid bypassPermissions on machines with production credentials.

Prod-on-main, regulated data, or infra repos: Cursor Allowlist (with Sandbox) — empty allowlist until curated. Claude Code default or plan for exploration; auto mode only in scoped sandboxes. The middle tier is for velocity; this profile is for blast-radius control.

Cross-tool equivalence

If you switch between Cursor and Claude Code, map modes by intent — not by identical implementation:

Intent	Cursor Run Mode	Claude Code mode
Keep the agent moving with classifier gates	Auto-review	auto mode
Accept full responsibility; zero prompts	Run Everything	bypassPermissions
Explicit approval on most actions	Allowlist (optionally with Sandbox)	default + tight deny/ask rules

What neither middle tier solves

Documented gaps worth planning around:

Destructive ops with plausible cover. A classifier can approve rm on the wrong directory if the preceding context looked legitimate. File-deletion protection in Cursor helps for automatic deletes; it does not stop approved shell commands.
Credential scope. Neither tool sandboxes environment variables, SSH agent, or cloud CLI sessions. An allowed aws s3 sync is still an allowed aws s3 sync.
Cross-repo blast radius. External-file protection and workspace boundaries help, but allowlisted git operations against the wrong remote remain an operator problem.
Non-determinism. Classifier decisions can differ on replay. Do not build compliance audits around “the AI said no.”
MCP and fetch exfil. Auto-review gates MCP and Fetch in Cursor; Claude Code’s network permissions are a separate configuration surface. Read both docs before wiring production MCP servers. When MCP misbehaves after you’ve tuned permissions, safe mode is the fastest way to see whether the server or your rules own the failure.

The middle tier is a speed bump, not a guardrail to forget about.

Pick a tier, then document the choice

Sane default for most practitioners: Cursor Auto-review and Claude Code auto mode — with committed permission config, protection toggles left on, and escape hatches reserved for throwaway environments.

Start there. When something goes wrong — and classifiers do miss — downgrade one notch (Allowlist / default mode) for that repo and record what the agent attempted. That feedback loop is what both vendors are betting teams will tolerate in exchange for fewer clicks.
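
One lightweight way to record a miss is a JSON line per incident. The sketch below is only a suggestion: `miss_record` and its field names are hypothetical, and neither tool produces or reads this format.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def miss_record(
    repo: str,
    tool: str,
    attempted: str,
    tier_before: str,
    tier_after: str,
    when: Optional[datetime] = None,
) -> str:
    """Return one JSON line describing a classifier miss and the downgrade."""
    when = when or datetime.now(timezone.utc)
    return json.dumps({
        "time": when.isoformat(),
        "repo": repo,
        "tool": tool,
        "attempted": attempted,      # what the agent tried to run
        "tier_before": tier_before,  # tier in use when the miss happened
        "tier_after": tier_after,    # stricter tier chosen afterwards
    })


line = miss_record(
    repo="payments-api",  # illustrative repo name
    tool="Cursor",
    attempted="rm -rf ../shared-config",
    tier_before="Auto-review",
    tier_after="Allowlist",
)
print(line)
```

Appending these lines to a file kept alongside the committed permission config gives the team a history to consult before moving a repo back up a tier.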

What default tier is in use today — and which documented failure mode would force a downgrade?