TokenMix Research Lab · 2026-04-22

Cursor Composer 2 Review: 200 tok/s, 61.3 CursorBench (2026)

Cursor 3 shipped in April 2026 with Composer 2 — Anysphere's proprietary frontier coding model — as the default in Auto mode. Key numbers: 61.3 on CursorBench (39% improvement over Composer 1.5), 200+ tokens per second via custom GPU kernels, parallel agent orchestration, background cloud agents, local-to-cloud handoff, and a rebuilt Design Mode. This review examines whether Composer 2 actually beats Claude Opus 4.7 and GPT-5.4 for coding in the editor where you use them. TokenMix.ai routes coding traffic across Composer 2, Claude, GPT, and GLM-5.1 for teams wanting provider-agnostic coding agent infrastructure.

Confirmed vs Speculation: Composer 2 Claims

| Claim | Status | Source |
|---|---|---|
| Cursor 3 released April 2026 | Confirmed | The New Stack |
| Composer 2 default in Auto mode | Confirmed | Cursor release notes |
| 61.3 on CursorBench | Confirmed (Anysphere's own benchmark) | Cursor official |
| 200+ tok/s via custom GPU kernels | Confirmed (Anysphere claim) | Technical blog |
| 39% improvement over Composer 1.5 | Confirmed | Same source |
| Beats Claude Opus 4.7 on CursorBench | Unverified (CursorBench is first-party) | Anysphere-only benchmark |
| Matches Opus 4.7 on SWE-Bench Verified | Likely not (87.6% Opus vs ~80% Composer) | Third-party data pending |
| Design Mode GA | Confirmed | Cursor 3 feature page |
| Background cloud agents stable | Beta quality | Community reports |

Bottom line: a real engineering achievement, but the marketing overstates the competitive positioning. CursorBench is a first-party benchmark optimized for Composer 2 — cross-check with SWE-Bench Verified before taking the 61.3 number as definitive.

CursorBench 61.3: What That Score Means

CursorBench is Anysphere's proprietary benchmark measuring multi-file coding tasks within the Cursor IDE context — repository-aware edits, applying diff suggestions, accepting/rejecting inline completions.

| Model | CursorBench | SWE-Bench Verified (external) |
|---|---|---|
| Composer 1.5 | 44.1 | ~55% (est.) |
| Composer 2 | 61.3 | ~80% (est.) |
| Claude Opus 4.7 | ~63-65 (est. via adapter) | 87.6% |
| GPT-5.4 | ~48 (est.) | 58.7% |
| Gemini 3.1 Pro | ~52 (est.) | 80.6% |

Key observation: CursorBench and SWE-Bench Verified rank models differently. Opus 4.7 dominates SWE-Bench Verified but doesn't equally dominate CursorBench — because CursorBench rewards IDE-native behaviors (inline completion quality, diff acceptance rates) that the Composer training loop explicitly optimizes for.

Honest framing: Composer 2 is probably the best model inside Cursor because it's trained for that environment. It's not a drop-in replacement for Claude Opus 4.7 in arbitrary coding contexts.

Custom GPU Kernels: Where 200 tok/s Comes From

Anysphere published technical details on Composer 2's inference stack. Key optimizations:

1. Custom MLA (Multi-Head Latent Attention) kernel — Anysphere rewrote the attention computation from scratch in CUDA, avoiding PyTorch overhead. Result: 1.7× faster attention at same quality.

2. Speculative decoding with draft model — A 300M-param draft model runs ahead, proposing 4-8 tokens per step that the full model validates in parallel. Hit rate ~65% in coding contexts, giving ~2× effective throughput.

3. Prompt prefix caching — Repository-level context is cached across turns, so a 100K-token project only gets encoded once per session.
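
The ~2× claim from speculative decoding follows from standard expected-value arithmetic. A minimal Python sketch of that arithmetic, assuming each drafted token is accepted independently at the quoted ~65% rate (an idealization — Anysphere hasn't published its exact scheme):

```python
def expected_tokens_per_step(k: int, a: float) -> float:
    """Expected tokens emitted per verification step when a draft model
    proposes k tokens, each accepted with independent probability a.
    On the first rejection the target model still emits one corrective
    token; if all k drafts survive, the target adds one bonus token."""
    # With probability a**i * (1 - a), exactly i drafts are accepted (i < k),
    # yielding i + 1 tokens (the rejected slot becomes the target's token).
    ev = sum((a ** i) * (1 - a) * (i + 1) for i in range(k))
    ev += (a ** k) * (k + 1)  # all k accepted, plus one bonus token
    return ev

# With 4-8 drafted tokens and ~65% per-token acceptance:
for k in (4, 8):
    print(k, round(expected_tokens_per_step(k, 0.65), 2))
```

Under these assumptions a 4-8 token draft window yields roughly 2.5-2.8 tokens per full-model step, which, after verification overhead, is consistent with the ~2× effective throughput Anysphere reports.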

Combined result: 200+ tok/s output throughput with ~150ms time-to-first-token on typical IDE workflows. For comparison:

| Model | Throughput | Time-to-first-token |
|---|---|---|
| Composer 2 (Cursor 3) | 200+ tok/s | 150 ms |
| Claude Opus 4.7 (API) | 55-80 tok/s | 400-800 ms |
| GPT-5.4 (API) | 70-100 tok/s | 300-600 ms |
| Gemini 3.1 Flash | 180 tok/s | 200 ms |

For an IDE where every keystroke matters, Composer 2's latency profile is a real UX advantage.
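
To see what those figures mean per interaction, here is a back-of-envelope wall-clock estimate for a single 500-token response. The 500-token length and the mid-range API values are illustrative assumptions; the Composer 2 numbers are the ones quoted above:

```python
def response_time(ttft_ms: float, tok_per_s: float, n_tokens: int = 500) -> float:
    """Wall-clock seconds for one response: time-to-first-token plus
    steady-state generation of the remaining tokens."""
    return ttft_ms / 1000 + n_tokens / tok_per_s

models = {
    "Composer 2":      (150, 200),  # TTFT ms, tok/s (from the table)
    "Claude Opus 4.7": (600, 65),   # mid-range of 400-800 ms, 55-80 tok/s
    "GPT-5.4":         (450, 85),   # mid-range assumption
}
for name, (ttft, tps) in models.items():
    print(f"{name}: {response_time(ttft, tps):.1f}s")
```

Under these assumptions Composer 2 finishes in about 2.7 s versus roughly 8 s for Opus 4.7 via API — the ~3× gap the FAQ below cites.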

Parallel Agents + Background Cloud Agents

Cursor 3's headline agent features:

Parallel agents: You can run multiple agent tasks simultaneously in separate panes. Example: one agent refactors a module, another writes tests, a third updates documentation. Each runs independently in isolated git worktrees, merging back when done.
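
As a hypothetical sketch of how per-agent isolation via git worktrees could work: the helper below only builds the git command lines (`git worktree add -b` and `git worktree remove` are standard git; the branch/directory naming scheme is an invented illustration, not Cursor's actual implementation):

```python
from pathlib import Path

def worktree_plan(repo: Path, task: str, base: str = "main") -> list[list[str]]:
    """Commands giving one agent task its own branch and working tree,
    so parallel agents never touch each other's files."""
    branch = f"agent/{task}"                      # hypothetical naming scheme
    tree = repo.parent / f"{repo.name}-{task}"    # sibling directory per task
    return [
        ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(tree), base],
        # ... agent runs inside `tree`, committing to `branch`, then merges ...
        ["git", "-C", str(repo), "worktree", "remove", str(tree)],
    ]

for cmd in worktree_plan(Path("/repos/app"), "refactor-auth"):
    print(" ".join(cmd))
```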

Background cloud agents: Long-running tasks (8-hour refactors, codebase-wide migrations) run on Anysphere's cloud infrastructure, surviving laptop closure. You get notifications on completion. This is Cursor's answer to Claude Code Routines.

Local-to-cloud handoff: Start a task locally, realize it'll take 30+ minutes, hit "run in cloud" to continue remotely without losing context.

Realistic limits:

Cursor 3 vs Claude Code vs Windsurf

| Feature | Cursor 3 | Claude Code | Windsurf |
|---|---|---|---|
| Default coding model | Composer 2 | Opus 4.7 | Your choice (Sonnet 4.6 default) |
| Price | $20/mo Pro, $200/mo Max | $20/mo Pro, $200/mo Max | $20/mo Pro, $200/mo Max |
| IDE base | VS Code fork | Terminal + desktop | VS Code fork |
| Parallel agents | Yes, native UI | Limited (subagents) | Limited |
| Background cloud agents | Yes | Yes (Routines) | No |
| Multi-pane | Yes | No | Yes |
| BugBot | Yes | No | No |
| Ghost Mode (stealth refactor) | Yes | No | No |
| Design Mode (UI-to-code) | Yes | No | No |
| Market share (April 2026) | 18% | 18% | 8% |
| Developer love (Pragmatic Engineer survey) | Strong | 46% "most loved" | Medium |

Positioning summary:

Should You Switch from Claude Code to Cursor 3?

| Your situation | Switch to Cursor 3? |
|---|---|
| Prefer terminal workflow | No — stick with Claude Code |
| Live in VS Code day-to-day | Yes |
| Need parallel agent UI | Yes, Cursor 3 is ahead |
| Doing codebase-wide refactors | Use both (Cursor for ongoing work, Claude Code for massive migrations) |
| Heavy on Opus 4.7 specifically | Stick with Claude Code (Cursor 3 can run Opus but defaults to Composer) |
| Design + code workflows | Yes, via Cursor 3 Design Mode |
| Budget < $20/mo | Use either free tier (limited) |

Many teams now run both: Cursor 3 for individual feature development, Claude Code for architectural work and automation. Survey data shows Cursor and Claude Code tied at 18% adoption with significant overlap — most heavy users have both installed.

FAQ

Is Composer 2 better than Claude Opus 4.7 for coding?

Inside Cursor, yes for most IDE tasks (inline completion, diffs, repo-aware edits). Outside Cursor on SWE-Bench Verified, Opus 4.7's 87.6% still leads. Use Composer 2 as your default in Cursor; Opus 4.7 via API for headless automation.

How fast is Composer 2 really?

200+ tokens per second output throughput, 150ms time-to-first-token in typical IDE workflows. That's roughly 3× faster than Claude Opus 4.7 via API for equivalent quality. Custom GPU kernels and speculative decoding are the main contributors.

Can I use Claude Opus 4.7 inside Cursor 3 instead of Composer 2?

Yes. Cursor 3 lets you pick any supported model. Set "Claude Opus 4.7" as default if you want max model quality and accept slower latency. Auto mode defaults to Composer 2 for speed.

What's the price of Cursor 3?

$20/month Pro (matches Windsurf and Claude Code Pro). $200/month Max adds background cloud agents, higher usage limits, and priority GPU access. A free tier exists but is limited to ~50 slow requests/day.

Is Cursor a VS Code fork or a new editor?

Cursor is a VS Code fork. You can import VS Code settings and extensions directly. This is why Cursor 3 ships with mature extension support that Claude Code (terminal-first) and Windsurf (VS Code fork, less polish) both lack.

Should I wait for Composer 3?

Anysphere hasn't announced Composer 3 timing. Release cadence suggests Q4 2026 at earliest. Composer 2 is a material quality jump — no reason to wait.

How does the reported $60B SpaceX-Cursor deal affect this?

If SpaceX's rumored $60B Cursor acquisition closes, Composer 2 could be swapped for a Grok variant in Cursor's Auto mode. As of April 22, 2026 the deal remains in negotiation. Near-term, Composer 2 stays as default.


Sources

By TokenMix Research Lab · Updated 2026-04-22