TokenMix Research Lab · 2026-04-22

Cursor Composer 2 Review: 200 tok/s, 61.3 CursorBench (2026)

Cursor 3 shipped in April 2026 with Composer 2 — Anysphere's proprietary frontier coding model — as the default in Auto mode. Key numbers: 61.3 on CursorBench (39% improvement over Composer 1.5), 200+ tokens per second via custom GPU kernels, parallel agent orchestration, background cloud agents, local-to-cloud handoff, and a rebuilt Design Mode. This review examines whether Composer 2 actually beats Claude Opus 4.7 and GPT-5.4 for coding in the editor where you use them. TokenMix.ai routes coding traffic across Composer 2, Claude, GPT, and GLM-5.1 for teams wanting provider-agnostic coding agent infrastructure.

Confirmed vs Speculation: Composer 2 Claims

| Claim | Status | Source |
|---|---|---|
| Cursor 3 released April 2026 | Confirmed | The New Stack |
| Composer 2 default in Auto mode | Confirmed | Cursor release notes |
| 61.3 on CursorBench | Confirmed (Anysphere's own benchmark) | Cursor official |
| 200+ tok/s via custom GPU kernels | Confirmed (Anysphere claim) | Technical blog |
| 39% improvement over Composer 1.5 | Confirmed | Same source |
| Beats Claude Opus 4.7 on CursorBench | Unverified (CursorBench is first-party) | Anysphere-only benchmark |
| Matches Opus 4.7 on SWE-Bench Verified | Likely not (87.6% Opus vs ~80% Composer) | Third-party data pending |
| Design Mode GA | Confirmed | Cursor 3 feature page |
| Background cloud agents stable | Beta quality | Community reports |

Bottom line: a real engineering achievement, but the marketing overstates the competitive positioning. CursorBench is a first-party benchmark optimized for Composer 2 — cross-check with SWE-Bench Verified before taking the 61.3 number as definitive.

CursorBench 61.3: What That Score Means

CursorBench is Anysphere's proprietary benchmark measuring multi-file coding tasks within the Cursor IDE context — repository-aware edits, applying diff suggestions, accepting/rejecting inline completions.

| Model | CursorBench | SWE-Bench Verified (external) |
|---|---|---|
| Composer 1.5 | 44.1 | ~55% (est.) |
| Composer 2 | 61.3 | ~80% (est.) |
| Claude Opus 4.7 | ~63-65 (est. via adapter) | 87.6% |
| GPT-5.4 | ~48 (est.) | 58.7% |
| Gemini 3.1 Pro | ~52 (est.) | 80.6% |

Key observation: CursorBench and SWE-Bench Verified rank models differently. Opus 4.7 dominates SWE-Bench Verified but doesn't equally dominate CursorBench — because CursorBench rewards IDE-native behaviors (inline completion quality, diff acceptance rates) that the Composer training loop explicitly optimizes for.

Honest framing: Composer 2 is probably the best model inside Cursor because it's trained for that environment. It's not a drop-in replacement for Claude Opus 4.7 in arbitrary coding contexts.

Custom GPU Kernels: Where 200 tok/s Comes From

Anysphere published technical details on Composer 2's inference stack. Key optimizations:

1. Custom MLA (Multi-Head Latent Attention) kernel — Anysphere rewrote the attention computation from scratch in CUDA, avoiding PyTorch overhead. Result: 1.7× faster attention at same quality.

2. Speculative decoding with draft model — A 300M-param draft model runs ahead, proposing 4-8 tokens per step that the full model validates in parallel. Hit rate ~65% in coding contexts, giving ~2× effective throughput.

3. Prompt prefix caching — Repository-level context is cached across turns, so a 100K-token project only gets encoded once per session.
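
The ~2× claim from speculative decoding follows from standard expected-value arithmetic. A minimal Python sketch of that arithmetic, assuming each drafted token is accepted independently at the quoted ~65% rate (an idealization — Anysphere hasn't published its exact scheme):

```python
def expected_tokens_per_step(k: int, a: float) -> float:
    """Expected tokens emitted per verification step when a draft model
    proposes k tokens, each accepted with independent probability a.
    On the first rejection the target model still emits one corrective
    token; if all k drafts survive, the target adds one bonus token."""
    # With probability a**i * (1 - a), exactly i drafts are accepted (i < k),
    # yielding i + 1 tokens (the rejected slot becomes the target's token).
    ev = sum((a ** i) * (1 - a) * (i + 1) for i in range(k))
    ev += (a ** k) * (k + 1)  # all k accepted, plus one bonus token
    return ev

# With 4-8 drafted tokens and ~65% per-token acceptance:
for k in (4, 8):
    print(k, round(expected_tokens_per_step(k, 0.65), 2))
```

Under these assumptions a 4-8 token draft window yields roughly 2.5-2.8 tokens per full-model step, which, after verification overhead, is consistent with the ~2× effective throughput Anysphere reports.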

Combined result: 200+ tok/s output throughput with ~150ms time-to-first-token on typical IDE workflows. For comparison:

| Model | Throughput | Time-to-first-token |
|---|---|---|
| Composer 2 (Cursor 3) | 200+ tok/s | 150 ms |
| Claude Opus 4.7 (API) | 55-80 tok/s | 400-800 ms |
| GPT-5.4 (API) | 70-100 tok/s | 300-600 ms |
| Gemini 3.1 Flash | 180 tok/s | 200 ms |

For an IDE where every keystroke matters, Composer 2's latency profile is a real UX advantage.
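
To see what those figures mean per interaction, here is a back-of-envelope wall-clock estimate for a single 500-token response. The 500-token length and the mid-range API values are illustrative assumptions; the Composer 2 numbers are the ones quoted above:

```python
def response_time(ttft_ms: float, tok_per_s: float, n_tokens: int = 500) -> float:
    """Wall-clock seconds for one response: time-to-first-token plus
    steady-state generation of the remaining tokens."""
    return ttft_ms / 1000 + n_tokens / tok_per_s

models = {
    "Composer 2":      (150, 200),  # TTFT ms, tok/s (from the table)
    "Claude Opus 4.7": (600, 65),   # mid-range of 400-800 ms, 55-80 tok/s
    "GPT-5.4":         (450, 85),   # mid-range assumption
}
for name, (ttft, tps) in models.items():
    print(f"{name}: {response_time(ttft, tps):.1f}s")
```

Under these assumptions Composer 2 finishes in about 2.7 s versus roughly 8 s for Opus 4.7 via API — the ~3× gap the FAQ below cites.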

Parallel Agents + Background Cloud Agents

Cursor 3's headline agent features:

Parallel agents: You can run multiple agent tasks simultaneously in separate panes. Example: one agent refactors a module, another writes tests, a third updates documentation. Each runs independently in isolated git worktrees, merging back when done.
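
As a hypothetical sketch of how per-agent isolation via git worktrees could work: the helper below only builds the git command lines (`git worktree add -b` and `git worktree remove` are standard git; the branch/directory naming scheme is an invented illustration, not Cursor's actual implementation):

```python
from pathlib import Path

def worktree_plan(repo: Path, task: str, base: str = "main") -> list[list[str]]:
    """Commands giving one agent task its own branch and working tree,
    so parallel agents never touch each other's files."""
    branch = f"agent/{task}"                      # hypothetical naming scheme
    tree = repo.parent / f"{repo.name}-{task}"    # sibling directory per task
    return [
        ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(tree), base],
        # ... agent runs inside `tree`, committing to `branch`, then merges ...
        ["git", "-C", str(repo), "worktree", "remove", str(tree)],
    ]

for cmd in worktree_plan(Path("/repos/app"), "refactor-auth"):
    print(" ".join(cmd))
```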

Background cloud agents: Long-running tasks (8-hour refactors, codebase-wide migrations) run on Anysphere's cloud infrastructure, surviving laptop closure. You get notifications on completion. This is Cursor's answer to Claude Code Routines.

Local-to-cloud handoff: Start a task locally, realize it'll take 30+ minutes, hit "run in cloud" to continue remotely without losing context.

Realistic limits:

Cursor 3 vs Claude Code vs Windsurf

| Feature | Cursor 3 | Claude Code | Windsurf |
|---|---|---|---|
| Default coding model | Composer 2 | Opus 4.7 | Your choice (Sonnet 4.6 default) |
| Price | $20/mo Pro, $200/mo Max | $20/mo Pro, $200/mo Max | $20/mo Pro, $200/mo Max |
| IDE base | VS Code fork | Terminal + desktop | VS Code fork |
| Parallel agents | Yes, native UI | Limited (subagents) | Limited |
| Background cloud agents | Yes | Yes (Routines) | No |
| Multi-pane | Yes | No | Yes |
| BugBot | Yes | No | No |
| Ghost Mode (stealth refactor) | Yes | No | No |
| Design Mode (UI-to-code) | Yes | No | No |
| Market share (April 2026) | 18% | 18% | 8% |
| Developer love (Pragmatic Engineer survey) | Strong | 46% "most loved" | Medium |

Positioning summary:

Should You Switch from Claude Code to Cursor 3?

| Your situation | Switch to Cursor 3? |
|---|---|
| Prefer terminal workflow | No — stick with Claude Code |
| Live in VS Code day-to-day | Yes |
| Need parallel agent UI | Yes, Cursor 3 is ahead |
| Doing codebase-wide refactors | Use both (Cursor for ongoing work, Claude Code for massive migrations) |
| Heavy on Opus 4.7 specifically | Stick with Claude Code (Cursor 3 can run Opus but defaults to Composer) |
| Design + code workflows | Yes, via Cursor 3 Design Mode |
| Budget < $20/mo | Use either free tier (limited) |

Many teams now run both: Cursor 3 for individual feature development, Claude Code for architectural work and automation. Survey data shows Cursor and Claude Code tied at 18% adoption with significant overlap — most heavy users have both installed.

FAQ

Is Composer 2 better than Claude Opus 4.7 for coding?

Inside Cursor, yes for most IDE tasks (inline completion, diffs, repo-aware edits). Outside Cursor on SWE-Bench Verified, Opus 4.7's 87.6% still leads. Use Composer 2 as your default in Cursor; Opus 4.7 via API for headless automation.

How fast is Composer 2 really?

200+ tokens per second output throughput, 150ms time-to-first-token in typical IDE workflows. That's roughly 3× faster than Claude Opus 4.7 via API for equivalent quality. Custom GPU kernels and speculative decoding are the main contributors.

Can I use Claude Opus 4.7 inside Cursor 3 instead of Composer 2?

Yes. Cursor 3 lets you pick any supported model. Set "Claude Opus 4.7" as default if you want max model quality and accept slower latency. Auto mode defaults to Composer 2 for speed.

What's the price of Cursor 3?

$20/month Pro (matches Windsurf and Claude Code Pro). $200/month Max adds background cloud agents, higher usage limits, and priority GPU access. A free tier exists but is limited to ~50 slow requests/day.

Is Cursor a VS Code fork or a new editor?

Cursor is a VS Code fork. You can import VS Code settings and extensions directly. This is why Cursor 3 ships with mature extension support that Claude Code (terminal-first) and Windsurf (VS Code fork, less polish) both lack.

Should I wait for Composer 3?

Anysphere hasn't announced Composer 3 timing. Release cadence suggests Q4 2026 at earliest. Composer 2 is a material quality jump — no reason to wait.

How does the reported $60B SpaceX-Cursor deal affect this?

If SpaceX's rumored $60B Cursor acquisition closes, Composer 2 could be swapped for a Grok variant in Cursor's Auto mode. As of April 22, 2026 the deal remains in negotiation. Near-term, Composer 2 stays as default.


Sources

By TokenMix Research Lab · Updated 2026-04-22