TokenMix Research Lab · 2026-04-22
Cursor Composer 2 Review: 200 tok/s, 61.3 CursorBench (2026)
Last Updated: 2026-04-22
Author: TokenMix Research Lab
Cursor 3 shipped in April 2026 with Composer 2, Anysphere's proprietary frontier coding model, as the default in Auto mode. The headline numbers are 61.3 on CursorBench (a 39% improvement over Composer 1.5) and 200+ tokens per second via custom GPU kernels. The headline features are parallel agent orchestration, background cloud agents, local-to-cloud handoff, and a rebuilt Design Mode. This review examines whether Composer 2 actually beats Claude Opus 4.7 and GPT-5.4 for coding inside the editor, where most developers would use it. TokenMix.ai routes coding traffic across Composer 2, Claude, GPT, and GLM-5.1 for teams that want provider-agnostic coding agent infrastructure.
Table of Contents
- Confirmed vs Speculation: Composer 2 Claims
- CursorBench 61.3: What That Score Means
- Custom GPU Kernels: Where 200 tok/s Comes From
- Parallel Agents + Background Cloud Agents
- Cursor 3 vs Claude Code vs Windsurf
- Should You Switch from Claude Code to Cursor 3?
- FAQ
Confirmed vs Speculation: Composer 2 Claims
| Claim | Status | Source |
|---|---|---|
| Cursor 3 released April 2026 | Confirmed | The New Stack |
| Composer 2 default in Auto mode | Confirmed | Cursor release notes |
| 61.3 on CursorBench | Confirmed (Anysphere's own benchmark) | Cursor official |
| 200+ tok/s via custom GPU kernels | Confirmed (Anysphere claim) | Technical blog |
| 39% improvement over Composer 1.5 | Confirmed | Same source |
| Beats Claude Opus 4.7 on CursorBench | Unverified — CursorBench is first-party | Anysphere-only benchmark |
| Matches Opus 4.7 on SWE-Bench Verified | Likely not (Opus 4.7 at 87.6% vs. an estimated ~80% for Composer 2) | Third-party data pending |
| Design Mode GA | Confirmed | Cursor 3 feature page |
| Background cloud agents stable | Beta quality | Community reports |
Bottom line: Composer 2 is a real engineering achievement, but the marketing overstates its competitive position. CursorBench is a first-party benchmark optimized for Composer 2, so cross-check against SWE-Bench Verified before treating the 61.3 number as definitive.
CursorBench 61.3: What That Score Means
CursorBench is Anysphere's proprietary benchmark measuring multi-file coding tasks within the Cursor IDE context — repository-aware edits, applying diff suggestions, accepting/rejecting inline completions.
| Model | CursorBench | SWE-Bench Verified (external) |
|---|---|---|
| Composer 1.5 | 44.1 | ~55% (est) |
| Composer 2 | 61.3 | ~80% (est) |
| Claude Opus 4.7 | ~63-65 (est via adapter) | 87.6% |
| GPT-5.4 | ~48 (est) | 58.7% |
| Gemini 3.1 Pro | ~52 (est) | 80.6% |
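Key observation: CursorBench and SWE-Bench Verified rank models differently. On SWE-Bench Verified, Opus 4.7 leads Composer 2 by roughly 8 points (87.6% vs. ~80%). On CursorBench the estimated gap shrinks to about 2-4 points (~63-65 vs. 61.3), and Composer 2 moves well ahead of Gemini 3.1 Pro, which it narrowly trails on SWE-Bench Verified. The likely reason is that CursorBench rewards IDE-native behaviors (inline completion quality, diff acceptance rates) that the Composer training loop explicitly optimizes for.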
Honest framing: Composer 2 is probably the best model inside Cursor because it's trained for that environment. It's not a drop-in replacement for Claude Opus 4.7 in arbitrary coding contexts.
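The arithmetic behind the table can be checked in a few lines. This is a minimal sketch that copies the scores from the table above; the value 64.0 for Opus 4.7 on CursorBench is simply the midpoint of the article's ~63-65 estimate, and all "est" figures remain estimates rather than measured results.

```python
# Scores copied from the table above. Values marked "est" in the article
# are estimates, not measurements.
cursorbench = {
    "Composer 1.5": 44.1,
    "Composer 2": 61.3,
    "Claude Opus 4.7": 64.0,  # assumption: midpoint of the ~63-65 estimate
    "GPT-5.4": 48.0,
    "Gemini 3.1 Pro": 52.0,
}
swe_bench_verified = {
    "Composer 1.5": 55.0,
    "Composer 2": 80.0,
    "Claude Opus 4.7": 87.6,
    "GPT-5.4": 58.7,
    "Gemini 3.1 Pro": 80.6,
}


def relative_gain(old: float, new: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


def ranking(scores: dict) -> list:
    """Model names sorted from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


gain = relative_gain(cursorbench["Composer 1.5"], cursorbench["Composer 2"])
print(f"Composer 1.5 -> Composer 2 on CursorBench: +{gain:.1f}%")
print("CursorBench ranking:       ", ranking(cursorbench))
print("SWE-Bench Verified ranking:", ranking(swe_bench_verified))
```

With these inputs the relative gain rounds to the 39% quoted by Anysphere, and the two rankings differ only in the order of Composer 2 and Gemini 3.1 Pro.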
Custom GPU Kernels: Where 200 tok/s Comes From
Anysphere published technical details on Composer 2's inference stack. Key optimizations:
1. Custom MLA (Multi-Head Latent Attention) kernel: Anysphere rewrote the attention computation from scratch in CUDA to avoid PyTorch overhead. Reported result: 1.7× faster attention at the same quality.
2. Speculative decoding with a draft model: a 300M-parameter draft model runs ahead, proposing 4-8 tokens per step that the full model validates in parallel. The hit rate is ~65% in coding contexts, giving ~2× effective throughput.
3. Prompt prefix caching: repository-level context is cached across turns, so a 100K-token project is encoded only once per session.
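To make the speculative decoding step concrete, here is a toy sketch of the greedy variant of the technique. It is not Anysphere's implementation: `toy_target` and `toy_draft` are hypothetical stand-ins for the full and draft models, and the throughput formula assumes each draft token is accepted independently with the same probability, which is a simplification of the ~65% hit rate quoted above.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]  # greedy "model": prefix -> next token


def greedy_decode(target: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one target call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (sequence, number of verify passes)."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    passes = 0
    while len(seq) < goal:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))
        # 2. The target checks every proposed position. A real system scores
        #    all positions in one batched forward pass; this toy loops instead.
        passes += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if expected != tok:
                accepted.append(expected)  # first mismatch: keep the target's token
                break
            accepted.append(tok)
        else:
            accepted.append(target(seq + accepted))  # all accepted: bonus token
        seq.extend(accepted)
    return seq[:goal], passes


def expected_tokens_per_pass(accept_rate: float, k: int) -> float:
    """Expected tokens per verify pass under an i.i.d. acceptance assumption."""
    return (1 - accept_rate ** (k + 1)) / (1 - accept_rate)


# Hypothetical toy models: the draft agrees with the target except at
# every third position.
def toy_target(seq: List[int]) -> int:
    return (3 * seq[-1] + len(seq)) % 11


def toy_draft(seq: List[int]) -> int:
    guess = toy_target(seq)
    return guess if len(seq) % 3 else (guess + 1) % 11


out, passes = speculative_decode(toy_target, toy_draft, [1, 2], n_new=20, k=4)
assert out == greedy_decode(toy_target, [1, 2], 20)  # output is unchanged
print(f"{passes} verify passes for 20 tokens")
print(f"{expected_tokens_per_pass(0.65, 4):.2f} to "
      f"{expected_tokens_per_pass(0.65, 8):.2f} expected tokens per pass")
```

The key property is that the output matches plain greedy decoding from the target model; only the number of expensive target passes drops. Under the independence assumption, a 65% acceptance rate with 4-8 proposed tokens yields roughly 2.5-2.8 tokens per target pass before the draft model's own cost is paid, which is in the neighborhood of the ~2× figure.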
Combined result: 200+ tok/s output throughput with ~150ms time-to-first-token on typical IDE workflows. For comparison:
| Model | Throughput | Time-to-first-token |
|---|---|---|
| Composer 2 (Cursor 3) | 200+ tok/s | 150ms |
| Claude Opus 4.7 (API) | 55-80 tok/s | 400-800ms |
| GPT-5.4 (API) | 70-100 tok/s | 300-600ms |
| Gemini 3.1 Flash | 180 tok/s | 200ms |
For an IDE where every keystroke matters, Composer 2's latency profile is a real UX advantage.
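A rough way to read the table is to estimate total response time as time-to-first-token plus output length divided by throughput. The sketch below applies that simple model to the figures above. It treats "200+ tok/s" as exactly 200 (a floor) and ignores network and queueing effects, so the results are illustrative only.

```python
# (ttft_ms_low, ttft_ms_high, tok_per_s_low, tok_per_s_high) from the table above.
# Assumption: "200+ tok/s" is treated as exactly 200 tok/s.
PROFILES = {
    "Composer 2 (Cursor 3)": (150, 150, 200, 200),
    "Claude Opus 4.7 (API)": (400, 800, 55, 80),
    "GPT-5.4 (API)": (300, 600, 70, 100),
    "Gemini 3.1 Flash": (200, 200, 180, 180),
}


def response_time_s(ttft_ms: float, tok_per_s: float, n_tokens: int) -> float:
    """Time-to-first-token plus streaming time for n_tokens, in seconds."""
    return ttft_ms / 1000.0 + n_tokens / tok_per_s


def time_range_s(name: str, n_tokens: int) -> tuple:
    """(best, worst) response time for a model given its table ranges."""
    ttft_lo, ttft_hi, tps_lo, tps_hi = PROFILES[name]
    best = response_time_s(ttft_lo, tps_hi, n_tokens)
    worst = response_time_s(ttft_hi, tps_lo, n_tokens)
    return best, worst


for name in PROFILES:
    for n in (20, 500):  # short inline completion vs. a larger multi-line edit
        best, worst = time_range_s(name, n)
        print(f"{name:24s} {n:4d} tokens: {best:.2f}-{worst:.2f} s")
```

Under this model a 500-token edit takes about 2.65 s on Composer 2 versus roughly 6.7-9.9 s on Opus 4.7 via API, while for a 20-token inline completion most of the gap comes from time-to-first-token rather than throughput.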
Parallel Agents + Background Cloud Agents
Cursor 3's headline agent features:
Parallel agents: You can run multiple agent tasks simultaneously in separate panes. Example: one agent refactors a module, another writes tests, a third updates documentation. Each runs independently in isolated git worktrees, merging back when done.
Background cloud agents: Long-running tasks (8-hour refactors, codebase-wide migrations) run on Anysphere's cloud infrastructure, surviving laptop closure. You get notifications on completion. This is Cursor's answer to Claude Code Routines.
Local-to-cloud handoff: Start a task locally, realize it'll take 30+ minutes, hit "run in cloud" to continue remotely without losing context.
Realistic limits:
- Parallel agents in the same repo can conflict on shared files. Anysphere's merge resolution is usable but not perfect.
- Cloud agents cost extra (Cursor 3 Max tier, $200/month).
- Background agents have a 2-hour soft cap on the free tier.
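The isolated-worktree setup and the shared-file conflict risk described above can be approximated with plain git and a little bookkeeping. This is a hedged sketch, not Cursor's orchestration code: the `agent/<task>` branch naming, the `../worktrees/` path, and both helper functions are hypothetical. The only external interface used is the standard `git worktree add -b <branch> <path> <commit-ish>` command, which the sketch builds but does not execute.

```python
from collections import defaultdict
from typing import Dict, List, Set


def worktree_commands(tasks: List[str], base: str = "main") -> List[List[str]]:
    """Build one `git worktree add` command per agent task (not executed here)."""
    commands = []
    for task in tasks:
        branch = f"agent/{task}"          # hypothetical naming scheme
        path = f"../worktrees/{task}"     # hypothetical location outside the repo
        commands.append(["git", "worktree", "add", "-b", branch, path, base])
    return commands


def shared_files(touched: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each file edited by more than one agent to the agents that touched it."""
    owners = defaultdict(list)
    for agent, files in touched.items():
        for path in files:
            owners[path].append(agent)
    return {path: sorted(agents) for path, agents in owners.items() if len(agents) > 1}


for cmd in worktree_commands(["refactor", "tests", "docs"]):
    print(" ".join(cmd))

# Files each agent changed in its own worktree (illustrative data).
changes = {
    "refactor": {"billing/core.py", "billing/api.py"},
    "tests": {"tests/test_core.py"},
    "docs": {"README.md", "billing/api.py"},
}
print(shared_files(changes))
```

Each agent gets its own branch and working directory, so edits stay isolated until merge time. Checking for files touched by more than one agent before merging is one simple way to flag where the merge resolution mentioned above is likely to be needed.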
Cursor 3 vs Claude Code vs Windsurf
| Feature | Cursor 3 | Claude Code | Windsurf |
|---|---|---|---|
| Default coding model | Composer 2 | Opus 4.7 | Your choice (Sonnet 4.6 default) |
| Price | $20/mo Pro, $200 Max | $20/mo Pro, $100 Max | $20/mo Pro, $200 Max |
| IDE base | VS Code fork | Terminal + Desktop | VS Code fork |
| Parallel agents | Yes, native UI | Limited (subagents) | Limited |
| Background cloud agents | Yes | Yes (Routines) | No |
| Multi-pane | Yes | No | Yes |
| BugBot | Yes | No | No |
| Ghost Mode (stealth refactor) | Yes | No | No |
| Design Mode (UI-to-code) | Yes | No | No |
| Market share (April 2026) | 18% | 18% | 8% |
| Developer love (Pragmatic Eng survey) | Strong | 46% most loved | Medium |
Positioning summary:
- Cursor 3: highest capability ceiling in an IDE form factor. Best if you live in VS Code.
- Claude Code: terminal-native, highest model quality (Opus 4.7), best for codebase-wide refactors.
- Windsurf: middle ground, quota pricing (see our Windsurf pricing change analysis), now slightly dated vs Cursor 3.
Should You Switch from Claude Code to Cursor 3?
| Your situation | Switch to Cursor 3? |
|---|---|
| Prefer terminal workflow | No — stick with Claude Code |
| Live in VS Code day-to-day | Yes |
| Need parallel agent UI | Yes, Cursor 3 is ahead |
| Doing codebase-wide refactors | Use both (Cursor 3 for ongoing work, Claude Code for massive refactors) |
| Heavy on Opus 4.7 specifically | Stick with Claude Code (Cursor 3 can use Opus 4.7 but defaults to Composer 2) |
| Design + code workflows | Yes, Cursor 3 Design Mode |
| Budget < $20/mo | Use free tier of either (limited) |
Many teams now run both: Cursor 3 for individual feature development, Claude Code for architectural work and automation. Survey data shows Cursor and Claude Code tied at 18% adoption with significant overlap — most heavy users have both installed.
FAQ
Is Composer 2 better than Claude Opus 4.7 for coding?
Inside Cursor, probably yes for most IDE tasks (inline completion, diffs, repo-aware edits), although that claim rests on Anysphere's first-party CursorBench. Outside Cursor, Opus 4.7 still leads on SWE-Bench Verified at 87.6%. Use Composer 2 as your default in Cursor and Opus 4.7 via API for headless automation.
How fast is Composer 2 really?
200+ tokens per second of output throughput and 150ms time-to-first-token in typical IDE workflows. That is roughly 2.5-3.6× the throughput of Claude Opus 4.7 via API (55-80 tok/s). Custom GPU kernels and speculative decoding are the main contributors.
Can I use Claude Opus 4.7 inside Cursor 3 instead of Composer 2?
Yes. Cursor 3 lets you pick any supported model. Set "Claude Opus 4.7" as default if you want max model quality and accept slower latency. Auto mode defaults to Composer 2 for speed.
What's the price of Cursor 3?
$20/month for Pro (matching Windsurf Pro and Claude Code Pro) and $200/month for Max, which adds background cloud agents, heavier usage limits, and priority GPU access. A free tier exists but is limited to ~50 slow requests/day.
Is Cursor a VS Code fork or a new editor?
Cursor is a VS Code fork, so you can import VS Code settings and extensions directly. This gives Cursor 3 mature extension support that terminal-first Claude Code lacks. Windsurf is also a VS Code fork but is less polished.
Should I wait for Composer 3?
Anysphere hasn't announced Composer 3 timing. Release cadence suggests Q4 2026 at earliest. Composer 2 is a material quality jump — no reason to wait.
How does the reported $60B SpaceX-Cursor deal affect this?
If SpaceX's rumored $60B Cursor acquisition closes, Composer 2 could be swapped for a Grok variant in Cursor's Auto mode. As of April 22, 2026 the deal remains in negotiation. Near-term, Composer 2 stays as default.
Sources
- Cursor 3 Release — The New Stack
- Cursor vs Claude Code vs Windsurf — NxCode
- Cursor vs Claude Code vs Windsurf — Dev.to
- Best AI Coding Agents 2026 — Codegen Blog
- Pragmatic Engineer Survey — Claude Code 46% Most Loved
- SpaceX-Cursor $60B Deal — TokenMix
- Windsurf Quota Pricing — TokenMix