TokenMix Research Lab · 2026-04-22
Cursor Composer 2 Review: 200 tok/s, 61.3 CursorBench (2026)
Cursor 3 shipped in April 2026 with Composer 2 — Anysphere's proprietary frontier coding model — as the default in Auto mode. Key numbers: 61.3 on CursorBench (39% improvement over Composer 1.5), 200+ tokens per second via custom GPU kernels, parallel agent orchestration, background cloud agents, local-to-cloud handoff, and a rebuilt Design Mode. This review examines whether Composer 2 actually beats Claude Opus 4.7 and GPT-5.4 for coding in the editor where you use them. TokenMix.ai routes coding traffic across Composer 2, Claude, GPT, and GLM-5.1 for teams wanting provider-agnostic coding agent infrastructure.
Table of Contents
- Confirmed vs Speculation: Composer 2 Claims
- CursorBench 61.3: What That Score Means
- Custom GPU Kernels: Where 200 tok/s Comes From
- Parallel Agents + Background Cloud Agents
- Cursor 3 vs Claude Code vs Windsurf
- Should You Switch from Claude Code to Cursor 3?
- FAQ
Confirmed vs Speculation: Composer 2 Claims
| Claim | Status | Source |
|---|---|---|
| Cursor 3 released April 2026 | Confirmed | The New Stack |
| Composer 2 default in Auto mode | Confirmed | Cursor release notes |
| 61.3 on CursorBench | Confirmed (Anysphere's own benchmark) | Cursor official |
| 200+ tok/s via custom GPU kernels | Confirmed (Anysphere claim) | Technical blog |
| 39% improvement over Composer 1.5 | Confirmed | Same source |
| Beats Claude Opus 4.7 on CursorBench | Unverified — CursorBench is first-party | Anysphere-only benchmark |
| Matches Opus 4.7 on SWE-Bench Verified | Likely no (Opus 87.6% vs. an estimated ~80% for Composer 2) | Third-party data pending |
| Design Mode GA | Confirmed | Cursor 3 feature page |
| Background cloud agents stable | Beta quality | Community reports |
Bottom line: this is a real engineering achievement, but the marketing overstates its competitive positioning. CursorBench is a first-party benchmark optimized for Composer 2, so cross-check with SWE-Bench Verified before taking the 61.3 number as definitive.
CursorBench 61.3: What That Score Means
CursorBench is Anysphere's proprietary benchmark for multi-file coding tasks within the Cursor IDE context: making repository-aware edits, applying diff suggestions, and accepting or rejecting inline completions.
| Model | CursorBench | SWE-Bench Verified (external) |
|---|---|---|
| Composer 1.5 | 44.1 | ~55% (est) |
| Composer 2 | 61.3 | ~80% (est) |
| Claude Opus 4.7 | ~63-65 (est via adapter) | 87.6% |
| GPT-5.4 | ~48 (est) | 58.7% |
| Gemini 3.1 Pro | ~52 (est) | 80.6% |
Key observation: CursorBench and SWE-Bench Verified rank models differently. Opus 4.7 dominates SWE-Bench Verified but doesn't equally dominate CursorBench — because CursorBench rewards IDE-native behaviors (inline completion quality, diff acceptance rates) that the Composer training loop explicitly optimizes for.
Honest framing: Composer 2 is probably the best model inside Cursor because it's trained for that environment. It's not a drop-in replacement for Claude Opus 4.7 in arbitrary coding contexts.
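The ranking difference described above can be checked directly from the table. The following is a minimal sketch, not an official scoring tool: it copies the article's numbers (including its estimates), and it assumes that Opus 4.7's "~63-65" CursorBench range can be represented by its midpoint, 64.0. It also recomputes the 39% improvement figure from the two Composer scores.

```python
# Scores copied from the table above as (CursorBench, SWE-Bench Verified %).
# Values marked "est" in the article are estimates, not measurements.
# Assumption: Opus 4.7's "~63-65" CursorBench range is taken at its midpoint.
scores = {
    "Composer 1.5": (44.1, 55.0),
    "Composer 2": (61.3, 80.0),
    "Claude Opus 4.7": (64.0, 87.6),
    "GPT-5.4": (48.0, 58.7),
    "Gemini 3.1 Pro": (52.0, 80.6),
}


def ranking(column: int) -> list[str]:
    """Return model names sorted best-first by the given score column."""
    return sorted(scores, key=lambda name: scores[name][column], reverse=True)


def relative_gain_pct(old: float, new: float) -> float:
    """Relative improvement of new over old, in percent."""
    return (new - old) / old * 100.0


cursorbench_rank = ranking(0)
swebench_rank = ranking(1)

# Models whose position differs between the two benchmarks.
moved = [m for m in scores if cursorbench_rank.index(m) != swebench_rank.index(m)]

print("CursorBench order:", cursorbench_rank)
print("SWE-Bench order:  ", swebench_rank)
print("Position changes: ", moved)
print(f"Composer 1.5 -> 2 gain: {relative_gain_pct(44.1, 61.3):.1f}%")
```

Under these inputs, Composer 2 and Gemini 3.1 Pro swap second and third place between the two benchmarks, and the Composer 1.5 to Composer 2 gain works out to about 39%, which matches the headline figure.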
Custom GPU Kernels: Where 200 tok/s Comes From
Anysphere published technical details on Composer 2's inference stack. Key optimizations:
1. Custom MLA (Multi-Head Latent Attention) kernel: Anysphere rewrote the attention computation from scratch in CUDA, avoiding PyTorch overhead. Result: 1.7× faster attention at the same quality.
2. Speculative decoding with a draft model: a 300M-param draft model runs ahead, proposing 4-8 tokens per step that the full model validates in parallel. The hit rate is ~65% in coding contexts, giving ~2× effective throughput.
3. Prompt prefix caching: repository-level context is cached across turns, so a 100K-token project is encoded only once per session.
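As a rough sanity check on the speculative decoding numbers, the sketch below estimates how many tokens one full-model verification step yields. It relies on a simplifying assumption that is not stated in the source: each draft token is accepted independently with probability equal to the quoted ~65% hit rate. Under that assumption the expected yield per step is (1 - a^(k+1)) / (1 - a) for acceptance rate a and draft length k. The function name is illustrative, and Anysphere's actual acceptance scheme may differ.

```python
def expected_tokens_per_step(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per full-model verification step.

    Assumes each draft token is accepted independently with probability
    accept_rate, and that the full model always contributes one token
    per step (a correction after the first rejection, or one extra
    token when every draft token is accepted).
    """
    if not 0.0 <= accept_rate < 1.0:
        raise ValueError("accept_rate must be in [0, 1)")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    return (1.0 - accept_rate ** (draft_len + 1)) / (1.0 - accept_rate)


# Draft lengths at both ends of the 4-8 token range quoted above.
for k in (4, 8):
    tokens = expected_tokens_per_step(0.65, k)
    print(f"draft_len={k}: ~{tokens:.2f} tokens per verification step")
```

With a 65% acceptance rate this gives roughly 2.5 to 2.8 tokens per verification step before accounting for the cost of running the draft model, which is in the same neighborhood as the ~2× effective throughput quoted above.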
Combined result: 200+ tok/s output throughput with ~150ms time-to-first-token on typical IDE workflows. For comparison:
| Model | Throughput | Time-to-first-token |
|---|---|---|
| Composer 2 (Cursor 3) | 200+ tok/s | 150ms |
| Claude Opus 4.7 (API) | 55-80 tok/s | 400-800ms |
| GPT-5.4 (API) | 70-100 tok/s | 300-600ms |
| Gemini 3.1 Flash | 180 tok/s | 200ms |
For an IDE where every keystroke matters, Composer 2's latency profile is a real UX advantage.
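To see what these figures mean for a single edit, the sketch below combines time-to-first-token and throughput into an end-to-end response time. The 500-token response size is a hypothetical example, and the model assumes constant throughput after the first token, which real systems only approximate. For the API models, the best-case end of each range in the table is used.

```python
def response_time_s(ttft_ms: float, tokens_per_s: float, n_tokens: int) -> float:
    """Approximate wall-clock time to stream n_tokens.

    Simplified model: time-to-first-token plus n_tokens at constant throughput.
    """
    return ttft_ms / 1000.0 + n_tokens / tokens_per_s


# (time-to-first-token in ms, throughput in tok/s), taken from the table above.
# Ranges are represented by their best case; 500 tokens is a hypothetical edit size.
profiles = {
    "Composer 2 (Cursor 3)": (150, 200),
    "Claude Opus 4.7 (API)": (400, 80),
    "GPT-5.4 (API)": (300, 100),
    "Gemini 3.1 Flash": (200, 180),
}

for model, (ttft_ms, tok_s) in profiles.items():
    print(f"{model}: {response_time_s(ttft_ms, tok_s, 500):.2f} s for 500 tokens")
```

Under this model, the throughput term dominates for anything longer than a short completion, so the gap between 200 tok/s and 80 tok/s matters more than the gap in time-to-first-token.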
Parallel Agents + Background Cloud Agents
Cursor 3's headline agent features:
Parallel agents: You can run multiple agent tasks simultaneously in separate panes. Example: one agent refactors a module, another writes tests, a third updates documentation. Each runs independently in its own isolated git worktree and merges back when done.
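The worktree isolation described above can be reproduced by hand with plain git. The sketch below only builds the `git worktree add` commands, one per task, without running them. The branch naming scheme (`agent/<task>`), the sibling-directory layout, and the function name are illustrative assumptions, not how Cursor names or places its worktrees.

```python
from pathlib import PurePosixPath


def worktree_commands(repo: str, tasks: list[str], base: str = "HEAD") -> list[list[str]]:
    """Build one `git worktree add` command per agent task.

    Each task gets its own branch and its own checkout directory next to
    the main repository, so concurrent edits do not share a working tree.
    """
    repo_path = PurePosixPath(repo)
    commands = []
    for task in tasks:
        branch = f"agent/{task}"  # hypothetical naming scheme
        path = str(repo_path.parent / f"{repo_path.name}-{task}")
        commands.append(["git", "-C", repo, "worktree", "add", "-b", branch, path, base])
    return commands


for cmd in worktree_commands("/work/app", ["refactor", "tests", "docs"]):
    print(" ".join(cmd))
```

Each printed command could be passed to `subprocess.run` to create the checkout. Merging the resulting branches back is a separate step, and that is where conflicts between agents surface.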
Background cloud agents: Long-running tasks (8-hour refactors, codebase-wide migrations) run on Anysphere's cloud infrastructure, surviving laptop closure. You get notifications on completion. This is Cursor's answer to Claude Code Routines.
Local-to-cloud handoff: Start a task locally, realize it'll take 30+ minutes, hit "run in cloud" to continue remotely without losing context.
Realistic limits:
- Parallel agents in the same repo can conflict on shared files. Anysphere's merge resolution is usable but not perfect.
- Cloud agents cost extra (Cursor 3 Max tier, $200/month).
- Background agents have a 2-hour soft cap on the free tier.
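One way to reduce the shared-file conflicts noted above is to check for overlap before merging. The sketch below is a generic illustration, not a Cursor feature: given the set of files each agent touched (for example, collected from `git diff --name-only` on each agent's branch), it reports the files modified by more than one agent. The agent names and file paths are hypothetical.

```python
from collections import defaultdict


def shared_files(touched: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each file modified by more than one agent to the agents that touched it."""
    owners: dict[str, list[str]] = defaultdict(list)
    for agent, files in touched.items():
        for path in files:
            owners[path].append(agent)
    return {path: sorted(agents) for path, agents in owners.items() if len(agents) > 1}


# Hypothetical example: three agents working in the same repository.
touched = {
    "refactor": {"src/api.py", "src/models.py"},
    "tests": {"tests/test_api.py", "src/api.py"},
    "docs": {"README.md"},
}

conflicts = shared_files(touched)
print(conflicts)
```

Files that appear in the result are candidates for manual review or for serializing the agents that touch them, rather than relying on automatic merge resolution.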
Cursor 3 vs Claude Code vs Windsurf
| Feature | Cursor 3 | Claude Code | Windsurf |
|---|---|---|---|
| Default coding model | Composer 2 | Opus 4.7 | Your choice (Sonnet 4.6 default) |
| Price | $20/mo Pro, $200 Max | $20/mo Pro, |