TokenMix Research Lab · 2026-04-25

Seed-OSS (ByteDance): Open-Source 512K Context Deep Dive (2026)
ByteDance's Seed-OSS is the TikTok parent company's serious entry into open-source language models. The flagship variant, Seed-OSS-36B-Instruct, delivers SOTA-level open-source results: 91.7% on AIME24 math and 67.4 on LiveCodeBench v6, while the Base variant scores 65.1 on MMLU-Pro and 81.7 on MATH. Both ship with a native 512K context window and a 94.6 RULER score at 128K. All three variants are released under the Apache 2.0 license for free commercial use. This guide covers what makes Seed-OSS significant in the open-weight landscape, the three variants, the controllable thinking budget feature, and deployment considerations. Verified against ByteDance-Seed's official documentation and Hugging Face releases as of April 2026.
Table of Contents
- What Seed-OSS Is
- Three Variants Explained
- The 512K Native Context
- Controllable Thinking Budget
- Benchmark Performance (SOTA Open-Source)
- Pricing and Deployment
- Supported LLM Providers and Model Routing
- When to Use Seed-OSS-36B
- Seed-OSS vs DeepSeek V4 vs Kimi K2.6
- Known Limitations
- FAQ
What Seed-OSS Is
Seed-OSS is the open-source LLM series from ByteDance's Seed team, released under the Apache 2.0 license. It represents ByteDance's strategic entry into the open-weight arms race against DeepSeek, Alibaba (Qwen), and Moonshot (Kimi).
Key attributes:
| Attribute | Value |
|---|---|
| Creator | ByteDance Seed Team |
| Family name | Seed-OSS |
| Flagship size | 36B parameters |
| Training tokens | 12 trillion |
| Context window | 512K native |
| License | Apache 2.0 (commercial use allowed) |
| Distribution | Hugging Face, GitHub |
| Thinking budget | Controllable (novel feature) |
Three Variants Explained
ByteDance released three Seed-OSS-36B variants simultaneously:
1. Seed-OSS-36B-Base (with synthetic data): pre-trained with synthetic data included in the mixture. Strongest Base variant on benchmarks, delivering 65.1 on MMLU-Pro and 81.7 on MATH (both SOTA for its size).
2. Seed-OSS-36B-Base (no synthetic data): pre-trained on real data only. Slightly lower benchmark performance but preferred for research transparency.
3. Seed-OSS-36B-Instruct: instruction-tuned for task execution. The production-ready variant. Scores 91.7 on AIME24, 67.4 on LiveCodeBench v6.
For most production use, pick Seed-OSS-36B-Instruct. The Base variants are for fine-tuning or research.
The 512K Native Context
Seed-OSS-36B trains natively on 512K context — one of the largest context windows in any open-weight model.
Verified context quality:
- RULER at 128K: 94.6 (highest open-source result reported)
- Performance beyond 128K is plausible but has less published benchmark coverage
Why native 512K matters:
Many models advertise long context via extrapolation techniques (RoPE scaling) that degrade quality. Native training at 512K means the model was actually trained on sequences this long — effective context is closer to advertised context.
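For contrast, a common extrapolation trick is position interpolation, which compresses position indices back into the trained range at the cost of positional resolution. A minimal sketch (illustrative only, not Seed-OSS code; the RoPE base of 10000 and the lengths used are assumptions):

```python
def rope_frequencies(dim: int, base: float = 10000.0) -> list[float]:
    # Standard RoPE inverse frequencies, one per pair of dimensions.
    return [base ** (-2 * i / dim) for i in range(dim // 2)]

def interpolated_position(pos: int, trained_len: int, target_len: int) -> float:
    # Position interpolation: squeeze target positions back into the
    # range the model actually saw during training.
    scale = trained_len / target_len  # e.g. 128K trained, 512K target -> 0.25
    return pos * scale

# A token at position 400,000 in a 512K window gets mapped to 100,000,
# inside a 128K trained range -- positions become 4x more crowded.
print(interpolated_position(400_000, 128_000, 512_000))  # -> 100000.0
```

A natively-trained 512K model skips this compression entirely, which is why its effective context tracks its advertised context.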
Practical implications:
- Long-document analysis without chunking
- Large codebase comprehension
- Multi-file technical analysis
- Extended conversation history retention
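Before committing an entire document or repo to a single request, a rough character-based estimate is usually enough to check it fits. A minimal sketch, assuming the common ~4-characters-per-token heuristic for English text (the exact ratio depends on Seed-OSS's tokenizer):

```python
CONTEXT_WINDOW = 512_000   # Seed-OSS-36B native context, in tokens
CHARS_PER_TOKEN = 4        # rough heuristic for English prose/code

def estimated_tokens(text: str) -> int:
    return len(text) // CHARS_PER_TOKEN

def fits_in_context(text: str, reserve_for_output: int = 8_000) -> bool:
    # Leave headroom for the model's own reply.
    return estimated_tokens(text) + reserve_for_output <= CONTEXT_WINDOW

doc = "x" * 1_600_000          # ~400K estimated tokens
print(fits_in_context(doc))    # True: 400K + 8K reserve < 512K
```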
Competitive context landscape:
| Model | Advertised | Native / Effective |
|---|---|---|
| Seed-OSS-36B | 512K | 512K native |
| Llama 4 Scout | 10M | ~500K effective |
| Kimi K2.6 | 1M | ~700K effective |
| DeepSeek V4-Pro | 1M | ~700K effective |
| Claude Opus 4.7 | 1M | ~800K effective |
| Gemini 3.1 Pro | 2M | ~1.5M effective |
Seed-OSS-36B is in the "honest 500K" tier — not the largest advertised, but genuinely usable up to its claimed limit.
Controllable Thinking Budget
A novel feature in Seed-OSS: users can dynamically control reasoning length.
Traditional thinking models (DeepSeek R1, o3, ERNIE-4.5-21B-Thinking) produce extended reasoning that may exceed what a task needs. Seed-OSS adds a thinking budget parameter: you specify how much reasoning effort to spend, and the model adjusts its chain of thought accordingly.
Benefits:
- Faster responses when deep reasoning isn't needed
- Deeper reasoning when complex problems warrant
- Cost control (thinking tokens = output tokens = money)
Example usage (conceptual):
# Quick response for simple query
response_fast = model.generate(prompt, thinking_budget=100) # 100 tokens max thinking
# Deep reasoning for complex problem
response_deep = model.generate(prompt, thinking_budget=5000) # 5K tokens thinking
Exact parameter names depend on your inference framework.
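Since thinking tokens are billed as output tokens, the budget bounds reasoning spend directly. A quick back-of-envelope, assuming an illustrative $3.00 per million output tokens (not a quoted Seed-OSS price):

```python
def thinking_cost_usd(thinking_tokens: int, price_per_mtok: float = 3.00) -> float:
    # Thinking tokens are billed like any other output tokens.
    return thinking_tokens / 1_000_000 * price_per_mtok

print(thinking_cost_usd(100))    # fast path: negligible cost
print(thinking_cost_usd(5_000))  # deep-reasoning path: 50x the spend
```

At scale, capping the budget on simple queries is the difference between paying for reasoning you use and reasoning you discard.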
Benchmark Performance (SOTA Open-Source)
Seed-OSS-36B-Instruct benchmark highlights:
| Benchmark | Seed-OSS-36B-Instruct | Notes |
|---|---|---|
| AIME24 (math olympiad) | 91.7% | Open-source SOTA |
| BeyondAIME | 65 | Open-source SOTA |
| LiveCodeBench v6 | 67.4 | Open-source SOTA |
| RULER @ 128K | 94.6 | Highest open-source reported |
| MMLU-Pro (Base variant) | 65.1 | SOTA for size |
| MATH (Base variant) | 81.7 | SOTA for size |
The honest comparison:
- vs closed frontier (Claude Opus 4.7, GPT-5.5): Seed-OSS lags on comprehensive benchmarks but competes on specific math/reasoning
- vs other open-weight (DeepSeek V4-Pro, Kimi K2.6, Qwen3-next): Seed-OSS is competitive to leading
For its 36B parameter size, Seed-OSS-36B-Instruct is genuinely impressive. Larger MoE models like DeepSeek V4-Pro or Kimi K2.6 have more total capability but require significantly more infrastructure.
Pricing and Deployment
Open-weight = free + infrastructure cost:
- Download from Hugging Face
- Apache 2.0 license — commercial use allowed
- Self-host on your infrastructure
Typical hardware for 36B model:
- Single A100 80GB (FP16): comfortable, ~100-150 tok/s
- 2× A100 40GB with tensor parallelism: works
- Single H100 80GB: fastest single-GPU, ~180-220 tok/s
- 2× RTX 4090 (48GB, 4-bit quantized): consumer option, ~60-80 tok/s
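These hardware tiers follow from simple weight-size arithmetic. A rough sketch (the 10% overhead factor is an assumption; real KV-cache memory grows with context length and batch size, so long-context serving needs more):

```python
def weight_vram_gb(params_b: float, bytes_per_param: float,
                   overhead: float = 1.1) -> float:
    # Weights only, plus a rough multiplier for KV cache / activations.
    return params_b * bytes_per_param * overhead

print(weight_vram_gb(36, 2))    # FP16: ~79 GB -> one 80GB card, barely
print(weight_vram_gb(36, 0.5))  # 4-bit: ~20 GB -> fits 2x 24GB consumer GPUs
```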
Hosted alternatives:
- OpenRouter, Together AI, Fireworks may offer Seed-OSS hosting
- Pricing typically ranges $0.20-1.00 input / $0.50-3.00 output per MTok depending on provider
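Bounding a request's cost at the endpoints of that range is straightforward; the 100K-input / 1K-output workload below is just an example:

```python
def request_cost_usd(in_tokens: int, out_tokens: int,
                     in_per_mtok: float, out_per_mtok: float) -> float:
    # Input and output tokens are priced separately, per million.
    return in_tokens / 1e6 * in_per_mtok + out_tokens / 1e6 * out_per_mtok

# Summarizing a 100K-token document in 1K tokens, at both ends of the range:
low = request_cost_usd(100_000, 1_000, 0.20, 0.50)
high = request_cost_usd(100_000, 1_000, 1.00, 3.00)
print(f"${low:.4f} - ${high:.4f}")  # -> $0.0205 - $0.1030
```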
Supported LLM Providers and Model Routing
Seed-OSS-36B is accessible via:
- Hugging Face (download for self-hosting)
- GitHub (ByteDance-Seed/seed-oss) — source, inference code
- OpenRouter and similar hosted providers
- OpenAI-compatible aggregators — TokenMix.ai, and similar
Through TokenMix.ai, Seed-OSS-36B is accessible (where hosted) alongside DeepSeek V4-Pro, Kimi K2.6, qwen3-next-80b-a3b, Llama 4, Claude Opus 4.7, GPT-5.5, and 300+ other models through a single OpenAI-compatible API key. Useful for comparing ByteDance's entry against established open-weight players on your specific workloads.
Basic usage:
from openai import OpenAI

client = OpenAI(
    api_key="your-tokenmix-key",
    base_url="https://api.tokenmix.ai/v1",
)
response = client.chat.completions.create(
    model="seed-oss-36b-instruct",
    messages=[{"role": "user", "content": "Solve this AIME problem..."}],
)
print(response.choices[0].message.content)
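Because the aggregator speaks the OpenAI protocol for every model, comparing candidates on one workload reduces to a loop. A minimal sketch that works with any OpenAI-compatible client (the model IDs in the commented call are examples; check your provider's catalog for exact names):

```python
def compare_models(client, prompt: str, models: list[str]) -> dict[str, str]:
    # Send the same prompt to each model through one
    # OpenAI-compatible client and collect the replies.
    results = {}
    for model in models:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        results[model] = resp.choices[0].message.content
    return results

# answers = compare_models(client, "Solve this AIME problem...",
#                          ["seed-oss-36b-instruct", "deepseek-v4-pro"])
```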
When to Use Seed-OSS-36B
Strong fit:
- Open-weight requirement for commercial use
- Math and reasoning-heavy workloads
- Long-context tasks up to 512K tokens
- Teams with 36B-capable infrastructure (A100 80GB or equivalent)
- Fine-tuning for specialized domains (Apache 2.0 allows)
Weak fit:
- Workloads that need the frontier benchmark ceiling (closed models are still ahead)
- Sub-20GB VRAM deployments (36B demands significant hardware)
- Agent-specific workloads where Kimi K2.6's native swarm support wins
- Teams with no ML infrastructure (use a hosted API instead)
Seed-OSS vs DeepSeek V4 vs Kimi K2.6
Chinese open-weight landscape comparison:
| Dimension | Seed-OSS-36B | DeepSeek V4-Pro | Kimi K2.6 |
|---|---|---|---|
| Total params | 36B dense | ~671B MoE | 1T MoE |
| Active params | 36B | ~37B | 32B |
| Context native | 512K | 1M | 1M |
| AIME24 | 91.7% | — | — |
| LiveCodeBench v6 | 67.4 | — | — |
| SWE-Bench Verified | moderate | ~85% | 80.2% |
| Agent swarm | standard | standard | native (300 sub-agents) |
| Open-weight license | Apache 2.0 | Apache 2.0 | Open-weight |
| Hosted price range | ~$0.20-1.00 input / $0.50-3.00 output per MTok | — | — |