GPT-OSS 120B

by OpenAI · chat

OpenAI's open-weight mixture-of-experts model with 120B total parameters (5.1B active per token), released under the Apache 2.0 license. It supports configurable reasoning effort (low, medium, or high) and fits on a single 80GB GPU thanks to MXFP4 quantization.

Pricing

Input: $0.1425/M tokens · Output: $0.57/M tokens
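
As a rough illustration of how the per-million-token rates above translate into a per-request cost, here is a minimal sketch. The function name and the example token counts are hypothetical, and it assumes billing is linear in tokens with no minimums, caching discounts, or rounding rules, none of which the listing specifies.

```python
# Rates copied from the listing, in USD per one million tokens.
INPUT_USD_PER_M = 0.1425
OUTPUT_USD_PER_M = 0.57


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, assuming simple linear billing."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Hypothetical request: 10,000 prompt tokens and 2,000 generated tokens.
cost = estimate_cost_usd(10_000, 2_000)
print(f"${cost:.6f}")
```

Because output tokens cost four times as much as input tokens here, long reasoning traces are likely to dominate the bill for short prompts.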

Capabilities

Function Calling, JSON Mode, Streaming, Reasoning

Context: 131K tokens

Max output: 131K tokens
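
Since the context and maximum output figures are the same, the practical output limit probably depends on how much of the window the prompt already uses. The sketch below assumes prompt and output tokens share a single window, which is common but not stated in the listing. It also treats "131K" as 131,000 tokens as a conservative reading; the exact limit may differ. The function name is hypothetical.

```python
# Limits as listed; "131K" is read conservatively as 131,000 (assumption).
CONTEXT_WINDOW_TOKENS = 131_000
MAX_OUTPUT_TOKENS = 131_000


def remaining_output_budget(prompt_tokens: int) -> int:
    """Return how many output tokens still fit, assuming a shared window."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    free = CONTEXT_WINDOW_TOKENS - prompt_tokens
    # Never exceed the output cap, and never go below zero.
    return max(0, min(MAX_OUTPUT_TOKENS, free))


# Hypothetical prompt of 100,000 tokens.
print(remaining_output_budget(100_000))
```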

Routes: 2/2 healthy

Performance

TTFT: 680ms · Latency: 2335ms · Throughput: 436.3 tok/s
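
For a back-of-the-envelope sense of response time, the TTFT and throughput figures can be combined as in the sketch below. It assumes the throughput figure is a steady decode rate and that total time is roughly TTFT plus generation time. The listing does not say how these numbers were measured or what the "Latency" figure covers, so treat the result as an estimate only. The function name is hypothetical.

```python
# Figures copied from the listing.
TTFT_SECONDS = 0.680
THROUGHPUT_TOK_PER_S = 436.3


def estimate_response_seconds(output_tokens: int) -> float:
    """Estimate wall-clock time as TTFT plus tokens divided by throughput."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return TTFT_SECONDS + output_tokens / THROUGHPUT_TOK_PER_S


# Hypothetical 1,000-token response.
print(f"{estimate_response_seconds(1_000):.2f} s")
```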