TokenMix Research Lab · 2026-04-13

AI API for React Apps: How to Add AI to React with Streaming, useChat Hook, and Provider Comparison (2026)
Last Updated: 2026-04-29
Author: TokenMix Research Lab
Adding an AI API to a React app requires three things: a backend proxy to protect your API key, a streaming connection for real-time responses, and a display component that renders tokens as they arrive. This guide covers three approaches (plain fetch, the Vercel AI SDK useChat hook, and direct provider SDKs) with working code for each. We compare OpenAI, Anthropic, Google, and DeepSeek for React integration, covering cost, latency, and developer experience. All figures come from TokenMix.ai tracking as of April 2026.
Table of Contents
- Quick Comparison: React AI Integration Approaches
- Architecture: Why You Need a Backend Proxy
- Method 1: Fetch API with Streaming Display
- Method 2: Vercel AI SDK useChat Hook
- Method 3: Direct Provider SDKs
- AI Provider Comparison for React Apps
- Building a Streaming Chat Component
- Cost Estimation for React AI Features
- How to Choose Your React AI Stack
- Production Best Practices
- Conclusion
- FAQ
Quick Comparison: React AI Integration Approaches
| Approach | Setup Time | Streaming | Provider Flexibility | Best For |
|---|---|---|---|---|
| Fetch API | 20 min | Manual SSE parsing | Any provider | Simple integrations, full control |
| Vercel AI SDK (useChat) | 10 min | Built-in | OpenAI, Anthropic, Google, Mistral | Chat UIs, rapid prototyping |
| Provider SDK (OpenAI) | 15 min | SDK-managed | Single provider | OpenAI-specific features |
| Provider SDK (Anthropic) | 15 min | SDK-managed | Single provider | Claude-specific features |
| TokenMix.ai API | 10 min | OpenAI-compatible | 300+ models | Multi-provider, cost optimization |
Architecture: Why You Need a Backend Proxy
Never call AI APIs directly from React client code. Your API key would be visible in browser developer tools, network requests, and your JavaScript bundle.
The correct architecture:
React Client → Your Backend (API key stored here) → AI Provider API
Your backend can be:
- An Express/Fastify server
- A Next.js API route
- A Cloudflare Worker
- A Vercel Edge Function
- A simple Node.js server
The backend does three things:
- Receives the user's message from your React app
- Attaches the API key and sends the request to the AI provider
- Streams the response back to React
This pattern keeps your API key secure and lets you add rate limiting, logging, and cost controls on the backend.
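The first two backend steps can be sketched as framework-agnostic TypeScript that fits inside an Express handler, a Next.js route, or a Worker. This is a minimal sketch under stated assumptions: the helper names, the MAX_MESSAGES and MAX_CHARS limits, and the choice of OpenAI's Chat Completions endpoint as the upstream are illustrative, not part of any SDK.

```typescript
type Role = 'user' | 'assistant';

interface ChatMessage {
  role: Role;
  content: string;
}

// Illustrative limits (assumptions): cap history length and per-message size
const MAX_MESSAGES = 50;
const MAX_CHARS = 4000;

function isChatMessage(m: unknown): m is ChatMessage {
  if (typeof m !== 'object' || m === null) return false;
  const { role, content } = m as { role?: unknown; content?: unknown };
  return (role === 'user' || role === 'assistant') && typeof content === 'string';
}

// Step 1: receive the browser's payload and keep only well-formed user/assistant turns.
// Client-supplied "system" messages are dropped so users cannot replace your system prompt.
function sanitizeMessages(body: unknown): ChatMessage[] {
  const raw = (body as { messages?: unknown } | null)?.messages;
  if (!Array.isArray(raw)) throw new Error('messages must be an array');
  return raw
    .filter(isChatMessage)
    .slice(-MAX_MESSAGES)
    .map(m => ({ role: m.role, content: m.content.slice(0, MAX_CHARS) }));
}

// Step 2: attach the server-side API key and build the upstream request.
// The key comes from server code (e.g. process.env.OPENAI_API_KEY) and never reaches the browser.
function buildProviderRequest(messages: ChatMessage[], apiKey: string) {
  return {
    url: 'https://api.openai.com/v1/chat/completions',
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model: 'gpt-4.1-mini', messages, stream: true }),
    },
  };
}
```

On the server you would pass the returned url and init to fetch and pipe the response body back to the browser. Dropping client-supplied system messages is a design choice: the system prompt stays under backend control.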
For developers who want to skip building a custom backend, TokenMix.ai provides a managed proxy with built-in rate limiting, cost tracking, and multi-provider routing.
Method 1: Fetch API with Streaming Display
The fetch API approach gives you full control with zero dependencies. It works with any React setup and any AI provider.
Backend (Express example):
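```javascript
import express from 'express';
import OpenAI from 'openai';

const app = express();
app.use(express.json());

// The API key lives only on the server, read from the environment
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

app.post('/api/chat', async (req, res) => {
  const { messages } = req.body;
  if (!Array.isArray(messages)) {
    return res.status(400).json({ error: 'messages must be an array' });
  }

  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');

  // Stop generating (and paying for) tokens if the browser disconnects
  const controller = new AbortController();
  res.on('close', () => controller.abort());

  try {
    const stream = await client.chat.completions.create(
      { model: 'gpt-4.1-mini', messages, stream: true },
      { signal: controller.signal },
    );
    for await (const chunk of stream) {
      const content = chunk.choices[0]?.delta?.content || '';
      if (content) {
        res.write(`data: ${JSON.stringify({ content })}\n\n`);
      }
    }
    res.write('data: [DONE]\n\n');
  } catch (err) {
    // Report upstream failures to the client unless the client itself went away
    if (!controller.signal.aborted) {
      res.write(`data: ${JSON.stringify({ error: 'Upstream request failed' })}\n\n`);
    }
  } finally {
    res.end();
  }
});

app.listen(3001); // any port; point your dev server's proxy at it
```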
React component with streaming:
```jsx
import { useState, useCallback } from 'react';

function ChatApp() {
  const [messages, setMessages] = useState([]);
  const [input, setInput] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);
  const [error, setError] = useState(null);

  const sendMessage = useCallback(async () => {
    if (!input.trim() || isStreaming) return;
    const userMessage = { role: 'user', content: input };
    const updatedMessages = [...messages, userMessage];
    setMessages(updatedMessages);
    setInput('');
    setError(null);
    setIsStreaming(true);

    try {
      const response = await fetch('/api/chat', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ messages: updatedMessages }),
      });
      if (!response.ok) throw new Error(`Request failed with status ${response.status}`);

      const reader = response.body.getReader();
      const decoder = new TextDecoder();
      let buffer = ''; // carries an incomplete line over to the next chunk
      let assistantMessage = '';
      let finished = false;

      while (!finished) {
        const { done, value } = await reader.read();
        if (done) break;
        // stream: true keeps multi-byte characters intact across chunk boundaries
        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split('\n');
        buffer = lines.pop(); // last piece may be a partial line

        for (const line of lines) {
          if (!line.startsWith('data: ')) continue;
          const data = line.slice(6);
          if (data === '[DONE]') {
            finished = true; // exits both loops
            break;
          }
          const parsed = JSON.parse(data);
          if (parsed.error) throw new Error(parsed.error);
          assistantMessage += parsed.content;
          setMessages([...updatedMessages, { role: 'assistant', content: assistantMessage }]);
        }
      }
    } catch (err) {
      setError(err.message);
    } finally {
      setIsStreaming(false);
    }
  }, [messages, input, isStreaming]);

  return (
    <div>
      {messages.map((m, i) => (
        <div key={i} className={m.role}>{m.content}</div>
      ))}
      {error && <div className="error">{error}</div>}
      <input
        value={input}
        onChange={e => setInput(e.target.value)}
        onKeyDown={e => e.key === 'Enter' && sendMessage()}
        disabled={isStreaming}
      />
    </div>
  );
}
```
This approach requires ~60 lines of custom code but works with any backend and any AI provider.
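The trickiest part of the fetch approach is parsing the stream: a network chunk can end in the middle of a data: line, or even in the middle of a multi-byte UTF-8 character. The sketch below isolates that logic as a small, testable TypeScript parser that buffers the incomplete tail between chunks. SSEDataParser and collectText are hypothetical names, and the parser only handles the data: lines emitted by the Express backend above, not the full SSE specification (no event:, id:, or multi-line data fields).

```typescript
// Incremental parser for the `data: ...` lines sent by the Express backend.
// Handles chunks that end mid-line and multi-byte characters split across chunks.
class SSEDataParser {
  private buffer = '';
  private decoder = new TextDecoder();

  // Feed one network chunk; returns the data payloads of every line it completed.
  push(chunk: Uint8Array): string[] {
    // stream: true holds back an incomplete UTF-8 sequence until the next chunk
    this.buffer += this.decoder.decode(chunk, { stream: true });
    const lines = this.buffer.split('\n');
    // The last element is either '' or a partial line; keep it for the next call
    this.buffer = lines.pop() ?? '';
    return lines
      .map(line => line.replace(/\r$/, ''))
      .filter(line => line.startsWith('data: '))
      .map(line => line.slice('data: '.length));
  }
}

// Joins {"content": ...} payloads into text and reports whether [DONE] was seen.
function collectText(payloads: string[]): { text: string; done: boolean } {
  let text = '';
  for (const payload of payloads) {
    if (payload === '[DONE]') return { text, done: true };
    text += (JSON.parse(payload) as { content: string }).content;
  }
  return { text, done: false };
}
```

Inside the read loop, you would call parser.push(value) for each chunk and append collectText's text to the assistant message, stopping when done is true.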
Method 2: Vercel AI SDK useChat Hook
The Vercel AI SDK reduces the React integration to a single hook. It handles streaming, message state, input management, and error handling.
Installation (these examples follow the AI SDK 4.x API; version 5 renamed or removed several of these functions and hook fields):
```bash
npm install ai @ai-sdk/openai @ai-sdk/react
```
Backend (a Next.js route handler at app/api/chat/route.ts, the endpoint useChat posts to by default):
```typescript
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';

export async function POST(req: Request) {
  const { messages } = await req.json();
  const result = streamText({
    model: openai('gpt-4.1-mini'),
    messages,
  });
  // Streams tokens in the data stream protocol that useChat expects
  return result.toDataStreamResponse();
}
```
React component -- the entire thing:
```tsx
'use client';
import { useChat } from '@ai-sdk/react';

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit, isLoading, error } = useChat();
  return (
    <div>
      {messages.map(m => (
        <div key={m.id} className={m.role}>
          {m.content}
        </div>
      ))}
      {error && <div className="error">{error.message}</div>}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} disabled={isLoading} />
        <button type="submit" disabled={isLoading}>Send</button>
      </form>
    </div>
  );
}
```
The useChat hook provides:
- messages: Array of all messages with roles and content
- input / handleInputChange: Controlled input state
- handleSubmit: Form submission handler
- isLoading: Boolean for loading state
- error: Error object for error handling
- stop: Function to abort streaming
- reload: Function to regenerate the last response
- append: Function to add messages programmatically
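Switching providers means installing that provider's package and changing the model line on the backend:
```typescript
// npm install @ai-sdk/anthropic
import { anthropic } from '@ai-sdk/anthropic';
// model: anthropic('claude-3-5-haiku-latest'),

// npm install @ai-sdk/google
import { google } from '@ai-sdk/google';
// model: google('gemini-2.0-flash'),
```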
The React component stays identical. This is the fastest path from zero to a working AI chat in React.
Method 3: Direct Provider SDKs
Use provider SDKs when you need provider-specific features like function calling, structured outputs, or vision capabilities.
OpenAI SDK with React (function calling example):
```javascript
// Backend: client is the OpenAI instance created in Method 1
const stream = await client.chat.completions.create({
  model: 'gpt-4.1-mini',
  messages,
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather',
        description: 'Get current weather for a city',
        parameters: {
          type: 'object',
          properties: {
            city: { type: 'string', description: 'City name' },
          },
          required: ['city'],
        },
      },
    },
  ],
  stream: true,
});
```
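With stream: true, a tool call does not arrive as one object: the first delta carries the call id and function name, and the JSON arguments string is streamed in fragments, each tagged with an index. A minimal TypeScript sketch of merging those fragments, assuming you collect chunk.choices[0]?.delta?.tool_calls from each chunk whenever it is present (the interface and function names are illustrative, and the types are hand-written slices rather than the SDK's own):

```typescript
// Minimal shape of one entry in chunk.choices[0].delta.tool_calls (streamed fragments)
interface ToolCallDelta {
  index: number;
  id?: string;
  function?: { name?: string; arguments?: string };
}

interface ToolCall {
  id: string;
  name: string;
  arguments: string; // JSON text; parse only after the stream finishes
}

// Merges fragments from every chunk by their index into complete tool calls.
function accumulateToolCalls(chunks: ToolCallDelta[][]): ToolCall[] {
  const calls: ToolCall[] = [];
  for (const deltas of chunks) {
    for (const d of deltas) {
      if (!calls[d.index]) calls[d.index] = { id: '', name: '', arguments: '' };
      const call = calls[d.index];
      if (d.id) call.id = d.id;
      if (d.function?.name) call.name = d.function.name;
      if (d.function?.arguments) call.arguments += d.function.arguments;
    }
  }
  return calls;
}
```

Parse the arguments only after the stream reports finish_reason "tool_calls", when the JSON is complete. Then run your own get_weather implementation and send the result back in a follow-up request as a message with role "tool" and the matching tool_call_id.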
Anthropic SDK with React (streaming):
```javascript
// Backend
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

// Inside your route handler, after setting the same SSE headers as Method 1:
const stream = client.messages.stream({
  model: 'claude-3-5-haiku-latest',
  max_tokens: 1024, // required by the Messages API
  messages,
});

for await (const event of stream) {
  if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
    res.write(`data: ${JSON.stringify({ content: event.delta.text })}\n\n`);
  }
}
res.write('data: [DONE]\n\n');
res.end();
```
Both provider APIs stream over Server-Sent Events (SSE), and both backends above re-emit tokens in the same data: {"content": ...} frames, so the React client code is the same regardless of which provider SDK runs on the backend. Read our streaming tutorial for detailed SSE implementation.
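That shared frame format can be made explicit with two small extractors, one per provider. The types below are hand-written minimal slices of the OpenAI chunk and Anthropic stream-event shapes used in the snippets above, not the SDKs' exported types, and the function names are illustrative.

```typescript
// Hand-written minimal slices of the two streaming shapes (not the SDKs' exported types)
interface OpenAIChunk {
  choices: Array<{ delta?: { content?: string | null } }>;
}

interface AnthropicEvent {
  type: string;
  delta?: { type: string; text?: string };
}

// OpenAI: text lives in choices[0].delta.content (absent on role-only or final chunks)
function textFromOpenAI(chunk: OpenAIChunk): string {
  return chunk.choices[0]?.delta?.content ?? '';
}

// Anthropic: text lives in content_block_delta events whose delta is a text_delta
function textFromAnthropic(event: AnthropicEvent): string {
  if (event.type !== 'content_block_delta' || event.delta?.type !== 'text_delta') return '';
  return event.delta?.text ?? '';
}

// One frame format for every provider; empty text produces no frame
function toSSEFrame(text: string): string | null {
  return text ? `data: ${JSON.stringify({ content: text })}\n\n` : null;
}
```

In either Express handler you would write toSSEFrame(textFromOpenAI(chunk)) or toSSEFrame(textFromAnthropic(event)) whenever it is non-null, followed by the same data: [DONE] sentinel.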
AI Provider Comparison for React Apps
TokenMix.ai benchmarks all major providers monthly. Here is how they compare for React app integration.
| Dimension | OpenAI | Anthropic | Google | DeepSeek |
|---|---|---|---|---|
| Best Model for React | GPT-4.1 mini | Claude Haiku 3.5 | Gemini 2.0 Flash | DeepSeek V4 |
| Input Price | $0.40/M | $0.80/M | $0.10/M | $0.30/M |
| Output Price | $1.60/M | $4.00/M | $0.40/M | $1.20/M |
| TTFT (streaming) | 0.3s | 0.5s | 0.4s | 1.2s |
| Tokens/Second | 120 | 90 | 150 | 60 |
| SDK Quality | Excellent | Excellent | Good | Basic |
| React Examples | Extensive | Good | Limited | Minimal |
| Streaming Support | Native SSE | Native SSE | Native SSE | OpenAI-compatible |
| Function Calling | Yes | Yes (tools) | Yes | Yes |
| JSON Mode | Yes | Yes | Yes | Yes |
Our recommendation for React apps:
- Default choice: GPT-4.1 mini -- best SDK, most React examples, reliable streaming
- Budget choice: Gemini 2.0 Flash -- cheapest option with good quality
- Quality choice: Claude Haiku 3.5 -- best instruction following for UI interactions
- Speed choice: GPT-4.1 mini or Groq -- fastest TTFT for chat UIs
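TTFT and throughput combine into a rough total reply time: the time to the first token plus the output length divided by the generation rate. A sketch using the table's figures, assuming a constant tokens-per-second rate (real streams vary, so treat the results as estimates):

```typescript
interface ProviderSpeed {
  name: string;
  ttftSeconds: number;     // time to first token
  tokensPerSecond: number; // steady-state generation rate
}

// Figures from the comparison table above
const providers: ProviderSpeed[] = [
  { name: 'GPT-4.1 mini', ttftSeconds: 0.3, tokensPerSecond: 120 },
  { name: 'Claude Haiku 3.5', ttftSeconds: 0.5, tokensPerSecond: 90 },
  { name: 'Gemini 2.0 Flash', ttftSeconds: 0.4, tokensPerSecond: 150 },
  { name: 'DeepSeek V4', ttftSeconds: 1.2, tokensPerSecond: 60 },
];

// Estimated wall-clock seconds until the last token arrives
function estimateReplySeconds(p: ProviderSpeed, outputTokens: number): number {
  return p.ttftSeconds + outputTokens / p.tokensPerSecond;
}

for (const p of providers) {
  console.log(`${p.name}: ${estimateReplySeconds(p, 200).toFixed(2)}s for a 200-token reply`);
}
```

For a 200-token reply this gives about 1.97s for GPT-4.1 mini, 2.72s for Claude Haiku 3.5, 1.73s for Gemini 2.0 Flash, and 4.53s for DeepSeek V4. GPT-4.1 mini shows the first token soonest, but by this estimate Gemini 2.0 Flash finishes any reply longer than about 60 tokens first: TTFT governs how responsive the UI feels, throughput governs the total wait.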
Building a Streaming Chat Component
A production-ready React chat component needs markdown rendering, auto-scroll, typing indicators, and error recovery. Here is the pattern.
Key UX requirements:
- Auto-scroll to the latest message during streaming
- Markdown rendering for code blocks, lists, and formatting
- Typing indicator while waiting for the first token
- Stop button to cancel generation mid-stream
- Error state with retry option
Component structure:
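```jsx
import { memo, useEffect, useRef } from 'react';
import ReactMarkdown from 'react-markdown';

// memo skips re-rendering a finished message while the next one streams in,
// as long as its message object keeps the same identity between renders
const ChatMessage = memo(function ChatMessage({ message }) {
  return (
    <div className={`message ${message.role}`}>
      <ReactMarkdown>{message.content}</ReactMarkdown>
    </div>
  );
});

function TypingIndicator() {
  return <div className="typing">AI is thinking...</div>;
}

function ChatWindow({ messages, isLoading }) {
  const bottomRef = useRef(null);

  // Scroll to the newest content whenever messages change (including each streamed token)
  useEffect(() => {
    bottomRef.current?.scrollIntoView({ behavior: 'smooth' });
  }, [messages]);

  return (
    <div className="chat-window">
      {messages.map(m => <ChatMessage key={m.id} message={m} />)}
      {isLoading && <TypingIndicator />}
      <div ref={bottomRef} />
    </div>
  );
}
```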
Performance tip: Use react-markdown with rehype-highlight for code syntax highlighting. Memoize message components with React.memo to prevent re-renders of previous messages during streaming.
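Scrolling on every token also fights users who scroll up to reread an earlier answer. A common refinement, sketched here as an assumption rather than part of the pattern above, is to auto-scroll only when the user is already near the bottom. The helper name and the 80px threshold are arbitrary choices; any DOM element satisfies the ScrollMetrics shape.

```typescript
interface ScrollMetrics {
  scrollTop: number;    // pixels scrolled from the top
  scrollHeight: number; // total content height
  clientHeight: number; // visible height
}

// True when the visible area ends within `thresholdPx` of the content's bottom edge.
function shouldAutoScroll(el: ScrollMetrics, thresholdPx = 80): boolean {
  const distanceFromBottom = el.scrollHeight - el.scrollTop - el.clientHeight;
  return distanceFromBottom <= thresholdPx;
}
```

In ChatWindow you would check shouldAutoScroll(containerRef.current) in the effect before calling scrollIntoView, where containerRef is a hypothetical ref on the scrollable .chat-window element. Because the effect runs after the new token renders, the threshold should exceed a typical line height.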
Cost Estimation for React AI Features
AI features in React apps cost more than most developers expect. Here are realistic numbers.
Scenario: SaaS product with AI chat, 5,000 daily active users, 3 messages per session average.
| Model | Daily Messages | Input Tokens/Day | Output Tokens/Day | Daily Cost | Monthly Cost |
|---|---|---|---|---|---|
| GPT-4.1 mini | 15,000 | 7.5M | 3M | $7.80 | $234 |
| Claude Haiku 3.5 | 15,000 | 7.5M | 3M | $18.00 | $540 |
| Gemini 2.0 Flash | 15,000 | 7.5M | 3M | $1.95 | $58.50 |
| DeepSeek V4 | 15,000 | 7.5M | 3M | $5.85 | $175.50 |
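Assumptions: one session per user per day (15,000 messages), 500 input tokens and 200 output tokens per message, and a 30-day month. Because chat requests resend the full conversation history, input tokens per message grow as a session gets longer, so treat 500 as an average.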
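The table's arithmetic fits in a small TypeScript calculator you can rerun with your own traffic and token counts. Prices are the per-million-token rates from the provider comparison above; the function and field names are illustrative.

```typescript
interface ModelPrice {
  name: string;
  inputPerM: number;  // USD per 1M input tokens
  outputPerM: number; // USD per 1M output tokens
}

const prices: ModelPrice[] = [
  { name: 'GPT-4.1 mini', inputPerM: 0.4, outputPerM: 1.6 },
  { name: 'Claude Haiku 3.5', inputPerM: 0.8, outputPerM: 4.0 },
  { name: 'Gemini 2.0 Flash', inputPerM: 0.1, outputPerM: 0.4 },
  { name: 'DeepSeek V4', inputPerM: 0.3, outputPerM: 1.2 },
];

// Daily cost = (messages x tokens / 1M) x price, summed over input and output
function dailyCostUSD(p: ModelPrice, messagesPerDay: number, inputTokens: number, outputTokens: number): number {
  const inputMillions = (messagesPerDay * inputTokens) / 1_000_000;
  const outputMillions = (messagesPerDay * outputTokens) / 1_000_000;
  return inputMillions * p.inputPerM + outputMillions * p.outputPerM;
}

// Scenario from the table: 5,000 DAU x 3 messages, one session per user per day
const messagesPerDay = 5_000 * 3;
for (const p of prices) {
  const daily = dailyCostUSD(p, messagesPerDay, 500, 200);
  console.log(`${p.name}: $${daily.toFixed(2)}/day, $${(daily * 30).toFixed(2)}/month`);
}
```

It reproduces the table's daily and monthly figures. Doubling the average input to 1,000 tokens (a longer chat history) raises GPT-4.1 mini from $7.80 to $10.80 per day.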
Cost reduction strategies for React apps:
- Client-side deduplication -- debounce rapid re-submissions to avoid duplicate API calls
- Response caching -- cache identical queries with a TTL (Redis or in-memory)
- Model routing -- use a cheap model for simple queries, premium for complex ones
- Token budgets -- set max_tokens to cap output length and prevent runaway costs
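A minimal sketch of the response-caching strategy: an in-memory map whose entries expire after a TTL, plus a key built from the model name and a normalized prompt. The class and helper names are hypothetical, and an in-memory cache only works within one server process; with several instances the same logic would move to Redis. Normalizing the prompt is a judgment call: it raises the hit rate but may merge prompts your app treats differently, and it suits single-turn queries better than answers that depend on conversation history.

```typescript
class TTLCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  // `now` is injectable so expiry can be tested without real waiting
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.store.delete(key); // lazily evict expired entries on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.store.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Key = model + normalized prompt (trimmed, lowercased, whitespace collapsed)
function cacheKey(model: string, prompt: string): string {
  return `${model}:${prompt.trim().toLowerCase().replace(/\s+/g, ' ')}`;
}
```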
TokenMix.ai provides per-user cost tracking and automatic budget enforcement. See our GPT cost optimization guide for more tactics.
How to Choose Your React AI Stack
| Your Situation | Recommended Stack | Why |
|---|---|---|
| Building a chat UI quickly | Vercel AI SDK + useChat | Fastest path, handles streaming |
| Need full control over requests | Fetch API + custom streaming | No dependencies, any provider |
| OpenAI-only with tools/functions | OpenAI SDK + custom backend | Best function calling support |
| Multiple providers, cost matters | TokenMix.ai API + Fetch | One endpoint, cheapest model routing |
| Existing Express/Fastify backend | Provider SDK + SSE middleware | Integrates with your existing API |
| Static site (Vite, CRA) | Fetch API + separate backend | No server framework dependency |
| Next.js app | Vercel AI SDK | Tightest integration |
Production Best Practices
Security:
- Never expose API keys in client-side code or in environment variables prefixed with REACT_APP_ or VITE_: Create React App and Vite embed those variables in the browser bundle
- Validate and sanitize all user input on the backend before sending it to AI providers
- Implement rate limiting per user (10-30 requests per minute for chat)
- Add content filtering for both input and output
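The per-user rate limit can be sketched as a sliding window: keep each user's recent request timestamps and reject a request when the window already holds the limit. This version keeps state in process memory, which suits a single server but not several instances (a shared store such as Redis would hold the timestamps there); the class name and injectable clock are illustrative.

```typescript
// Sliding-window limiter: at most `limit` requests per user in any `windowMs` span.
class PerUserRateLimiter {
  private hits = new Map<string, number[]>();
  private limit: number;
  private windowMs: number;
  private now: () => number;

  constructor(limit: number, windowMs: number, now: () => number = Date.now) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns true and records the request if allowed; false means respond with HTTP 429.
  tryAcquire(userId: string): boolean {
    const t = this.now();
    // Keep only timestamps still inside the window
    const recent = (this.hits.get(userId) ?? []).filter(ts => t - ts < this.windowMs);
    if (recent.length >= this.limit) {
      this.hits.set(userId, recent);
      return false;
    }
    recent.push(t);
    this.hits.set(userId, recent);
    return true;
  }
}
```

For example, new PerUserRateLimiter(20, 60_000) allows 20 chat requests per user per minute, inside the 10-30 range above.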
Performance:
- Enable streaming for all user-facing AI responses
- Use AbortController to cancel in-flight requests when users navigate away
- Implement request queuing to prevent concurrent requests from the same user
- Set max_tokens to prevent unexpectedly long (and expensive) responses
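Per-user request queuing can be a promise chain keyed by user ID: each new task starts only after that user's previous task settles, while different users proceed in parallel. A minimal single-process sketch (PerUserQueue is a hypothetical name):

```typescript
// Serializes async work per user ID: a user's next task starts only after their
// previous task settles; tasks for different users run concurrently.
class PerUserQueue {
  private tails = new Map<string, Promise<void>>();

  run<T>(userId: string, task: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(userId) ?? Promise.resolve();
    const result = previous.then(() => task());
    // The tail never rejects, so one failed request does not block the user's next one
    const tail = result.then(
      () => undefined,
      () => undefined,
    );
    this.tails.set(userId, tail);
    // Remove the entry once the user's queue drains, so the map does not grow forever
    void tail.then(() => {
      if (this.tails.get(userId) === tail) this.tails.delete(userId);
    });
    return result;
  }
}
```

An alternative design rejects a second concurrent request outright with 429 instead of queuing it. Queuing is friendlier when a user double-submits; rejecting bounds how much work one user can stack up.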
Error handling:
- Handle 429 (rate limit) with exponential backoff and user-friendly messages
- Handle 500/503 (server error) with automatic retry (max 3 attempts)
- Handle network errors with offline detection and queue-for-retry
- Display meaningful error messages, not raw API error responses
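The retry rules above can be sketched as a wrapper that retries 429 and 5xx responses with exponential backoff and random jitter. Here maxAttempts counts the first try, so the default of 3 means at most two retries; network errors thrown by send are not retried in this sketch. The function name, delays, and jitter range are assumptions to tune.

```typescript
// Retries HTTP 429 and 5xx responses with exponential backoff plus jitter.
// maxAttempts includes the first try; thrown errors (e.g. network failures) propagate.
async function withRetry<R extends { status: number }>(
  send: () => Promise<R>,
  maxAttempts = 3,
  baseDelayMs = 500,
  sleep: (ms: number) => Promise<void> = ms => new Promise<void>(resolve => setTimeout(resolve, ms)),
): Promise<R> {
  let response = await send();
  for (let attempt = 1; attempt < maxAttempts; attempt++) {
    const retryable = response.status === 429 || response.status >= 500;
    if (!retryable) break;
    // 500ms, 1000ms, 2000ms, ... plus up to 100ms of random jitter
    const delayMs = baseDelayMs * 2 ** (attempt - 1) + Math.random() * 100;
    await sleep(delayMs);
    response = await send();
  }
  return response;
}
```

On the server this wraps a call such as withRetry(() => fetch(url, init)). Because the status is checked before the body is read, retries happen before any tokens reach the user, which keeps it safe for streaming requests. A fuller version would also honor a Retry-After header on 429 responses when the provider sends one.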
Monitoring:
- Log token usage per request for cost tracking
- Monitor response latency (P50, P95, P99)
- Set up alerts for error rate spikes and cost anomalies
- Track user satisfaction metrics alongside AI usage
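P50, P95, and P99 are percentiles over recent request latencies. A small nearest-rank implementation, assuming you already record per-request durations in milliseconds (for streaming responses, time to first token and time to last token are worth tracking separately). The function names are illustrative.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error('no samples');
  const sorted = [...samplesMs].sort((a, b) => a - b);
  // Integer arithmetic first avoids floating-point drift in the rank
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

function latencySummary(samplesMs: number[]) {
  return {
    p50: percentile(samplesMs, 50),
    p95: percentile(samplesMs, 95),
    p99: percentile(samplesMs, 99),
  };
}
```

In practice you would compute this over a rolling window (for example the last few thousand requests) and alert when P95 or P99 crosses a threshold.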
For a deeper dive into response time optimization, check our AI API response time comparison.
Conclusion
Adding AI to a React app is straightforward once you understand the proxy architecture. Use the Vercel AI SDK's useChat hook for the fastest path to a working chat UI. Use the Fetch API approach when you need full control or work with a non-Next.js backend.
For model choice, GPT-4.1 mini is the best default for React apps -- fast streaming, reliable SDK, and $0.40/M input pricing. For budget-sensitive projects, Gemini 2.0 Flash at $0.10/M input delivers 4x more tokens per dollar with comparable quality.
Track your AI costs from day one. Use TokenMix.ai to compare providers, monitor spending, and switch models without changing your React code. The cheapest model that meets your quality bar is always the right choice.
FAQ
Can I call an AI API directly from React without a backend?
No, not safely. Calling AI APIs directly from React exposes your API key in the browser. Anyone can inspect network requests and steal your key. Always use a backend proxy. The backend stores the API key securely, receives requests from React, forwards them to the AI provider, and streams responses back.
What is the best AI SDK for React apps?
The Vercel AI SDK is the best option for React. Its useChat hook handles streaming, message state, input management, and error handling in a single import. It supports OpenAI, Anthropic, Google, and Mistral with a one-line provider swap. For non-Next.js React apps, it works with any backend that implements the data stream protocol.
How much does it cost to add AI to a React app?
For a typical SaaS with 5,000 daily users and 3 AI interactions per session, monthly costs range from $58.50 (Gemini 2.0 Flash) to $540 (Claude Haiku 3.5). GPT-4.1 mini sits at $234/month. These numbers assume 500 input tokens and 200 output tokens per interaction. Use TokenMix.ai for real-time cost estimation.
How do I handle streaming AI responses in React?
Use the Fetch API with ReadableStream or the Vercel AI SDK's useChat hook. The backend sends tokens via Server-Sent Events (SSE). React reads the stream chunk by chunk and updates state with each token. This gives users real-time feedback instead of waiting 2-5 seconds for a complete response.
Which AI provider has the fastest streaming for React?
Groq offers the fastest time-to-first-token at 0.15s with Llama models. Among major providers, OpenAI GPT-4.1 mini leads at 0.3s TTFT, followed by Google Gemini Flash at 0.4s. Anthropic Claude Haiku is 0.5s. DeepSeek is slowest at 1.2s. For chat UIs where perceived speed matters, choose OpenAI or Groq.
Can I use multiple AI providers in a single React app?
Yes. Your backend can route different requests to different providers. Use GPT-4.1 mini for general chat, Claude for content generation, and Gemini Flash for summarization. The React frontend does not need to know which provider handles each request. TokenMix.ai simplifies this with a single API endpoint that routes to 300+ models.
Author: TokenMix Research Lab | Last Updated: April 2026 | Data Source: Vercel AI SDK, OpenAI API, Anthropic API, TokenMix.ai