TokenMix Research Lab · 2026-04-13

AI API for React Apps: How to Add AI to React with Streaming, useChat Hook, and Provider Comparison (2026)

Adding an AI API to a React app requires three things: a backend proxy to protect your API key, a streaming connection for real-time responses, and a display component that renders tokens as they arrive. This guide covers every approach -- plain fetch, the Vercel AI SDK useChat hook, and direct provider SDKs -- with working code for each. We compare OpenAI, Anthropic, Google, and DeepSeek for React integration, covering cost, latency, and developer experience. All data tracked by TokenMix.ai as of April 2026.

Quick Comparison: React AI Integration Approaches

Approach Setup Time Streaming Provider Flexibility Best For
Fetch API 20 min Manual SSE parsing Any provider Simple integrations, full control
Vercel AI SDK (useChat) 10 min Built-in OpenAI, Anthropic, Google, Mistral Chat UIs, rapid prototyping
Provider SDK (OpenAI) 15 min SDK-managed Single provider OpenAI-specific features
Provider SDK (Anthropic) 15 min SDK-managed Single provider Claude-specific features
TokenMix.ai API 10 min OpenAI-compatible 300+ models Multi-provider, cost optimization

Architecture: Why You Need a Backend Proxy

Never call AI APIs directly from React client code. Your API key would be visible in browser developer tools, network requests, and your JavaScript bundle.

The correct architecture:

React Client → Your Backend (API key stored here) → AI Provider API

Your backend can be an Express or Fastify server, a Next.js API route, or a serverless function -- anything that can hold the key server-side and stream a response.

The backend does three things:

  1. Receives the user's message from your React app
  2. Attaches the API key and sends the request to the AI provider
  3. Streams the response back to React

This pattern keeps your API key secure and lets you add rate limiting, logging, and cost controls on the backend.

For developers who want to skip building a custom backend, TokenMix.ai provides a managed proxy with built-in rate limiting, cost tracking, and multi-provider routing.


Method 1: Fetch API with Streaming Display

The Fetch API approach gives you full control with zero dependencies. It works with any React setup and any AI provider.

Backend (Express example):

import express from 'express';
import OpenAI from 'openai';

const app = express();
app.use(express.json());

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

app.post('/api/chat', async (req, res) => {
  const { messages } = req.body;

  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');

  const stream = await client.chat.completions.create({
    model: 'gpt-4.1-mini',
    messages,
    stream: true,
  });

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || '';
    if (content) {
      res.write(`data: ${JSON.stringify({ content })}\n\n`);
    }
  }

  res.write('data: [DONE]\n\n');
  res.end();
});

React component with streaming:

import { useState, useCallback } from 'react';

function ChatApp() {
  const [messages, setMessages] = useState([]);
  const [input, setInput] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);

  const sendMessage = useCallback(async () => {
    const userMessage = { role: 'user', content: input };
    const updatedMessages = [...messages, userMessage];
    setMessages(updatedMessages);
    setInput('');
    setIsStreaming(true);

    const response = await fetch('/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ messages: updatedMessages }),
    });

    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let assistantMessage = '';
    let buffer = '';

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      // A network chunk can end mid-line, so buffer and only parse complete lines
      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop() ?? '';

      for (const line of lines) {
        if (!line.startsWith('data: ')) continue;
        const data = line.slice(6);
        if (data === '[DONE]') continue;
        const parsed = JSON.parse(data);
        assistantMessage += parsed.content;
        setMessages([...updatedMessages, { role: 'assistant', content: assistantMessage }]);
      }
    }

    setIsStreaming(false);
  }, [messages, input]);

  return (
    <div>
      {messages.map((m, i) => (
        <div key={i} className={m.role}>{m.content}</div>
      ))}
      <input value={input} onChange={e => setInput(e.target.value)}
        onKeyDown={e => e.key === 'Enter' && sendMessage()} />
    </div>
  );
}

This approach requires ~60 lines of custom code but works with any backend and any AI provider.
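Chunk boundaries are the main pitfall when parsing SSE by hand: a data: line can be split across two network reads, and JSON.parse on a half-delivered payload throws. The parsing logic is easiest to get right as a pure helper you can unit-test; a minimal sketch (feedSSE is an illustrative name, not part of any SDK):

```javascript
// Minimal SSE accumulator: feed it raw text chunks, get back the complete
// events plus the unparsed remainder to carry into the next call.
function feedSSE(buffer, chunk) {
  const combined = buffer + chunk;
  const lines = combined.split('\n');
  const rest = lines.pop() ?? ''; // partial trailing line, keep for next chunk
  const events = [];
  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const data = line.slice(6).trim();
    if (data === '[DONE]') continue;
    events.push(JSON.parse(data));
  }
  return { events, rest };
}
```

Call it inside the read loop with the decoded chunk, and keep the returned rest as the buffer for the next iteration.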


Method 2: Vercel AI SDK useChat Hook

The Vercel AI SDK reduces the React integration to a single hook. It handles streaming, message state, input management, and error handling.

Installation:

npm install ai @ai-sdk/openai

Backend (Next.js API route or standalone):

import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';

export async function POST(req: Request) {
  const { messages } = await req.json();
  const result = streamText({
    model: openai('gpt-4.1-mini'),
    messages,
  });
  return result.toDataStreamResponse();
}

React component -- the entire thing:

'use client';
import { useChat } from 'ai/react';

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit, isLoading, error } = useChat();

  return (
    <div>
      {messages.map(m => (
        <div key={m.id} className={m.role}>
          {m.content}
        </div>
      ))}
      {error && <div className="error">{error.message}</div>}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} disabled={isLoading} />
        <button type="submit" disabled={isLoading}>Send</button>
      </form>
    </div>
  );
}

The useChat hook provides everything the component destructures: the messages array, managed input state with handleInputChange, a handleSubmit handler, an isLoading flag for streaming status, and an error object for failures.

Switching providers takes one line on the backend:

import { anthropic } from '@ai-sdk/anthropic';
// model: anthropic('claude-haiku-3.5')

import { google } from '@ai-sdk/google';
// model: google('gemini-2.0-flash')

The React component stays identical. This is the fastest path from zero to a working AI chat in React.


Method 3: Direct Provider SDKs

Use provider SDKs when you need provider-specific features like function calling, structured outputs, or vision capabilities.

OpenAI SDK with React (function calling example):

// Backend
const response = await client.chat.completions.create({
  model: 'gpt-4.1-mini',
  messages,
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather',
        description: 'Get current weather for a city',
        parameters: {
          type: 'object',
          properties: {
            city: { type: 'string', description: 'City name' },
          },
          required: ['city'],
        },
      },
    },
  ],
  stream: true,
});

Anthropic SDK with React (streaming):

// Backend
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

const stream = client.messages.stream({
  model: 'claude-haiku-3.5',
  max_tokens: 1024,
  messages,
});

for await (const event of stream) {
  if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
    res.write(`data: ${JSON.stringify({ content: event.delta.text })}\n\n`);
  }
}

Both SDKs use Server-Sent Events (SSE) for streaming. The React client-side code is the same regardless of which provider SDK you use on the backend. Read our streaming tutorial for detailed SSE implementation.


AI Provider Comparison for React Apps

TokenMix.ai benchmarks all major providers monthly. Here is how they compare for React app integration.

Dimension OpenAI Anthropic Google DeepSeek
Best Model for React GPT-4.1 mini Claude Haiku 3.5 Gemini 2.0 Flash DeepSeek V4
Input Price $0.40/M $0.80/M $0.10/M $0.30/M
Output Price $1.60/M $4.00/M $0.40/M $1.20/M
TTFT (streaming) 0.3s 0.5s 0.4s 1.2s
Tokens/Second 120 90 150 60
SDK Quality Excellent Excellent Good Basic
React Examples Extensive Good Limited Minimal
Streaming Support Native SSE Native SSE Native SSE OpenAI-compatible
Function Calling Yes Yes (tools) Yes Yes
JSON Mode Yes Yes Yes Yes

Our recommendation for React apps: default to GPT-4.1 mini for fast streaming and a mature SDK; switch to Gemini 2.0 Flash when budget is the top priority.


Building a Streaming Chat Component

A production-ready React chat component needs markdown rendering, auto-scroll, typing indicators, and error recovery. Here is the pattern.

Key UX requirements:

  1. Auto-scroll to the latest message during streaming
  2. Markdown rendering for code blocks, lists, and formatting
  3. Typing indicator while waiting for the first token
  4. Stop button to cancel generation mid-stream
  5. Error state with retry option

Component structure:

import { useRef, useEffect } from 'react';
import ReactMarkdown from 'react-markdown';

function ChatMessage({ message }) {
  return (
    <div className={`message ${message.role}`}>
      <ReactMarkdown>{message.content}</ReactMarkdown>
    </div>
  );
}

function TypingIndicator() {
  return <div className="typing">AI is thinking...</div>;
}

function ChatWindow({ messages, isLoading }) {
  const bottomRef = useRef(null);

  useEffect(() => {
    bottomRef.current?.scrollIntoView({ behavior: 'smooth' });
  }, [messages]);

  return (
    <div className="chat-window">
      {messages.map(m => <ChatMessage key={m.id} message={m} />)}
      {isLoading && <TypingIndicator />}
      <div ref={bottomRef} />
    </div>
  );
}

Performance tip: Use react-markdown with rehype-highlight for code syntax highlighting. Memoize message components with React.memo to prevent re-renders of previous messages during streaming.
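The stop button from the requirements list is typically wired to an AbortController. A sketch of a cancelable stream reader (function and callback names are illustrative; if you use Method 2 instead, useChat exposes a built-in stop() for the same purpose):

```javascript
// Reads a ReadableStream token by token; aborting the signal cancels the read.
async function readWithStop(stream, signal, onToken) {
  const reader = stream.getReader();
  const onAbort = () => reader.cancel();
  signal.addEventListener('abort', onAbort, { once: true });
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done || signal.aborted) break;
      onToken(value);
    }
  } finally {
    signal.removeEventListener('abort', onAbort);
    reader.releaseLock();
  }
}
```

In the component, create one AbortController per request, pass controller.signal here, and call controller.abort() from the stop button. Pass the same signal to fetch so the underlying HTTP request is torn down too.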


Cost Estimation for React AI Features

AI features in React apps cost more than most developers expect. Here are realistic numbers.

Scenario: SaaS product with AI chat, 5,000 daily active users, 3 messages per session average.

Model Daily Messages Input Tokens/Day Output Tokens/Day Daily Cost Monthly Cost
GPT-4.1 mini 15,000 7.5M 3M $7.80 $234
Claude Haiku 3.5 15,000 7.5M 3M $18.00 $540
Gemini 2.0 Flash 15,000 7.5M 3M $1.95 $58.50
DeepSeek V4 15,000 7.5M 3M $5.85 $175.50

Assumptions: 500 input tokens, 200 output tokens per message.
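The arithmetic behind the table is simple enough to script for your own traffic numbers; a sketch in plain Node (prices are per million tokens, the month is approximated as 30 days, and the $1.60/M output price for GPT-4.1 mini is the one implied by the table's $7.80/day figure):

```javascript
// Estimates daily and monthly API spend from message volume and token prices.
function estimateCost({ dailyMessages, inTokensPerMsg, outTokensPerMsg, inPricePerM, outPricePerM }) {
  const inputCost = (dailyMessages * inTokensPerMsg / 1e6) * inPricePerM;
  const outputCost = (dailyMessages * outTokensPerMsg / 1e6) * outPricePerM;
  const daily = inputCost + outputCost;
  return { daily, monthly: daily * 30 };
}

// GPT-4.1 mini, 15,000 messages/day at 500 in / 200 out tokens each:
const gpt = estimateCost({
  dailyMessages: 15000,
  inTokensPerMsg: 500,
  outTokensPerMsg: 200,
  inPricePerM: 0.40,
  outPricePerM: 1.60,
});
// gpt.daily ≈ 7.80, gpt.monthly ≈ 234
```

Swap in another provider's prices to reproduce the other rows before committing to a model.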

Cost reduction strategies for React apps:

  1. Client-side deduplication -- debounce rapid re-submissions to avoid duplicate API calls
  2. Response caching -- cache identical queries with a TTL (Redis or in-memory)
  3. Model routing -- use a cheap model for simple queries, premium for complex ones
  4. Token budgets -- set max_tokens to cap output length and prevent runaway costs
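Strategy 2 above is often a dozen lines before you need Redis. A minimal in-memory sketch (names are illustrative; there is no eviction beyond TTL, so bound the key space in production):

```javascript
// Minimal in-memory response cache with a time-to-live per entry.
function createCache(ttlMs) {
  const store = new Map();
  return {
    get(key) {
      const hit = store.get(key);
      if (!hit) return undefined;
      if (Date.now() - hit.at > ttlMs) {
        store.delete(key); // expired: drop and report a miss
        return undefined;
      }
      return hit.value;
    },
    set(key, value) {
      store.set(key, { value, at: Date.now() });
    },
  };
}
```

Key on a hash of the messages array. For chat workloads, caching mainly pays off on common first-turn prompts, where many users send identical text.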

TokenMix.ai provides per-user cost tracking and automatic budget enforcement. See our GPT cost optimization guide for more tactics.


How to Choose Your React AI Stack

Your Situation Recommended Stack Why
Building a chat UI quickly Vercel AI SDK + useChat Fastest path, handles streaming
Need full control over requests Fetch API + custom streaming No dependencies, any provider
OpenAI-only with tools/functions OpenAI SDK + custom backend Best function calling support
Multiple providers, cost matters TokenMix.ai API + Fetch One endpoint, cheapest model routing
Existing Express/Fastify backend Provider SDK + SSE middleware Integrates with your existing API
Static site (Vite, CRA) Fetch API + separate backend No server framework dependency
Next.js app Vercel AI SDK Tightest integration

Production Best Practices

Security:

  1. Keep API keys on the backend only; never ship them in the client bundle
  2. Rate-limit the proxy per user to prevent abuse and runaway spend

Performance:

  1. Stream every response; a 0.3s first token feels faster than a 3s complete reply
  2. Memoize rendered messages and cache repeated queries with a TTL

Error handling:

  1. Show an explicit error state with a retry option when a request fails
  2. Abort in-flight requests when the user stops generation or navigates away

Monitoring:

  1. Log token usage per request and track cost per user
  2. Watch time-to-first-token -- it drives perceived speed more than total latency

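The error-handling practice of retrying failed requests can be a small wrapper around any provider call; a sketch (assumes transient failures throw; parameter names and defaults are illustrative):

```javascript
// Retries an async function with exponential backoff before giving up.
async function withRetry(fn, { retries = 3, baseMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err; // out of attempts: surface the error
      await new Promise(r => setTimeout(r, baseMs * 2 ** attempt));
    }
  }
}
```

Wrap only idempotent calls, and consider honoring a Retry-After header on 429 responses instead of the fixed backoff.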
For a deeper dive into response time optimization, check our AI API response time comparison.


Conclusion

Adding AI to a React app is straightforward once you understand the proxy architecture. Use the Vercel AI SDK's useChat hook for the fastest path to a working chat UI. Use the Fetch API approach when you need full control or work with a non-Next.js backend.

For model choice, GPT-4.1 mini is the best default for React apps -- fast streaming, reliable SDK, and $0.40/M input pricing. For budget-sensitive projects, Gemini 2.0 Flash at $0.10/M input delivers 4x more tokens per dollar with comparable quality.

Track your AI costs from day one. Use TokenMix.ai to compare providers, monitor spending, and switch models without changing your React code. The cheapest model that meets your quality bar is always the right choice.


FAQ

Can I call an AI API directly from React without a backend?

No, not safely. Calling AI APIs directly from React exposes your API key in the browser. Anyone can inspect network requests and steal your key. Always use a backend proxy. The backend stores the API key securely, receives requests from React, forwards them to the AI provider, and streams responses back.

What is the best AI SDK for React apps?

The Vercel AI SDK is the best option for React. Its useChat hook handles streaming, message state, input management, and error handling in a single import. It supports OpenAI, Anthropic, Google, and Mistral with a one-line provider swap. For non-Next.js React apps, it works with any backend that implements the data stream protocol.

How much does it cost to add AI to a React app?

For a typical SaaS with 5,000 daily users and 3 AI interactions per session, monthly costs range from $58 (Gemini Flash) to $540 (Claude Haiku). GPT-4.1 mini sits at $234/month. These numbers assume 500 input tokens and 200 output tokens per interaction. Use TokenMix.ai for real-time cost estimation.

How do I handle streaming AI responses in React?

Use the Fetch API with ReadableStream or the Vercel AI SDK's useChat hook. The backend sends tokens via Server-Sent Events (SSE). React reads the stream chunk by chunk and updates state with each token. This gives users real-time feedback instead of waiting 2-5 seconds for a complete response.

Which AI provider has the fastest streaming for React?

Groq offers the fastest time-to-first-token at 0.15s with Llama models. Among major providers, OpenAI GPT-4.1 mini leads at 0.3s TTFT, followed by Google Gemini Flash at 0.4s. Anthropic Claude Haiku is 0.5s. DeepSeek is slowest at 1.2s. For chat UIs where perceived speed matters, choose OpenAI or Groq.

Can I use multiple AI providers in a single React app?

Yes. Your backend can route different requests to different providers. Use GPT-4.1 mini for general chat, Claude for content generation, and Gemini Flash for summarization. The React frontend does not need to know which provider handles each request. TokenMix.ai simplifies this with a single API endpoint that routes to 300+ models.


Author: TokenMix Research Lab | Last Updated: April 2026 | Data Source: Vercel AI SDK, OpenAI API, Anthropic API, TokenMix.ai