TokenMix Research Lab · 2026-04-13

AI API for React Apps: How to Add AI to React with Streaming, useChat Hook, and Provider Comparison (2026)

Adding an AI API to a React app requires three things: a backend proxy to protect your API key, a streaming connection for real-time responses, and a display component that renders tokens as they arrive. This guide covers every approach -- plain fetch, the Vercel AI SDK useChat hook, and direct provider SDKs -- with working code for each. We compare OpenAI, Anthropic, Google, and DeepSeek for React integration, covering cost, latency, and developer experience. All data tracked by TokenMix.ai as of April 2026.

Quick Comparison: React AI Integration Approaches

Approach Setup Time Streaming Provider Flexibility Best For
Fetch API 20 min Manual SSE parsing Any provider Simple integrations, full control
Vercel AI SDK (useChat) 10 min Built-in OpenAI, Anthropic, Google, Mistral Chat UIs, rapid prototyping
Provider SDK (OpenAI) 15 min SDK-managed Single provider OpenAI-specific features
Provider SDK (Anthropic) 15 min SDK-managed Single provider Claude-specific features
TokenMix.ai API 10 min OpenAI-compatible 300+ models Multi-provider, cost optimization

Architecture: Why You Need a Backend Proxy

Never call AI APIs directly from React client code. Your API key would be visible in browser developer tools, network requests, and your JavaScript bundle.

The correct architecture:

React Client → Your Backend (API key stored here) → AI Provider API

Your backend can be an Express or Fastify server, a Next.js API route, or a serverless function -- anything that can hold the key server-side and stream a response.

The backend does three things:

  1. Receives the user's message from your React app
  2. Attaches the API key and sends the request to the AI provider
  3. Streams the response back to React

This pattern keeps your API key secure and lets you add rate limiting, logging, and cost controls on the backend.

For developers who want to skip building a custom backend, TokenMix.ai provides a managed proxy with built-in rate limiting, cost tracking, and multi-provider routing.


Method 1: Fetch API with Streaming Display

The Fetch API approach gives you full control with zero dependencies. It works with any React setup and any AI provider.

Backend (Express example):

import express from 'express';
import OpenAI from 'openai';

const app = express();
app.use(express.json());

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

app.post('/api/chat', async (req, res) => {
  const { messages } = req.body;

  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');

  const stream = await client.chat.completions.create({
    model: 'gpt-4.1-mini',
    messages,
    stream: true,
  });

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || '';
    if (content) {
      res.write(`data: ${JSON.stringify({ content })}\n\n`);
    }
  }

  res.write('data: [DONE]\n\n');
  res.end();
});

React component with streaming:

import { useState, useCallback } from 'react';

function ChatApp() {
  const [messages, setMessages] = useState([]);
  const [input, setInput] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);

  const sendMessage = useCallback(async () => {
    const userMessage = { role: 'user', content: input };
    const updatedMessages = [...messages, userMessage];
    setMessages(updatedMessages);
    setInput('');
    setIsStreaming(true);

    const response = await fetch('/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ messages: updatedMessages }),
    });

    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let assistantMessage = '';
    let buffer = '';

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      // A network chunk can end mid-line, so buffer and only parse complete lines
      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop() ?? '';

      for (const line of lines) {
        if (!line.startsWith('data: ')) continue;
        const data = line.slice(6);
        if (data === '[DONE]') continue;
        const parsed = JSON.parse(data);
        assistantMessage += parsed.content;
        setMessages([...updatedMessages, { role: 'assistant', content: assistantMessage }]);
      }
    }

    setIsStreaming(false);
  }, [messages, input]);

  return (
    <div>
      {messages.map((m, i) => (
        <div key={i} className={m.role}>{m.content}</div>
      ))}
      <input value={input} onChange={e => setInput(e.target.value)}
        onKeyDown={e => e.key === 'Enter' && sendMessage()} />
    </div>
  );
}

This approach requires ~60 lines of custom code but works with any backend and any AI provider.
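Chunk boundaries are the main pitfall when parsing SSE by hand: a data: line can be split across two network reads, and JSON.parse on a half-delivered payload throws. The parsing logic is easiest to get right as a pure helper you can unit-test; a minimal sketch (feedSSE is an illustrative name, not part of any SDK):

```javascript
// Minimal SSE accumulator: feed it raw text chunks, get back the complete
// events plus the unparsed remainder to carry into the next call.
function feedSSE(buffer, chunk) {
  const combined = buffer + chunk;
  const lines = combined.split('\n');
  const rest = lines.pop() ?? ''; // partial trailing line, keep for next chunk
  const events = [];
  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const data = line.slice(6).trim();
    if (data === '[DONE]') continue;
    events.push(JSON.parse(data));
  }
  return { events, rest };
}
```

Call it inside the read loop with the decoded chunk, and keep the returned rest as the buffer for the next iteration.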


Method 2: Vercel AI SDK useChat Hook

The Vercel AI SDK reduces the React integration to a single hook. It handles streaming, message state, input management, and error handling.

Installation:

npm install ai @ai-sdk/openai

Backend (Next.js API route or standalone):

import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';

export async function POST(req: Request) {
  const { messages } = await req.json();
  const result = streamText({
    model: openai('gpt-4.1-mini'),
    messages,
  });
  return result.toDataStreamResponse();
}

React component -- the entire thing:

'use client';
import { useChat } from 'ai/react';

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit, isLoading, error } = useChat();

  return (
    <div>
      {messages.map(m => (
        <div key={m.id} className={m.role}>
          {m.content}
        </div>
      ))}
      {error && <div className="error">{error.message}</div>}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} disabled={isLoading} />
        <button type="submit" disabled={isLoading}>Send</button>
      </form>
    </div>
  );
}

The useChat hook provides everything the component destructures: the messages array, managed input state with handleInputChange, a handleSubmit handler, an isLoading flag for streaming status, and an error object for failures.

Switching providers takes one line on the backend:

import { anthropic } from '@ai-sdk/anthropic';
// model: anthropic('claude-haiku-3.5')

import { google } from '@ai-sdk/google';
// model: google('gemini-2.0-flash')

The React component stays identical. This is the fastest path from zero to a working AI chat in React.


Method 3: Direct Provider SDKs

Use provider SDKs when you need provider-specific features like function calling, structured outputs, or vision capabilities.

OpenAI SDK with React (function calling example):

// Backend
const response = await client.chat.completions.create({
  model: 'gpt-4.1-mini',
  messages,
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather',
        description: 'Get current weather for a city',
        parameters: {
          type: 'object',
          properties: {
            city: { type: 'string', description: 'City name' },
          },
          required: ['city'],
        },
      },
    },
  ],
  stream: true,
});

Anthropic SDK with React (streaming):

// Backend
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

const stream = client.messages.stream({
  model: 'claude-haiku-3.5',
  max_tokens: 1024,
  messages,
});

for await (const event of stream) {
  if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
    res.write(`data: ${JSON.stringify({ content: event.delta.text })}\n\n`);
  }
}

Both SDKs use Server-Sent Events (SSE) for streaming. The React client-side code is the same regardless of which provider SDK you use on the backend. Read our streaming tutorial for detailed SSE implementation.


AI Provider Comparison for React Apps

TokenMix.ai benchmarks all major providers monthly. Here is how they compare for React app integration.

Dimension OpenAI Anthropic Google DeepSeek
Best Model for React GPT-4.1 mini Claude Haiku 3.5 Gemini 2.0 Flash DeepSeek V4
Input Price $0.40/M $0.80/M $0.10/M $0.30/M
Output Price $1.60/M $4.00/M $0.40/M $1.20/M
TTFT (streaming) 0.3s 0.5s 0.4s 1.2s
Tokens/Second 120 90 150 60
SDK Quality Excellent Excellent Good Basic
React Examples Extensive Good Limited Minimal
Streaming Support Native SSE Native SSE Native SSE OpenAI-compatible
Function Calling Yes Yes (tools) Yes Yes
JSON Mode Yes Yes Yes Yes

Our recommendation for React apps: default to GPT-4.1 mini for fast streaming and a mature SDK; switch to Gemini 2.0 Flash when budget is the top priority.


Building a Streaming Chat Component

A production-ready React chat component needs markdown rendering, auto-scroll, typing indicators, and error recovery. Here is the pattern.

Key UX requirements:

  1. Auto-scroll to the latest message during streaming
  2. Markdown rendering for code blocks, lists, and formatting
  3. Typing indicator while waiting for the first token
  4. Stop button to cancel generation mid-stream
  5. Error state with retry option

Component structure:

import { useRef, useEffect } from 'react';
import ReactMarkdown from 'react-markdown';

function ChatMessage({ message }) {
  return (
    <div className={`message ${message.role}`}>
      <ReactMarkdown>{message.content}</ReactMarkdown>
    </div>
  );
}

function TypingIndicator() {
  return <div className="typing">AI is thinking...</div>;
}

function ChatWindow({ messages, isLoading }) {
  const bottomRef = useRef(null);

  useEffect(() => {
    bottomRef.current?.scrollIntoView({ behavior: 'smooth' });
  }, [messages]);

  return (
    <div className="chat-window">
      {messages.map(m => <ChatMessage key={m.id} message={m} />)}
      {isLoading && <TypingIndicator />}
      <div ref={bottomRef} />
    </div>
  );
}

Performance tip: Use react-markdown with rehype-highlight for code syntax highlighting. Memoize message components with React.memo to prevent re-renders of previous messages during streaming.
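The stop button from the requirements list is typically wired to an AbortController. A sketch of a cancelable stream reader (function and callback names are illustrative; if you use Method 2 instead, useChat exposes a built-in stop() for the same purpose):

```javascript
// Reads a ReadableStream token by token; aborting the signal cancels the read.
async function readWithStop(stream, signal, onToken) {
  const reader = stream.getReader();
  const onAbort = () => reader.cancel();
  signal.addEventListener('abort', onAbort, { once: true });
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done || signal.aborted) break;
      onToken(value);
    }
  } finally {
    signal.removeEventListener('abort', onAbort);
    reader.releaseLock();
  }
}
```

In the component, create one AbortController per request, pass controller.signal here, and call controller.abort() from the stop button. Pass the same signal to fetch so the underlying HTTP request is torn down too.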


Cost Estimation for React AI Features

AI features in React apps cost more than most developers expect. Here are realistic numbers.

Scenario: SaaS product with AI chat, 5,000 daily active users, 3 messages per session average.

Model Daily Messages Input Tokens/Day Output Tokens/Day Daily Cost Monthly Cost
GPT-4.1 mini 15,000 7.5M 3M $7.80 $234
Claude Haiku 3.5 15,000 7.5M 3M $18.00 $540
Gemini 2.0 Flash 15,000 7.5M 3M $1.95 $58.50
DeepSeek V4 15,000 7.5M 3M $5.85 $175.50

Assumptions: 500 input tokens, 200 output tokens per message.
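The arithmetic behind the table is simple enough to script for your own traffic numbers; a sketch in plain Node (prices are per million tokens, the month is approximated as 30 days, and the $1.60/M output price for GPT-4.1 mini is the one implied by the table's $7.80/day figure):

```javascript
// Estimates daily and monthly API spend from message volume and token prices.
function estimateCost({ dailyMessages, inTokensPerMsg, outTokensPerMsg, inPricePerM, outPricePerM }) {
  const inputCost = (dailyMessages * inTokensPerMsg / 1e6) * inPricePerM;
  const outputCost = (dailyMessages * outTokensPerMsg / 1e6) * outPricePerM;
  const daily = inputCost + outputCost;
  return { daily, monthly: daily * 30 };
}

// GPT-4.1 mini, 15,000 messages/day at 500 in / 200 out tokens each:
const gpt = estimateCost({
  dailyMessages: 15000,
  inTokensPerMsg: 500,
  outTokensPerMsg: 200,
  inPricePerM: 0.40,
  outPricePerM: 1.60,
});
// gpt.daily ≈ 7.80, gpt.monthly ≈ 234
```

Swap in another provider's prices to reproduce the other rows before committing to a model.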

Cost reduction strategies for React apps:

  1. Client-side deduplication -- debounce rapid re-submissions to avoid duplicate API calls
  2. Response caching -- cache identical queries with a TTL (Redis or in-memory)
  3. Model routing -- use a cheap model for simple queries, premium for complex ones
  4. Token budgets -- set max_tokens to cap output length and prevent runaway costs
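Strategy 2 above is often a dozen lines before you need Redis. A minimal in-memory sketch (names are illustrative; there is no eviction beyond TTL, so bound the key space in production):

```javascript
// Minimal in-memory response cache with a time-to-live per entry.
function createCache(ttlMs) {
  const store = new Map();
  return {
    get(key) {
      const hit = store.get(key);
      if (!hit) return undefined;
      if (Date.now() - hit.at > ttlMs) {
        store.delete(key); // expired: drop and report a miss
        return undefined;
      }
      return hit.value;
    },
    set(key, value) {
      store.set(key, { value, at: Date.now() });
    },
  };
}
```

Key on a hash of the messages array. For chat workloads, caching mainly pays off on common first-turn prompts, where many users send identical text.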

TokenMix.ai provides per-user cost tracking and automatic budget enforcement. See our GPT cost optimization guide for more tactics.


How to Choose Your React AI Stack

Your Situation Recommended Stack Why
Building a chat UI quickly Vercel AI SDK + useChat Fastest path, handles streaming
Need full control over requests Fetch API + custom streaming No dependencies, any provider
OpenAI-only with tools/functions OpenAI SDK + custom backend Best function calling support
Multiple providers, cost matters TokenMix.ai API + Fetch One endpoint, cheapest model routing
Existing Express/Fastify backend Provider SDK + SSE middleware Integrates with your existing API
Static site (Vite, CRA) Fetch API + separate backend No server framework dependency
Next.js app Vercel AI SDK Tightest integration

Production Best Practices

Security:

  1. Keep API keys on the backend only; never ship them in the client bundle
  2. Rate-limit the proxy per user to prevent abuse and runaway spend

Performance:

  1. Stream every response; a 0.3s first token feels faster than a 3s complete reply
  2. Memoize rendered messages and cache repeated queries with a TTL

Error handling:

  1. Show an explicit error state with a retry option when a request fails
  2. Abort in-flight requests when the user stops generation or navigates away

Monitoring:

  1. Log token usage per request and track cost per user
  2. Watch time-to-first-token -- it drives perceived speed more than total latency

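The error-handling practice of retrying failed requests can be a small wrapper around any provider call; a sketch (assumes transient failures throw; parameter names and defaults are illustrative):

```javascript
// Retries an async function with exponential backoff before giving up.
async function withRetry(fn, { retries = 3, baseMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err; // out of attempts: surface the error
      await new Promise(r => setTimeout(r, baseMs * 2 ** attempt));
    }
  }
}
```

Wrap only idempotent calls, and consider honoring a Retry-After header on 429 responses instead of the fixed backoff.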
For a deeper dive into response time optimization, check our AI API response time comparison.


Conclusion

Adding AI to a React app is straightforward once you understand the proxy architecture. Use the Vercel AI SDK's useChat hook for the fastest path to a working chat UI. Use the Fetch API approach when you need full control or work with a non-Next.js backend.

For model choice, GPT-4.1 mini is the best default for React apps -- fast streaming, reliable SDK, and $0.40/M input pricing. For budget-sensitive projects, Gemini 2.0 Flash at $0.10/M input delivers 4x more tokens per dollar with comparable quality.

Track your AI costs from day one. Use TokenMix.ai to compare providers, monitor spending, and switch models without changing your React code. The cheapest model that meets your quality bar is always the right choice.


FAQ

Can I call an AI API directly from React without a backend?

No, not safely. Calling AI APIs directly from React exposes your API key in the browser. Anyone can inspect network requests and steal your key. Always use a backend proxy. The backend stores the API key securely, receives requests from React, forwards them to the AI provider, and streams responses back.

What is the best AI SDK for React apps?

The Vercel AI SDK is the best option for React. Its useChat hook handles streaming, message state, input management, and error handling in a single import. It supports OpenAI, Anthropic, Google, and Mistral with a one-line provider swap. For non-Next.js React apps, it works with any backend that implements the data stream protocol.

How much does it cost to add AI to a React app?

For a typical SaaS with 5,000 daily users and 3 AI interactions per session, monthly costs range from $58 (Gemini Flash) to $540 (Claude Haiku). GPT-4.1 mini sits at $234/month. These numbers assume 500 input tokens and 200 output tokens per interaction. Use TokenMix.ai for real-time cost estimation.

How do I handle streaming AI responses in React?

Use the Fetch API with ReadableStream or the Vercel AI SDK's useChat hook. The backend sends tokens via Server-Sent Events (SSE). React reads the stream chunk by chunk and updates state with each token. This gives users real-time feedback instead of waiting 2-5 seconds for a complete response.

Which AI provider has the fastest streaming for React?

Groq offers the fastest time-to-first-token at 0.15s with Llama models. Among major providers, OpenAI GPT-4.1 mini leads at 0.3s TTFT, followed by Google Gemini Flash at 0.4s. Anthropic Claude Haiku is 0.5s. DeepSeek is slowest at 1.2s. For chat UIs where perceived speed matters, choose OpenAI or Groq.

Can I use multiple AI providers in a single React app?

Yes. Your backend can route different requests to different providers. Use GPT-4.1 mini for general chat, Claude for content generation, and Gemini Flash for summarization. The React frontend does not need to know which provider handles each request. TokenMix.ai simplifies this with a single API endpoint that routes to 300+ models.


Author: TokenMix Research Lab | Last Updated: April 2026 | Data Source: Vercel AI SDK, OpenAI API, Anthropic API, TokenMix.ai