The Firecrawl MCP server gives AI agents structured web scraping through Model Context Protocol — letting Claude, GPT-5.5, DeepSeek V4, Kimi K2.6, and any MCP-compatible client crawl websites, extract content, and return clean markdown without separate API integrations. Firecrawl's value vs rolling your own scraper: JavaScript rendering, anti-bot handling, markdown extraction, and structured output all built in. This guide covers setup, pricing, production use cases, and the trade-offs vs alternatives like Tavily, Jina, and direct HTTP fetching. Tested against Firecrawl MCP v1.2 (April 2026).
Setup
claude mcp add firecrawl --env FIRECRAWL_API_KEY=your-key \
  -- npx -y firecrawl-mcp
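Before pointing an agent at the server, it can be worth sanity-checking the key directly. A minimal sketch, assuming the firecrawl-py SDK (a separate package from the MCP server, v1-style API):
# sanity-check the API key outside MCP; assumes the firecrawl-py SDK
# pip install firecrawl-py
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="your-key")
result = app.scrape_url("https://example.com")  # one static scrape, 1 credit
print(result)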
Pricing
Firecrawl credit pricing (April 2026):
Free tier: 500 credits/month
Hobby ($6/mo): 5,000 credits
Starter ($49/mo): 50,000 credits
Standard ($99/mo): 250,000 credits
Enterprise: custom
Credit consumption per operation:
scrape (no JS): 1 credit
scrape (with JS render): 2-5 credits
crawl per page: 1-5 credits depending on options
extract (LLM-powered): 3-10 credits
For reference, the 500 free-tier credits cover roughly 100-500 page scrapes per month, depending on how many pages need JS rendering.
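To budget a job before running it, a back-of-envelope estimator; the per-operation rates below are midpoints of the ranges above, not API constants:
# rough credit estimator; rates are midpoints of the ranges listed above
def estimate_credits(static=0, js=0, crawl_pages=0, extracts=0):
    return static * 1 + js * 3.5 + crawl_pages * 3 + extracts * 6.5

# e.g. a depth-2 docs crawl of ~150 pages plus 20 schema extracts:
print(estimate_credits(crawl_pages=150, extracts=20))  # 580.0 -- beyond the free tier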
Production Use Cases
Use Case 1 — Research Agent
Ask Claude to research a topic across the web:
"Research the current state of RAG frameworks. Check LangChain, LlamaIndex, Haystack, and DSPy homepages. Summarize strengths and trade-offs."
Agent:
Calls firecrawl_scrape on each URL
Reads markdown content
Synthesizes comparative analysis
Typical cost: 4 credits (one static scrape per homepage), ~30 seconds total.
Use Case 2 — Competitive Intelligence
"Check the pricing pages of OpenAI, Anthropic, DeepSeek, and Moonshot. Report any changes from the prices I have in this spreadsheet."
Agent scrapes current pricing, compares against reference data, reports deltas. Useful for pricing tracking in fast-moving markets.
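The comparison step is plain Python once the agent has scraped the current prices; the reference and scraped values here are hypothetical:
# diff scraped prices against a reference sheet; all values hypothetical
reference = {"openai": "$2.50/1M input", "anthropic": "$3.00/1M input"}
scraped = {"openai": "$2.50/1M input", "anthropic": "$2.75/1M input"}

for vendor, old_price in reference.items():
    new_price = scraped.get(vendor)
    if new_price and new_price != old_price:
        print(f"{vendor}: {old_price} -> {new_price}")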
Use Case 3 — Documentation Ingestion
"Crawl docs.anthropic.com with depth 2. Extract all Messages API examples. Compile into a reference document."
firecrawl_crawl handles the recursion and returns a structured page map. Useful for building RAG indices from documentation sites without maintaining your own crawler.
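A sketch of the tool arguments behind that prompt; maxDepth and limit mirror Firecrawl's crawl parameters, but treat the exact names as assumptions to check against the MCP tool schema:
# hypothetical firecrawl_crawl arguments for the docs-ingestion prompt above
crawl_args = {
    "url": "https://docs.anthropic.com",
    "maxDepth": 2,   # recurse two levels from the start URL
    "limit": 200,    # hard cap on pages, bounds credit spend
}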
Use Case 4 — Dynamic Form-Filled Content
Some sites require form interaction to reveal content (search boxes, filters). Firecrawl's actions system can submit forms before scraping, and pairing it with schema-based extraction turns the revealed pages into structured data:
"Extract job postings from these 50 company careers pages. For each, get title, location, salary, and apply URL."
firecrawl_extract with a defined schema does this in one call per page — LLM-powered field extraction handles variations in page structure.
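A sketch of a job-posting schema in JSON Schema form, which is the shape schema-based extraction expects; the field names are illustrative:
# illustrative JSON Schema for firecrawl_extract on a careers page
job_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "location": {"type": "string"},
        "salary": {"type": "string"},  # keep as string; salary formats vary wildly
        "apply_url": {"type": "string"},
    },
    "required": ["title", "apply_url"],
}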
Alternatives: When Not to Use Firecrawl
Tavily MCP
Tavily specializes in search + scraping for AI agents. Better for search-heavy workflows where the query-to-content pipeline matters more than rich scraping features.
Pick Tavily if: heavy search usage, want LLM-friendly search results, don't need JavaScript rendering.
Pick Firecrawl if: need JS rendering, complex site interaction, structured extraction, or markdown conversion quality matters.
Jina AI Reader
Jina Reader offers simple URL-to-markdown conversion, and its free tier is generous. Its feature set is simpler than Firecrawl's.
Pick Jina if: simple scraping, minimal budget, don't need crawl/extract features.
Pick Firecrawl if: need crawl, extract, or production reliability.
Direct HTTP + Manual Parsing
For simple pages without JavaScript, requests + BeautifulSoup is free (see the sketch after the pick list below).
Pick direct if: simple HTML sites, one-off scripts, budget zero.
Pick Firecrawl if: production workflow, JS-rendered sites, or maintaining your own scraper isn't worth your time.
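The do-it-yourself path fits in a few lines for static HTML:
# zero-cost scrape of a static page with requests + BeautifulSoup
import requests
from bs4 import BeautifulSoup

resp = requests.get("https://example.com", timeout=10)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")
print(soup.get_text(separator="\n", strip=True))
What this doesn't give you is JS rendering, anti-bot handling, or markdown structure; those are exactly the gaps Firecrawl charges for.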
Browserbase
Browserbase provides headless browser infrastructure: more control than Firecrawl, but it requires more engineering.
Pick Browserbase if: need custom automation beyond scraping, want full browser control.
Pick Firecrawl if: want the "scrape and forget" simplicity.
Security and Rate Limiting
API key protection: treat Firecrawl keys like any other credential. Environment variable, not in code.
Rate limiting your agent: Firecrawl has server-side rate limits, but you should also self-limit to avoid credit burn from runaway loops. Best practice: hard-cap scrape operations per agent session.
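A minimal sketch of that hard cap, wrapping whatever function your agent loop uses to issue scrapes (the names here are placeholders):
# hard-cap scrapes per agent session to bound credit burn; names are placeholders
class ScrapeBudget:
    def __init__(self, max_scrapes=50):
        self.remaining = max_scrapes

    def scrape(self, do_scrape, url):
        if self.remaining <= 0:
            raise RuntimeError("scrape budget exhausted for this session")
        self.remaining -= 1
        return do_scrape(url)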
robots.txt compliance: Firecrawl respects robots.txt by default. You can disable that for specific use cases, but at your own legal risk; some commercial sites explicitly disallow automated scraping in their terms.
GDPR / personal data: if you're scraping European user data, consider GDPR implications. Firecrawl itself is neutral; your use case determines compliance burden.
Routing Across Multiple LLMs
One MCP server, many possible backend models. Through TokenMix.ai, your Firecrawl MCP server can be used by agents running on Claude Opus 4.7, GPT-5.5, DeepSeek V4-Pro, Kimi K2.6, Gemini 3.1 Pro, and 300+ other models — all through a single OpenAI-compatible endpoint.
Why this matters:
Research agents: Claude Opus 4.7 or Kimi K2.6 (long-context, synthesis)
Structured extraction: GPT-5.5 (strong JSON output) or DeepSeek V4-Pro (cheaper)
High-volume simple scraping: route to DeepSeek V4-Flash ($0.14/$0.28) for cost optimization
Multi-model routing via aggregator lets your Firecrawl-powered agents scale economically.
Common Issues
"Page not loading correctly"
Some sites use aggressive anti-bot measures. Firecrawl handles most, but not all. For tough sites:
Try the mobile user agent option
Use actions system to simulate real interaction
Consider Browserbase for full browser control
"Rate limit exceeded"
Free tier: 500 credits/month, ~10 concurrent requests. Upgrade plan or throttle.
"Credit depleted faster than expected"
Common causes:
JS rendering eats 2-5× the credits of simple scraping
Extract operations cost 3-10 credits
Crawls scale with page count
Track usage in Firecrawl dashboard. Set alerts at 50% / 80% of monthly credits.
"Returned content isn't what I see in browser"
Possible causes:
JS not fully loaded when scraped — add waitFor option
Cloudflare or similar blocking — increase scraping delay
A/B testing variants — content varies per request
Tips for Production Use
1. Cache scraped content. Firecrawl charges credits per request, so cache results for URLs that change infrequently (product pages, docs) with sensible TTLs; a sketch follows this list.
2. Batch when possible. firecrawl_crawl is more efficient than many separate firecrawl_scrape calls against the same site.
3. Pre-filter targets. Don't let agents scrape "any URL" — define allowed domains to prevent runaway credit burn from following arbitrary links.
4. Monitor credit usage. Set up daily or weekly checks on credit balance. Agent loops can burn credits fast if not bounded.
5. Test extraction schemas carefully. firecrawl_extract quality depends on schema clarity. Iterate on the schema with a few test pages before running at scale.
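For tip 1, a minimal in-process TTL cache; in production you would likely back this with Redis or similar:
# minimal TTL cache for scraped pages (tip 1); swap for Redis in production
import time

_cache = {}

def cached_scrape(url, do_scrape, ttl_seconds=86400):
    hit = _cache.get(url)
    if hit and time.time() - hit[0] < ttl_seconds:
        return hit[1]
    content = do_scrape(url)  # costs credits only on a cache miss
    _cache[url] = (time.time(), content)
    return content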
FAQ
Is Firecrawl free to use?
Free tier: 500 credits/month. Sufficient for development and light production use. Paid tiers start at $6/mo.
Does Firecrawl work on JavaScript-heavy sites?
Yes, that's one of its main differentiators. Set renderJs: true in scrape options. Costs 2-5× more credits than static scraping.
Can I scrape behind login?
Yes, via Firecrawl's actions system — submit login forms as part of the scrape request. Cookies persist within the scrape session.
Is this TOS-compliant?
Depends on the target site. Firecrawl respects robots.txt and terms that prohibit scraping. You're responsible for your specific use case.
What's the best alternative for search-focused workflows?
Tavily MCP. Specifically designed for agent search workflows. Firecrawl is better for direct URL scraping and site crawling.
How does this compare to Bright Data / ScrapingBee?
Bright Data and ScrapingBee are traditional scraping infrastructure — more complex, more features, higher cost. Firecrawl is AI-agent-optimized with better defaults for LLM workflows.
Can I run my own local scraper instead?
Yes, using Playwright + custom parsing. Firecrawl saves you from maintaining this. Pick based on whether engineering time or service cost is more valuable.
Where can I test Firecrawl with multiple LLM backends?
Run the Firecrawl MCP server locally, then route your client through TokenMix.ai for access to Claude, GPT, DeepSeek, Kimi, and 300+ other models. Same MCP server, different LLMs — useful for comparing how different models handle the same scraped content.