TokenMix Research Lab · 2026-04-10

LangChain Tutorial 2026: Python Guide from First Chain to RAG Pipeline and Agents
Last Updated: 2026-04-29
Author: TokenMix Research Lab
LangChain has 100K+ GitHub stars and supports 80+ providers. LCEL is now the standard chain composition (LLMChain/SequentialChain deprecated). Switching providers is one line. Pair with LangGraph for stateful agents.
LangChain is the most popular framework for building LLM-powered applications in Python, with over 100,000 GitHub stars and support for 80+ model providers. This LangChain tutorial covers everything from installation to production deployment: your first chain, building a RAG pipeline, creating agents with tool use, and choosing the right LLMs. Whether you are new to LangChain or upgrading from an older version, this guide gives you working code and practical architecture decisions for 2026.
Table of Contents
- Quick Reference: LangChain Core Concepts
- What Is LangChain and Why Use It in 2026
- Installation and Setup
- Your First LangChain Chain
- Prompt Templates and Output Parsers
- Building a RAG Pipeline with LangChain
- Creating Agents with Tool Use
- Which LLMs to Use with LangChain
- LangChain vs Alternatives
- Production Best Practices
- Which LangChain Architecture Should You Build?
- What's the Bottom Line on LangChain in 2026?
- FAQ
Quick Reference: LangChain Core Concepts
Seven core concepts: Chain (LCEL composition), Prompt Template, Retriever, Agent, Tool, Memory, Output Parser. Every LLM app needs Chain + Prompt Template; RAG adds Retriever; agents add Tool + Agent.
| Concept | What it does | When to use |
|---|---|---|
| Chain (LCEL) | Composes LLM calls with data transformations | Every LLM application |
| Prompt Template | Structures input to the LLM | When you need consistent prompt formatting |
| Retriever | Fetches relevant documents from a data source | RAG applications |
| Agent | LLM decides which tools to call and in what order | Dynamic, multi-step workflows |
| Tool | A function the agent can invoke | When agents need external capabilities |
| Memory | Stores conversation history | Chatbots and multi-turn interactions |
| Output Parser | Structures LLM output into typed objects | When you need structured data from LLMs |
What Is LangChain and Why Use It in 2026
Use it when: 80+ provider unified interface, battle-tested RAG components, agent framework, LangSmith tracing matter. Skip it when: simple single-model API calls, you want max HTTP control, or abstraction adds more complexity than it removes.
LangChain is a Python (and JavaScript) framework that provides abstractions for building applications with LLMs. It standardizes the interface for calling different models, chaining operations, and integrating external data sources and tools.
Why LangChain still matters in 2026:
- Unified interface across 80+ model providers -- switch from OpenAI to Anthropic with one line
- Battle-tested RAG components (document loaders, text splitters, vector store integrations)
- Agent framework for building autonomous workflows
- LangSmith integration for tracing, monitoring, and evaluation
- LangGraph for complex, stateful agent workflows
When not to use LangChain:
- Simple, single-model API calls (just use the provider SDK directly)
- When you need maximum control over every HTTP request
- If your team finds the abstraction layers add more complexity than they remove
The key architectural change in 2026: LangChain Expression Language (LCEL) is the standard way to compose chains. The older LLMChain and SequentialChain classes are deprecated. This tutorial uses LCEL exclusively.
Installation and Setup
Four steps: install langchain + langchain-openai + chromadb, install provider-specific packages (anthropic/google-genai/groq), set API keys via env vars, verify with one-line ChatOpenAI().invoke() call.
Step 1: Install core packages
pip install langchain langchain-openai langchain-community
pip install chromadb # For RAG examples
Step 2: Install provider-specific packages (choose the ones you need)
pip install langchain-anthropic # For Claude models
pip install langchain-google-genai # For Gemini models
pip install langchain-groq # For Groq-hosted models
Step 3: Set your API keys
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
# Or use TokenMix.ai unified API key for all providers
export TOKENMIX_API_KEY="tm-..."
Step 4: Verify installation
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini")
response = llm.invoke("Say hello")
print(response.content)
# Example output (exact wording varies): Hello! How can I help you today?
If this runs without errors, your setup is complete.
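If the verification call fails with an authentication error, the usual cause is a missing environment variable. Here is a small sketch for checking keys before you create any model objects. The helper name missing_api_keys is hypothetical and not part of LangChain; it only inspects a mapping such as os.environ.

```python
import os


def missing_api_keys(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Explicit mapping so the example is reproducible; pass nothing to check os.environ
example_env = {"OPENAI_API_KEY": "sk-...", "ANTHROPIC_API_KEY": ""}
print(missing_api_keys(["OPENAI_API_KEY", "ANTHROPIC_API_KEY"], example_env))
```

Run the check at startup and fail fast with a clear message instead of waiting for the first model call to error out.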
Your First LangChain Chain
Three components piped with |: ChatPromptTemplate (input formatting) → ChatOpenAI (LLM call) → StrOutputParser (extract content). LCEL is the only composition style going forward — older LLMChain is deprecated.
A chain in LangChain is a sequence of operations composed using the pipe (|) operator. Here is the simplest useful chain: a prompt template connected to an LLM connected to an output parser.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
# Define components
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant that explains {topic} concepts."),
    ("human", "{question}")
])
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.3)
output_parser = StrOutputParser()
# Compose the chain using LCEL
chain = prompt | llm | output_parser
# Run the chain
result = chain.invoke({
    "topic": "machine learning",
    "question": "What is gradient descent in 2 sentences?"
})
print(result)
What happens at each step:
- prompt formats the input variables into a chat message
- llm sends the formatted prompt to GPT-4o-mini
- output_parser extracts the string content from the response
You can add steps to this chain: logging, caching, retry logic, or parallel branches. That is the power of LCEL -- every component is composable.
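To build intuition for the pipe operator, here is a plain-Python sketch of how | composition can work. It is a toy model, not LangChain's implementation: the Step class and the three stand-in steps are hypothetical, and no API call is made.

```python
class Step:
    """Wraps a function so steps can be chained with the | operator."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # (a | b).invoke(x) runs a first, then feeds its result to b
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Stand-ins for the prompt, the model, and the output parser
format_prompt = Step(lambda d: f"Explain {d['topic']}: {d['question']}")
fake_llm = Step(lambda text: {"content": text.upper()})
extract_text = Step(lambda message: message["content"])

toy_chain = format_prompt | fake_llm | extract_text
print(toy_chain.invoke({"topic": "ml", "question": "what is sgd?"}))
```

Because every step exposes the same invoke interface, inserting a logging or caching step is just one more | in the chain.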
Prompt Templates and Output Parsers
Templates standardize prompts; few-shot adds examples. JsonOutputParser + Pydantic gives typed structured output. Pair with format_instructions to inject schema info into prompt.
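Prompt Templates ensure consistent formatting and keep user input in its own message, separate from your instructions. They do not prevent prompt injection on their own, so still treat user-supplied text as untrusted.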
from langchain_core.prompts import ChatPromptTemplate
# Basic template
template = ChatPromptTemplate.from_messages([
    ("system", "You are a {role}. Respond in {language}."),
    ("human", "{query}")
])
# With few-shot examples
from langchain_core.prompts import FewShotChatMessagePromptTemplate
examples = [
    {"input": "2+2", "output": "4"},
    {"input": "3+3", "output": "6"},
]
example_prompt = ChatPromptTemplate.from_messages([
    ("human", "{input}"),
    ("ai", "{output}")
])
few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)
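Conceptually, a few-shot chat prompt expands each example into a human/ai message pair placed ahead of the real question. Here is a plain-Python sketch of that expansion. The function build_few_shot_messages is hypothetical and only illustrates the message order, not LangChain's internals.

```python
def build_few_shot_messages(examples, query):
    """Expand examples into (role, text) pairs, then append the real query."""
    messages = []
    for example in examples:
        messages.append(("human", example["input"]))
        messages.append(("ai", example["output"]))
    messages.append(("human", query))
    return messages


toy_examples = [
    {"input": "2+2", "output": "4"},
    {"input": "3+3", "output": "6"},
]
for role, text in build_few_shot_messages(toy_examples, "4+4"):
    print(f"{role}: {text}")
```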
Output Parsers convert unstructured LLM text into structured data.
from langchain_core.output_parsers import JsonOutputParser
from pydantic import BaseModel, Field
class ProductReview(BaseModel):
    sentiment: str = Field(description="positive, negative, or neutral")
    score: int = Field(description="1-10 rating")
    summary: str = Field(description="One sentence summary")
parser = JsonOutputParser(pydantic_object=ProductReview)
prompt = ChatPromptTemplate.from_messages([
    ("system", "Analyze this product review.\n{format_instructions}"),
    ("human", "{review}")
])
chain = prompt | llm | parser  # llm is the ChatOpenAI instance defined earlier
result = chain.invoke({
    "review": "Great product, fast shipping, exactly as described.",
    "format_instructions": parser.get_format_instructions()
})
# result is a plain dict, for example: {"sentiment": "positive", "score": 9, "summary": "..."}
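Models sometimes wrap JSON in a Markdown code fence or return values outside the range you asked for, so it helps to validate parsed output before using it. Here is a stdlib-only sketch of that validation step. The Review dataclass and parse_review function are hypothetical helpers, not LangChain APIs.

```python
import json
from dataclasses import dataclass

FENCE = "`" * 3  # Markdown code-fence marker


@dataclass
class Review:
    sentiment: str
    score: int
    summary: str


def parse_review(text):
    """Parse a JSON reply (optionally fenced) into a validated Review."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line (for example a "json" tag) and the closing fence
        cleaned = cleaned.split("\n", 1)[1].rsplit(FENCE, 1)[0]
    data = json.loads(cleaned)
    review = Review(str(data["sentiment"]), int(data["score"]), str(data["summary"]))
    if review.sentiment not in {"positive", "negative", "neutral"}:
        raise ValueError(f"unexpected sentiment: {review.sentiment!r}")
    if not 1 <= review.score <= 10:
        raise ValueError(f"score out of range: {review.score}")
    return review


reply = FENCE + 'json\n{"sentiment": "positive", "score": 9, "summary": "Fast shipping."}\n' + FENCE
print(parse_review(reply))
```

Raising on invalid values lets the caller decide whether to retry the model call or drop the record.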
Building a RAG Pipeline with LangChain
Three steps: load + split (RecursiveCharacterTextSplitter, 512 characters, 50 overlap) → embed + store (OpenAIEmbeddings + Chroma) → retrieval chain (retriever | format_docs | prompt | LLM | parser). The whole pipeline is a few dozen lines of code.
RAG (Retrieval Augmented Generation) is LangChain's most common production use case. Here is a complete implementation.
Step 1: Load and split documents
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
# Load documents (web pages, PDFs, etc.)
loader = WebBaseLoader("https://docs.example.com/api-reference")
docs = loader.load()
# Split into chunks
splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,    # measured in characters by default, not tokens
    chunk_overlap=50   # characters shared between neighboring chunks
)
chunks = splitter.split_documents(docs)
print(f"Loaded {len(chunks)} chunks")
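To see what chunk_size and chunk_overlap mean, here is a simplified fixed-window splitter in plain Python. It is only a sketch: the real RecursiveCharacterTextSplitter prefers to break at separators such as paragraph and line boundaries, while this hypothetical split_with_overlap cuts at exact character offsets.

```python
def split_with_overlap(text, chunk_size=512, chunk_overlap=50):
    """Cut text into windows of at most chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the
    previous one, so neighbors share chunk_overlap characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(1200))
pieces = split_with_overlap(sample)
print([len(piece) for piece in pieces])  # [512, 512, 276]
```

The overlap keeps a sentence that straddles a boundary retrievable from either neighboring chunk, at the cost of storing some text twice.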
Step 2: Create vector store
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
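The k in search_kwargs is the number of nearest chunks returned per query. Here is a toy sketch of top-k similarity search using cosine similarity over hand-written three-dimensional vectors. Real embeddings have hundreds or thousands of dimensions, and the actual distance metric depends on how the vector store is configured, so treat this as an illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=5):
    """Return the k texts whose vectors are most similar to query_vec."""
    ranked = sorted(
        indexed,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _vector in ranked[:k]]


# Hypothetical index of (text, vector) pairs
toy_index = [
    ("rate limits", [1.0, 0.0, 0.0]),
    ("authentication", [0.0, 1.0, 0.0]),
    ("quotas and throttling", [0.9, 0.1, 0.0]),
]
print(top_k([1.0, 0.0, 0.0], toy_index, k=2))
```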
Step 3: Build the RAG chain
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
prompt = ChatPromptTemplate.from_messages([
    ("system", """Answer based on the provided context. If the context
doesn't contain the answer, say "I don't have that information."
Context: {context}"""),
    ("human", "{question}")
])
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.1)
def format_docs(docs):
    # Join the retrieved chunks into one context string
    return "\n\n".join(doc.page_content for doc in docs)
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
# Query your knowledge base
answer = rag_chain.invoke("What are the rate limits?")
print(answer)
This three-step setup handles most RAG applications. The retriever fetches relevant chunks, the prompt grounds the LLM, and the chain orchestrates everything.
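The dict at the head of the RAG chain is the least obvious part: the same input question is sent to every key, so one branch retrieves and formats context while the other passes the question through unchanged. Here is a plain-Python sketch of that data flow. The toy functions are hypothetical stand-ins and no retriever or model is involved.

```python
def run_rag_inputs(question, retrieve, format_docs, build_prompt):
    """Mimic the fan-out step: one input feeds both dict branches."""
    inputs = {
        "context": format_docs(retrieve(question)),  # retriever | format_docs
        "question": question,                        # passthrough, unchanged
    }
    return build_prompt(inputs)


def toy_retrieve(question):
    # Stand-in for a vector store lookup
    return ["Limit: 60 requests per minute.", "Burst limit: 10 requests per second."]


def toy_format(docs):
    return "\n\n".join(docs)


def toy_prompt(inputs):
    return f"Context: {inputs['context']}\nQuestion: {inputs['question']}"


print(run_rag_inputs("What are the rate limits?", toy_retrieve, toy_format, toy_prompt))
```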
Creating Agents with Tool Use
Decorate functions with @tool, pass list to create_tool_calling_agent, wrap in AgentExecutor. Agents auto-pick tools and chain calls — e.g., calculate + get_weather in one user query without explicit orchestration.
Agents let the LLM decide which tools to call based on the user's query. This is essential for applications that need to take actions, not just answer questions.
Define tools:
from langchain_core.tools import tool
@tool
def get_weather(city: str) -> str:
    """Get current weather for a city."""
    # Simplified example: returns a canned response instead of calling a weather API
    return f"Weather in {city}: 72F, sunny"
@tool
def calculate(expression: str) -> str:
    """Evaluate a math expression."""
    try:
        # eval() executes arbitrary Python; never pass it untrusted input in production
        result = eval(expression)
        return str(result)
    except Exception as e:
        return f"Error: {e}"
@tool
def search_docs(query: str) -> str:
    """Search the knowledge base for information."""
    # Reuses the retriever built in the RAG section
    docs = retriever.invoke(query)
    return "\n".join(doc.page_content for doc in docs[:3])
tools = [get_weather, calculate, search_docs]
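Since the calculate tool receives whatever expression the model produces, eval is risky outside a demo. One possible replacement is a small evaluator built on the standard-library ast module that accepts only numbers and basic arithmetic. This is a sketch under that assumption; safe_calculate is a hypothetical helper, and exponentiation is left out on purpose because very large powers can stall the process.

```python
import ast
import operator

_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calculate(expression):
    """Evaluate +, -, *, / on numeric literals; reject everything else."""

    def evaluate(node):
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](evaluate(node.operand))
        raise ValueError("unsupported expression")

    tree = ast.parse(expression, mode="eval")
    return evaluate(tree.body)


print(safe_calculate("2 + 3 * 4"))  # 14
```

Names, function calls, and attribute access all raise ValueError, so an input like __import__('os') never executes.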
Create the agent:
from langchain_openai import ChatOpenAI
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Use tools when needed."),
    ("placeholder", "{chat_history}"),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}")
])
llm = ChatOpenAI(model="gpt-4o", temperature=0)
agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
# The agent decides which tools to use
result = executor.invoke({
    "input": "What's 15% of 8500, and what's the weather in Tokyo?",
    "chat_history": []
})
print(result["output"])
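For this query the agent will typically call calculate (with an expression such as "8500 * 0.15") and get_weather("Tokyo"), then combine both results in its answer. The model chooses the tools and their exact arguments, so the calls can vary between runs.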
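Under the hood, an executor repeatedly takes the tool calls the model requests, looks each one up by name, runs it, and feeds the results back to the model. Here is a plain-Python sketch of the dispatch step only. The registry, the percent_of tool, and the shape of the call dicts (a name plus an args mapping) are assumptions made for illustration, and the model's side of the loop is replaced by a hard-coded list.

```python
def run_tool_calls(tool_calls, registry):
    """Execute each requested call and collect (name, result) pairs."""
    results = []
    for call in tool_calls:
        tool_fn = registry.get(call["name"])
        if tool_fn is None:
            # Report unknown tools back instead of crashing the loop
            results.append((call["name"], f"Error: unknown tool {call['name']}"))
            continue
        results.append((call["name"], tool_fn(**call["args"])))
    return results


# Hypothetical tools and model-requested calls
toy_registry = {
    "get_weather": lambda city: f"Weather in {city}: 72F, sunny",
    "percent_of": lambda percent, value: str(value * percent / 100),
}
requested = [
    {"name": "percent_of", "args": {"percent": 15, "value": 8500}},
    {"name": "get_weather", "args": {"city": "Tokyo"}},
]
for name, result in run_tool_calls(requested, toy_registry):
    print(f"{name} -> {result}")
```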
Which LLMs to Use with LangChain
Default: GPT-4o-mini ($0.15/$0.60). Complex reasoning: Sonnet 4.6 ($3/$15). Code: GPT-4o. Budget volume: DeepSeek V3 ($0.27/$1.10). Speed: Groq Llama. Long context: Gemini 2.5 Pro. Switching = one line.
LangChain supports 80+ model providers. Here are the best options for different use cases, with pricing tracked by TokenMix.ai.
| Use case | Recommended model | Price (input/output per 1M tokens) | Why |
|---|---|---|---|
| General chains | GPT-4o-mini | $0.15 / $0.60 | Cheap, fast, good enough for most tasks |
| Complex reasoning | Claude Sonnet 4.6 | $3 / $15 | Best reasoning quality per dollar |
| Code generation | GPT-4o | $2.50 / $10 | Strong code + tool calling |
| Budget / high volume | DeepSeek V3 | $0.27 / $1.10 | Cheapest quality option |
| Speed-critical | Groq (Llama 3.3 70B) | $0.59 / $0.79 | Fastest inference |
| Long context | Gemini 2.5 Pro | $1.25 / $10 | 1M+ context window |
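To compare models for your own workload, multiply your expected token volumes by the per-million prices in the table. Here is a minimal sketch using three rows from the table. The dictionary keys are just labels for this example and the workload numbers are made up.

```python
# (input price, output price) in USD per 1M tokens, copied from the table above
PRICES_PER_MILLION = {
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4o": (2.50, 10.00),
    "deepseek-v3": (0.27, 1.10),
}


def estimate_cost(model, input_tokens, output_tokens):
    """Return the estimated cost in USD for the given token counts."""
    price_in, price_out = PRICES_PER_MILLION[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical monthly workload: 2M input tokens, 500K output tokens
for model in PRICES_PER_MILLION:
    print(f"{model}: ${estimate_cost(model, 2_000_000, 500_000):.2f}")
```

For this workload GPT-4o-mini comes to about $0.60 against about $10.00 for GPT-4o, which is where the 90%+ savings figure for simple tasks comes from.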
Switching models in LangChain is one line:
# OpenAI
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini")
# Anthropic
from langchain_anthropic import ChatAnthropic
llm = ChatAnthropic(model="claude-sonnet-4-6-20250514")
# Via TokenMix.ai (access any model with one API key)
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
    model="claude-sonnet-4-6-20250514",
    base_url="https://api.tokenmix.ai/v1",
    api_key="your-tokenmix-key"
)
This is a core LangChain advantage: your application code stays the same regardless of which model runs underneath. TokenMix.ai amplifies this by providing a single API key for all providers.
LangChain vs Alternatives
LangChain wins agents + multi-model breadth (80+ vs LlamaIndex 40+ vs Haystack 20+). LlamaIndex wins RAG depth. Haystack wins production NLP. Direct SDK wins simplicity. Pick by primary use case.
| Feature | LangChain | LlamaIndex | Haystack | Direct SDK |
|---|---|---|---|---|
| Primary focus | General LLM apps | RAG/data indexing | Production NLP | Raw API access |
| Ease of use | Medium | Medium | Medium | Easiest |
| RAG support | Comprehensive | Best-in-class | Good | DIY |
| Agent framework | Strong (LangGraph) | Basic | Moderate | DIY |
| Provider support | 80+ | 40+ | 20+ | 1 per SDK |
| Observability | LangSmith | LlamaTrace | Haystack Studio | DIY |
| Learning curve | Steep | Moderate | Moderate | Low |
| Best for | Full-stack LLM apps | Data-intensive RAG | Production NLP | Simple integrations |
Choose LangChain when: you need agents, multi-model support, and a mature ecosystem. Choose LlamaIndex when: RAG is your primary use case and you want the deepest document processing. Choose direct SDKs when: you have a simple use case and want minimal dependencies.
Production Best Practices
Five rules: LangSmith tracing for every chain, .with_fallbacks() for cross-provider failover, SQLite/Redis cache for repeated calls, stream tokens for user-facing apps, max_retries=2 + request_timeout=30 to cap runaway costs.
1. Use LangSmith for tracing. Every chain call should be traced. LangSmith captures inputs, outputs, latency, token usage, and cost for every step. Set LANGCHAIN_TRACING_V2=true and LANGCHAIN_API_KEY to enable.
2. Implement fallbacks. Use LCEL's .with_fallbacks() to automatically retry with a different model on failure.
primary = ChatOpenAI(model="gpt-4o")
fallback = ChatAnthropic(model="claude-sonnet-4-6-20250514")
llm_with_fallback = primary.with_fallbacks([fallback])
3. Cache repeated calls. LangChain supports in-memory, Redis, and SQLite caching for identical prompts.
from langchain_core.globals import set_llm_cache
from langchain_community.cache import SQLiteCache
set_llm_cache(SQLiteCache(database_path=".langchain.db"))
4. Stream responses. For user-facing applications, stream tokens instead of waiting for full completion.
for chunk in chain.stream({"question": "Explain RAG"}):
    print(chunk, end="", flush=True)
5. Set retry limits and timeouts. Protect against runaway costs and hung requests.
llm = ChatOpenAI(
    model="gpt-4o-mini",
    max_retries=2,
    request_timeout=30
)
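Rules 2 and 5 work together: each model gets a bounded number of attempts, and only then does the request move on to the next provider. Here is a plain-Python sketch of that control flow. It is not how LangChain implements .with_fallbacks(); the function and the two toy models are hypothetical, and max_retries here means one initial attempt plus that many retries.

```python
def call_with_fallbacks(models, prompt, max_retries=2):
    """Try each model in order, retrying each before moving to the next."""
    last_error = None
    for model in models:
        for _attempt in range(max_retries + 1):
            try:
                return model(prompt)
            except Exception as error:
                last_error = error  # remember the failure and try again
    raise RuntimeError("all models failed") from last_error


attempts = []


def flaky_primary(prompt):
    attempts.append("primary")
    raise TimeoutError("primary timed out")


def steady_fallback(prompt):
    attempts.append("fallback")
    return f"fallback answer to: {prompt}"


print(call_with_fallbacks([flaky_primary, steady_fallback], "Explain RAG"))
print(attempts)
```

Capping attempts per model keeps worst-case latency and cost predictable: with two models and max_retries=2, a request makes at most six calls.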
Which LangChain Architecture Should You Build?
Chatbot: Chain + Memory. Document Q&A: RAG Chain. Customer support: Agent + RAG + Tools. Data extraction: Chain + JsonOutputParser. Multi-step workflow: LangGraph StateGraph. Content gen: chained prompt templates.
| Your application | Architecture | Key components |
|---|---|---|
| Simple chatbot | Chain + Memory | ChatPromptTemplate + LLM + RunnableWithMessageHistory |
| Q&A over documents | RAG Chain | Document Loader + Splitter + Embeddings + VectorStore + Retriever |
| Customer support bot | Agent + RAG + Tools | Agent with retriever tool + API tools + ticket creation |
| Data extraction | Chain + Output Parser | Prompt + LLM + JsonOutputParser/PydanticOutputParser |
| Multi-step workflow | LangGraph | StateGraph with conditional edges and tool nodes |
| Content generation | Chain + Templates | Multiple prompt templates chained with LLM |
What's the Bottom Line on LangChain in 2026?
Start simple — basic LCEL handles 80% of use cases. Add RAG when you need external knowledge, agents for actions, LangGraph for stateful workflows. GPT-4o-mini covers most chains; route to Sonnet/Groq for specialized needs.
LangChain in 2026 is mature, well-documented, and the standard framework for multi-model LLM applications. The key to success is starting simple -- a basic LCEL chain handles 80% of use cases -- and adding complexity (RAG, agents, LangGraph) only when needed.
For model selection, use GPT-4o-mini for most chains, Claude Sonnet 4.6 for complex reasoning, and Groq for speed. TokenMix.ai provides unified access to all these models through LangChain's OpenAI-compatible interface, letting you switch providers without code changes.
Start with the code examples in this guide. Build your first chain today, add RAG when you need external knowledge, and graduate to agents when your application needs to take actions.
FAQ
Is LangChain still relevant in 2026?
Yes. LangChain has over 100K GitHub stars and remains the most-used framework for LLM applications. The introduction of LCEL and LangGraph addressed earlier criticisms about complexity. For multi-model, multi-step applications, LangChain provides the most mature ecosystem.
What is the difference between LangChain and LangGraph?
LangChain provides the core abstractions (chains, prompts, retrievers, tools). LangGraph extends LangChain for building stateful, multi-agent workflows with cycles and conditional logic. Use LangChain for linear chains and simple agents. Use LangGraph when you need complex agent orchestration with branching logic.
Can I use LangChain with local LLMs?
Yes. LangChain supports local LLMs through integrations with Ollama, llama.cpp, vLLM, and Hugging Face. Install langchain-community and use the appropriate wrapper class. Local models have no per-token API fees, but larger models need GPU hardware to run at a usable speed.
How do I reduce LangChain costs?
Four strategies: (1) use GPT-4o-mini instead of GPT-4o for simple tasks, saving 90%+ on token costs, (2) implement caching for repeated queries, (3) use prompt compression to reduce token usage, (4) route through TokenMix.ai for discounted API access across all providers.
What is LCEL in LangChain?
LCEL (LangChain Expression Language) is the standard way to compose chains in LangChain using the pipe operator (|). It replaced older chain classes like LLMChain and SequentialChain. LCEL chains support streaming, async execution, and built-in fallbacks.
How do I add memory to a LangChain chatbot?
Use RunnableWithMessageHistory to add conversation memory to any chain. Connect it to a message store (in-memory, Redis, or PostgreSQL) to persist chat history across sessions. For agents, pass chat_history as a prompt variable.
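The core idea behind message history is a store keyed by session ID that returns prior turns for each new request. Here is a plain-Python sketch of such a store. The SessionHistory class is hypothetical and not a LangChain API, and the trimming to the most recent messages is an added assumption to keep prompts bounded.

```python
class SessionHistory:
    """In-memory chat history keyed by session ID."""

    def __init__(self, max_messages=20):
        self.max_messages = max_messages
        self._store = {}

    def add_turn(self, session_id, user_text, ai_text):
        messages = self._store.setdefault(session_id, [])
        messages.append(("human", user_text))
        messages.append(("ai", ai_text))
        # Drop the oldest messages once the cap is exceeded
        excess = len(messages) - self.max_messages
        if excess > 0:
            del messages[:excess]

    def get(self, session_id):
        # Return a copy so callers cannot mutate the stored history
        return list(self._store.get(session_id, []))


history = SessionHistory(max_messages=4)
history.add_turn("user-1", "Hi, I'm Ana.", "Hello Ana!")
history.add_turn("user-1", "What's my name?", "Your name is Ana.")
history.add_turn("user-2", "Hello", "Hi there!")
print(history.get("user-1"))
```

An in-memory dict is lost on restart, which is why a persistent message store is the usual choice when history must survive across sessions.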
Author: TokenMix Research Lab | Last Updated: April 2026 | Data Source: LangChain Documentation, LangSmith, GitHub LangChain, TokenMix.ai