TokenMix Research Lab · 2026-04-10

LangChain Tutorial 2026: Python Guide from First Chain to RAG Pipeline and Agents

LangChain is the most popular framework for building LLM-powered applications in Python, with over 100,000 GitHub stars and support for 80+ model providers. This LangChain tutorial covers everything from installation to production deployment: your first chain, building a RAG pipeline, creating agents with tool use, and choosing the right LLMs. Whether you are new to LangChain or upgrading from an older version, this guide gives you working code and practical architecture decisions for 2026.

Quick Reference: LangChain Core Concepts

| Concept | What it does | When to use |
|---|---|---|
| Chain (LCEL) | Composes LLM calls with data transformations | Every LLM application |
| Prompt Template | Structures input to the LLM | When you need consistent prompt formatting |
| Retriever | Fetches relevant documents from a data source | RAG applications |
| Agent | LLM decides which tools to call and in what order | Dynamic, multi-step workflows |
| Tool | A function the agent can invoke | When agents need external capabilities |
| Memory | Stores conversation history | Chatbots and multi-turn interactions |
| Output Parser | Structures LLM output into typed objects | When you need structured data from LLMs |

What Is LangChain and Why Use It in 2026

LangChain is a Python (and JavaScript) framework that provides abstractions for building applications with LLMs. It standardizes the interface for calling different models, chaining operations, and integrating external data sources and tools.

Why LangChain still matters in 2026:

- One interface for 80+ model providers, so you can switch models without rewriting application code
- LCEL makes chains composable, with streaming, async execution, and fallbacks built in
- A mature ecosystem: agents, LangGraph for stateful workflows, and LangSmith for observability

When not to use LangChain:

- Simple, single-provider integrations where a direct SDK means fewer dependencies
- Pure RAG workloads where LlamaIndex's deeper document processing is a better fit

The key architectural change in 2026: LangChain Expression Language (LCEL) is the standard way to compose chains. The older LLMChain and SequentialChain classes are deprecated. This tutorial uses LCEL exclusively.

Installation and Setup

Step 1: Install core packages

pip install langchain langchain-openai langchain-community
pip install chromadb  # For RAG examples

Step 2: Install provider-specific packages (choose the ones you need)

pip install langchain-anthropic    # For Claude models
pip install langchain-google-genai  # For Gemini models
pip install langchain-groq          # For Groq-hosted models

Step 3: Set your API keys

export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
# Or use TokenMix.ai unified API key for all providers
export TOKENMIX_API_KEY="tm-..."

Step 4: Verify installation

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")
response = llm.invoke("Say hello")
print(response.content)
# Output: Hello! How can I help you today?

If this runs without errors, your setup is complete.

Your First LangChain Chain

A chain in LangChain is a sequence of operations composed using the pipe (|) operator. Here is the simplest useful chain: a prompt template connected to an LLM connected to an output parser.

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Define components
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant that explains {topic} concepts."),
    ("human", "{question}")
])

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.3)
output_parser = StrOutputParser()

# Compose the chain using LCEL
chain = prompt | llm | output_parser

# Run the chain
result = chain.invoke({
    "topic": "machine learning",
    "question": "What is gradient descent in 2 sentences?"
})
print(result)

What happens at each step:

  1. prompt formats the input variables into a chat message
  2. llm sends the formatted prompt to GPT-4o-mini
  3. output_parser extracts the string content from the response

You can add steps to this chain: logging, caching, retry logic, or parallel branches. That is the power of LCEL -- every component is composable.
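Under the hood, LCEL's pipe operator just builds a pipeline of callables. Here is a rough pure-Python sketch of that mechanic -- an illustration of the idea, not LangChain's actual implementation:

```python
class Runnable:
    """Minimal stand-in for an LCEL component: wraps a function and
    supports composition with the | operator."""
    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # prompt | llm returns a new Runnable that runs both in sequence
        return Runnable(lambda value: other.invoke(self.invoke(value)))

# Three toy "components" standing in for prompt, llm, and output parser
prompt = Runnable(lambda d: f"Explain {d['topic']}: {d['question']}")
llm = Runnable(lambda text: {"content": text.upper()})
parser = Runnable(lambda msg: msg["content"])

chain = prompt | llm | parser
print(chain.invoke({"topic": "ML", "question": "what is it?"}))
# EXPLAIN ML: WHAT IS IT?
```

Because every component exposes the same `invoke` interface, any step -- logging, caching, a retry wrapper -- can be slotted into the pipeline the same way.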

Prompt Templates and Output Parsers

Prompt Templates keep system instructions separate from user input and ensure consistent formatting.

from langchain_core.prompts import ChatPromptTemplate

# Basic template
template = ChatPromptTemplate.from_messages([
    ("system", "You are a {role}. Respond in {language}."),
    ("human", "{query}")
])

# With few-shot examples
from langchain_core.prompts import FewShotChatMessagePromptTemplate

examples = [
    {"input": "2+2", "output": "4"},
    {"input": "3+3", "output": "6"},
]

example_prompt = ChatPromptTemplate.from_messages([
    ("human", "{input}"),
    ("ai", "{output}")
])

few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)

Output Parsers convert unstructured LLM text into structured data.

from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field

class ProductReview(BaseModel):
    sentiment: str = Field(description="positive, negative, or neutral")
    score: int = Field(description="1-10 rating")
    summary: str = Field(description="One sentence summary")

parser = JsonOutputParser(pydantic_object=ProductReview)

prompt = ChatPromptTemplate.from_messages([
    ("system", "Analyze this product review.\n{format_instructions}"),
    ("human", "{review}")
])

chain = prompt | llm | parser
result = chain.invoke({
    "review": "Great product, fast shipping, exactly as described.",
    "format_instructions": parser.get_format_instructions()
})
# result is a dict: {"sentiment": "positive", "score": 9, "summary": "..."}

Building a RAG Pipeline with LangChain

RAG (Retrieval Augmented Generation) is LangChain's most common production use case. Here is a complete implementation.

Step 1: Load and split documents

from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load documents (web pages, PDFs, etc.)
loader = WebBaseLoader("https://docs.example.com/api-reference")
docs = loader.load()

# Split into chunks
splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,
    chunk_overlap=50
)
chunks = splitter.split_documents(docs)
print(f"Loaded {len(chunks)} chunks")
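To see concretely what chunk_size and chunk_overlap mean, here is a simplified character-level splitter. The real RecursiveCharacterTextSplitter is smarter -- it prefers to break on paragraph and sentence boundaries -- but the sliding-window arithmetic is the same:

```python
def naive_split(text, chunk_size, chunk_overlap):
    """Sliding-window split: each chunk starts chunk_size - chunk_overlap
    characters after the previous one, so consecutive chunks share
    chunk_overlap characters of context."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = naive_split("a" * 1200, chunk_size=512, chunk_overlap=50)
print(len(chunks), [len(c) for c in chunks])
# 3 [512, 512, 276]
```

The 50-character overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which improves retrieval quality at a small storage cost.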

Step 2: Create vector store

from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

Step 3: Build the RAG chain

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_messages([
    ("system", """Answer based on the provided context. If the context
    doesn't contain the answer, say "I don't have that information."

    Context: {context}"""),
    ("human", "{question}")
])

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.1)

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Query your knowledge base
answer = rag_chain.invoke("What are the rate limits?")
print(answer)

This three-step setup handles most RAG applications. The retriever fetches relevant chunks, the prompt grounds the LLM, and the chain orchestrates everything.

Creating Agents with Tool Use

Agents let the LLM decide which tools to call based on the user's query. This is essential for applications that need to take actions, not just answer questions.

Define tools:

from langchain_core.tools import tool
import requests

@tool
def get_weather(city: str) -> str:
    """Get current weather for a city."""
    # Simplified example
    return f"Weather in {city}: 72F, sunny"

@tool
def calculate(expression: str) -> str:
    """Evaluate a math expression."""
    try:
        result = eval(expression)  # Use safer eval in production
        return str(result)
    except Exception as e:
        return f"Error: {e}"

@tool
def search_docs(query: str) -> str:
    """Search the knowledge base for information."""
    docs = retriever.invoke(query)
    return "\n".join(doc.page_content for doc in docs[:3])

tools = [get_weather, calculate, search_docs]

Create the agent:

from langchain_openai import ChatOpenAI
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Use tools when needed."),
    ("placeholder", "{chat_history}"),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}")
])

llm = ChatOpenAI(model="gpt-4o", temperature=0)
agent = create_tool_calling_agent(llm, tools, prompt)

executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

# The agent decides which tools to use
result = executor.invoke({
    "input": "What's 15% of 8500, and what's the weather in Tokyo?",
    "chat_history": []
})
print(result["output"])

The agent will automatically call calculate("8500 * 0.15") and get_weather("Tokyo") to answer both parts of the question.

Which LLMs to Use with LangChain

LangChain supports 80+ model providers. Here are the best options for different use cases, with pricing tracked by TokenMix.ai.

| Use case | Recommended model | Price (input/output per 1M tokens) | Why |
|---|---|---|---|
| General chains | GPT-4o-mini | $0.15 / $0.60 | Cheap, fast, good enough for most tasks |
| Complex reasoning | Claude Sonnet 4.6 | $3 / $15 | Best reasoning quality per dollar |
| Code generation | GPT-4o | $2.50 / $10 | Strong code + tool calling |
| Budget / high volume | DeepSeek V3 | $0.27 / $1.10 | Cheapest quality option |
| Speed-critical | Groq (Llama 3.3 70B) | $0.59 / $0.79 | Fastest inference |
| Long context | Gemini 2.5 Pro | $1.25 / $10 | 1M+ context window |
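To turn per-million-token rates into a budget estimate, the arithmetic is straightforward. A quick sketch (the token volumes here are hypothetical; verify current rates before budgeting):

```python
def monthly_cost(input_tokens, output_tokens, in_rate, out_rate):
    """Cost in dollars, given token counts and per-1M-token rates."""
    return (input_tokens / 1e6) * in_rate + (output_tokens / 1e6) * out_rate

# Example: 10M input + 2M output tokens per month on GPT-4o-mini
cost = monthly_cost(10_000_000, 2_000_000, in_rate=0.15, out_rate=0.60)
print(f"${cost:.2f}")  # $2.70
```

Note that output tokens usually cost several times more than input tokens, so verbose completions dominate the bill faster than long prompts do.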

Switching models in LangChain is one line:

# OpenAI
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini")

# Anthropic
from langchain_anthropic import ChatAnthropic
llm = ChatAnthropic(model="claude-sonnet-4-6-20250514")

# Via TokenMix.ai (access any model with one API key)
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
    model="claude-sonnet-4-6-20250514",
    base_url="https://api.tokenmix.ai/v1",
    api_key="your-tokenmix-key"
)

This is a core LangChain advantage: your application code stays the same regardless of which model runs underneath. TokenMix.ai amplifies this by providing a single API key for all providers.

LangChain vs Alternatives

| Feature | LangChain | LlamaIndex | Haystack | Direct SDK |
|---|---|---|---|---|
| Primary focus | General LLM apps | RAG/data indexing | Production NLP | Raw API access |
| Ease of use | Medium | Medium | Medium | Easiest |
| RAG support | Comprehensive | Best-in-class | Good | DIY |
| Agent framework | Strong (LangGraph) | Basic | Moderate | DIY |
| Provider support | 80+ | 40+ | 20+ | 1 per SDK |
| Observability | LangSmith | LlamaTrace | Haystack Studio | DIY |
| Learning curve | Steep | Moderate | Moderate | Low |
| Best for | Full-stack LLM apps | Data-intensive RAG | Production NLP | Simple integrations |

Choose LangChain when: you need agents, multi-model support, and a mature ecosystem. Choose LlamaIndex when: RAG is your primary use case and you want the deepest document processing. Choose direct SDKs when: you have a simple use case and want minimal dependencies.

Production Best Practices

1. Use LangSmith for tracing. Every chain call should be traced. LangSmith captures inputs, outputs, latency, token usage, and cost for every step. Set LANGCHAIN_TRACING_V2=true and LANGCHAIN_API_KEY to enable.
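The environment setup looks like this (the key value and project name below are placeholders):

```shell
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY="ls-..."        # from your LangSmith settings page
export LANGCHAIN_PROJECT="my-app-prod"   # optional: groups traces by project
```

With these set, every chain and agent invocation is traced automatically -- no code changes required.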

2. Implement fallbacks. Use LCEL's .with_fallbacks() to automatically retry with a different model on failure.

primary = ChatOpenAI(model="gpt-4o")
fallback = ChatAnthropic(model="claude-sonnet-4-6-20250514")
llm_with_fallback = primary.with_fallbacks([fallback])

3. Cache repeated calls. LangChain supports in-memory, Redis, and SQLite caching for identical prompts.

from langchain_core.globals import set_llm_cache
from langchain_community.cache import SQLiteCache
set_llm_cache(SQLiteCache(database_path=".langchain.db"))

4. Stream responses. For user-facing applications, stream tokens instead of waiting for full completion.

for chunk in chain.stream({"question": "Explain RAG"}):
    print(chunk, end="", flush=True)

5. Set rate limits and timeouts. Protect against runaway costs and hung requests.

llm = ChatOpenAI(
    model="gpt-4o-mini",
    max_retries=2,
    request_timeout=30
)

How to Choose Your LangChain Architecture

| Your application | Architecture | Key components |
|---|---|---|
| Simple chatbot | Chain + Memory | ChatPromptTemplate + LLM + ConversationBufferMemory |
| Q&A over documents | RAG Chain | Document Loader + Splitter + Embeddings + VectorStore + Retriever |
| Customer support bot | Agent + RAG + Tools | Agent with retriever tool + API tools + ticket creation |
| Data extraction | Chain + Output Parser | Prompt + LLM + JsonOutputParser/PydanticOutputParser |
| Multi-step workflow | LangGraph | StateGraph with conditional edges and tool nodes |
| Content generation | Chain + Templates | Multiple prompt templates chained with LLM |

Conclusion

LangChain in 2026 is mature, well-documented, and the standard framework for multi-model LLM applications. The key to success is starting simple -- a basic LCEL chain handles 80% of use cases -- and adding complexity (RAG, agents, LangGraph) only when needed.

For model selection, use GPT-4o-mini for most chains, Claude Sonnet 4.6 for complex reasoning, and Groq for speed. TokenMix.ai provides unified access to all these models through LangChain's OpenAI-compatible interface, letting you switch providers without code changes.

Start with the code examples in this guide. Build your first chain today, add RAG when you need external knowledge, and graduate to agents when your application needs to take actions.

FAQ

Is LangChain still relevant in 2026?

Yes. LangChain has over 100K GitHub stars and remains the most-used framework for LLM applications. The introduction of LCEL and LangGraph addressed earlier criticisms about complexity. For multi-model, multi-step applications, LangChain provides the most mature ecosystem.

What is the difference between LangChain and LangGraph?

LangChain provides the core abstractions (chains, prompts, retrievers, tools). LangGraph extends LangChain for building stateful, multi-agent workflows with cycles and conditional logic. Use LangChain for linear chains and simple agents. Use LangGraph when you need complex agent orchestration with branching logic.
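To make "cycles and conditional logic" concrete, here is a pure-Python sketch of the control flow LangGraph manages for you. This illustrates the concept only -- it is not LangGraph's StateGraph API, and the node functions are hypothetical stand-ins for LLM and tool calls:

```python
def agent_node(state):
    """Routing decision: a real agent node would ask an LLM whether
    another tool call is needed."""
    return "tool" if state["remaining"] > 0 else "end"

def tool_node(state):
    """A real tool node would invoke a tool and append its result."""
    state["results"].append(f"tool call {state['remaining']}")
    state["remaining"] -= 1
    return state

def run_graph(state):
    """Cycle with a conditional edge: agent -> tool -> agent ... -> end.
    A linear chain cannot express this loop; a stateful graph can."""
    while agent_node(state) == "tool":
        state = tool_node(state)
    return state

final = run_graph({"remaining": 2, "results": []})
print(final["results"])  # ['tool call 2', 'tool call 1']
```

The loop terminates only when the routing function says so -- exactly the agent-decides-when-to-stop pattern that LangGraph formalizes with conditional edges.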

Can I use LangChain with local LLMs?

Yes. LangChain supports local LLMs through integrations with Ollama, llama.cpp, vLLM, and Hugging Face. Install langchain-community and use the appropriate wrapper class. Local models are free to run but require GPU hardware.

How do I reduce LangChain costs?

Four strategies: (1) use GPT-4o-mini instead of GPT-4o for simple tasks, saving 90%+ on token costs, (2) implement caching for repeated queries, (3) use prompt compression to reduce token usage, (4) route through TokenMix.ai for discounted API access across all providers.

What is LCEL in LangChain?

LCEL (LangChain Expression Language) is the standard way to compose chains in LangChain using the pipe operator (|). It replaced older chain classes like LLMChain and SequentialChain. LCEL chains support streaming, async execution, and built-in fallbacks.

How do I add memory to a LangChain chatbot?

Use RunnableWithMessageHistory to add conversation memory to any chain. Connect it to a message store (in-memory, Redis, or PostgreSQL) to persist chat history across sessions. For agents, pass chat_history as a prompt variable.


Author: TokenMix Research Lab | Last Updated: April 2026 | Data Source: LangChain Documentation, LangSmith, GitHub LangChain, TokenMix.ai