TokenMix Research Lab · 2026-04-10

LangChain Tutorial 2026: Python Guide from First Chain to RAG Pipeline and Agents

Last Updated: 2026-04-29
Author: TokenMix Research Lab

LangChain has 100K+ GitHub stars and supports 80+ providers. LCEL is now the standard chain composition (LLMChain/SequentialChain deprecated). Switching providers is one line. Pair with LangGraph for stateful agents.

LangChain is the most popular framework for building LLM-powered applications in Python, with over 100,000 GitHub stars and support for 80+ model providers. This LangChain tutorial covers everything from installation to production deployment: your first chain, building a RAG pipeline, creating agents with tool use, and choosing the right LLMs. Whether you are new to LangChain or upgrading from an older version, this guide gives you working code and practical architecture decisions for 2026.

Quick Reference: LangChain Core Concepts
What Is LangChain and Why Use It in 2026
Installation and Setup
Your First LangChain Chain
Prompt Templates and Output Parsers
Building a RAG Pipeline with LangChain
Creating Agents with Tool Use
Which LLMs to Use with LangChain
LangChain vs Alternatives
Production Best Practices
Which LangChain Architecture Should You Build?
What's the Bottom Line on LangChain in 2026?
FAQ

Quick Reference: LangChain Core Concepts

Seven core concepts: Chain (LCEL composition), Prompt Template, Retriever, Agent, Tool, Memory, Output Parser. Every LLM app needs Chain + Prompt Template; RAG adds Retriever; agents add Tool + Agent.

Concept	What it does	When to use
Chain (LCEL)	Composes LLM calls with data transformations	Every LLM application
Prompt Template	Structures input to the LLM	When you need consistent prompt formatting
Retriever	Fetches relevant documents from a data source	RAG applications
Agent	LLM decides which tools to call and in what order	Dynamic, multi-step workflows
Tool	A function the agent can invoke	When agents need external capabilities
Memory	Stores conversation history	Chatbots and multi-turn interactions
Output Parser	Structures LLM output into typed objects	When you need structured data from LLMs

What Is LangChain and Why Use It in 2026

Use it when you need a unified interface across 80+ providers, battle-tested RAG components, an agent framework, or LangSmith tracing. Skip it for simple single-model API calls, when you want maximum control over HTTP requests, or when the abstraction adds more complexity than it removes.

LangChain is a Python (and JavaScript) framework that provides abstractions for building applications with LLMs. It standardizes the interface for calling different models, chaining operations, and integrating external data sources and tools.

Why LangChain still matters in 2026:

Unified interface across 80+ model providers -- switch from OpenAI to Anthropic with one line
Battle-tested RAG components (document loaders, text splitters, vector store integrations)
Agent framework for building autonomous workflows
LangSmith integration for tracing, monitoring, and evaluation
LangGraph for complex, stateful agent workflows

When not to use LangChain:

Simple, single-model API calls (just use the provider SDK directly)
When you need maximum control over every HTTP request
If your team finds the abstraction layers add more complexity than they remove

The key architectural change in 2026: LangChain Expression Language (LCEL) is the standard way to compose chains. The older LLMChain and SequentialChain classes are deprecated. This tutorial uses LCEL exclusively.

Installation and Setup

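Four steps: install the core packages (langchain, langchain-openai, langchain-community) plus chromadb for the RAG examples; install the provider-specific packages you need (langchain-anthropic, langchain-google-genai, langchain-groq); set API keys via environment variables; verify with a one-line ChatOpenAI(...).invoke() call.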

Step 1: Install core packages

pip install langchain langchain-openai langchain-community
pip install chromadb  # For RAG examples

Step 2: Install provider-specific packages (choose the ones you need)

pip install langchain-anthropic    # For Claude models
pip install langchain-google-genai  # For Gemini models
pip install langchain-groq          # For Groq-hosted models

Step 3: Set your API keys

export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
# Or use TokenMix.ai unified API key for all providers
export TOKENMIX_API_KEY="tm-..."

Step 4: Verify installation

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")
response = llm.invoke("Say hello")
print(response.content)
# Example output (exact wording will vary): Hello! How can I help you today?

If this runs without errors, your setup is complete.

Your First LangChain Chain

Three components piped with |: ChatPromptTemplate (input formatting) → ChatOpenAI (LLM call) → StrOutputParser (extract content). LCEL is the only composition style going forward — older LLMChain is deprecated.

A chain in LangChain is a sequence of operations composed using the pipe (|) operator. Here is the simplest useful chain: a prompt template connected to an LLM connected to an output parser.

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Define components
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant that explains {topic} concepts."),
    ("human", "{question}")
])

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.3)
output_parser = StrOutputParser()

# Compose the chain using LCEL
chain = prompt | llm | output_parser

# Run the chain
result = chain.invoke({
    "topic": "machine learning",
    "question": "What is gradient descent in 2 sentences?"
})
print(result)

What happens at each step:

prompt formats the input variables into a chat message
llm sends the formatted prompt to GPT-4o-mini
output_parser extracts the string content from the response

You can add steps to this chain: logging, caching, retry logic, or parallel branches. That is the power of LCEL -- every component is composable.
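To see why piping composes so easily, it can help to look at a stripped-down model of the idea. The sketch below is plain Python, not LangChain's implementation: the `Step` class and the fake model function are hypothetical stand-ins that only illustrate how `|` can build one composed step out of two existing ones.

```python
class Step:
    """Hypothetical stand-in for a runnable: wraps a function, composes with |."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # self | other returns a new Step that runs self first, then other
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Three toy steps mirroring prompt -> model -> parser
format_prompt = Step(lambda d: f"Explain {d['topic']}: {d['question']}")
fake_llm = Step(lambda prompt: {"content": prompt.upper()})  # no real model call
parse = Step(lambda message: message["content"])

chain = format_prompt | fake_llm | parse
result = chain.invoke({"topic": "ml", "question": "what is sgd?"})
print(result)
```

Because every step exposes the same `invoke` method, inserting a logging or retry step is just one more `|` in the expression.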

Prompt Templates and Output Parsers

Templates standardize prompts; few-shot adds examples. JsonOutputParser + Pydantic gives typed structured output. Pair with format_instructions to inject schema info into prompt.

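Prompt templates keep formatting consistent and keep your instructions separate from user-supplied values. They do not by themselves prevent prompt injection, so continue to treat user input as untrusted.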

from langchain_core.prompts import ChatPromptTemplate

# Basic template
template = ChatPromptTemplate.from_messages([
    ("system", "You are a {role}. Respond in {language}."),
    ("human", "{query}")
])

# With few-shot examples
from langchain_core.prompts import FewShotChatMessagePromptTemplate

examples = [
    {"input": "2+2", "output": "4"},
    {"input": "3+3", "output": "6"},
]

example_prompt = ChatPromptTemplate.from_messages([
    ("human", "{input}"),
    ("ai", "{output}")
])

few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)

Output Parsers convert unstructured LLM text into structured data.

from langchain_core.output_parsers import JsonOutputParser
from pydantic import BaseModel, Field

class ProductReview(BaseModel):
    sentiment: str = Field(description="positive, negative, or neutral")
    score: int = Field(description="1-10 rating")
    summary: str = Field(description="One sentence summary")

parser = JsonOutputParser(pydantic_object=ProductReview)

prompt = ChatPromptTemplate.from_messages([
    ("system", "Analyze this product review.\n{format_instructions}"),
    ("human", "{review}")
])

chain = prompt | llm | parser
result = chain.invoke({
    "review": "Great product, fast shipping, exactly as described.",
    "format_instructions": parser.get_format_instructions()
})
# result is a dict: {"sentiment": "positive", "score": 9, "summary": "..."}
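Because the parser hands back a plain dict, and a field description such as "1-10 rating" is only a hint to the model, it can be worth checking the values yourself before using them. The following is a minimal standard-library sketch of that check; `validate_review` and the sample JSON string are hypothetical and stand in for whatever your chain returns.

```python
import json

ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}


def validate_review(raw: str) -> dict:
    """Parse a JSON string and check it against the ProductReview fields."""
    data = json.loads(raw)
    if data.get("sentiment") not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment: {data.get('sentiment')!r}")
    score = data.get("score")
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 10:
        raise ValueError(f"score must be an integer from 1 to 10, got {score!r}")
    summary = data.get("summary")
    if not isinstance(summary, str) or not summary.strip():
        raise ValueError("summary must be a non-empty string")
    return data


# Hypothetical model output used only for illustration
sample = '{"sentiment": "positive", "score": 9, "summary": "Fast shipping, as described."}'
review = validate_review(sample)
print(review["sentiment"], review["score"])
```

If a check fails, one option is to catch the `ValueError` and re-run the chain, or to route the input to a stronger model.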

Building a RAG Pipeline with LangChain

Three steps: load + split (RecursiveCharacterTextSplitter, 512-character chunks, 50-character overlap) → embed + store (OpenAIEmbeddings + Chroma) → retrieval chain (retriever | format_docs | prompt | LLM | parser). The core pipeline fits in roughly 30 lines of code.

RAG (Retrieval-Augmented Generation) is LangChain's most common production use case. Here is a complete implementation.

Step 1: Load and split documents

from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load documents (web pages, PDFs, etc.)
loader = WebBaseLoader("https://docs.example.com/api-reference")
docs = loader.load()

# Split into chunks
splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,    # measured in characters by default, not tokens
    chunk_overlap=50   # characters shared between neighboring chunks
)
chunks = splitter.split_documents(docs)
print(f"Loaded {len(chunks)} chunks")
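To build intuition for how chunk size and overlap interact, here is a simplified fixed-window splitter in plain Python. It is not the algorithm RecursiveCharacterTextSplitter uses (that one prefers to break on separators such as paragraphs and sentences); `split_with_overlap` is a hypothetical helper that only shows the arithmetic.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


text = "abcdefghij" * 120  # 1,200 characters of toy input
chunks = split_with_overlap(text, chunk_size=512, chunk_overlap=50)
print([len(c) for c in chunks])
```

Each window advances by 462 characters (512 minus 50), so the tail of one chunk reappears at the head of the next. That repetition is what lets a sentence cut at a boundary still appear whole in at least one chunk.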

Step 2: Create vector store

from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
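Conceptually, a similarity-based retriever with k=5 ranks stored chunks by how close their embedding is to the query embedding and returns the top five. The sketch below shows that ranking with cosine similarity on hand-made three-dimensional vectors; the vectors, texts, and the `top_k` helper are all hypothetical, and a real vector store would use an index rather than a full sort.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, store, k):
    """Return the k texts whose vectors are most similar to query_vec."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy "embeddings"; real ones have hundreds or thousands of dimensions
store = [
    ("rate limits", [1.0, 0.0, 0.0]),
    ("authentication", [0.0, 1.0, 0.0]),
    ("throttling", [0.9, 0.1, 0.0]),
]
hits = top_k([1.0, 0.0, 0.0], store, k=2)
print(hits)
```

Raising k gives the model more context to work with at the cost of more prompt tokens per query.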

Step 3: Build the RAG chain

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_messages([
    ("system", """Answer based on the provided context. If the context
    doesn't contain the answer, say "I don't have that information."

    Context: {context}"""),
    ("human", "{question}")
])

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.1)

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Query your knowledge base
answer = rag_chain.invoke("What are the rate limits?")
print(answer)

This three-step setup handles most RAG applications. The retriever fetches relevant chunks, the prompt grounds the LLM, and the chain orchestrates everything.
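The dict at the start of the chain is the least obvious part: the same question is sent down two branches, one that retrieves and formats context and one that passes the question through unchanged. A plain-Python trace of that data flow might look like the following, where `fake_retriever` and its returned strings are hypothetical stand-ins for the vector store.

```python
def fake_retriever(question: str) -> list[str]:
    """Stand-in for the vector store retriever; returns canned chunks."""
    return ["Chunk one about the API.", "Chunk two about the API."]


def format_docs(docs: list[str]) -> str:
    """Join chunks with blank lines, as the RAG chain's format_docs does."""
    return "\n\n".join(docs)


def build_inputs(question: str) -> dict:
    # Both branches receive the same input, mirroring the dict step
    return {
        "context": format_docs(fake_retriever(question)),
        "question": question,  # passed through unchanged
    }


TEMPLATE = "Context: {context}\n\nQuestion: {question}"

inputs = build_inputs("What are the rate limits?")
prompt_text = TEMPLATE.format(**inputs)
print(prompt_text)
```

Printing the filled-in prompt like this is also a quick way to debug a RAG chain, since poor answers often trace back to poor retrieved context.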

Creating Agents with Tool Use

Decorate functions with @tool, pass list to create_tool_calling_agent, wrap in AgentExecutor. Agents auto-pick tools and chain calls — e.g., calculate + get_weather in one user query without explicit orchestration.

Agents let the LLM decide which tools to call based on the user's query. This is essential for applications that need to take actions, not just answer questions.

Define tools:

from langchain_core.tools import tool

@tool
def get_weather(city: str) -> str:
    """Get current weather for a city."""
    # Simplified example
    return f"Weather in {city}: 72F, sunny"

@tool
def calculate(expression: str) -> str:
    """Evaluate a math expression."""
    try:
        result = eval(expression)  # Unsafe: eval runs arbitrary code; replace before production
        return str(result)
    except Exception as e:
        return f"Error: {e}"

@tool
def search_docs(query: str) -> str:
    """Search the knowledge base for information."""
    docs = retriever.invoke(query)
    return "\n".join(doc.page_content for doc in docs[:3])

tools = [get_weather, calculate, search_docs]
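Since the model chooses the string passed to `calculate`, calling `eval` on it amounts to executing untrusted code. One possible replacement, sketched here with only the standard library, walks the parsed expression and allows nothing but numbers and basic arithmetic. The `safe_eval` name and the set of allowed operators are assumptions for illustration.

```python
import ast
import operator

# Only these operators are permitted; exponentiation is left out on purpose
# because expressions like 9**9**9 can consume large amounts of CPU and memory.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expression: str):
    """Evaluate a basic arithmetic expression without calling eval()."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval"))


print(safe_eval("8500 * 0.15"))
```

Inside the `calculate` tool you could call `safe_eval(expression)` in place of `eval(expression)`. The existing `except Exception` branch would then turn rejected input into an error string for the agent.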

Create the agent:

from langchain_openai import ChatOpenAI
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Use tools when needed."),
    ("placeholder", "{chat_history}"),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}")
])

llm = ChatOpenAI(model="gpt-4o", temperature=0)
agent = create_tool_calling_agent(llm, tools, prompt)

executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

# The agent decides which tools to use
result = executor.invoke({
    "input": "What's 15% of 8500, and what's the weather in Tokyo?",
    "chat_history": []
})
print(result["output"])

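Given this input, the agent will typically call calculate (with an expression such as "8500 * 0.15") and get_weather("Tokyo"), then combine both results into one answer. The exact tool arguments are chosen by the model and can vary between runs.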
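Under the hood, the executor's job is largely dispatch: the model emits tool requests by name with arguments, and the executor looks up and runs the matching function. The following plain-Python sketch shows that dispatch step only. The request format, the `percent_of` tool, and `run_tool_calls` are hypothetical simplifications, not AgentExecutor's actual internals.

```python
def get_weather(city: str) -> str:
    """Same simplified weather stub as in the tool definitions."""
    return f"Weather in {city}: 72F, sunny"


def percent_of(percent: float, value: float) -> str:
    """Hypothetical arithmetic tool used only in this sketch."""
    return str(value * percent / 100)


TOOLS = {"get_weather": get_weather, "percent_of": percent_of}


def run_tool_calls(tool_calls: list[dict]) -> list[str]:
    """Run each requested tool and collect its result as an observation."""
    observations = []
    for call in tool_calls:
        func = TOOLS.get(call["name"])
        if func is None:
            observations.append(f"Error: unknown tool {call['name']}")
            continue
        observations.append(func(**call["args"]))
    return observations


# What a model might request for the two-part question (illustrative only)
requested = [
    {"name": "percent_of", "args": {"percent": 15, "value": 8500}},
    {"name": "get_weather", "args": {"city": "Tokyo"}},
]
observations = run_tool_calls(requested)
print(observations)
```

In a real agent loop the observations are fed back to the model, which either requests more tools or writes the final answer.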

Which LLMs to Use with LangChain

Default: GPT-4o-mini ($0.15/$0.60). Complex reasoning: Sonnet 4.6 ($3/$15). Code: GPT-4o. Budget volume: DeepSeek V3 ($0.27/$1.10). Speed: Groq Llama. Long context: Gemini 2.5 Pro. Switching = one line.

LangChain supports 80+ model providers. Here are the best options for different use cases, with pricing tracked by TokenMix.ai.

Use case	Recommended model	Price (input/output per 1M tokens)	Why
General chains	GPT-4o-mini	$0.15 / $0.60	Cheap, fast, good enough for most tasks
Complex reasoning	Claude Sonnet 4.6	$3 / $15	Best reasoning quality per dollar
Code generation	GPT-4o	$2.50 / $10	Strong code + tool calling
Budget / high volume	DeepSeek V3	$0.27 / $1.10	Cheapest quality option
Speed-critical	Groq (Llama 3.3 70B)	$0.59 / $0.79	Fastest inference
Long context	Gemini 2.5 Pro	$1.25 / $10	1M+ context window
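To turn the per-million-token prices above into a budget, multiply each price by your expected token volume. The sketch below does this for three rows of the table. The dictionary keys are labels chosen for this example rather than confirmed API model IDs, and the workload of 2M input and 500K output tokens is an assumed figure.

```python
# (input, output) price in USD per 1M tokens, copied from the table above
PRICES = {
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4o": (2.50, 10.00),
    "deepseek-v3": (0.27, 1.10),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for a given token volume."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Assumed workload: 2M input tokens and 500K output tokens
mini_cost = estimate_cost("gpt-4o-mini", 2_000_000, 500_000)
full_cost = estimate_cost("gpt-4o", 2_000_000, 500_000)
savings = 1 - mini_cost / full_cost

print(f"gpt-4o-mini: ${mini_cost:.2f}, gpt-4o: ${full_cost:.2f}, savings: {savings:.0%}")
```

At these list prices the smaller model costs about 6% of the larger one for the same workload, which is why routing simple tasks to it is usually the first cost lever to pull.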

Switching models in LangChain is one line:

# OpenAI
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini")

# Anthropic
from langchain_anthropic import ChatAnthropic
llm = ChatAnthropic(model="claude-sonnet-4-6-20250514")

# Via TokenMix.ai (access any model with one API key)
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
    model="claude-sonnet-4-6-20250514",
    base_url="https://api.tokenmix.ai/v1",
    api_key="your-tokenmix-key"
)

This is a core LangChain advantage: your application code stays the same regardless of which model runs underneath. TokenMix.ai amplifies this by providing a single API key for all providers.

LangChain vs Alternatives

LangChain wins agents + multi-model breadth (80+ vs LlamaIndex 40+ vs Haystack 20+). LlamaIndex wins RAG depth. Haystack wins production NLP. Direct SDK wins simplicity. Pick by primary use case.

Feature	LangChain	LlamaIndex	Haystack	Direct SDK
Primary focus	General LLM apps	RAG/data indexing	Production NLP	Raw API access
Ease of use	Medium	Medium	Medium	Easiest
RAG support	Comprehensive	Best-in-class	Good	DIY
Agent framework	Strong (LangGraph)	Basic	Moderate	DIY
Provider support	80+	40+	20+	1 per SDK
Observability	LangSmith	LlamaTrace	Haystack Studio	DIY
Learning curve	Steep	Moderate	Moderate	Low
Best for	Full-stack LLM apps	Data-intensive RAG	Production NLP	Simple integrations

Choose LangChain when: you need agents, multi-model support, and a mature ecosystem. Choose LlamaIndex when: RAG is your primary use case and you want the deepest document processing. Choose direct SDKs when: you have a simple use case and want minimal dependencies.

Production Best Practices

Five rules: LangSmith tracing for every chain, .with_fallbacks() for cross-provider failover, SQLite/Redis cache for repeated calls, stream tokens for user-facing apps, max_retries=2 + request_timeout=30 to cap runaway costs.

1. Use LangSmith for tracing. Every chain call should be traced. LangSmith captures inputs, outputs, latency, token usage, and cost for every step. Set LANGCHAIN_TRACING_V2=true and LANGCHAIN_API_KEY to enable.

2. Implement fallbacks. Use LCEL's .with_fallbacks() to automatically retry with a different model on failure.

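from langchain_anthropic import ChatAnthropic
from langchain_openai import ChatOpenAI

# If the primary model raises an error, the same input is sent to the fallback
primary = ChatOpenAI(model="gpt-4o")
fallback = ChatAnthropic(model="claude-sonnet-4-6-20250514")
llm_with_fallback = primary.with_fallbacks([fallback])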
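The fallback pattern itself is simple: try each model in order and return the first successful result. Here is a plain-Python sketch of that control flow, with `call_with_fallbacks` and both fake model functions being hypothetical stand-ins rather than LangChain code.

```python
def call_with_fallbacks(prompt: str, models: list):
    """Try each callable in order; return the first result that succeeds."""
    errors = []
    for model in models:
        try:
            return model(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers a fallback
            errors.append(exc)
    raise RuntimeError(f"all {len(models)} models failed: {errors}")


def flaky_primary(prompt: str) -> str:
    """Simulates a provider outage."""
    raise TimeoutError("primary timed out")


def stable_fallback(prompt: str) -> str:
    return f"fallback answer to: {prompt}"


answer = call_with_fallbacks("Explain RAG", [flaky_primary, stable_fallback])
print(answer)
```

One design consideration: different models can respond differently to the same prompt, so it is worth testing your prompts against the fallback model too.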

3. Cache repeated calls. LangChain supports in-memory, Redis, and SQLite caching for identical prompts.

from langchain_core.globals import set_llm_cache
from langchain_community.cache import SQLiteCache
set_llm_cache(SQLiteCache(database_path=".langchain.db"))
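An exact-match cache only pays off when the same prompt is sent to the same model more than once. The in-memory sketch below illustrates that behavior. `PromptCache` and `fake_llm` are hypothetical, and this is not how LangChain's cache classes are implemented.

```python
import hashlib


class PromptCache:
    """Minimal exact-match cache keyed on model name plus prompt text."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def _key(self, model: str, prompt: str) -> str:
        # A null byte separates the fields so ("ab", "c") and ("a", "bc") differ
        return hashlib.sha256(f"{model}\x00{prompt}".encode("utf-8")).hexdigest()

    def get_or_call(self, model: str, prompt: str, call):
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        self._store[key] = call(prompt)
        return self._store[key]


calls = []


def fake_llm(prompt: str) -> str:
    """Stand-in for a paid model call; records every invocation."""
    calls.append(prompt)
    return f"answer: {prompt}"


cache = PromptCache()
first = cache.get_or_call("gpt-4o-mini", "Explain RAG", fake_llm)
second = cache.get_or_call("gpt-4o-mini", "Explain RAG", fake_llm)
print(cache.hits, cache.misses, len(calls))
```

Even a one-character change to the prompt produces a different key, so templates with volatile content such as timestamps will rarely hit the cache.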

4. Stream responses. For user-facing applications, stream tokens instead of waiting for full completion.

# Assumes a chain whose prompt takes a single "question" variable and ends in StrOutputParser
for chunk in chain.stream({"question": "Explain RAG"}):
    print(chunk, end="", flush=True)

5. Set retry limits and timeouts. Protect against runaway costs and hung requests.

llm = ChatOpenAI(
    model="gpt-4o-mini",
    max_retries=2,
    request_timeout=30
)

Which LangChain Architecture Should You Build?

Chatbot: Chain + Memory. Document Q&A: RAG Chain. Customer support: Agent + RAG + Tools. Data extraction: Chain + JsonOutputParser. Multi-step workflow: LangGraph StateGraph. Content gen: chained prompt templates.

Your application	Architecture	Key components
Simple chatbot	Chain + Memory	ChatPromptTemplate + LLM + RunnableWithMessageHistory
Q&A over documents	RAG Chain	Document Loader + Splitter + Embeddings + VectorStore + Retriever
Customer support bot	Agent + RAG + Tools	Agent with retriever tool + API tools + ticket creation
Data extraction	Chain + Output Parser	Prompt + LLM + JsonOutputParser/PydanticOutputParser
Multi-step workflow	LangGraph	StateGraph with conditional edges and tool nodes
Content generation	Chain + Templates	Multiple prompt templates chained with LLM

What's the Bottom Line on LangChain in 2026?

Start simple — basic LCEL handles 80% of use cases. Add RAG when you need external knowledge, agents for actions, LangGraph for stateful workflows. GPT-4o-mini covers most chains; route to Sonnet/Groq for specialized needs.

LangChain in 2026 is mature, well-documented, and the standard framework for multi-model LLM applications. The key to success is starting simple -- a basic LCEL chain handles 80% of use cases -- and adding complexity (RAG, agents, LangGraph) only when needed.

For model selection, use GPT-4o-mini for most chains, Claude Sonnet 4.6 for complex reasoning, and Groq for speed. TokenMix.ai provides unified access to all these models through LangChain's OpenAI-compatible interface, letting you switch providers without code changes.

Start with the code examples in this guide. Build your first chain today, add RAG when you need external knowledge, and graduate to agents when your application needs to take actions.

FAQ

Is LangChain still relevant in 2026?

Yes. LangChain has over 100K GitHub stars and remains the most-used framework for LLM applications. The introduction of LCEL and LangGraph addressed earlier criticisms about complexity. For multi-model, multi-step applications, LangChain provides the most mature ecosystem.

What is the difference between LangChain and LangGraph?

LangChain provides the core abstractions (chains, prompts, retrievers, tools). LangGraph extends LangChain for building stateful, multi-agent workflows with cycles and conditional logic. Use LangChain for linear chains and simple agents. Use LangGraph when you need complex agent orchestration with branching logic.

Can I use LangChain with local LLMs?

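Yes. LangChain supports local LLMs through integrations with Ollama, llama.cpp, vLLM, and Hugging Face. Install the matching integration package (for example langchain-ollama or langchain-huggingface; other wrappers live in langchain-community) and use its model class. Local models carry no per-token API fees but need capable hardware, typically a GPU for larger models.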

How do I reduce LangChain costs?

Four strategies: (1) use GPT-4o-mini instead of GPT-4o for simple tasks, saving 90%+ on token costs, (2) implement caching for repeated queries, (3) use prompt compression to reduce token usage, (4) route through TokenMix.ai for discounted API access across all providers.

What is LCEL in LangChain?

LCEL (LangChain Expression Language) is the standard way to compose chains in LangChain using the pipe operator (|). It replaced older chain classes like LLMChain and SequentialChain. LCEL chains support streaming, async execution, and built-in fallbacks.

How do I add memory to a LangChain chatbot?

Use RunnableWithMessageHistory to add conversation memory to any chain. Connect it to a message store (in-memory, Redis, or PostgreSQL) to persist chat history across sessions. For agents, pass chat_history as a prompt variable.

Author: TokenMix Research Lab | Last Updated: April 2026 | Data Source: LangChain Documentation, LangSmith, GitHub LangChain, TokenMix.ai