Building applications with LLMs through composability
Keyu Chen 03138f41a0
feat(text-splitters): add optional custom header pattern support (#31887)
## Description

This PR adds support for custom header patterns in
`MarkdownHeaderTextSplitter`, allowing users to define non-standard
Markdown header formats (like `**Header**`) and specify their hierarchy
levels.

**Issue:** Fixes #22738

**Dependencies:** None

**Key Changes:**
- Added optional `custom_header_patterns` parameter to support
non-standard header formats
- Enabled splitting on patterns like `**Header**` and `***Header***`
- Maintained full backward compatibility with existing usage
- Added comprehensive tests for custom and mixed header scenarios

## Example Usage

```python
from langchain_text_splitters import MarkdownHeaderTextSplitter

headers_to_split_on = [
    ("**", "Chapter"),
    ("***", "Section"),
]

custom_header_patterns = {
    "**": 1,   # Level 1 headers
    "***": 2,  # Level 2 headers
}

splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=headers_to_split_on,
    custom_header_patterns=custom_header_patterns,
)

# Now **Chapter 1** is treated as a level 1 header
# And ***Section 1.1*** is treated as a level 2 header
```
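
To make the intended behavior concrete, here is a minimal standalone sketch of splitting on custom header patterns. It does not use the library and is not the PR's implementation: `split_on_custom_headers` and the `content`/`metadata` dictionary layout are hypothetical names chosen for illustration, and the sketch assumes each custom header sits alone on its own line.

```python
import re
from typing import Dict, List, Tuple


def split_on_custom_headers(
    text: str,
    headers_to_split_on: List[Tuple[str, str]],
    custom_header_patterns: Dict[str, int],
) -> List[dict]:
    """Hypothetical helper: split on lines such as **Header** or ***Header***."""
    names = dict(headers_to_split_on)
    # Try longer delimiters first so "***" is never mistaken for "**".
    delimiters = sorted(custom_header_patterns, key=len, reverse=True)

    chunks: List[dict] = []
    active: Dict[int, Tuple[str, str]] = {}  # level -> (metadata key, title)
    body: List[str] = []

    def flush() -> None:
        # Emit the text collected so far under the currently active headers.
        content = "\n".join(body).strip()
        if content:
            metadata = {name: title for _, (name, title) in sorted(active.items())}
            chunks.append({"content": content, "metadata": metadata})
        body.clear()

    for line in text.splitlines():
        stripped = line.strip()
        for delim in delimiters:
            d = re.escape(delim)
            match = re.fullmatch(rf"{d}([^*]+){d}", stripped)
            if match:
                flush()
                level = custom_header_patterns[delim]
                # A new header closes headers at the same or a deeper level.
                for lvl in [l for l in active if l >= level]:
                    del active[lvl]
                active[level] = (names.get(delim, delim), match.group(1).strip())
                break
        else:
            # Not a header line: it belongs to the current chunk.
            body.append(line)
    flush()
    return chunks


sample = "**Chapter 1**\nIntro text.\n***Section 1.1***\nDetails.\n**Chapter 2**\nMore."
chunks = split_on_custom_headers(
    sample,
    headers_to_split_on=[("**", "Chapter"), ("***", "Section")],
    custom_header_patterns={"**": 1, "***": 2},
)
```

In this sketch, the text under `***Section 1.1***` carries both its `Chapter` and `Section` titles as metadata, and a new level 1 header clears the level 2 entry. The real splitter may differ in details such as whitespace handling and how it interacts with standard `#` headers.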

## Testing

- Added unit tests for custom header patterns
- Added tests for mixed standard and custom headers
- All existing tests pass (backward compatibility maintained)
- Linting and formatting checks pass

---

The implementation provides a flexible solution while maintaining the
simplicity of the existing API. Users can continue using the splitter
exactly as before, with the new functionality being entirely opt-in
through the `custom_header_patterns` parameter.

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Claude <noreply@anthropic.com>
2025-08-18 10:10:49 -04:00
.devcontainer chore: formatting across codebase (#32466) 2025-08-08 10:20:10 -04:00
.github chore: update CONTRIBUTING.md to more clearly mention forum (#32509) 2025-08-11 23:02:21 +00:00
.vscode feat: port various nit changes from wip-v0.4 (#32506) 2025-08-11 15:09:08 -04:00
cookbook docs: clarify SystemMessage usage in LangGraph agent notebook (#32320) (#32346) 2025-08-11 19:49:42 +00:00
docs docs: add details on message IDs and their assignment process (#32534) 2025-08-15 18:22:28 +00:00
libs feat(text-splitters): add optional custom header pattern support (#31887) 2025-08-18 10:10:49 -04:00
scripts fix: scripts/ errors 2025-07-28 15:03:25 -04:00
.editorconfig chore: add .editorconfig for consistent coding styles across files (#32261) 2025-07-27 23:25:30 -04:00
.gitattributes
.gitignore feat: add VSCode configuration files for Python development (#32263) 2025-07-27 23:37:59 -04:00
.markdownlint.json chore: formatting across codebase (#32466) 2025-08-08 10:20:10 -04:00
.pre-commit-config.yaml refactor: markdownlint (#32259) 2025-07-27 20:00:16 -04:00
.readthedocs.yaml refactor: markdownlint (#32259) 2025-07-27 20:00:16 -04:00
CITATION.cff
CLAUDE.md chore: add CLAUDE.md (#32334) 2025-07-30 23:04:45 +00:00
LICENSE
Makefile fix(docs): local API reference documentation build (#32271) 2025-07-28 00:50:20 -04:00
MIGRATE.md refactor: markdownlint (#32259) 2025-07-27 20:00:16 -04:00
poetry.toml
pyproject.toml feat(docs): improve devx, fix Makefile targets (#32237) 2025-07-25 14:49:03 -04:00
README.md chore: add Chat LangChain to README.md (#32545) 2025-08-14 16:15:27 -04:00
SECURITY.md fix: update link text for reporting security vulnerabilities in SECURITY.md 2025-07-28 15:05:31 -04:00
uv.lock docs(docs): Add RecallIO.AI as a memory provider (#32331) 2025-08-13 15:09:56 +00:00
yarn.lock box: add langchain box package and DocumentLoader (#25506) 2024-08-21 02:23:43 +00:00


Note

Looking for the JS/TS library? Check out LangChain.js.

LangChain is a framework for building LLM-powered applications. It helps you chain together interoperable components and third-party integrations to simplify AI application development — all while future-proofing decisions as the underlying technology evolves.

pip install -U langchain

To learn more about LangChain, check out the docs. If you're looking for more advanced customization or agent orchestration, check out LangGraph, our framework for building controllable agent workflows.

Why use LangChain?

LangChain helps developers build applications powered by LLMs through a standard interface for models, embeddings, vector stores, and more.

Use LangChain for:

  • Real-time data augmentation. Easily connect LLMs to diverse data sources and external / internal systems, drawing from LangChain's vast library of integrations with model providers, tools, vector stores, retrievers, and more.
  • Model interoperability. Swap models in and out as your engineering team experiments to find the best choice for your application's needs. As the industry frontier evolves, adapt quickly: LangChain's abstractions keep you moving without losing momentum.
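
The interoperability point can be illustrated with a small, library-free sketch. Everything here is hypothetical: `ChatModel`, `EchoModel`, `ShoutModel`, and `summarize` are stand-ins invented for this example, not LangChain classes. The only idea it shows is that application code written against one shared interface does not change when the model behind it is swapped.

```python
from typing import Protocol


class ChatModel(Protocol):
    """Hypothetical minimal interface; not LangChain's actual base class."""

    def invoke(self, prompt: str) -> str: ...


class EchoModel:
    """Stand-in for one provider's model."""

    def invoke(self, prompt: str) -> str:
        return f"echo: {prompt}"


class ShoutModel:
    """Stand-in for a different provider's model."""

    def invoke(self, prompt: str) -> str:
        return prompt.upper()


def summarize(model: ChatModel, text: str) -> str:
    # Application code depends only on the shared interface,
    # so either model can be passed in without other changes.
    return model.invoke(f"Summarize: {text}")


first = summarize(EchoModel(), "hi")
second = summarize(ShoutModel(), "hi")
```

Swapping `EchoModel` for `ShoutModel` requires no change to `summarize`, which is the property the standard interface is meant to provide.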

LangChain's ecosystem

While the LangChain framework can be used standalone, it also integrates seamlessly with any LangChain product, giving developers a full suite of tools when building LLM applications.

To improve your LLM application development, pair LangChain with:

  • LangSmith - Helpful for agent evals and observability. Debug poor-performing LLM app runs, evaluate agent trajectories, gain visibility in production, and improve performance over time.
  • LangGraph - Build agents that can reliably handle complex tasks with LangGraph, our low-level agent orchestration framework. LangGraph offers customizable architecture, long-term memory, and human-in-the-loop workflows — and is trusted in production by companies like LinkedIn, Uber, Klarna, and GitLab.
  • LangGraph Platform - Deploy and scale agents effortlessly with a purpose-built deployment platform for long-running, stateful workflows. Discover, reuse, configure, and share agents across teams, and iterate quickly with visual prototyping in LangGraph Studio.

Additional resources

  • Tutorials: Simple walkthroughs with guided examples on getting started with LangChain.
  • How-to Guides: Quick, actionable code snippets for topics such as tool calling, RAG use cases, and more.
  • Conceptual Guides: Explanations of key concepts behind the LangChain framework.
  • LangChain Forum: Connect with the community and share all of your technical questions, ideas, and feedback.
  • API Reference: Detailed reference on navigating base packages and integrations for LangChain.
  • Chat LangChain: Ask questions and chat with our documentation.