langchain/libs/text-splitters
Raghu Kapur 2c9859956a
text-splitters: fix stale header metadata in ExperimentalMarkdownSyntaxTextSplitter (#31622)
**Description:**

Previously, when transitioning from a deeper Markdown header (e.g., ###)
to a shallower one (e.g., ##), the
ExperimentalMarkdownSyntaxTextSplitter retained the deeper header in the
metadata.

This commit updates the `_resolve_header_stack` method to remove headers
at the same or deeper levels before appending the current header. As a
result, each chunk now reflects only the active header context.

Fixes unexpected metadata leakage across sections in nested Markdown
documents.
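
The header-stack rule described above can be illustrated with a minimal, self-contained sketch. This is not the library's actual `_resolve_header_stack` implementation: the function names `resolve_header_stack` and `header_contexts`, the `(depth, title)` tuple layout, and the `"Header N"` metadata keys are all assumptions made for illustration.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical stand-in for the splitter's header tracking; not the library code.
HEADER_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def resolve_header_stack(
    stack: List[Tuple[int, str]], depth: int, title: str
) -> List[Tuple[int, str]]:
    """Drop headers at the same or deeper level, then append the new header."""
    kept = [(d, t) for d, t in stack if d < depth]
    kept.append((depth, title))
    return kept


def header_contexts(markdown: str) -> List[Dict[str, str]]:
    """Return the active header metadata after each header line."""
    stack: List[Tuple[int, str]] = []
    contexts: List[Dict[str, str]] = []
    for line in markdown.splitlines():
        match = HEADER_RE.match(line.strip())
        if match is None:
            continue  # body text does not change the header context
        depth, title = len(match.group(1)), match.group(2)
        stack = resolve_header_stack(stack, depth, title)
        contexts.append({f"Header {d}": t for d, t in stack})
    return contexts


doc = "# Guide\n## Setup\n### Linux\ntext\n## Usage\nmore text"
contexts = header_contexts(doc)
for ctx in contexts:
    print(ctx)
```

In this sketch, the context recorded after `## Usage` no longer carries the `### Linux` entry, which is the stale-metadata case the commit addresses.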

Additionally, test cases have been updated to:
- Validate correct header resolution and metadata assignment.
- Cover edge cases with nested headers and horizontal rules.

**Issue:** 
Fixes [#31596](https://github.com/langchain-ai/langchain/issues/31596)

**Dependencies:**
None

**Twitter handle:** -> [_RaghuKapur](https://twitter.com/_RaghuKapur)

**LinkedIn:** ->
[https://www.linkedin.com/in/raghukapur/](https://www.linkedin.com/in/raghukapur/)
2025-06-20 15:52:17 -04:00
langchain_text_splitters text-splitters: fix stale header metadata in ExperimentalMarkdownSyntaxTextSplitter (#31622) 2025-06-20 15:52:17 -04:00
scripts multiple: pydantic 2 compatibility, v0.3 (#26443) 2024-09-13 14:38:45 -07:00
tests text-splitters: fix stale header metadata in ExperimentalMarkdownSyntaxTextSplitter (#31622) 2025-06-20 15:52:17 -04:00
extended_testing_deps.txt multiple: get rid of pyproject extras (#22581) 2024-06-06 15:45:22 -07:00
Makefile text-splitters: Set strict mypy rules (#30900) 2025-04-22 20:41:24 -07:00
pyproject.toml text-splitters[patch]: fix some import-untyped errors (#31030) 2025-05-15 11:34:22 -04:00
README.md docs: more api ref links, add linting step to prevent more (#28495) 2024-12-04 04:19:42 +00:00
uv.lock chore: Bump langsmith in splitter uv (#31626) 2025-06-16 16:58:46 -07:00

🦜✂️ LangChain Text Splitters

License: MIT

Quick Install

pip install langchain-text-splitters

What is it?

LangChain Text Splitters contains utilities for splitting a wide variety of text documents into chunks.

For full documentation see the API reference and the Text Splitters module in the main docs.

📕 Releases & Versioning

langchain-text-splitters is currently on version 0.0.x.

Minor version increases will occur for:

  • Breaking changes for any public interfaces NOT marked beta

Patch version increases will occur for:

  • Bug fixes
  • New features
  • Any changes to private interfaces
  • Any changes to beta features

💁 Contributing

As an open-source project in a rapidly developing field, we are extremely open to contributions, whether it be in the form of a new feature, improved infrastructure, or better documentation.

For detailed information on how to contribute, see the Contributing Guide.