langchain/libs/text-splitters/langchain_text_splitters
Raghu Kapur 2c9859956a
text-splitters: fix stale header metadata in ExperimentalMarkdownSyntaxTextSplitter (#31622)
**Description:**

Previously, when transitioning from a deeper Markdown header (e.g., ###)
to a shallower one (e.g., ##), the
ExperimentalMarkdownSyntaxTextSplitter retained the deeper header in the
metadata.

This commit updates the `_resolve_header_stack` method to remove headers
at the same or deeper levels before appending the current header. As a
result, each chunk now reflects only the active header context.

This fixes unexpected metadata leakage across sections in nested Markdown
documents.
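
The stack rule described above can be sketched in a few lines. This is a minimal standalone illustration, not the library's actual `_resolve_header_stack` code: the function names `resolve_header_stack` and `header_metadata`, the `(depth, text)` tuple layout, and the `"Header N"` metadata keys are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: each stack entry is (header depth, header text).
HeaderStack = List[Tuple[int, str]]


def resolve_header_stack(stack: HeaderStack, depth: int, text: str) -> HeaderStack:
    """Drop headers at the same or deeper level, then push the current header."""
    kept = [(d, t) for d, t in stack if d < depth]
    kept.append((depth, text))
    return kept


def header_metadata(stack: HeaderStack) -> Dict[str, str]:
    """Build chunk metadata from the active headers only (key names are assumed)."""
    return {f"Header {d}": t for d, t in stack}


# Walk a nested outline: #, ##, ###, then back up to ##.
stack: HeaderStack = []
for depth, text in [(1, "Guide"), (2, "Setup"), (3, "Linux"), (2, "Usage")]:
    stack = resolve_header_stack(stack, depth, text)

metadata = header_metadata(stack)
print(metadata)
```

After the final `##` header, the earlier `###` entry is gone, so the chunk under "Usage" carries only its `#` and `##` ancestors rather than the stale deeper header.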

Additionally, test cases have been updated to:
- Validate correct header resolution and metadata assignment.
- Cover edge cases with nested headers and horizontal rules.

**Issue:** 
Fixes [#31596](https://github.com/langchain-ai/langchain/issues/31596)

**Dependencies:**
None

**Twitter handle:** [_RaghuKapur](https://twitter.com/_RaghuKapur)

**LinkedIn:** [https://www.linkedin.com/in/raghukapur/](https://www.linkedin.com/in/raghukapur/)
2025-06-20 15:52:17 -04:00
| Name | Last commit | Date |
| --- | --- | --- |
| xsl | text-splitters[patch]: delete unused html_chunks_with_headers.xslt (#29340) | 2025-01-23 11:29:08 -05:00 |
| __init__.py | text-splitters: Add JSFrameworkTextSplitter for Handling JavaScript Framework Code (#28972) | 2025-03-17 23:32:33 +00:00 |
| base.py | text-splitters[patch]: fix some import-untyped errors (#31030) | 2025-05-15 11:34:22 -04:00 |
| character.py | text-splitters: Fix regex separator merge bug in CharacterTextSplitter (#31137) | 2025-05-10 15:42:03 -04:00 |
| html.py | text-splitters: Add keep_separator arg to HTMLSemanticPreservingSplitter (#31588) | 2025-06-14 17:56:14 -04:00 |
| json.py | text-splitters: Set strict mypy rules (#30900) | 2025-04-22 20:41:24 -07:00 |
| jsx.py | text-splitters: Add JSFrameworkTextSplitter for Handling JavaScript Framework Code (#28972) | 2025-03-17 23:32:33 +00:00 |
| konlpy.py | text-splitters[patch]: fix some import-untyped errors (#31030) | 2025-05-15 11:34:22 -04:00 |
| latex.py | | |
| markdown.py | text-splitters: fix stale header metadata in ExperimentalMarkdownSyntaxTextSplitter (#31622) | 2025-06-20 15:52:17 -04:00 |
| nltk.py | text-splitters[patch]: fix some import-untyped errors (#31030) | 2025-05-15 11:34:22 -04:00 |
| py.typed | | |
| python.py | | |
| sentence_transformers.py | text-splitters: Set strict mypy rules (#30900) | 2025-04-22 20:41:24 -07:00 |
| spacy.py | text-splitters[patch]: fix some import-untyped errors (#31030) | 2025-05-15 11:34:22 -04:00 |