Files
langchain/libs/text-splitters
Julia (Juli) Huang cd5b36456a fix(text-splitters): HTMLSemanticPreservingSplitter nested preserved … (#34587)
Summary
Fixes an issue where HTMLSemanticPreservingSplitter failed to preserve
elements nested inside non-container tags. With these changes, preserved
elements are now correctly detected and handled at any nesting depth.

Root Cause
`_process_element()` only recursed into a small set of hard-coded
container tags (`html`, `body`, `div`, `main`). For other tags, the
subtree was flattened into text, preventing nested preserved elements
(inside `<p>`, `<section>`, `<article>`, etc.) from being detected.


Fix
- Updated traversal logic in _process_element (html.py) to recursively
process child elements for any tag that contains nested elements
- Avoided duplicate text extraction
- Preserved correct placeholder ordering
- Treated leaf nodes as text only

Tests
Adds regression tests covering preserved elements nested inside
non-container tags, including:
- table inside section
- nested divs
- code inside paragraph

All existing tests pass (make lint, format, test, etc).

Breaking changes
None.

Fixes
Fixes #31569

Disclaimer
GitHub Copilot was used to assist with test case design in
test_text_splitters.py and documentation comments; all code logic was
manually implemented and reviewed.

---------

Co-authored-by: julih <julih@julihs-MacBook-Pro.local>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2026-01-05 10:28:27 -05:00
..

🦜✂️ LangChain Text Splitters

PyPI - Version PyPI - License PyPI - Downloads Twitter

Looking for the JS/TS version? Check out LangChain.js.

Quick Install

pip install langchain-text-splitters

🤔 What is this?

LangChain Text Splitters contains utilities for splitting into chunks a wide variety of text documents.

📖 Documentation

For full documentation, see the API reference.

📕 Releases & Versioning

See our Releases and Versioning policies.

We encourage pinning your version to a specific version in order to avoid breaking your CI when we publish new tests. We recommend upgrading to the latest version periodically to make sure you have the latest tests.

Not pinning your version will ensure you always have the latest tests, but it may also break your CI if we introduce tests that your integration doesn't pass.

💁 Contributing

As an open-source project in a rapidly developing field, we are extremely open to contributions, whether it be in the form of a new feature, improved infrastructure, or better documentation.

For detailed information on how to contribute, see the Contributing Guide.