Summary

Fixes an issue where HTMLSemanticPreservingSplitter failed to preserve elements nested inside non-container tags. With these changes, preserved elements are now correctly detected and handled at any nesting depth.

Root Cause

`_process_element()` only recursed into a small set of hard-coded container tags (`html`, `body`, `div`, `main`). For other tags, the subtree was flattened into text, preventing nested preserved elements (inside `<p>`, `<section>`, `<article>`, etc.) from being detected.

Fix

- Updated traversal logic in _process_element (html.py) to recursively process child elements for any tag that contains nested elements
- Avoided duplicate text extraction
- Preserved correct placeholder ordering
- Treated leaf nodes as text only

Tests

Adds regression tests covering preserved elements nested inside non-container tags, including:

- table inside section
- nested divs
- code inside paragraph

All existing tests pass (make lint, format, test, etc).

Breaking changes

None.

Fixes

Fixes #31569

Disclaimer

GitHub Copilot was used to assist with test case design in test_text_splitters.py and documentation comments; all code logic was manually implemented and reviewed.

---------

Co-authored-by: julih <julih@julihs-MacBook-Pro.local>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
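The traversal change described under Fix above can be illustrated with a small, self-contained sketch. This is not the library's code: it uses the standard-library `xml.etree.ElementTree` parser on a well-formed snippet, and the names `OLD_CONTAINERS`, `PRESERVED`, `find_preserved_old`, and `find_preserved_new` are hypothetical. It only contrasts recursing into a hard-coded container list with recursing into any tag that has child elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical values for illustration only; not taken from the library.
OLD_CONTAINERS = {"html", "body", "div", "main"}
PRESERVED = {"table", "code"}


def find_preserved_old(element):
    """Sketch of the pre-fix traversal: recurse only into hard-coded containers."""
    found = []
    for child in element:
        if child.tag in PRESERVED:
            found.append(child.tag)
        elif child.tag in OLD_CONTAINERS:
            found.extend(find_preserved_old(child))
        # Any other tag is treated as flat text here, so preserved
        # elements nested inside it are never visited.
    return found


def find_preserved_new(element):
    """Sketch of the post-fix traversal: recurse into any tag with children."""
    found = []
    for child in element:
        if child.tag in PRESERVED:
            found.append(child.tag)
        elif len(child) > 0:  # the element has nested child elements
            found.extend(find_preserved_new(child))
        # Leaf nodes contribute text only.
    return found


html = (
    "<body>"
    "<section><table><tr><td>1</td></tr></table></section>"
    "<div><div><code>x = 1</code></div></div>"
    "<p>Run <code>make test</code> first.</p>"
    "</body>"
)
root = ET.fromstring(html)
old_result = find_preserved_old(root)
new_result = find_preserved_new(root)
print(old_result)  # ['code']
print(new_result)  # ['table', 'code', 'code']
```

In this sketch the old traversal misses the table inside `<section>` and the code inside `<p>`, while the new traversal finds preserved elements at every depth and in document order.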
🦜✂️ LangChain Text Splitters
Looking for the JS/TS version? Check out LangChain.js.
Quick Install
pip install langchain-text-splitters
🤔 What is this?
LangChain Text Splitters contains utilities for splitting a wide variety of text documents into chunks.
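To make the idea of chunking concrete, here is a minimal pure-Python sketch of fixed-size splitting with overlap. It is an illustration under simple assumptions, not the package's implementation; the function name `split_into_chunks` and its parameters are hypothetical.

```python
def split_into_chunks(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Split text into character windows of chunk_size that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


chunks = split_into_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

The package's own splitters, such as `RecursiveCharacterTextSplitter`, use more elaborate strategies (for example, preferring to break on separators), so treat this only as a picture of the basic idea.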
📖 Documentation
For full documentation, see the API reference.
📕 Releases & Versioning
See our Releases and Versioning policies.
We encourage pinning to a specific version to avoid breaking your CI when we publish new tests. We recommend upgrading to the latest version periodically so that you have the latest tests.
Not pinning your version will ensure you always have the latest tests, but it may also break your CI if we introduce tests that your integration doesn't pass.
💁 Contributing
As an open-source project in a rapidly developing field, we are extremely open to contributions, whether it be in the form of a new feature, improved infrastructure, or better documentation.
For detailed information on how to contribute, see the Contributing Guide.