langchain/libs/text-splitters
Yuxin Chen 3256b5d6ae
text-splitters: fix state persistence issue in ExperimentalMarkdownSyntaxTextSplitter (#28373)
- **Description:**
This PR resolves an issue in the
`ExperimentalMarkdownSyntaxTextSplitter` class, which retained
internal state across multiple calls to the `split_text` method. This
behaviour caused chunks to accumulate in instance attributes (`self`
variables), leading to incorrect outputs when processing multiple
Markdown files sequentially.

- Modified `libs\text-splitters\langchain_text_splitters\markdown.py` to
reset the relevant internal attributes at the start of each `split_text`
invocation. This ensures each call processes the input independently.
- Added unit tests in
`libs\text-splitters\tests\unit_tests\test_text_splitters.py` to verify
the fix and ensure the state does not persist across calls.
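
The bug pattern and the reset described above can be illustrated with a minimal, self-contained sketch. The classes `StatefulSplitter` and `ResettingSplitter` and the attribute `chunks` are hypothetical stand-ins, not the actual code or attribute names in `markdown.py`; only the general idea (clear per-call state at the start of `split_text`) is taken from the description.

```python
from typing import List


class StatefulSplitter:
    """Hypothetical splitter that keeps its results on the instance."""

    def __init__(self) -> None:
        self.chunks: List[str] = []

    def split_text(self, text: str) -> List[str]:
        # Bug pattern: self.chunks is never cleared, so results from
        # earlier calls leak into the output of later calls.
        for paragraph in text.split("\n\n"):
            if paragraph.strip():
                self.chunks.append(paragraph.strip())
        return self.chunks


class ResettingSplitter(StatefulSplitter):
    """Same hypothetical splitter, with per-call state reset."""

    def split_text(self, text: str) -> List[str]:
        # Fix pattern: reset the accumulated state before processing,
        # so each call handles its input independently.
        self.chunks = []
        return super().split_text(text)


buggy = StatefulSplitter()
buggy.split_text("# A\n\nalpha")
leaked = buggy.split_text("# B\n\nbeta")  # still contains chunks from "# A"

fixed = ResettingSplitter()
fixed.split_text("# A\n\nalpha")
clean = fixed.split_text("# B\n\nbeta")  # only chunks from "# B"
```

A unit test for this kind of fix would typically call `split_text` twice on the same instance and assert that the second result contains only chunks from the second input.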

- **Issue:**  
Fixes [#26440](https://github.com/langchain-ai/langchain/issues/26440).

- **Dependencies:**
No additional dependencies are introduced with this change.


- [x] Unit tests were added to verify the changes.
- [x] Updated documentation where necessary.  
- [x] Ran `make format`, `make lint`, and `make test` to ensure
compliance with project standards.

---------

Co-authored-by: Angel Chen <angelchen396@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-18 20:27:59 +00:00
langchain_text_splitters text-splitters: fix state persistence issue in ExperimentalMarkdownSyntaxTextSplitter (#28373) 2024-12-18 20:27:59 +00:00
scripts multiple: pydantic 2 compatibility, v0.3 (#26443) 2024-09-13 14:38:45 -07:00
tests text-splitters: fix state persistence issue in ExperimentalMarkdownSyntaxTextSplitter (#28373) 2024-12-18 20:27:59 +00:00
extended_testing_deps.txt multiple: get rid of pyproject extras (#22581) 2024-06-06 15:45:22 -07:00
Makefile text-splitters: Inconsistent results with NLTKTextSplitter's add_start_index=True (#27782) 2024-12-16 19:53:15 +00:00
poetry.lock text-splitters: release 0.3.4 (#28795) 2024-12-18 09:44:36 -08:00
pyproject.toml text-splitters: release 0.3.4 (#28795) 2024-12-18 09:44:36 -08:00
README.md docs: more api ref links, add linting step to prevent more (#28495) 2024-12-04 04:19:42 +00:00

🦜✂️ LangChain Text Splitters

License: MIT

Quick Install

pip install langchain-text-splitters

What is it?

LangChain Text Splitters contains utilities for splitting a wide variety of text documents into chunks.

For full documentation see the API reference and the Text Splitters module in the main docs.

📕 Releases & Versioning

langchain-text-splitters is currently on version 0.3.x.

Minor version increases will occur for:

  • Breaking changes for any public interfaces NOT marked beta

Patch version increases will occur for:

  • Bug fixes
  • New features
  • Any changes to private interfaces
  • Any changes to beta features
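
As a rough illustration of the policy above, the following sketch maps a change category to the next version number. The function `next_version` and the category labels are hypothetical names introduced here, and resetting the patch number to 0 on a minor bump is an assumption based on common semantic-versioning practice, not something the policy states.

```python
# Change categories named in the policy; the string labels are illustrative.
MINOR_CHANGES = {"breaking_public_non_beta"}
PATCH_CHANGES = {
    "bug_fix",
    "new_feature",
    "private_interface_change",
    "beta_feature_change",
}


def next_version(version: str, change: str) -> str:
    """Return the version that would follow `version` for a given change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change in MINOR_CHANGES:
        # Assumption: a minor bump resets the patch number to 0.
        return f"{major}.{minor + 1}.0"
    if change in PATCH_CHANGES:
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change category: {change!r}")


after_fix = next_version("0.3.4", "bug_fix")
after_break = next_version("0.3.4", "breaking_public_non_beta")
```

Under this reading, new features ship in patch releases, so pinning to a minor series still picks up new functionality but not breaking changes to stable public interfaces.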

💁 Contributing

As an open-source project in a rapidly developing field, we are extremely open to contributions, whether it be in the form of a new feature, improved infrastructure, or better documentation.

For detailed information on how to contribute, see the Contributing Guide.