langchain

mirror of https://github.com/hwchase17/langchain.git synced 2025-05-30 19:49:09 +00:00

Yuxin Chen 3256b5d6ae text-splitters: fix state persistence issue in ExperimentalMarkdownSyntaxTextSplitter (#28373)	2024-12-18 20:27:59 +00:00

- Description: This PR resolves an issue with the `ExperimentalMarkdownSyntaxTextSplitter` class, which retains its internal state across multiple calls to the `split_text` method. This behaviour caused an unintended accumulation of chunks in `self` variables, leading to incorrect outputs when processing multiple Markdown files sequentially.
- Modified `libs\text-splitters\langchain_text_splitters\markdown.py` to reset the relevant internal attributes at the start of each `split_text` invocation. This ensures each call processes the input independently.
- Added unit tests in `libs\text-splitters\tests\unit_tests\test_text_splitters.py` to verify the fix and ensure the state does not persist across calls.
- Issue: Fixes [#26440](https://github.com/langchain-ai/langchain/issues/26440).
- Dependencies: No additional dependencies are introduced with this change.

- [x] Unit tests were added to verify the changes.
- [x] Updated documentation where necessary.
- [x] Ran `make format`, `make lint`, and `make test` to ensure compliance with project standards.

---------

Co-authored-by: Angel Chen <angelchen396@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
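
The state-persistence problem described in this commit message can be illustrated with a minimal sketch. This is not the code from `markdown.py`: the class names `LeakySplitter` and `ResettingSplitter`, the `chunks` attribute, and the paragraph-based splitting rule are all hypothetical and chosen only to show the pattern of accumulating results on `self` versus resetting them at the start of each `split_text` call.

```python
from __future__ import annotations


class LeakySplitter:
    """Hypothetical splitter that stores chunks on the instance (the buggy pattern)."""

    def __init__(self) -> None:
        # Created once, so it is shared by every later split_text call.
        self.chunks: list[str] = []

    def split_text(self, text: str) -> list[str]:
        # Assumed splitting rule for illustration: one chunk per blank-line-separated paragraph.
        for paragraph in text.split("\n\n"):
            if paragraph.strip():
                self.chunks.append(paragraph.strip())
        return self.chunks


class ResettingSplitter(LeakySplitter):
    """Hypothetical fixed variant: per-call state is cleared before processing."""

    def split_text(self, text: str) -> list[str]:
        # Reset the accumulated state so each call is independent of earlier ones.
        self.chunks = []
        return super().split_text(text)


leaky = LeakySplitter()
leaky.split_text("first doc, part one\n\nfirst doc, part two")
leaky_second = leaky.split_text("second doc")  # still contains the first document's chunks

fixed = ResettingSplitter()
fixed.split_text("first doc, part one\n\nfirst doc, part two")
fixed_second = fixed.split_text("second doc")  # contains only the second document's chunk
```

In this sketch, `leaky_second` holds three chunks while `fixed_second` holds one, which mirrors the accumulation across sequential Markdown files that the commit reports and the independence the fix aims for.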
xsl	text-splitters[minor]: Adding a new section aware splitter to langchain (#16526)	2024-04-01 20:32:26 +00:00
__init__.py	text-splitters: add pydocstyle linting (#28127)	2024-12-09 06:01:03 +00:00
base.py	text-splitters: add pydocstyle linting (#28127)	2024-12-09 06:01:03 +00:00
character.py	text-splitters: add pydocstyle linting (#28127)	2024-12-09 06:01:03 +00:00
html.py	text-splitters: add pydocstyle linting (#28127)	2024-12-09 06:01:03 +00:00
json.py	text-splitters: add pydocstyle linting (#28127)	2024-12-09 06:01:03 +00:00
konlpy.py	text-splitters[minor], langchain[minor], community[patch], templates, docs: langchain-text-splitters 0.0.1 (#18346)	2024-02-29 18:33:21 -08:00
latex.py	text-splitters[minor], langchain[minor], community[patch], templates, docs: langchain-text-splitters 0.0.1 (#18346)	2024-02-29 18:33:21 -08:00
markdown.py	text-splitters: fix state persistence issue in ExperimentalMarkdownSyntaxTextSplitter (#28373)	2024-12-18 20:27:59 +00:00
nltk.py	text-splitters: Inconsistent results with `NLTKTextSplitter`'s `add_start_index=True` (#27782)	2024-12-16 19:53:15 +00:00
py.typed	text-splitters[minor], langchain[minor], community[patch], templates, docs: langchain-text-splitters 0.0.1 (#18346)	2024-02-29 18:33:21 -08:00
python.py	text-splitters[minor], langchain[minor], community[patch], templates, docs: langchain-text-splitters 0.0.1 (#18346)	2024-02-29 18:33:21 -08:00
sentence_transformers.py	text-splitters: Inconsistent results with `NLTKTextSplitter`'s `add_start_index=True` (#27782)	2024-12-16 19:53:15 +00:00
spacy.py	text-splitters: add pydocstyle linting (#28127)	2024-12-09 06:01:03 +00:00