Manas karthik
048de6dfb6
test(text-splitters): add edge case tests for CharacterTextSplitter ( #34628 )
2026-01-07 11:06:44 -05:00
Julia (Juli) Huang
cd5b36456a
fix(text-splitters): HTMLSemanticPreservingSplitter nested preserved … ( #34587 )
Summary
Fixes an issue where HTMLSemanticPreservingSplitter failed to preserve
elements nested inside non-container tags. With these changes, preserved
elements are now correctly detected and handled at any nesting depth.
Root Cause
`_process_element()` only recursed into a small set of hard-coded
container tags (`html`, `body`, `div`, `main`). For other tags, the
subtree was flattened into text, preventing nested preserved elements
(inside `<p>`, `<section>`, `<article>`, etc.) from being detected.
Fix
- Updated traversal logic in `_process_element()` (`html.py`) to recursively
process child elements for any tag that contains nested elements
- Avoided duplicate text extraction
- Preserved correct placeholder ordering
- Treated leaf nodes as text only
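The traversal change described above can be sketched with the standard library's `xml.etree.ElementTree`. It only parses well-formed markup and stands in for the splitter's real HTML parser. `PRESERVE`, `process_element`, `split_preserving`, and the `PRESERVED_n` placeholder format are hypothetical names for illustration; this is not the code from `html.py`.

```python
import xml.etree.ElementTree as ET

# Hypothetical tags to keep verbatim (illustrative, not the splitter's configuration).
PRESERVE = {"table", "code"}


def process_element(elem: ET.Element, parts: list[str], preserved: dict[str, str]) -> None:
    """Collect text parts, swapping preserved subtrees for ordered placeholders."""
    if elem.tag in PRESERVE:
        placeholder = f"PRESERVED_{len(preserved)}"
        # Serialize the subtree without its tail; the parent handles the tail text.
        tail, elem.tail = elem.tail, None
        preserved[placeholder] = ET.tostring(elem, encoding="unicode")
        elem.tail = tail
        parts.append(placeholder)
        return
    # Own text only (no itertext() on the subtree), so nothing is extracted twice.
    if elem.text and elem.text.strip():
        parts.append(elem.text.strip())
    for child in elem:
        # Recurse into every child, whatever its tag (<p>, <section>, <article>, ...).
        # A leaf child contributes only its text.
        process_element(child, parts, preserved)
        if child.tail and child.tail.strip():
            parts.append(child.tail.strip())


def split_preserving(markup: str) -> tuple[str, dict[str, str]]:
    root = ET.fromstring(markup)
    parts: list[str] = []
    preserved: dict[str, str] = {}
    process_element(root, parts, preserved)
    return " ".join(parts), preserved


SAMPLE = (
    "<section><p>Intro <code>x = 1</code> done.</p>"
    "<div><div><table><tr><td>a</td></tr></table></div></div></section>"
)
text, preserved = split_preserving(SAMPLE)
print(text)  # Intro PRESERVED_0 done. PRESERVED_1
```

Because placeholders are numbered in document order as the recursion reaches them, the `<code>` inside `<p>` and the `<table>` two `<div>`s deep are both detected and keep their relative order.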
Tests
Adds regression tests covering preserved elements nested inside
non-container tags, including:
- table inside section
- nested divs
- code inside paragraph
All existing tests pass (`make lint`, `make format`, `make test`, etc.).
Breaking changes
None.
Fixes
Fixes #31569
Disclaimer
GitHub Copilot was used to assist with test case design in
test_text_splitters.py and documentation comments; all code logic was
manually implemented and reviewed.
---------
Co-authored-by: julih <julih@julihs-MacBook-Pro.local >
Co-authored-by: Mason Daugherty <github@mdrxy.com >
Co-authored-by: Mason Daugherty <mason@langchain.dev >
2026-01-05 10:28:27 -05:00
Christophe Bornet
e03d6b80d5
chore(deps): bump mypy to v1.19 and ruff to v0.14 ( #34521 )
* Set mypy to >=1.19.1,<1.20
* Set ruff to >=0.14.10,<0.15
2025-12-29 18:07:55 -06:00
Christophe Bornet
ea25f5ebdd
chore(text-splitters): bump dependency locks for python 3.14 ( #34522 )
* Support sentence-transformers optional dep on Python 3.14
* Bump some dep locks to use pre-built wheels instead of building them
(murmurhash, cymem, preshed, thinc, srsly, blis)
* Still not possible to use spaCy: even though wheels are available,
spaCy depends on Pydantic v1, which doesn't work on Python 3.14.
* Speeds up installation and CI.
Co-authored-by: Mason Daugherty <mason@langchain.dev >
2025-12-29 17:55:34 -06:00
Christophe Bornet
5884fb9523
style(text-splitters,standard-tests,cli): add ruff TC and RUF012 rules ( #34495 )
Co-authored-by: Mason Daugherty <mason@langchain.dev >
2025-12-27 01:41:33 -06:00
Christophe Bornet
d46187201d
style: add ruff ISC001 rule ( #34493 )
ISC001 no longer conflicts with the formatter. See
https://github.com/astral-sh/ruff/issues/8272
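For context, ISC001 (from ruff's port of flake8-implicit-str-concat) reports adjacent string literals that are implicitly concatenated on a single line. A minimal illustration of why that pattern is worth flagging; the variable names are hypothetical:

```python
# Adjacent string literals on one line are joined at compile time.
# A missing comma therefore silently merges two list items; ISC001 reports this pattern.
flagged = ["alpha", "beta" "gamma"]  # noqa: ISC001 (intentional, for illustration)
fixed = ["alpha", "beta", "gamma"]

print(len(flagged), flagged[1])  # 2 betagamma
print(len(fixed))  # 3
```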
Co-authored-by: Mason Daugherty <mason@langchain.dev >
2025-12-26 21:39:56 -06:00
ccurme
795e746ca7
release(core): 1.2.3 ( #34421 )
2025-12-18 15:06:32 -05:00
Mason Daugherty
71778cb721
feat(infra): add CI check for out of date lockfiles ( #34397 )
2025-12-16 22:23:25 -05:00
Mason Daugherty
0cd72b50fb
release(text-splitters): 1.1.0 ( #34346 )
2025-12-13 20:13:03 -05:00
rari404
0f940d74b2
feat(text-splitters): add R programming language support ( #34241 )
2025-12-12 13:34:22 -05:00
William FH
1867521d1a
feat: Use uuid7 for run ids ( #34172 )
Co-authored-by: Sydney Runkle <54324534+sydney-runkle@users.noreply.github.com >
Co-authored-by: Sydney Runkle <sydneymarierunkle@gmail.com >
2025-12-03 10:09:10 -08:00
Christophe Bornet
ef79c26f18
chore(cli,standard-tests,text-splitters): fix some ruff TC rules ( #33934 )
Co-authored-by: Mason Daugherty <mason@langchain.dev >
2025-11-12 14:06:31 -05:00
Mason Daugherty
e023201d42
style: some cleanup ( #33857 )
2025-11-06 23:50:46 -05:00
Mason Daugherty
d40e340479
chore: attribute package change versions ( #33854 )
Needed for disambiguation within inherited docs
2025-11-06 16:57:30 -05:00
Mason Daugherty
123e29dc26
style: more refs fixes ( #33730 )
2025-10-29 16:34:46 -04:00
Mason Daugherty
f15391f4fc
chore(text-splitters): API reference link in README ( #33713 )
2025-10-28 23:28:48 -04:00
Mason Daugherty
e5e1d6c705
style: more refs work ( #33707 )
2025-10-28 14:43:28 -04:00
Mason Daugherty
db7f2db1ae
feat(infra): langchain docs MCP ( #33636 )
2025-10-22 11:50:35 -04:00
Mason Daugherty
e731ba1e47
style: more refs work ( #33616 )
2025-10-20 18:40:19 -04:00
Mason Daugherty
64e6798a39
chore: update pyproject.toml url entries ( #33587 )
2025-10-17 17:16:55 -04:00
ccurme
3152d25811
fix: support python 3.14 in various projects ( #33575 )
Co-authored-by: cbornet <cbornet@hotmail.com >
Co-authored-by: Mason Daugherty <mason@langchain.dev >
2025-10-17 11:06:23 -04:00
ccurme
3b8cb3d4b6
release(text-splitters): 1.0.0 ( #33565 )
2025-10-17 10:30:42 -04:00
Mason Daugherty
26e0a00c4c
style: more work for refs ( #33508 )
Largely:
- Remove explicit `"Default is x"` text, since the new refs show the default
inferred from the signature
- Inline code (useful for eventual parsing)
- Fix code block rendering (indentations)
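Why a hand-written `"Default is x"` becomes redundant: parameter defaults are recoverable from the function signature itself. A minimal sketch with `inspect.signature`, assuming a hypothetical `split_text` function with illustrative defaults; how the reference generator actually extracts defaults is not described here.

```python
import inspect


def split_text(text: str, chunk_size: int = 4000, chunk_overlap: int = 200) -> list[str]:
    """Split text (hypothetical example; no "Default is x" needed below).

    Args:
        text: Text to split.
        chunk_size: Maximum size of a chunk.
        chunk_overlap: Overlap between consecutive chunks.
    """
    return [text]  # placeholder body


def inferred_defaults(func) -> dict[str, object]:
    """Collect parameter defaults straight from the signature."""
    return {
        name: param.default
        for name, param in inspect.signature(func).parameters.items()
        if param.default is not inspect.Parameter.empty
    }


print(inferred_defaults(split_text))  # {'chunk_size': 4000, 'chunk_overlap': 200}
```

Keeping the default in one place (the signature) also means docs cannot drift out of sync when a default changes.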
2025-10-15 18:46:55 -04:00
Mason Daugherty
79200cf3c2
docs: update package READMEs ( #33488 )
2025-10-15 10:49:35 -04:00
Christophe Bornet
83901b30e3
chore(text-splitters): remove arg types from docstrings ( #33406 )
Co-authored-by: Mason Daugherty <mason@langchain.dev >
2025-10-10 11:37:53 -04:00
Mason Daugherty
6fc21afbc9
style: .. code-block:: admonition translations ( #33400 )
biiiiiiiiiiiiiiiigggggggg pass
2025-10-09 16:52:58 -04:00
Mason Daugherty
d8a680ee57
style: address Sphinx double-backtick snippet syntax ( #33389 )
2025-10-09 13:35:51 -04:00
Mason Daugherty
b6132fc23e
style: remove more Optional syntax ( #33371 )
2025-10-08 23:28:43 -04:00
Mason Daugherty
d13823043d
style: monorepo pass for refs ( #33359 )
* Delete some double backticks previously used by Sphinx (not done
everywhere yet)
* Fix some code blocks / dropdowns
Ignoring CLI CI for now
2025-10-08 18:41:39 -04:00
Mason Daugherty
cda336295f
chore: enrich pyproject.toml files with links to new references, others ( #33343 )
2025-10-07 16:17:14 -04:00
Mason Daugherty
8bcdfbb24e
chore: clean up pyproject.toml files, use core a7 ( #33334 )
2025-10-07 10:49:04 -04:00
Christophe Bornet
20e04fc3dd
chore(text-splitters): cleanup ruff config ( #33247 )
Co-authored-by: Mason Daugherty <mason@langchain.dev >
2025-10-06 17:02:31 -04:00
Mason Daugherty
90e4d944ac
chore(infra): pdm -> hatchling ( #33289 )
2025-10-05 23:52:52 -04:00
Mason Daugherty
ae5b105d11
docs: v1 docs updates ( #33173 )
Co-authored-by: Mohammad Mohtashim <45242107+keenborder786@users.noreply.github.com >
Co-authored-by: Caspar Broekhuizen <caspar@langchain.dev >
Co-authored-by: ccurme <chester.curme@gmail.com >
Co-authored-by: Christophe Bornet <cbornet@hotmail.com >
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com >
Co-authored-by: Sadra Barikbin <sadraqazvin1@yahoo.com >
Co-authored-by: Vadym Barda <vadim.barda@gmail.com >
2025-10-02 18:46:26 -04:00
Mason Daugherty
ae16392ada
release(text-splitters): 1.0.0a1 ( #33214 )
2025-10-02 13:56:10 -04:00
Mason Daugherty
5e8cb58e6a
refactor(text-splitters): drop python 3.9 ( #33212 )
2025-10-02 13:51:10 -04:00
Mason Daugherty
eaa6dcce9e
release: v1.0.0 ( #32567 )
Co-authored-by: Mohammad Mohtashim <45242107+keenborder786@users.noreply.github.com >
Co-authored-by: Caspar Broekhuizen <caspar@langchain.dev >
Co-authored-by: ccurme <chester.curme@gmail.com >
Co-authored-by: Christophe Bornet <cbornet@hotmail.com >
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com >
Co-authored-by: Sadra Barikbin <sadraqazvin1@yahoo.com >
Co-authored-by: Vadym Barda <vadim.barda@gmail.com >
2025-10-02 10:49:42 -04:00
Mason Daugherty
986302322f
docs: more standardization ( #33124 )
2025-09-25 20:46:20 -04:00
Christophe Bornet
eaf8dce7c2
chore: bump ruff version to 0.13 ( #33043 )
Co-authored-by: Mason Daugherty <mason@langchain.dev >
2025-09-25 12:27:39 -04:00
Mason Daugherty
e3efd1e891
test(text-splitters): capture beta warnings ( #33113 )
2025-09-25 01:30:20 -04:00
Mason Daugherty
d6769cf032
test(text-splitters): resolve pytest marker warning ( #33112 )
#33111
2025-09-25 01:29:42 -04:00
Mason Daugherty
781db9d892
chore: update pyproject.toml files, remove codespell ( #33028 )
- Removes Codespell from deps, docs, and `Makefile`s
- Python version requirements in all `pyproject.toml` files now use the
`~=` (compatible release) specifier
- All dependency groups and main dependencies now use explicit lower and
upper bounds, reducing potential for breaking changes
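As a reminder of what the `~=` (compatible release) specifier accepts: per PEP 440, `~=X.Y` means `>=X.Y, ==X.*` and `~=X.Y.Z` means `>=X.Y.Z, ==X.Y.*`. The sketch below implements that rule for plain numeric versions only (no pre-, post-, dev, or local segments); `compatible` and `parse_release` are hypothetical helpers, and real tooling such as the `packaging` library handles the full grammar.

```python
def parse_release(version: str) -> tuple[int, ...]:
    """Parse a plain numeric release such as '3.10' or '0.14.10'."""
    return tuple(int(part) for part in version.split("."))


def compatible(version: str, spec: str) -> bool:
    """Sketch of PEP 440 '~=' for simple numeric versions."""
    want = parse_release(spec)
    if len(want) < 2:
        raise ValueError("'~=' needs at least two release segments")
    have = parse_release(version)
    # Pad with zeros so that e.g. 3.10 and 3.10.0 compare as equal.
    n = max(len(have), len(want))
    have_p = have + (0,) * (n - len(have))
    want_p = want + (0,) * (n - len(want))
    prefix = want[:-1]  # every segment except the last must match exactly
    return have_p >= want_p and have_p[: len(prefix)] == prefix


print(compatible("3.14", "3.10"))  # True
print(compatible("4.0", "3.10"))  # False
```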
2025-09-20 22:09:33 -04:00
Christophe Bornet
cbaf97ada4
chore: bump mypy version to 1.18 ( #32914 )
2025-09-12 09:19:23 -04:00
Hyunjoon Jeong
9cc85387d1
fix(text-splitters): add validation to prevent infinite loop and prevent empty token splitter ( #32205 )
### Description
1) Add validation to prevent an infinite loop: require
`tokenizer.tokens_per_chunk > tokenizer.chunk_overlap`
2) Avoid emitting an empty decoded chunk when the splitter appends tokens
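Why the validation matters: a sliding token window advances by `tokens_per_chunk - chunk_overlap` per step, so if that difference is zero or negative the start index never moves forward. A minimal sketch of such a loop with both fixes applied, assuming a toy character-level tokenizer; `TokenWindow` and `split_on_tokens` are hypothetical names, not the package's own implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TokenWindow:
    """Hypothetical stand-in for the tokenizer settings used by a token splitter."""

    tokens_per_chunk: int
    chunk_overlap: int
    encode: Callable[[str], list[int]]
    decode: Callable[[list[int]], str]


def split_on_tokens(text: str, tok: TokenWindow) -> list[str]:
    # Fix 1: each step advances by tokens_per_chunk - chunk_overlap, so that
    # difference must be positive or the loop below would never terminate.
    if tok.tokens_per_chunk <= tok.chunk_overlap:
        raise ValueError("tokens_per_chunk must be greater than chunk_overlap")
    ids = tok.encode(text)
    chunks: list[str] = []
    start = 0
    while start < len(ids):
        end = min(start + tok.tokens_per_chunk, len(ids))
        decoded = tok.decode(ids[start:end])
        if decoded:  # Fix 2: skip chunks that decode to an empty string
            chunks.append(decoded)
        if end == len(ids):
            break
        start += tok.tokens_per_chunk - tok.chunk_overlap
    return chunks


def char_encode(s: str) -> list[int]:
    return [ord(c) for c in s]


def char_decode(ids: list[int]) -> str:
    return "".join(map(chr, ids))


print(split_on_tokens("abcdefg", TokenWindow(3, 1, char_encode, char_decode)))
# ['abc', 'cde', 'efg']
```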
---------
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev >
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com >
2025-09-11 16:55:32 -04:00
Mason Daugherty
7a158c7f1c
revert: "chore: remove ruff target-version" ( #32895 )
Reverts langchain-ai/langchain#32880
Not needed at the moment, will do when finishing v1
2025-09-10 20:56:48 -04:00
Christophe Bornet
b274416441
chore: remove ruff target-version ( #32880 )
This is not needed anymore since `requires-python` was added when moving
to `uv`.
2025-09-10 11:12:30 -04:00
Mason Daugherty
c124e67325
chore(docs): update package READMEs ( #32869 )
- Fix badges
- Focus on agents
- Cut down fluff
2025-09-09 14:50:32 +00:00
Christophe Bornet
8b90eae455
chore(text-splitters): enable ruff docstring-code-format ( #32854 )
2025-09-08 16:40:11 -04:00
Christophe Bornet
0c3e8ccd0e
chore(text-splitters): select ALL rules with exclusions ( #32325 )
Co-authored-by: Mason Daugherty <mason@langchain.dev >
2025-09-08 14:46:09 +00:00
Mason Daugherty
6b5fdfb804
release(text-splitters): 0.3.11 ( #32770 )
Fixes #32747
The spaCy integration test fixture tried to use pip to download the
spaCy language model (`en_core_web_sm`), but uv-managed environments
don't include pip by default. The test now fails if the model is not
installed, rather than downloading it.
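A minimal sketch of a "fail instead of download" check, assuming the model is installed as an importable package (spaCy pipelines such as `en_core_web_sm` are distributed that way). `require_spacy_model` is a hypothetical helper, not the fixture from the actual test suite; inside pytest, the `raise` would typically be replaced with `pytest.fail(...)`.

```python
import importlib.util


def require_spacy_model(name: str = "en_core_web_sm") -> None:
    """Raise if the model package is missing, instead of trying to pip-install it."""
    # find_spec checks importability without importing spaCy or invoking pip,
    # which may not exist in a uv-managed environment.
    if importlib.util.find_spec(name) is None:
        msg = f"spaCy model {name!r} is not installed; install it before running these tests."
        raise RuntimeError(msg)
```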
2025-08-31 23:00:05 +00:00