Mason Daugherty
0cd72b50fb
release(text-splitters): 1.1.0 ( #34346 )
2025-12-13 20:13:03 -05:00
rari404
0f940d74b2
feat(text-splitters): add R programming language support ( #34241 )
2025-12-12 13:34:22 -05:00
William FH
1867521d1a
feat: Use uuid7 for run ids ( #34172 )
...
Co-authored-by: Sydney Runkle <54324534+sydney-runkle@users.noreply.github.com>
Co-authored-by: Sydney Runkle <sydneymarierunkle@gmail.com>
2025-12-03 10:09:10 -08:00
Christophe Bornet
ef79c26f18
chore(cli,standard-tests,text-splitters): fix some ruff TC rules ( #33934 )
...
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-11-12 14:06:31 -05:00
Mason Daugherty
e023201d42
style: some cleanup ( #33857 )
2025-11-06 23:50:46 -05:00
Mason Daugherty
d40e340479
chore: attribute package change versions ( #33854 )
...
Needed to disambiguate within inherited docs
2025-11-06 16:57:30 -05:00
Mason Daugherty
123e29dc26
style: more refs fixes ( #33730 )
2025-10-29 16:34:46 -04:00
Mason Daugherty
f15391f4fc
chore(text-splitters): API reference link in README ( #33713 )
2025-10-28 23:28:48 -04:00
Mason Daugherty
e5e1d6c705
style: more refs work ( #33707 )
2025-10-28 14:43:28 -04:00
Mason Daugherty
db7f2db1ae
feat(infra): langchain docs MCP ( #33636 )
2025-10-22 11:50:35 -04:00
Mason Daugherty
e731ba1e47
style: more refs work ( #33616 )
2025-10-20 18:40:19 -04:00
Mason Daugherty
64e6798a39
chore: update pyproject.toml url entries ( #33587 )
2025-10-17 17:16:55 -04:00
ccurme
3152d25811
fix: support python 3.14 in various projects ( #33575 )
...
Co-authored-by: cbornet <cbornet@hotmail.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-10-17 11:06:23 -04:00
ccurme
3b8cb3d4b6
release(text-splitters): 1.0.0 ( #33565 )
2025-10-17 10:30:42 -04:00
Mason Daugherty
26e0a00c4c
style: more work for refs ( #33508 )
...
Largely:
- Remove explicit `"Default is x"` since new refs show default inferred
from sig
- Inline code (useful for eventual parsing)
- Fix code block rendering (indentations)
2025-10-15 18:46:55 -04:00
Mason Daugherty
79200cf3c2
docs: update package READMEs ( #33488 )
2025-10-15 10:49:35 -04:00
Christophe Bornet
83901b30e3
chore(text-splitters): remove arg types from docstrings ( #33406 )
...
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-10-10 11:37:53 -04:00
Mason Daugherty
6fc21afbc9
style: .. code-block:: admonition translations ( #33400 )
...
biiiiiiiiiiiiiiiigggggggg pass
2025-10-09 16:52:58 -04:00
Mason Daugherty
d8a680ee57
style: address Sphinx double-backtick snippet syntax ( #33389 )
2025-10-09 13:35:51 -04:00
Mason Daugherty
b6132fc23e
style: remove more Optional syntax ( #33371 )
2025-10-08 23:28:43 -04:00
Mason Daugherty
d13823043d
style: monorepo pass for refs ( #33359 )
...
* Delete some double backticks previously used by Sphinx (not done
everywhere yet)
* Fix some code blocks / dropdowns
Ignoring CLI CI for now
2025-10-08 18:41:39 -04:00
Mason Daugherty
cda336295f
chore: enrich pyproject.toml files with links to new references, others ( #33343 )
2025-10-07 16:17:14 -04:00
Mason Daugherty
8bcdfbb24e
chore: clean up pyproject.toml files, use core a7 ( #33334 )
2025-10-07 10:49:04 -04:00
Christophe Bornet
20e04fc3dd
chore(text-splitters): cleanup ruff config ( #33247 )
...
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-10-06 17:02:31 -04:00
Mason Daugherty
90e4d944ac
chore(infra): pdm -> hatchling ( #33289 )
2025-10-05 23:52:52 -04:00
Mason Daugherty
ae5b105d11
docs: v1 docs updates ( #33173 )
...
Co-authored-by: Mohammad Mohtashim <45242107+keenborder786@users.noreply.github.com>
Co-authored-by: Caspar Broekhuizen <caspar@langchain.dev>
Co-authored-by: ccurme <chester.curme@gmail.com>
Co-authored-by: Christophe Bornet <cbornet@hotmail.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Sadra Barikbin <sadraqazvin1@yahoo.com>
Co-authored-by: Vadym Barda <vadim.barda@gmail.com>
2025-10-02 18:46:26 -04:00
Mason Daugherty
ae16392ada
release(text-splitters): 1.0.0a1 ( #33214 )
2025-10-02 13:56:10 -04:00
Mason Daugherty
5e8cb58e6a
refactor(text-splitters): drop python 3.9 ( #33212 )
2025-10-02 13:51:10 -04:00
Mason Daugherty
eaa6dcce9e
release: v1.0.0 ( #32567 )
...
Co-authored-by: Mohammad Mohtashim <45242107+keenborder786@users.noreply.github.com>
Co-authored-by: Caspar Broekhuizen <caspar@langchain.dev>
Co-authored-by: ccurme <chester.curme@gmail.com>
Co-authored-by: Christophe Bornet <cbornet@hotmail.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Sadra Barikbin <sadraqazvin1@yahoo.com>
Co-authored-by: Vadym Barda <vadim.barda@gmail.com>
2025-10-02 10:49:42 -04:00
Mason Daugherty
986302322f
docs: more standardization ( #33124 )
2025-09-25 20:46:20 -04:00
Christophe Bornet
eaf8dce7c2
chore: bump ruff version to 0.13 ( #33043 )
...
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-09-25 12:27:39 -04:00
Mason Daugherty
e3efd1e891
test(text-splitters): capture beta warnings ( #33113 )
2025-09-25 01:30:20 -04:00
Mason Daugherty
d6769cf032
test(text-splitters): resolve pytest marker warning ( #33112 )
...
#33111
2025-09-25 01:29:42 -04:00
Mason Daugherty
781db9d892
chore: update pyproject.toml files, remove codespell ( #33028 )
...
- Removes Codespell from deps, docs, and `Makefile`s
- Python version requirements in all `pyproject.toml` files now use the
`~=` (compatible release) specifier
- All dependency groups and main dependencies now use explicit lower and
upper bounds, reducing potential for breaking changes
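
For readers unfamiliar with `~=`, the sketch below expands a compatible-release clause into the roughly equivalent explicit bounds described in PEP 440. The helper name `expand_compatible_release` is made up for illustration, and it only handles plain numeric versions (no pre-release or post-release suffixes).

```python
def expand_compatible_release(spec: str) -> str:
    """Expand '~=X.Y[.Z...]' into a roughly equivalent '>=..., <...' pair."""
    if not spec.startswith("~="):
        raise ValueError("expected a '~=' specifier")
    version = spec[2:].strip()
    parts = [int(p) for p in version.split(".")]
    if len(parts) < 2:
        # PEP 440 does not allow '~=' with a single-segment version such as '~=1'.
        raise ValueError("'~=' needs at least two release segments")
    # Drop the last segment and bump the new last one to get the upper bound.
    upper = parts[:-1]
    upper[-1] += 1
    upper_str = ".".join(str(p) for p in upper)
    return f">={version}, <{upper_str}"


print(expand_compatible_release("~=3.10"))
print(expand_compatible_release("~=1.4.5"))
```

So a requirement written as `~=3.10` accepts any later 3.x release but not 4.0, which is the "explicit lower and upper bounds" behaviour in a single clause.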
2025-09-20 22:09:33 -04:00
Christophe Bornet
cbaf97ada4
chore: bump mypy version to 1.18 ( #32914 )
2025-09-12 09:19:23 -04:00
Hyunjoon Jeong
9cc85387d1
fix(text-splitters): add validation to prevent infinite loop and prevent empty token splitter ( #32205 )
...
### Description
1) Add validation that `tokenizer.tokens_per_chunk > tokenizer.chunk_overlap`;
otherwise the stride between chunks is not positive and the splitting loop never terminates
2) Avoid appending an empty decoded chunk when the splitter appends tokens
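
The two changes can be sketched roughly as follows. This is a simplified standalone illustration, not the library's code: the function `split_on_tokens`, its parameters, and the `decode` callable are assumptions made for the example.

```python
from __future__ import annotations

from collections.abc import Callable


def split_on_tokens(
    input_ids: list[int],
    tokens_per_chunk: int,
    chunk_overlap: int,
    decode: Callable[[list[int]], str],
) -> list[str]:
    """Split token ids into overlapping windows and decode each one."""
    # The window advances by tokens_per_chunk - chunk_overlap per step, so a
    # non-positive stride would never reach the end of the input.
    if tokens_per_chunk <= chunk_overlap:
        msg = "tokens_per_chunk must be greater than chunk_overlap"
        raise ValueError(msg)

    chunks: list[str] = []
    start = 0
    while start < len(input_ids):
        end = min(start + tokens_per_chunk, len(input_ids))
        decoded = decode(input_ids[start:end])
        if decoded:  # skip chunks that decode to an empty string
            chunks.append(decoded)
        if end == len(input_ids):
            break
        start += tokens_per_chunk - chunk_overlap
    return chunks
```

With a toy `decode` that maps ids to letters, `split_on_tokens(list(range(6)), 4, 2, decode)` yields two windows that share two tokens, while passing `tokens_per_chunk=2, chunk_overlap=2` raises `ValueError` up front instead of looping.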
---------
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2025-09-11 16:55:32 -04:00
Mason Daugherty
7a158c7f1c
revert: "chore: remove ruff target-version" ( #32895 )
...
Reverts langchain-ai/langchain#32880
Not needed at the moment, will do when finishing v1
2025-09-10 20:56:48 -04:00
Christophe Bornet
b274416441
chore: remove ruff target-version ( #32880 )
...
This is not needed anymore since `requires-python` was added when moving
to `uv`.
2025-09-10 11:12:30 -04:00
Mason Daugherty
c124e67325
chore(docs): update package READMEs ( #32869 )
...
- Fix badges
- Focus on agents
- Cut down fluff
2025-09-09 14:50:32 +00:00
Christophe Bornet
8b90eae455
chore(text-splitters): enable ruff docstring-code-format ( #32854 )
2025-09-08 16:40:11 -04:00
Christophe Bornet
0c3e8ccd0e
chore(text-splitters): select ALL rules with exclusions ( #32325 )
...
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-09-08 14:46:09 +00:00
Mason Daugherty
6b5fdfb804
release(text-splitters): 0.3.11 ( #32770 )
...
Fixes #32747
SpaCy integration test fixture was trying to use pip to download the
SpaCy language model (`en_core_web_sm`), but uv-managed environments
don't include pip by default. The test now fails if the model is not
installed instead of downloading it.
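
A minimal sketch of the "fail instead of download" approach, assuming the model is installed as an importable package. The helper `require_installed` is hypothetical and is not the fixture used in the repository.

```python
import importlib.util


def require_installed(package: str) -> None:
    """Raise if `package` cannot be imported, instead of trying to install it."""
    # find_spec returns None for a top-level package that is not installed.
    if importlib.util.find_spec(package) is None:
        msg = f"{package} is not installed; install it before running these tests"
        raise RuntimeError(msg)


# A stdlib module is always present, so this passes silently.
require_installed("json")
```

A test fixture could call such a helper with the model name and let the resulting error fail the test, rather than shelling out to pip.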
2025-08-31 23:00:05 +00:00
Christophe Bornet
e0a4af8d8b
docs(text-splitters): fix some docstrings ( #32767 )
2025-08-31 13:46:11 -05:00
Sydney Runkle
b26e52aa4d
chore(text-splitters): bump version of core ( #32740 )
2025-08-28 13:14:57 -04:00
Sydney Runkle
38cdd7a2ec
chore(text-splitters): relax max bound for langchain-core ( #32739 )
2025-08-28 13:05:47 -04:00
Mason Daugherty
3d08b6bd11
chore: address pytest-asyncio deprecation warnings + other nits ( #32696 )
...
amongst some incompatible linting rules
2025-08-26 15:51:38 -04:00
Maitrey Talware
622337a297
docs(docs): fixed typos in documentations ( #32661 )
...
Minor typo fixes. (Not linked to current open issues)
2025-08-25 10:02:53 -04:00
Christophe Bornet
73a7de63aa
chore(text-splitters): add mypy pydantic plugin ( #32611 )
2025-08-19 16:58:12 -04:00
Keyu Chen
03138f41a0
feat(text-splitters): add optional custom header pattern support ( #31887 )
...
## Description
This PR adds support for custom header patterns in
`MarkdownHeaderTextSplitter`, allowing users to define non-standard
Markdown header formats (like `**Header**`) and specify their hierarchy
levels.
**Issue:** Fixes #22738
**Dependencies:** None - this change has no new dependencies
**Key Changes:**
- Added optional `custom_header_patterns` parameter to support
non-standard header formats
- Enable splitting on patterns like `**Header**` and `***Header***`
- Maintain full backward compatibility with existing usage
- Added comprehensive tests for custom and mixed header scenarios
## Example Usage
```python
from langchain_text_splitters import MarkdownHeaderTextSplitter

headers_to_split_on = [
    ("**", "Chapter"),
    ("***", "Section"),
]
custom_header_patterns = {
    "**": 1,  # Level 1 headers
    "***": 2,  # Level 2 headers
}
splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=headers_to_split_on,
    custom_header_patterns=custom_header_patterns,
)
# Now **Chapter 1** is treated as a level 1 header
# And ***Section 1.1*** is treated as a level 2 header
```
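
To make the matching idea concrete, here is a small standalone sketch of how a line such as `**Chapter 1**` could be recognised as a header for a given delimiter. It is not the library's implementation; the helper `match_custom_header` and its regex are assumptions for illustration only.

```python
from __future__ import annotations

import re


def match_custom_header(line: str, delimiter: str) -> str | None:
    """Return the header text if `line` is exactly `<delimiter>text<delimiter>`."""
    escaped = re.escape(delimiter)
    # Require a non-asterisk character at both ends of the text so that
    # '***Section***' is not mistaken for a '**' header.
    pattern = rf"^{escaped}(?!\*)(.+?)(?<!\*){escaped}$"
    match = re.match(pattern, line.strip())
    return match.group(1).strip() if match else None


for text in ["**Chapter 1**", "***Section 1.1***", "plain text"]:
    print(text, "->", match_custom_header(text, "**"))
```

The lookarounds are one possible way to keep `**` and `***` from overlapping; a real implementation may resolve that differently, for example by trying longer delimiters first.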
## Testing
- ✅ Added unit tests for custom header patterns
- ✅ Added tests for mixed standard and custom headers
- ✅ All existing tests pass (backward compatibility maintained)
- ✅ Linting and formatting checks pass
---
The implementation provides a flexible solution while maintaining the
simplicity of the existing API. Users can continue using the splitter
exactly as before, with the new functionality being entirely opt-in
through the `custom_header_patterns` parameter.
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Claude <noreply@anthropic.com>
2025-08-18 10:10:49 -04:00
Christophe Bornet
4656f727da
chore(text-splitters): add mypy warn_unreachable ( #32558 )
2025-08-15 09:45:20 -04:00