Mason Daugherty
7a158c7f1c
revert: "chore: remove ruff target-version" ( #32895 )
...
Reverts langchain-ai/langchain#32880
Not needed at the moment, will do when finishing v1
2025-09-10 20:56:48 -04:00
Christophe Bornet
b274416441
chore: remove ruff target-version ( #32880 )
...
This is not needed anymore since `requires-python` was added when moving
to `uv`.
2025-09-10 11:12:30 -04:00
Christophe Bornet
8b90eae455
chore(text-splitters): enable ruff docstring-code-format ( #32854 )
2025-09-08 16:40:11 -04:00
Christophe Bornet
0c3e8ccd0e
chore(text-splitters): select ALL rules with exclusions ( #32325 )
...
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-09-08 14:46:09 +00:00
Mason Daugherty
6b5fdfb804
release(text-splitters): 0.3.11 ( #32770 )
...
Fixes #32747
The spaCy integration test fixture was trying to use pip to download the
spaCy language model (`en_core_web_sm`), but uv-managed environments
don't include pip by default. The test now fails if the model is not
installed instead of downloading it.
2025-08-31 23:00:05 +00:00
Sydney Runkle
b26e52aa4d
chore(text-splitters): bump version of core ( #32740 )
2025-08-28 13:14:57 -04:00
Sydney Runkle
38cdd7a2ec
chore(text-splitters): relax max bound for langchain-core ( #32739 )
2025-08-28 13:05:47 -04:00
Christophe Bornet
73a7de63aa
chore(text-splitters): add mypy pydantic plugin ( #32611 )
2025-08-19 16:58:12 -04:00
Christophe Bornet
4656f727da
chore(text-splitters): add mypy warn_unreachable ( #32558 )
2025-08-15 09:45:20 -04:00
Christophe Bornet
8b663ed6c6
chore(text-splitters): bump mypy version to 1.17 ( #32387 )
...
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-08-11 22:24:49 +00:00
Mason Daugherty
457ce9c4b0
feat(text-splitters): ruff fixes and rules ( #32502 )
2025-08-11 13:28:22 -04:00
Mason Daugherty
77c981999e
fix(text-splitters): update langchain-core version to 0.3.72
2025-07-24 10:35:07 -04:00
Christophe Bornet
060fc0e3c9
text-splitters: Add ruff rules FBT ( #31935 )
...
See [flake8-boolean-trap (FBT)](https://docs.astral.sh/ruff/rules/#flake8-boolean-trap-fbt)
2025-07-09 18:36:58 -04:00
Christophe Bornet
a15c3e0856
text-splitters: Bump ruff version to 0.12 ( #31866 )
2025-07-05 17:13:08 -04:00
Christophe Bornet
ee3709535d
text-splitters: bump spacy version to 3.8.7 ( #31834 )
...
This allows using spaCy with Python 3.13.
2025-07-03 10:13:25 -04:00
Christophe Bornet
802d2bf249
text-splitters: Add ruff rule UP (pyupgrade) ( #31841 )
...
See https://docs.astral.sh/ruff/rules/#pyupgrade-up
All auto-fixed except `typing.AbstractSet` -> `collections.abc.Set`
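For illustration only, a minimal sketch of what that manual change looks like. The function `shared_tokens` is a hypothetical example, not code from this package. The point is that the replacement type has a different name (`Set` instead of `AbstractSet`), so the annotations themselves change, not just the import line.

```python
# Before (deprecated alias):
#     from typing import AbstractSet
#     def shared_tokens(a: AbstractSet[str], b: AbstractSet[str]) -> AbstractSet[str]: ...
#
# After: the abstract base class is imported from collections.abc.
from collections.abc import Set


def shared_tokens(a: Set[str], b: Set[str]) -> Set[str]:
    """Return the tokens present in both collections (hypothetical helper).

    Annotating with the abstract Set accepts set, frozenset, and dict key views.
    """
    return a & b


# Any read-only set-like value is accepted, not only the built-in set.
common = shared_tokens({"<s>", "</s>", "hello"}, frozenset({"<s>", "</s>"}))
```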
2025-07-03 10:11:35 -04:00
Eugene Yurtsev
10ec5c8f02
text-splitters: 0.3.9 ( #31844 )
...
Release langchain-text-splitters 0.3.9
2025-07-03 10:02:35 -04:00
Christophe Bornet
eab8484a80
text-splitters[patch]: fix some import-untyped errors ( #31030 )
2025-05-15 11:34:22 -04:00
Sydney Runkle
7e926520d5
packaging: remove Python upper bound for langchain and co libs ( #31025 )
...
Follow-up to https://github.com/langchain-ai/langsmith-sdk/pull/1696.
I've bumped the `langsmith` version where applicable in `uv.lock`.
There are type checking problems here because deps have been updated in
`pyproject.toml` and `uv lock` hasn't been run. We should enforce that
in the future; it goes with the other dependabot todos :).
2025-04-28 14:44:28 -04:00
Christophe Bornet
8c5ae108dd
text-splitters: Set strict mypy rules ( #30900 )
...
* Add strict mypy rules
* Fix mypy violations
* Add error codes to all type ignores
* Add ruff rule PGH003
* Bump mypy version to 1.15
2025-04-22 20:41:24 -07:00
ccurme
0c2c8c36c1
text-splitters: release 0.3.8 ( #30671 )
2025-04-04 09:58:45 -04:00
ccurme
958f85d541
text-splitters: release 0.3.7 ( #30347 )
2025-03-18 19:11:37 +00:00
Erick Friis
1a225fad03
multiple: fix uv path deps ( #29790 )
...
The file:// format wasn't working with updates: it doesn't install as an
editable dep.
Move to tool.uv.sources with path= instead.
2025-02-13 21:32:34 +00:00
ccurme
e1b593ae77
text-splitters[patch]: release 0.3.6 ( #29647 )
2025-02-06 16:16:05 -05:00
ccurme
a91e58bc10
core: release 0.3.34 ( #29644 )
2025-02-06 15:53:56 -05:00
ccurme
d172984c91
infra: migrate to uv ( #29566 )
2025-02-06 13:36:26 -05:00
Christophe Bornet
836c791829
text-splitters: Bump ruff version to 0.9 ( #29231 )
...
Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-22 00:27:58 +00:00
ccurme
55677e31f7
text-splitters[patch]: release 0.3.5 ( #29054 )
...
Resolves https://github.com/langchain-ai/langchain/issues/29053
2025-01-07 09:48:26 -05:00
Bagatur
1c797ac68f
infra: speed up unit tests ( #28974 )
...
Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-02 04:13:08 +00:00
Erick Friis
9b024d00c9
text-splitters: release 0.3.4 ( #28795 )
2024-12-18 09:44:36 -08:00
Antonio Lanza
b2102b8cc4
text-splitters: Inconsistent results with NLTKTextSplitter's add_start_index=True ( #27782 )
...
This PR closes #27781
# Problem
The current implementation of `NLTKTextSplitter` uses `sent_tokenize`.
However, `sent_tokenize` does not preserve the chars between two
tokenized sentences, which causes errors when `add_start_index=True` is
used, as described in issue #27781. In particular:
```python
from nltk.tokenize import sent_tokenize
output1 = sent_tokenize("Innovation drives our success. Collaboration fosters creative solutions. Efficiency enhances data management.", language="english")
print(output1)
output2 = sent_tokenize("Innovation drives our success. Collaboration fosters creative solutions. Efficiency enhances data management.", language="english")
print(output2)
# Both calls print:
# ['Innovation drives our success.', 'Collaboration fosters creative solutions.', 'Efficiency enhances data management.']
```
# Solution
With this new `use_span_tokenize` parameter, we can use NLTK to create
sentences (with `span_tokenize`), but also add the extra chars between
them, so that we can still map the chunks to the original text.
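The idea can be sketched without NLTK. This is an illustrative sketch under stated assumptions, not the code from this PR: `naive_sentence_spans` is a hypothetical regex-based stand-in for a tokenizer's `span_tokenize`, and `split_keeping_gaps` and `start_indexes` are hypothetical helpers. Each chunk is an exact slice of the original text, so it can always be located again.

```python
import re
from typing import Iterator


def naive_sentence_spans(text: str) -> Iterator[tuple[int, int]]:
    """Yield (start, end) offsets of sentences.

    Hypothetical stand-in for span_tokenize: a sentence starts at a
    non-space character and runs through its closing punctuation.
    """
    for match in re.finditer(r"[^\s.!?][^.!?]*[.!?]*", text):
        yield match.span()


def split_keeping_gaps(text: str) -> list[str]:
    """Split into sentences, prefixing each one with the characters that
    separate it from the previous sentence."""
    chunks = []
    prev_end = None
    for start, end in naive_sentence_spans(text):
        # The first sentence has no predecessor, so it gets no prefix.
        chunk_start = start if prev_end is None else prev_end
        chunks.append(text[chunk_start:end])
        prev_end = end
    return chunks


def start_indexes(text: str, chunks: list[str]) -> list[int]:
    """Locate each chunk in the original text, scanning left to right."""
    indexes = []
    cursor = 0
    for chunk in chunks:
        index = text.find(chunk, cursor)
        indexes.append(index)
        cursor = index + len(chunk)
    return indexes


text = "One.  Two.\nThree."
chunks = split_keeping_gaps(text)
print(chunks)
print(start_indexes(text, chunks))
```

Because the separating characters travel with the following sentence, every chunk appears verbatim in the source text and none of the lookups returns -1.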
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Erick Friis <erickfriis@gmail.com>
2024-12-16 19:53:15 +00:00
Erick Friis
8ec1c72e03
text-splitters: test without socket ( #28732 )
2024-12-15 22:10:35 +00:00
Bagatur
679e3a9970
text-splitters[patch]: Release 0.3.3 ( #28723 )
2024-12-14 19:20:22 +00:00
Ankit Dangi
90f162efb6
text-splitters: add pydocstyle linting ( #28127 )
...
As seen in #23188, turned on Google-style docstrings by enabling
`pydocstyle` linting in the `text-splitters` package. Each resulting
linting error was handled case by case: ignored, resolved, or
suppressed, and missing docstrings were added.
Fixes one of the checklist items from #25154, similar to #25939 in the
`core` package. Ran `make format`, `make lint` and `make test` from the
root of the `text-splitters` package to ensure no issues were found.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-09 06:01:03 +00:00
Bagatur
ee63d21915
many: use core 0.3.15 ( #27834 )
2024-11-01 20:35:55 +00:00
Bagatur
8f4423e042
text-splitters[patch]: Release 0.3.1 ( #27726 )
2024-10-30 00:04:48 +00:00
Erick Friis
600b7bdd61
all: test 3.13 ci ( #27197 )
...
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-10-25 12:56:58 -07:00
Erick Friis
92ae61bcc8
multiple: rely on asyncio_mode auto in tests ( #27200 )
2024-10-15 16:26:38 +00:00
ccurme
d1462badaf
text-splitters: release 0.3 ( #26460 )
...
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-09-13 22:31:06 +00:00
Erick Friis
c2a3021bb0
multiple: pydantic 2 compatibility, v0.3 ( #26443 )
...
Signed-off-by: ChengZi <chen.zhang@zilliz.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Dan O'Donovan <dan.odonovan@gmail.com>
Co-authored-by: Tom Daniel Grande <tomdgrande@gmail.com>
Co-authored-by: Grande <Tom.Daniel.Grande@statsbygg.no>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Tomaz Bratanic <bratanic.tomaz@gmail.com>
Co-authored-by: ZhangShenao <15201440436@163.com>
Co-authored-by: Friso H. Kingma <fhkingma@gmail.com>
Co-authored-by: ChengZi <chen.zhang@zilliz.com>
Co-authored-by: Nuno Campos <nuno@langchain.dev>
Co-authored-by: Morgante Pell <morgantep@google.com>
2024-09-13 14:38:45 -07:00
Bagatur
fb642e1e27
text-splitters[patch]: Release 0.2.4 ( #25979 )
2024-09-03 18:09:43 +00:00
Bagatur
dd8e4cd020
text-splitters[patch]: Release 0.2.3 ( #24998 )
2024-08-02 20:27:22 +00:00
Erick Friis
3dce2e1d35
all: add release notes to pypi ( #24519 )
2024-07-22 13:59:13 -07:00
Bagatur
a0c2281540
infra: update mypy 1.10, ruff 0.5 ( #23721 )
...
```python
"""python scripts/update_mypy_ruff.py"""
import glob
import tomllib
from pathlib import Path

import toml
import subprocess
import re

ROOT_DIR = Path(__file__).parents[1]


def main():
    for path in glob.glob(str(ROOT_DIR / "libs/**/pyproject.toml"), recursive=True):
        print(path)
        with open(path, "rb") as f:
            pyproject = tomllib.load(f)
        try:
            pyproject["tool"]["poetry"]["group"]["typing"]["dependencies"]["mypy"] = (
                "^1.10"
            )
            pyproject["tool"]["poetry"]["group"]["lint"]["dependencies"]["ruff"] = (
                "^0.5"
            )
        except KeyError:
            continue
        with open(path, "w") as f:
            toml.dump(pyproject, f)
        cwd = "/".join(path.split("/")[:-1])
        completed = subprocess.run(
            "poetry lock --no-update; poetry install --with typing; poetry run mypy . --no-color",
            cwd=cwd,
            shell=True,
            capture_output=True,
            text=True,
        )
        logs = completed.stdout.split("\n")

        to_ignore = {}
        for l in logs:
            if re.match(r"^(.*)\:(\d+)\: error:.*\[(.*)\]", l):
                path, line_no, error_type = re.match(
                    r"^(.*)\:(\d+)\: error:.*\[(.*)\]", l
                ).groups()
                if (path, line_no) in to_ignore:
                    to_ignore[(path, line_no)].append(error_type)
                else:
                    to_ignore[(path, line_no)] = [error_type]
        print(len(to_ignore))
        for (error_path, line_no), error_types in to_ignore.items():
            all_errors = ", ".join(error_types)
            full_path = f"{cwd}/{error_path}"
            try:
                with open(full_path, "r") as f:
                    file_lines = f.readlines()
            except FileNotFoundError:
                continue
            file_lines[int(line_no) - 1] = (
                file_lines[int(line_no) - 1][:-1] + f" # type: ignore[{all_errors}]\n"
            )
            with open(full_path, "w") as f:
                f.write("".join(file_lines))

        subprocess.run(
            "poetry run ruff format .; poetry run ruff --select I --fix .",
            cwd=cwd,
            shell=True,
            capture_output=True,
            text=True,
        )


if __name__ == "__main__":
    main()
```
2024-07-03 10:33:27 -07:00
ccurme
6f7fe82830
text-splitters: release 0.2.2 ( #23508 )
2024-06-25 18:26:05 -04:00
Erick Friis
a24a9c6427
multiple: get rid of pyproject extras ( #22581 )
...
They cause `poetry lock` to take a ton of time, and `uv pip install` can
resolve the constraints from these toml files in trivial time
(addressing the problem with #19153).
This allows us to properly upgrade lockfile dependencies moving forward,
which revealed some issues that were either fixed or type-ignored (see
file comments).
2024-06-06 15:45:22 -07:00
Bagatur
99a3cad258
text-splitters[patch]: Release 0.2.1 ( #22490 )
2024-06-04 11:19:21 -07:00
ccurme
0e72ed39a0
infra: fix CI on text-splitters ( #21935 )
2024-05-20 14:03:42 -07:00
Erick Friis
1b555021f7
text-splitters: release 0.2.0 ( #21832 )
2024-05-17 13:30:54 -07:00
Erick Friis
c77d2f2b06
multiple: core 0.2 nonbreaking dep, check_diff community->langchain dep ( #21646 )
...
0.2 is not a breaking release for core (but it is for langchain and
community)
To keep the core+langchain+community packages in sync at 0.2, we will
relax deps throughout the ecosystem to tolerate `langchain-core` 0.2
2024-05-13 19:50:36 -07:00