Files
langchain/libs/text-splitters
Mason Daugherty 07fa576de1 ci: avoid unnecessary dep installs in lint targets (#36046)
CI lint jobs use `uv run --all-groups` for all tools, but ruff doesn't
need dependency resolution — only mypy does. By splitting into
`UV_RUN_LINT` (ruff) and `UV_RUN_TYPE` (mypy), the CI-facing targets run
ruff with `--group lint` only, giving fast-fail feedback before mypy
triggers the full environment sync.

For packages where source code only conditionally imports heavy deps
(text-splitters, huggingface), `lint_package` also overrides
`UV_RUN_TYPE` to `--group lint --group typing`, skipping the ~3.5GB
`test_integration` download entirely. `lint_tests` keeps `--all-groups`
since test code legitimately imports those deps.

Additionally, `lint_imports.sh` was inconsistently wired — most packages
had the script but weren't calling it.

## Changes

**Makefile optimization**
- Introduce `UV_RUN_LINT` and `UV_RUN_TYPE` Make variables, both
defaulting to `uv run --all-groups`. For `lint_package` and
`lint_tests`, `UV_RUN_LINT` is overridden to `uv run --group lint` so
ruff runs instantly without syncing heavy deps
- For `text-splitters` and `huggingface`, override `UV_RUN_TYPE` on
`lint_package` to `uv run --group lint --group typing` — mypy runs
without downloading torch, CUDA, spacy, etc.

**mypy config for lean groups**
- Add `transformers` and `transformers.*` to `ignore_missing_imports` in
`text-splitters` pyproject.toml (conditional `try/except` import, same
treatment as existing `konlpy`/`nltk` entries)
- Add `torch`, `torch.*`, `langchain_community`, `langchain_community.*`
to `ignore_missing_imports` in `huggingface` pyproject.toml
- Add dual `# type: ignore[unreachable, unused-ignore]` in
`text-splitters/base.py` to handle the `PreTrainedTokenizerBase`
isinstance check that behaves differently depending on whether
transformers is installed

**lint_imports.sh consistency**
- Add `./scripts/lint_imports.sh` to the lint recipe in every package
that wasn't calling it (standard-tests, model-profiles, all 15
partners), and create the script for the two packages missing it
entirely (`model-profiles`, `openrouter`)
- Update all `lint_imports.sh` scripts to allow `from langchain.agents`
and `from langchain.tools` imports (legitimate v1 middleware
dependencies used by `langchain-anthropic` and `langchain-openai`)
2026-03-17 21:23:29 -04:00
..
2026-01-13 01:54:11 -05:00

🦜✂️ LangChain Text Splitters

PyPI - Version PyPI - License PyPI - Downloads Twitter

Looking for the JS/TS version? Check out LangChain.js.

Quick Install

pip install langchain-text-splitters

🤔 What is this?

LangChain Text Splitters contains utilities for splitting into chunks a wide variety of text documents.

📖 Documentation

For full documentation, see the API reference.

📕 Releases & Versioning

See our Releases and Versioning policies.

We encourage pinning your version to a specific version in order to avoid breaking your CI when we publish new tests. We recommend upgrading to the latest version periodically to make sure you have the latest tests.

Not pinning your version will ensure you always have the latest tests, but it may also break your CI if we introduce tests that your integration doesn't pass.

💁 Contributing

As an open-source project in a rapidly developing field, we are extremely open to contributions, whether it be in the form of a new feature, improved infrastructure, or better documentation.

For detailed information on how to contribute, see the Contributing Guide.