Commit Graph

16 Commits

Author SHA1 Message Date
Christophe Bornet
0c3e8ccd0e chore(text-splitters): select ALL rules with exclusions (#32325)
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-09-08 14:46:09 +00:00
Mason Daugherty
457ce9c4b0 feat(text-splitters): ruff fixes and rules (#32502) 2025-08-11 13:28:22 -04:00
Fabio Fontana
fd168e1c11 feat(text-splitters): add Visual Basic 6 support (#31173)
### **Description**

Add Visual Basic 6 support.

---

### **Issue**

No specific issue addressed.

---

### **Dependencies**

No additional dependencies required.

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-07-14 13:51:16 +00:00
Christophe Bornet
060fc0e3c9 text-splitters: Add ruff rules FBT (#31935)
See [flake8-boolean-trap
(FBT)](https://docs.astral.sh/ruff/rules/#flake8-boolean-trap-fbt)
2025-07-09 18:36:58 -04:00
Michael Li
5b3e29f809 text splitters: add chunk_size and chunk_overlap validations (#31916)
Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, core, etc. is being
modified. Use "docs: ..." for purely docs changes, "infra: ..." for CI
changes.
  - Example: "core: add foobar LLM"


- [x] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, eyurtsev, ccurme, vbarda, hwchase17.
2025-07-08 12:22:33 -04:00
Christophe Bornet
451c90fefa text-splitters: Ruff autofixes (#31858)
Auto-fixes from ruff with rule `ALL`
2025-07-07 10:06:08 -04:00
Christophe Bornet
802d2bf249 text-splitters: Add ruff rule UP (pyupgrade) (#31841)
See https://docs.astral.sh/ruff/rules/#pyupgrade-up
All auto-fixed except `typing.AbstractSet` -> `collections.abc.Set`
2025-07-03 10:11:35 -04:00
Christophe Bornet
eab8484a80 text-splitters[patch]: fix some import-untyped errors (#31030) 2025-05-15 11:34:22 -04:00
Christophe Bornet
8c5ae108dd text-splitters: Set strict mypy rules (#30900)
* Add strict mypy rules
* Fix mypy violations
* Add error codes to all type ignores
* Add ruff rule PGH003
* Bump mypy version to 1.15
2025-04-22 20:41:24 -07:00
keshavshrikant
e8be3cca5c fix huggingface tokenizer default length function (#30185)
#30184
2025-03-31 11:54:30 -04:00
Ankit Dangi
90f162efb6 text-splitters: add pydocstyle linting (#28127)
As seen in #23188, turned on Google-style docstrings by enabling
`pydocstyle` linting in the `text-splitters` package. Each resulting
linting error was addressed differently: ignored, resolved, suppressed,
and missing docstrings were added.

Fixes one of the checklist items from #25154, similar to #25939 in
`core` package. Ran `make format`, `make lint` and `make test` from the
root of the package `text-splitters` to ensure no issues were found.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-09 06:01:03 +00:00
Tamir Zitman
b3e1378f2b langchain : text_splitters Added PowerShell (#24582)
- **Description:** Added PowerShell support for text splitters language
include docs relevant update
  - **Issue:** None
  - **Dependencies:** None

---------

Co-authored-by: tzitman <tamir.zitman@intel.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-07-30 16:13:52 +00:00
Max Mulatz
058a64c563 Community[minor]: Add language parser for Elixir (#22742)
Hi 👋 

First off, thanks a ton for your work on this 💚 Really appreciate what
you're providing here for the community.

## Description

This PR adds a basic language parser for the
[Elixir](https://elixir-lang.org/) programming language. The parser code
is based upon the approach outlined in
https://github.com/langchain-ai/langchain/pull/13318: it's using
`tree-sitter` under the hood and aligns with all the other `tree-sitter`
based parses added that PR.

The `CHUNK_QUERY` I'm using here is probably not the most sophisticated
one, but it worked for my application. It's a starting point to provide
"core" parsing support for Elixir in LangChain. It enables people to use
the language parser out in real world applications which may then lead
to further tweaking of the queries. I consider this PR just the ground
work.

- **Dependencies:** requires `tree-sitter` and `tree-sitter-languages`
from the extended dependencies
- **Twitter handle:**`@bitcrowd`

## Checklist

- [x] **PR title**: "package: description"
- [x] **Add tests and docs**
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified.

<!-- If no one reviews your PR within a few days, please @-mention one
of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17. -->
2024-06-10 15:56:57 +00:00
Christos Boulmpasakos
c3bcfad66d text-splitters[patch]: Extend TextSplitter:keep_separator functionality (#21130)
**Description:** Added extra functionality to `CharacterTextSplitter`,
`TextSplitter` classes.
The user can select whether to append the separator to the previous
chunk with `keep_separator='end' ` or else prepend to the next chunk.
Previous functionality prepended by default to next chunk.
  
**Issue:** Fixes #20908

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-05-22 13:17:45 -07:00
Nisarg Trivedi
1252ccce6f text-splitters[minor]: Added Haskell support in langchain.text_splitter module (#16191)
- **Description:** Haskell language support added in text_splitter
module
  - **Dependencies:** No
  - **Twitter handle:** @nisargtr

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-29 20:17:50 +00:00
Bagatur
5efb5c099f text-splitters[minor], langchain[minor], community[patch], templates, docs: langchain-text-splitters 0.0.1 (#18346) 2024-02-29 18:33:21 -08:00