langchain

mirror of https://github.com/hwchase17/langchain.git synced 2026-06-09 10:17:00 +00:00

Author	SHA1	Message	Date
Mason Daugherty	5a9b1ec2dc	refactor(langchain-classic): retarget deprecations to `create_agent`, other chores (#37164 ) Sweep classic deprecations so every removal lands on `2.0.0`, runtime warnings carry the auto-generated since/removal/alternative line, and replacements steer at `langchain.agents.create_agent` and `with_structured_output(...)` instead of pre-v1 LangGraph + `python.langchain.com` links. ## Changes - Bump removal targets from `1.0` / `1.0.0` to `2.0.0` across agents, chains, memory, retrievers, structured-output, vectorstore toolkits, and the `langchain_classic._api.module_import` shim — gives users a real runway now that v1 has shipped. - Move bespoke `message=` strings onto `addendum=` (or split into `alternative=` + `addendum=`). `warn_deprecated` skips the auto-generated since/removal/alternative line whenever `message=` is set, so the prior pattern silently dropped that info from the runtime `LangChainDeprecationWarning`. Matches the pattern already used in `HTMLHeaderTextSplitter.split_text_from_url`, which is updated for consistency. - Repoint `alternative=` at v1 replacements: chains/memory/agent toolkits → `langchain.agents.create_agent` (with checkpointer or retrieval-tool guidance in the addendum); `openai_functions` and `chains/structured_output` → `ChatModel.with_structured_output(...)`; `openapi` chains → `ChatModel.bind_tools(...)` + HTTP client. `ConversationChain` no longer points at `RunnableWithMessageHistory`. - Refresh `AGENT_DEPRECATION_WARNING` in `langchain_classic._api.deprecation` — drop stale LangGraph and `python.langchain.com` links in favor of `langchain.agents.create_agent` and the `docs.langchain.com/oss/python/migrate/langchain-v1` guide. Propagates to all 13 caller sites in `agents/`. 
- Newly deprecate `langchain_classic.chat_models.init_chat_model` and `langchain_classic.embeddings.init_embeddings` with the framing "maintained in `langchain`; `langchain-classic` retains this entry point for import-compatibility only". The classic docstring examples and the warning admonition both point at `langchain.chat_models`. - Improve `init_chat_model` docstrings in both `langchain_v1` and the classic copy: clarify `provider:model` prefix vs. `model_provider=`, recommend pinned IDs over moving aliases, add the `upstage` provider row, and refresh examples to GA models (`gpt-5.5`, `claude-opus-4-7`). - Standardize partner Anthropic deprecations: replace `AnthropicLLM`'s `model_validator(raise_warning)` with `@deprecated(since="0.1.0", removal="2.0.0", alternative="ChatAnthropic")`, and pin the `ChatAnthropic` `output_format` runtime warning at `langchain-anthropic 2.0.0` instead of "a future version".	2026-05-03 13:15:59 -04:00
Deepak Bhagat	cd80a805b2	fix(text-splitters): remove invalid and duplicate separators in Kotlin, Rust, and Haskell (#37039 ) ## Summary Fixes four issues in `get_separators_for_language()` in `character.py`: - Kotlin: removed `"\ncase "` — `case` is not a Kotlin keyword. Kotlin uses `when` expressions (already present in the list). This was copied from Java/Swift. - Rust: removed duplicate `"\nconst "` — appeared twice, once under function definitions and again under control flow statements. - Haskell: removed duplicate `"\n:: "` — appeared under function definitions and again under type declarations. - Haskell: removed duplicate `"\ndata "` — appeared under type declarations and again under record field declarations. All four are dead separators that never match or produce redundant splits. ## Issue Closes #37038 ## Types of changes - [x] Bug fix ## Checklist - [x] I have read the CONTRIBUTING doc - [x] Lint and unit tests pass locally with my changes	2026-04-27 15:08:12 -04:00
Dayna Blackwell	3b9750f0a4	fix(text-splitters): remove incorrect C# and Elixir separator keywords (#37037 ) ## Summary Removes two incorrect separators from `get_separators_for_language()` in `RecursiveCharacterTextSplitter`: - C#: `"\nimplements "` is a Java keyword. C# uses `:` for interface implementation. This separator never matches valid C# source code. - Elixir: `"\nwhile "` does not exist in Elixir. The language uses recursion and `Enum.reduce_while/3` instead of while loops. Both are dead separators that silently degrade chunking quality by occupying positions in the separator priority list without contributing useful split points. ## Tests Added two targeted tests: - `test_csharp_separators_no_java_keywords`: verifies `"\nimplements "` is not in the C# separator list - `test_elixir_separators_no_while`: verifies `"\nwhile "` is not in the Elixir separator list Existing `test_csharp_code_splitter` continues to pass (no change to expected output since `implements` never matched valid C# code). Full suite: 129 passed, 0 failed. Fixes #37030	2026-04-27 13:48:19 -04:00
ccurme	c289bf10e9	fix(text-splitters): deprecate and use SSRF-safe transport in split_text_from_url (#36821 )	2026-04-16 10:13:31 -04:00
Mohammad Mohtashim	eb28ae1b20	fix(text-splitters): prevent silent data loss for empty dict values in RecursiveJsonSplitter (#35079 )	2026-03-28 21:27:53 -04:00
Mason Daugherty	07fa576de1	ci: avoid unnecessary dep installs in lint targets (#36046 ) CI lint jobs use `uv run --all-groups` for all tools, but ruff doesn't need dependency resolution — only mypy does. By splitting into `UV_RUN_LINT` (ruff) and `UV_RUN_TYPE` (mypy), the CI-facing targets run ruff with `--group lint` only, giving fast-fail feedback before mypy triggers the full environment sync. For packages where source code only conditionally imports heavy deps (text-splitters, huggingface), `lint_package` also overrides `UV_RUN_TYPE` to `--group lint --group typing`, skipping the ~3.5GB `test_integration` download entirely. `lint_tests` keeps `--all-groups` since test code legitimately imports those deps. Additionally, `lint_imports.sh` was inconsistently wired — most packages had the script but weren't calling it. ## Changes Makefile optimization - Introduce `UV_RUN_LINT` and `UV_RUN_TYPE` Make variables, both defaulting to `uv run --all-groups`. For `lint_package` and `lint_tests`, `UV_RUN_LINT` is overridden to `uv run --group lint` so ruff runs instantly without syncing heavy deps - For `text-splitters` and `huggingface`, override `UV_RUN_TYPE` on `lint_package` to `uv run --group lint --group typing` — mypy runs without downloading torch, CUDA, spacy, etc. 
mypy config for lean groups - Add `transformers` and `transformers.*` to `ignore_missing_imports` in `text-splitters` pyproject.toml (conditional `try/except` import, same treatment as existing `konlpy`/`nltk` entries) - Add `torch`, `torch.*`, `langchain_community`, `langchain_community.*` to `ignore_missing_imports` in `huggingface` pyproject.toml - Add dual `# type: ignore[unreachable, unused-ignore]` in `text-splitters/base.py` to handle the `PreTrainedTokenizerBase` isinstance check that behaves differently depending on whether transformers is installed lint_imports.sh consistency - Add `./scripts/lint_imports.sh` to the lint recipe in every package that wasn't calling it (standard-tests, model-profiles, all 15 partners), and create the script for the two packages missing it entirely (`model-profiles`, `openrouter`) - Update all `lint_imports.sh` scripts to allow `from langchain.agents` and `from langchain.tools` imports (legitimate v1 middleware dependencies used by `langchain-anthropic` and `langchain-openai`)	2026-03-17 21:23:29 -04:00
Mason Daugherty	2bad58a809	chore: bump locks, lint (#35985 )	2026-03-16 23:59:08 -04:00
Tejas Attarde	d6dbcf6294	perf(.github): set a timeout on get min versions HTTP calls (#35851 ) During an automated code review of .github/scripts/get_min_versions.py, the following issue was identified. Set a timeout on get min versions HTTP calls. Network calls without a timeout can hang a worker indefinitely. I kept the patch small and re-ran syntax checks after applying it.	2026-03-13 17:24:32 -04:00
Maxime Grenu	8951c01fe8	fix(text-splitters): prevent JSFrameworkTextSplitter from mutating self._separators on each split_text() call (#35316 )	2026-02-18 17:51:42 -05:00
corridor-security[bot]	1493b4c5ee	fix: Server-Side Request Forgery (SSRF) in HTMLHeaderTextSplitter.split_text_from_url (#35196 )	2026-02-12 18:48:05 -05:00
Katha	253398ebca	feat(text-splitters): add model_kwargs to SentenceTransformersTokenTextSplitter (#35113 )	2026-02-11 12:26:58 -05:00
Rohan Disa	16f2c7d13b	fix(text-splitters): reverse preserved elements iterator in `HTMLSemanticPreservingSplitter` (#34080 )	2026-02-02 18:25:39 -05:00
Mason Daugherty	51f13f7bff	style(text-splitters): lint (#34865 )	2026-01-23 23:07:36 -05:00
Christophe Bornet	fd69425439	style(text-splitters): fix some ruff preview rules (#34665 ) Co-authored-by: Mason Daugherty <mason@langchain.dev> Co-authored-by: Mason Daugherty <github@mdrxy.com>	2026-01-09 17:28:18 -05:00
Julia (Juli) Huang	cd5b36456a	fix(text-splitters): HTMLSemanticPreservingSplitter nested preserved … (#34587 ) Summary Fixes an issue where HTMLSemanticPreservingSplitter failed to preserve elements nested inside non-container tags. With these changes, preserved elements are now correctly detected and handled at any nesting depth. Root Cause `_process_element()` only recursed into a small set of hard-coded container tags (`html`, `body`, `div`, `main`). For other tags, the subtree was flattened into text, preventing nested preserved elements (inside `<p>`, `<section>`, `<article>`, etc.) from being detected. Fix - Updated traversal logic in _process_element (html.py) to recursively process child elements for any tag that contains nested elements - Avoided duplicate text extraction - Preserved correct placeholder ordering - Treated leaf nodes as text only Tests Adds regression tests covering preserved elements nested inside non-container tags, including: - table inside section - nested divs - code inside paragraph All existing tests pass (make lint, format, test, etc). Breaking changes None. Fixes Fixes #31569 Disclaimer GitHub Copilot was used to assist with test case design in test_text_splitters.py and documentation comments; all code logic was manually implemented and reviewed. --------- Co-authored-by: julih <julih@julihs-MacBook-Pro.local> Co-authored-by: Mason Daugherty <github@mdrxy.com> Co-authored-by: Mason Daugherty <mason@langchain.dev>	2026-01-05 10:28:27 -05:00
Christophe Bornet	5884fb9523	style(text-splitters,standard-tests,cli): add ruff TC and RUF012 rules (#34495 ) Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-12-27 01:41:33 -06:00
rari404	0f940d74b2	feat(text-splitters): add R programming language support (#34241 )	2025-12-12 13:34:22 -05:00
Christophe Bornet	ef79c26f18	chore(cli,standard-tests,text-splitters): fix some ruff TC rules (#33934 ) Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-11-12 14:06:31 -05:00
Mason Daugherty	d40e340479	chore: attribute package change versions (#33854 ) Needed to disambiguate for within inherited docs	2025-11-06 16:57:30 -05:00
Mason Daugherty	123e29dc26	style: more refs fixes (#33730 )	2025-10-29 16:34:46 -04:00
Mason Daugherty	e5e1d6c705	style: more refs work (#33707 )	2025-10-28 14:43:28 -04:00
Mason Daugherty	db7f2db1ae	feat(infra): langchain docs MCP (#33636 )	2025-10-22 11:50:35 -04:00
Mason Daugherty	e731ba1e47	style: more refs work (#33616 )	2025-10-20 18:40:19 -04:00
ccurme	3152d25811	fix: support python 3.14 in various projects (#33575 ) Co-authored-by: cbornet <cbornet@hotmail.com> Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-10-17 11:06:23 -04:00
Mason Daugherty	26e0a00c4c	style: more work for refs (#33508 ) Largely: - Remove explicit `"Default is x"` since new refs show default inferred from sig - Inline code (useful for eventual parsing) - Fix code block rendering (indentations)	2025-10-15 18:46:55 -04:00
Christophe Bornet	83901b30e3	chore(text-splitters): remove arg types from docstrings (#33406 ) Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-10-10 11:37:53 -04:00
Mason Daugherty	6fc21afbc9	style: `.. code-block::` admonition translations (#33400 ) biiiiiiiiiiiiiiiigggggggg pass	2025-10-09 16:52:58 -04:00
Mason Daugherty	d8a680ee57	style: address Sphinx double-backtick snippet syntax (#33389 )	2025-10-09 13:35:51 -04:00
Mason Daugherty	b6132fc23e	style: remove more `Optional` syntax (#33371 )	2025-10-08 23:28:43 -04:00
Mason Daugherty	d13823043d	style: monorepo pass for refs (#33359 ) * Delete some double backticks previously used by Sphinx (not done everywhere yet) * Fix some code blocks / dropdowns Ignoring CLI CI for now	2025-10-08 18:41:39 -04:00
Christophe Bornet	20e04fc3dd	chore(text-splitters): cleanup ruff config (#33247 ) Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-10-06 17:02:31 -04:00
Mason Daugherty	ae5b105d11	docs: v1 docs updates (#33173 ) Co-authored-by: Mohammad Mohtashim <45242107+keenborder786@users.noreply.github.com> Co-authored-by: Caspar Broekhuizen <caspar@langchain.dev> Co-authored-by: ccurme <chester.curme@gmail.com> Co-authored-by: Christophe Bornet <cbornet@hotmail.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Sadra Barikbin <sadraqazvin1@yahoo.com> Co-authored-by: Vadym Barda <vadim.barda@gmail.com>	2025-10-02 18:46:26 -04:00
Mason Daugherty	5e8cb58e6a	refactor(text-splitters): drop python 3.9 (#33212 )	2025-10-02 13:51:10 -04:00
Mason Daugherty	eaa6dcce9e	release: v1.0.0 (#32567 ) Co-authored-by: Mohammad Mohtashim <45242107+keenborder786@users.noreply.github.com> Co-authored-by: Caspar Broekhuizen <caspar@langchain.dev> Co-authored-by: ccurme <chester.curme@gmail.com> Co-authored-by: Christophe Bornet <cbornet@hotmail.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Sadra Barikbin <sadraqazvin1@yahoo.com> Co-authored-by: Vadym Barda <vadim.barda@gmail.com>	2025-10-02 10:49:42 -04:00
Hyunjoon Jeong	9cc85387d1	fix(text-splitters): add validation to prevent infinite loop and prevent empty token splitter (#32205 ) ### Description 1) Add validation requiring `tokenizer.tokens_per_chunk > tokenizer.chunk_overlap`, which prevents an infinite loop 2) Avoid emitting an empty decoded chunk when the splitter appends tokens --------- Co-authored-by: Eugene Yurtsev <eugene@langchain.dev> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2025-09-11 16:55:32 -04:00
Christophe Bornet	0c3e8ccd0e	chore(text-splitters): select ALL rules with exclusions (#32325 ) Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-09-08 14:46:09 +00:00
Christophe Bornet	e0a4af8d8b	docs(text-splitters): fix some docstrings (#32767 )	2025-08-31 13:46:11 -05:00
Maitrey Talware	622337a297	docs(docs): fixed typos in documentations (#32661 ) Minor typo fixes. (Not linked to current open issues)	2025-08-25 10:02:53 -04:00
Keyu Chen	03138f41a0	feat(text-splitters): add optional custom header pattern support (#31887 ) ## Description This PR adds support for custom header patterns in `MarkdownHeaderTextSplitter`, allowing users to define non-standard Markdown header formats (like `**Header**`) and specify their hierarchy levels. Issue: Fixes #22738 Dependencies: None - this change has no new dependencies Key Changes: - Added optional `custom_header_patterns` parameter to support non-standard header formats - Enable splitting on patterns like `**Header**` and `***Header***` - Maintain full backward compatibility with existing usage - Added comprehensive tests for custom and mixed header scenarios ## Example Usage ```python from langchain_text_splitters import MarkdownHeaderTextSplitter headers_to_split_on = [ ("**", "Chapter"), ("***", "Section"), ] custom_header_patterns = { "**": 1, # Level 1 headers "***": 2, # Level 2 headers } splitter = MarkdownHeaderTextSplitter( headers_to_split_on=headers_to_split_on, custom_header_patterns=custom_header_patterns, ) # Now **Chapter 1** is treated as a level 1 header # And ***Section 1.1*** is treated as a level 2 header ``` ## Testing - ✅ Added unit tests for custom header patterns - ✅ Added tests for mixed standard and custom headers - ✅ All existing tests pass (backward compatibility maintained) - ✅ Linting and formatting checks pass --- The implementation provides a flexible solution while maintaining the simplicity of the existing API. Users can continue using the splitter exactly as before, with the new functionality being entirely opt-in through the `custom_header_patterns` parameter. --------- Co-authored-by: Mason Daugherty <mason@langchain.dev> Co-authored-by: Claude <noreply@anthropic.com>	2025-08-18 10:10:49 -04:00
Christophe Bornet	4656f727da	chore(text-splitters): add mypy `warn_unreachable` (#32558 )	2025-08-15 09:45:20 -04:00
Mason Daugherty	457ce9c4b0	feat(text-splitters): ruff fixes and rules (#32502 )	2025-08-11 13:28:22 -04:00
Mason Daugherty	c31236264e	chore: formatting across codebase (#32466 )	2025-08-08 10:20:10 -04:00
Mason Daugherty	96cbd90cba	fix: formatting issues in docstrings (#32265 ) Ensures proper reStructuredText formatting by adding the required blank line before closing docstring quotes, which resolves the "Block quote ends without a blank line; unexpected unindent" warning.	2025-07-27 23:37:47 -04:00
tanwirahmad	622bb05751	fix(langchain): class HTMLSemanticPreservingSplitter ignores the text inside the div tag (#32213 ) Description: We collect the text from the "html", "body", "div", and "main" nodes, if they have any. Issue: Fixes #32206.	2025-07-24 10:09:03 -04:00
Fabio Fontana	fd168e1c11	feat(text-splitters): add Visual Basic 6 support (#31173 ) ### Description Add Visual Basic 6 support. --- ### Issue No specific issue addressed. --- ### Dependencies No additional dependencies required. --------- Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-07-14 13:51:16 +00:00
Christophe Bornet	060fc0e3c9	text-splitters: Add ruff rules FBT (#31935 ) See [flake8-boolean-trap (FBT)](https://docs.astral.sh/ruff/rules/#flake8-boolean-trap-fbt)	2025-07-09 18:36:58 -04:00
Michael Li	5b3e29f809	text splitters: add chunk_size and chunk_overlap validations (#31916 )	2025-07-08 12:22:33 -04:00
Christophe Bornet	451c90fefa	text-splitters: Ruff autofixes (#31858 ) Auto-fixes from ruff with rule `ALL`	2025-07-07 10:06:08 -04:00
Christophe Bornet	802d2bf249	text-splitters: Add ruff rule UP (pyupgrade) (#31841 ) See https://docs.astral.sh/ruff/rules/#pyupgrade-up All auto-fixed except `typing.AbstractSet` -> `collections.abc.Set`	2025-07-03 10:11:35 -04:00
Cole Murray	43eef43550	security: Remove xslt_path and harden XML parsers in HTMLSectionSplitter: package: langchain-text-splitters (#31819 ) ## Summary - Removes the `xslt_path` parameter from HTMLSectionSplitter to eliminate XXE attack vector - Hardens XML/HTML parsers with secure configurations to prevent XXE attacks - Adds comprehensive security tests to ensure the vulnerability is fixed ## Context This PR addresses a critical XXE vulnerability discovered in the HTMLSectionSplitter component. The vulnerability allowed attackers to: - Read sensitive local files (SSH keys, passwords, configuration files) - Perform Server-Side Request Forgery (SSRF) attacks - Exfiltrate data to attacker-controlled servers ## Changes Made 1. Removed `xslt_path` parameter - This eliminates the primary attack vector where users could supply malicious XSLT files 2. Hardened XML parsers - Added security configurations to prevent XXE attacks even with the default XSLT: - `no_network=True` - Blocks network access - `resolve_entities=False` - Prevents entity expansion - `load_dtd=False` - Disables DTD processing - `XSLTAccessControl.DENY_ALL` - Blocks all file/network I/O in XSLT transformations 3. Added security tests - New test file `test_html_security.py` with comprehensive tests for various XXE attack vectors 4. Updated existing tests - Modified tests that were using the removed `xslt_path` parameter ## Test Plan - [x] All existing tests pass - [x] New security tests verify XXE attacks are blocked - [x] Code passes linting and formatting checks - [x] Tested with both old and new versions of lxml Twitter handle: @_colemurray	2025-07-02 15:24:08 -04:00

89 Commits