langchain

mirror of https://github.com/hwchase17/langchain.git synced 2025-08-14 23:26:34 +00:00

Author	SHA1	Message	Date
Mason Daugherty	5599c59d4a	chore: formatting across codebase (#32456 ) To prevent polluting future PRs	2025-08-07 22:09:26 -04:00
Mason Daugherty	5e9eb19a83	chore: update branch with changes from master (#32277 ) Co-authored-by: Maxime Grenu <69890511+cluster2600@users.noreply.github.com> Co-authored-by: Claude <claude@anthropic.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: jmaillefaud <jonathan.maillefaud@evooq.ch> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: tanwirahmad <tanwirahmad@users.noreply.github.com> Co-authored-by: Christophe Bornet <cbornet@hotmail.com> Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com> Co-authored-by: niceg <79145285+growmuye@users.noreply.github.com> Co-authored-by: Chaitanya varma <varmac301@gmail.com> Co-authored-by: dishaprakash <57954147+dishaprakash@users.noreply.github.com> Co-authored-by: Chester Curme <chester.curme@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Kanav Bansal <13186335+bansalkanav@users.noreply.github.com> Co-authored-by: Aleksandr Filippov <71711753+alex-feel@users.noreply.github.com> Co-authored-by: Alex Feel <afilippov@spotware.com>	2025-07-28 10:39:41 -04:00
Christophe Bornet	03e8327e01	core: Ruff preview fixes (#31877 ) Auto-fixes from `uv run ruff check --fix --unsafe-fixes --preview` --------- Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-07-07 13:02:40 -04:00
Eugene Yurtsev	9164e6f906	core[patch]: Add additional hashing options to indexing API, warn on SHA-1 (#31649 ) Add additional hashing options to the indexing API, warn on SHA-1 Requires: - Bumping langchain-core version - bumping min langchain-core in langchain --------- Co-authored-by: ccurme <chester.curme@gmail.com>	2025-06-24 14:44:06 -04:00
Shkarupa Alex	671e4fd114	langchain[patch]: Allow async indexing code to work for vectorstores that only defined sync delete (#30869 ) `aindex` function should check not only `adelete` method, but `delete` method too PR title: "core: fix async indexing issue with adelete/delete checking" PR message: Currently `langchain.indexes.aindex` checks if vector store has overrided adelete method. But due to `adelete` default implementation store can have just `delete` overrided to make `adelete` working. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2025-05-16 15:10:25 -04:00
Christophe Bornet	a8f2ddee31	core: Add ruff rules RUF (#29353 ) See https://docs.astral.sh/ruff/rules/#ruff-specific-rules-ruf Mostly: * [RUF022](https://docs.astral.sh/ruff/rules/unsorted-dunder-all/) (unsorted `__all__`) * [RUF100](https://docs.astral.sh/ruff/rules/unused-noqa/) (unused noqa) * [RUF021](https://docs.astral.sh/ruff/rules/parenthesize-chained-operators/) (parenthesize-chained-operators) * [RUF015](https://docs.astral.sh/ruff/rules/unnecessary-iterable-allocation-for-first-element/) (unnecessary-iterable-allocation-for-first-element) * [RUF005](https://docs.astral.sh/ruff/rules/collection-literal-concatenation/) (collection-literal-concatenation) * [RUF046](https://docs.astral.sh/ruff/rules/unnecessary-cast-to-int/) (unnecessary-cast-to-int) --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2025-05-15 15:43:57 -04:00
Lope Ramos	b8ae2de169	langchain-core[patch]: `Incremental` record manager deletion should be batched (#31206 ) Description: Before this commit, if one record is batched in more than 32k rows for sqlite3 >= 3.32 or more than 999 rows for sqlite3 < 3.31, the `record_manager.delete_keys()` will fail, as we are creating a query with too many variables. This commit ensures that we are batching the delete operation leveraging the `cleanup_batch_size` as it is already done for `full` cleanup. Added unit tests for incremental mode as well on different deleting batch size.	2025-05-14 11:38:21 -04:00
Sydney Runkle	75e50a3efd	core[patch]: Raise `AttributeError` (instead of `ModuleNotFoundError`) in custom `__getattr__` (#30905 ) Follow up to https://github.com/langchain-ai/langchain/pull/30769, fixing the regression reported [here](https://github.com/langchain-ai/langchain/pull/30769#issuecomment-2807483610), thanks @krassowski for the report! Fix inspired by https://github.com/PrefectHQ/prefect/pull/16172/files Other changes: * Using tuples for `__all__`, except in `output_parsers` bc of a list namespace conflict * Using a helper function for imports due to repeated logic across `__init__.py` files becoming hard to maintain. Co-authored-by: Michał Krassowski < krassowski 5832902+krassowski@users.noreply.github.com>"	2025-04-17 14:15:28 -04:00
Christophe Bornet	a4ca1fe0ed	core: Remove some noqa (#30855 )	2025-04-15 13:08:40 -04:00
Sydney Runkle	edb6a23aea	core[lint]: fix issue with unused ignore in `__init__.py` files (#30825 ) Fixing a race condition between https://github.com/langchain-ai/langchain/pull/30769 and https://github.com/langchain-ai/langchain/pull/30737	2025-04-14 17:57:00 +00:00
Sydney Runkle	4f69094b51	core[performance]: use custom `__getattr__` in `__init__.py` files for lazy imports (#30769 ) Most easily reviewed with the "hide whitespace" option toggled. Seeing 10-50% speed ups in import time for common structures 🚀 The general purpose of this PR is to lazily import structures within `langchain_core.XXX_module.__init__.py` so that we're not eagerly importing expensive dependencies (`pydantic`, `requests`, etc). Analysis of flamegraphs generated with `importtime` motivated these changes. For example, the one below demonstrates that importing `HumanMessage` accidentally triggered imports for `importlib.metadata`, `requests`, etc. There's still much more to do on this front, and we can start digging into our own internal code for optimizations now that we're less concerned about external imports. <img width="1210" alt="Screenshot 2025-04-11 at 1 10 54 PM" src="https://github.com/user-attachments/assets/112a3fe7-24a9-4294-92c1-d5ae64df839e" /> I've tracked the improvements with some local benchmarks: ## `pytest-benchmark` results \| Name \| Before (s) \| After (s) \| Delta (s) \| % Change \| \|-----------------------------\|------------\|-----------\|-----------\|----------\| \| Document \| 2.8683 \| 1.2775 \| -1.5908 \| -55.46% \| \| HumanMessage \| 2.2358 \| 1.1673 \| -1.0685 \| -47.79% \| \| ChatPromptTemplate \| 5.5235 \| 2.9709 \| -2.5526 \| -46.22% \| \| Runnable \| 2.9423 \| 1.7793 \| -1.163 \| -39.53% \| \| InMemoryVectorStore \| 3.1180 \| 1.8417 \| -1.2763 \| -40.93% \| \| RunnableLambda \| 2.7385 \| 1.8745 \| -0.864 \| -31.55% \| \| tool \| 5.1231 \| 4.0771 \| -1.046 \| -20.42% \| \| CallbackManager \| 4.2263 \| 3.4099 \| -0.8164 \| -19.32% \| \| LangChainTracer \| 3.8394 \| 3.3101 \| -0.5293 \| -13.79% \| \| BaseChatModel \| 4.3317 \| 3.8806 \| -0.4511 \| -10.41% \| \| PydanticOutputParser \| 3.2036 \| 3.2995 \| 0.0959 \| 2.99% \| \| InMemoryRateLimiter \| 0.5311 \| 0.5995 \| 0.0684 \| 12.88% \| Note the lack of change for `InMemoryRateLimiter` and `PydanticOutputParser` is just random noise, I'm getting comparable numbers locally. ## Local CodSpeed results We're still working on configuring CodSpeed on CI. The local usage produced similar results.	2025-04-14 08:57:54 -04:00
Christophe Bornet	42944f3499	core: Improve mypy config (#30737 ) * Cleanup mypy config * Add mypy `strict` rules except `disallow_any_generics`, `warn_return_any` and `strict_equality` (TODO) * Add mypy `strict_byte` rule * Add mypy support for PEP702 `@deprecated` decorator * Bump mypy version to 1.15 --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2025-04-11 16:35:13 -04:00
Sydney Runkle	fdc2b4bcac	core[lint]: Use 3.9 formatting for docs and tests (#30780 ) Looks like `pyupgrade` was already used here but missed some docs and tests. This helps to keep our docs looking professional and up to date. Eventually, we should lint / format our inline docs.	2025-04-11 10:39:25 -04:00
Christophe Bornet	4cc7bc6c93	core: Add ruff rules PLR (#30696 ) Add ruff rules [PLR](https://docs.astral.sh/ruff/rules/#refactor-plr) Except PLR09xxx and PLR2004. Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2025-04-09 15:15:38 -04:00
Christophe Bornet	98f0016fc2	core: Add ruff rules ARG (#30732 ) See https://docs.astral.sh/ruff/rules/#flake8-unused-arguments-arg	2025-04-09 14:39:36 -04:00
Christophe Bornet	f241fd5c11	core: Add ruff rules RET (#29384 ) See https://docs.astral.sh/ruff/rules/#flake8-return-ret All auto-fixes	2025-04-02 16:59:56 -04:00
Christophe Bornet	4f8ea13cea	core: Add ruff rules PERF (#29375 ) See https://docs.astral.sh/ruff/rules/#perflint-perf	2025-04-01 13:34:56 -04:00
Christophe Bornet	88b4233fa1	core: Add ruff rules D (docstring) (#29406 ) This ensures that the code is properly documented: https://docs.astral.sh/ruff/rules/#pydocstyle-d Related to #21983	2025-04-01 13:15:45 -04:00
Christophe Bornet	e181d43214	core: Bump ruff version to 0.11 (#30519 ) Changes are from the new TC006 rule: https://docs.astral.sh/ruff/rules/runtime-cast-value/ TC006 is auto-fixed.	2025-03-27 13:01:49 -04:00
Keiichi Hirobe	956b09f468	core[patch]: stop deleting records with "scoped_full" when doc is empty (#30520 ) Fix a bug that causes `scoped_full` in index to delete records when there are no input docs.	2025-03-27 11:04:34 -04:00
Christophe Bornet	b3885c124f	core: Add ruff rules TC (#29268 ) See https://docs.astral.sh/ruff/rules/#flake8-type-checking-tc Some fixes done for TC001,TC002 and TC003 but these rules are excluded since they don't play well with Pydantic. --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-02-26 19:39:05 +00:00
Christophe Bornet	e4a78dfc2a	core: Bump ruff version to 0.9 (#29201 ) Also run some preview autofix and formatting --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2025-01-22 00:20:09 +00:00
Christophe Bornet	1c4ce7b42b	core: Auto-fix some docstrings (#29337 )	2025-01-21 13:29:53 -05:00
Keiichi Hirobe	67fd554512	core[patch]: throw exception indexing code if deletion fails in vectorstore (#28103 ) The delete methods in the VectorStore and DocumentIndex interfaces return a status indicating the result. Therefore, we can assume that their implementations don't throw exceptions but instead return a result indicating whether the delete operations have failed. The current implementation doesn't check the returned value, so I modified it to throw an exception when the operation fails. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-12-13 16:14:27 -05:00
Keiichi Hirobe	258b3be5ec	core[minor]: add new clean up strategy "scoped_full" to indexing (#28505 ) ~Note that this PR is now Draft, so I didn't add change to `aindex` function and didn't add test codes for my change. After we have an agreement on the direction, I will add commits.~ `batch_size` is very difficult to decide because setting a large number like >10000 will impact VectorDB and RecordManager, while setting a small number will delete records unnecessarily, leading to redundant work, as the `IMPORTANT` section says. On the other hand, we can't use `full` because the loader returns just a subset of the dataset in our use case. I guess many people are in the same situation as us. So, as one of the possible solutions for it, I would like to introduce a new argument, `scoped_full_cleanup`. This argument will be valid only when `claneup` is Full. If True, Full cleanup deletes all documents that haven't been updated AND that are associated with source ids that were seen during indexing. Default is False. This change keeps backward compatibility. --------- Co-authored-by: Eugene Yurtsev <eugene@langchain.dev> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-12-13 20:35:25 +00:00
Eugene Yurtsev	ce90b25313	core[patch]: Update error message in indexing code for unreachable code assertion (#28712 ) Minor update for error message that should never be triggered	2024-12-13 20:21:14 +00:00
Keiichi Hirobe	da28cf1f54	core[patch]: Reverts PR #25754 and add unit tests (#28702 ) I reported the bug 2 weeks ago here: https://github.com/langchain-ai/langchain/issues/28447 I believe this is a critical bug for the indexer, so I submitted a PR to revert the change and added unit tests to prevent similar bugs from being introduced in the future. @eyurtsev Could you check this?	2024-12-13 15:13:06 -05:00
Eric Pinzur	eadc2f6a90	core: added DeleteResponse to the module (#28069 ) Description: * added `DeleteResponse` to the `langchain_core.indexing` module, for implementing DocumentIndex classes.	2024-11-13 11:08:08 -05:00
ZhangShenao	ad0387ac97	Improvement [docs] Improve api docs (#27787 ) - Add missing param - Remove unused param --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-10-31 16:56:44 +00:00
Erick Friis	265e0a164a	core: add flake8-bandit (S) ruff rules to core (#27368 ) Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-10-24 22:33:41 +00:00
Christophe Bornet	d31ec8810a	core: Add ruff rules for error messages (EM) (#26965 ) All auto-fixes Co-authored-by: Erick Friis <erick@langchain.dev>	2024-10-07 22:12:28 +00:00
João Carlos Ferra de Almeida	780ce00dea	core[minor]: add kwargs to index and aindex functions for custom vector_field support (#26998 ) Added `kwargs` parameters to the `index` and `aindex` functions in `libs/core/langchain_core/indexing/api.py`. This allows users to pass additional arguments to the `add_documents` and `aadd_documents` methods, enabling the specification of a custom `vector_field`. For example, users can now use `vector_field="embedding"` when indexing documents in `OpenSearchVectorStore` --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-10-07 14:52:50 -04:00
federico-pisanu	2538963945	core[patch]: improve index/aindex api when batch_size<n_docs (#25754 ) - Description: prevent index function to re-index entire source document even if nothing has changed. - Issue: #22135 I worked on a solution to this issue that is a compromise between being cheap and being fast. In the previous code, when batch_size is greater than the number of docs from a certain source almost the entire source is deleted (all documents from that source except for the documents in the first batch) My solution deletes documents from vector store and record manager only if at least one document has changed for that source. Hope this can help! --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-09-30 20:57:41 +00:00
Christophe Bornet	db8845a62a	core: Add ruff rules for pycodestyle Warning (W) (#26964 ) All auto-fixes.	2024-09-30 09:31:43 -04:00
Christophe Bornet	7809b31b95	core[patch]: Add ruff rules for flake8-simplify (SIM) (#26848 ) Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-09-27 20:13:23 +00:00
Christophe Bornet	a47b332841	core: Put Python version as a project requirement so it is considered by ruff (#26608 ) Ruff doesn't know about the python version in `[tool.poetry.dependencies]`. It can get it from `project.requires-python`. Notes: * poetry seems to have issues getting the python constraints from `requires-python` and using `python` in per dependency constraints. So I had to duplicate the info. I will open an issue on poetry. * `inspect.isclass()` doesn't work correctly with `GenericAlias` (`list[...]`, `dict[..., ...]`) on Python <3.11 so I added some `not isinstance(type, GenericAlias)` checks: Python 3.11 ```pycon >>> import inspect >>> inspect.isclass(list) True >>> inspect.isclass(list[str]) False ``` Python 3.9 ```pycon >>> import inspect >>> inspect.isclass(list) True >>> inspect.isclass(list[str]) True ``` Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-09-18 14:37:57 +00:00
Christophe Bornet	3a99467ccb	core[patch]: Add ruff rule UP006(use PEP585 annotations) (#26574 ) * Added rules `UPD006` now that Pydantic is v2+ --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-09-17 21:22:50 +00:00
Erick Friis	c2a3021bb0	multiple: pydantic 2 compatibility, v0.3 (#26443 ) Signed-off-by: ChengZi <chen.zhang@zilliz.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Dan O'Donovan <dan.odonovan@gmail.com> Co-authored-by: Tom Daniel Grande <tomdgrande@gmail.com> Co-authored-by: Grande <Tom.Daniel.Grande@statsbygg.no> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: ccurme <chester.curme@gmail.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Tomaz Bratanic <bratanic.tomaz@gmail.com> Co-authored-by: ZhangShenao <15201440436@163.com> Co-authored-by: Friso H. Kingma <fhkingma@gmail.com> Co-authored-by: ChengZi <chen.zhang@zilliz.com> Co-authored-by: Nuno Campos <nuno@langchain.dev> Co-authored-by: Morgante Pell <morgantep@google.com>	2024-09-13 14:38:45 -07:00
Christophe Bornet	ff0df5ea15	core[patch]: Add B(bugbear) ruff rules (#25520 ) Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-08-28 07:09:29 +00:00
Eugene Yurtsev	425f6ffa5b	core[patch]: Fix aindex API (#25155 ) A previous PR accidentally broke the aindex API by renaming a positional argument vectorstore into vector_store. This PR reverts this change.	2024-08-08 12:08:18 -04:00
Eugene Yurtsev	d283f452cc	core[minor]: Add support for DocumentIndex in the index api (#25100 ) Support document index in the index api.	2024-08-06 12:30:49 -07:00
Eugene Yurtsev	41dfad5104	core[minor]: Introduce DocumentIndex abstraction (#25062 ) This PR adds a minimal document indexer abstraction. The goal of this abstraction is to allow developers to create custom retrievers that also have a standard indexing API and allow updating the document content in them. The abstraction comes with a test suite that can verify that the indexer implements the correct semantics. This is an iteration over a previous PRs (https://github.com/langchain-ai/langchain/pull/24364). The main difference is that we're sub-classing from BaseRetriever in this iteration and as so have consolidated the sync and async interfaces. The main problem with the current design is that runt time search configuration has to be specified at init rather than provided at run time. We will likely resolve this issue in one of the two ways: (1) Define a method (`get_retriever`) that will allow creating a retriever at run time with a specific configuration.. If we do this, we will likely break the subclass on BaseRetriever (2) Generalize base retriever so it can support structured queries --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-08-05 18:06:33 +00:00
Eugene Yurtsev	e0186df56b	core[patch]: Clarify upsert response semantics (#23921 )	2024-07-05 15:59:47 -04:00
Eugene Yurtsev	6f08e11d7c	core[minor]: add upsert, streaming_upsert, aupsert, astreaming_upsert methods to the VectorStore abstraction (#23774 ) This PR rolls out part of the new proposed interface for vectorstores (https://github.com/langchain-ai/langchain/pull/23544) to existing store implementations. The PR makes the following changes: 1. Adds standard upsert, streaming_upsert, aupsert, astreaming_upsert methods to the vectorstore. 2. Updates `add_texts` and `aadd_texts` to be non required with a default implementation that delegates to `upsert` and `aupsert` if those have been implemented. The original `add_texts` and `aadd_texts` methods are problematic as they spread object specific information across document and *kwargs. (e.g., ids are not a part of the document) 3. Adds a default implementation to `add_documents` and `aadd_documents` that delegates to `upsert` and `aupsert` respectively. 4. Adds standard unit tests to verify that a given vectorstore implements a correct read/write API. A downside of this implementation is that it creates `upsert` with a very similar signature to `add_documents`. The reason for introducing `upsert` is to: Remove any ambiguities about what information is allowed in `kwargs`. Specifically kwargs should only be used for information common to all indexed data. (e.g., indexing timeout). *Allow inheriting from an anticipated generalized interface for indexing that will allow indexing `BaseMedia` (i.e., allow making a vectorstore for images/audio etc.) `add_documents` can be deprecated in the future in favor of `upsert` to make sure that users have a single correct way of indexing content. --------- Co-authored-by: ccurme <chester.curme@gmail.com>	2024-07-05 12:21:40 -04:00
Leonid Ganeline	716a316654	core: docstrings `indexing` (#23785 ) Added missed docstrings. Formatted docstrings to the consistent form.	2024-07-03 11:27:34 -04:00
Philippe PRADOS	8711c61298	core[minor]: Adds an in-memory implementation of RecordManager (#13200 ) Description: langchain offers three technologies to save data: - [vectorstore](https://python.langchain.com/docs/modules/data_connection/vectorstores/) - [docstore](https://js.langchain.com/docs/api/schema/classes/Docstore) - [record manager](https://python.langchain.com/docs/modules/data_connection/indexing) If you want to combine these technologies in a sample persistence stategy you need a common implementation for each. `DocStore` propose `InMemoryDocstore`. We propose the class `MemoryRecordManager` to complete the system. This is the prelude to another full-request, which needs a consistent combination of persistence components. Tag maintainer: @baskaryan Twitter handle: @pprados --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-06-20 12:19:10 -04:00
Eugene Yurtsev	4fe8403bfb	core[patch]: Expand documentation in the indexing namespace (#23134 )	2024-06-19 10:11:44 -04:00
Leonid Ganeline	1a2ff56cd8	core[patch[: docstring update (#21036 ) Added missed docstrings. Updated docstrings to consistent format.	2024-04-29 15:35:34 -04:00
Eugene Yurtsev	d8aa72f51d	core[minor],langchain[patch]: Move base indexing interface and logic to core (#20667 ) This PR moves the interface and the logic to core. The following changes to namespaces: `indexes` -> `indexing` `indexes._api` -> `indexing.api` Testing code is intentionally duplicated for now since it's testing different implementations of the record manager (in-memory vs. SQL). Common logic will need to be pulled out into the test client. A follow up PR will move the SQL based implementation outside of LangChain.	2024-04-24 13:18:42 -04:00

49 Commits