langchain

mirror of https://github.com/hwchase17/langchain.git synced 2025-09-23 11:30:37 +00:00

Author	SHA1	Message	Date
Mason Daugherty	ef9b5a9e18	add back standard_outputs	2025-07-28 10:47:26 -04:00
Mason Daugherty	5e9eb19a83	chore: update branch with changes from master (#32277 ) Co-authored-by: Maxime Grenu <69890511+cluster2600@users.noreply.github.com> Co-authored-by: Claude <claude@anthropic.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: jmaillefaud <jonathan.maillefaud@evooq.ch> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: tanwirahmad <tanwirahmad@users.noreply.github.com> Co-authored-by: Christophe Bornet <cbornet@hotmail.com> Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com> Co-authored-by: niceg <79145285+growmuye@users.noreply.github.com> Co-authored-by: Chaitanya varma <varmac301@gmail.com> Co-authored-by: dishaprakash <57954147+dishaprakash@users.noreply.github.com> Co-authored-by: Chester Curme <chester.curme@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Kanav Bansal <13186335+bansalkanav@users.noreply.github.com> Co-authored-by: Aleksandr Filippov <71711753+alex-feel@users.noreply.github.com> Co-authored-by: Alex Feel <afilippov@spotware.com>	2025-07-28 10:39:41 -04:00
ccurme	8acfd677bc	fix(core): add type key when tracing in some cases (#31825 )	2025-07-22 18:08:16 +00:00
Copilot	18c64aed6d	feat(core): add `sanitize_for_postgres` utility to fix PostgreSQL NUL byte DataError (#32157 ) This PR fixes the PostgreSQL NUL byte issue that causes `psycopg.DataError` when inserting documents containing `\x00` bytes into PostgreSQL-based vector stores. ## Problem PostgreSQL text fields cannot contain NUL (0x00) bytes. When documents with such characters are processed by PGVector or langchain-postgres implementations, they fail with: ``` (psycopg.DataError) PostgreSQL text fields cannot contain NUL (0x00) bytes ``` This commonly occurs when processing PDFs, documents from various loaders, or text extracted by libraries like unstructured that may contain embedded NUL bytes. ## Solution Added `sanitize_for_postgres()` utility function to `langchain_core.utils.strings` that removes or replaces NUL bytes from text content. ### Key Features - Simple API: `sanitize_for_postgres(text, replacement="")` - Configurable: Replace NUL bytes with empty string (default) or space for readability - Comprehensive: Handles all problematic examples from the original issue - Well-tested: Complete unit tests with real-world examples - Backward compatible: No breaking changes, purely additive ### Usage Example ```python from langchain_core.utils import sanitize_for_postgres from langchain_core.documents import Document # Before: This would fail with DataError problematic_content = "Getting\x00Started with embeddings" # After: Clean the content before database insertion clean_content = sanitize_for_postgres(problematic_content) # Result: "GettingStarted with embeddings" # Or preserve readability with spaces readable_content = sanitize_for_postgres(problematic_content, " ") # Result: "Getting Started with embeddings" # Use in Document processing doc = Document(page_content=clean_content, metadata={...}) ``` ### Integration Pattern PostgreSQL vector store implementations should sanitize content before insertion: ```python def add_documents(self, documents: List[Document]) -> List[str]: # Sanitize documents before insertion sanitized_docs = [] for doc in documents: sanitized_content = sanitize_for_postgres(doc.page_content, " ") sanitized_doc = Document( page_content=sanitized_content, metadata=doc.metadata, id=doc.id ) sanitized_docs.append(sanitized_doc) return self._insert_documents_to_db(sanitized_docs) ``` ## Changes Made - Added `sanitize_for_postgres()` function in `langchain_core/utils/strings.py` - Updated `langchain_core/utils/__init__.py` to export the new function - Added comprehensive unit tests in `tests/unit_tests/utils/test_strings.py` - Validated against all examples from the original issue report ## Testing All tests pass, including: - Basic NUL byte removal and replacement - Multiple consecutive NUL bytes - Empty string handling - Real examples from the GitHub issue - Backward compatibility with existing string utilities This utility enables PostgreSQL integrations in both langchain-community and langchain-postgres packages to handle documents with NUL bytes reliably. Fixes #26033. --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com> Co-authored-by: Mason Daugherty <github@mdrxy.com>	2025-07-21 20:33:20 -04:00
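The #32157 commit message shows how `sanitize_for_postgres` is called but not what it does internally. Assuming the helper only needs to strip or substitute the `\x00` character, a minimal sketch could look like this. The signature follows the commit message; the body is an illustration, not the `langchain_core.utils.strings` source.

```python
def sanitize_for_postgres(text: str, replacement: str = "") -> str:
    """Replace every NUL (0x00) character, which PostgreSQL text columns reject.

    Illustrative sketch only: the signature mirrors the commit message,
    the implementation is an assumption.
    """
    return text.replace("\x00", replacement)


raw = "Getting\x00Started with embeddings"
print(sanitize_for_postgres(raw))       # GettingStarted with embeddings
print(sanitize_for_postgres(raw, " "))  # Getting Started with embeddings
```

Passing `" "` as the replacement keeps neighbouring words from being glued together, which is presumably why the integration pattern in the commit uses a space.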
Mohammad Mohtashim	095f4a7c28	fix(core): fix `parse_result` in case of self.first_tool_only with multiple keys matching for JsonOutputKeyToolsParser (#32106 ) * Description: Updated `parse_result` logic to handle cases where `self.first_tool_only` is `True` and multiple matching keys share the same function name. Instead of returning the first match prematurely, the method now prioritizes filtering results by the specified key to ensure correct selection. * Issue: #32100 --------- Co-authored-by: Mason Daugherty <github@mdrxy.com> Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-07-21 12:50:22 -04:00
Isaac Francisco	98bfd57a76	fix(core): better error message for empty var names (#32073 ) Previously, we hit an index out of range error with empty variable names (accessing tag[0]); now we throw a slightly nicer error. --------- Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-07-18 17:00:02 -04:00
Gurram Siddarth Reddy	427d2d6397	fix(core): implement sleep delay in FakeMessagesListChatModel `_generate` (#32014 ) implement sleep delay in FakeMessagesListChatModel._generate so the sleep parameter is respected, matching the documented behavior. This adds artificial latency between responses for testing purposes. Issue: closes [#31974](https://github.com/langchain-ai/langchain/issues/31974) following [docs](https://python.langchain.com/api_reference/core/language_models/langchain_core.language_models.fake_chat_models.FakeMessagesListChatModel.html#langchain_core.language_models.fake_chat_models.FakeMessagesListChatModel.sleep) Dependencies: none Twitter handle: [@siddarthreddyg2](https://x.com/siddarthreddyg2) --------- Signed-off-by: Siddarthreddygsr <siddarthreddygsr@gmail.com>	2025-07-18 15:54:28 -04:00
open-swe[bot]	5da986c3f6	fix(core): JSON Schema reference resolution for list indices (#32088 ) Fixes #32042 ## Summary Fixes a critical bug in JSON Schema reference resolution that prevented correctly dereferencing numeric components in JSON pointer paths, specifically for list indices in `anyOf`, `oneOf`, and `allOf` arrays. ## Changes - Fixed `_retrieve_ref` function in `libs/core/langchain_core/utils/json_schema.py` to properly handle numeric components - Added comprehensive test function `test_dereference_refs_list_index()` in `libs/core/tests/unit_tests/utils/test_json_schema.py` - Resolved line length formatting issues - Improved type checking and index validation for list and dictionary references ## Key Improvements - Correctly handles list index references in JSON pointer paths - Maintains backward compatibility with existing dictionary numeric key functionality - Adds robust error handling for out-of-bounds and invalid indices - Passes all test cases covering various reference scenarios ## Test Coverage - Verified fix for `#/properties/payload/anyOf/1/properties/startDate` reference - Tested edge cases including out-of-bounds and negative indices - Ensured no regression in existing reference resolution functionality Resolves the reported issue with JSON Schema reference dereferencing for list indices. --------- Co-authored-by: open-swe-dev[bot] <open-swe-dev@users.noreply.github.com> Co-authored-by: Mason Daugherty <github@mdrxy.com> Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-07-17 15:54:38 -04:00
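To illustrate the fix in #32088: in a JSON pointer, a component such as `1` must be treated as a list index when the current node is an array (as inside `anyOf`) and as a string key when the node is an object. The sketch below resolves local pointers under that rule. `resolve_json_pointer` is a hypothetical name, and this is not the `_retrieve_ref` code from the commit.

```python
from typing import Any


def resolve_json_pointer(schema: dict[str, Any], ref: str) -> Any:
    """Resolve a local JSON pointer such as "#/properties/a/anyOf/1".

    Hypothetical helper for illustration; not langchain_core's `_retrieve_ref`.
    """
    if not ref.startswith("#"):
        raise ValueError(f"only local references are supported: {ref!r}")
    node: Any = schema
    # "#/a/b" -> ["a", "b"]; a bare "#" refers to the whole document.
    tokens = ref[1:].split("/")[1:]
    for token in tokens:
        # Undo RFC 6901 escaping ("~1" -> "/", then "~0" -> "~").
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(node, list):
            # List components must be non-negative, in-range integer indices.
            if not (token.isascii() and token.isdigit()) or int(token) >= len(node):
                raise KeyError(f"invalid list index {token!r} in {ref!r}")
            node = node[int(token)]
        elif isinstance(node, dict):
            # Dict keys are always strings, even numeric-looking ones like "1".
            if token not in node:
                raise KeyError(f"key {token!r} not found in {ref!r}")
            node = node[token]
        else:
            raise KeyError(f"cannot descend into {type(node).__name__} at {token!r}")
    return node


schema = {
    "properties": {
        "payload": {
            "anyOf": [
                {"type": "null"},
                {
                    "type": "object",
                    "properties": {"startDate": {"type": "string", "format": "date"}},
                },
            ]
        }
    }
}
print(resolve_json_pointer(schema, "#/properties/payload/anyOf/1/properties/startDate"))
# {'type': 'string', 'format': 'date'}
```

Negative and out-of-range indices raise `KeyError` here, mirroring the edge cases the commit says it tested; the exact exception type the library uses is not stated.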
Mohammad Mohtashim	96bf8262e2	fix: fixing missing Docstring Bug if no Docstring is provided in BaseModel class (#31608 ) - Description: Ensure that the tool description is an empty string when creating a Structured Tool from a Pydantic class in case no description is provided - Issue: Fixes #31606 --------- Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-07-16 11:56:05 -04:00
Jacob Lee	535ba43b0d	feat(core): add an option to make deserialization more permissive (#32054 ) ## Description Currently, when deserializing objects that contain non-deserializable values, we throw an error. However, there are cases (e.g. proxies that return response fields containing extra fields like Python datetimes) where these values are not important and we just want to drop them. Twitter handle: @hacubu --------- Co-authored-by: Mason Daugherty <github@mdrxy.com>	2025-07-15 17:00:01 -04:00
Eugene Yurtsev	02d0a9af6c	chore(core): unpin packaging dependency (#32032 ) Unpin packaging dependency --------- Co-authored-by: ntjohnson1 <24689722+ntjohnson1@users.noreply.github.com>	2025-07-14 21:42:32 +00:00
Christophe Bornet	d57216c295	feat(core): add ruff rules D to tests except D1 (#32000 ) Docs are not required for tests but when there are docstrings, they shall be correctly formatted. See https://docs.astral.sh/ruff/rules/#pydocstyle-d	2025-07-14 10:42:03 -04:00
Azhagammal	4d9c0b0883	fix[core]: added error message if the query vector or embedding contains NaN values (#31822 ) Description: Added an explicit validation step in `langchain_core.vectorstores.utils._cosine_similarity` to raise a `ValueError` if the input query or any embedding contains `NaN` values. This prevents silent failures or unstable behavior during similarity calculations, especially when using maximal_marginal_relevance. Issue: Fixes #31806 Dependencies: None --------- Co-authored-by: Azhagammal S C <azhagammal@kofluence.com> Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-07-09 18:30:26 -04:00
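A sketch of the validation described in #31822, assuming plain Python lists instead of arrays. `cosine_similarity` here is a hypothetical stand-in, not `langchain_core.vectorstores.utils._cosine_similarity`; it rejects NaN input before doing any arithmetic and, like the behavior described in #31528, always returns a 2D result.

```python
import math
from collections.abc import Sequence


def cosine_similarity(
    x: Sequence[Sequence[float]], y: Sequence[Sequence[float]]
) -> list[list[float]]:
    """Pairwise cosine similarity between the rows of x and the rows of y.

    Hypothetical pure-Python sketch: it checks for NaN before computing and
    always returns a 2D result of shape len(x) by len(y).
    """
    for name, rows in (("query", x), ("embeddings", y)):
        if any(math.isnan(value) for row in rows for value in row):
            raise ValueError(f"NaN values found in {name}; cannot compute cosine similarity")
    result: list[list[float]] = []
    for a in x:
        norm_a = math.sqrt(sum(v * v for v in a))
        row: list[float] = []
        for b in y:
            if len(a) != len(b):
                raise ValueError("all vectors must have the same dimension")
            norm_b = math.sqrt(sum(v * v for v in b))
            dot = sum(p * q for p, q in zip(a, b))
            # Zero-norm vectors get similarity 0.0 instead of a division by zero.
            row.append(dot / (norm_a * norm_b) if norm_a and norm_b else 0.0)
        result.append(row)
    return result


print(cosine_similarity([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]))  # [[1.0, 0.0]]
try:
    cosine_similarity([[float("nan"), 1.0]], [[1.0, 0.0]])
except ValueError as err:
    print(err)  # NaN values found in query; cannot compute cosine similarity
```

Failing fast matters because NaN propagates through every dot product it touches, so a downstream ranking step such as maximal marginal relevance would otherwise produce unreliable results without any error.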
Christophe Bornet	4215261be1	core: Cleanup pyproject (#31857 ) * Reorganize some toml properties * Fix some E501: line too long Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-07-07 13:30:48 -04:00
Mason Daugherty	a751a23c4e	fix: remove unused type ignore from three_values fixture in TestAsyncInMemoryStore (#31895 )	2025-07-07 13:22:53 -04:00
Christophe Bornet	03e8327e01	core: Ruff preview fixes (#31877 ) Auto-fixes from `uv run ruff check --fix --unsafe-fixes --preview` --------- Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-07-07 13:02:40 -04:00
Christophe Bornet	4134b36db8	core: make ruff rule PLW1510 unfixable (#31868 ) See https://github.com/astral-sh/ruff/discussions/17087#discussioncomment-12675815 The autofix is misleading: it chooses to add `check=False` to keep the runtime behavior but in reality it hides the fact that most probably the user would prefer `check=True`.	2025-07-07 10:28:30 -04:00
Christophe Bornet	8aed3b61a9	core: Bump ruff version to 0.12 (#31846 )	2025-07-07 10:02:51 -04:00
Mohammad Mohtashim	b26d2250ba	core[patch]: Int Combine when Merging Dicts (#31572 ) - Description: Combine int values by adding them, which is the most sensible behavior. - Issue: #31565
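A hypothetical illustration of the rule in #31572: when two chunk dicts are merged, integers are summed while strings are concatenated and nested dicts are merged recursively. The token-count example and the "later value wins" rule for booleans are assumptions made for this sketch; `merge_dicts` here is not the library's merge logic.

```python
from typing import Any


def merge_dicts(left: dict[str, Any], right: dict[str, Any]) -> dict[str, Any]:
    """Merge two streamed-chunk dicts (hypothetical sketch, not the library's merge logic).

    Strings are concatenated, nested dicts are merged recursively, and ints are summed.
    """
    merged = dict(left)
    for key, value in right.items():
        current = merged.get(key)
        if current is None:
            merged[key] = value
        elif value is None:
            continue
        elif type(current) is not type(value):
            raise TypeError(
                f"cannot merge {type(current).__name__} with {type(value).__name__} for key {key!r}"
            )
        elif isinstance(value, bool):
            merged[key] = value  # assumption: for flags, the later chunk wins
        elif isinstance(value, (str, int)):
            merged[key] = current + value  # concatenate text, sum integer counters
        elif isinstance(value, dict):
            merged[key] = merge_dicts(current, value)
        elif current != value:
            raise TypeError(f"no merge rule for {type(value).__name__} at key {key!r}")
    return merged


left = {"text": "Hel", "usage": {"output_tokens": 2}}
right = {"text": "lo", "usage": {"output_tokens": 3}}
print(merge_dicts(left, right))  # {'text': 'Hello', 'usage': {'output_tokens': 5}}
```

`bool` is checked before `int` because `bool` is a subclass of `int` in Python; without that branch, two `True` flags would be summed to `2`.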
Christophe Bornet	46745f91b5	core: Use parametric tests in test_openai_tools (#31839 )	2025-07-03 08:43:46 -04:00
Eugene Yurtsev	9164e6f906	core[patch]: Add additional hashing options to indexing API, warn on SHA-1 (#31649 ) Add additional hashing options to the indexing API, warn on SHA-1 Requires: - Bumping langchain-core version - bumping min langchain-core in langchain --------- Co-authored-by: ccurme <chester.curme@gmail.com>	2025-06-24 14:44:06 -04:00
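A hypothetical sketch of selectable content hashing with a SHA-1 warning, in the spirit of #31649. The function name `hash_content`, the algorithm set, and the `"sha1"` default (assumed here for backward compatibility) are illustrative assumptions, not the indexing API's actual parameters.

```python
import hashlib
import warnings

# Hypothetical set of supported algorithms; the names are hashlib identifiers.
SUPPORTED_ALGORITHMS = {"sha1", "sha256", "sha512", "blake2b"}


def hash_content(content: str, algorithm: str = "sha1") -> str:
    """Return a hex digest of `content` for use as an indexing key (illustrative sketch)."""
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm {algorithm!r}")
    if algorithm == "sha1":
        # SHA-1 is no longer collision-resistant, so nudge callers toward stronger options.
        warnings.warn(
            "SHA-1 is not collision-resistant; consider 'sha256', 'sha512' or 'blake2b'.",
            UserWarning,
            stacklevel=2,
        )
    return hashlib.new(algorithm, content.encode("utf-8")).hexdigest()


print(hash_content("hello world", "sha256"))
```

Keys produced with different algorithms never match, so switching algorithms for an existing index would presumably make every document look new; that migration concern is an inference, not something the commit states.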
Christophe Bornet	c7e82ad95d	core: Use parametrized test in test_correct_get_tracer_project (#31513 )	2025-06-23 18:55:57 -04:00
ccurme	ee83993b91	docs: document Anthropic cache TTL count details (#31708 )	2025-06-23 20:16:42 +00:00
Christophe Bornet	b1cc972567	core[patch]: Improve `RunnableWithMessageHistory` init arg types (#31639 ) `Runnable`'s `Input` is contravariant so we need to enumerate all possible inputs and it's not possible to put them in a `Union`. Also, it's better to only require a runnable that accepts `list[BaseMessage]` instead of a broader `Sequence[BaseMessage]` as internally the runnable is only called with a list.	2025-06-23 13:45:52 -04:00
Mikhail	6105a5841b	core: fix `get_buffer_string` output for structured message content (#31600 )	2025-06-20 23:21:50 +00:00
Mohammad Mohtashim	7ff405077d	core[patch]: Returning always 2D Array for _cosine_similarity (#31528 ) - Description: A very simple change so that `_cosine_similarity` always returns a 2D array. - Issue: #31497	2025-06-20 11:25:02 -04:00
Christophe Bornet	7e046ea848	core: Cleanup Pydantic models and handle deprecation warnings (#30799 ) * Simplified Pydantic handling since Pydantic v1 is not supported anymore. * Replace use of deprecated v1 methods by corresponding v2 methods. * Remove use of other deprecated methods. * Activate mypy errors on deprecated methods use. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2025-06-20 10:42:52 -04:00
Sydney Runkle	5b165effcd	core(fix): revert `set_text` optimization (#31555 ) Revert serialization regression introduced in https://github.com/langchain-ai/langchain/pull/31238 Fixes https://github.com/langchain-ai/langchain/issues/31486	2025-06-10 13:36:55 -04:00
lc-arjun	35ae5eab4f	core: use run tree post/patch (#31500 ) Use run post/patch	2025-06-05 14:05:57 -07:00
Mohammad Mohtashim	ae3551c96b	core[patch]: Correct type casting of annotations in _infer_arg_descriptions (#31181 ) - Description: - In _infer_arg_descriptions, the annotations dictionary contains string representations of types instead of actual typing objects. This causes _is_annotated_type to fail, preventing the correct description from being generated. - This is a simple fix using the get_type_hints method, which resolves the annotations properly and is supported across all Python versions. - Issue: #31051 --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2025-06-05 11:58:36 -04:00
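The mechanism behind #31181 can be shown with the standard `typing` module alone: when annotations are stored as strings (for example under `from __future__ import annotations`), a check like `get_origin(hint) is Annotated` fails until the strings are evaluated with `typing.get_type_hints`. `search` and `describe_args` are hypothetical names; this is not the code of `_infer_arg_descriptions`. The sketch passes `include_extras=True`, which `get_type_hints` needs in order to keep `Annotated` metadata; how the library calls it is not stated in the commit.

```python
from collections.abc import Callable
from typing import Annotated, get_args, get_origin, get_type_hints


def search(query: "Annotated[str, 'The text to search for']", limit: "int" = 10) -> "list[str]":
    """Hypothetical tool whose annotations are strings, as with postponed evaluation."""
    return []


def describe_args(func: Callable[..., object]) -> dict[str, str]:
    """Collect argument descriptions from Annotated[...] metadata (illustrative helper)."""
    # include_extras=True keeps Annotated metadata instead of stripping it to the bare type.
    hints = get_type_hints(func, include_extras=True)
    descriptions: dict[str, str] = {}
    for name, hint in hints.items():
        if name != "return" and get_origin(hint) is Annotated:
            # get_args(Annotated[str, "desc"]) == (str, "desc")
            extras = get_args(hint)[1:]
            descriptions[name] = next((m for m in extras if isinstance(m, str)), "")
    return descriptions


raw = search.__annotations__["query"]
print(type(raw).__name__, get_origin(raw))  # str None  (the raw string is not recognized)
print(describe_args(search))                 # {'query': 'The text to search for'}
```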
ccurme	741bb1ffa1	core[patch]: revert change to stream type hint (#31501 ) https://github.com/langchain-ai/langchain/pull/31286 included an update to the return type for `BaseChatModel.(a)stream`, from `Iterator[BaseMessageChunk]` to `Iterator[BaseMessage]`. This change is correct, because when streaming is disabled, the stream methods return an iterator of `BaseMessage`, and the inheritance is such that a `BaseMessage` is not a `BaseMessageChunk` (but the reverse is true). However, LangChain includes a pattern throughout its docs of [summing BaseMessageChunks](https://python.langchain.com/docs/how_to/streaming/#llms-and-chat-models) to accumulate a chat model stream. This pattern is implemented in tests for most integration packages and appears in application code. So https://github.com/langchain-ai/langchain/pull/31286 introduces mypy errors throughout the ecosystem (or maybe more accurately, it reveals that this pattern does not account for use of the `.stream` method when streaming is disabled). Here we revert just the change to the stream return type to unblock things. A fix for this should address docs + integration packages (or if we elect to just force people to update code, be explicit about that).	2025-06-05 11:20:06 -04:00
Christophe Bornet	539e5b6936	core: Add mypy strict-equality rule (#31286 )	2025-06-02 18:24:35 +00:00
Christophe Bornet	17c5a1621f	core: Improve Runnable `__or__` method typing annotations (#31273 ) * It is possible to chain a `Runnable` with an `AsyncIterator` as seen in `test_runnable.py`. * Iterator and AsyncIterator Input/Output of Callables must be put before `Callable[[Other], Any]` otherwise the pattern matching picks the latter.	2025-05-19 09:32:31 -04:00
OysterMax	eb25d7472d	core: support `Union` type args in strict mode of OpenAI function calling / structured output (#30971 ) Issue: [#30970](https://github.com/langchain-ai/langchain/issues/30970) Cause: An arg type in Python code ``` arg: Union[SubSchema1, SubSchema2] ``` is translated to `anyOf` in JSON schema ``` "anyOf" : [{sub schema 1 ...}, {sub schema 2 ...}] ``` The value of `anyOf` is a list of sub-schemas. The bug occurs because the sub-schemas inside the `anyOf` list were not handled. The issue happens in the `convert_to_openai_function` function -> `_recursive_set_additional_properties_false` function, which recursively adds `"additionalProperties": false` to the JSON schema, as [required by OpenAI's strict function calling](https://platform.openai.com/docs/guides/structured-outputs?api-mode=responses#additionalproperties-false-must-always-be-set-in-objects). Solution: This PR fixes the issue by iterating over each sub-schema inside the `anyOf` list. A unit test is added. Twitter handle: shengboma --------- Co-authored-by: ccurme <chester.curme@gmail.com>	2025-05-16 16:20:32 -04:00
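A hypothetical sketch of the recursion described in #30971: every object schema gets `"additionalProperties": false`, and the walk descends into `properties`, `items`, `$defs`, and each sub-schema of `anyOf`. `set_additional_properties_false` is an illustrative name, not `_recursive_set_additional_properties_false` itself, and the set of keywords it visits is an assumption.

```python
from typing import Any


def set_additional_properties_false(schema: Any) -> Any:
    """Recursively mark object schemas with "additionalProperties": False (in place).

    Hypothetical sketch; it descends into properties, $defs, items, and each
    sub-schema of anyOf (the case described in #30971).
    """
    if isinstance(schema, dict):
        if schema.get("type") == "object" or "properties" in schema:
            schema["additionalProperties"] = False
        for sub in schema.get("properties", {}).values():
            set_additional_properties_false(sub)
        for sub in schema.get("$defs", {}).values():
            set_additional_properties_false(sub)
        for sub in schema.get("anyOf", []):
            set_additional_properties_false(sub)
        if "items" in schema:
            set_additional_properties_false(schema["items"])
    return schema


schema = {
    "type": "object",
    "properties": {
        "arg": {
            "anyOf": [
                {"type": "object", "properties": {"a": {"type": "string"}}, "required": ["a"]},
                {"type": "object", "properties": {"b": {"type": "integer"}}, "required": ["b"]},
            ]
        }
    },
    "required": ["arg"],
}
set_additional_properties_false(schema)
print([sub["additionalProperties"] for sub in schema["properties"]["arg"]["anyOf"]])  # [False, False]
```

Without the `anyOf` loop, the top-level object would be marked but both union members would be left open, which is the failure mode the commit describes.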
Christophe Bornet	c982573f1e	core: Add ruff rules A (builtins shadowing) (#29312 ) See https://docs.astral.sh/ruff/rules/#flake8-builtins-a * Renamed vars where possible * Added `noqa` where backward compatibility was needed * Added `@override` when applicable	2025-05-16 15:19:37 -04:00
Christophe Bornet	a8f2ddee31	core: Add ruff rules RUF (#29353 ) See https://docs.astral.sh/ruff/rules/#ruff-specific-rules-ruf Mostly: * [RUF022](https://docs.astral.sh/ruff/rules/unsorted-dunder-all/) (unsorted `__all__`) * [RUF100](https://docs.astral.sh/ruff/rules/unused-noqa/) (unused noqa) * [RUF021](https://docs.astral.sh/ruff/rules/parenthesize-chained-operators/) (parenthesize-chained-operators) * [RUF015](https://docs.astral.sh/ruff/rules/unnecessary-iterable-allocation-for-first-element/) (unnecessary-iterable-allocation-for-first-element) * [RUF005](https://docs.astral.sh/ruff/rules/collection-literal-concatenation/) (collection-literal-concatenation) * [RUF046](https://docs.astral.sh/ruff/rules/unnecessary-cast-to-int/) (unnecessary-cast-to-int) --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2025-05-15 15:43:57 -04:00
Lope Ramos	b8ae2de169	langchain-core[patch]: `Incremental` record manager deletion should be batched (#31206 ) Description: Before this commit, if the keys to delete in a single batch exceeded 32k rows (sqlite3 >= 3.32) or 999 rows (sqlite3 < 3.32), `record_manager.delete_keys()` would fail, as the generated query had too many variables. This commit ensures that the delete operation is batched using `cleanup_batch_size`, as is already done for `full` cleanup. Added unit tests for incremental mode with different deletion batch sizes.	2025-05-14 11:38:21 -04:00
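A minimal sketch of the batching idea from #31206 using the standard-library `sqlite3` module. The `records` table, `chunked`, and `delete_keys_batched` are hypothetical, with `batch_size` playing the role of `cleanup_batch_size`; this is not the record manager's actual implementation.

```python
import sqlite3
from collections.abc import Iterator, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start : start + size]


def delete_keys_batched(conn: sqlite3.Connection, keys: Sequence[str], batch_size: int) -> int:
    """Delete `keys` from a hypothetical `records` table, one bounded batch at a time.

    Each DELETE binds at most `batch_size` parameters, so a large cleanup never
    exceeds SQLite's limit on variables per statement.
    """
    deleted = 0
    for batch in chunked(keys, batch_size):
        placeholders = ", ".join("?" for _ in batch)
        cursor = conn.execute(f"DELETE FROM records WHERE key IN ({placeholders})", list(batch))
        deleted += cursor.rowcount
    conn.commit()
    return deleted


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (key TEXT PRIMARY KEY)")
keys = [f"doc-{i}" for i in range(2500)]
conn.executemany("INSERT INTO records (key) VALUES (?)", [(k,) for k in keys])
deleted = delete_keys_batched(conn, keys, batch_size=500)
print(deleted)  # 2500
```

With 500 keys per statement, the example stays under the 999-variable default of older SQLite builds while still deleting all 2,500 keys in five statements.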
CtrlMj	1e56c66f86	core: Fix issue 31035 alias fields in base tool langchain core (#31112 ) Description: Python's `inspect` module skips over the aliases set in the schema of a pydantic model. This is a workaround to include the aliases from the original input. issue: #31035 Cc: @ccurme @eyurtsev --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-05-12 11:04:13 -04:00
Jacob Lee	66d1ed6099	fix(core): Permit OpenAI style blocks to be passed into convert_to_openai_messages (#31140 ) Should effectively be a noop, just shouldn't throw CC @madams0013 --------- Co-authored-by: ccurme <chester.curme@gmail.com>	2025-05-07 10:57:37 -04:00
ccurme	26ad239669	core, openai[patch]: prefer provider-assigned IDs when aggregating message chunks (#31080 ) When aggregating AIMessageChunks in a stream, core prefers the leftmost non-null ID. This is problematic because: - Core assigns IDs when they are null to `f"run-{run_manager.run_id}"` - The desired meaningful ID might not be available until midway through the stream, as is the case for the OpenAI Responses API. For the OpenAI Responses API, we assign message IDs to the top-level `AIMessage.id`. This works in `.(a)invoke`, but during `.(a)stream` the IDs get overwritten by the defaults assigned in langchain-core. These IDs [must](https://community.openai.com/t/how-to-solve-badrequesterror-400-item-rs-of-type-reasoning-was-provided-without-its-required-following-item-error-in-responses-api/1151686/9) be available on the AIMessage object to support passing reasoning items back to the API (e.g., if not using OpenAI's `previous_response_id` feature). We could add them elsewhere, but seeing as we've already made the decision to store them in `.id` during `.(a)invoke`, addressing the issue in core lets us fix the problem with no interface changes.	2025-05-02 11:18:18 -04:00
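The selection rule described in #31080 can be sketched as a small function over the sequence of chunk IDs, assuming (as the commit states) that core-generated defaults look like `f"run-{run_manager.run_id}"`. `choose_message_id` and the sample IDs are hypothetical.

```python
from collections.abc import Iterable
from typing import Optional


def choose_message_id(chunk_ids: Iterable[Optional[str]]) -> Optional[str]:
    """Pick the ID for an aggregated message from its chunks' IDs (illustrative sketch).

    Provider-assigned IDs win over core-generated "run-..." placeholders, even when
    they only appear midway through the stream; otherwise the first placeholder is kept.
    """
    fallback: Optional[str] = None
    for chunk_id in chunk_ids:
        if not chunk_id:
            continue  # this chunk carried no ID
        if not chunk_id.startswith("run-"):
            return chunk_id  # first provider-assigned ID
        if fallback is None:
            fallback = chunk_id
    return fallback


print(choose_message_id(["run-1234", None, "msg_abc", "run-1234"]))  # msg_abc
print(choose_message_id(["run-1234", None]))                          # run-1234
```

Under the old "leftmost non-null ID" rule, the first call would have returned `run-1234`, which is the overwrite problem the commit describes for the OpenAI Responses API.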
ccurme	f4863f82e2	core[patch]: fix edge cases for _is_openai_data_block (#30997 )	2025-04-24 10:48:52 -04:00
Jacob Lee	6b0b317cb5	feat(core): Autogenerate filenames for when converting file content blocks to OpenAI format (#30984 ) CC @ccurme --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-04-24 13:36:31 +00:00
ccurme	faef3e5d50	core, standard-tests: support PDF and audio input in Chat Completions format (#30979 ) Chat models currently implement support for: - images in OpenAI Chat Completions format - other multimodal types (e.g., PDF and audio) in a cross-provider [standard format](https://python.langchain.com/docs/how_to/multimodal_inputs/) Here we update core to extend support to PDF and audio input in Chat Completions format. If an OAI-format PDF or audio content block is passed into any chat model, it will be transformed to the LangChain standard format. We assume that any chat model supporting OAI-format PDF or audio has implemented support for the standard format.	2025-04-23 18:32:51 +00:00
Bagatur	d4fc734250	core[patch]: update dict prompt template (#30967 ) Align with JS changes made in https://github.com/langchain-ai/langchainjs/pull/8043	2025-04-23 10:04:50 -07:00
ccurme	4bc70766b5	core, openai: support standard multi-modal blocks in convert_to_openai_messages (#30968 )	2025-04-23 11:20:44 -04:00
Ahmed Tammaa	de56c31672	core: Improve OutputParser error messaging when model output is truncated (max_tokens) (#30936 ) Addresses #30158 When using the output parser—either in a chain or standalone—hitting max_tokens triggers a misleading “missing variable” error instead of indicating the output was truncated. This subtle bug often surfaces with Anthropic models. --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-04-21 10:06:18 -04:00
Sydney Runkle	75e50a3efd	core[patch]: Raise `AttributeError` (instead of `ModuleNotFoundError`) in custom `__getattr__` (#30905 ) Follow up to https://github.com/langchain-ai/langchain/pull/30769, fixing the regression reported [here](https://github.com/langchain-ai/langchain/pull/30769#issuecomment-2807483610), thanks @krassowski for the report! Fix inspired by https://github.com/PrefectHQ/prefect/pull/16172/files Other changes: * Using tuples for `__all__`, except in `output_parsers` bc of a list namespace conflict * Using a helper function for imports due to repeated logic across `__init__.py` files becoming hard to maintain. Co-authored-by: Michał Krassowski <5832902+krassowski@users.noreply.github.com>	2025-04-17 14:15:28 -04:00
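Why the exception type matters in #30905: `hasattr()` and three-argument `getattr()` only swallow `AttributeError`, so a module-level `__getattr__` (PEP 562) that raises `ModuleNotFoundError` for unknown names makes them blow up instead of returning `False` or the default. A hypothetical, self-contained sketch of a lazily importing module that raises the right error; `make_lazy_module` and its export map are illustrative, not LangChain's import helper.

```python
import importlib
from types import ModuleType


def make_lazy_module(name: str, exports: dict[str, str]) -> ModuleType:
    """Build a module whose attributes are imported on first access (PEP 562 sketch)."""
    module = ModuleType(name)

    def __getattr__(attr: str) -> object:
        module_name = exports.get(attr)
        if module_name is None:
            # AttributeError (not ModuleNotFoundError) lets hasattr(), getattr(..., default)
            # and introspection tools treat the name as simply absent.
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        value = getattr(importlib.import_module(module_name), attr)
        setattr(module, attr, value)  # cache so later lookups skip __getattr__
        return value

    module.__getattr__ = __getattr__
    return module


lazy = make_lazy_module("lazy_demo", {"dedent": "textwrap", "OrderedDict": "collections"})
print(lazy.dedent("  hi"))             # hi
print(hasattr(lazy, "missing"))        # False
print(getattr(lazy, "missing", None))  # None
```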
ccurme	86d51f6be6	multiple: permit optional fields on multimodal content blocks (#30887 ) Instead of stuffing provider-specific fields in `metadata`, they can go directly on the content block.	2025-04-17 12:48:46 +00:00
Sydney Runkle	88fce67724	core: Removing unnecessary `pydantic` core schema rebuilds (#30848 ) We only need to rebuild model schemas if type annotation information isn't available during declaration - that shouldn't be the case for these types corrected here. Need to do more thorough testing to make sure these structures have complete schemas, but hopefully this boosts startup / import time.	2025-04-16 12:00:08 -04:00
Christophe Bornet	a4ca1fe0ed	core: Remove some noqa (#30855 )	2025-04-15 13:08:40 -04:00


560 Commits