langchain

mirror of https://github.com/hwchase17/langchain.git synced 2025-08-07 03:56:39 +00:00

Author	SHA1	Message	Date
Mason Daugherty	96cbd90cba	fix: formatting issues in docstrings (#32265 ) Ensures proper reStructuredText formatting by adding the required blank line before closing docstring quotes, which resolves the "Block quote ends without a blank line; unexpected unindent" warning.	2025-07-27 23:37:47 -04:00
Mason Daugherty	f624ad489a	feat(docs): improve devx, fix `Makefile` targets (#32237 ) TL;DR many of the provided `Makefile` targets were broken, and any time I wanted to preview changes locally I either had to refer to a command Chester gave me or try waiting on a Vercel preview deployment. With this PR, everything should behave normally. Significant updates to the `Makefile` and documentation files, focusing on improving usability, adding clear messaging, and fixing/enhancing documentation workflows. ### Updates to `Makefile`: #### Enhanced build and cleaning processes: - Added informative messages (e.g., "📚 Building LangChain documentation...") to makefile targets like `docs_build`, `docs_clean`, and `api_docs_build` for better user feedback during execution. - Introduced a `clean-cache` target to the `docs` `Makefile` to clear cached dependencies and ensure clean builds. #### Improved dependency handling: - Modified `install-py-deps` to create a `.venv/deps_installed` marker, preventing redundant/duplicate dependency installations and improving efficiency. #### Streamlined file generation and infrastructure setup: - Added caching for the LangServe README download and parallelized feature table generation. - Added user-friendly completion messages for targets like `copy-infra` and `render`. #### Documentation server updates: - Enhanced the `start` target with messages indicating server start and URL for local documentation viewing. --- ### Documentation Improvements: #### Content clarity and consistency: - Standardized section titles for consistency across documentation files. - Refined phrasing and formatting in sections like "Dependency management" and "Formatting and linting" for better readability. #### Enhanced workflows: - Updated instructions for building and viewing documentation locally, including tips for specifying server ports and handling API reference previews. - Expanded guidance on cleaning documentation artifacts and using linting tools effectively. #### API reference documentation: - Improved instructions for generating and formatting in-code documentation, highlighting best practices for docstring writing. --- ### Minor Changes: - Added support for a new package name (`langchain_v1`) in the API documentation generation script. - Fixed minor capitalization and formatting issues in documentation files. --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-07-25 14:49:03 -04:00
niceg	0d6f915442	fix: LLM mimicking Unicode responses due to forced Unicode conversion of non-ASCII characters. (#32222 ) - Description: This PR fixes an issue where the LLM would mimic Unicode escape sequences because non-ASCII characters in tool calls were forcibly escaped. The fix disables the `ensure_ascii` flag in `json.dumps()` when converting tool calls to OpenAI format. - Issue: given the input: ```json {'role': 'assistant', 'tool_calls': [{'type': 'function', 'id': 'call_nv9trcehdpihr21zj9po19vq', 'function': {'name': 'create_customer', 'arguments': '{"customer_name": "你好啊集团"}'}}]} ``` the output was: ```json {'role': 'assistant', 'tool_calls': [{'type': 'function', 'id': 'call_nv9trcehdpihr21zj9po19vq', 'function': {'name': 'create_customer', 'arguments': '{"customer_name": "\\u4f60\\u597d\\u554a\\u96c6\\u56e2"}'}}]} ``` The LLM then mimics this by emitting Unicode escape sequences itself. Because each escaped character is several times longer than the original, this lengthens LLM responses and slows generation. --------- Co-authored-by: Mason Daugherty <github@mdrxy.com> Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-07-24 17:01:31 -04:00
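The effect of the `ensure_ascii` flag can be reproduced with the standard library alone. This sketch reuses the customer name from the PR description and only demonstrates `json.dumps`; it does not reproduce LangChain's own tool-call conversion code.

```python
import json

# Tool-call arguments containing non-ASCII text (example from the PR description).
arguments = {"customer_name": "你好啊集团"}

# Default: ensure_ascii=True escapes every non-ASCII character as \uXXXX.
escaped = json.dumps(arguments)

# ensure_ascii=False keeps the original characters in the output string.
readable = json.dumps(arguments, ensure_ascii=False)

print(escaped)   # {"customer_name": "\u4f60\u597d\u554a\u96c6\u56e2"}
print(readable)  # {"customer_name": "你好啊集团"}
```

Each escaped character costs six ASCII characters instead of one, which is the extra length the model ends up imitating.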
Mason Daugherty	d53ebf367e	fix(docs): capitalization, codeblock formatting, and hyperlinks, note blocks (#32235 ) widespread cleanup attempt	2025-07-24 16:55:04 -04:00
Mason Daugherty	bd3d6496f3	release(core): 0.3.72 (#32214 ) fixes #32170	2025-07-23 20:33:48 -04:00
jmaillefaud	fb5da8384e	fix(core): Dereference Refs for pydantic schema fails in tool schema generation (#32203 ) The `_dereference_refs_helper` in `langchain_core.utils.json_schema` incorrectly handled objects with a reference and other fields. Issue: #32170 # Description We change the check so that it accepts other keys in the object.	2025-07-23 20:28:27 -04:00
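The fix accepts objects that carry a `$ref` alongside other keys. The following is a hypothetical sketch of that idea, not the code in `langchain_core.utils.json_schema`: it resolves local `#/...` pointers over dict keys only, and it assumes sibling keys (such as `description`) should be merged over the referenced schema. It does not guard against cyclic references.

```python
from typing import Any


def resolve_pointer(root: dict, ref: str) -> Any:
    """Resolve a local pointer such as '#/$defs/Point' (dict keys only)."""
    node: Any = root
    for part in ref.removeprefix("#/").split("/"):
        node = node[part]
    return node


def deref(node: Any, root: dict) -> Any:
    """Recursively replace {"$ref": ...} objects with the schema they point to."""
    if isinstance(node, dict):
        if "$ref" in node:
            target = deref(resolve_pointer(root, node["$ref"]), root)
            # Keep sibling keys instead of rejecting the object; they are
            # layered on top of the referenced schema.
            siblings = {k: deref(v, root) for k, v in node.items() if k != "$ref"}
            return {**target, **siblings}
        return {k: deref(v, root) for k, v in node.items()}
    if isinstance(node, list):
        return [deref(item, root) for item in node]
    return node


schema = {
    "$defs": {"Point": {"type": "object", "properties": {"x": {"type": "number"}}}},
    "type": "object",
    "properties": {"p": {"$ref": "#/$defs/Point", "description": "A point"}},
}
print(deref(schema, schema)["properties"]["p"])
```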
Mason Daugherty	a02ad3d192	docs: formatting cleanup (#32188 ) * formatting cleaning * make `init_chat_model` more prominent in list of guides	2025-07-22 15:46:15 -04:00
ccurme	0c4054a7fc	release(core): 0.3.71 (#32186 )	2025-07-22 15:44:36 -04:00
ccurme	ebf2e11bcb	fix(core): exclude api_key from tracing metadata (#32184 ) (standard param)	2025-07-22 15:32:12 -04:00
ccurme	8acfd677bc	fix(core): add type key when tracing in some cases (#31825 )	2025-07-22 18:08:16 +00:00
Copilot	18c64aed6d	feat(core): add `sanitize_for_postgres` utility to fix PostgreSQL NUL byte DataError (#32157 ) This PR fixes the PostgreSQL NUL byte issue that causes `psycopg.DataError` when inserting documents containing `\x00` bytes into PostgreSQL-based vector stores. ## Problem PostgreSQL text fields cannot contain NUL (0x00) bytes. When documents with such characters are processed by PGVector or langchain-postgres implementations, they fail with: ``` (psycopg.DataError) PostgreSQL text fields cannot contain NUL (0x00) bytes ``` This commonly occurs when processing PDFs, documents from various loaders, or text extracted by libraries like unstructured that may contain embedded NUL bytes. ## Solution Added `sanitize_for_postgres()` utility function to `langchain_core.utils.strings` that removes or replaces NUL bytes from text content. ### Key Features - Simple API: `sanitize_for_postgres(text, replacement="")` - Configurable: Replace NUL bytes with empty string (default) or space for readability - Comprehensive: Handles all problematic examples from the original issue - Well-tested: Complete unit tests with real-world examples - Backward compatible: No breaking changes, purely additive ### Usage Example ```python from langchain_core.utils import sanitize_for_postgres from langchain_core.documents import Document # Before: This would fail with DataError problematic_content = "Getting\x00Started with embeddings" # After: Clean the content before database insertion clean_content = sanitize_for_postgres(problematic_content) # Result: "GettingStarted with embeddings" # Or preserve readability with spaces readable_content = sanitize_for_postgres(problematic_content, " ") # Result: "Getting Started with embeddings" # Use in Document processing doc = Document(page_content=clean_content, metadata={...}) ``` ### Integration Pattern PostgreSQL vector store implementations should sanitize content before insertion: ```python def add_documents(self, documents: List[Document]) -> List[str]: # Sanitize documents before insertion sanitized_docs = [] for doc in documents: sanitized_content = sanitize_for_postgres(doc.page_content, " ") sanitized_doc = Document( page_content=sanitized_content, metadata=doc.metadata, id=doc.id ) sanitized_docs.append(sanitized_doc) return self._insert_documents_to_db(sanitized_docs) ``` ## Changes Made - Added `sanitize_for_postgres()` function in `langchain_core/utils/strings.py` - Updated `langchain_core/utils/__init__.py` to export the new function - Added comprehensive unit tests in `tests/unit_tests/utils/test_strings.py` - Validated against all examples from the original issue report ## Testing All tests pass, including: - Basic NUL byte removal and replacement - Multiple consecutive NUL bytes - Empty string handling - Real examples from the GitHub issue - Backward compatibility with existing string utilities This utility enables PostgreSQL integrations in both langchain-community and langchain-postgres packages to handle documents with NUL bytes reliably. Fixes #26033. --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com> Co-authored-by: Mason Daugherty <github@mdrxy.com>	2025-07-21 20:33:20 -04:00
Mohammad Mohtashim	095f4a7c28	fix(core): fix `parse_result` in case of self.first_tool_only with multiple keys matching for JsonOutputKeyToolsParser (#32106 ) * Description: Updated `parse_result` logic to handle cases where `self.first_tool_only` is `True` and multiple matching keys share the same function name. Instead of returning the first match prematurely, the method now prioritizes filtering results by the specified key to ensure correct selection. * Issue: #32100 --------- Co-authored-by: Mason Daugherty <github@mdrxy.com> Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-07-21 12:50:22 -04:00
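The corrected selection order (filter by the requested key first, then take the first match) can be sketched as below. This is a hypothetical helper assuming tool calls are plain dicts with `name` and `args` keys; it is not part of `JsonOutputKeyToolsParser`'s API.

```python
from typing import Any, Optional


def first_matching_tool_call(
    tool_calls: list[dict[str, Any]], key_name: str
) -> Optional[dict[str, Any]]:
    """Return the first tool call whose name equals `key_name`, or None.

    Filtering happens before picking the first element, so a call to a
    different tool at position 0 no longer hides a later matching call.
    """
    matches = [call for call in tool_calls if call["name"] == key_name]
    return matches[0] if matches else None


calls = [
    {"name": "other_tool", "args": {"a": 1}},
    {"name": "target_tool", "args": {"b": 2}},
    {"name": "target_tool", "args": {"b": 3}},
]
print(first_matching_tool_call(calls, "target_tool"))
```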
ccurme	0355da3159	release(core): 0.3.70 (#32144 )	2025-07-21 10:49:32 -04:00
astraszab	668c084520	docs(core): move incorrect arg limitation in rate limiter's docstring (#32118 )	2025-07-20 14:28:35 -04:00
Yoshi	6d71bb83de	fix(core): fix docstrings and add sleep to FakeListChatModel._call (#32108 )	2025-07-19 17:30:15 -04:00
Isaac Francisco	98bfd57a76	fix(core): better error message for empty var names (#32073 ) Previously, we hit an index out of range error with empty variable names (accessing tag[0]); now we throw a slightly nicer error. --------- Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-07-18 17:00:02 -04:00
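The pattern behind this fix is to validate a tag before indexing `tag[0]`. The classifier below is a hypothetical mustache-style sketch (the sigil table and function name are assumptions, not LangChain's template parser) showing how an explicit `ValueError` replaces the bare `IndexError`.

```python
# Hypothetical sigil table for mustache-style tags.
SIGILS = {"#": "section", "^": "inverted", "/": "end", "!": "comment", ">": "partial", "&": "unescaped"}


def classify_tag(raw: str) -> tuple[str, str]:
    """Return (tag_type, name) for the text between '{{' and '}}'."""
    tag = raw.strip()
    # Without this guard, tag[0] on an empty string raises IndexError,
    # which says nothing about the offending template tag.
    if not tag:
        raise ValueError("Empty variable name in template tag '{{" + raw + "}}'")
    tag_type = SIGILS.get(tag[0], "variable")
    name = tag if tag_type == "variable" else tag[1:].strip()
    # A sigil with nothing after it (e.g. '{{#}}') is also an empty name.
    if not name and tag_type != "comment":
        raise ValueError("Empty variable name in template tag '{{" + raw + "}}'")
    return tag_type, name


print(classify_tag("#items"))
```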
Gurram Siddarth Reddy	427d2d6397	fix(core): implement sleep delay in FakeMessagesListChatModel `_generate` (#32014 ) implement sleep delay in FakeMessagesListChatModel._generate so the sleep parameter is respected, matching the documented behavior. This adds artificial latency between responses for testing purposes. Issue: closes [#31974](https://github.com/langchain-ai/langchain/issues/31974) following [docs](https://python.langchain.com/api_reference/core/language_models/langchain_core.language_models.fake_chat_models.FakeMessagesListChatModel.html#langchain_core.language_models.fake_chat_models.FakeMessagesListChatModel.sleep) Dependencies: none Twitter handle: [@siddarthreddyg2](https://x.com/siddarthreddyg2) --------- Signed-off-by: Siddarthreddygsr <siddarthreddygsr@gmail.com>	2025-07-18 15:54:28 -04:00
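The documented `sleep` behaviour amounts to pausing before each canned response. The class below is a hypothetical, stdlib-only stand-in, not LangChain's `FakeMessagesListChatModel`; cycling back to the first response after the last one is an assumption of this sketch.

```python
import time
from itertools import cycle
from typing import Optional


class FakeResponder:
    """Return canned responses in order, optionally sleeping before each one."""

    def __init__(self, responses: list[str], sleep: Optional[float] = None) -> None:
        self._responses = cycle(responses)  # wrap around after the last response
        self.sleep = sleep

    def generate(self, prompt: str) -> str:
        if self.sleep is not None:
            time.sleep(self.sleep)  # artificial latency for testing
        return next(self._responses)


model = FakeResponder(["first", "second"], sleep=0.01)
print([model.generate("hi") for _ in range(3)])
```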
open-swe[bot]	5da986c3f6	fix(core): JSON Schema reference resolution for list indices (#32088 ) Fixes #32042 ## Summary Fixes a critical bug in JSON Schema reference resolution that prevented correctly dereferencing numeric components in JSON pointer paths, specifically for list indices in `anyOf`, `oneOf`, and `allOf` arrays. ## Changes - Fixed `_retrieve_ref` function in `libs/core/langchain_core/utils/json_schema.py` to properly handle numeric components - Added comprehensive test function `test_dereference_refs_list_index()` in `libs/core/tests/unit_tests/utils/test_json_schema.py` - Resolved line length formatting issues - Improved type checking and index validation for list and dictionary references ## Key Improvements - Correctly handles list index references in JSON pointer paths - Maintains backward compatibility with existing dictionary numeric key functionality - Adds robust error handling for out-of-bounds and invalid indices - Passes all test cases covering various reference scenarios ## Test Coverage - Verified fix for `#/properties/payload/anyOf/1/properties/startDate` reference - Tested edge cases including out-of-bounds and negative indices - Ensured no regression in existing reference resolution functionality Resolves the reported issue with JSON Schema reference dereferencing for list indices. --------- Co-authored-by: open-swe-dev[bot] <open-swe-dev@users.noreply.github.com> Co-authored-by: Mason Daugherty <github@mdrxy.com> Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-07-17 15:54:38 -04:00
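A minimal sketch of the resolution rule described above, assuming only local `#/...` pointers: numeric components index into lists, dict keys that merely look numeric still work as keys, and negative or out-of-range indices are rejected. The function name mirrors the PR's `_retrieve_ref` only for readability; this is not the library's code, and JSON Pointer `~0`/`~1` escaping is ignored.

```python
from typing import Any


def retrieve_ref(path: str, schema: dict) -> Any:
    """Resolve a local JSON pointer such as '#/properties/x/anyOf/1'."""
    if not path.startswith("#/"):
        raise ValueError(f"Only local references are supported: {path!r}")
    node: Any = schema
    for component in path[2:].split("/"):
        if isinstance(node, list):
            # isdecimal() rejects '-1' and non-numeric components.
            if not component.isdecimal():
                raise KeyError(f"Invalid list index {component!r} in {path!r}")
            index = int(component)
            if index >= len(node):
                raise KeyError(f"Index {index} out of range in {path!r}")
            node = node[index]
        elif isinstance(node, dict) and component in node:
            node = node[component]  # also covers numeric-looking dict keys
        else:
            raise KeyError(f"Reference {path!r} not found")
    return node


schema = {
    "properties": {
        "payload": {
            "anyOf": [
                {"type": "null"},
                {"type": "object", "properties": {"startDate": {"type": "string"}}},
            ]
        }
    }
}
print(retrieve_ref("#/properties/payload/anyOf/1/properties/startDate", schema))
```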
efj-amzn	d3072e2d2e	feat(core): update `_import_utils.py` to not mask the thrown exception (#32071 )	2025-07-16 17:11:56 -04:00
Mohammad Mohtashim	96bf8262e2	fix: fixing missing Docstring Bug if no Docstring is provided in BaseModel class (#31608 ) - Description: Ensure that the tool description is an empty string when creating a Structured Tool from a Pydantic class in case no description is provided - Issue: Fixes #31606 --------- Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-07-16 11:56:05 -04:00
Casi	686a6b754c	fix: issue a warning if `np.nan` or `np.inf` are in `_cosine_similarity` argument Matrices (#31532 ) - Description: issues a warning if inf and nan are passed as inputs to langchain_core.vectorstores.utils._cosine_similarity - Issue: Fixes #31496 - Dependencies: no external dependencies added, only warnings module imported --------- Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-07-16 11:50:09 -04:00
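A pure-Python sketch of a cosine-similarity routine that warns (rather than fails) on non-finite input. It is not `langchain_core.vectorstores.utils._cosine_similarity`: the choice of `RuntimeWarning`, mapping zero-norm rows to `0.0`, and the plain-list matrices are all assumptions. It always returns a list of rows, in the spirit of the 2D-output change in #31528.

```python
import math
import warnings

Matrix = list[list[float]]


def cosine_similarity(x: Matrix, y: Matrix) -> Matrix:
    """Row-wise cosine similarity; result[i][j] compares x[i] with y[j]."""
    if any(not math.isfinite(v) for row in (*x, *y) for v in row):
        warnings.warn(
            "NaN or inf found in input matrices; results may be NaN.",
            RuntimeWarning,
            stacklevel=2,
        )

    def norm(row: list[float]) -> float:
        return math.sqrt(sum(v * v for v in row))

    result: Matrix = []
    for a in x:
        out_row = []
        for b in y:
            if len(a) != len(b):
                raise ValueError("Rows must have the same number of columns.")
            denom = norm(a) * norm(b)
            dot = sum(p * q for p, q in zip(a, b))
            # Zero-norm rows get similarity 0.0 instead of a division error.
            out_row.append(dot / denom if denom else 0.0)
        result.append(out_row)
    return result


print(cosine_similarity([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]))
```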
Mason Daugherty	ad44f0688b	release(core): release 0.3.69 (#32056 )	2025-07-15 17:13:46 -04:00
Jacob Lee	535ba43b0d	feat(core): add an option to make deserialization more permissive (#32054 ) ## Description Currently when deserializing objects that contain non-deserializable values, we throw an error. However, there are cases (e.g. proxies that return response fields containing extra fields like Python datetimes), where these values are not important and we just want to drop them. Twitter handle: @hacubu --------- Co-authored-by: Mason Daugherty <github@mdrxy.com>	2025-07-15 17:00:01 -04:00
董哥的黑板报	553ac1863b	docs: add deprecation notice for PipelinePromptTemplate (#31999 ) PR title: add deprecation notice for PipelinePromptTemplate PR message: In the API documentation, PipelinePromptTemplate is marked as deprecated, but this is not mentioned in the docs. I'm submitting this PR to add a deprecation notice to the docs. Tests: N/A (documentation only) --------- Co-authored-by: Mason Daugherty <github@mdrxy.com>	2025-07-14 15:27:29 +00:00
Andreas V. Jonsterhaug	6dcca35a34	fix(core): correct return type hints in BaseChatPromptTemplate (#32009 ) This PR changes the return type hints of the `format_prompt` and `aformat_prompt` methods in `BaseChatPromptTemplate` from `PromptValue` to `ChatPromptValue`, since both methods always return a `ChatPromptValue`.	2025-07-14 11:00:01 -04:00
Azhagammal	4d9c0b0883	fix[core]: added error message if the query vector or embedding contains NaN values (#31822 ) Description: Added an explicit validation step in `langchain_core.vectorstores.utils._cosine_similarity` to raise a `ValueError` if the input query or any embedding contains `NaN` values. This prevents silent failures or unstable behavior during similarity calculations, especially when using maximal_marginal_relevance. Issue: Fixes #31806 Dependencies: None --------- Co-authored-by: Azhagammal S C <azhagammal@kofluence.com> Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-07-09 18:30:26 -04:00
Chris G	65b098325b	core: docs: clarify where the kwargs in on_tool_start and on_tool_end go (#31909 ) Description: I traced the kwargs starting at `.invoke()`, and it was not clear where they go; it only became clear two layers down, so I documented it here for the next person. Issue: No related issue. Dependencies: No dependency changes. Twitter handle: Nah. We're good. --------- Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-07-08 10:35:31 -04:00
Mason Daugherty	e686a70ee0	ollama: thinking, tool streaming, docs, tests (#31772 ) * New `reasoning` (bool) param to support toggling [Ollama thinking](https://ollama.com/blog/thinking) (#31573, #31700). If `reasoning=True`, Ollama's `thinking` content will be placed in the model responses' `additional_kwargs.reasoning_content`. * Supported by: * ChatOllama (class level, invocation level TODO) * OllamaLLM (TODO) * Added tests to ensure streaming tool calls is successful (#29129) * Refactored tests that relied on `extract_reasoning()` * Myriad docs additions and consistency/typo fixes * Improved type safety in some spots Closes #29129 Addresses #31573 and #31700 Supersedes #31701	2025-07-07 13:56:41 -04:00
Michael Li	47d330f4e6	fix: fix file open with encoding in chat_history.py (#31884 ) Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-07-07 13:30:59 -04:00
Christophe Bornet	03e8327e01	core: Ruff preview fixes (#31877 ) Auto-fixes from `uv run ruff check --fix --unsafe-fixes --preview` --------- Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-07-07 13:02:40 -04:00
Christophe Bornet	8aed3b61a9	core: Bump ruff version to 0.12 (#31846 )	2025-07-07 10:02:51 -04:00
Mohammad Mohtashim	b26d2250ba	core[patch]: Int Combine when Merging Dicts (#31572 ) - Description: When merging dicts, integer values are now combined by adding them, which is the most sensible behavior. - Issue: #31565	2025-07-04 14:44:16 -04:00
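A simplified, hypothetical merge function illustrating the rule: integers are summed, strings concatenated, nested dicts merged recursively, and `None` treated as "no value yet". The name `merge_chunk_dicts` and the exact error handling are assumptions; this is not LangChain's internal merge helper. `bool` is excluded from summing because it subclasses `int`.

```python
from typing import Any


def merge_chunk_dicts(left: dict[str, Any], right: dict[str, Any]) -> dict[str, Any]:
    """Merge `right` into a copy of `left`, combining values key by key."""
    merged = dict(left)
    for key, value in right.items():
        current = merged.get(key)
        if key not in merged or current is None:
            merged[key] = value
        elif value is None:
            continue
        elif type(current) is not type(value):
            raise TypeError(f"Cannot merge {type(current).__name__} with {type(value).__name__} for key {key!r}")
        elif isinstance(value, str):
            merged[key] = current + value
        elif isinstance(value, dict):
            merged[key] = merge_chunk_dicts(current, value)
        elif isinstance(value, int) and not isinstance(value, bool):
            merged[key] = current + value  # ints are added, e.g. token counts
        elif current != value:
            raise ValueError(f"Conflicting values for key {key!r}: {current!r} vs {value!r}")
    return merged


print(merge_chunk_dicts(
    {"tokens": 3, "text": "Hel", "meta": {"n": 1}},
    {"tokens": 4, "text": "lo", "meta": {"n": 2}},
))
```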
ccurme	2090f85789	core: release 0.3.68 (#31848 ) Also add `search_result` to recognized tool message block types.	2025-07-03 12:36:25 -04:00
Eugene Yurtsev	73fefe0295	core[patch]: Use context manager for FileCallbackHandler (#31813 ) Recommend using a context manager for FileCallbackHandler to avoid opening too many file descriptors --------- Co-authored-by: Mason Daugherty <github@mdrxy.com>	2025-07-02 13:31:58 -04:00
ccurme	04cc674e80	core: release 0.3.67 (#31791 )	2025-06-30 12:00:39 -04:00
ccurme	46cef90f7b	core: expose tool message recognized block types (#31787 )	2025-06-30 11:19:34 -04:00
Mason Daugherty	9aa75eaef3	docs: enhance docstring for `disable_streaming` parameter in BaseChatModel (#31759 ) Resolves #31758	2025-06-27 11:27:41 -04:00
Mason Daugherty	3c3320ae30	fix: update import paths for ChatOllama to use langchain_ollama instead of community (#31721 )	2025-06-24 16:19:31 -04:00
Eugene Yurtsev	9164e6f906	core[patch]: Add additional hashing options to indexing API, warn on SHA-1 (#31649 ) Add additional hashing options to the indexing API, warn on SHA-1 Requires: - Bumping langchain-core version - bumping min langchain-core in langchain --------- Co-authored-by: ccurme <chester.curme@gmail.com>	2025-06-24 14:44:06 -04:00
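The idea of configurable hashing with a SHA-1 warning can be sketched with `hashlib` and `warnings`. The helper name, the accepted algorithm names, and the warning text are assumptions for illustration; they are not the indexing API's actual parameters.

```python
import hashlib
import warnings

# Hypothetical set of supported algorithms for this sketch.
SUPPORTED = {"sha1", "sha256", "sha512", "blake2b"}


def hash_content(text: str, algorithm: str = "sha256") -> str:
    """Return the hex digest of `text` using the chosen algorithm."""
    if algorithm not in SUPPORTED:
        raise ValueError(f"Unsupported algorithm: {algorithm!r}")
    if algorithm == "sha1":
        # SHA-1 is no longer considered collision-resistant.
        warnings.warn(
            "SHA-1 is not collision-resistant; prefer sha256 or stronger.",
            UserWarning,
            stacklevel=2,
        )
    return hashlib.new(algorithm, text.encode("utf-8")).hexdigest()


print(hash_content("abc"))
```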
Mason Daugherty	6d71b6b6ee	standard-tests: refactoring and fixes (#31703 ) - `libs/core/langchain_core/messages/base.py`: add model name to examples [per docs](https://python.langchain.com/api_reference/standard_tests/integration_tests/langchain_tests.integration_tests.chat_models.ChatModelIntegrationTests.html#langchain_tests.integration_tests.chat_models.ChatModelIntegrationTests.test_usage_metadata) ("0.3.17: Additionally check for the presence of model_name in the response metadata, which is needed for usage tracking in callback handlers") - `libs/core/langchain_core/utils/function_calling.py`: correct typo - `libs/standard-tests/langchain_tests/integration_tests/chat_models.py`: - `magic_function(input)` -> `magic_function(_input)` to prevent warning about redefining built in `input` - relocate a few tests for better grouping and narrative flow - suppress some type hint warnings following suit from similar tests - fix a few more typos - validate not only that `model_name` is defined, but that it is not empty (test_usage_metadata)	2025-06-23 23:22:31 +00:00
ccurme	ee83993b91	docs: document Anthropic cache TTL count details (#31708 )	2025-06-23 20:16:42 +00:00
Christophe Bornet	b1cc972567	core[patch]: Improve `RunnableWithMessageHistory` init arg types (#31639 ) `Runnable`'s `Input` is contravariant so we need to enumerate all possible inputs and it's not possible to put them in a `Union`. Also, it's better to only require a runnable that accepts `list[BaseMessage]` instead of a broader `Sequence[BaseMessage]` as internally the runnable is only called with a list.	2025-06-23 13:45:52 -04:00
Mikhail	6105a5841b	core: fix `get_buffer_string` output for structured message content (#31600 )	2025-06-20 23:21:50 +00:00
Bagatur	5271fd76f1	core[patch]: check before removing tags (#31691 )	2025-06-20 17:46:50 -04:00
ccurme	39a8a1121a	core: release 0.3.66 (#31690 )	2025-06-20 17:45:03 -04:00
Mohammad Mohtashim	7ff405077d	core[patch]: Returning always 2D Array for _cosine_similarity (#31528 ) - Description: A very simple change so that `_cosine_similarity` always returns a 2D array. - Issue: #31497	2025-06-20 11:25:02 -04:00
Eugene Yurtsev	2842e0c8c1	core[patch]: Add doc-strings to tools/base.py (#31684 ) Add doc-strings	2025-06-20 11:16:57 -04:00
Christophe Bornet	7e046ea848	core: Cleanup Pydantic models and handle deprecation warnings (#30799 ) * Simplified Pydantic handling since Pydantic v1 is not supported anymore. * Replace use of deprecated v1 methods by corresponding v2 methods. * Remove use of other deprecated methods. * Activate mypy errors on deprecated methods use. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2025-06-20 10:42:52 -04:00
Nuno Campos	ddc850ca72	core: In LangChainTracer, send only the first token event (#31591 ) - only the first one is used for analytics	2025-06-12 14:04:23 -07:00
ccurme	b0f100af7e	core: release 0.3.65 (#31557 )	2025-06-10 19:39:50 +00:00


954 Commits