langchain

mirror of https://github.com/hwchase17/langchain.git synced 2025-09-28 06:48:50 +00:00

Author	SHA1	Message	Date
Mason Daugherty	8e213c9f1a	fix(core): `AsyncCallbackHandler` docstring cleanup (#32897 ) plus IDE warning fixes	2025-09-10 21:31:45 -04:00
Yash Vishwanath Tobre	a8828b1bda	fix(core): raise `OutputParserException` for non-dict JSON outputs (#32236 ) Description: Raise a more descriptive OutputParserException when JSON parsing results in a non-dict type. This improves debugging and aligns behavior with expectations when using expected_keys. Issue: Fixes #32233 Twitter handle: @yashvtobre Testing: - Ran make format and make lint from the root directory; both passed cleanly. - Attempted make test but no such target exists in the root Makefile. - Executed tests directly via pytest targeting the relevant test file, confirming all tests pass except for unrelated async test failures outside the scope of this change. Notes: - No additional dependencies introduced. - Changes are backward compatible and isolated within the output parser module. --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Mason Daugherty <mason@langchain.dev> Co-authored-by: Mason Daugherty <github@mdrxy.com>	2025-09-10 20:57:09 -04:00
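The check described in the commit above (rejecting JSON that parses successfully but is not an object) can be sketched without LangChain. This is a minimal illustration, not the library's code: `ParseError` is a hypothetical stand-in for `OutputParserException`, and `parse_json_object` is an assumed helper name.

```python
import json
from typing import Any


class ParseError(ValueError):
    """Hypothetical stand-in for OutputParserException."""


def parse_json_object(text: str) -> dict[str, Any]:
    """Parse ``text`` as JSON and insist that the result is a dict."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ParseError(f"Invalid JSON: {text!r}") from exc
    # Valid JSON can still be a list, string, or number; fail early with a
    # descriptive message instead of failing later on a key lookup.
    if not isinstance(parsed, dict):
        raise ParseError(
            f"Expected a JSON object (dict), but got {type(parsed).__name__}: {text!r}"
        )
    return parsed
```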
Daniel Barker	25c34bd9b2	feat(core): allow custom Mermaid URL (#32831 ) - Description: Currently, `langchain_core.runnables.graph_mermaid.py` is hardcoded to use mermaid.ink to render graph diagrams. It would be nice to allow users to specify a custom URL, e.g. for self-hosted instances of the Mermaid server. - Issue: [Langchain Forum: allow custom mermaid API URL](https://forum.langchain.com/t/feature-request-allow-custom-mermaid-api-url/1472) - Dependencies: None - [X] Add tests and docs: Added unit tests using mock requests. - [X] Lint and test: Run `make format`, `make lint` and `make test`. Minimal example using the feature: ```python import os import operator from pathlib import Path from typing import Any, Annotated, TypedDict from langgraph.graph import StateGraph class State(TypedDict): messages: Annotated[list[dict[str, Any]], operator.add] def hello_node(state: State) -> State: return {"messages": [{"role": "assistant", "content": "pong!"}]} builder = StateGraph(State) builder.add_node("hello_node", hello_node) builder.add_edge("__start__", "hello_node") builder.add_edge("hello_node", "__end__") graph = builder.compile() # Run graph output = graph.invoke({"messages": [{"role": "user", "content": "ping?"}]}) # Draw graph Path("graph.png").write_bytes(graph.get_graph().draw_mermaid_png(base_url="https://custom-mermaid.ink")) ``` --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2025-09-10 17:14:50 -04:00
Jonathan Hill	2fed177d0b	fix(core): preserve `ToolMessage.status` field in `convert_to_messages` (#32840 )	2025-09-10 15:49:39 -04:00
Christophe Bornet	12921a94c5	test(core): reactivate commented tests in `test_indexing` (#32882 ) * These tests now pass * Commenting them is a [ruff ERA](https://docs.astral.sh/ruff/rules/commented-out-code/) violation	2025-09-10 11:14:14 -04:00
William FH	443f0ccb0e	release(core): 0.3.76 (#32877 )	2025-09-10 14:10:44 +00:00
William FH	f1d44d0f9d	fix(core): honor `enabled=false` in nested tracing (#31986 ) Co-authored-by: Mason Daugherty <mason@langchain.dev> Co-authored-by: Mason Daugherty <github@mdrxy.com>	2025-09-09 13:12:17 -07:00
Zhou Jing	dcc517b187	fix(core): ensure `InjectedToolCallId` always overrides LLM-generated values (#32766 )	2025-09-09 11:25:52 -04:00
Christophe Bornet	714f74a847	refactor(core): improve beta decorator (#32505 ) This is better than using a subclass as returning a `property` works with `ClassWithBetaMethods.beta_property.__doc__` Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-09-08 18:06:48 -04:00
PieterKok-jaam	33c7f230e0	feat(core): add `id` field to `Document` passed to filter for `InMemoryVectorStore` similarity search (#32688 ) Added an id field to the Document passed to filter for InMemoryVectorStore similarity search. This allows filtering by Document id and brings the input to the filter in line with the result returned by the vector similarity search. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2025-09-08 20:39:18 +00:00
Adithya1617	f5bd00d1f1	feat(core): support AWS Bedrock `document` content blocks in `msg_content_output` (#32799 )	2025-09-08 19:40:28 +00:00
Sadra Barikbin	3486d6c74d	feat(core): support for adding `PromptTemplate`s with formats other than `f-string` (#32253 ) Allow adding `PromptTemplate`s with formats other than `f-string`. Fixes #32151 --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2025-09-08 19:16:54 +00:00
Christophe Bornet	cc98fb9bee	chore(core): add ruff rule PLC0415 (#32351 ) See https://docs.astral.sh/ruff/rules/import-outside-top-level/ Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-09-08 14:15:04 -04:00
Christophe Bornet	16420cad71	chore(core): fix some pydocs to use google-style (#32764 ) Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-09-08 17:52:17 +00:00
Christophe Bornet	01fdeede50	chore(core): fix some ruff preview rules (#32785 ) Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-09-08 15:55:20 +00:00
Christophe Bornet	f4e83e0ad8	chore(core): fix some docstrings (from DOC preview rule) (#32833 ) * Add `Raises` sections * Add `Returns` sections * Add `Yields` sections --------- Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-09-08 15:44:15 +00:00
Christophe Bornet	5840dad40b	chore(core): enable ruff docstring-code-format (#32834 ) See https://docs.astral.sh/ruff/settings/#format_docstring-code-format --------- Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-09-08 15:13:50 +00:00
Christophe Bornet	e3b6c9bb66	chore(core): fix some mypy `warn_unreachable` issues (#32560 ) Found by setting `warn_unreachable: true` in mypy. --------- Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-09-08 15:02:08 +00:00
Shahroz Ahmad	4828a85ab0	feat(core): add `web_search` in OpenAI tools list (#32738 )	2025-09-02 21:57:25 +00:00
ccurme	dbebe2ca97	release(core): 0.3.75 (#32693 )	2025-08-26 11:12:03 -04:00
ccurme	f33480c2cf	feat(core): trace response body on error (#32653 )	2025-08-25 14:28:19 -04:00
Mason Daugherty	1c55536ec1	chore(core): add note about backward compatibility for `tool_calls` in `additional_kwargs` in `JsonOutputKeyToolsParser`	2025-08-25 10:30:41 -04:00
Maitrey Talware	622337a297	docs(docs): fixed typos in documentations (#32661 ) Minor typo fixes. (Not linked to current open issues)	2025-08-25 10:02:53 -04:00
Christophe Bornet	02d6b9106b	chore(core): add mypy pydantic plugin (#32604 ) This helps to remove a bunch of mypy false positives.	2025-08-19 09:39:53 -04:00
William FH	b470c79f1d	refactor(core): Use duck typing for `_StreamingCallbackHandler` (#32535 ) It's used in langgraph and maybe elsewhere, so would be preferable if it could just be duck-typed	2025-08-19 05:41:07 -07:00
Mason Daugherty	a0331285d7	fix(core): Support no-args tools by defaulting args to empty dict (#32530 ) Supersedes #32408 Description: This PR ensures that tool calls without explicitly provided `args` will default to an empty dictionary (`{}`), allowing tools with no parameters (e.g. `def foo() -> str`) to be registered and invoked without validation errors. This change improves compatibility with agent frameworks that may omit the `args` field when generating tool calls. Issue: See [langgraph#5722](https://github.com/langchain-ai/langgraph/issues/5722) – LangGraph currently emits tool calls without `args`, which leads to validation errors when tools with no parameters are invoked. This PR ensures compatibility by defaulting `args` to `{}` when missing. Dependencies: None --------- Signed-off-by: jitokim <pigberger70@gmail.com> Co-authored-by: jito <pigberger70@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-08-14 20:28:36 +00:00
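The defaulting behavior described for no-args tools can be illustrated in plain Python. This is a sketch under stated assumptions: `normalize_tool_call` is a hypothetical helper, and the dict shape (`name`, `id`, `args`) only mirrors the fields mentioned in the commit message, not LangChain's actual validation code.

```python
from typing import Any


def normalize_tool_call(raw: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of ``raw`` whose ``args`` entry is always a dict."""
    call = dict(raw)
    # A missing or null ``args`` is treated as "no arguments".
    if call.get("args") is None:
        call["args"] = {}
    return call


def foo() -> str:
    """A tool with no parameters, as in the commit message."""
    return "ok"


# The caller omitted ``args`` entirely; after normalization the tool can
# still be invoked with keyword unpacking.
call = normalize_tool_call({"name": "foo", "id": "call_1"})
result = foo(**call["args"])
```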
Mason Daugherty	ee4c2510eb	feat: port various nit changes from `wip-v0.4` (#32506 ) Lots of work that wasn't directly related to core improvements/messages/testing functionality	2025-08-11 15:09:08 -04:00
Christophe Bornet	f55186b38f	fix(core): fix beta decorator for properties (#32497 )	2025-08-11 12:43:53 -04:00
Mason Daugherty	c31236264e	chore: formatting across codebase (#32466 )	2025-08-08 10:20:10 -04:00
ccurme	6727d6e8c8	release(core): 0.3.74 (#32454 )	2025-08-07 16:39:01 -04:00
ccurme	ec2b34a02d	feat(openai): custom tools (#32449 )	2025-08-07 16:30:01 -04:00
ccurme	06d8754b0b	release(core): 0.3.73 (#32446 )	2025-08-07 09:03:53 -04:00
ccurme	6e108c1cb4	feat(core): zero-out token costs for cache hits (#32437 )	2025-08-07 08:49:34 -04:00
John Bledsoe	bc4251b9e0	fix(core): fix index checking when merging lists (#32431 ) Description: fix an issue I discovered when attempting to merge messages in which one message has an `index` key in its content dictionary and another does not.	2025-08-06 12:47:33 -04:00
Mason Daugherty	fbd5a238d8	fix(core): revert "fix: tool call streaming bug with inconsistent indices from Qwen3" (#32307 ) Reverts langchain-ai/langchain#32160 Original issue stems from using `ChatOpenAI` to interact with a `qwen` model. Recommended to use [langchain-qwq](https://python.langchain.com/docs/integrations/chat/qwq/) which is built for Qwen	2025-07-29 10:26:38 -04:00
Copilot	0b56c1bc4b	fix: tool call streaming bug with inconsistent indices from Qwen3 (#32160 ) Fixes a streaming bug where models like Qwen3 (using OpenAI interface) send tool call chunks with inconsistent indices, resulting in duplicate/erroneous tool calls instead of a single merged tool call. ## Problem When Qwen3 streams tool calls, it sends chunks with inconsistent `index` values: - First chunk: `index=1` with tool name and partial arguments - Subsequent chunks: `index=0` with `name=None`, `id=None` and argument continuation The existing `merge_lists` function only merges chunks when their `index` values match exactly, causing these logically related chunks to remain separate, resulting in multiple incomplete tool calls instead of one complete tool call. ```python # Before fix: Results in 1 valid + 1 invalid tool call chunk1 = AIMessageChunk(tool_call_chunks=[ {"name": "search", "args": '{"query":', "id": "call_123", "index": 1} ]) chunk2 = AIMessageChunk(tool_call_chunks=[ {"name": None, "args": ' "test"}', "id": None, "index": 0} ]) merged = chunk1 + chunk2 # Creates 2 separate tool calls # After fix: Results in 1 complete tool call merged = chunk1 + chunk2 # Creates 1 merged tool call: search({"query": "test"}) ``` ## Solution Enhanced the `merge_lists` function in `langchain_core/utils/_merge.py` with intelligent tool call chunk merging: 1. Preserves existing behavior: Same-index chunks still merge as before 2. Adds special handling: Tool call chunks with `name=None`/`id=None` that don't match any existing index are now merged with the most recent complete tool call chunk 3. Maintains backward compatibility: All existing functionality works unchanged 4. 
Targeted fix: Only affects tool call chunks, doesn't change behavior for other list items The fix specifically handles the pattern where: - A continuation chunk has `name=None` and `id=None` (indicating it's part of an ongoing tool call) - No matching index is found in existing chunks - There exists a recent tool call chunk with a valid name or ID to merge with ## Testing Added comprehensive test coverage including: - ✅ Qwen3-style chunks with different indices now merge correctly - ✅ Existing same-index behavior preserved - ✅ Multiple distinct tool calls remain separate - ✅ Edge cases handled (empty chunks, orphaned continuations) - ✅ Backward compatibility maintained Fixes #31511. --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com> Co-authored-by: Mason Daugherty <github@mdrxy.com> Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-07-28 22:31:41 +00:00
Copilot	ad88e5aaec	fix(core): resolve cache validation error by safely converting Generation to ChatGeneration objects (#32156 ) ## Problem ChatLiteLLM encounters a `ValidationError` when using cache on subsequent calls, causing the following error: ``` ValidationError(model='ChatResult', errors=[{'loc': ('generations', 0, 'type'), 'msg': "unexpected value; permitted: 'ChatGeneration'", 'type': 'value_error.const', 'ctx': {'given': 'Generation', 'permitted': ('ChatGeneration',)}}]) ``` This occurs because: 1. The cache stores `Generation` objects (with `type="Generation"`) 2. But `ChatResult` expects `ChatGeneration` objects (with `type="ChatGeneration"` and a required `message` field) 3. When cached values are retrieved, validation fails due to the type mismatch ## Solution Added graceful handling in both sync (`_generate_with_cache`) and async (`_agenerate_with_cache`) cache methods to: 1. Detect when cached values contain `Generation` objects instead of expected `ChatGeneration` objects 2. Convert them to `ChatGeneration` objects by wrapping the text content in an `AIMessage` 3. Preserve all original metadata (`generation_info`) 4. 
Allow `ChatResult` creation to succeed without validation errors ## Example ```python # Before: This would fail with ValidationError from langchain_community.chat_models import ChatLiteLLM from langchain_community.cache import SQLiteCache from langchain.globals import set_llm_cache set_llm_cache(SQLiteCache(database_path="cache.db")) llm = ChatLiteLLM(model_name="openai/gpt-4o", cache=True, temperature=0) print(llm.predict("test")) # Works fine (cache empty) print(llm.predict("test")) # Now works instead of ValidationError # After: Seamlessly handles both Generation and ChatGeneration objects ``` ## Changes - `libs/core/langchain_core/language_models/chat_models.py`: - Added `Generation` import from `langchain_core.outputs` - Enhanced cache retrieval logic in `_generate_with_cache` and `_agenerate_with_cache` methods - Added conversion from `Generation` to `ChatGeneration` objects when needed - `libs/core/tests/unit_tests/language_models/chat_models/test_cache.py`: - Added test case to validate the conversion logic handles mixed object types ## Impact - Backward Compatible: Existing code continues to work unchanged - Minimal Change: Only affects cache retrieval path, no API changes - Robust: Handles both legacy cached `Generation` objects and new `ChatGeneration` objects - Preserves Data: All original content and metadata is maintained during conversion Fixes #22389.
--------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com> Co-authored-by: Mason Daugherty <github@mdrxy.com> Co-authored-by: Mason Daugherty <mason@langchain.dev> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-07-28 22:28:16 +00:00
Mason Daugherty	a07d2c5016	refactor: remove references to unsupported model `claude-3-sonnet-20240229` (#32281 ) Addresses some (but not all) test issues brought about in #32280	2025-07-28 11:57:43 -04:00
Aleksandr Filippov	f0b6baa0ef	fix(core): track within-batch deduplication in indexing num_skipped count (#32273 ) Description: Fixes incorrect `num_skipped` count in the LangChain indexing API. The current implementation only counts documents that already exist in RecordManager (cross-batch duplicates) but fails to count documents removed during within-batch deduplication via `_deduplicate_in_order()`. This PR adds tracking of the original batch size before deduplication and includes the difference in `num_skipped`, ensuring that `num_added + num_skipped` equals the total number of input documents. Issue: Fixes incorrect document count reporting in indexing statistics Dependencies: None Fixes #32272 --------- Co-authored-by: Alex Feel <afilippov@spotware.com>	2025-07-28 09:58:51 -04:00
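The counting rule described above (record the batch size before deduplication and add the difference to `num_skipped`) can be sketched as follows. This is an illustrative model only: `deduplicate_in_order` and `index_batch` are assumed names, documents are plain strings, and a `set` stands in for the RecordManager.

```python
def deduplicate_in_order(items: list[str]) -> list[str]:
    """Drop repeated items while keeping first-seen order."""
    seen: set[str] = set()
    unique: list[str] = []
    for item in items:
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return unique


def index_batch(batch: list[str], already_indexed: set[str]) -> dict[str, int]:
    """Index one batch and report how many documents were added or skipped."""
    original_size = len(batch)
    unique = deduplicate_in_order(batch)
    # Within-batch duplicates count as skipped, not just cross-batch ones.
    num_skipped = original_size - len(unique)
    num_added = 0
    for doc in unique:
        if doc in already_indexed:
            num_skipped += 1  # cross-batch duplicate
        else:
            already_indexed.add(doc)
            num_added += 1
    return {"num_added": num_added, "num_skipped": num_skipped}


# Five inputs: "a" and "b" each repeat once, and "c" was indexed earlier.
stats = index_batch(["a", "b", "a", "c", "b"], already_indexed={"c"})
```

With this accounting, `num_added + num_skipped` equals the number of input documents, which is the invariant the commit message states.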
Mason Daugherty	96cbd90cba	fix: formatting issues in docstrings (#32265 ) Ensures proper reStructuredText formatting by adding the required blank line before closing docstring quotes, which resolves the "Block quote ends without a blank line; unexpected unindent" warning.	2025-07-27 23:37:47 -04:00
Mason Daugherty	f624ad489a	feat(docs): improve devx, fix `Makefile` targets (#32237 ) TL;DR much of the provided `Makefile` targets were broken, and any time I wanted to preview changes locally I either had to refer to a command Chester gave me or try waiting on a Vercel preview deployment. With this PR, everything should behave like normal. Significant updates to the `Makefile` and documentation files, focusing on improving usability, adding clear messaging, and fixing/enhancing documentation workflows. ### Updates to `Makefile`: #### Enhanced build and cleaning processes: - Added informative messages (e.g., "📚 Building LangChain documentation...") to makefile targets like `docs_build`, `docs_clean`, and `api_docs_build` for better user feedback during execution. - Introduced a `clean-cache` target to the `docs` `Makefile` to clear cached dependencies and ensure clean builds. #### Improved dependency handling: - Modified `install-py-deps` to create a `.venv/deps_installed` marker, preventing redundant/duplicate dependency installations and improving efficiency. #### Streamlined file generation and infrastructure setup: - Added caching for the LangServe README download and parallelized feature table generation - Added user-friendly completion messages for targets like `copy-infra` and `render`. #### Documentation server updates: - Enhanced the `start` target with messages indicating server start and URL for local documentation viewing. --- ### Documentation Improvements: #### Content clarity and consistency: - Standardized section titles for consistency across documentation files. - Refined phrasing and formatting in sections like "Dependency management" and "Formatting and linting" for better readability. #### Enhanced workflows: - Updated instructions for building and viewing documentation locally, including tips for specifying server ports and handling API reference previews. - Expanded guidance on cleaning documentation artifacts and using linting tools effectively. #### API reference documentation: - Improved instructions for generating and formatting in-code documentation, highlighting best practices for docstring writing. --- ### Minor Changes: - Added support for a new package name (`langchain_v1`) in the API documentation generation script. - Fixed minor capitalization and formatting issues in documentation files. --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-07-25 14:49:03 -04:00
niceg	0d6f915442	fix: LLM mimicking Unicode responses due to forced Unicode conversion of non-ASCII characters. (#32222 ) - Description: This PR fixes an issue where the LLM would mimic Unicode escape sequences because non-ASCII characters in tool calls were forcibly converted to `\uXXXX` escapes. The fix disables the `ensure_ascii` flag in `json.dumps()` when converting tool calls to OpenAI format. - Issue: Input: ```json {'role': 'assistant', 'tool_calls': [{'type': 'function', 'id': 'call_nv9trcehdpihr21zj9po19vq', 'function': {'name': 'create_customer', 'arguments': '{"customer_name": "你好啊集团"}'}}]} ``` Output: ```json {'role': 'assistant', 'tool_calls': [{'type': 'function', 'id': 'call_nv9trcehdpihr21zj9po19vq', 'function': {'name': 'create_customer', 'arguments': '{"customer_name": "\\u4f60\\u597d\\u554a\\u96c6\\u56e2"}'}}]} ``` The LLM then mimics the escaped form in its own output. Because each escaped character expands to at least six characters, responses get longer and slower. --------- Co-authored-by: Mason Daugherty <github@mdrxy.com> Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-07-24 17:01:31 -04:00
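The effect of the `ensure_ascii` flag can be reproduced with the standard library alone. The snippet below only demonstrates `json.dumps` behavior on the argument payload from the commit message; it does not show the LangChain conversion code itself.

```python
import json

arguments = {"customer_name": "你好啊集团"}

# Default behavior: every non-ASCII character becomes a \uXXXX escape.
escaped = json.dumps(arguments)

# With ensure_ascii=False the original characters are kept as-is.
readable = json.dumps(arguments, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings decode to the same object, so the change affects only what the model sees in its context, not the parsed arguments.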
Mason Daugherty	d53ebf367e	fix(docs): capitalization, codeblock formatting, and hyperlinks, note blocks (#32235 ) widespread cleanup attempt	2025-07-24 16:55:04 -04:00
Mason Daugherty	bd3d6496f3	release(core): 0.3.72 (#32214 ) fixes #32170	2025-07-23 20:33:48 -04:00
jmaillefaud	fb5da8384e	fix(core): Dereference Refs for pydantic schema fails in tool schema generation (#32203 ) The `_dereference_refs_helper` in `langchain_core.utils.json_schema` incorrectly handled objects with a reference and other fields. Issue: #32170 # Description We change the check so that it accepts other keys in the object.	2025-07-23 20:28:27 -04:00
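The case this fix targets, an object that carries a `$ref` together with other keys, can be illustrated with a small resolver. This is a simplified sketch, not the code in `langchain_core.utils.json_schema`: `resolve_pointer` and `dereference` are assumed names, only local `#/...` references to object schemas are handled, and there is no protection against cyclic references.

```python
from typing import Any


def resolve_pointer(root: dict[str, Any], ref: str) -> Any:
    """Follow a local reference such as '#/$defs/Address' inside ``root``."""
    node: Any = root
    for part in ref.lstrip("#/").split("/"):
        node = node[part]
    return node


def dereference(node: Any, root: dict[str, Any]) -> Any:
    """Inline local $ref targets, keeping any keys that sit next to the $ref."""
    if isinstance(node, dict):
        if "$ref" in node:
            target = dereference(resolve_pointer(root, node["$ref"]), root)
            siblings = {
                key: dereference(value, root)
                for key, value in node.items()
                if key != "$ref"
            }
            # Sibling keys (e.g. a description) are merged over the target.
            return {**target, **siblings}
        return {key: dereference(value, root) for key, value in node.items()}
    if isinstance(node, list):
        return [dereference(item, root) for item in node]
    return node


schema = {
    "$defs": {
        "Address": {"type": "object", "properties": {"city": {"type": "string"}}}
    },
    "type": "object",
    "properties": {
        "home": {"$ref": "#/$defs/Address", "description": "Home address"}
    },
}
resolved = dereference(schema, schema)
```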
Mason Daugherty	a02ad3d192	docs: formatting cleanup (#32188 ) * formatting cleaning * make `init_chat_model` more prominent in list of guides	2025-07-22 15:46:15 -04:00
ccurme	0c4054a7fc	release(core): 0.3.71 (#32186 )	2025-07-22 15:44:36 -04:00
ccurme	ebf2e11bcb	fix(core): exclude api_key from tracing metadata (#32184 ) (standard param)	2025-07-22 15:32:12 -04:00
ccurme	8acfd677bc	fix(core): add type key when tracing in some cases (#31825 )	2025-07-22 18:08:16 +00:00
Copilot	18c64aed6d	feat(core): add `sanitize_for_postgres` utility to fix PostgreSQL NUL byte DataError (#32157 ) This PR fixes the PostgreSQL NUL byte issue that causes `psycopg.DataError` when inserting documents containing `\x00` bytes into PostgreSQL-based vector stores. ## Problem PostgreSQL text fields cannot contain NUL (0x00) bytes. When documents with such characters are processed by PGVector or langchain-postgres implementations, they fail with: ``` (psycopg.DataError) PostgreSQL text fields cannot contain NUL (0x00) bytes ``` This commonly occurs when processing PDFs, documents from various loaders, or text extracted by libraries like unstructured that may contain embedded NUL bytes. ## Solution Added `sanitize_for_postgres()` utility function to `langchain_core.utils.strings` that removes or replaces NUL bytes from text content. ### Key Features - Simple API: `sanitize_for_postgres(text, replacement="")` - Configurable: Replace NUL bytes with empty string (default) or space for readability - Comprehensive: Handles all problematic examples from the original issue - Well-tested: Complete unit tests with real-world examples - Backward compatible: No breaking changes, purely additive ### Usage Example ```python from langchain_core.utils import sanitize_for_postgres from langchain_core.documents import Document # Before: This would fail with DataError problematic_content = "Getting\x00Started with embeddings" # After: Clean the content before database insertion clean_content = sanitize_for_postgres(problematic_content) # Result: "GettingStarted with embeddings" # Or preserve readability with spaces readable_content = sanitize_for_postgres(problematic_content, " ") # Result: "Getting Started with embeddings" # Use in Document processing doc = Document(page_content=clean_content, metadata={...}) ``` ### Integration Pattern PostgreSQL vector store implementations should sanitize content before insertion: ```python def add_documents(self, documents: 
List[Document]) -> List[str]: # Sanitize documents before insertion sanitized_docs = [] for doc in documents: sanitized_content = sanitize_for_postgres(doc.page_content, " ") sanitized_doc = Document( page_content=sanitized_content, metadata=doc.metadata, id=doc.id ) sanitized_docs.append(sanitized_doc) return self._insert_documents_to_db(sanitized_docs) ``` ## Changes Made - Added `sanitize_for_postgres()` function in `langchain_core/utils/strings.py` - Updated `langchain_core/utils/__init__.py` to export the new function - Added comprehensive unit tests in `tests/unit_tests/utils/test_strings.py` - Validated against all examples from the original issue report ## Testing All tests pass, including: - Basic NUL byte removal and replacement - Multiple consecutive NUL bytes - Empty string handling - Real examples from the GitHub issue - Backward compatibility with existing string utilities This utility enables PostgreSQL integrations in both langchain-community and langchain-postgres packages to handle documents with NUL bytes reliably. Fixes #26033. --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com> Co-authored-by: Mason Daugherty <github@mdrxy.com>	2025-07-21 20:33:20 -04:00
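The behavior the commit describes for `sanitize_for_postgres(text, replacement="")` amounts to replacing NUL characters in a string. The standalone function below reproduces that behavior so it can be tried without installing anything; `strip_nul_bytes` is a hypothetical name, and the body is an assumption about the approach rather than a copy of the library source.

```python
def strip_nul_bytes(text: str, replacement: str = "") -> str:
    """Replace NUL (0x00) characters, which PostgreSQL text fields reject."""
    return text.replace("\x00", replacement)


problematic = "Getting\x00Started with embeddings"

# Default: drop the NUL characters entirely.
clean = strip_nul_bytes(problematic)

# Alternative: substitute a space to keep word boundaries readable.
readable = strip_nul_bytes(problematic, " ")
```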

993 Commits