langchain

mirror of https://github.com/hwchase17/langchain.git synced 2025-08-13 22:59:05 +00:00

Author	SHA1	Message	Date
Mason Daugherty	73c49d31d6	Merge branch 'wip-v0.4' into mdrxy/ollama_v1	2025-08-06 18:01:02 -04:00
Mason Daugherty	376f70be96	sync wip with master (#32436 ) Co-authored-by: Kanav Bansal <13186335+bansalkanav@users.noreply.github.com> Co-authored-by: Pranav Bhartiya <124018094+pranauww@users.noreply.github.com> Co-authored-by: Nelson Sproul <nelson.sproul@gmail.com> Co-authored-by: John Bledsoe <jmbledsoe@gmail.com>	2025-08-06 17:57:05 -04:00
Mason Daugherty	821527b97a	more id logic	2025-08-06 11:19:41 -04:00
Mason Daugherty	d5b26bc358	snapshot	2025-08-05 16:59:27 -04:00
Mason Daugherty	661ea97c1e	snapshot	2025-08-05 16:10:16 -04:00
Mason Daugherty	733da01bd4	Merge branch 'wip-v0.4' into mdrxy/ollama_v1	2025-08-05 16:03:24 -04:00
ccurme	e02eed5489	feat: standard outputs (#32287 ) Co-authored-by: Mason Daugherty <mason@langchain.dev> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Mason Daugherty <github@mdrxy.com> Co-authored-by: Nuno Campos <nuno@langchain.dev>	2025-08-05 15:17:32 -04:00
Mason Daugherty	5b3ff1215e	Merge branch 'standard_outputs_copy' into mdrxy/ollama_v1	2025-08-05 15:16:23 -04:00
ccurme	56ee00cb1d	fix(core): rename output_version to message_version (#32412 )	2025-08-05 14:23:58 -04:00
Mason Daugherty	485b0b36ab	more `dumps()` tests	2025-08-05 10:50:07 -04:00
Mason Daugherty	551663d0b7	namespace refactor	2025-08-05 10:28:07 -04:00
Mason Daugherty	4651457c7e	Merge remote-tracking branch 'origin/standard_outputs_copy' into mdrxy/ollama_v1	2025-08-05 09:56:17 -04:00
Mason Daugherty	c709f85c27	snapshots	2025-08-05 09:55:13 -04:00
ccurme	c36b123c8c	fix(core): refactor new types into top-level v1 namespace (#32403 )	2025-08-05 09:21:31 -04:00
ccurme	deae8cc164	feat(core): support returning v1 ToolMessage in tools (#32397 )	2025-08-05 08:50:02 -04:00
Mason Daugherty	5c9ce7fd2b	remove outdated test	2025-08-04 23:47:17 -04:00
Mason Daugherty	f3c863447f	fix: core imports tests	2025-08-04 23:24:47 -04:00
Mason Daugherty	cc56b8dbd3	Merge branch 'standard_outputs_copy' into mdrxy/ollama_v1 + updates	2025-08-04 12:57:38 -04:00
ccurme	ff3153c04d	feat(core): move tool call chunks to content (v1) (#32358 )	2025-08-04 11:32:11 -04:00
Mason Daugherty	bc5c6751dc	fix test	2025-07-31 17:37:42 -04:00
Mason Daugherty	7a0c3e0482	fix: update snapshots	2025-07-31 17:27:37 -04:00
Mason Daugherty	525fa453be	fix: revert pydantic bump (#32355 )	2025-07-31 12:22:23 -04:00
Mason Daugherty	c88adfad70	fix: updated snapshots	2025-07-31 11:21:40 -04:00
Mason Daugherty	44bd6fe837	feat(core): content block factories + ids + docs + tests (#32316) ## Benefits 1. Type Safety: Compile-time validation of required fields and proper type setting 2. Less Boilerplate: No need to manually set the `type` field or generate IDs 3. Input Validation: Runtime validation prevents common errors (e.g., base64 without MIME type) 4. Consistent Patterns: Standardized creation patterns across all block types 5. Better Developer Experience: Cleaner, more intuitive API than manual TypedDict construction. It also follows the pattern of similar factory functions (e.g. `create_react_agent`, `init_chat_model`)	2025-07-31 11:12:00 -04:00
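The factory pattern described in 44bd6fe837 can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `ImageBlock`, `make_image_block`, the `blk-` ID prefix, and all field names are hypothetical.

```python
import uuid
from typing import Optional, TypedDict


class ImageBlock(TypedDict, total=False):
    # Hypothetical block shape, used only for this illustration.
    type: str
    id: str
    url: str
    base64: str
    mime_type: str


def make_image_block(
    *,
    url: Optional[str] = None,
    base64: Optional[str] = None,
    mime_type: Optional[str] = None,
    block_id: Optional[str] = None,
) -> ImageBlock:
    """Build an image block, filling in `type` and `id` for the caller."""
    if url is None and base64 is None:
        raise ValueError("Provide either `url` or `base64`.")
    if base64 is not None and mime_type is None:
        # Runtime validation: base64 data is ambiguous without a MIME type.
        raise ValueError("`mime_type` is required with `base64` data.")
    # The caller never sets `type` by hand; an ID is generated if none is given.
    block: ImageBlock = {"type": "image", "id": block_id or f"blk-{uuid.uuid4()}"}
    if url is not None:
        block["url"] = url
    if base64 is not None:
        block["base64"] = base64
    if mime_type is not None:
        block["mime_type"] = mime_type
    return block
```

Compared with writing the dict by hand, the factory centralizes the `type` tag, ID generation, and the base64/MIME check in one place.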
ccurme	740d9d3e7e	fix(core): fix tracing for new message types in case of multiple messages (#32352 )	2025-07-31 10:47:23 -04:00
ccurme	642262f6fe	feat(core): widen input type for output parsers (#32332 )	2025-07-30 16:52:34 -04:00
Chester Curme	a0abb79f6d	Merge branch 'wip-v0.4' into standard_outputs_copy	2025-07-30 13:17:08 -04:00
ccurme	309d1a232a	fix(openai): fix tracing and typing on standard outputs branch (#32326 )	2025-07-30 13:02:15 -04:00
ccurme	8cf97e838c	fix(core): lint standard outputs branch (#32311 )	2025-07-29 15:38:45 -04:00
Mason Daugherty	fbd5a238d8	fix(core): revert "fix: tool call streaming bug with inconsistent indices from Qwen3" (#32307 ) Reverts langchain-ai/langchain#32160 Original issue stems from using `ChatOpenAI` to interact with a `qwen` model. Recommended to use [langchain-qwq](https://python.langchain.com/docs/integrations/chat/qwq/) which is built for Qwen	2025-07-29 10:26:38 -04:00
Mason Daugherty	0e287763cd	fix: lint	2025-07-28 18:49:43 -04:00
ccurme	c15e55b33c	feat(openai): v1 message format support (#32296 )	2025-07-28 18:42:26 -04:00
Copilot	0b56c1bc4b	fix: tool call streaming bug with inconsistent indices from Qwen3 (#32160 ) Fixes a streaming bug where models like Qwen3 (using OpenAI interface) send tool call chunks with inconsistent indices, resulting in duplicate/erroneous tool calls instead of a single merged tool call. ## Problem When Qwen3 streams tool calls, it sends chunks with inconsistent `index` values: - First chunk: `index=1` with tool name and partial arguments - Subsequent chunks: `index=0` with `name=None`, `id=None` and argument continuation The existing `merge_lists` function only merges chunks when their `index` values match exactly, causing these logically related chunks to remain separate, resulting in multiple incomplete tool calls instead of one complete tool call. ```python # Before fix: Results in 1 valid + 1 invalid tool call chunk1 = AIMessageChunk(tool_call_chunks=[ {"name": "search", "args": '{"query":', "id": "call_123", "index": 1} ]) chunk2 = AIMessageChunk(tool_call_chunks=[ {"name": None, "args": ' "test"}', "id": None, "index": 0} ]) merged = chunk1 + chunk2 # Creates 2 separate tool calls # After fix: Results in 1 complete tool call merged = chunk1 + chunk2 # Creates 1 merged tool call: search({"query": "test"}) ``` ## Solution Enhanced the `merge_lists` function in `langchain_core/utils/_merge.py` with intelligent tool call chunk merging: 1. Preserves existing behavior: Same-index chunks still merge as before 2. Adds special handling: Tool call chunks with `name=None`/`id=None` that don't match any existing index are now merged with the most recent complete tool call chunk 3. Maintains backward compatibility: All existing functionality works unchanged 4. 
Targeted fix: Only affects tool call chunks, doesn't change behavior for other list items The fix specifically handles the pattern where: - A continuation chunk has `name=None` and `id=None` (indicating it's part of an ongoing tool call) - No matching index is found in existing chunks - There exists a recent tool call chunk with a valid name or ID to merge with ## Testing Added comprehensive test coverage including: - ✅ Qwen3-style chunks with different indices now merge correctly - ✅ Existing same-index behavior preserved - ✅ Multiple distinct tool calls remain separate - ✅ Edge cases handled (empty chunks, orphaned continuations) - ✅ Backward compatibility maintained Fixes #31511. --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com> Co-authored-by: Mason Daugherty <github@mdrxy.com> Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-07-28 22:31:41 +00:00
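The merge rule described in 0b56c1bc4b (later reverted by fbd5a238d8) can be sketched on plain dicts. This is an illustration under assumptions, not the `merge_lists` code from `langchain_core/utils/_merge.py`: the function name `merge_tool_call_chunks` and the dict-based chunk shape are hypothetical.

```python
from typing import Any, Dict, List, Optional

Chunk = Dict[str, Any]


def merge_tool_call_chunks(chunks: List[Chunk]) -> List[Chunk]:
    """Merge streamed tool call chunks into complete tool calls."""
    merged: List[Chunk] = []
    for chunk in chunks:
        # Case 1: a chunk whose index matches an existing entry merges into it.
        target: Optional[Chunk] = next(
            (m for m in merged if m["index"] == chunk["index"]), None
        )
        # Case 2: an unmatched continuation (no name, no id) joins the most
        # recent entry that carries a name or an id.
        if target is None and chunk["name"] is None and chunk["id"] is None:
            target = next(
                (m for m in reversed(merged) if m["name"] or m["id"]), None
            )
        if target is None:
            # New tool call, or an orphaned continuation with nothing to join.
            merged.append(dict(chunk))
        else:
            target["args"] = (target["args"] or "") + (chunk["args"] or "")
            target["name"] = target["name"] or chunk["name"]
            target["id"] = target["id"] or chunk["id"]
    return merged
```

With the two chunks from the PR description (index 1 carrying the name, index 0 carrying the continuation), this sketch yields a single entry whose `args` concatenate to `{"query": "test"}`.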
Copilot	ad88e5aaec	fix(core): resolve cache validation error by safely converting Generation to ChatGeneration objects (#32156 ) ## Problem ChatLiteLLM encounters a `ValidationError` when using cache on subsequent calls, causing the following error: ``` ValidationError(model='ChatResult', errors=[{'loc': ('generations', 0, 'type'), 'msg': "unexpected value; permitted: 'ChatGeneration'", 'type': 'value_error.const', 'ctx': {'given': 'Generation', 'permitted': ('ChatGeneration',)}}]) ``` This occurs because: 1. The cache stores `Generation` objects (with `type="Generation"`) 2. But `ChatResult` expects `ChatGeneration` objects (with `type="ChatGeneration"` and a required `message` field) 3. When cached values are retrieved, validation fails due to the type mismatch ## Solution Added graceful handling in both sync (`_generate_with_cache`) and async (`_agenerate_with_cache`) cache methods to: 1. Detect when cached values contain `Generation` objects instead of expected `ChatGeneration` objects 2. Convert them to `ChatGeneration` objects by wrapping the text content in an `AIMessage` 3. Preserve all original metadata (`generation_info`) 4. 
Allow `ChatResult` creation to succeed without validation errors ## Example ```python # Before: This would fail with ValidationError from langchain_community.chat_models import ChatLiteLLM from langchain_community.cache import SQLiteCache from langchain.globals import set_llm_cache set_llm_cache(SQLiteCache(database_path="cache.db")) llm = ChatLiteLLM(model_name="openai/gpt-4o", cache=True, temperature=0) print(llm.predict("test")) # Works fine (cache empty) print(llm.predict("test")) # Now works instead of ValidationError # After: Seamlessly handles both Generation and ChatGeneration objects ``` ## Changes - `libs/core/langchain_core/language_models/chat_models.py`: - Added `Generation` import from `langchain_core.outputs` - Enhanced cache retrieval logic in `_generate_with_cache` and `_agenerate_with_cache` methods - Added conversion from `Generation` to `ChatGeneration` objects when needed - `libs/core/tests/unit_tests/language_models/chat_models/test_cache.py`: - Added test case to validate the conversion logic handles mixed object types ## Impact - Backward Compatible: Existing code continues to work unchanged - Minimal Change: Only affects cache retrieval path, no API changes - Robust: Handles both legacy cached `Generation` objects and new `ChatGeneration` objects - Preserves Data: All original content and metadata is maintained during conversion Fixes #22389.
--------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com> Co-authored-by: Mason Daugherty <github@mdrxy.com> Co-authored-by: Mason Daugherty <mason@langchain.dev> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-07-28 22:28:16 +00:00
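The conversion step described in ad88e5aaec can be sketched with stand-in types. The dataclasses below are simplified stand-ins so the example runs without LangChain installed; they are not the real `langchain_core` classes, and `coerce_cached_generations` is a hypothetical helper name.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Union


@dataclass
class Generation:  # stand-in for a cached plain-text generation
    text: str
    generation_info: Optional[Dict[str, Any]] = None


@dataclass
class AIMessage:  # stand-in for a chat message
    content: str


@dataclass
class ChatGeneration:  # stand-in: a chat result requires a `message`
    message: AIMessage
    generation_info: Optional[Dict[str, Any]] = None


def coerce_cached_generations(
    cached: List[Union[Generation, ChatGeneration]],
) -> List[ChatGeneration]:
    """Return chat generations, wrapping any plain Generation in a message."""
    result: List[ChatGeneration] = []
    for item in cached:
        if isinstance(item, ChatGeneration):
            result.append(item)  # already the expected type
        else:
            # Wrap the cached text in an AIMessage and keep the metadata.
            result.append(
                ChatGeneration(
                    message=AIMessage(content=item.text),
                    generation_info=item.generation_info,
                )
            )
    return result
```

The `ChatGeneration` check comes first because in a real class hierarchy the chat type may subclass the plain one, and those objects should pass through unchanged.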
Chester Curme	7166adce1f	Merge branch 'wip-v0.4' into standard_outputs_copy # Conflicts: # libs/core/langchain_core/messages/tool.py # libs/partners/openai/langchain_openai/chat_models/_compat.py # libs/partners/openai/langchain_openai/chat_models/base.py	2025-07-28 13:41:50 -04:00
ccurme	c55294ecb0	chore(core): add test for nested pydantic fields in schemas (#32285 )	2025-07-28 17:27:24 +00:00
Mason Daugherty	5e9eb19a83	chore: update branch with changes from master (#32277 ) Co-authored-by: Maxime Grenu <69890511+cluster2600@users.noreply.github.com> Co-authored-by: Claude <claude@anthropic.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: jmaillefaud <jonathan.maillefaud@evooq.ch> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: tanwirahmad <tanwirahmad@users.noreply.github.com> Co-authored-by: Christophe Bornet <cbornet@hotmail.com> Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com> Co-authored-by: niceg <79145285+growmuye@users.noreply.github.com> Co-authored-by: Chaitanya varma <varmac301@gmail.com> Co-authored-by: dishaprakash <57954147+dishaprakash@users.noreply.github.com> Co-authored-by: Chester Curme <chester.curme@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Kanav Bansal <13186335+bansalkanav@users.noreply.github.com> Co-authored-by: Aleksandr Filippov <71711753+alex-feel@users.noreply.github.com> Co-authored-by: Alex Feel <afilippov@spotware.com>	2025-07-28 10:39:41 -04:00
ccurme	3d9e694f73	feat(core): start on v1 chat model (#32276 ) Co-authored-by: Nuno Campos <nuno@langchain.dev>	2025-07-28 10:17:06 -04:00
Aleksandr Filippov	f0b6baa0ef	fix(core): track within-batch deduplication in indexing num_skipped count (#32273 ) Description: Fixes incorrect `num_skipped` count in the LangChain indexing API. The current implementation only counts documents that already exist in RecordManager (cross-batch duplicates) but fails to count documents removed during within-batch deduplication via `_deduplicate_in_order()`. This PR adds tracking of the original batch size before deduplication and includes the difference in `num_skipped`, ensuring that `num_added + num_skipped` equals the total number of input documents. Issue: Fixes incorrect document count reporting in indexing statistics Dependencies: None Fixes #32272 --------- Co-authored-by: Alex Feel <afilippov@spotware.com>	2025-07-28 09:58:51 -04:00
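The counting fix in f0b6baa0ef can be illustrated with a small sketch. It assumes documents are identified by string IDs and that a plain set stands in for the RecordManager; `index_batch` is a hypothetical function, not the indexing API itself.

```python
from typing import Dict, List, Set


def index_batch(doc_ids: List[str], already_indexed: Set[str]) -> Dict[str, int]:
    """Index one batch and report how many documents were added or skipped."""
    original_size = len(doc_ids)
    # Within-batch deduplication that keeps first occurrences in order.
    unique_ids = list(dict.fromkeys(doc_ids))
    within_batch_skipped = original_size - len(unique_ids)

    # Cross-batch duplicates: IDs the record store has already seen.
    new_ids = [d for d in unique_ids if d not in already_indexed]
    cross_batch_skipped = len(unique_ids) - len(new_ids)

    already_indexed.update(new_ids)
    return {
        "num_added": len(new_ids),
        # Counting both kinds of skip keeps added + skipped == input size.
        "num_skipped": cross_batch_skipped + within_batch_skipped,
    }
```

For a batch `["a", "b", "a", "c"]` with `"c"` already indexed, the sketch reports two added and two skipped, which sums to the four input documents.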
Mason Daugherty	96cbd90cba	fix: formatting issues in docstrings (#32265 ) Ensures proper reStructuredText formatting by adding the required blank line before closing docstring quotes, which resolves the "Block quote ends without a blank line; unexpected unindent" warning.	2025-07-27 23:37:47 -04:00
niceg	0d6f915442	fix: LLM mimicking Unicode responses due to forced Unicode conversion of non-ASCII characters. (#32222) - Description: This PR fixes an issue where the LLM would mimic Unicode escape sequences in its responses because non-ASCII characters in tool calls were forcibly converted to `\uXXXX` escapes. The fix disables the `ensure_ascii` flag in `json.dumps()` when converting tool calls to OpenAI format. - Issue: input: ```json {'role': 'assistant', 'tool_calls': [{'type': 'function', 'id': 'call_nv9trcehdpihr21zj9po19vq', 'function': {'name': 'create_customer', 'arguments': '{"customer_name": "你好啊集团"}'}}]} ``` output: ```json {'role': 'assistant', 'tool_calls': [{'type': 'function', 'id': 'call_nv9trcehdpihr21zj9po19vq', 'function': {'name': 'create_customer', 'arguments': '{"customer_name": "\\u4f60\\u597d\\u554a\\u96c6\\u56e2"}'}}]} ``` The LLM then mimics the escaped form in its own output. Escape sequences are longer than the characters they encode, which can lengthen LLM responses and slow them down. --------- Co-authored-by: Mason Daugherty <github@mdrxy.com> Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-07-24 17:01:31 -04:00
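The effect of the `ensure_ascii` flag mentioned in 0d6f915442 can be reproduced with the standard library alone. This short sketch reuses the customer name from the commit message; the variable names are illustrative.

```python
import json

args = {"customer_name": "你好啊集团"}

# Default behaviour: non-ASCII characters become \uXXXX escape sequences.
escaped = json.dumps(args)

# With ensure_ascii=False the characters are written as-is.
readable = json.dumps(args, ensure_ascii=False)

print(escaped)
print(readable)

# Both strings decode to the same data; only the serialized form differs.
assert json.loads(escaped) == json.loads(readable) == args
```

The escaped form is the one a model sees in its message history, which is why the commit switches the tool call conversion to the readable form.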
ccurme	e9b0b84675	feat: new message formats (v0.4) (#32208 ) Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2025-07-23 13:30:21 -04:00
Chester Curme	78d036a093	Merge branch 'wip-v0.4' into standard_outputs	2025-07-23 09:34:20 -04:00
Chester Curme	6572656cd2	core: support both old and new data content blocks	2025-07-22 18:19:09 -04:00
Chester Curme	b1a02f971b	fix tests	2025-07-22 16:45:19 -04:00
ccurme	8acfd677bc	fix(core): add type key when tracing in some cases (#31825 )	2025-07-22 18:08:16 +00:00
Mason Daugherty	b24f90dabe	refactor(core): standard content blocks (#32085 )	2025-07-22 09:17:55 -04:00
Copilot	18c64aed6d	feat(core): add `sanitize_for_postgres` utility to fix PostgreSQL NUL byte DataError (#32157 ) This PR fixes the PostgreSQL NUL byte issue that causes `psycopg.DataError` when inserting documents containing `\x00` bytes into PostgreSQL-based vector stores. ## Problem PostgreSQL text fields cannot contain NUL (0x00) bytes. When documents with such characters are processed by PGVector or langchain-postgres implementations, they fail with: ``` (psycopg.DataError) PostgreSQL text fields cannot contain NUL (0x00) bytes ``` This commonly occurs when processing PDFs, documents from various loaders, or text extracted by libraries like unstructured that may contain embedded NUL bytes. ## Solution Added `sanitize_for_postgres()` utility function to `langchain_core.utils.strings` that removes or replaces NUL bytes from text content. ### Key Features - Simple API: `sanitize_for_postgres(text, replacement="")` - Configurable: Replace NUL bytes with empty string (default) or space for readability - Comprehensive: Handles all problematic examples from the original issue - Well-tested: Complete unit tests with real-world examples - Backward compatible: No breaking changes, purely additive ### Usage Example ```python from langchain_core.utils import sanitize_for_postgres from langchain_core.documents import Document # Before: This would fail with DataError problematic_content = "Getting\x00Started with embeddings" # After: Clean the content before database insertion clean_content = sanitize_for_postgres(problematic_content) # Result: "GettingStarted with embeddings" # Or preserve readability with spaces readable_content = sanitize_for_postgres(problematic_content, " ") # Result: "Getting Started with embeddings" # Use in Document processing doc = Document(page_content=clean_content, metadata={...}) ``` ### Integration Pattern PostgreSQL vector store implementations should sanitize content before insertion: ```python def add_documents(self, documents: 
List[Document]) -> List[str]: # Sanitize documents before insertion sanitized_docs = [] for doc in documents: sanitized_content = sanitize_for_postgres(doc.page_content, " ") sanitized_doc = Document( page_content=sanitized_content, metadata=doc.metadata, id=doc.id ) sanitized_docs.append(sanitized_doc) return self._insert_documents_to_db(sanitized_docs) ``` ## Changes Made - Added `sanitize_for_postgres()` function in `langchain_core/utils/strings.py` - Updated `langchain_core/utils/__init__.py` to export the new function - Added comprehensive unit tests in `tests/unit_tests/utils/test_strings.py` - Validated against all examples from the original issue report ## Testing All tests pass, including: - Basic NUL byte removal and replacement - Multiple consecutive NUL bytes - Empty string handling - Real examples from the GitHub issue - Backward compatibility with existing string utilities This utility enables PostgreSQL integrations in both langchain-community and langchain-postgres packages to handle documents with NUL bytes reliably. Fixes #26033. --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com> Co-authored-by: Mason Daugherty <github@mdrxy.com>	2025-07-21 20:33:20 -04:00
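The behaviour described in 18c64aed6d can be approximated by a single string replacement. The function below is a stand-in written for this example, not the `sanitize_for_postgres` implementation from the commit; the name `strip_nul_bytes` is hypothetical.

```python
def strip_nul_bytes(text: str, replacement: str = "") -> str:
    """Return `text` with every NUL (0x00) character replaced.

    PostgreSQL text fields reject NUL characters, so they are removed
    (default) or swapped for `replacement` before insertion.
    """
    return text.replace("\x00", replacement)


problematic = "Getting\x00Started with embeddings"
print(strip_nul_bytes(problematic))       # NUL removed
print(strip_nul_bytes(problematic, " "))  # NUL replaced with a space
```

Passing a space as the replacement keeps words separated when the NUL character was standing in for whitespace in the extracted text.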
Mohammad Mohtashim	095f4a7c28	fix(core): fix `parse_result`in case of self.first_tool_only with multiple keys matching for JsonOutputKeyToolsParser (#32106 ) * Description: Updated `parse_result` logic to handle cases where `self.first_tool_only` is `True` and multiple matching keys share the same function name. Instead of returning the first match prematurely, the method now prioritizes filtering results by the specified key to ensure correct selection. * Issue: #32100 --------- Co-authored-by: Mason Daugherty <github@mdrxy.com> Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-07-21 12:50:22 -04:00
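One reading of the selection order described in 095f4a7c28 is sketched below: filter tool calls by the requested key first, then take the first match. This is an interpretation on plain dicts, not the `JsonOutputKeyToolsParser.parse_result` code; `select_tool_args` and the `type`/`args` field names are assumptions.

```python
from typing import Any, Dict, List, Optional, Union

ToolCall = Dict[str, Any]


def select_tool_args(
    tool_calls: List[ToolCall], key_name: str, first_tool_only: bool
) -> Union[Optional[Any], List[Any]]:
    """Pick the arguments of tool calls whose type matches `key_name`."""
    # Filter by key before deciding what "first" means, so an unrelated
    # tool call at position 0 cannot hide a later match.
    matching = [call["args"] for call in tool_calls if call["type"] == key_name]
    if first_tool_only:
        return matching[0] if matching else None
    return matching


calls = [
    {"type": "other_tool", "args": {"x": 1}},
    {"type": "target_tool", "args": {"y": 2}},
    {"type": "target_tool", "args": {"y": 3}},
]
print(select_tool_args(calls, "target_tool", first_tool_only=True))
```

Taking the first element only after filtering is what distinguishes this from a version that inspects `tool_calls[0]` and gives up if its type does not match.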
Isaac Francisco	98bfd57a76	fix(core): better error message for empty var names (#32073) Previously, we hit an index out of range error with empty variable names (accessing tag[0]). Now we throw a slightly nicer error. --------- Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-07-18 17:00:02 -04:00

608 Commits