**Description:** fix an issue I discovered when attempting to merge
messages in which one message has an `index` key in its content
dictionary and another does not.
Fixes a streaming bug where models like Qwen3 (using OpenAI interface)
send tool call chunks with inconsistent indices, resulting in
duplicate/erroneous tool calls instead of a single merged tool call.
## Problem
When Qwen3 streams tool calls, it sends chunks with inconsistent `index`
values:
- First chunk: `index=1` with tool name and partial arguments
- Subsequent chunks: `index=0` with `name=None`, `id=None` and argument
continuation
The existing `merge_lists` function only merges chunks when their
`index` values match exactly, causing these logically related chunks to
remain separate, resulting in multiple incomplete tool calls instead of
one complete tool call.
```python
# Before fix: Results in 1 valid + 1 invalid tool call
chunk1 = AIMessageChunk(tool_call_chunks=[
{"name": "search", "args": '{"query":', "id": "call_123", "index": 1}
])
chunk2 = AIMessageChunk(tool_call_chunks=[
{"name": None, "args": ' "test"}', "id": None, "index": 0}
])
merged = chunk1 + chunk2 # Creates 2 separate tool calls
# After fix: Results in 1 complete tool call
merged = chunk1 + chunk2 # Creates 1 merged tool call: search({"query": "test"})
```
## Solution
Enhanced the `merge_lists` function in `langchain_core/utils/_merge.py`
with intelligent tool call chunk merging:
1. **Preserves existing behavior**: Same-index chunks still merge as
before
2. **Adds special handling**: Tool call chunks with
`name=None`/`id=None` that don't match any existing index are now merged
with the most recent complete tool call chunk
3. **Maintains backward compatibility**: All existing functionality
works unchanged
4. **Targeted fix**: Only affects tool call chunks, doesn't change
behavior for other list items
The fix specifically handles the pattern where:
- A continuation chunk has `name=None` and `id=None` (indicating it's
part of an ongoing tool call)
- No matching index is found in existing chunks
- There exists a recent tool call chunk with a valid name or ID to merge
with
## Testing
Added comprehensive test coverage including:
- ✅ Qwen3-style chunks with different indices now merge correctly
- ✅ Existing same-index behavior preserved
- ✅ Multiple distinct tool calls remain separate
- ✅ Edge cases handled (empty chunks, orphaned continuations)
- ✅ Backward compatibility maintained
Fixes#31511.
<!-- START COPILOT CODING AGENT TIPS -->
---
💬 Share your feedback on Copilot coding agent for the chance to win a
$200 gift card! Click
[here](https://survey.alchemer.com/s3/8343779/Copilot-Coding-agent) to
start the survey.
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Ensures proper reStructuredText formatting by adding the required blank
line before closing docstring quotes, which resolves the "Block quote
ends without a blank line; unexpected unindent" warning.
**TL;DR much of the provided `Makefile` targets were broken, and any
time I wanted to preview changes locally I either had to refer to a
command Chester gave me or try waiting on a Vercel preview deployment.
With this PR, everything should behave like normal.**
Significant updates to the `Makefile` and documentation files, focusing
on improving usability, adding clear messaging, and fixing/enhancing
documentation workflows.
### Updates to `Makefile`:
#### Enhanced build and cleaning processes:
- Added informative messages (e.g., "📚 Building LangChain
documentation...") to makefile targets like `docs_build`, `docs_clean`,
and `api_docs_build` for better user feedback during execution.
- Introduced a `clean-cache` target to the `docs` `Makefile` to clear
cached dependencies and ensure clean builds.
#### Improved dependency handling:
- Modified `install-py-deps` to create a `.venv/deps_installed` marker,
preventing redundant/duplicate dependency installations and improving
efficiency.
#### Streamlined file generation and infrastructure setup:
- Added caching for the LangServe README download and parallelized
feature table generation
- Added user-friendly completion messages for targets like `copy-infra`
and `render`.
#### Documentation server updates:
- Enhanced the `start` target with messages indicating server start and
URL for local documentation viewing.
---
### Documentation Improvements:
#### Content clarity and consistency:
- Standardized section titles for consistency across documentation
files.
[[1]](diffhunk://#diff-9b1a85ea8a9dcf79f58246c88692cd7a36316665d7e05a69141cfdc50794c82aL1-R1)
[[2]](diffhunk://#diff-944008ad3a79d8a312183618401fcfa71da0e69c75803eff09b779fc8e03183dL1-R1)
- Refined phrasing and formatting in sections like "Dependency
management" and "Formatting and linting" for better readability.
[[1]](diffhunk://#diff-2069d4f956ab606ae6d51b191439283798adaf3a6648542c409d258131617059L6-R6)
[[2]](diffhunk://#diff-2069d4f956ab606ae6d51b191439283798adaf3a6648542c409d258131617059L84-R82)
#### Enhanced workflows:
- Updated instructions for building and viewing documentation locally,
including tips for specifying server ports and handling API reference
previews.
[[1]](diffhunk://#diff-048deddcfd44b242e5b23aed9f2e9ec73afc672244ce14df2a0a316d95840c87L60-R94)
[[2]](diffhunk://#diff-048deddcfd44b242e5b23aed9f2e9ec73afc672244ce14df2a0a316d95840c87L82-R126)
- Expanded guidance on cleaning documentation artifacts and using
linting tools effectively.
[[1]](diffhunk://#diff-048deddcfd44b242e5b23aed9f2e9ec73afc672244ce14df2a0a316d95840c87L82-R126)
[[2]](diffhunk://#diff-048deddcfd44b242e5b23aed9f2e9ec73afc672244ce14df2a0a316d95840c87L107-R142)
#### API reference documentation:
- Improved instructions for generating and formatting in-code
documentation, highlighting best practices for docstring writing.
[[1]](diffhunk://#diff-048deddcfd44b242e5b23aed9f2e9ec73afc672244ce14df2a0a316d95840c87L107-R142)
[[2]](diffhunk://#diff-048deddcfd44b242e5b23aed9f2e9ec73afc672244ce14df2a0a316d95840c87L144-R186)
---
### Minor Changes:
- Added support for a new package name (`langchain_v1`) in the API
documentation generation script.
- Fixed minor capitalization and formatting issues in documentation
files.
[[1]](diffhunk://#diff-2069d4f956ab606ae6d51b191439283798adaf3a6648542c409d258131617059L40-R40)
[[2]](diffhunk://#diff-2069d4f956ab606ae6d51b191439283798adaf3a6648542c409d258131617059L166-R160)
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
The `_dereference_refs_helper` in `langchain_core.utils.json_schema`
incorrectly handled objects with a reference and other fields.
**Issue**: #32170
# Description
We change the check so that it accepts other keys in the object.
This PR fixes the PostgreSQL NUL byte issue that causes
`psycopg.DataError` when inserting documents containing `\x00` bytes
into PostgreSQL-based vector stores.
## Problem
PostgreSQL text fields cannot contain NUL (0x00) bytes. When documents
with such characters are processed by PGVector or langchain-postgres
implementations, they fail with:
```
(psycopg.DataError) PostgreSQL text fields cannot contain NUL (0x00) bytes
```
This commonly occurs when processing PDFs, documents from various
loaders, or text extracted by libraries like unstructured that may
contain embedded NUL bytes.
## Solution
Added `sanitize_for_postgres()` utility function to
`langchain_core.utils.strings` that removes or replaces NUL bytes from
text content.
### Key Features
- **Simple API**: `sanitize_for_postgres(text, replacement="")`
- **Configurable**: Replace NUL bytes with empty string (default) or
space for readability
- **Comprehensive**: Handles all problematic examples from the original
issue
- **Well-tested**: Complete unit tests with real-world examples
- **Backward compatible**: No breaking changes, purely additive
### Usage Example
```python
from langchain_core.utils import sanitize_for_postgres
from langchain_core.documents import Document
# Before: This would fail with DataError
problematic_content = "Getting\x00Started with embeddings"
# After: Clean the content before database insertion
clean_content = sanitize_for_postgres(problematic_content)
# Result: "GettingStarted with embeddings"
# Or preserve readability with spaces
readable_content = sanitize_for_postgres(problematic_content, " ")
# Result: "Getting Started with embeddings"
# Use in Document processing
doc = Document(page_content=clean_content, metadata={...})
```
### Integration Pattern
PostgreSQL vector store implementations should sanitize content before
insertion:
```python
def add_documents(self, documents: List[Document]) -> List[str]:
# Sanitize documents before insertion
sanitized_docs = []
for doc in documents:
sanitized_content = sanitize_for_postgres(doc.page_content, " ")
sanitized_doc = Document(
page_content=sanitized_content,
metadata=doc.metadata,
id=doc.id
)
sanitized_docs.append(sanitized_doc)
return self._insert_documents_to_db(sanitized_docs)
```
## Changes Made
- Added `sanitize_for_postgres()` function in
`langchain_core/utils/strings.py`
- Updated `langchain_core/utils/__init__.py` to export the new function
- Added comprehensive unit tests in
`tests/unit_tests/utils/test_strings.py`
- Validated against all examples from the original issue report
## Testing
All tests pass, including:
- Basic NUL byte removal and replacement
- Multiple consecutive NUL bytes
- Empty string handling
- Real examples from the GitHub issue
- Backward compatibility with existing string utilities
This utility enables PostgreSQL integrations in both langchain-community
and langchain-postgres packages to handle documents with NUL bytes
reliably.
Fixes#26033.
<!-- START COPILOT CODING AGENT TIPS -->
---
💬 Share your feedback on Copilot coding agent for the chance to win a
$200 gift card! Click
[here](https://survey.alchemer.com/s3/8343779/Copilot-Coding-agent) to
start the survey.
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Previously, we hit an index out of range error with empty variable names
(accessing tag[0]), now we through a slightly nicer error
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Fixes#32042
## Summary
Fixes a critical bug in JSON Schema reference resolution that prevented
correctly dereferencing numeric components in JSON pointer paths,
specifically for list indices in `anyOf`, `oneOf`, and `allOf` arrays.
## Changes
- Fixed `_retrieve_ref` function in
`libs/core/langchain_core/utils/json_schema.py` to properly handle
numeric components
- Added comprehensive test function `test_dereference_refs_list_index()`
in `libs/core/tests/unit_tests/utils/test_json_schema.py`
- Resolved line length formatting issues
- Improved type checking and index validation for list and dictionary
references
## Key Improvements
- Correctly handles list index references in JSON pointer paths
- Maintains backward compatibility with existing dictionary numeric key
functionality
- Adds robust error handling for out-of-bounds and invalid indices
- Passes all test cases covering various reference scenarios
## Test Coverage
- Verified fix for `#/properties/payload/anyOf/1/properties/startDate`
reference
- Tested edge cases including out-of-bounds and negative indices
- Ensured no regression in existing reference resolution functionality
Resolves the reported issue with JSON Schema reference dereferencing for
list indices.
---------
Co-authored-by: open-swe-dev[bot] <open-swe-dev@users.noreply.github.com>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
* Simplified Pydantic handling since Pydantic v1 is not supported
anymore.
* Replace use of deprecated v1 methods by corresponding v2 methods.
* Remove use of other deprecated methods.
* Activate mypy errors on deprecated methods use.
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Add image generation tool to the list of well known tools. This is needed for changes in the ChatOpenAI client.
TODO: Some of this logic needs to be moved from core directly into the client as changes in core should not be required to add a new tool to the openai chat client.
**Issue:**[
#309070](https://github.com/langchain-ai/langchain/issues/30970)
**Cause**
Arg type in python code
```
arg: Union[SubSchema1, SubSchema2]
```
is translated to `anyOf` in **json schema**
```
"anyOf" : [{sub schema 1 ...}, {sub schema 1 ...}]
```
The value of anyOf is a list sub schemas.
The bug is caused since the sub schemas inside `anyOf` list is not taken
care of.
The location where the issue happens is `convert_to_openai_function`
function -> `_recursive_set_additional_properties_false` function, that
recursively adds `"additionalProperties": false` to json schema which is
[required by OpenAI's strict function
calling](https://platform.openai.com/docs/guides/structured-outputs?api-mode=responses#additionalproperties-false-must-always-be-set-in-objects)
**Solution:**
This PR fixes this issue by iterating each sub schema inside `anyOf`
list.
A unit test is added.
**Twitter handle:** shengboma
If no one reviews your PR within a few days, please @-mention one of
baskaryan, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
Most easily reviewed with the "hide whitespace" option toggled.
Seeing 10-50% speed ups in import time for common structures 🚀
The general purpose of this PR is to lazily import structures within
`langchain_core.XXX_module.__init__.py` so that we're not eagerly
importing expensive dependencies (`pydantic`, `requests`, etc).
Analysis of flamegraphs generated with `importtime` motivated these
changes. For example, the one below demonstrates that importing
`HumanMessage` accidentally triggered imports for `importlib.metadata`,
`requests`, etc.
There's still much more to do on this front, and we can start digging
into our own internal code for optimizations now that we're less
concerned about external imports.
<img width="1210" alt="Screenshot 2025-04-11 at 1 10 54 PM"
src="https://github.com/user-attachments/assets/112a3fe7-24a9-4294-92c1-d5ae64df839e"
/>
I've tracked the improvements with some local benchmarks:
## `pytest-benchmark` results
| Name | Before (s) | After (s) | Delta (s) | % Change |
|-----------------------------|------------|-----------|-----------|----------|
| Document | 2.8683 | 1.2775 | -1.5908 | -55.46% |
| HumanMessage | 2.2358 | 1.1673 | -1.0685 | -47.79% |
| ChatPromptTemplate | 5.5235 | 2.9709 | -2.5526 | -46.22% |
| Runnable | 2.9423 | 1.7793 | -1.163 | -39.53% |
| InMemoryVectorStore | 3.1180 | 1.8417 | -1.2763 | -40.93% |
| RunnableLambda | 2.7385 | 1.8745 | -0.864 | -31.55% |
| tool | 5.1231 | 4.0771 | -1.046 | -20.42% |
| CallbackManager | 4.2263 | 3.4099 | -0.8164 | -19.32% |
| LangChainTracer | 3.8394 | 3.3101 | -0.5293 | -13.79% |
| BaseChatModel | 4.3317 | 3.8806 | -0.4511 | -10.41% |
| PydanticOutputParser | 3.2036 | 3.2995 | 0.0959 | 2.99% |
| InMemoryRateLimiter | 0.5311 | 0.5995 | 0.0684 | 12.88% |
Note the lack of change for `InMemoryRateLimiter` and
`PydanticOutputParser` is just random noise, I'm getting comparable
numbers locally.
## Local CodSpeed results
We're still working on configuring CodSpeed on CI. The local usage
produced similar results.
Looks like `pyupgrade` was already used here but missed some docs and
tests.
This helps to keep our docs looking professional and up to date.
Eventually, we should lint / format our inline docs.
This pull request includes various changes to the `langchain_core`
library, focusing on improving compatibility with different versions of
Pydantic. The primary change involves replacing checks for Pydantic
major versions with boolean flags, which simplifies the code and
improves readability.
This also solves ruff rule checks for
[RUF048](https://docs.astral.sh/ruff/rules/map-int-version-parsing/) and
[PLR2004](https://docs.astral.sh/ruff/rules/magic-value-comparison/).
Key changes include:
### Compatibility Improvements:
*
[`libs/core/langchain_core/output_parsers/json.py`](diffhunk://#diff-5add0cf7134636ae4198a1e0df49ee332ae0c9123c3a2395101e02687c717646L22-R24):
Replaced `PYDANTIC_MAJOR_VERSION` with `IS_PYDANTIC_V1` to check for
Pydantic version 1.
*
[`libs/core/langchain_core/output_parsers/pydantic.py`](diffhunk://#diff-2364b5b4aee01c462aa5dbda5dc3a877dcd20f29df173ad540dc8adf8b192361L14-R14):
Updated version checks from `PYDANTIC_MAJOR_VERSION` to `IS_PYDANTIC_V2`
in the `PydanticOutputParser` class.
[[1]](diffhunk://#diff-2364b5b4aee01c462aa5dbda5dc3a877dcd20f29df173ad540dc8adf8b192361L14-R14)
[[2]](diffhunk://#diff-2364b5b4aee01c462aa5dbda5dc3a877dcd20f29df173ad540dc8adf8b192361L27-R27)
### Utility Enhancements:
*
[`libs/core/langchain_core/utils/pydantic.py`](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896R23):
Introduced `IS_PYDANTIC_V1` and `IS_PYDANTIC_V2` flags and deprecated
the `get_pydantic_major_version` function. Updated various functions to
use these flags instead of version numbers.
[[1]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896R23)
[[2]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896R42-R78)
[[3]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896L90-R89)
[[4]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896L104-R101)
[[5]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896L120-R122)
[[6]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896L135-R132)
[[7]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896L149-R151)
[[8]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896L164-R161)
[[9]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896L248-R250)
[[10]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896L330-R335)
[[11]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896L356-R357)
[[12]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896L393-R390)
[[13]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896L403-R400)
### Test Updates:
*
[`libs/core/tests/unit_tests/output_parsers/test_openai_tools.py`](diffhunk://#diff-694cc0318edbd6bbca34f53304934062ad59ba9f5a788252ce6c5f5452489d67L19-R22):
Updated tests to use `IS_PYDANTIC_V1` and `IS_PYDANTIC_V2` for version
checks.
[[1]](diffhunk://#diff-694cc0318edbd6bbca34f53304934062ad59ba9f5a788252ce6c5f5452489d67L19-R22)
[[2]](diffhunk://#diff-694cc0318edbd6bbca34f53304934062ad59ba9f5a788252ce6c5f5452489d67L532-R535)
[[3]](diffhunk://#diff-694cc0318edbd6bbca34f53304934062ad59ba9f5a788252ce6c5f5452489d67L567-R570)
[[4]](diffhunk://#diff-694cc0318edbd6bbca34f53304934062ad59ba9f5a788252ce6c5f5452489d67L602-R605)
*
[`libs/core/tests/unit_tests/prompts/test_chat.py`](diffhunk://#diff-3e60e744842086a4f3c4b21bc83e819c3435720eab210078e77e2430fb8c7e84R7):
Replaced version tuple checks with `PYDANTIC_VERSION` comparisons.
[[1]](diffhunk://#diff-3e60e744842086a4f3c4b21bc83e819c3435720eab210078e77e2430fb8c7e84R7)
[[2]](diffhunk://#diff-3e60e744842086a4f3c4b21bc83e819c3435720eab210078e77e2430fb8c7e84L35-R38)
[[3]](diffhunk://#diff-3e60e744842086a4f3c4b21bc83e819c3435720eab210078e77e2430fb8c7e84L924-R927)
[[4]](diffhunk://#diff-3e60e744842086a4f3c4b21bc83e819c3435720eab210078e77e2430fb8c7e84L935-R938)
*
[`libs/core/tests/unit_tests/runnables/test_graph.py`](diffhunk://#diff-99a290330ef40103d0ce02e52e21310d6fadea142bfdea13c94d23fc81c0bb5dR3):
Simplified version checks using `PYDANTIC_VERSION`.
[[1]](diffhunk://#diff-99a290330ef40103d0ce02e52e21310d6fadea142bfdea13c94d23fc81c0bb5dR3)
[[2]](diffhunk://#diff-99a290330ef40103d0ce02e52e21310d6fadea142bfdea13c94d23fc81c0bb5dL15-R18)
[[3]](diffhunk://#diff-99a290330ef40103d0ce02e52e21310d6fadea142bfdea13c94d23fc81c0bb5dL234-L239)
*
[`libs/core/tests/unit_tests/runnables/test_runnable.py`](diffhunk://#diff-06bed920c0dad0cfd41d57a8d9e47a7b56832409649c10151061a791860d5bb5L18-R20):
Introduced `PYDANTIC_VERSION_AT_LEAST_29` and
`PYDANTIC_VERSION_AT_LEAST_210` for more readable version checks.
[[1]](diffhunk://#diff-06bed920c0dad0cfd41d57a8d9e47a7b56832409649c10151061a791860d5bb5L18-R20)
[[2]](diffhunk://#diff-06bed920c0dad0cfd41d57a8d9e47a7b56832409649c10151061a791860d5bb5L92-R99)
[[3]](diffhunk://#diff-06bed920c0dad0cfd41d57a8d9e47a7b56832409649c10151061a791860d5bb5L230-R233)
[[4]](diffhunk://#diff-06bed920c0dad0cfd41d57a8d9e47a7b56832409649c10151061a791860d5bb5L652-R655)
See https://docs.astral.sh/ruff/rules/#flake8-type-checking-tc
Some fixes done for TC001,TC002 and TC003 but these rules are excluded
since they don't play well with Pydantic.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>