Ensures proper reStructuredText formatting by adding the required blank
line before closing docstring quotes, which resolves the "Block quote
ends without a blank line; unexpected unindent" warning.
**TL;DR much of the provided `Makefile` targets were broken, and any
time I wanted to preview changes locally I either had to refer to a
command Chester gave me or try waiting on a Vercel preview deployment.
With this PR, everything should behave like normal.**
Significant updates to the `Makefile` and documentation files, focusing
on improving usability, adding clear messaging, and fixing/enhancing
documentation workflows.
### Updates to `Makefile`:
#### Enhanced build and cleaning processes:
- Added informative messages (e.g., "📚 Building LangChain
documentation...") to makefile targets like `docs_build`, `docs_clean`,
and `api_docs_build` for better user feedback during execution.
- Introduced a `clean-cache` target to the `docs` `Makefile` to clear
cached dependencies and ensure clean builds.
#### Improved dependency handling:
- Modified `install-py-deps` to create a `.venv/deps_installed` marker,
preventing redundant/duplicate dependency installations and improving
efficiency.
#### Streamlined file generation and infrastructure setup:
- Added caching for the LangServe README download and parallelized
feature table generation
- Added user-friendly completion messages for targets like `copy-infra`
and `render`.
#### Documentation server updates:
- Enhanced the `start` target with messages indicating server start and
URL for local documentation viewing.
---
### Documentation Improvements:
#### Content clarity and consistency:
- Standardized section titles for consistency across documentation
files.
[[1]](diffhunk://#diff-9b1a85ea8a9dcf79f58246c88692cd7a36316665d7e05a69141cfdc50794c82aL1-R1)
[[2]](diffhunk://#diff-944008ad3a79d8a312183618401fcfa71da0e69c75803eff09b779fc8e03183dL1-R1)
- Refined phrasing and formatting in sections like "Dependency
management" and "Formatting and linting" for better readability.
[[1]](diffhunk://#diff-2069d4f956ab606ae6d51b191439283798adaf3a6648542c409d258131617059L6-R6)
[[2]](diffhunk://#diff-2069d4f956ab606ae6d51b191439283798adaf3a6648542c409d258131617059L84-R82)
#### Enhanced workflows:
- Updated instructions for building and viewing documentation locally,
including tips for specifying server ports and handling API reference
previews.
[[1]](diffhunk://#diff-048deddcfd44b242e5b23aed9f2e9ec73afc672244ce14df2a0a316d95840c87L60-R94)
[[2]](diffhunk://#diff-048deddcfd44b242e5b23aed9f2e9ec73afc672244ce14df2a0a316d95840c87L82-R126)
- Expanded guidance on cleaning documentation artifacts and using
linting tools effectively.
[[1]](diffhunk://#diff-048deddcfd44b242e5b23aed9f2e9ec73afc672244ce14df2a0a316d95840c87L82-R126)
[[2]](diffhunk://#diff-048deddcfd44b242e5b23aed9f2e9ec73afc672244ce14df2a0a316d95840c87L107-R142)
#### API reference documentation:
- Improved instructions for generating and formatting in-code
documentation, highlighting best practices for docstring writing.
[[1]](diffhunk://#diff-048deddcfd44b242e5b23aed9f2e9ec73afc672244ce14df2a0a316d95840c87L107-R142)
[[2]](diffhunk://#diff-048deddcfd44b242e5b23aed9f2e9ec73afc672244ce14df2a0a316d95840c87L144-R186)
---
### Minor Changes:
- Added support for a new package name (`langchain_v1`) in the API
documentation generation script.
- Fixed minor capitalization and formatting issues in documentation
files.
[[1]](diffhunk://#diff-2069d4f956ab606ae6d51b191439283798adaf3a6648542c409d258131617059L40-R40)
[[2]](diffhunk://#diff-2069d4f956ab606ae6d51b191439283798adaf3a6648542c409d258131617059L166-R160)
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This PR addresses the common issue where users struggle to pass custom
parameters to OpenAI-compatible APIs like LM Studio, vLLM, and others.
The problem occurs when users try to use `model_kwargs` for custom
parameters, which causes API errors.
## Problem
Users attempting to pass custom parameters (like LM Studio's `ttl`
parameter) were getting errors:
```python
# ❌ This approach fails
llm = ChatOpenAI(
base_url="http://localhost:1234/v1",
model="mlx-community/QwQ-32B-4bit",
model_kwargs={"ttl": 5} # Causes TypeError: unexpected keyword argument 'ttl'
)
```
## Solution
The `extra_body` parameter is the correct way to pass custom parameters
to OpenAI-compatible APIs:
```python
# ✅ This approach works correctly
llm = ChatOpenAI(
base_url="http://localhost:1234/v1",
model="mlx-community/QwQ-32B-4bit",
extra_body={"ttl": 5} # Custom parameters go in extra_body
)
```
## Changes Made
1. **Enhanced Documentation**: Updated the `extra_body` parameter
docstring with comprehensive examples for LM Studio, vLLM, and other
providers
2. **Added Documentation Section**: Created a new "OpenAI-compatible
APIs" section in the main class docstring with practical examples
3. **Unit Tests**: Added tests to verify `extra_body` functionality
works correctly:
- `test_extra_body_parameter()`: Verifies custom parameters are included
in request payload
- `test_extra_body_with_model_kwargs()`: Ensures `extra_body` and
`model_kwargs` work together
4. **Clear Guidance**: Documented when to use `extra_body` vs
`model_kwargs`
## Examples Added
**LM Studio with TTL (auto-eviction):**
```python
ChatOpenAI(
base_url="http://localhost:1234/v1",
api_key="lm-studio",
model="mlx-community/QwQ-32B-4bit",
extra_body={"ttl": 300} # Auto-evict after 5 minutes
)
```
**vLLM with custom sampling:**
```python
ChatOpenAI(
base_url="http://localhost:8000/v1",
api_key="EMPTY",
model="meta-llama/Llama-2-7b-chat-hf",
extra_body={
"use_beam_search": True,
"best_of": 4
}
)
```
## Why This Works
- `model_kwargs` parameters are passed directly to the OpenAI client's
`create()` method, causing errors for non-standard parameters
- `extra_body` parameters are included in the HTTP request body, which
is exactly what OpenAI-compatible APIs expect for custom parameters
Fixes#32115.
<!-- START COPILOT CODING AGENT TIPS -->
---
💬 Share your feedback on Copilot coding agent for the chance to win a
$200 gift card! Click
[here](https://survey.alchemer.com/s3/8343779/Copilot-Coding-agent) to
start the survey.
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
## Summary
- Fixed redundant word "done" in SECURITY.md line 69
- Fixed grammar errors in Fireworks README.md line 77: "how it fares
compares" → "how it compares" and "in terms just" → "in terms of"
## Test plan
- [x] Verified changes improve readability and correct grammar
- [x] No functional changes, documentation only
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
Multiple models were
[retired](https://docs.anthropic.com/en/docs/about-claude/model-deprecations#model-status)
yesterday.
Tests remain broken until we figure out what to do with the legacy
Anthropic LLM integration— currently uses their (legacy) text
completions API, for which there appear to be no remaining supported
models.
* Adding support for more Chroma client options (`HttpClient` and
`CloundClient`). This includes adding arguments necessary for
instantiating these clients.
* Adding support for Chroma's new persisted collection configuration (we
moved index configuration into this new construct).
* Delegate `Settings` configuration to Chroma's client constructors.
## Problem
When using `ChatOllama` with `create_react_agent`, agents would
sometimes terminate prematurely with empty responses when Ollama
returned `done_reason: 'load'` responses with no content. This caused
agents to return empty `AIMessage` objects instead of actual generated
text.
```python
from langchain_ollama import ChatOllama
from langgraph.prebuilt import create_react_agent
from langchain_core.messages import HumanMessage
llm = ChatOllama(model='qwen2.5:7b', temperature=0)
agent = create_react_agent(model=llm, tools=[])
result = agent.invoke(HumanMessage('Hello'), {"configurable": {"thread_id": "1"}})
# Before fix: AIMessage(content='', response_metadata={'done_reason': 'load'})
# Expected: AIMessage with actual generated content
```
## Root Cause
The `_iterate_over_stream` and `_aiterate_over_stream` methods treated
any response with `done: True` as final, regardless of `done_reason`.
When Ollama returns `done_reason: 'load'` with empty content, it
indicates the model was loaded but no actual generation occurred - this
should not be considered a complete response.
## Solution
Modified the streaming logic to skip responses when:
- `done: True`
- `done_reason: 'load'`
- Content is empty or contains only whitespace
This ensures agents only receive actual generated content while
preserving backward compatibility for load responses that do contain
content.
## Changes
- **`_iterate_over_stream`**: Skip empty load responses instead of
yielding them
- **`_aiterate_over_stream`**: Apply same fix to async streaming
- **Tests**: Added comprehensive test cases covering all edge cases
## Testing
All scenarios now work correctly:
- ✅ Empty load responses are skipped (fixes original issue)
- ✅ Load responses with actual content are preserved (backward
compatibility)
- ✅ Normal stop responses work unchanged
- ✅ Streaming behavior preserved
- ✅ `create_react_agent` integration fixed
Fixes#31482.
<!-- START COPILOT CODING AGENT TIPS -->
---
💡 You can make Copilot smarter by setting up custom instructions,
customizing its development environment and configuring Model Context
Protocol (MCP) servers. Learn more [Copilot coding agent
tips](https://gh.io/copilot-coding-agent-tips) in the docs.
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
**Description:**
This PR makes argument parsing for Ollama tool calls more robust. Some
LLMs—including Ollama—may return arguments as Python-style dictionaries
with single quotes (e.g., `{'a': 1}`), which are not valid JSON and
previously caused parsing to fail.
The updated `_parse_json_string` method in
`langchain_ollama.chat_models` now attempts standard JSON parsing and,
if that fails, falls back to `ast.literal_eval` for safe evaluation of
Python-style dictionaries. This improves interoperability with LLMs and
fixes a common usability issue for tool-based agents.
**Issue:**
Closes#30910
**Dependencies:**
None
**Tests:**
- Added new unit tests for double-quoted JSON, single-quoted dicts,
mixed quoting, and malformed/failure cases.
- All tests pass locally, including new coverage for single-quoted
inputs.
**Notes:**
- No breaking changes.
- No new dependencies introduced.
- Code is formatted and linted (`ruff format`, `ruff check`).
- If maintainers have suggestions for further improvements, I’m happy to
revise!
Thank you for maintaining LangChain! Looking forward to your feedback.
The `num_gpu` parameter in `OllamaEmbeddings` was not being passed to
the Ollama client in the async embedding method, causing GPU
acceleration settings to be ignored when using async operations.
## Problem
The issue was in the `aembed_documents` method where the `options`
parameter (containing `num_gpu` and other configuration) was missing:
```python
# Sync method (working correctly)
return self._client.embed(
self.model, texts, options=self._default_params, keep_alive=self.keep_alive
)["embeddings"]
# Async method (missing options parameter)
return (
await self._async_client.embed(
self.model, texts, keep_alive=self.keep_alive # ❌ No options!
)
)["embeddings"]
```
This meant that when users specified `num_gpu=4` (or any other GPU
configuration), it would work with sync calls but be ignored with async
calls.
## Solution
Added the missing `options=self._default_params` parameter to the async
embed call to match the sync version:
```python
# Fixed async method
return (
await self._async_client.embed(
self.model,
texts,
options=self._default_params, # ✅ Now includes num_gpu!
keep_alive=self.keep_alive,
)
)["embeddings"]
```
## Validation
- ✅ Added unit test to verify options are correctly passed in both sync
and async methods
- ✅ All existing tests continue to pass
- ✅ Manual testing confirms `num_gpu` parameter now works correctly
- ✅ Code passes linting and formatting checks
The fix ensures that GPU configuration works consistently across both
synchronous and asynchronous embedding operations.
Fixes#32059.
<!-- START COPILOT CODING AGENT TIPS -->
---
💡 You can make Copilot smarter by setting up custom instructions,
customizing its development environment and configuring Model Context
Protocol (MCP) servers. Learn more [Copilot coding agent
tips](https://gh.io/copilot-coding-agent-tips) in the docs.
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Description
The Perplexity chat model already returns a search_results field, but
LangChain dropped it when mapping Perplexity responses to
additional_kwargs.
This patch adds "search_results" to the allowed attribute lists in both
_stream and _generate, so downstream code can access it just like
images, citations, or related_questions.
Dependencies
None. The change is purely internal; no new imports or optional
dependencies required.
https://community.perplexity.ai/t/new-feature-search-results-field-with-richer-metadata/398
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
## Description
When ChatDeepSeek invokes a tool that returns a list, it results in an
openai.UnprocessableEntityError due to a failure in deserializing the
JSON body.
The root of the problem is that ChatDeepSeek uses BaseChatOpenAI
internally, but the APIs are not identical: OpenAI v1/chat/completions
accepts arrays as tool results, but Deepseek API does not.
As a solution added `_get_request_payload` method to ChatDeepSeek, which
inherits the behavior from BaseChatOpenAI but adds a step to stringify
tool message content in case the content is an array. I also add a unit
test for this.
From the linked issue you can find the full reproducible example the
reporter of the issue provided. After the changes it works as expected.
Source: [Deepseek
docs](https://api-docs.deepseek.com/api/create-chat-completion/)

Source: [OpenAI
docs](https://platform.openai.com/docs/api-reference/chat/create)

## Issue
Fixes#31394
## Dependencies:
No new dependencies.
## Twitter handle:
Don't have one.
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
* update model validation due to change in [Ollama
client](https://github.com/ollama/ollama) - ensure you are running the
latest version (0.9.6) to use `validate_model_on_init`
* add code example and fix formatting for ChatOllama reasoning
* ensure that setting `reasoning` in invocation kwargs overrides
class-level setting
* tests