The old `before_model_jump_to` classvar approach was quite clunky; this
is nicer imo and easier to document. Also moving from `jump_to` to
`can_jump_to`, which is more idiomatic.
Before:
```py
class MyMiddleware(AgentMiddleware):
    before_model_jump_to: ClassVar[list[JumpTo]] = ["end"]

    def before_model(self, state, runtime) -> dict[str, Any]:
        return {"jump_to": "end"}
```
After:
```py
class MyMiddleware(AgentMiddleware):
    @hook_config(can_jump_to=["end"])
    def before_model(self, state, runtime) -> dict[str, Any]:
        return {"jump_to": "end"}
```
This makes branching **much** simpler internally and helps greatly
w/ type safety for users. It allows for just one signature on hooks
instead of multiple.
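
As a rough illustration of why a decorator is easier to work with than per-hook classvars, here is a minimal, self-contained sketch of a decorator that attaches jump metadata to a hook function. This is not LangChain's implementation: `hook_config_sketch`, the `__can_jump_to__` attribute, and `allowed_jumps` are hypothetical names used only for this example.

```python
from typing import Any, Callable


def hook_config_sketch(*, can_jump_to: list[str]) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Attach the allowed jump targets to a hook function (hypothetical helper)."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        # Store the metadata on the function itself (attribute name is an assumption)
        func.__can_jump_to__ = list(can_jump_to)
        return func

    return decorator


def allowed_jumps(hook: Callable[..., Any]) -> list[str]:
    """Read the jump targets back; undecorated hooks allow no jumps."""
    return getattr(hook, "__can_jump_to__", [])


class MyMiddleware:
    @hook_config_sketch(can_jump_to=["end"])
    def before_model(self, state: dict[str, Any], runtime: Any) -> dict[str, Any]:
        return {"jump_to": "end"}

    def after_model(self, state: dict[str, Any], runtime: Any) -> None:
        return None
```

With this shape, whoever builds the graph can call `allowed_jumps(middleware.before_model)` on any hook and only add conditional edges when the list is non-empty.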
Opened after https://github.com/langchain-ai/langchain/pull/33164
ballooned more than expected, w/ branching for:
* sync vs async
* runtime vs no runtime (this is self imposed)
**This also removes support for nodes w/o `runtime` in the signature.**
We can always go back and add support for nodes w/o `runtime`.
I think @christian-bromann's idea to re-export `runtime` from
langchain's agents might make sense due to the abundance of imports
here.
Check out the value of the change based on this diff:
https://github.com/langchain-ai/langchain/pull/33176
The async embed function does not properly handle HTTP errors.
For instance with large batches, Mistral AI returns `Too many inputs in
request, split into more batches.` in a 400 error.
This leads to a KeyError in `response.json()["data"]` l.288
This PR fixes the issue by:
- calling `response.raise_for_status()` before returning
- adding a retry similarly to what is done in the synchronous
counterpart `embed_documents`
I also added an integration test, but willing to move it to unit tests
if more relevant.
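
A minimal sketch of the pattern described above, assuming nothing about the real client: `FakeResponse`, `HTTPStatusError`, and `embed_with_retry` are hypothetical stand-ins so the example runs without any HTTP library. It only shows the two ideas from this PR: check the status before reading `["data"]`, and retry on failure.

```python
import asyncio
from typing import Any, Awaitable, Callable


class HTTPStatusError(Exception):
    """Stand-in for the HTTP client's status error (hypothetical)."""


class FakeResponse:
    """Minimal stand-in for an HTTP response object (hypothetical)."""

    def __init__(self, status_code: int, payload: dict[str, Any]) -> None:
        self.status_code = status_code
        self.payload = payload

    def raise_for_status(self) -> None:
        # Raise on 4xx/5xx instead of letting callers index a missing key
        if self.status_code >= 400:
            raise HTTPStatusError(f"HTTP {self.status_code}: {self.payload}")

    def json(self) -> dict[str, Any]:
        return self.payload


async def embed_with_retry(
    send: Callable[[], Awaitable[FakeResponse]],
    max_retries: int = 3,
    base_delay: float = 0.0,
) -> list[Any]:
    """Call `send`, surface HTTP errors, and retry with exponential backoff."""
    for attempt in range(max_retries):
        try:
            response = await send()
            response.raise_for_status()  # fail loudly before touching the body
            return response.json()["data"]
        except HTTPStatusError:
            if attempt == max_retries - 1:
                raise  # out of attempts: propagate the HTTP error
            await asyncio.sleep(base_delay * 2**attempt)
    raise RuntimeError("max_retries must be at least 1")
```

Note that a 400 caused by an oversized batch will not succeed on retry; the benefit in that case is that the caller sees a clear HTTP error rather than a `KeyError`.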
- **Description:** Changing the key from `response` to
`structured_response` for middleware agent to keep it in sync with agent
without middleware. This is a breaking change.
- **Issue:** #33154
Porting the [planning
middleware](39c0138d0f/src/deepagents/middleware.py (L21))
over from deepagents.
Also adding the ability to configure:
* System prompt
* Tool description
```py
from langchain.agents.middleware.planning import PlanningMiddleware
from langchain.agents import create_agent
from langchain_core.messages import HumanMessage

agent = create_agent("openai:gpt-4o", middleware=[PlanningMiddleware()])
# `invoke` is synchronous; use `await agent.ainvoke(...)` from async code
result = agent.invoke({"messages": [HumanMessage("Help me refactor my codebase")]})
print(result["todos"])  # Array of todo items with status tracking
```
Multiple improvements to HITL flow:
* On a `response` type resume, we should still append the tool call to
the last AIMessage (otherwise we have a ToolResult without a
corresponding ToolCall)
* When all interrupts have `response` types (so there's no pending tool
calls), we should jump back to the first node (instead of end) as we
enforced in the previous `post_model_hook_router`
* Added comments to `model_to_tools` router to clarify all of the
potential exit conditions
Additionally:
* Lockfile update to use latest LG alpha release
* Added test for `jump_to` behaving ephemerally, this was fixed in LG
but surfaced as a bug w/ `jump_to`.
* Bump version to v1.0.0a10 to prep for alpha release
---------
Co-authored-by: Sydney Runkle <sydneymarierunkle@gmail.com>
Co-authored-by: Sydney Runkle <54324534+sydney-runkle@users.noreply.github.com>
Remove redundant/outdated `@pytest.mark.requires("jinja2")` decorator
Pytest marks (like `@pytest.mark.requires(...)`) applied directly to
fixtures have no effect and are deprecated.
Excluded pydantic_v1 module from import testing
Acceptable since `pydantic_v1` is explicitly deprecated. Testing its
importability at this stage serves little purpose since users should
migrate away from it.
## Summary
Adds test coverage for the `stringify_value` utility function to handle
complex nested data structures that weren't previously tested.
## Changes
- Added `test_stringify_value_nested_structures()` to `test_strings.py`
- Tests nested dictionaries within lists
- Tests mixed-type lists with various data types
- Verifies proper stringification of complex nested structures
## Why This Matters
- Fills a gap in test coverage for edge cases
- Ensures `stringify_value` handles complex data structures correctly
- Improves confidence in string utility functions used throughout the
codebase
- Low risk addition that strengthens existing test suite
## Testing
```bash
uv run --group test pytest libs/core/tests/unit_tests/utils/test_strings.py::test_stringify_value_nested_structures -v
```
This test addition follows the project's testing patterns and adds
meaningful coverage without introducing any breaking changes.
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Enhance the pull request workflows by updating the `pull_request_target`
types and ensuring safety by avoiding checkout of the PR's head. Update
the action to use a specific commit from the archived repository.
**Description:** Right now, we interrupt even if the provided ToolConfig
has all false values. We should ignore ToolConfigs which do not have at
least one value marked as true (just as we would if tool_name: False was
passed into the dict).
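
A small sketch of the filtering rule described above, under stated assumptions: the config keys (`allow_accept`, `allow_edit`, `allow_respond`) and the helper name `tools_requiring_interrupt` are hypothetical and only illustrate "skip configs with no true value".

```python
from typing import Mapping, Union

ToolConfig = Mapping[str, bool]


def tools_requiring_interrupt(
    tool_configs: Mapping[str, Union[bool, ToolConfig]],
) -> dict[str, Union[bool, ToolConfig]]:
    """Keep only tools whose config enables at least one action."""
    active: dict[str, Union[bool, ToolConfig]] = {}
    for tool_name, config in tool_configs.items():
        if isinstance(config, bool):
            # `tool_name: False` means "never interrupt for this tool"
            allowed = config
        else:
            # An all-false config is treated the same as `False`
            allowed = any(config.values())
        if allowed:
            active[tool_name] = config
    return active
```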
# Main Changes
1. Adding decorator utilities for dynamically defining middleware with
single hook functions (see an example below for dynamic system prompt)
2. Adding better conditional edge drawing with jump configuration
attached to middleware. Can be registered w/ the new decorator!
## Decorator Utilities
```py
from langchain.agents.middleware_agent import create_agent, AgentState, ModelRequest
from langchain.agents.middleware.types import modify_model_request
from langchain_core.messages import HumanMessage
from langgraph.checkpoint.memory import InMemorySaver


@modify_model_request
def modify_system_prompt(request: ModelRequest, state: AgentState) -> ModelRequest:
    request.system_prompt = (
        "You are a helpful assistant. "
        f"Please record the number of previous messages in your response: {len(state['messages'])}"
    )
    return request


agent = create_agent(
    model="openai:gpt-4o-mini",
    middleware=[modify_system_prompt]
).compile(checkpointer=InMemorySaver())
```
## Visualization and Routing improvements
We now require that middlewares define the valid jumps for each hook.
If using the new decorator syntax, this can be done with:
```py
@before_model(jump_to=["__end__"])
@after_model(jump_to=["tools", "__end__"])
```
If using the subclassing syntax, you can use these two class vars:
```py
class MyMiddleware(AgentMiddleware):
    before_model_jump_to = ["__end__"]
    after_model_jump_to = ["tools", "__end__"]
```
Open for debate if we want to bundle these in a single jump map / config
for a middleware. Easy to migrate later if we decide to add more hooks.
We will need to **really clearly document** that these must be
explicitly set in order to enable conditional edges.
Notice for the below case, `Middleware2` does actually enable jumps.
<table>
<thead>
<tr>
<th>Before (broken), adding conditional edges unconditionally</th>
<th>After (fixed), adding conditional edges sparingly</th>
</tr>
</thead>
<tbody>
<tr>
<td>
<img width="619" height="508" alt="Screenshot 2025-09-23 at 10 23 23 AM"
src="https://github.com/user-attachments/assets/bba2d098-a839-4335-8e8c-b50dd8090959"
/>
</td>
<td>
<img width="469" height="490" alt="Screenshot 2025-09-23 at 10 23 13 AM"
src="https://github.com/user-attachments/assets/717abf0b-fc73-4d5f-9313-b81247d8fe26"
/>
</td>
</tr>
</tbody>
</table>
<details>
<summary>Snippet for the above</summary>
```py
from typing import Any
from langchain.agents.tool_node import InjectedState
from langgraph.runtime import Runtime
from langchain.agents.middleware.types import AgentMiddleware, AgentState
from langchain.agents.middleware_agent import create_agent
from langchain_core.tools import tool
from typing import Annotated
from langchain_core.messages import HumanMessage
from typing_extensions import NotRequired


@tool
def simple_tool(input: str) -> str:
    """A simple tool."""
    return "successful tool call"


class Middleware1(AgentMiddleware):
    """Custom middleware that adds a simple tool."""

    tools = [simple_tool]

    def before_model(self, state: AgentState, runtime: Runtime) -> None:
        return None

    def after_model(self, state: AgentState, runtime: Runtime) -> None:
        return None


class Middleware2(AgentMiddleware):
    before_model_jump_to = ["tools", "__end__"]

    def before_model(self, state: AgentState, runtime: Runtime) -> None:
        return None

    def after_model(self, state: AgentState, runtime: Runtime) -> None:
        return None


class Middleware3(AgentMiddleware):
    def before_model(self, state: AgentState, runtime: Runtime) -> None:
        return None

    def after_model(self, state: AgentState, runtime: Runtime) -> None:
        return None


builder = create_agent(
    model="openai:gpt-4o-mini",
    middleware=[Middleware1(), Middleware2(), Middleware3()],
    system_prompt="You are a helpful assistant.",
)
agent = builder.compile()
```
</details>
## More Examples
### Guardrails `after_model`
<img width="379" height="335" alt="Screenshot 2025-09-23 at 10 40 09 AM"
src="https://github.com/user-attachments/assets/45bac7dd-398e-45d1-ae58-6ecfa27dfc87"
/>
<details>
<summary>Code</summary>
```py
from langchain.agents.middleware_agent import create_agent, AgentState, ModelRequest
from langchain.agents.middleware.types import after_model
from langchain_core.messages import HumanMessage, AIMessage
from langgraph.checkpoint.memory import InMemorySaver
from typing import cast, Any


@after_model(jump_to=["model", "__end__"])
def after_model_hook(state: AgentState) -> dict[str, Any]:
    """Check the last AI message for safety violations."""
    last_message_content = cast(AIMessage, state["messages"][-1]).content.lower()
    print(last_message_content)

    unsafe_keywords = ["pineapple"]
    if any(keyword in last_message_content for keyword in unsafe_keywords):
        # Jump back to model to regenerate response
        return {"jump_to": "model", "messages": [HumanMessage("Please regenerate your response, and don't talk about pineapples. You can talk about apples instead.")]}

    return {"jump_to": "__end__"}


# Create agent with guardrails middleware
agent = create_agent(
    model="openai:gpt-4o-mini",
    middleware=[after_model_hook],
    system_prompt="Keep your responses to one sentence please!"
).compile()

# Test with potentially unsafe input
result = agent.invoke(
    {"messages": [HumanMessage("Tell me something about pineapples")]},
)

for msg in result["messages"]:
    print(msg.pretty_print())
"""
================================ Human Message =================================
Tell me something about pineapples
None
================================== Ai Message ==================================
Pineapples are tropical fruits known for their sweet, tangy flavor and distinctive spiky exterior.
None
================================ Human Message =================================
Please regenerate your response, and don't talk about pineapples. You can talk about apples instead.
None
================================== Ai Message ==================================
Apples are popular fruits that come in various varieties, known for their crisp texture and sweetness, and are often used in cooking and baking.
None
"""
```
</details>
Mostly adding a descriptive frontmatter to workflow files. Also addresses
some formatting issues and outdated artifacts.
No functional changes outside of
[d5457c3](d5457c39ee),
[90708a0](90708a0d99),
and
[338c82d](338c82d21e)
The file-based and title-based labeler workflows were conflicting,
causing the bot to add and remove identical labels in the same
operation. Hopefully this fixes
- Removes Codespell from deps, docs, and `Makefile`s
- Python version requirements in all `pyproject.toml` files now use the
`~=` (compatible release) specifier
- All dependency groups and main dependencies now use explicit lower and
upper bounds, reducing potential for breaking changes
We want state schema as the input schema to middleware nodes because the
conditional edges after these nodes need access to the full state.
Also, we just generally want all state passed to middleware nodes, so we
should be specifying this explicitly. If we don't, the state annotations
used by users in their node signatures are used (so they might be
missing fields).
# Changes
## Adds support for `DynamicSystemPromptMiddleware`
```py
from langchain.agents.middleware import DynamicSystemPromptMiddleware
from langchain.agents.middleware.types import AgentState
from langgraph.runtime import Runtime
from typing_extensions import TypedDict


class Context(TypedDict):
    user_name: str


def system_prompt(state: AgentState, runtime: Runtime[Context]) -> str:
    user_name = runtime.context.get("user_name", "n/a")
    return f"You are a helpful assistant. Always address the user by their name: {user_name}"


middleware = DynamicSystemPromptMiddleware(system_prompt)
```
## Adds support for `runtime` in middleware hooks
```py
class AgentMiddleware(Generic[StateT, ContextT]):
    def modify_model_request(
        self,
        request: ModelRequest,
        state: StateT,
        runtime: Runtime[ContextT],  # Optional runtime parameter
    ) -> ModelRequest:
        # upgrade model if runtime.context.subscription is `top-tier` or whatever
        ...
```
## Adds support for omitting state attributes from input / output schemas
```py
from typing import Annotated, NotRequired
from langchain.agents.middleware.types import AgentState, PrivateStateAttr, OmitFromInput, OmitFromOutput


class CustomState(AgentState):
    # Private field - not in input or output schemas
    internal_counter: NotRequired[Annotated[int, PrivateStateAttr]]

    # Input-only field - not in output schema
    user_input: NotRequired[Annotated[str, OmitFromOutput]]

    # Output-only field - not in input schema
    computed_result: NotRequired[Annotated[str, OmitFromInput]]
```
## Additionally
* Removes filtering of state before passing into middleware hooks
Typing is not foolproof here, still need to figure out some of the
generics stuff w/ state and context schema extensions for middleware.
TODO:
* More docs for middleware, should hold off on this until other prios
like MCP and deepagents are met
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
## Summary
This PR fixes several bugs and improves the example code in
`BaseChatMessageHistory` docstring that would prevent it from working
correctly.
### Bugs Fixed
- **Critical bug**: Fixed `json.dump(messages, f)` →
`json.dump(serialized, f)` - was using wrong variable
- **NameError**: Fixed bare variable references to use
`self.storage_path` and `self.session_id`
- **Missing imports**: Added required imports (`json`, `os`, message
converters) to make example runnable
### Improvements
- Added missing type hints following project standards (`messages() ->
list[BaseMessage]`, `clear() -> None`)
- Added robust error handling with `FileNotFoundError` exception
handling
- Added directory creation with `os.makedirs(exist_ok=True)` to prevent
path errors
- Improved performance: `json.load(f)` instead of `json.loads(f.read())`
- Added explicit UTF-8 encoding to all file operations
- Updated stores.py to use modern union syntax (`int | None` vs
`Optional[int]`)
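
To make the fixes above concrete, here is a dependency-free sketch of the file-backed pattern the docstring example follows. It is not the docstring itself: `FileMessageStore` is a hypothetical class, and plain dicts stand in for serialized messages so the snippet runs without `langchain_core`.

```python
import json
import os


class FileMessageStore:
    """Persist a list of message dicts to one JSON file per session (sketch)."""

    def __init__(self, storage_path: str, session_id: str) -> None:
        self.storage_path = storage_path
        self.session_id = session_id

    @property
    def _file_path(self) -> str:
        return os.path.join(self.storage_path, f"{self.session_id}.json")

    def load(self) -> list[dict]:
        try:
            with open(self._file_path, encoding="utf-8") as f:
                return json.load(f)  # read straight from the file handle
        except FileNotFoundError:
            return []  # no history yet for this session

    def append(self, new_messages: list[dict]) -> None:
        serialized = self.load() + list(new_messages)
        os.makedirs(self.storage_path, exist_ok=True)  # avoid missing-directory errors
        with open(self._file_path, "w", encoding="utf-8") as f:
            json.dump(serialized, f)  # dump the serialized list, not the raw input

    def clear(self) -> None:
        os.makedirs(self.storage_path, exist_ok=True)
        with open(self._file_path, "w", encoding="utf-8") as f:
            json.dump([], f)
```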
### Test Plan
- [x] Code passes linting (`ruff check`)
- [x] Example code now has all required imports and proper syntax
- [x] Fixed variable references prevent runtime errors
- [x] Follows project's type annotation standards
The example code in the docstring is now fully functional and follows
LangChain's coding standards.
---------
Co-authored-by: sadiqkhzn <sadiqkhzn@users.noreply.github.com>
- **Description:** Updated the dead/unreachable links to Docling from
the additional resources section of the langchain-docling docs
- **Issue:** Fixes langchain-ai/docs/issues/574
- **Dependencies:** None
# Main changes / new features
## Better support for parallel tool calls
1. Support for multiple tool calls requiring human input
2. Support for combination of tool calls requiring human input + those
that are auto-approved
3. Support structured output w/ tool calls requiring human input
4. Support structured output w/ standard tool calls
## Shortcut for allowed actions
Adds a shortcut where tool config can be specified as a `bool`, meaning
"all actions allowed"
```py
HumanInTheLoopMiddleware(tool_configs={"expensive_tool": True})
```
## A few design decisions here
* We only raise one interrupt w/ all `HumanInterrupt`s; currently we
won't be able to execute all tools until all of these are resolved. This
isn't super blocking bc we can't re-invoke the model until all tools
have finished execution. That being said, if you have a long running
auto-approved tool, this could slow things down.
## TODOs
* Ideally, we would rename `accept` -> `approve`
* Ideally, we would rename `respond` -> `reject`
* Docs update (@sydney-runkle to own)
* In another PR I'd like to refactor testing to have one file for each
prebuilt middleware :)
Fast follow to https://github.com/langchain-ai/langchain/pull/32962
which was deemed as too breaking
Adds documentation for the integration langchain-scraperapi, which
contains 3 tools using the ScraperAPI service.
The tools give AI agents the ability to:
- Scrape the web and return HTML/text/markdown
- Perform Google search and return JSON output
- Perform Amazon search and return JSON output
For reference, here is the official repo for langchain_scraperapi:
https://github.com/scraperapi/langchain-scraperapi
Replaced the `input_message` parameter with an inline message tuple, e.g.
`{"messages": [("user", "What is my name?")]}`
Previously, memory did not work with the agent when the input used the
format of the `input_message` parameter.
Specifically, on page [Build an
Agent#adding-in-memory](https://python.langchain.com/docs/tutorials/agents/#adding-in-memory)
In the previous code, the query "What's my name?" wasn't working, as the
agent could not recall memory correctly.
<img width="860" height="679" alt="image"
src="https://github.com/user-attachments/assets/dfbca21e-ffe9-4645-a810-3be7a46d81d5"
/>
This PR improves navigation in the summarization how-to section by
adding cross-links from the single-call guide to the related map-reduce
and refine guides. This mirrors the docs style guide’s emphasis on clear
cross-references and should help readers discover the appropriate
pattern for longer texts.
- Source edited: docs/docs/how_to/summarize_stuff.ipynb
- Links added:
  - /docs/how_to/summarize_map_reduce/
  - /docs/how_to/summarize_refine/
Type: docs-only (no code changes)
Description:
Add a docstring to `_load_map_reduce_chain` in `chains/summarize/` to
explain the purpose of the `prompt` argument and document function
parameters. This addresses an existing TODO in the codebase.
Issue:
N/A (documentation improvement only)
Dependencies:
None
**Description:**
Add a docstring to `_load_stuff_chain` in `chains/summarize/` to explain
the purpose of the `prompt` argument and document function parameters.
This addresses an existing TODO in the codebase.
**Issue:**
N/A (documentation improvement only)
**Dependencies:**
None
Bumps [CodSpeedHQ/action](https://github.com/codspeedhq/action) from 3
to 4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/codspeedhq/action/releases">CodSpeedHQ/action's
releases</a>.</em></p>
<blockquote>
<h2>v4.0.0</h2>
<h2>💥 BREAKING</h2>
<p>It's now required to explicitly set the runner mode to
<code>instrumentation</code> or <code>walltime</code> using either:</p>
<ul>
<li>the <code>mode</code> argument</li>
<li>or the <code>CODSPEED_RUNNER_MODE</code> environment variable</li>
</ul>
<blockquote>
<p>[!TIP]
Before, this variable was automatically set to
<code>instrumentation</code> on every runner except for <a
href="https://codspeed.io/docs/instruments/walltime">CodSpeed macro
runners</a> where it was set to <code>walltime</code> by default.</p>
</blockquote>
<p>Find more details in <a
href="https://codspeed.io/docs/instruments">the instruments
documentation</a>.</p>
<h2>Details</h2>
<h3>🚀 Features</h3>
<ul>
<li>Make perf profiling enabled by default by <a
href="https://github.com/GuillaumeLagrange"><code>@GuillaumeLagrange</code></a>
in <a
href="https://redirect.github.com/CodSpeedHQ/runner/pull/110">#110</a></li>
<li>Make the runner mode argument required by <a
href="https://github.com/GuillaumeLagrange"><code>@GuillaumeLagrange</code></a></li>
<li>Use introspected node in walltime mode by <a
href="https://github.com/GuillaumeLagrange"><code>@GuillaumeLagrange</code></a>
in <a
href="https://redirect.github.com/CodSpeedHQ/runner/pull/108">#108</a></li>
<li>Add instrumented go shell script by <a
href="https://github.com/not-matthias"><code>@not-matthias</code></a>
in <a
href="https://redirect.github.com/CodSpeedHQ/runner/pull/102">#102</a></li>
</ul>
<h3>🐛 Bug Fixes</h3>
<ul>
<li>Compute proper load bias by <a
href="https://github.com/not-matthias"><code>@not-matthias</code></a>
in <a
href="https://redirect.github.com/CodSpeedHQ/runner/pull/107">#107</a></li>
<li>Increase timeout for first perf ping by <a
href="https://github.com/GuillaumeLagrange"><code>@GuillaumeLagrange</code></a></li>
<li>Prevent running with valgrind by <a
href="https://github.com/not-matthias"><code>@not-matthias</code></a>
in <a
href="https://redirect.github.com/CodSpeedHQ/runner/pull/106">#106</a></li>
</ul>
<h3>🏗️ Refactor</h3>
<ul>
<li>Change go-runner binary name by <a
href="https://github.com/not-matthias"><code>@not-matthias</code></a>
in <a
href="https://redirect.github.com/CodSpeedHQ/runner/pull/111">#111</a></li>
</ul>
<p><strong>Full Runner Changelog</strong>: <a
href="https://github.com/CodSpeedHQ/runner/blob/main/CHANGELOG.md">https://github.com/CodSpeedHQ/runner/blob/main/CHANGELOG.md</a></p>
<h2>v3.8.1</h2>
<h2>What's Changed</h2>
<h3>🐛 Bug Fixes</h3>
<ul>
<li>Don't show error when libpython is not found by <a
href="https://github.com/not-matthias"><code>@not-matthias</code></a></li>
</ul>
<h3>🏗️ Refactor</h3>
<ul>
<li>Improve conditional compilation in
<code>get_pipe_open_options</code> by <a
href="https://github.com/art049"><code>@art049</code></a> in <a
href="https://redirect.github.com/CodSpeedHQ/runner/pull/100">#100</a></li>
</ul>
<h3>⚙️ Internals</h3>
<ul>
<li>Change log level to warn for venv_compat error by <a
href="https://github.com/not-matthias"><code>@not-matthias</code></a>
in <a
href="https://redirect.github.com/CodSpeedHQ/runner/pull/104">#104</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/CodSpeedHQ/action/compare/v3.8.0...v3.8.1">https://github.com/CodSpeedHQ/action/compare/v3.8.0...v3.8.1</a>
<strong>Full Runner Changelog</strong>: <a
href="https://github.com/CodSpeedHQ/runner/blob/main/CHANGELOG.md">https://github.com/CodSpeedHQ/runner/blob/main/CHANGELOG.md</a></p>
<h2>v3.8.0</h2>
<h2>What's Changed</h2>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="653fdc30e6"><code>653fdc3</code></a>
Release v4.0.1 🚀</li>
<li><a
href="4da7be1bda"><code>4da7be1</code></a>
chore: bump runner version to 4.0.1</li>
<li><a
href="172d6c5630"><code>172d6c5</code></a>
chore: make the comment about input validation more discrete</li>
<li><a
href="d15e1ce813"><code>d15e1ce</code></a>
chore: improve the release script</li>
<li><a
href="6eeb021fd0"><code>6eeb021</code></a>
Release v4.0.0 🚀</li>
<li><a
href="74312dabbe"><code>74312da</code></a>
chore: improve the release script</li>
<li><a
href="8a17a350a8"><code>8a17a35</code></a>
ci: add modes to the matrix</li>
<li><a
href="8e3f02a649"><code>8e3f02a</code></a>
feat: make the mode argument required</li>
<li><a
href="97c7a6f5fc"><code>97c7a6f</code></a>
chore: bump runner version to 4.0.0</li>
<li><a
href="8a4cadd026"><code>8a4cadd</code></a>
chore: point the changelog to the runner</li>
<li>See full diff in <a
href="https://github.com/codspeedhq/action/compare/v3...v4">compare
view</a></li>
</ul>
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
## Description
This PR adds documentation for the new ZeusDB vector store integration
with LangChain.
## Motivation
ZeusDB is a high-performance vector database (Python/Rust backend)
designed for AI applications that need fast similarity search and
real-time vector ops. This integration brings ZeusDB's capabilities to
the LangChain ecosystem, giving developers another production-oriented
option for vector storage and retrieval.
**Key Features:**
- **User-Friendly Python API**: Intuitive interface that integrates
seamlessly with Python ML workflows
- **High Performance**: Powered by a robust Rust backend for
lightning-fast vector operations
- **Enterprise Logging**: Comprehensive logging capabilities for
monitoring and debugging production systems
- **Advanced Features**: Includes product quantization and persistence
capabilities
- **AI-Optimized**: Purpose-built for modern AI applications and RAG
pipelines
## Changes
- Added provider documentation:
`docs/docs/integrations/providers/zeusdb.mdx` (installation, setup).
- Added vector store documentation:
`docs/docs/integrations/vectorstores/zeusdb.ipynb` (quickstart for
creating/querying a ZeusDBVectorStore).
- Registered langchain-zeusdb in `libs/packages.yml` for discovery.
## Target users
- AI/ML engineers building RAG pipelines
- Data scientists working with large document collections
- Developers needing high-throughput vector search
- Teams requiring near real-time vector operations
## Testing
- Followed LangChain's "How to add standard tests to an integration"
guidance.
- Code passes format, lint, and test checks locally.
- Tested with LangChain Core 0.3.74
- Works with Python 3.10 to 3.13
## Package Information
**PyPI:** https://pypi.org/project/langchain-zeusdb
**Github:** https://github.com/ZeusDB/langchain-zeusdb
## Summary
- Add comprehensive type hints to the MyInMemoryStore example code in
BaseStore docstring
- Improve documentation quality and educational value for developers
- Align with LangChain's coding standards requiring type hints on all
Python code
## Changes Made
- Added return type annotations to all methods (__init__, mget, mset,
mdelete, yield_keys)
- Added parameter type annotations using proper generic types (Sequence,
Iterator)
- Added instance variable type annotation for the store attribute
- Used modern Python union syntax (str | None) for optional types
## Test Plan
- Verified Python syntax validity with ast.parse()
- No functional changes to actual code, only documentation improvements
- Example code now follows best practices and coding standards
This change improves the educational value of the example code and
ensures consistency with LangChain's requirement that "All Python code
MUST include type hints and return types" as specified in the
development guidelines.
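
For readers who want to see what the annotations look like, here is a standalone sketch in the spirit of the `MyInMemoryStore` example. It deliberately does not subclass `BaseStore` so it runs without `langchain_core`; treat the exact method bodies as an assumption, since only the method names and annotation style come from this PR.

```python
from __future__ import annotations

from collections.abc import Iterator, Sequence


class MyInMemoryStore:
    """Dict-backed key-value store with full type hints (illustrative only)."""

    def __init__(self) -> None:
        self.store: dict[str, int] = {}

    def mget(self, keys: Sequence[str]) -> list[int | None]:
        # Missing keys come back as None, preserving input order
        return [self.store.get(key) for key in keys]

    def mset(self, key_value_pairs: Sequence[tuple[str, int]]) -> None:
        for key, value in key_value_pairs:
            self.store[key] = value

    def mdelete(self, keys: Sequence[str]) -> None:
        for key in keys:
            self.store.pop(key, None)  # ignore keys that are not present

    def yield_keys(self, prefix: str | None = None) -> Iterator[str]:
        for key in self.store:
            if prefix is None or key.startswith(prefix):
                yield key
```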
---------
Co-authored-by: sadiqkhzn <sadiqkhzn@users.noreply.github.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
**Description:**
Introduces documentation notebooks for AI/ML API integration covering
the following use cases:
- Chat models (`ChatAimlapi`)
- Text completion models (`AimlapiLLM`)
- Provider usage examples
- Text embedding models (`AimlapiEmbeddings`)
Additionally, adds the `langchain-aimlapi` package entry to
`libs/packages.yml` for package management.
This PR aims to provide a comprehensive starting point for developers
integrating AI/ML API models with LangChain via the new
`langchain-aimlapi` package.
**Issue:** N/A
**Dependencies:** None
**Twitter handle:** @aimlapi
---
### **To-Do Before Submitting PR:**
* [x] Run `make format`
* [x] Run `make lint`
* [x] Confirm all documentation notebooks are in
`docs/docs/integrations/`
* [x] Double-check `libs/packages.yml` has the correct repo path
* [x] Confirm no `pyproject.toml` modifications were made unnecessarily
Co-authored-by: Mason Daugherty <mason@langchain.dev>
**Description:**
This PR updates the free searches per month from **100** to **250** and
renames SerpAPI to [SerpApi](https://serpapi.com/) to prevent confusion.
Also adds API key import and enhances the usage instructions in the
Jupyter notebook.
**Issue:** N/A
**Dependencies:** N/A
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. **We will not consider
a PR unless these three are passing in CI.** See [contribution
guidelines](https://python.langchain.com/docs/contributing/) for more.
**Description:**
This PR updates links to the latest Anthropic documentation. Changes
include revised links for model overview, tool usage, web search tool,
text editor tool, and more.
**Issue:**
N/A
**Dependencies:**
None
**Twitter handle:**
N/A
- **Description:** The `langchain-yugabytedb` package provides
implementations of core LangChain abstractions using the `YugabyteDB`
Distributed SQL Database.
YugabyteDB is a cloud-native distributed PostgreSQL-compatible database
that combines strong consistency with ultra-resilience, seamless
scalability, geo-distribution, and highly flexible data locality to
deliver business-critical, transactional applications.
[YugabyteDB](https://www.yugabyte.com/ai/) combines the power of the
`pgvector` PostgreSQL extension with an inherently distributed
architecture. This future-proofed foundation helps you build GenAI
applications using RAG retrieval that demands high-performance vector
search.
- [ ] **tests and docs**:
1. `langchain-yugabytedb`
[github](https://github.com/yugabyte/langchain-yugabytedb) repo.
2. YugabyteDB VectorStore example notebook showing its use. It lives in
`langchain/docs/docs/integrations/vectorstores/yugabytedb.ipynb`
directory.
3. Running `langchain-yugabytedb` unit tests
- Setting up a Development Environment
This document details how to set up a local development environment that
will allow you to contribute changes to the project.
Acquire sources and create virtualenv.
```shell
git clone https://github.com/yugabyte/langchain-yugabytedb
cd langchain-yugabytedb
uv venv --python=3.13
source .venv/bin/activate
```
Install package in editable mode.
```shell
uv pip install pipx
pipx install poetry
poetry install
uv pip install pytest pytest_asyncio pytest-timeout langchain-core langchain_tests sqlalchemy psycopg psycopg-binary numpy pgvector
```
Start YugabyteDB RF-1 Universe.
```shell
docker run -d --name yugabyte_node01 --hostname yugabyte01 \
-p 7000:7000 -p 9000:9000 -p 15433:15433 -p 5433:5433 -p 9042:9042 \
yugabytedb/yugabyte:2.25.2.0-b359 bin/yugabyted start --background=false \
--master_flags="allowed_preview_flags_csv=ysql_yb_enable_advisory_locks,ysql_yb_enable_advisory_locks=true" \
--tserver_flags="allowed_preview_flags_csv=ysql_yb_enable_advisory_locks,ysql_yb_enable_advisory_locks=true"
docker exec -it yugabyte_node01 bin/ysqlsh -h yugabyte01 -c "CREATE extension vector;"
```
Invoke test cases.
```shell
pytest -vvv tests/unit_tests/yugabytedb_tests
```
Thank you for contributing to LangChain! Follow these steps to mark your
pull request as ready for review. **If any of these steps are not
completed, your PR will not be considered for review.**
- [x] **feat(docs)**: add Bigtable Key-value store doc
- [X] **feat(docs)**: add Bigtable Vector store doc
This PR adds a doc for Bigtable and LangChain Key-value store
integration. It contains guides on how to add, delete, get, and yield
key-value pairs from Bigtable Key-value Store for LangChain.
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
# feat(integrations): Add Timbr tools integration
## DESCRIPTION
This PR adds comprehensive documentation and integration support for
Timbr's semantic layer tools in LangChain.
[Timbr](https://timbr.ai/) provides an ontology-driven semantic layer
that enables natural language querying of databases through
business-friendly concepts. It connects raw data to governed business
measures for consistent access across BI, APIs, and AI applications.
[`langchain-timbr`](https://pypi.org/project/langchain-timbr/) is a
Python SDK that extends
[LangChain](https://github.com/WPSemantix/Timbr-GenAI/tree/main/LangChain)
and
[LangGraph](https://github.com/WPSemantix/Timbr-GenAI/tree/main/LangGraph)
with custom agents, chains, and nodes for seamless integration with the
Timbr semantic layer. It enables converting natural language prompts
into optimized semantic-SQL queries and executing them directly against
your data.
**What's Added:**
- Complete integration documentation for `langchain-timbr` package
- Tool documentation page with usage examples and API reference
**Integration Components:**
- `IdentifyTimbrConceptChain` - Identify relevant concepts from user
prompts
- `GenerateTimbrSqlChain` - Generate SQL queries from natural language
- `ValidateTimbrSqlChain` - Validate queries against knowledge graph
schemas
- `ExecuteTimbrQueryChain` - Execute queries against semantic databases
- `GenerateAnswerChain` - Generate human-readable answers from results
## Documentation Added
- `/docs/integrations/providers/timbr.mdx` - Provider overview and
configuration
- `/docs/integrations/tools/timbr.ipynb` - Comprehensive tool usage
examples
## Links
- [PyPI Package](https://pypi.org/project/langchain-timbr/)
- [GitHub Repository](https://github.com/WPSemantix/langchain-timbr)
- [Official
Documentation](https://docs.timbr.ai/doc/docs/integration/langchain-sdk/)
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
**Description:**
Add documentation for Qwen integration in LangChain, including setup
instructions, usage examples, and configuration details. Update related
qwq documentation to reflect current best practices and improve clarity
for users.
This PR enhances the documentation ecosystem by:
- Adding a new guide for integrating Qwen models
- Updating outdated or incomplete qwq documentation
- Improving structure and readability of relevant sections
**Issue:** N/A
**Dependencies:** None
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
**Description:** Adds documentation for ZenRows integration with
LangChain, including provider overview and detailed tool documentation.
ZenRows is an enterprise-grade web scraping solution that enables
LangChain agents to extract web content at scale with advanced features
like JavaScript rendering, anti-bot bypass, geo-targeting, and multiple
output formats.
This PR includes:
- Provider documentation
(`docs/docs/integrations/providers/zenrows.ipynb`)
- Tool documentation
(`docs/docs/integrations/tools/zenrows_universal_scraper.ipynb`)
- Complete usage examples and API reference links
**Issue:** N/A
**Dependencies:**
- [langchain-zenrows](https://github.com/ZenRows-Hub/langchain-zenrows)
package (external, available on
[PyPI](https://pypi.org/project/langchain-zenrows/))
- No changes to core LangChain dependencies
**LinkedIn handle:** https://www.linkedin.com/company/zenrows/
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Adding Oracle Generative AI as one of the providers for LangChain.
Updated the old examples in the documentation with the new working
examples.
---------
Co-authored-by: Vishal Karwande <vishalkarwande@Vishals-MacBook-Pro.local>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
**Description:** Fixes a small typo in `_get_document_with_hash` inside
`libs/core/langchain_core/indexing/api.py`.
**Issue:** N/A (no related issue)
**Dependencies:** None
Especially helpful for the text splitters tests where we're installing
pytorch (expensive and slow slow slow). Should speed up CI by 5-10 mins.
w/o caches, CI taking 20 minutes 😨
w/ caches, CI taking 3 minutes
Taking advantage of [partial
runs](https://codspeed.io/docs/features/partial-runs)!
This should save us minutes on every CI job: we only run CodSpeed for
libs w/ changes, and this doesn't affect benchmarking drops.
Oversight when moving back to basic function call for
`modify_model_request` rather than implementation as its own node.
Basic test right now failing on main, passing on this branch
Revealed a gap in testing. Will write up a more robust test suite for
basic middleware features.
### Description
* Replace the Mermaid graph node label escaping logic
(`_escape_node_label`) with `_to_safe_id`, which converts a string into
a unique, Mermaid-compatible node id. Ensures nodes with special
characters always render correctly.
**Before**
* Invalid characters (e.g. `开`) replaced with `_`. Causes collisions
between nodes with names that are the same length and contain all
non-safe characters:
```python
_escape_node_label("开") # '_'
_escape_node_label("始") # '_' same as above, but different character passed in. not a unique mapping.
```
**After**
```python
_to_safe_id("开") # \5f00
_to_safe_id("始") # \59cb unique!
```
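A minimal sketch of the mapping described above, assuming the safe characters are ASCII letters, digits, `_`, and `-`, and that every other character becomes a backslash followed by its lowercase hex code point (consistent with the `\5f00` output shown). The name `to_safe_id_sketch` is hypothetical, and the real `_to_safe_id` may differ in detail.

```python
import string

# Assumption: characters treated as safe inside a Mermaid node id.
_SAFE_CHARS = set(string.ascii_letters + string.digits + "_-")


def to_safe_id_sketch(label: str) -> str:
    """Keep safe characters; map every other one to backslash + hex code point."""
    return "".join(
        ch if ch in _SAFE_CHARS else "\\" + format(ord(ch), "x") for ch in label
    )


# The two labels that previously collapsed to "_" now map to different ids.
print(to_safe_id_sketch("开"))
print(to_safe_id_sketch("始"))
```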
### Tests
* Rename `test_graph_mermaid_escape_node_label()` to
`test_graph_mermaid_to_safe_id()` and update function logic to use
`_to_safe_id`
* Add `test_graph_mermaid_special_chars()`
### Issue
Fixes langchain-ai/langgraph#6036
Reusable workflows are not currently supported by PyPI's Trusted
Publishing functionality, and are subject to breakage. Users are
strongly encouraged to avoid using reusable workflows for Trusted
Publishing until support becomes official. Please, do not report bugs if
this breaks.
Description: Fixes a bug in RunnableRetry where .batch / .abatch could
return misordered outputs (e.g. inputs [0,1,2] yielding [1,1,2]) when
some items succeeded on an earlier attempt and others were retried. Root
cause: successful results were stored keyed by the index within the
shrinking “pending” subset rather than the original input index, causing
collisions and reordered/duplicated outputs after retries. Fix updates
_batch and _abatch to:
- Track remaining original indices explicitly.
- Call underlying batch/abatch only on remaining inputs.
- Map results back to original indices.
- Preserve final ordering by reconstructing outputs in original
positional order.
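
The index bookkeeping listed above can be sketched roughly as follows. This is an illustration only, not the `RunnableRetry` code: `batch_with_retry` and `flaky_batch` are hypothetical names, and failures are modeled as returned `Exception` objects for simplicity.

```python
from typing import Callable, Sequence, Union

Result = Union[int, Exception]


def batch_with_retry(
    inputs: Sequence[int],
    batch_fn: Callable[[list[int]], list[Result]],
    max_attempts: int = 3,
) -> list[Result]:
    """Retry failed items while keeping outputs aligned with the inputs."""
    results: dict[int, Result] = {}
    remaining = list(range(len(inputs)))  # original indices still pending
    for _ in range(max_attempts):
        if not remaining:
            break
        # Call the underlying batch only on the inputs that are still pending.
        outputs = batch_fn([inputs[i] for i in remaining])
        still_failing = []
        for original_idx, output in zip(remaining, outputs):
            results[original_idx] = output  # keyed by the ORIGINAL index
            if isinstance(output, Exception):
                still_failing.append(original_idx)
        remaining = still_failing
    # Rebuild the outputs in original positional order.
    return [results[i] for i in range(len(inputs))]


calls = {"count": 0}


def flaky_batch(items: list[int]) -> list[Result]:
    """Fail input 0 on the first call only; echo every other item."""
    calls["count"] += 1
    return [
        ValueError("transient") if item == 0 and calls["count"] == 1 else item
        for item in items
    ]


print(batch_with_retry([0, 1, 2], flaky_batch))
```

Keying `results` by position within the shrinking pending list instead of by `original_idx` is what would reproduce the collision described in the root cause.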
Issue: Fixes #21326
Tests:
- Added regression tests: test_retry_batch_preserves_order and
test_async_retry_batch_preserves_order asserting correct ordering after
a single controlled failure + retry.
- Existing retry tests still pass.
Dependencies:
- None added or changed.
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
This removes langchain-experimental from api reference.
We do not recommend it to users for production use cases, so let's also
deprecate it from documentation
**Description:** Fixes infinite recursion issue in JSON schema
dereferencing when objects contain both $ref and other properties (e.g.,
nullable, description, additionalProperties). This was causing Apollo
MCP server schemas to hang indefinitely during tool binding.
**Problem:**
- Commit fb5da8384 changed the condition from `set(obj.keys()) ==
{"$ref"}` to `"$ref" in set(obj.keys())`
- This caused objects with $ref + other properties to be treated as pure
$ref nodes
- Result: other properties were lost and infinite recursion occurred
with complex schemas
**Solution:**
- Restore pure $ref detection for objects with only $ref key
- Add proper handling for mixed $ref objects that preserves all
properties
- Merge resolved reference content with other properties
- Maintain cycle detection to prevent infinite recursion
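
A rough sketch of the approach in the list above, not the code from this PR. It assumes only local `#/...` references, and the names `resolve_pointer` and `dereference` are hypothetical. Pure `$ref` nodes are replaced by their target, mixed nodes merge the target with their sibling keys, and a set of in-progress references stops cycles.

```python
from typing import Any


def resolve_pointer(root: dict, ref: str) -> Any:
    """Resolve a local reference such as '#/$defs/Node' (assumed form)."""
    if not ref.startswith("#/"):
        raise ValueError(f"only local '#/...' references are handled: {ref}")
    node: Any = root
    for part in ref[2:].split("/"):
        node = node[part]
    return node


def dereference(obj: Any, root: dict, seen: frozenset = frozenset()) -> Any:
    """Inline $ref targets, keeping sibling keys and stopping at cycles."""
    if isinstance(obj, list):
        return [dereference(item, root, seen) for item in obj]
    if not isinstance(obj, dict):
        return obj
    ref = obj.get("$ref")
    if not isinstance(ref, str):
        return {k: dereference(v, root, seen) for k, v in obj.items()}
    if ref in seen:
        return dict(obj)  # cycle detected: leave this reference unresolved
    resolved = dereference(resolve_pointer(root, ref), root, seen | {ref})
    if set(obj) == {"$ref"}:
        return resolved  # pure $ref node
    # Mixed node: merge the resolved target with the other properties.
    siblings = {
        k: dereference(v, root, seen) for k, v in obj.items() if k != "$ref"
    }
    return {**resolved, **siblings} if isinstance(resolved, dict) else siblings


schema = {
    "$defs": {
        "Node": {
            "type": "object",
            "properties": {"child": {"$ref": "#/$defs/Node", "nullable": True}},
        }
    },
    "properties": {"root": {"$ref": "#/$defs/Node", "description": "tree root"}},
}
result = dereference(schema, schema)
print(result["properties"]["root"])
```

In this sketch, sibling keys win over keys from the resolved target when both define the same name; that precedence is an assumption, not something the PR description states.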
**Impact:**
- Fixes Apollo MCP server schema integration
- Resolves tool binding infinite recursion with complex GraphQL schemas
- Preserves backward compatibility with existing functionality
- No performance impact - actually improves handling of complex schemas
**Issue:** Fixes #32511
**Dependencies:** None
**Testing:**
- Added comprehensive unit tests covering mixed $ref scenarios
- All existing tests pass (1326 passed, 0 failed)
- Tested with realistic Apollo GraphQL schemas
- Stress tested with 100 iterations of complex schemas
**Verification:**
- ✅ `make format` - All files properly formatted
- ✅ `make lint` - All linting checks pass
- ✅ `make test` - All 1326 unit tests pass
- ✅ No breaking changes - full backwards compatibility maintained
---------
Co-authored-by: Marcus <marcus@Marcus-M4-MAX.local>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
On Friday, October 10th, the moonshotai/kimi-k2-instruct model will be
decommissioned in favor of the latest version,
moonshotai/kimi-k2-instruct-0905.
Until then, requests to moonshotai/kimi-k2-instruct will automatically
be routed to moonshotai/kimi-k2-instruct-0905.
# Description
This PR fixes a bug in _recursive_set_additional_properties_false used
in function_calling.convert_to_openai_function.
Previously, schemas with "additionalProperties=True" were not correctly
overridden when strict validation was expected, which could lead to
invalid OpenAI function schemas.
The updated implementation ensures that:
- Any schema with "additionalProperties" already set will now be forced
to False under strict mode.
- Recursive traversal of properties, items, and anyOf is preserved.
- Function signature remains unchanged for backward compatibility.
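
A minimal sketch of the behavior described above, not the actual `_recursive_set_additional_properties_false`. The helper name is hypothetical, and the rule for when to set the key (object-typed schemas, or any schema that already carries `additionalProperties`) is an assumption made for illustration.

```python
from typing import Any


def force_additional_properties_false(schema: Any) -> Any:
    """Recursively set additionalProperties=False, overriding existing values."""
    if isinstance(schema, list):
        for sub in schema:
            force_additional_properties_false(sub)
        return schema
    if not isinstance(schema, dict):
        return schema
    # Assumption: apply to object schemas and to any schema that already
    # sets the key (including additionalProperties=True).
    if schema.get("type") == "object" or "additionalProperties" in schema:
        schema["additionalProperties"] = False
    # Traverse the three nesting keywords named in the description.
    for sub in schema.get("anyOf", []):
        force_additional_properties_false(sub)
    for sub in schema.get("properties", {}).values():
        force_additional_properties_false(sub)
    if "items" in schema:
        force_additional_properties_false(schema["items"])
    return schema


tool_schema = {
    "type": "object",
    "properties": {
        # The shape the description attributes to Pydantic for arbitrary dicts.
        "meta": {"type": "object", "additionalProperties": True},
        "tags": {"type": "array", "items": {"additionalProperties": True}},
    },
    "required": ["meta"],
}
print(force_additional_properties_false(tool_schema))
```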
# Issue
When using tool calling in OpenAI structured output strict mode
(strict=True), a 400 error, "Invalid schema for response_format XXXXX
'additionalProperties' is required to be supplied and to be false", is
raised for parameters that contain a dict type. OpenAI requires
additionalProperties to be set to False.
Several earlier PRs tried to resolve the issue.
- PR #25169 introduced _recursive_set_additional_properties_false to
recursively set additionalProperties=False.
- PR #26287 fixed handling of empty parameter tools for OpenAI function
generation.
- PR #30971 added support for Union type arguments in strict mode of
OpenAI function calling / structured output.
Despite these improvements, Pydantic 2.11 and later always add
`additionalProperties: True` for arbitrary dictionary schemas (`dict` or
`Any`) (https://pydantic.dev/articles/pydantic-v2-11-release#changes).
Schemas that already had additionalProperties=True in such cases were
not being overridden; this PR addresses that so strict mode behaves
correctly in all cases.
# Dependencies
No Changes
---------
Co-authored-by: Zhong, Yu <yzhong@freewheel.com>
This PR adds a new cookbook demonstrating how to build a RAG pipeline
with LangChain and track + evaluate it using MLflow.
There is currently not much documentation on the LangChain MLflow
integration; hopefully this helps folks trying to monitor and evaluate
their LangChain applications.
- ArXiv document loader
- In Memory vector store
- LCEL rag pipeline
- MLflow tracing
- MLflow evaluation
Issue:
N/A
Dependencies:
N/A
**Description:**
Updates the Confident AI integration documentation to use modern
patterns and improve code quality. This change:
- Replaces deprecated `DeepEvalCallbackHandler` with the new
`CallbackHandler` from `deepeval.integrations.langchain`
- Updates installation and authentication instructions to match current
best practices
- Adds modern integration examples using LangChain's latest patterns
- Removes deprecated metrics and outdated code examples
- Updates code samples to follow current best practices
The changes make the documentation more maintainable and ensure users
follow the recommended integration patterns.
**Issue:** Fixes #32444
**Dependencies:**
- deepeval
- langchain
- langchain-openai
**Twitter handle:** @Muwinuddin
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Description:
Added "Method Two: Quick Setup (Linux)" section to prerequisites,
providing a curl-based installation method for deploying JaguarDB
without Docker. Retained original Docker setup instructions for
flexibility.
- **Description:** Aerospike Vector Store has been retired. It is no
longer supported, so it should no longer be documented on the LangChain
site.
- **Add tests and docs**: Removes docs for retired Aerospike vector
store.
- **Lint and test**: NA
Added a short section to the Weaviate integration docs showing how to
connect to an existing collection (reuse an index) with
`WeaviateVectorStore`. This helps clarify required parameters
(`index_name`, `text_key`) when loading a pre-existing store, which was
previously missing.
### Issue
Fixes langchain-ai/langchain-weaviate#197
### Dependencies
None
- **Description:** Fix the import path for `WatsonxToolkit` in examples
after the release of `langchain-ibm==0.3.17`.
### Description
This PR is primarily aimed at updating some usage methods in the
`modelscope.mdx` file.
Specifically, it changes from `ModelScopeLLM` to `ModelScopeEndpoint`.
### Relevant PR
The relevant PR link is:
https://github.com/langchain-ai/langchain/pull/28941
**Description:**
Raise a more descriptive OutputParserException when JSON parsing results
in a non-dict type. This improves debugging and aligns behavior with
expectations when using expected_keys.
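
As a rough illustration of the check described above (not the parser's actual code), the sketch below uses only the standard library. `OutputParserError` is a hypothetical stand-in for LangChain's `OutputParserException`, and `parse_json_object` is a made-up helper name.

```python
import json


class OutputParserError(ValueError):
    """Hypothetical stand-in for LangChain's OutputParserException."""


def parse_json_object(text: str) -> dict:
    """Parse JSON and fail with a descriptive error if the result is not a dict."""
    parsed = json.loads(text)
    if not isinstance(parsed, dict):
        raise OutputParserError(
            f"Expected a JSON object (dict), but got "
            f"{type(parsed).__name__}: {parsed!r}"
        )
    return parsed


print(parse_json_object('{"answer": 42}'))
try:
    parse_json_object("[1, 2, 3]")
except OutputParserError as err:
    print(err)
```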
**Issue:**
Fixes #32233
**Twitter handle:**
@yashvtobre
**Testing:**
- Ran make format and make lint from the root directory; both passed
cleanly.
- Attempted make test but no such target exists in the root Makefile.
- Executed tests directly via pytest targeting the relevant test file,
confirming all tests pass except for unrelated async test failures
outside the scope of this change.
**Notes:**
- No additional dependencies introduced.
- Changes are backward compatible and isolated within the output parser
module.
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
- **Description:** Currently,
`langchain_core.runnables.graph_mermaid.py` is hardcoded to use
mermaid.ink to render graph diagrams. It would be nice to allow users to
specify a custom URL, e.g. for self-hosted instances of the Mermaid
server.
- **Issue:** [Langchain Forum: allow custom mermaid API
URL](https://forum.langchain.com/t/feature-request-allow-custom-mermaid-api-url/1472)
- **Dependencies:** None
- [X] **Add tests and docs**: Added unit tests using mock requests.
- [X] **Lint and test**: Run `make format`, `make lint` and `make test`.
Minimal example using the feature:
```python
import operator
from pathlib import Path
from typing import Any, Annotated, TypedDict

from langgraph.graph import StateGraph


class State(TypedDict):
    messages: Annotated[list[dict[str, Any]], operator.add]


def hello_node(state: State) -> State:
    return {"messages": [{"role": "assistant", "content": "pong!"}]}


builder = StateGraph(State)
builder.add_node("hello_node", hello_node)
builder.add_edge("__start__", "hello_node")
builder.add_edge("hello_node", "__end__")
graph = builder.compile()

# Run graph
output = graph.invoke({"messages": [{"role": "user", "content": "ping?"}]})

# Draw graph using a custom Mermaid server URL
Path("graph.png").write_bytes(
    graph.get_graph().draw_mermaid_png(base_url="https://custom-mermaid.ink")
)
```
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
- Beta isn't needed for search result tests anymore
- Add TODO for other tests to come back when generally available
- Regenerate remote MCP snapshot after some testing (now the same, but
fresher)
- Bump deps
This pull request introduces a failing unit test to reproduce the bug
reported in issue #32028.
The test asserts the expected behavior: `BaseCallbackManager.merge()`
should combine `handlers` and `inheritable_handlers` independently,
without mixing them. This test will fail on the current codebase and is
intended to guide the fix and prevent future regressions.
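
The expected behavior can be sketched with a toy class. This is an assumption-laden illustration, not `BaseCallbackManager` itself: `ManagerSketch` is hypothetical, handlers are plain strings, and merging is modeled as simple concatenation.

```python
from dataclasses import dataclass, field


@dataclass
class ManagerSketch:
    """Toy stand-in for a callback manager with two handler lists."""

    handlers: list = field(default_factory=list)
    inheritable_handlers: list = field(default_factory=list)

    def merge(self, other: "ManagerSketch") -> "ManagerSketch":
        # Each list is combined only with its counterpart, never with the other.
        return ManagerSketch(
            handlers=self.handlers + other.handlers,
            inheritable_handlers=(
                self.inheritable_handlers + other.inheritable_handlers
            ),
        )


left = ManagerSketch(handlers=["h1"], inheritable_handlers=["ih1"])
right = ManagerSketch(handlers=["h2"], inheritable_handlers=[])
merged = left.merge(right)
print(merged)
```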
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
The Ollama chat model adapter does not support all of the possible
message content formats. That leads to Ollama model adapter crashing on
some messages from different models (e.g. Gemini 2.5 Flash).
These changes should fix one known scenario - when `content` is a list
containing a string.
This allows using PEP 604 union syntax in `ToolNode` error handler annotations:
```python
def error_handler(e: ValueError | ToolException) -> str:
    return "error"
ToolNode(my_tool, handle_tool_errors=error_handler).invoke(...)
```
Without this change, this fails with `AttributeError: 'types.UnionType'
object has no attribute '__mro__'`
This is better than using a subclass as returning a `property` works
with `ClassWithBetaMethods.beta_property.__doc__`
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Added an id field to the Document passed to filter for
InMemoryVectorStore similarity search. This allows filtering by Document
id and brings the input to the filter in line with the result returned
by the vector similarity search.
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
- stars badge redundant (look at the top of the page)
- remove version badge since we have many pkgs (and it was only showing
core) -- also, just look at the releases tab to the right of the readme
- **Description:** The vectorstore standard-test mistakenly assumes that
the store's `get_by_ids` respects the order of the provided `ids`. This
is not the case (as the base class docstring states). This PR fixes
those tests that would fail otherwise (see issue #32820 for details,
repro and all). Fixes #32820
- **Issue:** Fixes #32820
- **Dependencies:** none
Co-authored-by: Stefano Lottini <stefano.lottini@ibm.com>
## Overview
Adding new `AgentMiddleware` primitive that supports `before_model`,
`after_model`, and `prepare_model_request` hooks.
This is very exciting! It makes our `create_agent` prebuilt much more
extensible + capable. Still in alpha and subject to change.
This is different than the initial
[implementation](https://github.com/langchain-ai/langgraph/tree/nc/25aug/agent)
in that it:
* Fills in gaps w/ missing features, for ex -- new structured output,
optionality of tools + system prompt, sync and async model requests,
provider builtin tools
* Exposes private state extensions for middleware, enabling things like
model call tracking, etc
* Middleware can register tools
* Uses a `TypedDict` for `AgentState` -- dataclass subclassing is tricky
w/ required values + required decorators
* Addition of `model_settings` to `ModelRequest` so that we can pass
through things to bind (like cache kwargs for anthropic middleware)
## TODOs
### top prio
- [x] add middleware support to existing agent
- [x] top prio middlewares
- [x] summarization node
- [x] HITL
- [x] prompt caching
other ones
- [x] model call limits
- [x] tool calling limits
- [ ] usage (requires output state)
### secondary prio
- [x] improve typing for state updates from middleware (not working
right now w/ simple `AgentUpdate` and `AgentJump`, at least in Python)
- [ ] add support for public state (input / output modifications via
pregel channel mods) -- to be tackled in another PR
- [x] testing!
### docs
See https://github.com/langchain-ai/docs/pull/390
- [x] high level docs about middleware
- [x] summarization node
- [x] HITL
- [x] prompt caching
## open questions
Lots of open questions right now, many of them inlined as comments for
the short term, will catalog some more significant ones here.
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
**Description:**
Remove a character in tool_calling.ipynb that causes a grammatical error
Verification: Local docs build passed after fix
**Issue:**
None (direct hotfix for rendering issue identified during documentation
review)
**Dependencies:**
None
**Description:** This PR fixes the broken Anthropic model example in the
documentation introduction page and adds a comment field to display
model version warnings in code blocks. The changes ensure that users can
successfully run the example code and are reminded to check for the
latest model versions.
**Issue:** https://github.com/langchain-ai/langchain/issues/32806
**Changes made:**
- Update Anthropic model from broken "claude-3-5-sonnet-latest" to
working "claude-3-7-sonnet-20250219"
- Add comment field to display model version warnings in code blocks
- Improve user experience by providing working examples and version
guidance
**Dependencies:** None required
Fixes #32747
SpaCy integration test fixture was trying to use pip to download the
SpaCy language model (`en_core_web_sm`), but uv-managed environments
don't include pip by default. Fail test if not installed as opposed to
downloading.
Removed a period in bulleted list for consistency
Completed the sentence by adding a period ".", in sync with other points
>> Click "Propose changes"
to
>> Click "Propose changes".
Update `langchain-core` dependency min from `>=0.3.63` to `>=0.3.75`.
### Motivation
- We located the `langchain-core` package locally in the monorepo and
need to align `langchain-tests` with the new minimum version.
### Overview
Preparing the `1.0.0a1` release of `langchain-tests` to align with
`langchain-core` version `1.0.0a1`.
### Changes
- Bump package version to `1.0.0a1`
- Relax `langchain-core` requirement from `<1.0.0,>=0.3.63` to
`<2.0.0,>=0.3.63`
### Motivation
All main LangChain packages are now publishing `1.0.0a` prereleases.
`langchain-tests` needs a matching prerelease so downstreams can install
tests alongside the 1.0 series without conflicts.
### Tests
- Verified installation and tests against both `0.3.75` and `1.0.0a1`.
Description:
Added the content= keyword when creating SystemMessage and HumanMessage
in the messages list, making it consistent with the API reference.
### Summary
This PR updates the sentence on the "How-to guides" landing page to
replace smart (curly) quotes with straight quotes in the phrase:
> "How do I...?"
### Why This Change?
- Ensures formatting consistency across documentation
- Avoids encoding or rendering issues with smart quotes
- Matches standard Markdown and inline code formatting
This is a small change, but improves clarity and polish on a key landing
page.
Change "Linkedin" to "LinkedIn" to be consistent with LinkedIn's
spelling.
Adding `create_react_agent` and introducing `langchain.agents`!
## Enhanced Structured Output
`create_react_agent` supports coercion of outputs to structured data
types like `pydantic` models, dataclasses, typed dicts, or JSON schema
specifications.
### Structural Changes
In langgraph < 1.0, `create_react_agent` implemented support for
structured output via an additional LLM call to the model after the
standard model / tool calling loop finished. This introduced extra
expense and was unnecessary.
This new version implements structured output support in the main loop,
allowing a model to choose between calling tools or generating
structured output (or both).
The same basic pattern for structured output generation works:
```py
from langchain.agents import create_react_agent
from langchain_core.messages import HumanMessage
from pydantic import BaseModel
class Weather(BaseModel):
    temperature: float
    condition: str

def weather_tool(city: str) -> str:
    """Get the weather for a city."""
    return f"it's sunny and 70 degrees in {city}"

agent = create_react_agent("openai:gpt-4o-mini", tools=[weather_tool], response_format=Weather)
# Invoke the agent so that `result` is defined before it is read below.
result = agent.invoke({"messages": [HumanMessage("What's the weather in Tokyo?")]})
print(repr(result["structured_response"]))
#> Weather(temperature=70.0, condition='sunny')
```
### Advanced Configuration
The new API exposes two ways to configure how structured output is
generated. Under the hood, LangChain will attempt to pick the best
approach if not explicitly specified. That is, if provider native
support is available for a given model, that takes priority over
artificial tool calling.
1. Artificial tool calling (the default for most models)
LangChain generates a tool (or tools) under the hood that match the
schema of your response format. When the model calls those tools,
LangChain coerces the args to the desired format. Note that LangChain does
not validate outputs when the response format is a JSON schema specification.
<details>
<summary>Extended example</summary>
```py
from langchain.agents import create_react_agent
from langchain_core.messages import HumanMessage
from langchain.agents.structured_output import ToolStrategy
from pydantic import BaseModel
class Weather(BaseModel):
    temperature: float
    condition: str

def weather_tool(city: str) -> str:
    """Get the weather for a city."""
    return f"it's sunny and 70 degrees in {city}"

agent = create_react_agent(
    "openai:gpt-4o-mini",
    tools=[weather_tool],
    response_format=ToolStrategy(
        schema=Weather, tool_message_content="Final Weather result generated"
    ),
)
result = agent.invoke({"messages": [HumanMessage("What's the weather in Tokyo?")]})
for message in result["messages"]:
    message.pretty_print()
"""
================================ Human Message =================================
What's the weather in Tokyo?
================================== Ai Message ==================================
Tool Calls:
weather_tool (call_Gg933BMHMwck50Q39dtBjXm7)
Call ID: call_Gg933BMHMwck50Q39dtBjXm7
Args:
city: Tokyo
================================= Tool Message =================================
Name: weather_tool
it's sunny and 70 degrees in Tokyo
================================== Ai Message ==================================
Tool Calls:
Weather (call_9xOkYUM7PuEXl9DQq9sWGv5l)
Call ID: call_9xOkYUM7PuEXl9DQq9sWGv5l
Args:
temperature: 70
condition: sunny
================================= Tool Message =================================
Name: Weather
Final Weather result generated
"""
print(repr(result["structured_response"]))
#> Weather(temperature=70.0, condition='sunny')
```
</details>
2. Provider implementations (limited to OpenAI, Groq)
Some providers support generating structured output directly. For those
cases, we offer the `ProviderStrategy` hint:
<details>
<summary>Extended example</summary>
```py
from langchain.agents import create_react_agent
from langchain_core.messages import HumanMessage
from langchain.agents.structured_output import ProviderStrategy
from pydantic import BaseModel
class Weather(BaseModel):
    temperature: float
    condition: str

def weather_tool(city: str) -> str:
    """Get the weather for a city."""
    return f"it's sunny and 70 degrees in {city}"

agent = create_react_agent(
    "openai:gpt-4o-mini",
    tools=[weather_tool],
    response_format=ProviderStrategy(Weather),
)
result = agent.invoke({"messages": [HumanMessage("What's the weather in Tokyo?")]})
for message in result["messages"]:
    message.pretty_print()
"""
================================ Human Message =================================
What's the weather in Tokyo?
================================== Ai Message ==================================
Tool Calls:
weather_tool (call_OFJq1FngIXS6cvjWv5nfSFZp)
Call ID: call_OFJq1FngIXS6cvjWv5nfSFZp
Args:
city: Tokyo
================================= Tool Message =================================
Name: weather_tool
it's sunny and 70 degrees in Tokyo
================================== Ai Message ==================================
{"temperature":70,"condition":"sunny"}
Weather(temperature=70.0, condition='sunny')
"""
print(repr(result["structured_response"]))
#> Weather(temperature=70.0, condition='sunny')
```
Note: in the `ToolStrategy` example above, the final tool message carries
the custom `tool_message_content` provided by the developer.
</details>
Prompted output was previously supported and is no longer supported via
the `response_format` argument to `create_react_agent`. If there's
significant demand for this, we'd be happy to engineer a solution.
## Error Handling
`create_react_agent` now exposes an API for managing errors associated
with structured output generation. There are two common problems with
structured output generation (w/ artificial tool calling):
1. **Parsing error** -- the model generates data that doesn't match the
desired structure for the output
2. **Multiple tool calls error** -- the model generates 2 or more tool
calls associated with structured output schemas
A developer can control the desired behavior for this via the
`handle_errors` arg to `ToolStrategy`.
<details>
<summary>Extended example</summary>
```py
from langchain_core.messages import HumanMessage
from pydantic import BaseModel
from langchain.agents import create_react_agent
from langchain.agents.structured_output import StructuredOutputValidationError, ToolStrategy
class Weather(BaseModel):
    temperature: float
    condition: str

def weather_tool(city: str) -> str:
    """Get the weather for a city."""
    return f"it's sunny and 70 degrees in {city}"

def handle_validation_error(error: Exception) -> str:
    # Turn validation failures into a retry hint; re-raise anything else.
    if isinstance(error, StructuredOutputValidationError):
        return (
            f"Please call the {error.tool_name} tool again with the correct arguments. "
            f"Your mistake was: {error.source}"
        )
    raise error

agent = create_react_agent(
    "openai:gpt-5",
    tools=[weather_tool],
    response_format=ToolStrategy(
        schema=Weather,
        handle_errors=handle_validation_error,
    ),
)
```
</details>
## Error Handling for Tool Calling
Tools fail for two main reasons:
1. **Invocation failure** -- the args generated by the model for the
tool are incorrect (missing, incompatible data types, etc)
2. **Execution failure** -- the tool execution itself fails due to a
developer error, network error, or some other exception.
By default, when tool **invocation** fails, the react agent will return
an artificial `ToolMessage` to the model asking it to correct its
mistakes and retry.
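Now, when tool **execution** fails, the react agent raises the
`ToolException` by default instead of asking the model to retry. This
avoids futile retry loops when the failure stems from the causes listed
above (a developer error, a network error, or another exception) rather
than from the arguments the model generated.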
Developers can configure their desired behavior for retries / error
handling via the `handle_tool_errors` arg to `ToolNode`.
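The distinction between the two failure modes can be sketched in plain Python. This is a minimal illustration under stated assumptions, not `ToolNode`'s implementation: `call_tool` and `flaky_tool` are hypothetical names, and a returned string stands in for the artificial `ToolMessage`.

```python
import inspect
from typing import Any, Callable


def call_tool(tool: Callable[..., str], args: dict[str, Any]) -> str:
    """Hypothetical dispatcher, not LangChain's `ToolNode`."""
    try:
        # Invocation failure: the model's args do not fit the tool signature.
        bound = inspect.signature(tool).bind(**args)
    except TypeError as exc:
        # Returned to the model so it can correct its call and retry.
        return f"Error invoking {tool.__name__}: {exc}. Please fix the arguments and try again."
    # Execution failure: any exception raised here propagates to the caller.
    return tool(*bound.args, **bound.kwargs)


def weather_tool(city: str) -> str:
    """Get the weather for a city."""
    return f"it's sunny and 70 degrees in {city}"


def flaky_tool(city: str) -> str:
    """Always fails, standing in for a network error."""
    raise ConnectionError(f"could not reach the weather service for {city}")


retry_hint = call_tool(weather_tool, {})  # missing `city`: invocation failure
answer = call_tool(weather_tool, {"city": "Tokyo"})  # succeeds
```

Calling `call_tool(flaky_tool, {"city": "Tokyo"})` raises `ConnectionError` instead of producing a retry hint, mirroring the new default for execution failures.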
## Pre-Bound Models
`create_react_agent` no longer supports inputs to `model` that have been
pre-bound w/ tools or other configuration. To properly support
structured output generation, the agent itself needs the power to bind
tools + structured output kwargs.
This also makes the devx cleaner - it's always expected that `model` is
an instance of `BaseChatModel` (or `str` that we coerce into a chat
model instance).
Dynamic model functions can return a pre-bound model **IF** structured
output is not also used. Dynamic model functions can then bind tools /
structured output logic.
## Import Changes
Users should now use `create_react_agent` from `langchain.agents`
instead of `langgraph.prebuilt`.
Other imports have a similar migration path, `ToolNode` and `AgentState`
for example.
* `chat_agent_executor.py` -> `react_agent.py`
Some notes:
1. Disabled blockbuster + some linting in `langchain/agents` -- not
ideal, but necessary to get this across the line for the alpha. We
should re-enable before official release.
- **Description:** Updated Docker command to use ClickHouse 25.7 (has
`vector_similarity` index support). Added `CLICKHOUSE_SKIP_USER_SETUP=1`
env param to [bypass default user
setup](https://clickhouse.com/docs/install/docker#managing-default-user)
and allow external network access. There was also a bug where results
accessed via `similarity_search_with_relevance_scores` need to be
unpacked first.
- **Issue:** Fixes #32094 for anyone following the tutorial with the
default ClickHouse configuration.
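To illustrate the unpacking fix: `similarity_search_with_relevance_scores` returns a list of `(document, score)` pairs rather than bare documents. The sketch below uses plain tuples and strings as stand-ins; the `results` data is made up for illustration and no ClickHouse instance is involved.

```python
# Stand-in for the return value of `similarity_search_with_relevance_scores`:
# a list of (document, relevance_score) tuples. The data here is hypothetical.
results = [
    ("ClickHouse supports vector similarity indexes.", 0.91),
    ("Unrelated document.", 0.12),
]

# Unpack each pair before using the document or the score.
lines = [f"{score:.2f}  {doc}" for doc, score in results]
for line in lines:
    print(line)
```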
# Description
Updated documentation to reflect Microsoft’s rebranding of Azure AI
Studio to Azure AI Foundry. This ensures consistency with current Azure
terminology across the docs.
# Issue
N/A
# Dependencies
None
The async version of the test should use the `ayield_keys` method
instead of `yield_keys`.
Otherwise tools such as `blockbuster` may trigger on a blocking call.
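A minimal sketch of the difference, assuming a toy in-memory store: `ToyStore` and `collect_keys` are hypothetical and only borrow the `yield_keys` / `ayield_keys` method names from the text. In async code, iterating the async variant keeps a synchronous call off the event loop.

```python
import asyncio
from typing import AsyncIterator, Iterator, Optional


class ToyStore:
    """Hypothetical key-value store; not LangChain's store interface."""

    def __init__(self, data: dict) -> None:
        self._data = dict(data)

    def yield_keys(self, prefix: Optional[str] = None) -> Iterator[str]:
        # Synchronous variant: fine in sync tests, flagged in async ones.
        for key in self._data:
            if prefix is None or key.startswith(prefix):
                yield key

    async def ayield_keys(self, prefix: Optional[str] = None) -> AsyncIterator[str]:
        # Async variant: consumed with `async for` inside a coroutine.
        for key in self._data:
            if prefix is None or key.startswith(prefix):
                yield key


async def collect_keys(store: ToyStore) -> list:
    # The async test path iterates `ayield_keys`, not `yield_keys`.
    return sorted([key async for key in store.ayield_keys()])


keys = asyncio.run(collect_keys(ToyStore({"foo": b"1", "bar": b"2"})))
```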
**Description:**
Fixed corrupted text in the code cell output of the documentation
notebook. The code cell itself was correct, but the saved output
contained garbage text.
**Issue:**
The saved output in the documentation notebook contained garbage/typo
text in the table name.
**Dependencies:**
None
Having Vercel attempt to deploy on each commit (even if unrelated to
docs) was getting annoying. Add one of the following tags to the commit
message to skip the preview deployment:
- `[skip-preview]`
- `[no-preview]`
- `[skip-deploy]`
Full example: `fix(core): resolve memory leak [no-preview]`
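As a rough sketch of how such a tag check could work (the actual deployment configuration is not shown in this description, and `should_skip_preview` is a hypothetical helper):

```python
# The three tags listed above; the matching logic is an assumption.
SKIP_TAGS = ("[skip-preview]", "[no-preview]", "[skip-deploy]")


def should_skip_preview(commit_message: str) -> bool:
    """Return True if the commit message carries any skip tag."""
    return any(tag in commit_message for tag in SKIP_TAGS)


skip = should_skip_preview("fix(core): resolve memory leak [no-preview]")
deploy = not should_skip_preview("docs: update tutorial links")
```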
* Create usage metadata on
[`message_delta`](https://docs.anthropic.com/en/docs/build-with-claude/streaming#event-types)
instead of at the beginning. Consequently, token counts are not included
during streaming but instead at the end. This allows for accurate
reporting of server-side tool usage (important for billing)
* Add some clarifying comments
* Fix some outstanding Pylance warnings
* Remove unnecessary `text` popping in thinking blocks
* Also now correctly reports `input_cache_read`/`input_cache_creation`
as a result
When citations are returned from streaming, they include a `file_id:
null` field in their `content_block_location` structure.
When these citations are passed back to the API in subsequent messages,
the API rejects them with "Extra inputs are not permitted" for the
`file_id` field.
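One way to picture the fix is a small clean-up step applied to each citation before it is sent back. This is a sketch under assumptions: `drop_null_file_id` is a hypothetical helper and the sample citation fields other than `file_id` are illustrative.

```python
from typing import Any


def drop_null_file_id(citation: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `citation` without a `file_id` key whose value is None."""
    if citation.get("file_id") is None:
        return {k: v for k, v in citation.items() if k != "file_id"}
    return dict(citation)


# Illustrative citation as it might arrive from streaming.
streamed = {"type": "content_block_location", "cited_text": "...", "file_id": None}
cleaned = drop_null_file_id(streamed)
```

A citation that carries a real `file_id` value passes through unchanged.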
**Description:**
Corrected LangGraph documentation link (changed to “guides”), and added
a link to LangGraph JS how-to guides for clarity.
**Issue:**
N/A
**Dependencies:**
None
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
The appropriate `ToolNode` attribute for error handling is called
`handle_tool_errors` instead of `handle_tool_error`.
For further info see [ToolNode source code in
LangGraph](https://github.com/langchain-ai/langgraph/blob/main/libs/prebuilt/langgraph/prebuilt/tool_node.py#L255)
**Twitter handle:** gitaroktato
## Description
This PR adds support for custom header patterns in
`MarkdownHeaderTextSplitter`, allowing users to define non-standard
Markdown header formats (like `**Header**`) and specify their hierarchy
levels.
**Issue:** Fixes #22738
**Dependencies:** None - this change has no new dependencies
**Key Changes:**
- Added optional `custom_header_patterns` parameter to support
non-standard header formats
- Enable splitting on patterns like `**Header**` and `***Header***`
- Maintain full backward compatibility with existing usage
- Added comprehensive tests for custom and mixed header scenarios
## Example Usage
```python
from langchain_text_splitters import MarkdownHeaderTextSplitter
headers_to_split_on = [
    ("**", "Chapter"),
    ("***", "Section"),
]
custom_header_patterns = {
    "**": 1,  # Level 1 headers
    "***": 2,  # Level 2 headers
}
splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=headers_to_split_on,
    custom_header_patterns=custom_header_patterns,
)
# Now **Chapter 1** is treated as a level 1 header
# And ***Section 1.1*** is treated as a level 2 header
```
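For intuition about what matching such patterns involves, here is a standalone sketch of classifying a line against the marker-to-level mapping. It is not the splitter's implementation; `parse_custom_header` is a hypothetical helper written only for illustration.

```python
from typing import Optional, Tuple


def parse_custom_header(line: str, patterns: dict) -> Optional[Tuple[int, str]]:
    """Return (level, text) if `line` is wrapped in a known marker, else None."""
    stripped = line.strip()
    # Try longer markers first so `***` is not mistaken for `**`.
    for marker in sorted(patterns, key=len, reverse=True):
        wrapped = (
            len(stripped) > 2 * len(marker)
            and stripped.startswith(marker)
            and stripped.endswith(marker)
        )
        if wrapped:
            text = stripped[len(marker):-len(marker)].strip()
            # Reject leftovers such as `***x**` where markers do not balance.
            if text and not text.startswith("*") and not text.endswith("*"):
                return patterns[marker], text
    return None


patterns = {"**": 1, "***": 2}
chapter = parse_custom_header("**Chapter 1**", patterns)
section = parse_custom_header("***Section 1.1***", patterns)
body = parse_custom_header("Some body text.", patterns)
```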
## Testing
- ✅ Added unit tests for custom header patterns
- ✅ Added tests for mixed standard and custom headers
- ✅ All existing tests pass (backward compatibility maintained)
- ✅ Linting and formatting checks pass
---
The implementation provides a flexible solution while maintaining the
simplicity of the existing API. Users can continue using the splitter
exactly as before, with the new functionality being entirely opt-in
through the `custom_header_patterns` parameter.
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Claude <noreply@anthropic.com>
Supersedes #32461
Fixed incorrect input token reporting during streaming when tools are
used. Previously, input tokens were counted at `message_start` before
tool execution, leading to inaccurate counts. Now input tokens are
properly deferred until `message_delta` (completion), aligning with
Anthropic's billing model and SDK expectations.
**Before Fix:**
- Streaming with tools: Input tokens = 0 ❌
- Non-streaming with tools: Input tokens = 472 ✅
**After Fix:**
- Streaming with tools: Input tokens = 472 ✅
- Non-streaming with tools: Input tokens = 472 ✅
Aligns with Anthropic's SDK expectations. The SDK handles input token
updates in `message_delta` events:
```python
# https://github.com/anthropics/anthropic-sdk-python/blob/main/src/anthropic/lib/streaming/_messages.py
if event.usage.input_tokens is not None:
    current_snapshot.usage.input_tokens = event.usage.input_tokens
```
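The effect of deferring to `message_delta` can be sketched with stand-in events. This is an illustration under assumptions, not the integration's code: `Usage` and `final_usage` are hypothetical, and the output-token numbers are made up (only the 472 input tokens come from the figures above).

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Usage:
    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None


def final_usage(events: Iterable[Tuple[str, Usage]]) -> Usage:
    """Fold usage-bearing stream events into one snapshot; later values win."""
    snapshot = Usage(input_tokens=0, output_tokens=0)
    for event_type, usage in events:
        if event_type not in ("message_start", "message_delta"):
            continue
        # Mirror the SDK pattern: only overwrite when a value is present.
        if usage.input_tokens is not None:
            snapshot.input_tokens = usage.input_tokens
        if usage.output_tokens is not None:
            snapshot.output_tokens = usage.output_tokens
    return snapshot


events = [
    ("message_start", Usage(input_tokens=0, output_tokens=1)),
    ("message_delta", Usage(input_tokens=472, output_tokens=35)),
]
usage = final_usage(events)
print(usage.input_tokens, usage.output_tokens)
```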
Supersedes #32544
Changes to the `trimmer` behavior meant that the call `"What math
problem was asked?"` could no longer see the relevant query, because the
token count of the earlier queries caused it to be trimmed. Adjusted the
example so that trimming no longer removes the relevant part of the
message history. Also added a print to the trimmer to make it easier to
observe what leaves the context window.
Add note to trimming tut & format links as inline
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Enhance the integrations table by adding the `js:
'@langchain/community'` reference for several packages and updating the
titles of specific integrations to avoid improper capitalization
Supersedes #32408
Description:
This PR ensures that tool calls without explicitly provided `args` will
default to an empty dictionary (`{}`), allowing tools with no parameters
(e.g. `def foo() -> str`) to be registered and invoked without
validation errors. This change improves compatibility with agent
frameworks that may omit the `args` field when generating tool calls.
Issue:
See
[langgraph#5722](https://github.com/langchain-ai/langgraph/issues/5722):
LangGraph currently emits tool calls without `args`, which leads to
validation errors when tools with no parameters are invoked. This PR
ensures compatibility by defaulting `args` to `{}` when missing.
Dependencies:
None
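The defaulting behavior described above can be sketched on plain dictionaries. This is illustrative only: `normalize_tool_call` is a hypothetical helper, not the function changed in this PR.

```python
from typing import Any


def normalize_tool_call(raw: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `raw` whose `args` defaults to an empty dict."""
    normalized = dict(raw)
    if normalized.get("args") is None:
        normalized["args"] = {}
    return normalized


def foo() -> str:
    """A tool with no parameters."""
    return "bar"


# A tool call emitted without `args` can still be invoked.
call = normalize_tool_call({"name": "foo", "id": "call_1"})
output = foo(**call["args"])
```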
---------
Signed-off-by: jitokim <pigberger70@gmail.com>
Co-authored-by: jito <pigberger70@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
**Description**
Corrected a typo in the Ollama chatbot example output in
`docs/docs/integrations/chat/ollama.ipynb` where `"got-oss"` was
mistakenly used instead of `"gpt-oss"`.
No functional changes to code; documentation-only update.
All notebook outputs were cleared to keep the diff minimal.
**Issue**
N/A
**Dependencies**
None
**Twitter handle**
N/A
Fixes #30146
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
```python
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import AIMessage, HumanMessage

llm = ChatAnthropic(model="claude-3-5-haiku-latest")
caching_llm = llm.bind(cache_control={"type": "ephemeral"})
caching_llm.invoke(
    [
        HumanMessage("..."),
        AIMessage("..."),
        HumanMessage("..."),  # <-- final message / content block gets cache annotation
    ]
)
```
Potentially useful given Anthropic's [incremental
caching](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#continuing-a-multi-turn-conversation)
capabilities:
> During each turn, we mark the final block of the final message with
cache_control so the conversation can be incrementally cached. The
system will automatically lookup and use the longest previously cached
prefix for follow-up messages.
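To make "the final block of the final message" concrete, here is a sketch on plain message dictionaries. It is not the `langchain-anthropic` implementation; `annotate_last_block` is a hypothetical helper and the message contents are placeholders.

```python
import copy
from typing import Any


def annotate_last_block(messages: list, cache_control: dict[str, Any]) -> list:
    """Return a copy of `messages` with `cache_control` on the final content block."""
    annotated = copy.deepcopy(messages)
    if not annotated:
        return annotated
    last = annotated[-1]
    content = last["content"]
    # A plain string is promoted to a single text block so it can be annotated.
    if isinstance(content, str):
        content = [{"type": "text", "text": content}]
        last["content"] = content
    if content:
        content[-1]["cache_control"] = cache_control
    return annotated


messages = [
    {"role": "user", "content": "First turn"},
    {"role": "assistant", "content": "Second turn"},
    {"role": "user", "content": "Third turn"},
]
annotated = annotate_last_block(messages, {"type": "ephemeral"})
```

Only the last block of the last message is marked; earlier messages and the input list are left untouched.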
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
This commit removes redundant integration info from the details page,
changes references from "DigitalOcean GradientAI" to "DigitalOcean
Gradient™ AI", and updates the setup instructions accordingly.
**Description:**
Two broken links were reported by another LangChain employee. This PR
fixes those links.
Fixed and tested locally.
**Dependencies:**
None
This PR adds documentation for integrating [TrueFoundry’s AI
Gateway](https://www.truefoundry.com/ai-gateway) with Langfuse using the
LangGraph OpenAI SDK.
The integration sends requests through TrueFoundry’s AI Gateway for
unified governance, observability, and routing, while LangGraph runs on
the client side to capture execution traces and telemetry.
- Issue: N/A
- Dependencies: None
- Twitter - https://x.com/truefoundry
tests - Not applicable
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- **Description:** Integrated the Scrapeless package to enable LangChain
users to seamlessly incorporate Scrapeless into their agents.
- **Dependencies:** None
- **Twitter handle:** [Scrapelessteam](https://x.com/Scrapelessteam)
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
# Description
This PR updates the docs for the
[langchain-anchorbrowser](https://pypi.org/project/langchain-anchorbrowser/)
package. It adds a few tools.
[Anchor Browser](https://anchorbrowser.io/?utm=langchain) is the
platform for AI Agentic browser automation, which solves the challenge
of automating workflows for web applications that lack APIs or have
limited API coverage. It simplifies the creation, deployment, and
management of browser-based automations, transforming complex web
interactions into simple API endpoints.
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
This PR introduces a new Google partner guide for MCP Toolbox. The
primary goal of this new documentation is to enhance the discoverability
of MCP Toolbox for developers working within the Google ecosystem,
providing them with a clear and direct path to using our tools.
> [!IMPORTANT]
> This PR contains link to a page which is added in #32344. This will
cause deployment failure until that PR is merged.
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
This PR introduces a new integration guide for MCP Toolbox. The primary
goal of this new documentation is to enhance the discoverability of MCP
Toolbox for developers working within the LangChain ecosystem, providing
them with a clear and direct path to using our tools.
This approach was chosen to provide users with a practical, hands-on
example that they can easily follow.
> [!NOTE]
> The page added in this PR is linked to from a section in Google
partners page added in #32356.
---------
Co-authored-by: Lauren Hirata Singh <lauren@langchain.dev>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
In the [RAG Part 1
Tutorial](https://python.langchain.com/docs/tutorials/rag/), the sample
code does not work when the Qdrant vector store is selected.
It fails with the error `ValueError: Collection test not found`.
This fix creates that collection and ensures its dimension size matches
the embedding size of the selected model.
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
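```python
from langchain_core.messages import AIMessage, HumanMessage

# `prompt_template` is assumed to be defined earlier with a "msgs" placeholder.
messages_to_pass = [
    HumanMessage(content="What's the capital of France?"),
    AIMessage(content="The capital of France is Paris."),
    HumanMessage(content="And what about Germany?"),
]
formatted_prompt = prompt_template.invoke({"msgs": messages_to_pass})
print(formatted_prompt)
```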
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
**Description:**
I've added a small clarification to the chatbot tutorial. The tutorial
mentions setting the `LANGSMITH_API_KEY`, but doesn't explain how a new
user can get the key from the website. This change adds a brief note to
guide them to the Settings page.
P.S. This is my first pull request, so I'm excited to learn and
contribute!
**Issue:**
N/A
**Dependencies:**
N/A
**Twitter handle:**
@sohamactive
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Closes #32320
This PR updates the `langgraph_agentic_rag.ipynb` notebook to clarify
that LangGraph does not automatically prepend a `SystemMessage`. A
markdown note and an inline Python comment have been added to guide
users to explicitly include a `SystemMessage` when needed.
This improves documentation for developers working with LangGraph-based
agents and avoids confusion about system-level behavior not being
applied.
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Bumps
[actions/download-artifact](https://github.com/actions/download-artifact)
from 4 to 5.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/download-artifact/releases">actions/download-artifact's
releases</a>.</em></p>
<blockquote>
<h2>v5.0.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Update README.md by <a
href="https://github.com/nebuk89"><code>@nebuk89</code></a> in <a
href="https://redirect.github.com/actions/download-artifact/pull/407">actions/download-artifact#407</a></li>
<li>BREAKING fix: inconsistent path behavior for single artifact
downloads by ID by <a
href="https://github.com/GrantBirki"><code>@GrantBirki</code></a> in <a
href="https://redirect.github.com/actions/download-artifact/pull/416">actions/download-artifact#416</a></li>
</ul>
<h2>v5.0.0</h2>
<h3>🚨 Breaking Change</h3>
<p>This release fixes an inconsistency in path behavior for single
artifact downloads by ID. <strong>If you're downloading single artifacts
by ID, the output path may change.</strong></p>
<h4>What Changed</h4>
<p>Previously, <strong>single artifact downloads</strong> behaved
differently depending on how you specified the artifact:</p>
<ul>
<li><strong>By name</strong>: <code>name: my-artifact</code> → extracted
to <code>path/</code> (direct)</li>
<li><strong>By ID</strong>: <code>artifact-ids: 12345</code> → extracted
to <code>path/my-artifact/</code> (nested)</li>
</ul>
<p>Now both methods are consistent:</p>
<ul>
<li><strong>By name</strong>: <code>name: my-artifact</code> → extracted
to <code>path/</code> (unchanged)</li>
<li><strong>By ID</strong>: <code>artifact-ids: 12345</code> → extracted
to <code>path/</code> (fixed - now direct)</li>
</ul>
<h4>Migration Guide</h4>
<h5>✅ No Action Needed If:</h5>
<ul>
<li>You download artifacts by <strong>name</strong></li>
<li>You download <strong>multiple</strong> artifacts by ID</li>
<li>You already use <code>merge-multiple: true</code> as a
workaround</li>
</ul>
<h5>⚠️ Action Required If:</h5>
<p>You download <strong>single artifacts by ID</strong> and your
workflows expect the nested directory structure.</p>
<p><strong>Before v5 (nested structure):</strong></p>
<pre lang="yaml"><code>- uses: actions/download-artifact@v4
with:
artifact-ids: 12345
path: dist
# Files were in: dist/my-artifact/
</code></pre>
<blockquote>
<p>Where <code>my-artifact</code> is the name of the artifact you
previously uploaded</p>
</blockquote>
<p><strong>To maintain old behavior (if needed):</strong></p>
<pre lang="yaml"><code></tr></table>
</code></pre>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="634f93cb29"><code>634f93c</code></a>
Merge pull request <a
href="https://redirect.github.com/actions/download-artifact/issues/416">#416</a>
from actions/single-artifact-id-download-path</li>
<li><a
href="b19ff43027"><code>b19ff43</code></a>
refactor: resolve download path correctly in artifact download tests
(mainly ...</li>
<li><a
href="e262cbee4a"><code>e262cbe</code></a>
bundle dist</li>
<li><a
href="bff23f9308"><code>bff23f9</code></a>
update docs</li>
<li><a
href="fff8c148a8"><code>fff8c14</code></a>
fix download path logic when downloading a single artifact by id</li>
<li><a
href="448e3f862a"><code>448e3f8</code></a>
Merge pull request <a
href="https://redirect.github.com/actions/download-artifact/issues/407">#407</a>
from actions/nebuk89-patch-1</li>
<li><a
href="47225c44b3"><code>47225c4</code></a>
Update README.md</li>
<li>See full diff in <a
href="https://github.com/actions/download-artifact/compare/v4...v5">compare
view</a></li>
</ul>
</details>
<br />
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
**Description:**
In the `docs/docs/how_to/structured_output.ipynb` notebook, an
`AIMessage` within the tool-calling few-shot example was missing the
`name="example_assistant"` parameter. This was inconsistent with the
other `AIMessage` instances in the same list.
This change adds the missing `name` parameter to ensure all examples in
the section are consistent, improving the clarity and correctness of the
documentation.
**Issue:** N/A
**Dependencies:** N/A
While running the line `People.schema`, I got a warning:
```The `schema` method is deprecated; use `model_json_schema` instead```
I switched to `model_json_schema`, and it now works fine.
Description:
Corrected the guide title from "How deal with high cardinality
categoricals" to "How to deal with high-cardinality categoricals".
- Added missing "to" for grammatical correctness.
- Hyphenated "high-cardinality" for standard compound adjective usage.
Issue:
N/A
Dependencies:
None
Twitter handle:
https://x.com/mishraravibhush
**Description**
Updated the quick setup instructions for JaguarDB in the documentation.
Replaced the outdated Docker image `jaguardb/jaguardb_with_http` with
the current recommended image `jaguardb/jaguardb` for pulling and
running the server.
Not all retrievers use `k` as the parameter name for the number of
results to return, even within LangChain itself. For example:
bc4251b9e0/libs/core/langchain_core/indexing/in_memory.py (L31)
So it's helpful to be able to change it for a given retriever.
The change also adds hints to disable the tests if the retriever doesn't
support setting the param in the constructor or in the invoke method
(for instance, the `InMemoryDocumentIndex` in the link supports it in the
constructor but not in the invoke method).
This change is backward compatible.
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
**Description:** fix an issue I discovered when attempting to merge
messages in which one message has an `index` key in its content
dictionary and another does not.
**Description:** This PR improves the contribution setup guide by adding
comprehensive Windows-specific instructions. The changes address a
common pain point for Windows contributors who don't have `make`
installed by default, making the LangChain contribution process more
accessible across different operating systems.
The main improvements include:
- Added a dedicated "Windows Users" section with multiple installation
options for `make` (Chocolatey, Scoop, WSL)
- Provided direct `uv` commands as alternatives to all `make` commands
throughout the setup guide
- Included Windows-specific instructions for testing, formatting,
linting, and spellchecking
- Enhanced the documentation to be more inclusive for Windows developers
This change makes it easier for Windows users to contribute to LangChain
without requiring additional tool installation, while maintaining the
existing workflow for users who already have `make` available.
**Issue:** This addresses the common barrier Windows users face when
trying to contribute to LangChain due to missing `make` commands.
**Dependencies:** None required - this is purely a documentation
improvement.
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
## **Description:**
Updated incorrect package names across multiple integration docs by
replacing underscores with hyphens to reflect their actual names on
PyPI. This aligns with the actual PyPI package names and prevents
potential confusion or installation issues.
## **Issue:** N/A
## **Dependencies:** None
## **Twitter handle:** N/A
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
`langchain-gradientai` is DigitalOcean's integration with LangChain. It
helps users build LangChain applications on DigitalOcean's
GradientAI platform.
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Description:
Fixed minor typos in the `google_imagen.ipynb` integration notebook
related to image generation prompt formatting. No functional changes
were made — just a documentation correction to improve clarity.
## **Description:**
Updated incorrect package names in `FeatureTables.js` by replacing
underscores with hyphens to reflect their actual names on PyPI. This
aligns with the actual PyPI package names and prevents potential
confusion or installation issues.
The following package names were corrected:
- `langchain_aws` ➝ `langchain-aws`
- `langchain_community` ➝ `langchain-community`
- `langchain_elasticsearch` ➝ `langchain-elasticsearch`
- `langchain_google_community` ➝ `langchain-google-community`
## **Issue:** N/A
## **Dependencies:** None
## **Twitter handle:** N/A
Description: Documentation is inconsistent with API docs.
Current documentation implies that to use the integration you must have
credentials configured AND store the path to a service account JSON
file.
API docs explain that you must only complete EITHER of the steps
regarding credentials.
I have updated the docs to make them consistent with the API wording.
## **Description:**
Refactored multiple entries in `kv_store_feat_table.py` to ensure that
all vector store metadata is accurate, consistent, and aligned with
LangChain's latest documentation structure and PyPI naming standards.
**Key improvements across all updated entries:**
- Updated `class` links to point to their respective **docs-based
integration pages** (e.g., `/docs/integrations/stores/...`) instead of
raw API reference URLs.
- Corrected `package` display names to use **hyphenated PyPI-compliant
names** (e.g., `langchain-astradb` instead of `langchain_astradb`).
- Updated `package` links to point to the **specific class-level API
references** (e.g., `/api_reference/.../storage/...ClassName.html`) for
precision.
These improvements enhance:
- Navigation experience for users
- Alignment with PyPI and docs naming conventions
- Clarity across LangChain’s integrations documentation
## **Issue:** N/A
## **Dependencies:** None
## **Twitter handle:** N/A
docs(alpha_vantage): add link for ALPHAVANTAGE_API_KEY generation in
integration notebook
**Description:**
This PR updates the `docs/docs/integrations/tools/alpha_vantage.ipynb`
integration notebook to help users locate the API key registration page
for Alpha Vantage. The following markdown line was added:
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
## **Description:**
This PR updates the internal documentation link for the RAG tutorials to
reflect the updated path. Previously, the link pointed to the root
`/docs/tutorials/`, which was generic. It now correctly routes to the
RAG-specific tutorial page for the following vector store docs.
1. AstraDBVectorStore
2. Clickhouse
3. CouchbaseSearchVectorStore
4. DatabricksVectorSearch
5. ElasticsearchStore
6. FAISS
7. Milvus
8. MongoDBAtlasVectorSearch
9. openGauss
10. PGVector
11. PGVectorStore
12. PineconeVectorStore
13. QdrantVectorStore
14. Redis
15. SQLServer
## **Issue:** N/A
## **Dependencies:** None
## **Twitter handle:** N/A
Fixes a streaming bug where models like Qwen3 (using OpenAI interface)
send tool call chunks with inconsistent indices, resulting in
duplicate/erroneous tool calls instead of a single merged tool call.
## Problem
When Qwen3 streams tool calls, it sends chunks with inconsistent `index`
values:
- First chunk: `index=1` with tool name and partial arguments
- Subsequent chunks: `index=0` with `name=None`, `id=None` and argument
continuation
The existing `merge_lists` function only merges chunks when their
`index` values match exactly, causing these logically related chunks to
remain separate, resulting in multiple incomplete tool calls instead of
one complete tool call.
```python
from langchain_core.messages import AIMessageChunk

# Before fix: Results in 1 valid + 1 invalid tool call
chunk1 = AIMessageChunk(content="", tool_call_chunks=[
    {"name": "search", "args": '{"query":', "id": "call_123", "index": 1}
])
chunk2 = AIMessageChunk(content="", tool_call_chunks=[
    {"name": None, "args": ' "test"}', "id": None, "index": 0}
])
merged = chunk1 + chunk2  # Creates 2 separate tool calls

# After fix: Results in 1 complete tool call
merged = chunk1 + chunk2  # Creates 1 merged tool call: search({"query": "test"})
```
## Solution
Enhanced the `merge_lists` function in `langchain_core/utils/_merge.py`
with intelligent tool call chunk merging:
1. **Preserves existing behavior**: Same-index chunks still merge as
before
2. **Adds special handling**: Tool call chunks with
`name=None`/`id=None` that don't match any existing index are now merged
with the most recent complete tool call chunk
3. **Maintains backward compatibility**: All existing functionality
works unchanged
4. **Targeted fix**: Only affects tool call chunks, doesn't change
behavior for other list items
The fix specifically handles the pattern where:
- A continuation chunk has `name=None` and `id=None` (indicating it's
part of an ongoing tool call)
- No matching index is found in existing chunks
- There exists a recent tool call chunk with a valid name or ID to merge
with
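The merging rule above can be sketched in plain Python. This is a simplified illustration, not the actual `merge_lists` code: the `merge_tool_call_chunks` helper and the plain-dict chunk shape are assumptions made for the example.

```python
from typing import Any, Optional


def merge_tool_call_chunks(chunks: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Merge streamed tool call chunks (hypothetical helper, not langchain_core code)."""
    merged: list[dict[str, Any]] = []
    for chunk in chunks:
        # 1. Existing behavior: merge into an entry with the same index.
        target: Optional[dict[str, Any]] = next(
            (m for m in merged if m["index"] == chunk["index"]), None
        )
        # 2. Fallback: a continuation chunk (no name, no id) whose index matches
        #    nothing is attached to the most recent chunk with a name or id.
        if target is None and chunk["name"] is None and chunk["id"] is None:
            target = next(
                (m for m in reversed(merged) if m["name"] or m["id"]), None
            )
        if target is None:
            merged.append(dict(chunk))  # new tool call (or orphaned continuation)
        else:
            target["args"] = (target["args"] or "") + (chunk["args"] or "")
            target["name"] = target["name"] or chunk["name"]
            target["id"] = target["id"] or chunk["id"]
    return merged


qwen_style = [
    {"name": "search", "args": '{"query":', "id": "call_123", "index": 1},
    {"name": None, "args": ' "test"}', "id": None, "index": 0},
]
result = merge_tool_call_chunks(qwen_style)
```

Here the second chunk has no matching index, so it falls through to the fallback branch and its arguments are appended to the `search` call. Chunks that carry their own name or ID never take that branch, which is why distinct tool calls stay separate.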
## Testing
Added comprehensive test coverage including:
- ✅ Qwen3-style chunks with different indices now merge correctly
- ✅ Existing same-index behavior preserved
- ✅ Multiple distinct tool calls remain separate
- ✅ Edge cases handled (empty chunks, orphaned continuations)
- ✅ Backward compatibility maintained
Fixes #31511.
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
## Problem
ChatLiteLLM encounters a `ValidationError` when using cache on
subsequent calls, causing the following error:
```
ValidationError(model='ChatResult', errors=[{'loc': ('generations', 0, 'type'), 'msg': "unexpected value; permitted: 'ChatGeneration'", 'type': 'value_error.const', 'ctx': {'given': 'Generation', 'permitted': ('ChatGeneration',)}}])
```
This occurs because:
1. The cache stores `Generation` objects (with `type="Generation"`)
2. But `ChatResult` expects `ChatGeneration` objects (with
`type="ChatGeneration"` and a required `message` field)
3. When cached values are retrieved, validation fails due to the type
mismatch
## Solution
Added graceful handling in both sync (`_generate_with_cache`) and async
(`_agenerate_with_cache`) cache methods to:
1. **Detect** when cached values contain `Generation` objects instead of
expected `ChatGeneration` objects
2. **Convert** them to `ChatGeneration` objects by wrapping the text
content in an `AIMessage`
3. **Preserve** all original metadata (`generation_info`)
4. **Allow** `ChatResult` creation to succeed without validation errors
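A minimal sketch of the detect-and-convert step, assuming simplified stand-in dataclasses rather than the real `langchain_core` classes. The `coerce_cached` name is hypothetical and only illustrates the idea.

```python
from dataclasses import dataclass
from typing import Any, Optional, Union


# Stand-ins for the langchain_core classes, defined here only so the sketch
# runs without LangChain installed. Field names follow the description above.
@dataclass
class AIMessage:
    content: str


@dataclass
class Generation:
    text: str
    generation_info: Optional[dict[str, Any]] = None


@dataclass
class ChatGeneration:
    message: AIMessage
    generation_info: Optional[dict[str, Any]] = None


def coerce_cached(
    cached: list[Union[Generation, ChatGeneration]],
) -> list[ChatGeneration]:
    """Convert any plain Generation in a cache hit into a ChatGeneration."""
    converted: list[ChatGeneration] = []
    for gen in cached:
        if not isinstance(gen, ChatGeneration):
            # Wrap the cached text in an AIMessage and carry the metadata over.
            gen = ChatGeneration(
                message=AIMessage(content=gen.text),
                generation_info=gen.generation_info,
            )
        converted.append(gen)
    return converted


cache_hit = [Generation(text="hello", generation_info={"finish_reason": "stop"})]
result = coerce_cached(cache_hit)
```

Entries that are already `ChatGeneration` objects pass through untouched, so a cache holding a mix of both types yields a uniform list.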
## Example
```python
# Before: This would fail with ValidationError
from langchain_community.chat_models import ChatLiteLLM
from langchain_community.cache import SQLiteCache
from langchain.globals import set_llm_cache
set_llm_cache(SQLiteCache(database_path="cache.db"))
llm = ChatLiteLLM(model_name="openai/gpt-4o", cache=True, temperature=0)
print(llm.predict("test")) # Works fine (cache empty)
print(llm.predict("test")) # Now works instead of ValidationError
# After: Seamlessly handles both Generation and ChatGeneration objects
```
## Changes
- **`libs/core/langchain_core/language_models/chat_models.py`**:
  - Added `Generation` import from `langchain_core.outputs`
  - Enhanced cache retrieval logic in `_generate_with_cache` and
    `_agenerate_with_cache` methods
  - Added conversion from `Generation` to `ChatGeneration` objects when
    needed
- **`libs/core/tests/unit_tests/language_models/chat_models/test_cache.py`**:
  - Added test case to validate the conversion logic handles mixed object
    types
## Impact
- **Backward Compatible**: Existing code continues to work unchanged
- **Minimal Change**: Only affects cache retrieval path, no API changes
- **Robust**: Handles both legacy cached `Generation` objects and new
`ChatGeneration` objects
- **Preserves Data**: All original content and metadata is maintained
during conversion
Fixes #22389.
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
**Description:** Fixes incorrect `num_skipped` count in the LangChain
indexing API. The current implementation only counts documents that
already exist in RecordManager (cross-batch duplicates) but fails to
count documents removed during within-batch deduplication via
`_deduplicate_in_order()`.
This PR adds tracking of the original batch size before deduplication
and includes the difference in `num_skipped`, ensuring that `num_added +
num_skipped` equals the total number of input documents.
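The corrected accounting can be illustrated with a small standalone sketch. The `index_batch` function and the set standing in for the record manager are assumptions for the example, not the indexing API itself.

```python
from collections.abc import Hashable, Iterable


def deduplicate_in_order(items: Iterable[Hashable]) -> list[Hashable]:
    """Drop repeated items while keeping first-seen order."""
    seen: set[Hashable] = set()
    unique: list[Hashable] = []
    for item in items:
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return unique


def index_batch(batch: list[str], already_indexed: set[str]) -> dict[str, int]:
    """Count added and skipped documents for one batch (simplified sketch)."""
    original_size = len(batch)
    unique = deduplicate_in_order(batch)
    # Documents removed by within-batch deduplication count as skipped.
    num_skipped = original_size - len(unique)
    num_added = 0
    for doc in unique:
        if doc in already_indexed:
            num_skipped += 1  # cross-batch duplicate
        else:
            already_indexed.add(doc)
            num_added += 1
    return {"num_added": num_added, "num_skipped": num_skipped}


batch = ["a", "b", "a", "c", "b"]
stats = index_batch(batch, already_indexed={"c"})
```

With this input, two documents are dropped within the batch and one already exists, so `num_added + num_skipped` matches the five input documents.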
**Issue:** Fixes incorrect document count reporting in indexing
statistics
**Dependencies:** None
Fixes #32272
---------
Co-authored-by: Alex Feel <afilippov@spotware.com>
Ensures proper reStructuredText formatting by adding the required blank
line before closing docstring quotes, which resolves the "Block quote
ends without a blank line; unexpected unindent" warning.
- **Description:** This PR updates the internal documentation link for
the RAG tutorials to reflect the updated path. Previously, the link
pointed to the root `/docs/tutorials/`, which was generic. It now
correctly routes to the RAG-specific tutorial page.
- **Issue:** N/A
- **Dependencies:** None
- **Twitter handle:** N/A
> × No solution found when resolving dependencies:
╰─▶ Because only langchain-neo4j==0.5.0 is available and
langchain-neo4j==0.5.0 depends on neo4j-graphrag>=1.9.0, we can conclude
that all versions of langchain-neo4j depend on neo4j-graphrag>=1.9.0.
And because only neo4j-graphrag<=1.9.0 is available and
neo4j-graphrag==1.9.0 depends on pypdf>=5.1.0,<6.0.0, we can conclude
that all versions of langchain-neo4j depend on pypdf>=5.1.0,<6.0.0.
And because langchain-upstage==0.6.0 depends on pypdf>=4.2.0,<5.0.0
and only langchain-upstage==0.6.0 is available, we can conclude that
all versions of langchain-neo4j and all versions of langchain-upstage
are incompatible.
And because you require langchain-neo4j and langchain-upstage, we can
conclude that your requirements are unsatisfiable.
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
**TL;DR many of the provided `Makefile` targets were broken, and any
time I wanted to preview changes locally I either had to refer to a
command Chester gave me or try waiting on a Vercel preview deployment.
With this PR, everything should behave like normal.**
Significant updates to the `Makefile` and documentation files, focusing
on improving usability, adding clear messaging, and fixing/enhancing
documentation workflows.
### Updates to `Makefile`:
#### Enhanced build and cleaning processes:
- Added informative messages (e.g., "📚 Building LangChain
documentation...") to makefile targets like `docs_build`, `docs_clean`,
and `api_docs_build` for better user feedback during execution.
- Introduced a `clean-cache` target to the `docs` `Makefile` to clear
cached dependencies and ensure clean builds.
#### Improved dependency handling:
- Modified `install-py-deps` to create a `.venv/deps_installed` marker,
preventing redundant/duplicate dependency installations and improving
efficiency.
#### Streamlined file generation and infrastructure setup:
- Added caching for the LangServe README download and parallelized
feature table generation
- Added user-friendly completion messages for targets like `copy-infra`
and `render`.
#### Documentation server updates:
- Enhanced the `start` target with messages indicating server start and
URL for local documentation viewing.
---
### Documentation Improvements:
#### Content clarity and consistency:
- Standardized section titles for consistency across documentation
files.
- Refined phrasing and formatting in sections like "Dependency
management" and "Formatting and linting" for better readability.
#### Enhanced workflows:
- Updated instructions for building and viewing documentation locally,
including tips for specifying server ports and handling API reference
previews.
- Expanded guidance on cleaning documentation artifacts and using
linting tools effectively.
#### API reference documentation:
- Improved instructions for generating and formatting in-code
documentation, highlighting best practices for docstring writing.
---
### Minor Changes:
- Added support for a new package name (`langchain_v1`) in the API
documentation generation script.
- Fixed minor capitalization and formatting issues in documentation
files.
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- **Adding documentation for PGVectorStore**:
docs: Adding documentation for the new PGVectorStore as a part of
langchain-postgres
- **Add docs**: The notebook for PGVectorStore is now added to the
directory `docs/docs/integrations`.
As a part of this change, we've also updated the VectorStore features
table and VectorStoreTabs
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
**Description:**
Fixes a bug in the file callback test where ANSI escape codes were
causing test failures. The improved test now properly handles ANSI
escape sequences by:
- Using exact string comparison instead of substring checking
- Applying the `strip_ansi` function consistently to all file contents
- Adding descriptive assertion messages
- Maintaining test coverage and backward compatibility
The changes ensure tests pass reliably even when terminal control
sequences are present in the output
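As a rough illustration of the approach, the sketch below strips ANSI sequences and then compares the whole string exactly. The regex-based `strip_ansi` shown here is an assumption about how such a helper could look, not the implementation used in the test suite.

```python
import re

# Matches CSI escape sequences such as "\x1b[1m" (bold) or "\x1b[0m" (reset).
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;?]*[A-Za-z]")


def strip_ansi(text: str) -> str:
    """Remove ANSI CSI escape sequences from text."""
    return ANSI_ESCAPE.sub("", text)


raw = "\x1b[1m> Entering new chain...\x1b[0m\n\x1b[32mdone\x1b[0m\n"
cleaned = strip_ansi(raw)

# Exact comparison with a descriptive message, rather than a substring check.
expected = "> Entering new chain...\ndone\n"
assert cleaned == expected, f"unexpected file contents: {cleaned!r}"
```

An exact comparison fails loudly if stray control characters survive, whereas a substring check could pass with leftover escape codes around the matched text.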
**Issue:** Fixes #32150
**Dependencies:** None required - uses existing dependencies only.
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
This PR addresses the common issue where users struggle to pass custom
parameters to OpenAI-compatible APIs like LM Studio, vLLM, and others.
The problem occurs when users try to use `model_kwargs` for custom
parameters, which causes API errors.
## Problem
Users attempting to pass custom parameters (like LM Studio's `ttl`
parameter) were getting errors:
```python
from langchain_openai import ChatOpenAI

# ❌ This approach fails
llm = ChatOpenAI(
    base_url="http://localhost:1234/v1",
    model="mlx-community/QwQ-32B-4bit",
    model_kwargs={"ttl": 5},  # Causes TypeError: unexpected keyword argument 'ttl'
)
```
## Solution
The `extra_body` parameter is the correct way to pass custom parameters
to OpenAI-compatible APIs:
```python
# ✅ This approach works correctly
llm = ChatOpenAI(
    base_url="http://localhost:1234/v1",
    model="mlx-community/QwQ-32B-4bit",
    extra_body={"ttl": 5},  # Custom parameters go in extra_body
)
```
## Changes Made
1. **Enhanced Documentation**: Updated the `extra_body` parameter
docstring with comprehensive examples for LM Studio, vLLM, and other
providers
2. **Added Documentation Section**: Created a new "OpenAI-compatible
APIs" section in the main class docstring with practical examples
3. **Unit Tests**: Added tests to verify `extra_body` functionality
works correctly:
   - `test_extra_body_parameter()`: Verifies custom parameters are included
     in request payload
   - `test_extra_body_with_model_kwargs()`: Ensures `extra_body` and
     `model_kwargs` work together
4. **Clear Guidance**: Documented when to use `extra_body` vs
`model_kwargs`
## Examples Added
**LM Studio with TTL (auto-eviction):**
```python
ChatOpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio",
    model="mlx-community/QwQ-32B-4bit",
    extra_body={"ttl": 300},  # Auto-evict after 5 minutes
)
```
**vLLM with custom sampling:**
```python
ChatOpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",
    model="meta-llama/Llama-2-7b-chat-hf",
    extra_body={
        "use_beam_search": True,
        "best_of": 4,
    },
)
```
## Why This Works
- `model_kwargs` parameters are passed directly to the OpenAI client's
`create()` method, causing errors for non-standard parameters
- `extra_body` parameters are included in the HTTP request body, which
is exactly what OpenAI-compatible APIs expect for custom parameters
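The difference can be seen in a toy model of a strict client method. The `create` function below is a hypothetical stand-in written for this example, not the OpenAI client: it accepts only known keyword arguments and merges `extra_body` into the JSON body.

```python
import json
from typing import Any, Optional


def create(
    *,
    model: str,
    messages: list[dict[str, str]],
    extra_body: Optional[dict[str, Any]] = None,
) -> str:
    """Toy stand-in for a strict client method; returns the JSON request body."""
    body: dict[str, Any] = {"model": model, "messages": messages}
    # Extra fields are merged into the request body unchanged.
    body.update(extra_body or {})
    return json.dumps(body)


request = {"model": "local-model", "messages": [{"role": "user", "content": "hi"}]}

# Unknown keyword argument: rejected by the function signature.
try:
    create(**request, ttl=5)
    rejected = False
except TypeError:
    rejected = True

# Same parameter routed through extra_body: it reaches the request body.
payload = json.loads(create(**request, extra_body={"ttl": 5}))
```

The first call fails before any request is built, while the second delivers `ttl` to the server side, which decides what to do with it.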
Fixes #32115.
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Further clean up of namespace:
- Removed prompts (we'll re-add in a separate commit)
- Remove LocalFileStore until we can review whether all the
implementation details are necessary
- Remove message processing logic from memory (we'll figure out where to
expose it)
- Remove `Tool` primitive (should be sufficient to use `BaseTool` for
typing purposes)
- Remove utilities to create kv stores. Unclear if they've had much
usage outside MultiparentRetriever
This PR adds scaffolding for langchain 1.0 entry package.
Most contents have been removed.
Currently remaining entrypoints for:
* chat models
* embedding models
* memory -> trimming messages, filtering messages and counting tokens
[we may remove this]
* prompts -> we may remove some prompts
* storage: primarily to support cache backed embeddings, may remove the
kv store
* tools -> report tool primitives
Things to be added:
* Selected agent implementations
* Selected workflows
* Common primitives: messages, Document
* Primitives for type hinting: BaseChatModel, BaseEmbeddings
* Selected retrievers
* Selected text splitters
Things to be removed:
* Globals need to be removed (needs an update in langchain core)
Todos:
* TBD indexing api (requires sqlalchemy which we don't want as a
dependency)
* Be explicit about public/private interfaces (e.g., likely rename
chat_models.base.py to something more internal)
* Remove dockerfiles
* Update module doc-strings and README.md
The `_dereference_refs_helper` in `langchain_core.utils.json_schema`
incorrectly handled objects with a reference and other fields.
**Issue**: #32170
# Description
We change the check so that it accepts other keys in the object.
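To make the case concrete, here is a simplified sketch of dereferencing an object that has a `$ref` alongside other keys. It is not the `_dereference_refs_helper` implementation; the function names, the merge order (sibling keys override the referenced schema), and the lack of cycle handling are all assumptions of this example.

```python
from typing import Any


def resolve_pointer(root: dict[str, Any], ref: str) -> Any:
    """Look up a local reference such as '#/$defs/Name' in the root schema."""
    node: Any = root
    for part in ref.split("/")[1:]:  # skip the leading '#'
        node = node[part]
    return node


def dereference(node: Any, root: dict[str, Any]) -> Any:
    """Inline local $ref entries, keeping any sibling keys (no cycle handling)."""
    if isinstance(node, list):
        return [dereference(item, root) for item in node]
    if not isinstance(node, dict):
        return node
    if "$ref" in node:  # '$ref' no longer has to be the only key
        resolved = dereference(resolve_pointer(root, node["$ref"]), root)
        siblings = {k: dereference(v, root) for k, v in node.items() if k != "$ref"}
        # Assumption: sibling keys take precedence over the referenced schema.
        return {**resolved, **siblings}
    return {k: dereference(v, root) for k, v in node.items()}


schema = {
    "$defs": {"Name": {"type": "string", "minLength": 1}},
    "properties": {
        "name": {"$ref": "#/$defs/Name", "description": "Full name"},
    },
}
flat = dereference(schema, schema)
```

A check that only treats an object as a reference when `$ref` is its sole key would leave `properties.name` unresolved here; testing for the presence of `$ref` resolves it and keeps the `description`.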
## Summary
- Fixed redundant word "done" in SECURITY.md line 69
- Fixed grammar errors in Fireworks README.md line 77: "how it fares
compares" → "how it compares" and "in terms just" → "in terms of"
## Test plan
- [x] Verified changes improve readability and correct grammar
- [x] No functional changes, documentation only
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
Multiple models were
[retired](https://docs.anthropic.com/en/docs/about-claude/model-deprecations#model-status)
yesterday.
Tests remain broken until we figure out what to do with the legacy
Anthropic LLM integration. It currently uses their (legacy) text
completions API, for which there appear to be no remaining supported
models.
* Adding support for more Chroma client options (`HttpClient` and
`CloudClient`). This includes adding arguments necessary for
instantiating these clients.
* Adding support for Chroma's new persisted collection configuration (we
moved index configuration into this new construct).
* Delegate `Settings` configuration to Chroma's client constructors.
## Problem
When using `ChatOllama` with `create_react_agent`, agents would
sometimes terminate prematurely with empty responses when Ollama
returned `done_reason: 'load'` responses with no content. This caused
agents to return empty `AIMessage` objects instead of actual generated
text.
```python
from langchain_ollama import ChatOllama
from langgraph.prebuilt import create_react_agent
from langchain_core.messages import HumanMessage
llm = ChatOllama(model='qwen2.5:7b', temperature=0)
agent = create_react_agent(model=llm, tools=[])
result = agent.invoke({"messages": [HumanMessage("Hello")]}, {"configurable": {"thread_id": "1"}})
# Before fix: AIMessage(content='', response_metadata={'done_reason': 'load'})
# Expected: AIMessage with actual generated content
```
## Root Cause
The `_iterate_over_stream` and `_aiterate_over_stream` methods treated
any response with `done: True` as final, regardless of `done_reason`.
When Ollama returns `done_reason: 'load'` with empty content, it
indicates the model was loaded but no actual generation occurred - this
should not be considered a complete response.
## Solution
Modified the streaming logic to skip responses when:
- `done: True`
- `done_reason: 'load'`
- Content is empty or contains only whitespace
This ensures agents only receive actual generated content while
preserving backward compatibility for load responses that do contain
content.
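The skip condition above can be sketched as a small predicate. This is only an illustration: it assumes each stream chunk is a plain dict with `done`, `done_reason`, and a nested `message.content`, and the helper names `is_empty_load_response` and `filter_stream` are hypothetical, not the code inside `_iterate_over_stream`.

```python
from typing import Any, Dict, Iterable, Iterator


def is_empty_load_response(chunk: Dict[str, Any]) -> bool:
    """Return True for a 'load' chunk that carries no generated text."""
    # Assumed chunk shape: {"done": ..., "done_reason": ..., "message": {"content": ...}}
    content = (chunk.get("message") or {}).get("content") or ""
    return (
        chunk.get("done") is True
        and chunk.get("done_reason") == "load"
        and not content.strip()
    )


def filter_stream(chunks: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield every chunk except empty load responses."""
    for chunk in chunks:
        if is_empty_load_response(chunk):
            continue  # model was loaded, but nothing was generated
        yield chunk
```

Load responses that do carry text pass through unchanged, which matches the backward-compatibility note above.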
## Changes
- **`_iterate_over_stream`**: Skip empty load responses instead of
yielding them
- **`_aiterate_over_stream`**: Apply same fix to async streaming
- **Tests**: Added comprehensive test cases covering all edge cases
## Testing
All scenarios now work correctly:
- ✅ Empty load responses are skipped (fixes original issue)
- ✅ Load responses with actual content are preserved (backward
compatibility)
- ✅ Normal stop responses work unchanged
- ✅ Streaming behavior preserved
- ✅ `create_react_agent` integration fixed
Fixes #31482.
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
## **Description:**
This PR updates the internal documentation link for the RAG tutorials to
reflect the updated path. Previously, the link pointed to the root
`/docs/tutorials/`, which was generic. It now correctly routes to the
RAG-specific tutorial page for the following text-embedding models.
1. DatabricksEmbeddings
2. IBM watsonx.ai
3. OpenAIEmbeddings
4. NomicEmbeddings
5. CohereEmbeddings
6. MistralAIEmbeddings
7. FireworksEmbeddings
8. TogetherEmbeddings
9. LindormAIEmbeddings
10. ModelScopeEmbeddings
11. ClovaXEmbeddings
12. NetmindEmbeddings
13. SambaNovaCloudEmbeddings
14. SambaStudioEmbeddings
15. ZhipuAIEmbeddings
## **Issue:** N/A
## **Dependencies:** None
## **Twitter handle:** N/A
This PR addresses deprecation warnings users encounter when using
LangChain tools with Pydantic v2:
```
PydanticDeprecatedSince20: The `schema` method is deprecated; use `model_json_schema` instead.
Deprecated in Pydantic V2.0 to be removed in V3.0.
```
## Root Cause
Several LangChain components were still using the deprecated `.schema()`
method directly instead of the Pydantic v1/v2 compatible approach. While
users calling `.schema()` on returned models will still see warnings
(which is correct), LangChain's internal code should not generate these
warnings.
## Changes Made
Updated 3 files to use the standard compatibility pattern:
```python
# Before (deprecated)
schema = model.schema()
# After (compatible with both v1 and v2)
if hasattr(model, "model_json_schema"):
    schema = model.model_json_schema()  # Pydantic v2
else:
    schema = model.schema()  # Pydantic v1
```
### Files Updated:
- **`evaluation/parsing/json_schema.py`**: Fixed `_parse_json()` method
to handle Pydantic models correctly
- **`output_parsers/yaml.py`**: Fixed `get_format_instructions()` to use
compatible schema access
- **`chains/openai_functions/citation_fuzzy_match.py`**: Fixed direct
`.schema()` call on QuestionAnswer model
## Verification
✅ **Zero breaking changes** - all existing functionality preserved
✅ **No deprecation warnings** from LangChain internal code
✅ **Backward compatible** with Pydantic v1
✅ **Forward compatible** with Pydantic v2
✅ **Edge cases handled** (strings, plain objects, etc.)
## User Impact
LangChain users will no longer see deprecation warnings from internal
LangChain code. Users who directly call `.schema()` on schemas returned
by LangChain should adopt the same compatibility pattern:
```python
# User code should use this pattern
input_schema = tool.get_input_schema()
if hasattr(input_schema, "model_json_schema"):
    schema_result = input_schema.model_json_schema()
else:
    schema_result = input_schema.schema()
```
Fixes #31458.
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
This PR fixes the PostgreSQL NUL byte issue that causes
`psycopg.DataError` when inserting documents containing `\x00` bytes
into PostgreSQL-based vector stores.
## Problem
PostgreSQL text fields cannot contain NUL (0x00) bytes. When documents
with such characters are processed by PGVector or langchain-postgres
implementations, they fail with:
```
(psycopg.DataError) PostgreSQL text fields cannot contain NUL (0x00) bytes
```
This commonly occurs when processing PDFs, documents from various
loaders, or text extracted by libraries like unstructured that may
contain embedded NUL bytes.
## Solution
Added `sanitize_for_postgres()` utility function to
`langchain_core.utils.strings` that removes or replaces NUL bytes from
text content.
### Key Features
- **Simple API**: `sanitize_for_postgres(text, replacement="")`
- **Configurable**: Replace NUL bytes with empty string (default) or
space for readability
- **Comprehensive**: Handles all problematic examples from the original
issue
- **Well-tested**: Complete unit tests with real-world examples
- **Backward compatible**: No breaking changes, purely additive
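A function with this API can be very small. The sketch below illustrates the described behavior under the stated signature; it is not necessarily the code merged into `langchain_core/utils/strings.py`.

```python
def sanitize_for_postgres(text: str, replacement: str = "") -> str:
    """Replace NUL (0x00) characters, which PostgreSQL text fields reject.

    Illustrative sketch only: every NUL character is swapped for
    `replacement` (empty string by default, i.e. removal).
    """
    return text.replace("\x00", replacement)
```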
### Usage Example
```python
from langchain_core.utils import sanitize_for_postgres
from langchain_core.documents import Document
# Before: This would fail with DataError
problematic_content = "Getting\x00Started with embeddings"
# After: Clean the content before database insertion
clean_content = sanitize_for_postgres(problematic_content)
# Result: "GettingStarted with embeddings"
# Or preserve readability with spaces
readable_content = sanitize_for_postgres(problematic_content, " ")
# Result: "Getting Started with embeddings"
# Use in Document processing
doc = Document(page_content=clean_content, metadata={...})
```
### Integration Pattern
PostgreSQL vector store implementations should sanitize content before
insertion:
```python
def add_documents(self, documents: List[Document]) -> List[str]:
    # Sanitize documents before insertion
    sanitized_docs = []
    for doc in documents:
        sanitized_content = sanitize_for_postgres(doc.page_content, " ")
        sanitized_doc = Document(
            page_content=sanitized_content,
            metadata=doc.metadata,
            id=doc.id
        )
        sanitized_docs.append(sanitized_doc)
    return self._insert_documents_to_db(sanitized_docs)
```
## Changes Made
- Added `sanitize_for_postgres()` function in
`langchain_core/utils/strings.py`
- Updated `langchain_core/utils/__init__.py` to export the new function
- Added comprehensive unit tests in
`tests/unit_tests/utils/test_strings.py`
- Validated against all examples from the original issue report
## Testing
All tests pass, including:
- Basic NUL byte removal and replacement
- Multiple consecutive NUL bytes
- Empty string handling
- Real examples from the GitHub issue
- Backward compatibility with existing string utilities
This utility enables PostgreSQL integrations in both langchain-community
and langchain-postgres packages to handle documents with NUL bytes
reliably.
Fixes #26033.
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
The vectorstore feature table in the documentation was showing incorrect
information for the "IDs in add Documents" capability. Most vectorstores
were marked as ❌ (not supported) when they actually support extracting
IDs from documents.
## Problem
The issue was an inconsistency between two sources of truth:
- **JavaScript feature table** (`docs/src/theme/FeatureTables.js`):
Hardcoded `idsInAddDocuments: false` for most vectorstores
- **Python script** (`docs/scripts/vectorstore_feat_table.py`):
Correctly showed `"IDs in add Documents": True` for most vectorstores
## Root Cause
All vectorstores inherit the base `VectorStore.add_documents()` method
which automatically extracts document IDs:
```python
# From libs/core/langchain_core/vectorstores/base.py lines 277-284
if "ids" not in kwargs:
    ids = [doc.id for doc in documents]
    # If there's at least one valid ID, we'll assume that IDs should be used.
    if any(ids):
        kwargs["ids"] = ids
```
Since no vectorstores override `add_documents()`, they all inherit this
behavior and support IDs in documents.
## Solution
Updated `idsInAddDocuments` from `false` to `true` for 13 vectorstores:
- AstraDBVectorStore, Chroma, Clickhouse, DatabricksVectorSearch
- ElasticsearchStore, FAISS, InMemoryVectorStore,
MongoDBAtlasVectorSearch
- PGVector, PineconeVectorStore, Redis, Weaviate, SQLServer
The other 4 vectorstores (CouchbaseSearchVectorStore, Milvus, openGauss,
QdrantVectorStore) were already correctly marked as `true`.
## Impact
Users visiting
https://python.langchain.com/docs/integrations/vectorstores/ will now
see accurate information. The "IDs in add Documents" column will
correctly show ✅ for all vectorstores instead of incorrectly showing ❌
for most of them.
This aligns with the API documentation which states: "if kwargs contains
ids and documents contain ids, the ids in the kwargs will receive
precedence" - clearly indicating that document IDs are supported.
Fixes #30622.
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com>
See https://docs.astral.sh/ruff/rules/#tryceratops-try
* TRY004 (use `TypeError`) is suppressed with `noqa` in the main code so
as not to break backward compatibility. The rule is still useful for new
code.
* TRY301 is ignored for now. It is quite hard to fix, and I'm not sure
enabling it is worthwhile.
Co-authored-by: Mason Daugherty <mason@langchain.dev>
* **Description:** Updated `parse_result` logic to handle cases where
`self.first_tool_only` is `True` and multiple matching keys share the
same function name. Instead of returning the first match prematurely,
the method now prioritizes filtering results by the specified key to
ensure correct selection.
* **Issue:** #32100
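The "filter first, then pick" ordering can be sketched as follows. This is a hypothetical illustration, not the parser's actual code: it assumes each parsed tool call is a dict whose `type` field holds the function name, and `select_first_matching` is an invented helper.

```python
from typing import Any, Dict, List, Optional


def select_first_matching(
    tool_calls: List[Dict[str, Any]], key_name: str
) -> Optional[Dict[str, Any]]:
    """Filter by tool name first, then return the first match (or None)."""
    # Assumed shape: {"type": <function name>, "args": {...}}
    matches = [call for call in tool_calls if call.get("type") == key_name]
    return matches[0] if matches else None
```

Taking the first element before filtering would return nothing whenever the first tool call belongs to a different function, which is the premature-return behavior the description refers to.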
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
**Description:**
This PR makes argument parsing for Ollama tool calls more robust. Some
models, including those served through Ollama, may return arguments as
Python-style dictionaries with single quotes (e.g., `{'a': 1}`), which
are not valid JSON and previously caused parsing to fail.
The updated `_parse_json_string` method in
`langchain_ollama.chat_models` now attempts standard JSON parsing and,
if that fails, falls back to `ast.literal_eval` for safe evaluation of
Python-style dictionaries. This improves interoperability with LLMs and
fixes a common usability issue for tool-based agents.
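The two-step fallback can be sketched as below. This is a minimal illustration of the technique; the function name `parse_tool_arguments` and the error message are assumptions and do not mirror `_parse_json_string` exactly.

```python
import ast
import json
from typing import Any


def parse_tool_arguments(raw: str) -> Any:
    """Parse as JSON first; fall back to a Python literal (e.g. single quotes)."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    try:
        # literal_eval only accepts literals, so no arbitrary code is executed.
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError) as exc:
        raise ValueError(f"Could not parse tool arguments: {raw!r}") from exc
```

Trying JSON first keeps the common path unchanged, and `ast.literal_eval` is used instead of `eval` because it refuses anything that is not a plain literal.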
**Issue:**
Closes #30910
**Dependencies:**
None
**Tests:**
- Added new unit tests for double-quoted JSON, single-quoted dicts,
mixed quoting, and malformed/failure cases.
- All tests pass locally, including new coverage for single-quoted
inputs.
**Notes:**
- No breaking changes.
- No new dependencies introduced.
- Code is formatted and linted (`ruff format`, `ruff check`).
- If maintainers have suggestions for further improvements, I’m happy to
revise!
Thank you for maintaining LangChain! Looking forward to your feedback.
Stricter JSON schema validation broke a test. Test was fixed in
https://github.com/langchain-ai/langchain/pull/32145. Core release runs
old tests (i.e., last released version of langchain-anthropic) against
new core. So we bypass anthropic for release. Will revert after.
Previously, we hit an index out of range error with empty variable names
(accessing `tag[0]`); now we throw a slightly nicer error.
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
## **Description:**
This PR updates the `link` values for the following integration metadata
entries:
1. **VertexAILLM**
- Changed from: `google_vertexai`
- To: `google_vertex_ai_palm`
2. **NVIDIA**
- Changed from: `NVIDIA`
- To: `nvidia_ai_endpoints`
These changes ensure that the documentation links correspond to the
correct integration paths, improving documentation navigation and
consistency with the integration structure.
## **Issue:** N/A
## **Dependencies:** None
## **Twitter handle:** N/A
Co-authored-by: Mason Daugherty <mason@langchain.dev>
- **Description:** This PR updates the `package` field for the VertexAI
integration in the documentation metadata. The original value was
`langchain-google_vertexai`, which has been corrected to
`langchain-google-vertexai` to reflect the actual package name used in
PyPI and LangChain integrations.
- **Issue:** N/A
- **Dependencies:** None
- **Twitter handle:** N/A
Fixes #32042
## Summary
Fixes a critical bug in JSON Schema reference resolution that prevented
correctly dereferencing numeric components in JSON pointer paths,
specifically for list indices in `anyOf`, `oneOf`, and `allOf` arrays.
## Changes
- Fixed `_retrieve_ref` function in
`libs/core/langchain_core/utils/json_schema.py` to properly handle
numeric components
- Added comprehensive test function `test_dereference_refs_list_index()`
in `libs/core/tests/unit_tests/utils/test_json_schema.py`
- Resolved line length formatting issues
- Improved type checking and index validation for list and dictionary
references
## Key Improvements
- Correctly handles list index references in JSON pointer paths
- Maintains backward compatibility with existing dictionary numeric key
functionality
- Adds robust error handling for out-of-bounds and invalid indices
- Passes all test cases covering various reference scenarios
## Test Coverage
- Verified fix for `#/properties/payload/anyOf/1/properties/startDate`
reference
- Tested edge cases including out-of-bounds and negative indices
- Ensured no regression in existing reference resolution functionality
Resolves the reported issue with JSON Schema reference dereferencing for
list indices.
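To illustrate the kind of traversal involved, here is a minimal sketch of resolving a local reference whose path mixes dictionary keys and list indices. It is not the actual `_retrieve_ref` implementation: `resolve_pointer` is a hypothetical helper, it raises `KeyError` for every failure, and it ignores JSON pointer escapes (`~0`, `~1`).

```python
from typing import Any, Dict


def resolve_pointer(schema: Dict[str, Any], ref: str) -> Any:
    """Walk a local '#/a/b/1' reference through dicts and lists."""
    if not ref.startswith("#"):
        raise ValueError(f"Only local references are supported: {ref!r}")
    node: Any = schema
    for part in ref.split("/")[1:]:
        if isinstance(node, list):
            # Inside a list, the component must be a non-negative index.
            if not part.isdigit():
                raise KeyError(f"Expected a list index, got {part!r}")
            index = int(part)
            if index >= len(node):
                raise KeyError(f"Index {index} out of bounds in {ref!r}")
            node = node[index]
        elif isinstance(node, dict):
            # Inside a dict, numeric components are still treated as keys.
            if part in node:
                node = node[part]
            elif part.isdigit() and int(part) in node:
                node = node[int(part)]
            else:
                raise KeyError(f"{part!r} not found while resolving {ref!r}")
        else:
            raise KeyError(f"Cannot descend into {type(node).__name__}")
    return node
```

Branching on the type of the current node, rather than on whether the component looks numeric, is what lets `anyOf/1` index a list while a key such as `"400"` still resolves in a dictionary.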
---------
Co-authored-by: open-swe-dev[bot] <open-swe-dev@users.noreply.github.com>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Since #29963 BaseCache and Callbacks are imported in BaseLanguageModel
so there's no need to import them and rebuild the models.
Note: fix is available since `langchain-core==0.3.39` and the current
langchain dependency on core is `>=0.3.66` so the fix will always be
there.
- **Description:** Corrected the `link` path in the Google Gemini
integration entry from
`/docs/integrations/text_embedding/google-generative-ai` to
`/docs/integrations/text_embedding/google_generative_ai` to align with
actual directory structure and prevent broken documentation links.
- **Issue:** N/A
- **Dependencies:** None
- **Twitter handle:** N/A
The `num_gpu` parameter in `OllamaEmbeddings` was not being passed to
the Ollama client in the async embedding method, causing GPU
acceleration settings to be ignored when using async operations.
## Problem
The issue was in the `aembed_documents` method where the `options`
parameter (containing `num_gpu` and other configuration) was missing:
```python
# Sync method (working correctly)
return self._client.embed(
    self.model, texts, options=self._default_params, keep_alive=self.keep_alive
)["embeddings"]
# Async method (missing options parameter)
return (
    await self._async_client.embed(
        self.model, texts, keep_alive=self.keep_alive  # ❌ No options!
    )
)["embeddings"]
```
This meant that when users specified `num_gpu=4` (or any other GPU
configuration), it would work with sync calls but be ignored with async
calls.
## Solution
Added the missing `options=self._default_params` parameter to the async
embed call to match the sync version:
```python
# Fixed async method
return (
    await self._async_client.embed(
        self.model,
        texts,
        options=self._default_params,  # ✅ Now includes num_gpu!
        keep_alive=self.keep_alive,
    )
)["embeddings"]
```
## Validation
- ✅ Added unit test to verify options are correctly passed in both sync
and async methods
- ✅ All existing tests continue to pass
- ✅ Manual testing confirms `num_gpu` parameter now works correctly
- ✅ Code passes linting and formatting checks
The fix ensures that GPU configuration works consistently across both
synchronous and asynchronous embedding operations.
Fixes #32059.
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Description
The Perplexity chat model already returns a search_results field, but
LangChain dropped it when mapping Perplexity responses to
additional_kwargs.
This patch adds "search_results" to the allowed attribute lists in both
_stream and _generate, so downstream code can access it just like
images, citations, or related_questions.
Dependencies
None. The change is purely internal; no new imports or optional
dependencies required.
https://community.perplexity.ai/t/new-feature-search-results-field-with-richer-metadata/398
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
## Description
When ChatDeepSeek invokes a tool that returns a list, it results in an
openai.UnprocessableEntityError due to a failure in deserializing the
JSON body.
The root of the problem is that ChatDeepSeek uses BaseChatOpenAI
internally, but the APIs are not identical: OpenAI v1/chat/completions
accepts arrays as tool results, but Deepseek API does not.
As a solution, I added a `_get_request_payload` method to ChatDeepSeek,
which inherits the behavior from BaseChatOpenAI but adds a step to
stringify tool message content when the content is an array. I also
added a unit test for this.
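The stringification step might look roughly like the sketch below. It is an illustration only: it assumes the request payload holds OpenAI-style message dicts with `role` and `content` keys, and `stringify_tool_content` is a hypothetical helper rather than the body of `_get_request_payload`.

```python
import json
from typing import Any, Dict, List


def stringify_tool_content(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the messages with list-valued tool content JSON-encoded."""
    result = []
    for message in messages:
        if message.get("role") == "tool" and isinstance(message.get("content"), list):
            # Copy instead of mutating so the caller's messages stay untouched.
            message = {**message, "content": json.dumps(message["content"])}
        result.append(message)
    return result
```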
The linked issue contains the full reproducible example provided by the
reporter. After these changes it works as expected.
Source: [Deepseek
docs](https://api-docs.deepseek.com/api/create-chat-completion/)

Source: [OpenAI
docs](https://platform.openai.com/docs/api-reference/chat/create)

## Issue
Fixes #31394
## Dependencies:
No new dependencies.
## Twitter handle:
Don't have one.
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
- **Description:** Ensure that the tool description is an empty string
when creating a Structured Tool from a Pydantic class in case no
description is provided
- **Issue:** Fixes #31606
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
- **Description**: issues a warning if `inf` or `nan` values are passed
as inputs to `langchain_core.vectorstores.utils._cosine_similarity`
- **Issue**: Fixes #31496
- **Dependencies**: no external dependencies added; only the `warnings`
module is imported
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Before jumping into the technical implementation, I added context for
the linearization-config param and explained what linearization means
here.
I also linked an AWS blog post for more advanced use cases, as this
single example doesn't cover all of them.
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
**In this PR I am doing two things:**
1. Adding titles to the 4 examples we have, so the reader can capture
the essence of each paragraph quickly
2. Replacing 'samples' with 'examples' for clarity
**Why could 'examples' be better terminology than 'samples' here?**
1. On the page, we were using 'samples' and 'examples' interchangeably,
which led to confusion; now 'examples' are the use cases, while
'samples' are the sample data being used
2. This is consistent with the rest of the docs, where we typically use
'examples' for examples; see, for instance,
https://python.langchain.com/docs/integrations/callbacks/fiddler/
## Description
Currently when deserializing objects that contain non-deserializable
values, we throw an error. However, there are cases (e.g. proxies that
return response fields containing extra fields like Python datetimes),
where these values are not important and we just want to drop them.
Twitter handle: @hacubu
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Trying to unblock documentation build pipeline
* Bump langgraph dep in docs
* Update langgraph in lock file (resolves an issue in API reference
generation)
I am modifying two things:
1. Replacing "This sample demonstrates" with "The following samples
demonstrate", as we're talking about at least 4 samples
2. Moving the sentence to after the definition of Textract to keep the
document organized (Textract definition, then samples)
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
**PR title**:
add deprecation notice for PipelinePromptTemplate
**PR message**:
In the API documentation, PipelinePromptTemplate is marked as
deprecated, but this is not mentioned in the docs.
I'm submitting this PR to add a deprecation notice to the docs.
**Tests**:
N/A (documentation only)
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
This PR changes the return type hints of the `format_prompt` and
`aformat_prompt` methods in `BaseChatPromptTemplate` from `PromptValue`
to `ChatPromptValue`, since both methods always return a
`ChatPromptValue`.
**Description:**
Added an explicit validation step in
`langchain_core.vectorstores.utils._cosine_similarity` to raise a
`ValueError` if the input query or any embedding contains `NaN` values.
This prevents silent failures or unstable behavior during similarity
calculations, especially when using maximal_marginal_relevance.
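A validation step of this kind could be sketched as follows. The real function operates on NumPy arrays; this illustration uses plain Python sequences and a hypothetical helper name, `check_no_nan`, so that it stays dependency-free.

```python
import math
from typing import Sequence


def check_no_nan(vectors: Sequence[Sequence[float]], name: str) -> None:
    """Raise ValueError if any vector contains a NaN value."""
    for row, vector in enumerate(vectors):
        if any(math.isnan(value) for value in vector):
            raise ValueError(f"{name} contains NaN at row {row}")
```

Failing early with an explicit error makes the source of a bad embedding visible instead of letting NaN propagate into the similarity scores.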
**Issue**:
Fixes #31806
**Dependencies:**
None
---------
Co-authored-by: Azhagammal S C <azhagammal@kofluence.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, core, etc. is being
modified. Use "docs: ..." for purely docs changes, "infra: ..." for CI
changes.
- Example: "core: add foobar LLM"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, eyurtsev, ccurme, vbarda, hwchase17.
* update model validation due to change in [Ollama
client](https://github.com/ollama/ollama) - ensure you are running the
latest version (0.9.6) to use `validate_model_on_init`
* add code example and fix formatting for ChatOllama reasoning
* ensure that setting `reasoning` in invocation kwargs overrides
class-level setting
* tests
Harden the default implementation of the XML parser for the agent
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
**Description:**
I traced the kwargs starting at `.invoke()` and it was not clear where
they go; their destination only became clear two layers down. I updated
the documentation to make this clearer for the next person.
**Issue:**
No related issue.
**Dependencies:**
No dependency changes.
**Twitter handle:**
Nah. We're good.
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
This PR updates the doc on Hugging Face's inference offering from
'inference API' to 'inference providers'
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
* New `reasoning` (bool) param to support toggling [Ollama
thinking](https://ollama.com/blog/thinking) (#31573, #31700). If
`reasoning=True`, Ollama's `thinking` content will be placed in the
model responses' `additional_kwargs.reasoning_content`.
* Supported by:
* ChatOllama (class level, invocation level TODO)
* OllamaLLM (TODO)
* Added tests to ensure streaming tool calls are successful (#29129)
* Refactored tests that relied on `extract_reasoning()`
* Myriad docs additions and consistency/typo fixes
* Improved type safety in some spots
Closes #29129
Addresses #31573 and #31700
Supersedes #31701
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Integrate Bandit for security analysis, suppress warnings for specific issues, and address potential vulnerabilities such as hardcoded passwords and SQL injection risks. Adjust documentation and formatting for clarity.
* Ensure access to the local model during `ChatOllama` instantiation
(#27720). This adds a new param `validate_model_on_init` (default:
`True`)
* Catch a few more errors from the Ollama client to assist users
## Summary
- Removes the `xslt_path` parameter from HTMLSectionSplitter to
eliminate XXE attack vector
- Hardens XML/HTML parsers with secure configurations to prevent XXE
attacks
- Adds comprehensive security tests to ensure the vulnerability is fixed
## Context
This PR addresses a critical XXE vulnerability discovered in the
HTMLSectionSplitter component. The vulnerability allowed attackers to:
- Read sensitive local files (SSH keys, passwords, configuration files)
- Perform Server-Side Request Forgery (SSRF) attacks
- Exfiltrate data to attacker-controlled servers
## Changes Made
1. **Removed `xslt_path` parameter** - This eliminates the primary
attack vector where users could supply malicious XSLT files
2. **Hardened XML parsers** - Added security configurations to prevent
XXE attacks even with the default XSLT:
- `no_network=True` - Blocks network access
- `resolve_entities=False` - Prevents entity expansion
- `load_dtd=False` - Disables DTD processing
- `XSLTAccessControl.DENY_ALL` - Blocks all file/network I/O in XSLT
transformations
3. **Added security tests** - New test file `test_html_security.py` with
comprehensive tests for various XXE attack vectors
4. **Updated existing tests** - Modified tests that were using the
removed `xslt_path` parameter
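The hardening settings listed under "Changes Made" can be sketched with lxml as follows. This is an illustrative sketch, not the code from this PR: the helper names `make_hardened_parser` and `make_hardened_transform` and the sample payload are assumptions, and it requires `lxml` to be installed.

```python
from lxml import etree

# A classic XXE payload: the entity tries to pull a local file into the document.
XXE_PAYLOAD = b"""<?xml version="1.0"?>
<!DOCTYPE data [<!ENTITY secret SYSTEM "file:///etc/passwd">]>
<data>&secret;</data>"""


def make_hardened_parser() -> etree.XMLParser:
    """Build an XML parser with the three parser settings listed above."""
    return etree.XMLParser(
        no_network=True,         # block network access
        resolve_entities=False,  # leave entity references unexpanded
        load_dtd=False,          # do not load external DTDs
    )


def make_hardened_transform(xslt_source: bytes) -> etree.XSLT:
    """Compile a stylesheet that may not perform any file or network I/O."""
    stylesheet = etree.fromstring(xslt_source, parser=make_hardened_parser())
    return etree.XSLT(stylesheet, access_control=etree.XSLTAccessControl.DENY_ALL)


# Parsing the payload succeeds, but the entity is never replaced by file contents.
root = etree.fromstring(XXE_PAYLOAD, parser=make_hardened_parser())
serialized = etree.tostring(root).decode()
```

With these settings the external entity stays an unresolved reference, so nothing from the local file system ends up in `serialized`.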
## Test Plan
- [x] All existing tests pass
- [x] New security tests verify XXE attacks are blocked
- [x] Code passes linting and formatting checks
- [x] Tested with both old and new versions of lxml
Twitter handle: @_colemurray
Recommend using context manager for FileCallbackHandler to avoid opening
too many file descriptors
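Why a context manager helps can be sketched with a small stand-in class. `FileLogHandler` below is hypothetical and only mimics a handler that owns an open file; it is not the `FileCallbackHandler` implementation.

```python
import os
import tempfile


class FileLogHandler:
    """Hypothetical stand-in for a handler that writes events to a file."""

    def __init__(self, filename: str, mode: str = "a") -> None:
        self.file = open(filename, mode, encoding="utf-8")

    def write(self, text: str) -> None:
        self.file.write(text + "\n")

    def close(self) -> None:
        if not self.file.closed:
            self.file.close()

    def __enter__(self) -> "FileLogHandler":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Always release the descriptor, even if the body raised.
        self.close()


log_path = os.path.join(tempfile.mkdtemp(), "run.log")
with FileLogHandler(log_path) as handler:
    handler.write("chain started")
# The file descriptor has been released at this point.
```

Without the `with` block, each handler instance keeps its descriptor open until it is closed explicitly or garbage collected, which is how long-running processes can run out of descriptors.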
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
- There was some ambiguous wording that has been updated to hopefully
clarify the functionality of `reasoning_format` in ChatGroq.
- Added support for `reasoning_effort`
- Added links to see models capable of `reasoning_format` and
`reasoning_effort`
- Other minor nits
- docs: for the Ollama notebooks, improve the specificity of some links,
add `homebrew` install info, update some wording
- tests: halve the number of local models needed to run the tests, from
4 to 2 (shedding 8 GB of required installs)
- bump deps (non-breaking) in anticipation of upcoming "thinking" PR
Add additional hashing options to the indexing API, warn on SHA-1
Requires:
- Bumping langchain-core version
- bumping min langchain-core in langchain
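A minimal sketch of what selectable hashing with a SHA-1 warning could look like. The function name `hash_text`, the `SUPPORTED_ALGORITHMS` tuple, and the warning text are assumptions for illustration, not the indexing API's actual interface.

```python
import hashlib
import warnings

# Assumed set of algorithm names for this sketch.
SUPPORTED_ALGORITHMS = ("sha1", "sha256", "sha512", "blake2b")


def hash_text(text: str, algorithm: str = "sha256") -> str:
    """Return the hex digest of `text`, warning when SHA-1 is requested."""
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"Unsupported hashing algorithm: {algorithm!r}")
    if algorithm == "sha1":
        # SHA-1 is not collision-resistant, so callers are nudged elsewhere.
        warnings.warn(
            "SHA-1 is not collision-resistant; consider 'sha256' or stronger.",
            UserWarning,
            stacklevel=2,
        )
    return hashlib.new(algorithm, text.encode("utf-8")).hexdigest()
```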
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
**Description:** Updates ChatPerplexity documentation to replace
deprecated llama 3 model reference with the current sonar model in the
API key example code block.
**Issue:** N/A (maintenance update for deprecated model)
**Dependencies:** No new dependencies required
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
`Runnable`'s `Input` is contravariant so we need to enumerate all
possible inputs and it's not possible to put them in a `Union`.
Also, it's better to require only a runnable that
accepts `list[BaseMessage]` instead of the broader `Sequence[BaseMessage]`,
as internally the runnable is only called with a list.
As part of core releases we run tests on the last released version of
some packages (including langchain-openai) using the new version of
langchain-core. We run langchain-openai's test suite as it was when it
was last released.
Our test for computer use started raising a 500 error at some point during
the day today (test passed as part of scheduled test job in the
morning):
> InternalServerError: Error code: 500 - {'error': {'message': 'An error
occurred while processing your request. You can retry your request, or
contact us through our help center at help.openai.com if the error
persists.
Will revert this change after we release langchain-core.
**Description:**
Previously, when transitioning from a deeper Markdown header (e.g., ###)
to a shallower one (e.g., ##), the
ExperimentalMarkdownSyntaxTextSplitter retained the deeper header in the
metadata.
This commit updates the `_resolve_header_stack` method to remove headers
at the same or deeper levels before appending the current header. As a
result, each chunk now reflects only the active header context.
Fixes unexpected metadata leakage across sections in nested Markdown
documents.
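The header-stack rule described above can be sketched in a few lines. The standalone function and the `(depth, text)` tuple representation are assumptions made for illustration; the splitter's real `_resolve_header_stack` method may be structured differently.

```python
def resolve_header_stack(
    stack: list[tuple[int, str]], depth: int, text: str
) -> list[tuple[int, str]]:
    """Drop headers at the same or deeper level, then push the new header."""
    kept = [(d, t) for d, t in stack if d < depth]
    kept.append((depth, text))
    return kept


stack: list[tuple[int, str]] = []
for depth, text in [(1, "Guide"), (2, "Setup"), (3, "Linux"), (2, "Usage")]:
    stack = resolve_header_stack(stack, depth, text)

# After moving from "### Linux" back to "## Usage", only the active
# context remains: [(1, "Guide"), (2, "Usage")]
```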
Additionally, test cases have been updated to:
- Validate correct header resolution and metadata assignment.
- Cover edge cases with nested headers and horizontal rules.
**Issue:**
Fixes [#31596](https://github.com/langchain-ai/langchain/issues/31596)
**Dependencies:**
None
**Twitter handle:** -> [_RaghuKapur](https://twitter.com/_RaghuKapur)
**LinkedIn:** ->
[https://www.linkedin.com/in/raghukapur/](https://www.linkedin.com/in/raghukapur/)
## Description
- When converting MistralAI chunk dicts to LangChain `AIMessageChunk`
schemas via the `_convert_chunk_to_message_chunk` utility function, the
`finish_reason` was not being included in `response_metadata` as it is
for other providers.
- This PR adds a one-liner fix to include the finish reason.
- fixes: https://github.com/langchain-ai/langchain/issues/31666
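As a rough illustration of the fix, the sketch below copies `finish_reason` from a streamed chunk into the metadata. The helper `chunk_to_message_fields` and the chunk layout (`choices[0].delta.content`, `choices[0].finish_reason`) are assumptions for this example, not the integration's actual code.

```python
from typing import Any


def chunk_to_message_fields(chunk: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: extract content and metadata from one chunk."""
    choice = chunk["choices"][0]
    response_metadata: dict[str, Any] = {}
    # Propagate the finish reason only when the chunk actually carries one.
    finish_reason = choice.get("finish_reason")
    if finish_reason is not None:
        response_metadata["finish_reason"] = finish_reason
    return {
        "content": choice.get("delta", {}).get("content") or "",
        "response_metadata": response_metadata,
    }
```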
* Simplified Pydantic handling since Pydantic v1 is not supported
anymore.
* Replace use of deprecated v1 methods with the corresponding v2 methods.
* Remove use of other deprecated methods.
* Activate mypy errors on deprecated methods use.
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
**Description:**
This PR fixes an `IndexError` that occurs when `LLMListwiseRerank` is
called with an empty list of documents.
Earlier, the code assumed the presence of at least one document and
attempted to construct the context string based on `len(documents) - 1`,
which raises an error when documents is an empty list.
The fix works with gpt-4o-mini when the list is empty, but fails
occasionally with gpt-3.5-turbo. For an empty list, setting the context
string to "empty list" seems to produce the expected response.
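A minimal sketch of the empty-list guard, assuming a context builder of roughly this shape. `build_context` and the exact prompt wording are hypothetical and do not reproduce the `LLMListwiseRerank` source.

```python
def build_context(documents: list[str]) -> str:
    """Hypothetical prompt-context builder for a listwise reranker."""
    if not documents:
        # Guard: with no documents, `len(documents) - 1` would be -1 and the
        # summary line below would be meaningless.
        return "Documents = [] (empty list)"
    parts = [f"Document ID: {i}\n{doc}" for i, doc in enumerate(documents)]
    parts.append(
        f"Documents = [Document ID: 0, ..., Document ID: {len(documents) - 1}]"
    )
    return "\n\n".join(parts)
```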
**Issue:** #31192
Description:
This pull request corrects minor spelling mistakes in the comments
within the `chat_models.py` file of the MistralAI partner integration.
Specifically, it fixes the spelling of "equivalent" and "compatibility"
in two separate comments. These changes improve code readability and
maintain professional documentation standards. No functional code
changes are included.
`uv lock --upgrade-package langsmith`
Original issue: The lock file (`uv.lock`) was constraining
`langsmith>=0.1.125,<0.4`, preventing installation of LangSmith 0.4.1,
even though the `pyproject.toml` wasn't restricting langchain core.
Issue:
https://langchain.slack.com/archives/C050X0VTN56/p1750107176007629
This PR updates the tool runtime example notebook to replace the
deprecated `.schema()` method with `.model_json_schema()`, aligning it
with Pydantic V2.
### 🔧 Changes:
- Replaced:
```python
# Before (deprecated in Pydantic V2)
update_favorite_pets.get_input_schema().schema()
# After
update_favorite_pets.get_input_schema().model_json_schema()
```
Fixes #31609
### Description
Add a `keep_separator` arg to `HTMLSemanticPreservingSplitter` and pass
its value to the instance of `RecursiveCharacterTextSplitter` used under
the hood.
### Issue
Documents returned by `HTMLSemanticPreservingSplitter.split_text(text)`
default to placing separators at the beginning of `page_content`. [See the
third and fourth documents in the example output from the how-to
guide](https://python.langchain.com/docs/how_to/split_html/#using-htmlsemanticpreservingsplitter):
```
[Document(metadata={'Header 1': 'Main Title'}, page_content='This is an introductory paragraph with some basic content.'),
Document(metadata={'Header 2': 'Section 1: Introduction'}, page_content='This section introduces the topic'),
Document(metadata={'Header 2': 'Section 1: Introduction'}, page_content='. Below is a list: First item Second item Third item with bold text and a link Subsection 1.1: Details This subsection provides additional details'),
Document(metadata={'Header 2': 'Section 1: Introduction'}, page_content=". Here's a table: Header 1 Header 2 Header 3 Row 1, Cell 1 Row 1, Cell 2 Row 1, Cell 3 Row 2, Cell 1 Row 2, Cell 2 Row 2, Cell 3"),
Document(metadata={'Header 2': 'Section 2: Media Content'}, page_content='This section contains an image and a video:  '),
Document(metadata={'Header 2': 'Section 3: Code Example'}, page_content='This section contains a code block: <code:html> <div> <p>This is a paragraph inside a div.</p> </div> </code>'),
Document(metadata={'Header 2': 'Conclusion'}, page_content='This is the conclusion of the document.')]
```
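The effect of separator placement can be sketched in plain Python. `split_keep_separator` and its `keep` values are hypothetical and written only to illustrate the behaviour; this is not the splitter's implementation.

```python
def split_keep_separator(text: str, separator: str, keep: str = "start") -> list[str]:
    """Split `text`, attaching each separator to the next or previous chunk."""
    pieces = text.split(separator)
    if keep == "start":
        # Separator leads the following chunk (the behaviour shown above).
        chunks = [pieces[0]] + [separator + p for p in pieces[1:]]
    elif keep == "end":
        # Separator closes the preceding chunk instead.
        chunks = [p + separator for p in pieces[:-1]] + [pieces[-1]]
    else:
        raise ValueError("keep must be 'start' or 'end'")
    return [c.strip() for c in chunks if c.strip()]


text = "This section introduces the topic. Below is a list"
at_start = split_keep_separator(text, ". ", keep="start")
at_end = split_keep_separator(text, ". ", keep="end")
```

Here `at_start` reproduces the leading `. ` seen in the third document above, while `at_end` keeps the period with the sentence it terminates.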
### Dependencies
None
@ttrumper3
docs: update multimodal PDF and image usage for gpt-4.1
**Description:**
This update revises the LangChain documentation to support the new
GPT-4.1 multimodal API format. It fixes the previous broken example for
PDF uploads (which returned a 400 error: "Missing required parameter:
'messages[0].content[1].file'") and adds clear instructions on how to
include base64-encoded images for OpenAI models.
**Issue:**
Error reported in the discussion forum when loading a PDF through the
API, from [@Albaeld](https://github.com/Albaeld)
([comment](https://github.com/langchain-ai/langchain/discussions/27702#discussioncomment-13369460)):
> This simply does not work with openai:gpt-4.1. I get:
> Error code: 400 - {'error': {'message': "Missing required parameter:
> 'messages[0].content[1].file'.", 'type': 'invalid_request_error',
> 'param': 'messages[0].content[1].file', 'code':
> 'missing_required_parameter'}}
**Dependencies:**
None
**Twitter handle:**
N/A
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Description:
This pull request corrects minor typographical errors in the
documentation notebooks for vectorstore integrations. Specifically, it
fixes the spelling of "datastore" in `llm_rails.ipynb` and
"pre-existent" in `redis.ipynb`. These changes improve the clarity and
professionalism of the documentation. No functional code changes are
included.
**PR title:** langchain_ollama: support custom headers for Ollama
partner APIs
**Description:** This PR adds support for passing custom HTTP headers to
Ollama models when used as a LangChain integration. This is especially
useful for enterprise users or partners who need to send authentication
tokens, API keys, or custom tracking headers when querying secured
Ollama servers.
**Issue:** N/A (new enhancement)
**Dependencies:** No external dependencies introduced.
**Twitter handle:** @arunkumar_offl
**Tests and docs:**
1. Added a unit test in `test_chat_models.py` to validate that headers
are passed correctly.
2. Added an example notebook,
`docs/docs/integrations/llms/ollama_custom_headers.ipynb`, showing how
to use custom headers.
**Lint and test:** Ran `make format`, `make lint`, and `make test` to
ensure the code is clean and passes all checks.
This MR is only for the docs. Added integration with Nebius AI Studio to
docs. The integration package is available at
[https://github.com/nebius/langchain-nebius](https://github.com/nebius/langchain-nebius).
---------
Co-authored-by: Akim Tsvigun <aktsvigun@nebius.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:** Remove the outdated Gemini models and replace those
with the latest models.
- **Issue:** The example code previously failed to run; it now runs.
- **Dependencies:** No
- **Twitter handle:** [soumendrak_](https://x.com/soumendrak_)
## Description
Updating Exa integration documentation to showcase the latest features
and best practices.
## Changes
- Added examples for `ExaSearchResults` tool with advanced search
options
- Added examples for `ExaFindSimilarResults` tool
- Updated agent example to use LangGraph
- Demonstrated text content options, summaries, and highlights
- Included examples of search type control and live crawling
## Additional Context
I'm from the Exa team updating our integration documentation to reflect
current capabilities and best practices.
Remove proxy imports to langchain_experimental.
Previously, these imports would work if a user manually installed
langchain_experimental. However, we want to drop support even for that
as langchain_experimental is generally not recommended to be run in
production.
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
**Description:**
Fixed a small grammatical error in the `retrievers.mdx` documentation.
Replaced "we can be built retrievers on top of search APIs..." with
"we can build retrievers on top of search APIs..." for clarity and
correctness.
**Issue:**
N/A
**Dependencies:**
None
**Twitter handle:**
@hassan_zameel
OpenAI changed their API to require the `partial_images` parameter when
using image generation + streaming.
As described in https://github.com/langchain-ai/langchain/pull/31424, we
are ignoring partial images. Here, we accept the `partial_images`
parameter (as required by OpenAI), but emit a warning and continue to
ignore partial images.
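The "accept, warn, and ignore" behaviour can be sketched as below. The function `check_image_generation_tool`, the tool dictionary layout, and the warning text are assumptions for illustration, not the integration's actual code.

```python
import warnings
from typing import Any


def check_image_generation_tool(tool: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical pre-flight check for an image-generation tool spec."""
    if tool.get("type") == "image_generation" and "partial_images" in tool:
        # The parameter is forwarded because the API requires it when
        # streaming, but the partial images themselves are dropped.
        warnings.warn(
            "`partial_images` was passed through, but partial images are "
            "currently ignored; only the final image is returned.",
            UserWarning,
            stacklevel=2,
        )
    return tool
```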
**Description:**
`langchain_huggingface` has a very large installation size of around 600
MB (on a Mac with Python 3.11). This is due to its dependency on
`sentence-transformers`, which in turn depends on `torch`, which is 320
MB all by itself. Similarly, the dependency on `transformers` adds
another set of heavy dependencies. With those dependencies removed, the
installation of `langchain_huggingface` only takes up ~26 MB. This is
only 5% of the full installation!
These libraries are not necessary to use `langchain_huggingface`'s API
wrapper classes, only for local inference/embeddings. All import
statements for those two libraries already have import guards in place
(try/except with a helpful "please install x" message).
This PR therefore moves those two libraries to an optional dependency
group `full`. So a `pip install langchain_huggingface` will only install
the lightweight version, and a `pip install
"langchain_huggingface[full]"` will install all dependencies.
I know this may break existing code, because `sentence-transformers` and
`transformers` are now no longer installed by default. Given that users
will see helpful error messages when that happens, and the major impact
of this small change, I hope that you will still consider this PR.
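For readers unfamiliar with the import-guard pattern mentioned above, here is a generic sketch. The helper `import_optional` and its error text are hypothetical; the package's own guards may be worded and structured differently.

```python
import importlib
from types import ModuleType


def import_optional(module_name: str, extra: str = "full") -> ModuleType:
    """Import an optional dependency or fail with an actionable message."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"Could not import {module_name}. Install it with "
            f'`pip install "langchain_huggingface[{extra}]"`.'
        ) from exc
```

Calling such a helper inside the function that needs the heavy library keeps the base install light while still telling users exactly which extra to install.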
**Dependencies:** No new dependencies, but new optional grouping.
- **Description:**
- In `_infer_arg_descriptions`, the annotations dictionary contains string
representations of types instead of actual typing objects. This causes
`_is_annotated_type` to fail, preventing the correct description from
being generated.
- This is a simple fix using the `get_type_hints` function, which resolves
the annotations properly and is supported across all Python versions.
- **Issue:** #31051
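The difference between raw string annotations and resolved type hints can be seen in a small standalone example. The function `multiply` and the helper `describe` are made up for illustration and are not the library's internals.

```python
from typing import Annotated, Optional, get_origin, get_type_hints


def multiply(a: "Annotated[int, 'first factor']", b: "int") -> "int":
    """Toy function whose annotations are stored as strings."""
    return a * b


def describe(annotation: object) -> Optional[str]:
    """Return the first string found in an Annotated type's metadata."""
    if get_origin(annotation) is Annotated:
        for meta in annotation.__metadata__:
            if isinstance(meta, str):
                return meta
    return None


raw = multiply.__annotations__  # values are plain strings
resolved = get_type_hints(multiply, include_extras=True)  # real typing objects

# describe(raw["a"]) finds nothing, while describe(resolved["a"]) finds
# the description attached through Annotated.
```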
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
https://github.com/langchain-ai/langchain/pull/31286 included an update
to the return type for `BaseChatModel.(a)stream`, from
`Iterator[BaseMessageChunk]` to `Iterator[BaseMessage]`.
This change is correct, because when streaming is disabled, the stream
methods return an iterator of `BaseMessage`, and the inheritance is such
that a `BaseMessage` is not a `BaseMessageChunk` (but the reverse is
true).
However, LangChain includes a pattern throughout its docs of [summing
BaseMessageChunks](https://python.langchain.com/docs/how_to/streaming/#llms-and-chat-models)
to accumulate a chat model stream. This pattern is implemented in tests
for most integration packages and appears in application code. So
https://github.com/langchain-ai/langchain/pull/31286 introduces mypy
errors throughout the ecosystem (or maybe more accurately, it reveals
that this pattern does not account for use of the `.stream` method when
streaming is disabled).
Here we revert just the change to the stream return type to unblock
things. A fix for this should address docs + integration packages (or if
we elect to just force people to update code, be explicit about that).
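The typing tension can be illustrated with two toy classes. `Message`, `MessageChunk`, and `accumulate` below are simplified stand-ins invented for this sketch, not the LangChain classes.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Message:
    """Toy base message: carries content but cannot be summed."""
    content: str


@dataclass
class MessageChunk(Message):
    """Toy chunk: a message that additionally supports `+`."""

    def __add__(self, other: "MessageChunk") -> "MessageChunk":
        return MessageChunk(self.content + other.content)


def accumulate(stream: Iterable[MessageChunk]) -> Optional[MessageChunk]:
    """The common pattern: sum chunks to rebuild the full message."""
    full: Optional[MessageChunk] = None
    for chunk in stream:
        full = chunk if full is None else full + chunk
    return full


full = accumulate([MessageChunk("Hel"), MessageChunk("lo")])
```

If the stream were typed as yielding `Message`, a type checker would reject `full + chunk`, because only the chunk subclass defines `__add__`.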
**Description:**
This PR updates approximately 4' occurrences of the deprecated
`initialize_agent` function in LangChain documentation and examples,
replacing it with the recommended `create_react_agent` pattern. It
also refactors related examples to align with current best practices.
**Issue:**
Partially fixes #29277
**Dependencies:**
None
**X handle:**
@TK1475
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
This PR adds documentation to our Microsoft Provider page for LangChain
Azure AI. This PR does not add any extra dependencies or require any
tests besides passing CI.
## PR Title
```
docs: enhance Salesforce Toolkit documentation
```
## PR Description
**Description:** Enhanced the Salesforce Toolkit documentation to
provide a more comprehensive overview of the `langchain-salesforce`
package. The updates include improved descriptions of the toolkit's
capabilities, detailed setup instructions for authentication using
environment variables, updated code snippets with consistent parameter
naming and improved readability, and additional resources with API
references for better user guidance.
**Issue:** N/A (documentation improvement)
**Dependencies:** None
**Twitter handle:** @colesmcintosh
---
### Changes Made:
- Improved description of the Salesforce Toolkit's capabilities and
features
- Added detailed setup instructions for authentication using environment
variables
- Updated code snippets to use consistent parameter naming and improved
readability
- Included additional resources and API references for better user
guidance
- Enhanced overall documentation structure and clarity
### Files Modified:
- `docs/docs/integrations/tools/salesforce.ipynb` (83 insertions, 36
deletions)
This is a documentation-only change that improves the user experience
for developers working with the Salesforce Toolkit. The changes are
backwards compatible and follow LangChain's documentation standards.
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Does not support partial images during generation at the moment. Before
doing that I'd like to figure out how to specify the aggregation logic
without requiring changes in core.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Change `--no-cache-dirclear` -> `--no-cache-dir`.
pip throws `no such option: --no-cache-dirclear` because the option is
invalid; `--no-cache-dir` is the correct one.
…cs/integrations/tools/ All tools section
…tegrations/tools/ All tools section
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, core, etc. is being
modified. Use "docs: ..." for purely docs changes, "infra: ..." for CI
changes.
- Example: "core: add foobar LLM"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, eyurtsev, ccurme, vbarda, hwchase17.
Llama-3.1 started failing consistently with:
> groq.BadRequestError: Error code: 400 - {'error': {'message':
"Failed to call a function. Please adjust your prompt. See
'failed_generation' for more details.", 'type': 'invalid_request_error',
'code': 'tool_use_failed', 'failed_generation':
'<function=brave_search>{"query": "Hello!"}</function>'}}
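For anyone debugging a similar failure, here is a minimal sketch of pulling the attempted tool call out of such an error body. It assumes the `***` runs in the log above are masked braces, so that the body is a nested dict and `failed_generation` wraps JSON arguments in a `<function=...>` tag. `parse_failed_generation` is a hypothetical helper, not part of `groq` or `langchain-groq`.

```python
import json
import re
from typing import Any, Optional

# Hypothetical helper for illustration; not an API of groq or langchain-groq.
# Matches "<function=NAME>ARGS</function>" as seen in the failed_generation field.
_FUNCTION_TAG = re.compile(
    r"<function=(?P<name>[\w-]+)>(?P<args>.*?)</function>", re.DOTALL
)


def parse_failed_generation(error_body: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the tool call the model tried to emit, or None if it cannot be recovered."""
    error = error_body.get("error", {})
    if error.get("code") != "tool_use_failed":
        return None
    match = _FUNCTION_TAG.search(error.get("failed_generation", ""))
    if match is None:
        return None
    try:
        args = json.loads(match.group("args"))
    except json.JSONDecodeError:
        return None
    return {"name": match.group("name"), "args": args}


# Error body shaped like the log above (braces assumed, message shortened).
body = {
    "error": {
        "message": "Failed to call a function.",
        "type": "invalid_request_error",
        "code": "tool_use_failed",
        "failed_generation": '<function=brave_search>{"query": "Hello!"}</function>',
    }
}
attempted_call = parse_failed_generation(body)
```

In this case the model produced a well-formed call to `brave_search`, which suggests the rejection came from the provider's tool-call parsing rather than from malformed arguments.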
Co-authored-by: ccurme <chester.curme@gmail.com>
Release core 0.3.63
Small update just to expand the list of well known tools. This is
necessary while the logic lives in langchain-core.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Add image generation tool to the list of well known tools. This is needed for changes in the ChatOpenAI client.
TODO: Some of this logic should move from core into the client, since adding a new tool to the OpenAI chat client should not require changes in core.
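As a rough illustration of why a list like this exists, the sketch below shows an allow-list check that passes provider-native tool dicts through unchanged and wraps everything else as a function tool. The names `WELL_KNOWN_TOOL_TYPES`, `is_well_known_tool`, and `convert_tool` are hypothetical, and the entries in the tuple are placeholders apart from image generation. The actual list and conversion logic in langchain-core may differ.

```python
from typing import Any

# Illustrative allow-list only; the real list lives in langchain-core and may differ.
WELL_KNOWN_TOOL_TYPES = ("web_search_preview", "image_generation")


def is_well_known_tool(tool: Any) -> bool:
    """True when the tool is a dict whose 'type' is on the allow-list."""
    return isinstance(tool, dict) and tool.get("type") in WELL_KNOWN_TOOL_TYPES


def convert_tool(tool: Any) -> dict[str, Any]:
    """Pass well-known tools through unchanged; wrap named dicts as function tools."""
    if is_well_known_tool(tool):
        return tool
    if isinstance(tool, dict) and "name" in tool:
        return {"type": "function", "function": tool}
    raise ValueError(f"Unsupported tool definition: {tool!r}")


image_tool = convert_tool({"type": "image_generation"})
custom_tool = convert_tool({"name": "get_weather", "parameters": {}})
```

With this shape, supporting a new built-in tool means adding one string to the tuple, which is also why keeping the tuple in core forces a core release for every new client-side tool.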
Deleted two outdated phrases that stated the package versions current at
the time of writing: (1) "langchain-community is currently on version
0.2.x." and (2) "langchain-experimental is currently on version 0.0.x".
docs: update Valyu integration notebooks to reflect current
langchain-valyu package implementation
Updated the Valyu integration documentation notebooks to align with the
current implementation of the langchain-valyu package. The changes
include:
- Updated ValyuContextRetriever to ValyuRetriever class name
- Changed parameter name from similarity_threshold to
relevance_threshold
- Removed query_rewrite parameter from search tool examples
- Added start_date and end_date parameters for time filtering
- Updated default values to match current implementation
(relevance_threshold: 0.5)
- Enhanced parameter documentation with proper descriptions and
constraints
- Updated section titles to reflect "Deep Search" functionality
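The parameter changes listed above could be applied to existing call sites with a small shim like the one below. This is only a sketch: `migrate_valyu_kwargs` is a hypothetical helper, not part of langchain-valyu, and it assumes the rename, the removal, and the 0.5 default all apply to the same set of keyword arguments, which the list above does not spell out.

```python
from typing import Any

# Hypothetical migration shim; not part of the langchain-valyu package.
RENAMED_PARAMS = {"similarity_threshold": "relevance_threshold"}
REMOVED_PARAMS = {"query_rewrite"}
DEFAULT_RELEVANCE_THRESHOLD = 0.5  # default value quoted in the PR description


def migrate_valyu_kwargs(old_kwargs: dict[str, Any]) -> dict[str, Any]:
    """Translate keyword arguments written for the old notebooks to the new names."""
    new_kwargs: dict[str, Any] = {}
    for key, value in old_kwargs.items():
        if key in REMOVED_PARAMS:
            continue  # parameter no longer appears in the examples
        new_kwargs[RENAMED_PARAMS.get(key, key)] = value
    new_kwargs.setdefault("relevance_threshold", DEFAULT_RELEVANCE_THRESHOLD)
    return new_kwargs


migrated = migrate_valyu_kwargs(
    {"similarity_threshold": 0.7, "query_rewrite": True, "start_date": "2024-01-01"}
)
```

The resulting dict can then be passed to the renamed `ValyuRetriever` class; check the package's own documentation for the accepted date format of `start_date` and `end_date`.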
…integrations/retrievers/ All retrievers section
…ngchain.com/docs/integrations/tools/ All tools section
…gchain.com/docs/integrations/document_loaders/ All document loaders
section
…cs/integrations/document_loaders/ All document loaders section
…chain.com/docs/integrations/document_loaders/ All document loaders
section
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
…integrations/document_loaders/ All document loaders section
…tionguard.ipynb
### Description
Added a note above the retriever overview table to clarify that the
descriptions are truncated for readability and how to view the full
version (via hover or click).
### Issue
Fixes #31311: users were confused by incomplete retriever descriptions
in the integration docs.
### Dependencies
None
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
…rations/chat/ All chat models section
…ntegrations/chat/ All chat models section
…egrations/chat/ All chat models section
…tegrations/chat/ All chat models section
As part of core releases we run tests on the last released version of
some packages (including langchain-openai) using the new version of
langchain-core. We run langchain-openai's test suite as it was when it
was last released.
OpenAI has since updated their API, relaxing constraints on what schemas
are supported when `strict=True`, which causes these tests to break. The
tests have since been fixed, but the old released tests will continue to fail.
Will revert this change after we release OpenAI today.
Added support for new Exa API features. Updated Exa docs and python
package (langchain-exa).
Description
Added support for new Exa API features in the langchain-exa package:
- Added max_characters option for text content
- Added support for summary and custom summary prompts
- Added livecrawl option with "always", "fallback", "never" settings
- Added "auto" option for search type
- Updated documentation and tests
Dependencies
- No new dependencies required. Using existing features from exa-py.
twitter: @theishangoswami
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**Description:** ConversationChain has been deprecated, and the
documentation says to use RunnableWithMessageHistory in its place, but
the link at the top of the page to RunnableWithMessageHistory is broken
(it's rendering as "html()"). See here at the top of the page:
https://python.langchain.com/api_reference/langchain/chains/langchain.chains.conversation.base.ConversationChain.html.
This PR fixes the link.
**Issue**: N/A
**Dependencies**: N/A
**Twitter handle:** If you're on Bluesky, I'm @vikramsaraph.com
Scheduled testing started failing today because the Responses API
stopped raising `BadRequestError` for a schema that was previously
invalid when `strict=True`.
Although docs still say that [some type-specific keywords are not yet
supported](https://platform.openai.com/docs/guides/structured-outputs#some-type-specific-keywords-are-not-yet-supported)
(including `minimum` and `maximum` for numbers), the below appears to
run and correctly respect the constraints:
```python
import json
import openai

maximums = list(range(1, 11))
arg_values = []
for maximum in maximums:
    tool = {
        "type": "function",
        "name": "magic_function",
        "description": "Applies a magic function to an input.",
        "parameters": {
            "properties": {
                "input": {"maximum": maximum, "minimum": 0, "type": "integer"}
            },
            "required": ["input"],
            "type": "object",
            "additionalProperties": False
        },
        "strict": True
    }
    client = openai.OpenAI()
    response = client.responses.create(
        model="gpt-4.1",
        input=[{"role": "user", "content": "What is the value of magic_function(3)? Use the tool."}],
        tools=[tool],
    )
    function_call = next(item for item in response.output if item.type == "function_call")
    args = json.loads(function_call.arguments)
    arg_values.append(args["input"])

print(maximums)
print(arg_values)
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# [1, 2, 3, 3, 3, 3, 3, 3, 3, 3]
```
Until yesterday this raised BadRequestError.
The same is not true of Chat Completions, which appears to still raise
BadRequestError:
```python
tool = {
    "type": "function",
    "function": {
        "name": "magic_function",
        "description": "Applies a magic function to an input.",
        "parameters": {
            "properties": {
                "input": {"maximum": 5, "minimum": 0, "type": "integer"}
            },
            "required": ["input"],
            "type": "object",
            "additionalProperties": False
        },
        "strict": True
    }
}
response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "What is the value of magic_function(3)? Use the tool."}],
    tools=[tool],
)
response  # raises BadRequestError
```
Here we update tests accordingly.
**Description**:
This PR updates the documentation to address a potential issue when
using `hub.pull(...)` with non-US LangSmith endpoints (e.g.,
`https://eu.api.smith.langchain.com`).
By default, the `hub.pull` function assumes the US-based API URL.
When the `LANGSMITH_ENDPOINT` environment variable is set to a non-US
region, this can lead to `LangSmithNotFoundError 404 not found` errors
when pulling public assets from the LangChain Hub.
Issue: #31191
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Update docs to add Featherless.ai Provider & Chat Model
- **Description:** Adding Featherless.ai as a provider in the
documentation, giving access to 4,300+ open-source models
- **Twitter handle:** https://x.com/FeatherlessAI
DSPy removed their LangChain integration in version 2.6.6.
Here we remove the page and add a redirect to the LangChain v0.2 docs
for posterity.
We add an admonition to the v0.2 docs in
https://github.com/langchain-ai/langchain/pull/31277.
* It is possible to chain a `Runnable` with an `AsyncIterator` as seen
in `test_runnable.py`.
* Iterator and AsyncIterator Input/Output of Callables must be put
before `Callable[[Other], Any]` otherwise the pattern matching picks the
latter.
**PR message**: Not sure if I put the check at the right spot, but I
thought throwing the error before the loop made sense to me.
**Description:** Checks if there are only system messages using
AnthropicChat model and throws an error if it's the case. Check Issue
for more details
**Issue:** #30764
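The check described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `(role, content)` tuple representation and the helper name `ensure_non_system_message` are assumptions made to keep the example self-contained.

```python
from typing import Sequence, Tuple

# Simplified stand-in for LangChain message objects: (role, content).
Message = Tuple[str, str]


def ensure_non_system_message(messages: Sequence[Message]) -> None:
    """Raise before any formatting loop if every message is a system message.

    Hypothetical helper; the real integration works on message objects.
    """
    if messages and all(role == "system" for role, _ in messages):
        raise ValueError(
            "Received only system message(s); "
            "at least one non-system message is required."
        )


# A conversation with a human turn passes; a system-only one raises.
ensure_non_system_message([("system", "Be brief."), ("human", "Hi")])
```

Raising before the loop keeps the failure close to the cause instead of surfacing later as an API error.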
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**Issue:** [#30970](https://github.com/langchain-ai/langchain/issues/30970)
**Cause**
Arg type in python code
```
arg: Union[SubSchema1, SubSchema2]
```
is translated to `anyOf` in **json schema**
```
"anyOf" : [{sub schema 1 ...}, {sub schema 2 ...}]
```
The value of `anyOf` is a list of sub-schemas.
The bug occurs because the sub-schemas inside the `anyOf` list are not
processed.
The issue is in `convert_to_openai_function`, which calls
`_recursive_set_additional_properties_false` to recursively add
`"additionalProperties": false` to the JSON schema, as
[required by OpenAI's strict function
calling](https://platform.openai.com/docs/guides/structured-outputs?api-mode=responses#additionalproperties-false-must-always-be-set-in-objects).
**Solution:**
This PR fixes this issue by iterating over each sub-schema inside the
`anyOf` list.
A unit test is added.
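As an illustration of the idea, here is a minimal sketch of a recursive pass that also descends into `anyOf`. It is not the library's implementation: the function name `set_additional_properties_false` and the exact condition for marking a node as an object are assumptions.

```python
from typing import Any


def set_additional_properties_false(schema: Any) -> Any:
    """Recursively set additionalProperties=False on object schemas (sketch)."""
    if isinstance(schema, dict):
        # Assumption: a node is treated as an object schema if it declares
        # type "object" or has a "properties" mapping.
        if schema.get("type") == "object" or "properties" in schema:
            schema["additionalProperties"] = False
        for sub in schema.get("properties", {}).values():
            set_additional_properties_false(sub)
        # The step the fix adds: visit every sub-schema in the anyOf list.
        for sub in schema.get("anyOf", []):
            set_additional_properties_false(sub)
        if "items" in schema:
            set_additional_properties_false(schema["items"])
    return schema


schema = {
    "type": "object",
    "properties": {
        "arg": {
            "anyOf": [
                {"type": "object", "properties": {"a": {"type": "string"}}},
                {"type": "object", "properties": {"b": {"type": "integer"}}},
            ]
        }
    },
}
set_additional_properties_false(schema)
```

After the call, both members of the `anyOf` list carry `"additionalProperties": False`, which is what a `Union[SubSchema1, SubSchema2]` argument needs under strict mode.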
**Twitter handle:** shengboma
If no one reviews your PR within a few days, please @-mention one of
baskaryan, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
The `aindex` function should check not only the `adelete` method, but the
`delete` method too.
**PR title**: "core: fix async indexing issue with adelete/delete
checking"
**PR message**: Currently `langchain.indexes.aindex` checks whether the vector
store has overridden the `adelete` method. But because of the default
`adelete` implementation, a store can override just `delete` and still have a
working `adelete`.
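The check can be sketched like this. The classes and the helper `supports_async_delete` below are hypothetical stand-ins, and the default `adelete` here simply calls `delete`; the real base class may delegate differently.

```python
import asyncio


class BaseStore:
    """Hypothetical stand-in for a vector store base class."""

    def delete(self, ids):
        raise NotImplementedError

    async def adelete(self, ids):
        # Assumed default: the async variant delegates to the sync one.
        return self.delete(ids)


class SyncOnlyStore(BaseStore):
    def delete(self, ids):
        return True


class NoDeleteStore(BaseStore):
    pass


def supports_async_delete(store: BaseStore) -> bool:
    """True if the store overrides `adelete` or, failing that, `delete`."""
    cls = type(store)
    return (
        cls.adelete is not BaseStore.adelete
        or cls.delete is not BaseStore.delete
    )


print(supports_async_delete(SyncOnlyStore()))  # overriding delete is enough
print(asyncio.run(SyncOnlyStore().adelete(["a"])))
```

Checking only `adelete` would wrongly reject `SyncOnlyStore`, even though its inherited `adelete` works.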
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Description: This document change concerns the document-loader
integration, specifically `Confluence`.
I am trying to use the ConfluenceLoader and came across deprecations
when I followed the instructions in the documentation. So I updated the
code blocks with the latest changes made to langchain, and also updated
the documentation for better readability
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:** The file ```docs/api_reference/create_api_rst.py```
uses a pyproject.toml check to remove partners which don't have a valid
pyproject.toml. This PR extends that check to ```/langchain/libs/*```
sub-directories as well. Without this the ```make api_docs_build```
command fails (see error).
- **Issue:** #31109
- **Dependencies:** none
- **Error Traceback:**
uv run --no-group test python docs/api_reference/create_api_rst.py
Starting to build API reference files.
Building package: community
pyproject.toml not found in /langchain/libs/community.
You are either attempting to build a directory which is not a package or
the package is missing a pyproject.toml file which should be
added.Aborting the build.
make: *** [Makefile:35: api_docs_build] Error 1
**Description**:
Add an `async_client_kwargs` field to the ollama chat/llm/embeddings adapters
that is passed to the async httpx client constructor.
**Motivation:**
In my use-case:
- chat/embedding model adapters may be created frequently, sometimes to
be called just once or to never be called at all
- they may be used in both sync and async mode (which one is not known at the
moment they are created)
So, I want to keep a static transport instance maintaining connection
pool, so model adapters can be created and destroyed freely. But that
doesn't work when both sync and async functions are in use as I can only
pass one transport instance for both sync and async client, while
transport types must be different for them. So I can't make both sync
and async calls use shared transport with current model adapter
interfaces.
In this PR I add a separate `async_client_kwargs` that gets passed to
async client constructor, so it will be possible to pass a separate
transport instance. For sake of backwards compatibility, it is merged
with `client_kwargs`, so nothing changes when it is not set.
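The merge behaviour described above can be sketched as follows. The helper name `resolve_client_kwargs` and the placeholder transport strings are assumptions for illustration only; the adapters do this internally when constructing their clients.

```python
from typing import Any, Dict, Optional, Tuple


def resolve_client_kwargs(
    client_kwargs: Optional[Dict[str, Any]] = None,
    async_client_kwargs: Optional[Dict[str, Any]] = None,
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (sync_kwargs, async_kwargs) for the two client constructors.

    Hypothetical helper: async kwargs start from `client_kwargs` and are
    overridden by any keys in `async_client_kwargs`.
    """
    sync_kwargs = dict(client_kwargs or {})
    async_kwargs = {**sync_kwargs, **(async_client_kwargs or {})}
    return sync_kwargs, async_kwargs


# Placeholder strings stand in for real sync/async transport instances.
sync_kw, async_kw = resolve_client_kwargs(
    {"timeout": 30, "transport": "sync-transport"},
    {"transport": "async-transport"},
)
print(sync_kw)
print(async_kw)
```

When `async_client_kwargs` is left unset, both clients receive the same arguments, which matches the behaviour before the change.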
I am unable to run linter right now, but the changes look ok.
Bumps
[Ana06/get-changed-files](https://github.com/ana06/get-changed-files)
from 2.2.0 to 2.3.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/ana06/get-changed-files/releases">Ana06/get-changed-files's
releases</a>.</em></p>
<blockquote>
<h2>v2.3.0</h2>
<p>This project is a fork of <a
href="https://github.com/jitterbit/get-changed-files">jitterbit/get-changed-files</a>,
which:</p>
<ul>
<li>Supports <code>pull_request_target</code></li>
<li>Allows to filter files using regular expressions</li>
<li>Removes the ahead check</li>
<li>Considers renamed modified files as modified</li>
<li>Adds <code>added_modified_renamed</code> that includes renamed
non-modified files and all files in <code>added_modified</code></li>
<li>Uses Node 20</li>
</ul>
<h2>Changes</h2>
<ul>
<li>Update to Node 20</li>
<li>Update dependencies</li>
</ul>
<h2>Raw diff</h2>
<p><a
href="https://github.com/Ana06/get-changed-files/compare/v2.2.0...v2.3.0">https://github.com/Ana06/get-changed-files/compare/v2.2.0...v2.3.0</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="25f79e676e"><code>25f79e6</code></a>
Update version in package.json</li>
<li><a
href="6f5373eb01"><code>6f5373e</code></a>
Merge pull request <a
href="https://redirect.github.com/ana06/get-changed-files/issues/42">#42</a>
from Ana06/release2-3</li>
<li><a
href="64dffb46a4"><code>64dffb4</code></a>
Prepare 2.3.0 release</li>
<li><a
href="791f7645b7"><code>791f764</code></a>
[CI] Update actions/checkout</li>
<li><a
href="5a4a136e91"><code>5a4a136</code></a>
[CI] Ensure GH action uses node version 20</li>
<li><a
href="38bdb2e498"><code>38bdb2e</code></a>
Update to Node 20</li>
<li><a
href="5558be5781"><code>5558be5</code></a>
Merge pull request <a
href="https://redirect.github.com/ana06/get-changed-files/issues/30">#30</a>
from Ana06/dependabot/npm_and_yarn/decode-uri-componen...</li>
<li><a
href="6a376fdbb3"><code>6a376fd</code></a>
Merge pull request <a
href="https://redirect.github.com/ana06/get-changed-files/issues/31">#31</a>
from Ana06/dependabot/npm_and_yarn/qs-6.5.3</li>
<li><a
href="ace6e7bcbb"><code>ace6e7b</code></a>
Merge pull request <a
href="https://redirect.github.com/ana06/get-changed-files/issues/32">#32</a>
from brtrick/main</li>
<li><a
href="a102fae9bf"><code>a102fae</code></a>
Merge pull request <a
href="https://redirect.github.com/ana06/get-changed-files/issues/33">#33</a>
from Ana06/dependabot/npm_and_yarn/json5-2.2.3</li>
<li>Additional commits viewable in <a
href="https://github.com/ana06/get-changed-files/compare/v2.2.0...v2.3.0">compare
view</a></li>
</ul>
</details>
<br />
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Updates dependencies to Chroma to integrate the major release of Chroma
with improved performance, and to fix issues users have been seeing
using the latest chroma docker image with langchain-chroma
https://github.com/langchain-ai/langchain/issues/31047#issuecomment-2850790841
Updates chromadb dependency to >=1.0.9
This also removes the dependency of chroma-hnswlib, meaning it can run
against python 3.13 runners for tests as well.
Tested this by pulling the latest Chroma docker image, running
langchain-chroma using client mode
```
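import chromadb
from langchain_chroma import Chroma

# `embeddings` is assumed to be an existing LangChain embeddings instance.
httpClient = chromadb.HttpClient(host="localhost", port=8000)
vector_store = Chroma(
    client=httpClient,
    collection_name="test",
    embedding_function=embeddings,
)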
```
"Alanis Morissette" spelling error
Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from 5
to 6.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/setup-uv/releases">astral-sh/setup-uv's
releases</a>.</em></p>
<blockquote>
<h2>v6.0.0 🌈 activate-environment and working-directory</h2>
<h2>Changes</h2>
<p>This version contains some breaking changes which have been gathering
up for a while. Lets dive into them:</p>
<ul>
<li><a
href="https://github.com/astral-sh/setup-uv/blob/HEAD/#activate-environment">Activate
environment</a></li>
<li><a
href="https://github.com/astral-sh/setup-uv/blob/HEAD/#working-directory">Working
Directory</a></li>
<li><a
href="https://github.com/astral-sh/setup-uv/blob/HEAD/#default-cache-dependency-glob">Default
<code>cache-dependency-glob</code></a></li>
<li><a
href="https://github.com/astral-sh/setup-uv/blob/HEAD/#use-default-cache-dir-on-self-hosted-runners">Use
default cache dir on self hosted runners</a></li>
</ul>
<h3>Activate environment</h3>
<p>In previous versions using the input <code>python-version</code>
automatically activated a venv at the repository root.
This led to some unwanted side-effects, was sometimes unexpected and not
flexible enough.</p>
<p>The venv activation is now explicitly controlled with the new input
<code>activate-environment</code> (false by default):</p>
<pre lang="yaml"><code>- name: Install the latest version of uv and
activate the environment
uses: astral-sh/setup-uv@v6
with:
activate-environment: true
- run: uv pip install pip
</code></pre>
<p>The venv gets created by the <a
href="https://docs.astral.sh/uv/pip/environments/"><code>uv
venv</code></a> command so the python version is controlled by the
<code>python-version</code> input or the files
<code>pyproject.toml</code>, <code>uv.toml</code>,
<code>.python-version</code> in the <code>working-directory</code>.</p>
<h3>Working Directory</h3>
<p>The new input <code>working-directory</code> controls where we look
for <code>pyproject.toml</code>, <code>uv.toml</code> and
<code>.python-version</code> files
which are used to determine the version of uv and python to install.</p>
<p>It can also be used to control where the venv gets created.</p>
<pre lang="yaml"><code>- name: Install uv based on the config files in
the working-directory
uses: astral-sh/setup-uv@v6
with:
working-directory: my/subproject/dir
</code></pre>
<blockquote>
<p>[!CAUTION]</p>
<p>The inputs <code>pyproject-file</code> and <code>uv-file</code> have
been removed.</p>
</blockquote>
<h3>Default <code>cache-dependency-glob</code></h3>
<p><a href="https://github.com/ssbarnea"><code>@ssbarnea</code></a>
found out that the default <code>cache-dependency-glob</code> was not
suitable for a lot of users.</p>
<p>The old default</p>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="6b9c6063ab"><code>6b9c606</code></a>
Bump dependencies (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/389">#389</a>)</li>
<li><a
href="ef6bcdff59"><code>ef6bcdf</code></a>
Fix default cache dependency glob (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/388">#388</a>)</li>
<li><a
href="9a311713f4"><code>9a31171</code></a>
chore: update known checksums for 0.6.17 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/384">#384</a>)</li>
<li><a
href="c7f87aa956"><code>c7f87aa</code></a>
bump to v6 in README (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/382">#382</a>)</li>
<li><a
href="aadfaf08d6"><code>aadfaf0</code></a>
Change default cache-dependency-glob (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/352">#352</a>)</li>
<li><a
href="a0f9da6273"><code>a0f9da6</code></a>
No default UV_CACHE_DIR on selfhosted runners (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/380">#380</a>)</li>
<li><a
href="ec4c691628"><code>ec4c691</code></a>
new inputs activate-environment and working-directory (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/381">#381</a>)</li>
<li><a
href="aa1290542e"><code>aa12905</code></a>
chore: update known checksums for 0.6.16 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/378">#378</a>)</li>
<li><a
href="fcaddda076"><code>fcaddda</code></a>
chore: update known checksums for 0.6.15 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/377">#377</a>)</li>
<li><a
href="fb3a0a97fa"><code>fb3a0a9</code></a>
log info on venv activation (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/375">#375</a>)</li>
<li>See full diff in <a
href="https://github.com/astral-sh/setup-uv/compare/v5...v6">compare
view</a></li>
</ul>
</details>
<br />
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
**Description:** This is a documentation change for the
`langchain-anthropic` integration with the newly released web search tool ([Claude
doc](https://docs.anthropic.com/en/docs/build-with-claude/tool-use/web-search-tool)).
Issue 1: The sample in the [Web Search
section](https://python.langchain.com/docs/integrations/chat/anthropic/#web-search)
did not run. It failed with the error below:
```
File "my_file.py", line 170, in call
model_with_tools = model.bind_tools([websearch_tool])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/langchain_anthropic/chat_models.py", line 1363, in bind_tools
tool if _is_builtin_tool(tool) else convert_to_anthropic_tool(tool)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/langchain_anthropic/chat_models.py", line 1645, in convert_to_anthropic_tool
input_schema=oai_formatted["parameters"],
~~~~~~~~~~~~~^^^^^^^^^^^^^^
KeyError: 'parameters'
```
This is because the web search tool is only supported as of
`langchain-anthropic==0.3.13`; the [0.3.13
release](https://github.com/langchain-ai/langchain/releases?q=tag%3A%22langchain-anthropic%3D%3D0%22&expanded=true)
notes mention:
> anthropic[patch]: support web search
(https://github.com/langchain-ai/langchain/pull/31157)
Issue 2: The current doc has outdated package requirements for Websearch
tool: "This guide requires langchain-anthropic>=0.3.10".
Changes:
- Updated the required `langchain-anthropic` package version (0.3.10 ->
0.3.13).
- Added a note for users running the web search sample.
I believe this will help avoid future confusion from readers.
**Issue:** N/A
**Dependencies:** N/A
**Twitter handle:** N/A
* Remove unnecessary cast of id -> str (can do with a field setting)
* Remove unnecessary `set_text` model validator (can be done with a
computed field, though we had to make some changes to the `Generation`
class to make this possible)
Before: ~2.4s
Blue circles represent time spent in custom validators :(
<img width="1337" alt="Screenshot 2025-05-14 at 10 10 12 AM"
src="https://github.com/user-attachments/assets/bb4f477f-4ee3-4870-ae93-14ca7f197d55"
/>
After: ~2.2s
<img width="1344" alt="Screenshot 2025-05-14 at 10 11 03 AM"
src="https://github.com/user-attachments/assets/99f97d80-49de-462f-856f-9e7e8662adbc"
/>
We still want to optimize the backwards compatible tool calls model
validator, though I think this might involve breaking changes, so wanted
to separate that into a different PR. This is circled in green.
**Description:** Before this commit, if a single delete covered more
than 32k rows for sqlite3 >= 3.32, or more than 999 rows for sqlite3 <
3.32, `record_manager.delete_keys()` would fail because the generated
query contained too many variables.
This commit batches the delete operation using `cleanup_batch_size`,
as is already done for `full` cleanup.
Added unit tests for incremental mode as well, covering different
delete batch sizes.
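To illustrate the batching idea described above, here is a minimal sketch using only the standard library `sqlite3` module. The `records` table, the `batched` helper, and `delete_keys_in_batches` are hypothetical names chosen for this example; they are not the record manager's actual implementation.

```python
import sqlite3
from typing import Iterator, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start : start + size]


def delete_keys_in_batches(
    conn: sqlite3.Connection, keys: Sequence[str], batch_size: int = 500
) -> int:
    """Delete `keys` from the hypothetical `records` table, one query per batch.

    Each query binds at most `batch_size` variables, so it stays below the
    SQLite bound-variable limit regardless of how many keys are passed in.
    """
    deleted = 0
    for chunk in batched(keys, batch_size):
        placeholders = ",".join("?" for _ in chunk)
        cursor = conn.execute(
            f"DELETE FROM records WHERE key IN ({placeholders})", list(chunk)
        )
        deleted += cursor.rowcount
    conn.commit()
    return deleted


# Usage: 40,000 keys would exceed the variable limit in a single query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (key TEXT PRIMARY KEY)")
keys = [f"doc-{i}" for i in range(40_000)]
conn.executemany("INSERT INTO records (key) VALUES (?)", [(k,) for k in keys])
deleted = delete_keys_in_batches(conn, keys, batch_size=500)
remaining = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
```

The batch size of 500 is an arbitrary illustrative value; in the change described above the size comes from `cleanup_batch_size`.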
1. Removes summation of `ChatGenerationChunk` from hot loops in `stream`
and `astream`
2. Removes run id gen from loop as well (minor impact)
Again, benchmarking on processing ~200k chunks (a poem about broccoli).
Before: ~4.2s
Blue circle is all the time spent adding up gen chunks
<img width="1345" alt="Screenshot 2025-05-14 at 7 48 33 AM"
src="https://github.com/user-attachments/assets/08a59d78-134d-4cd3-9d54-214de689df51"
/>
After: ~2.3s
Blue circle is remaining time spent on adding chunks, which can be
minimized in a future PR by optimizing the `merge_content`,
`merge_dicts`, and `merge_lists` utilities.
<img width="1353" alt="Screenshot 2025-05-14 at 7 50 08 AM"
src="https://github.com/user-attachments/assets/df6b3506-929e-4b6d-b198-7c4e992c6d34"
/>
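The general pattern behind point 1 can be sketched as follows. This is an assumption about the shape of the optimization, not the actual `stream` code: the `Chunk` class is a hypothetical stand-in for `ChatGenerationChunk`, and `merge_chunks` is an illustrative helper. The idea is that the loop body only appends to a list, and the chunks are merged once after the stream ends.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Chunk:
    """Hypothetical stand-in for a streamed generation chunk."""

    text: str

    def __add__(self, other: "Chunk") -> "Chunk":
        # Each addition copies everything accumulated so far.
        return Chunk(self.text + other.text)


def merge_chunks(chunks: List[Chunk]) -> Chunk:
    """Merge all chunks in a single pass after the stream is exhausted."""
    return Chunk("".join(c.text for c in chunks))


def stream_and_collect(
    source: Iterable[Chunk], collected: List[Chunk]
) -> Iterator[Chunk]:
    """Yield chunks as they arrive; the hot loop only appends to a list."""
    for chunk in source:
        collected.append(chunk)  # constant-time work per chunk, no running sum
        yield chunk


collected: List[Chunk] = []
pieces = [Chunk(w) for w in ["Ode ", "to ", "broccoli"]]
streamed = list(stream_and_collect(pieces, collected))
final = merge_chunks(collected)

# The running-sum approach gives the same result, but does more work per step.
running = pieces[0]
for piece in pieces[1:]:
    running = running + piece
```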
1. Remove `shielded` decorator from non-end event handlers
2. Exit early with a `self.handlers` check instead of doing unnecessary
asyncio work
Using a benchmark that processes ~200k chunks (a poem about broccoli).
Before: ~15s
Circled in blue is unnecessary event handling time. This is addressed by
point 2 above
<img width="1347" alt="Screenshot 2025-05-14 at 7 37 53 AM"
src="https://github.com/user-attachments/assets/675e0fed-8f37-46c0-90b3-bef3cb9a1e86"
/>
After: ~4.2s
The total time is largely reduced by the removal of the `shielded`
decorator, which holds little significance for non-end handlers.
<img width="1348" alt="Screenshot 2025-05-14 at 7 37 22 AM"
src="https://github.com/user-attachments/assets/54be8a3e-5827-4136-a87b-54b0d40fe331"
/>
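Point 2 can be sketched roughly as below. Only the `self.handlers` attribute name comes from the description above; the `CallbackManagerSketch` class, its method names, and the `dispatch_count` counter are hypothetical and exist purely to show the early return.

```python
import asyncio
from typing import Any, Callable, List


class CallbackManagerSketch:
    """Hypothetical manager that skips asyncio work when no handlers exist."""

    def __init__(self, handlers: List[Callable[[str], Any]]) -> None:
        self.handlers = handlers
        self.dispatch_count = 0  # counts how often real dispatch work happened

    async def _run(self, handler: Callable[[str], Any], token: str) -> None:
        handler(token)

    async def on_new_token(self, token: str) -> None:
        if not self.handlers:
            # Exit early: no coroutines are created and nothing is gathered.
            return
        self.dispatch_count += 1
        await asyncio.gather(*(self._run(h, token) for h in self.handlers))


async def main() -> tuple:
    seen: List[str] = []
    quiet = CallbackManagerSketch([])
    noisy = CallbackManagerSketch([seen.append])
    for token in ["a", "b", "c"]:
        await quiet.on_new_token(token)
        await noisy.on_new_token(token)
    return quiet.dispatch_count, noisy.dispatch_count, seen


result = asyncio.run(main())
```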
Extend Google parameters in the embeddings tab to include Google GenAI
(Gemini)
**Description:** Update embeddings tab to include example for Google
GenAI (Gemini)
**Issue:** N/A
**Dependencies:** N/A
**Twitter handle:** N/A
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:** Updates two notebooks in the how_to documentation to
reflect new loader interfaces and functionalities.
- **Issue:** Some how_to notebooks were still using loader interfaces
from previous versions of LangChain and did not demonstrate the latest
loader functionalities (e.g., extracting images with `ImageBlobParser`,
extracting tables in specific output formats, parsing documents using
Vision-Language Models with `ZeroxPDFLoader`, and using
`CloudBlobLoader` in the `GenericLoader`, etc.).
- **Dependencies:** `py-zerox`
- **Twitter handle:** @MarcMedlock2
---------
Co-authored-by: Marc Medlock <marc.medlock@octo.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, core, etc. is being
modified. Use "docs: ..." for purely docs changes, "infra: ..." for CI
changes.
- Example: "core: add foobar LLM"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, eyurtsev, ccurme, vbarda, hwchase17.
- [ ] **Docs Update**: "langchain-cloudflare: add env var references in
example notebooks"
- We've updated our Cloudflare integration example notebooks with
examples showing how to use environment variables to initialize the class
instances.
Thank you for contributing to LangChain!
- [ ] **PR title**: Changed `toolkit=ExampleTookit` to `toolkit = ExampleToolkit(...)` in
the `tools.mdx` file
- [ ] **PR message**: Changed `toolkit=ExampleTookit` to `toolkit =
ExampleToolkit(...)` in the `tools.mdx` file
Co-authored-by: SiddharthAnandShorthillsAI <siddharth.anand@shorthills.ai>
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
…map.ipynb
Update openweathermap markdown file for tools
Replace the deprecated load_tools
Some providers include (legacy) function calls in `additional_kwargs` in
addition to tool calls. We currently unpack both function calls and tool
calls if present, but OpenAI will raise 400 in this case.
This can come up if providers are mixed in a tool-calling loop. Example:
```python
from langchain.chat_models import init_chat_model
from langchain_core.messages import HumanMessage
from langchain_core.tools import tool
@tool
def get_weather(location: str) -> str:
"""Get weather at a location."""
return "It's sunny."
gemini = init_chat_model("google_genai:gemini-2.0-flash-001").bind_tools([get_weather])
openai = init_chat_model("openai:gpt-4.1-mini").bind_tools([get_weather])
input_message = HumanMessage("What's the weather in Boston?")
tool_call_message = gemini.invoke([input_message])
assert len(tool_call_message.tool_calls) == 1
tool_call = tool_call_message.tool_calls[0]
tool_message = get_weather.invoke(tool_call)
response = openai.invoke( # currently raises 400 / BadRequestError
[input_message, tool_call_message, tool_message]
)
```
Here we ignore function calls if tool calls are present.
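The rule stated above can be sketched as a small conversion helper. This is an illustrative assumption about the shape of the fix, not the actual `langchain-openai` code: `to_assistant_payload` is a hypothetical name, and the message is modeled as plain Python values rather than LangChain message objects.

```python
from typing import Any, Dict, List


def to_assistant_payload(
    content: str,
    tool_calls: List[Dict[str, Any]],
    additional_kwargs: Dict[str, Any],
) -> Dict[str, Any]:
    """Build an assistant message dict, preferring tool calls.

    A legacy `function_call` found in `additional_kwargs` is only forwarded
    when the message carries no tool calls, so the payload never has both.
    """
    payload: Dict[str, Any] = {"role": "assistant", "content": content}
    if tool_calls:
        payload["tool_calls"] = tool_calls
    elif "function_call" in additional_kwargs:
        payload["function_call"] = additional_kwargs["function_call"]
    return payload


legacy = {"function_call": {"name": "get_weather", "arguments": '{"location": "Boston"}'}}
calls = [
    {
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"location": "Boston"}'},
    }
]

both = to_assistant_payload("", calls, legacy)       # tool calls win
only_legacy = to_assistant_payload("", [], legacy)   # legacy call is kept
```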
**Description**: The `inspect` package in Python skips over the aliases
set in the schema of a Pydantic model. This is a workaround to include the
aliases from the original input.
**issue**: #31035
Cc: @ccurme @eyurtsev
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:** Integrated the Bright Data package to enable
LangChain users to seamlessly incorporate Bright Data into their agents.
- **Dependencies:** None
- **LinkedIn handle**: [Bright
Data](https://www.linkedin.com/company/bright-data)
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**Description:** The Aerospike Vector Search vector store integration
has moved out of langchain-community and to its own repository,
https://github.com/aerospike/langchain-aerospike. This PR updates
langchain documentation to reference it.
**Description:**
Updated the import path for `DoctranPropertyExtractor` from
`langchain_community.document_loaders` to
`langchain_community.document_transformers` in multiple locations to
reflect recent package structure changes. Also corrected a minor typo in
the word "variable".
**Issue:**
N/A
**Dependencies:**
N/A
**LinkedIn handle:** For shout out if announced [Asif
Mehmood](https://www.linkedin.com/in/asifmehmood1997/).
**Description:**
Fix the merge logic in `CharacterTextSplitter.split_text` so that when
using a regex lookahead separator (`is_separator_regex=True`) with
`keep_separator=False`, the raw pattern is not re-inserted between
chunks.
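To illustrate why the raw pattern must not be used as the joiner, here is a minimal standard-library sketch. It does not reproduce `CharacterTextSplitter`; the helper name `split_on_lookahead` and the sample text are assumptions made for illustration only.

```python
import re

# Zero-width lookahead: split *before* each "<digit>." marker without consuming it.
SEPARATOR = r"(?=\d\.)"


def split_on_lookahead(text: str) -> list[str]:
    """Split on a regex lookahead and drop empty pieces (hypothetical helper)."""
    return [piece for piece in re.split(SEPARATOR, text) if piece]


pieces = split_on_lookahead("1. first 2. second")

# Problematic merge: re-inserting the separator leaks the raw pattern into the text.
buggy = SEPARATOR.join(pieces)

# Intended merge: a lookahead consumes no characters, so nothing is re-inserted.
fixed = "".join(pieces)

print(repr(buggy))
print(repr(fixed))
```

Because the lookahead matches an empty string, joining the pieces with nothing in between restores the original text.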
**Issue:**
Fixes #31136
**Dependencies:**
None
**Twitter handle:**
None
Since this is my first open-source PR, please feel free to point out any
mistakes, and I'll be eager to make corrections.
Anthropic updated how they report token counts during streaming today.
See changes to `MessageDeltaUsage` in [this
commit](2da00f26c5 (diff-1a396eba0cd9cd8952dcdb58049d3b13f6b7768ead1411888d66e28211f7bfc5)).
It's clean and simple to grab these fields from the final
`message_delta` event. However, some of them are typed as Optional, and
language
[here](e42451ab3f/src/anthropic/lib/streaming/_messages.py (L462))
suggests they may not always be present. So here we take the required
field from the `message_delta` event as we were doing previously, and
ignore the rest.
partners: (langchain-openai) total_tokens should not add 'Nonetype' t…
# PR Description
## Description
Fixed an issue in `langchain-openai` where `total_tokens` was
incorrectly adding `None` to an integer, causing a TypeError. The fix
ensures proper type checking before adding token counts.
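As a rough sketch of the kind of guard described here (the actual change in `langchain-openai` may look different), a missing count can be treated as zero before adding. The function name `safe_total_tokens` is hypothetical.

```python
from typing import Optional


def safe_total_tokens(input_tokens: Optional[int], output_tokens: Optional[int]) -> int:
    """Add two token counts, treating a missing (None) count as zero."""
    return (input_tokens or 0) + (output_tokens or 0)


# Adding None to an int directly would raise TypeError; the guard avoids that.
print(safe_total_tokens(12, None))
```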
## Issue
Fixes the TypeError traceback shown in the image where `'NoneType'`
cannot be added to an integer.
## Dependencies
None
## Twitter handle
None

Co-authored-by: qiulijie <qiulijie@yuaiweiwu.com>
**Library Repo Path Update**: "langchain-cloudflare"
We recently restructured our `langchain-cloudflare` repo to allow for future
libraries.
We created a `libs` folder to hold the `langchain-cloudflare` Python package:
https://github.com/cloudflare/langchain-cloudflare/tree/main/libs/langchain-cloudflare
In `langchain`, this PR updates `packages.yaml` to point to the new
`libs/langchain-cloudflare` library folder.
This PR fixes a grammar issue in the sentence:
"A chat prompt is made up a of a list of messages..." → "A chat prompt
is made up of a list of messages. "
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:** Update Pinecone notebook example
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** N/A
- [x] **Add tests and docs**: Just notebook updates
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:** Replaces the deprecated `initialize_agent` functionality
with `create_react_agent` for the Google tools. I also noticed a
potential issue with the non-existent "google-drive-search" tool, which was
used in the old `google-drive.ipynb`. If this tool should be available by
default, an issue should be opened to modify
langchain-community's `load_tools` accordingly.
- **Issue:** #29277
- **Dependencies:** No added dependencies
- **Twitter handle:** No Twitter account
This PR brings several improvements and modernizations to the
documentation around the Astra DB partner package.
- language alignment for better matching with the terms used in the
Astra DB docs
- updated several links to pages on said documentation
- for the `AstraDBVectorStore`, added mentions of the new features in
the overall `astra.mdx`
- for the vector store, rewritten/upgraded most of the usage example
notebook for a more straightforward experience able to highlight the
main usage patterns (including new ones such as the newly-introduced
"autodetect feature")
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
**What does this PR do?**
This PR replaces deprecated usages of `.dict()` with
`.model_dump()` to ensure compatibility with Pydantic v2 and prepare
for v3, addressing the deprecation warning
`PydanticDeprecatedSince20` as required in
[Issue #31103](https://github.com/langchain-ai/langchain/issues/31103).
**Changes made:**
* Replaced `.dict()` with `.model_dump()` in multiple locations
* Ensured consistency with Pydantic v2 migration guidelines
* Verified compatibility across affected modules
**Notes**
* This is a code maintenance and compatibility update
* Tested locally with Pydantic v2.11
* No functional logic changes; only internal method replacements to
prevent deprecation issues
When aggregating AIMessageChunks in a stream, core prefers the leftmost
non-null ID. This is problematic because:
- Core assigns IDs when they are null to `f"run-{run_manager.run_id}"`
- The desired meaningful ID might not be available until midway through
the stream, as is the case for the OpenAI Responses API.
For the OpenAI Responses API, we assign message IDs to the top-level
`AIMessage.id`. This works in `.(a)invoke`, but during `.(a)stream` the
IDs get overwritten by the defaults assigned in langchain-core. These
IDs
[must](https://community.openai.com/t/how-to-solve-badrequesterror-400-item-rs-of-type-reasoning-was-provided-without-its-required-following-item-error-in-responses-api/1151686/9)
be available on the AIMessage object to support passing reasoning items
back to the API (e.g., if not using OpenAI's `previous_response_id`
feature). We could add them elsewhere, but seeing as we've already made
the decision to store them in `.id` during `.(a)invoke`, addressing the
issue in core lets us fix the problem with no interface changes.
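The preference described above can be sketched as follows. This is not the code in langchain-core; `pick_message_id` is a hypothetical helper, and the `"run-"` prefix is taken from the description of how core assigns default IDs.

```python
from typing import Iterable, Optional

# Assumed from the description above: core fills missing IDs with "run-<run_id>".
DEFAULT_ID_PREFIX = "run-"


def pick_message_id(chunk_ids: Iterable[Optional[str]]) -> Optional[str]:
    """Return the first provider-assigned ID, else the first default ID, else None."""
    fallback = None
    for chunk_id in chunk_ids:
        if not chunk_id:
            continue
        if not chunk_id.startswith(DEFAULT_ID_PREFIX):
            return chunk_id
        if fallback is None:
            fallback = chunk_id
    return fallback


# The meaningful ID arrives midway through the stream and still wins.
print(pick_message_id(["run-1234", None, "resp_abc"]))
```

With a plain leftmost-non-null rule, the example would keep `run-1234` and lose the provider ID.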
- **Description:** `ChatAnthropic.get_num_tokens_from_messages` does not
currently receive `kwargs` and pass those on to
`self._client.beta.messages.count_tokens`. This is a problem if you need
to pass specific options to `count_tokens`, such as the `thinking`
option. This PR fixes that.
- **Issue:** N/A
- **Dependencies:** None
- **Twitter handle:** @bengladwell
Co-authored-by: ccurme <chester.curme@gmail.com>
**Description**:
* Starting to put together some PRs to fix the typing around
`langchain-chroma` `filter` and `where_document` query filtering, as
mentioned in:
  * https://github.com/langchain-ai/langchain/issues/30879
  * https://github.com/langchain-ai/langchain/issues/30507
The typing of `dict[str, str]` is on the one hand too restrictive (it marks
valid filter expressions as ill-typed) and on the other too permissive (it
allows illegal filter expressions). That's not what this PR addresses, though.
This PR just removes from the documentation some examples of filters
that are illegal or syntactically incorrect:
  * (a) dictionaries with keys like `$contains` where the key is missing
quotation marks;
  * (b) dictionaries with multiple entries, such as
`{"foo": "bar", "qux": "baz"}`. This is illegal in Chroma filter
syntax and will raise an exception: filter dictionaries in Chroma must have
one and only one key.
Again, this is just the documentation issue, which is the lowest-hanging
fruit. I also think we need to update the types for `filter` and
`where_document` to be at the very least `dict[str, Any]` or, since we have
access to Chroma's types, `Where` and `WhereDocument`. This
has a wider blast radius, though, so I'm starting small.
This PR does not fix the typing issues mentioned above; it just starts to
get the ball rolling by cleaning up the documentation.
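For reference, the two documentation problems can be illustrated with plain dictionaries. This is only a sketch: `has_single_top_level_key` is a hypothetical check, not part of Chroma or `langchain-chroma`, and the use of `$and` to combine conditions reflects my understanding of Chroma's filter syntax, so verify it against the Chroma docs.

```python
from typing import Any


def has_single_top_level_key(where: dict[str, Any]) -> bool:
    """Hypothetical check for the 'exactly one key' rule described above."""
    return len(where) == 1


# Rejected according to the description: two entries in one dictionary.
illegal = {"foo": "bar", "qux": "baz"}

# One key at the top level; the conditions are combined under "$and".
combined = {"$and": [{"foo": "bar"}, {"qux": "baz"}]}

# Operator keys such as "$contains" are ordinary strings and must be quoted.
document_filter = {"$contains": "hello"}

for candidate in (illegal, combined, document_filter):
    print(candidate, has_single_top_level_key(candidate))
```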
---------
Co-authored-by: Really Him <hesereallyhim@proton.me>
This PR includes the following documentation fixes for the SAP HANA
Cloud vector store integration:
- Removed stale output from the `%pip install` code cell.
- Replaced an unrelated vectorstore documentation link on the provider
overview page.
- Renamed the provider from "SAP HANA" to "SAP HANA Cloud"
# What's Changed?
- [x] 1. docs: **docs/docs/integrations/chat/litellm.ipynb** : Updated
with docs for litellm_router since it has been moved into the
[langchain-litellm](https://github.com/Akshay-Dongare/langchain-litellm)
package along with ChatLiteLLM
- [x] 2. docs: **docs/docs/integrations/chat/litellm_router.ipynb** :
Deleted to avoid redundancy
- [x] 3. docs: **docs/docs/integrations/providers/litellm.mdx** :
Updated to reflect inclusion of ChatLiteLLMRouter class
- [x] Lint and test: Done
# Issue:
- [x] Related to the issue
https://github.com/langchain-ai/langchain/issues/30368
# About me
- [x] 🔗 LinkedIn:
[akshay-dongare](https://www.linkedin.com/in/akshay-dongare/)
Hi there, I'm Célina from 🤗,
This PR introduces support for Hugging Face's serverless Inference
Providers (documentation
[here](https://huggingface.co/docs/inference-providers/index)), allowing
users to specify different providers for chat completion and text
generation tasks.
This PR also removes the usage of `InferenceClient.post()` method in
`HuggingFaceEndpoint`, in favor of the task-specific `text_generation`
method. `InferenceClient.post()` is deprecated and will be removed in
`huggingface_hub v0.31.0`.
---
## Changes made
- bumped the minimum required version of the `huggingface-hub` package
to ensure compatibility with the latest API usage.
- added a `provider` field to `HuggingFaceEndpoint`, enabling users to
select the inference provider (e.g., 'cerebras', 'together',
'fireworks-ai'). Defaults to `hf-inference` (HF Inference API).
- replaced the deprecated `InferenceClient.post()` call in
`HuggingFaceEndpoint` with the task-specific `text_generation` method
for future-proofing, since `post()` will be removed in huggingface-hub
v0.31.0.
- updated the `ChatHuggingFace` component:
- added async and streaming support.
- added support for tool calling.
- exposed underlying chat completion parameters for more granular
control.
- Added integration tests for `ChatHuggingFace` and updated the
corresponding unit tests.
✅ All changes are backward compatible.
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
Follow up to https://github.com/langchain-ai/langsmith-sdk/pull/1696,
I've bumped the `langsmith` version where applicable in `uv.lock`.
Type checking problems here because deps have been updated in
`pyproject.toml` and `uv lock` hasn't been run - we should enforce that
in the future - goes with the other dependabot todos :).
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
## Description
Added support for retrieving column comments in the SQL Database
utility. This feature allows users to see comments associated with
database columns when querying table information. Column comments
provide valuable metadata that helps LLMs better understand the
semantics and purpose of database columns.
A new optional parameter `get_col_comments` was added to the
`get_table_info` method, defaulting to `False` for backward
compatibility. When set to `True`, it retrieves and formats column
comments for each table.
Currently, this feature is supported on PostgreSQL, MySQL, and Oracle
databases.
## Implementation
The table must already have been created with column comments.
```python
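from langchain_community.utilities import SQLDatabase
db = SQLDatabase.from_uri("YOUR_DB_URI")
# Include column comments in the table description (off by default).
print(db.get_table_info(get_col_comments=True))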
```
## Result
```
CREATE TABLE test_table (
name VARCHAR
school VARCHAR)
/*
Column Comments: {'name': person name, 'school': school_name}
*/
/*
3 rows from test_table:
name
a
b
c
*/
```
## Benefits
1. Enhances LLM's understanding of database schema semantics
2. Preserves valuable domain knowledge embedded in database design
3. Improves accuracy of SQL query generation
4. Provides more context for data interpretation
Tests are available in
`langchain/libs/community/tests/test_sql_get_table_info.py`.
---------
Co-authored-by: chbae <chbae@gcsc.co.kr>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:**
This PR marks the `HanaDB` vector store (and related utilities) in
`langchain_community` as deprecated using the `@deprecated` annotation.
- Set `since="0.1.0"` and `removal="1.0"`
- Added a clear migration path and a link to the SAP-maintained
replacement in the
[`langchain_hana`](https://github.com/SAP/langchain-integration-for-sap-hana-cloud)
package.
Additionally, the example notebook has been updated to use the new
`HanaDB` class from `langchain_hana`, ensuring users follow the
recommended integration moving forward.
- **Issue:** None
- **Dependencies:** None
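For readers unfamiliar with the pattern, a deprecation marker carrying `since` and `removal` values can be sketched with the standard library alone. This is not the decorator used in the PR; `deprecated_sketch`, its parameters, and `legacy_store` are hypothetical stand-ins.

```python
import functools
import warnings


def deprecated_sketch(since: str, removal: str, alternative: str):
    """Hypothetical stand-in: warn whenever the wrapped callable is used."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated since {since} and will be "
                f"removed in {removal}; use {alternative} instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@deprecated_sketch(since="0.1.0", removal="1.0", alternative="langchain_hana.HanaDB")
def legacy_store() -> str:
    return "legacy"


# Capture the warning so the example shows what a caller would see.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = legacy_store()

print(result, caught[0].category.__name__)
```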
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
`_chunk_size` is not changed by the `self._tokenize` method, so I think
this is duplicate code.
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
This PR brings some much-needed updates to some of the shorter Astra DB
example notebooks:
- ensuring imports are from the partner package instead of the
(deprecated) community legacy package
- improving the wording in a few related places
- updating the constructor signature introduced with the latest partner
package's AstraDBLoader
- marking the community package counterpart of the LLM caches as
deprecated in the summary table at the end of the page.
This PR returns the message attachments in `_get_response`. When
files are generated, these attachments are currently not returned, so the
generated files cannot be retrieved.
Fixes issue: https://github.com/langchain-ai/langchain/issues/30851
community: fix browserbase integration
docs: update docs
- **Description:** Updated BrowserbaseLoader to use the new Python SDK.
- **Issue:** update browserbase integration with langchain
- **Dependencies:** n/a
- **Twitter handle:** @kylejeong21
Following https://github.com/langchain-ai/langchain/pull/30909: need to
retain "empty" reasoning output when streaming, e.g.,
```python
{'id': 'rs_...', 'summary': [], 'type': 'reasoning'}
```
Tested by existing integration tests, which are currently failing.
This PR adds Google Gemini (via AI Studio and Gemini API). Feel free to
change the ordering, if needed.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
This PR restructures the main Google integrations documentation page
(`docs/docs/integrations/providers/google.mdx`) for better clarity and
updates content.
**Key changes:**
* **Separated Sections:** Divided integrations into distinct `Google
Generative AI (Gemini API & AI Studio)`, `Google Cloud`, and `Other
Google Products` sections.
* **Updated Generative AI:** Refreshed the introduction and the `Google
Generative AI` section with current information and quickstart examples
for the Gemini API via `langchain-google-genai`.
* **Reorganized Content:** Moved non-Cloud Platform specific
integrations (e.g., Drive, GMail, Search tools, ScaNN) to the `Other
Google Products` section.
* **Cleaned Up:** Minor improvements to descriptions and code snippets.
This aims to make it easier for users to find the relevant Google
integrations based on whether they are using the Gemini API directly or
Google Cloud services.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- [x] **PR title**: "community: add indexname to other functions in
opensearch"
- [x] **PR message**:
- **Description:** add the ability to override the index name if it is
provided in the kwargs of sub-functions. When used in a WSGI application,
it's crucial to be able to change parameters dynamically.
- [ ] **Add tests and docs**:
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
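The index-name override described in this PR follows a common pattern, sketched below with a stand-in class. `VectorSearchSketch` and its method are hypothetical and do not mirror the OpenSearch integration's actual code.

```python
from typing import Any


class VectorSearchSketch:
    """Hypothetical stand-in for a vector store bound to a default index."""

    def __init__(self, index_name: str) -> None:
        self.index_name = index_name

    def similarity_search(self, query: str, **kwargs: Any) -> str:
        # A per-call "index_name" wins over the one given at construction time.
        index_name = kwargs.get("index_name", self.index_name)
        return f"searching {index_name!r} for {query!r}"


store = VectorSearchSketch(index_name="default-index")
print(store.similarity_search("hello"))
print(store.similarity_search("hello", index_name="tenant-42"))
```

Resolving the index per call means a single long-lived instance can serve requests that target different indices.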
Chat models currently implement support for:
- images in OpenAI Chat Completions format
- other multimodal types (e.g., PDF and audio) in a cross-provider
[standard
format](https://python.langchain.com/docs/how_to/multimodal_inputs/)
Here we update core to extend support to PDF and audio input in Chat
Completions format. **If an OAI-format PDF or audio content block is
passed into any chat model, it will be transformed to the LangChain
standard format**. We assume that any chat model supporting OAI-format
PDF or audio has implemented support for the standard format.
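A rough sketch of the transformation described above is shown below. It is not the implementation in core: `to_standard_block` is hypothetical, and the field names for both the Chat Completions blocks and the standard blocks are assumptions based on my reading of the two formats, so check them against the linked documentation.

```python
from typing import Any


def to_standard_block(block: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical converter; every field name below is an assumption."""
    if block.get("type") == "input_audio":
        audio = block["input_audio"]
        return {
            "type": "audio",
            "source_type": "base64",
            "data": audio["data"],
            "mime_type": f"audio/{audio['format']}",
        }
    if block.get("type") == "file" and "file" in block:
        # Assumed data-URL form: "data:<mime_type>;base64,<payload>"
        header, payload = block["file"]["file_data"].split(",", 1)
        mime_type = header.removeprefix("data:").split(";", 1)[0]
        return {
            "type": "file",
            "source_type": "base64",
            "data": payload,
            "mime_type": mime_type,
        }
    return block  # anything else passes through unchanged


pdf_block = {"type": "file", "file": {"file_data": "data:application/pdf;base64,JVBERi0="}}
print(to_standard_block(pdf_block))
```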
`mixtral-8x-7b-instruct` was recently retired from Fireworks Serverless.
Here we remove the default model altogether, so that the model must be
explicitly specified on init:
```python
from langchain_fireworks import ChatFireworks
ChatFireworks(model="accounts/fireworks/models/llama-v3p1-70b-instruct")  # for example
```
We also set a null default for `temperature`, which previously defaulted
to 0.0. This parameter will no longer be included in request payloads
unless it is explicitly provided.
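The effect of a null default on the request payload can be sketched as follows. `build_payload` is a hypothetical helper for illustration, not the code in `langchain-fireworks`.

```python
from typing import Any, Optional


def build_payload(model: str, temperature: Optional[float] = None) -> dict[str, Any]:
    """Hypothetical payload builder: unset parameters are left out entirely."""
    payload: dict[str, Any] = {"model": model}
    if temperature is not None:
        payload["temperature"] = temperature
    return payload


print(build_payload("my-model"))
print(build_payload("my-model", temperature=0.0))
```

Checking `is not None` rather than truthiness matters here: an explicit `0.0` is still sent, while an unset value is omitted so the provider's own default applies.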
**langchain_openai: Support of reasoning summary streaming**
**Description:**
The OpenAI API now supports streaming reasoning summaries for reasoning
models (o1, o3, o3-mini, o4-mini). More info:
https://platform.openai.com/docs/guides/reasoning#reasoning-summaries
This is supported only in the Responses API (not the Completions API), so
you need to create the LangChain OpenAI model as follows to stream reasoning
summaries:
```
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="o4-mini",  # o1, o3, and o3-mini also support reasoning streaming
    use_responses_api=True,  # reasoning streaming works only with the Responses API
    model_kwargs={
        "reasoning": {
            "effort": "high",  # "low" and "medium" are also supported
            "summary": "auto",  # some models support "concise", some "detailed"; "auto" always works
        }
    },
)
```
Now, if you stream events from llm:
```
async for event in llm.astream_events(prompt, version="v2"):
    print(event)
```
or
```
for chunk in llm.stream(prompt):
    print(chunk)
```
The OpenAI API will send you new types of events:
`response.reasoning_summary_text.added`
`response.reasoning_summary_text.delta`
`response.reasoning_summary_text.done`
These events are new, so they were previously ignored. This PR adds support
for them in `_convert_responses_chunk_to_generation_chunk`, so that
reasoning chunks, or the full reasoning summary, are added to the chunk's
`additional_kwargs`.
Example of how this reasoning summary may be printed:
```
from langchain_core.messages import AIMessageChunk

# Run this inside an async function or a notebook cell.
async for event in llm.astream_events(prompt, version="v2"):
    if event["event"] == "on_chat_model_stream":
        chunk: AIMessageChunk = event["data"]["chunk"]
        if "reasoning_summary_chunk" in chunk.additional_kwargs:
            print(chunk.additional_kwargs["reasoning_summary_chunk"], end="")
        elif "reasoning_summary" in chunk.additional_kwargs:
            print("\n\nFull reasoning step summary:", chunk.additional_kwargs["reasoning_summary"])
        elif chunk.content and chunk.content[0]["type"] == "text":
            print(chunk.content[0]["text"], end="")
```
or
```
for chunk in llm.stream(prompt):
    if "reasoning_summary_chunk" in chunk.additional_kwargs:
        print(chunk.additional_kwargs["reasoning_summary_chunk"], end="")
    elif "reasoning_summary" in chunk.additional_kwargs:
        print("\n\nFull reasoning step summary:", chunk.additional_kwargs["reasoning_summary"])
    elif chunk.content and chunk.content[0]["type"] == "text":
        print(chunk.content[0]["text"], end="")
```
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
PR title:
docs: add Valyu integration documentation
Description:
This PR adds documentation and example notebooks for the Valyu
integration, including retriever and tool usage.
Issue:
N/A
Dependencies:
No new dependencies.
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
- [x] **PR message**:
- **Description:** Updates the documentation for the
langchain-predictionguard package, adding tool calling functionality and
some new parameters.
PR Summary
This change adds a fallback in ChatAnthropic.with_structured_output() to
handle Pydantic models that don’t include a docstring. Without it,
calling:
```py
from pydantic import BaseModel
from langchain_anthropic import ChatAnthropic

class SampleModel(BaseModel):
    sample_field: str

llm = ChatAnthropic(
    model="claude-3-7-sonnet-latest"
).with_structured_output(SampleModel.model_json_schema())
llm.invoke("test")
```
will raise a
```
KeyError: 'description'
```
because Pydantic omits the description field when no docstring is
present.
This issue doesn’t occur when using ChatOpenAI or if you add a docstring
to the model:
```py
from pydantic import BaseModel
from langchain_openai import ChatOpenAI

class SampleModel(BaseModel):
    """Schema for sample_field output."""

    sample_field: str

llm = ChatOpenAI(
    model="gpt-4o-mini"
).with_structured_output(SampleModel.model_json_schema())
llm.invoke("test")
```
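The general shape of such a fallback can be sketched without either library. The PR description does not state which default value is used, so the empty string below is an assumption, and `tool_description` is a hypothetical helper rather than the code in `langchain-anthropic`.

```python
from typing import Any


def tool_description(schema: dict[str, Any], default: str = "") -> str:
    """Hypothetical helper: read "description" without raising KeyError."""
    return schema.get("description", default)


# Shape of a JSON schema for a model without a docstring (no "description" key).
schema_without_docstring = {
    "title": "SampleModel",
    "type": "object",
    "properties": {"sample_field": {"title": "Sample Field", "type": "string"}},
    "required": ["sample_field"],
}

print(repr(tool_description(schema_without_docstring)))
```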
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Addresses #30158
When using the output parser—either in a chain or standalone—hitting
max_tokens triggers a misleading “missing variable” error instead of
indicating the output was truncated. This subtle bug often surfaces with
Anthropic models.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
PR title:
Community: Add bind variable support for oracle adb docloader
Description:
This PR adds bind-variable support to the Oracle ADB document loader
class, along with a minor documentation change.
Issue:
N/A
Dependencies:
No new dependencies.
This is a follow-on PR to go with the identical changes that were made
in partners/openai.
Previous PR: https://github.com/langchain-ai/langchain/pull/30757
When calling `embed_documents` and providing a `chunk_size` argument,
that argument is ignored when `OpenAIEmbeddings` is instantiated with
its default configuration (where `check_embedding_ctx_length=True`).
`_get_len_safe_embeddings` specifies a `chunk_size` parameter but it's
not being passed through in `embed_documents`, which is its only caller.
This appears to be an oversight, especially given that the
`_get_len_safe_embeddings` docstring states it should respect "the set
embedding context length and chunk size."
Developers typically expect method parameters to take effect (also, take
precedence) when explicitly provided, especially when instantiating
using defaults. I was confused as to why my API calls were being
rejected regardless of the chunk size I provided.
This bug also exists in langchain_community package. I can add that to
this PR if requested otherwise I will create a new one once this passes.
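The pass-through can be sketched with a stand-in class. `SketchEmbeddings` is hypothetical and only mimics the method names mentioned above; the batching and the fake vectors are assumptions for illustration, not the `OpenAIEmbeddings` implementation.

```python
from typing import Optional


class SketchEmbeddings:
    """Hypothetical stand-in for an embeddings class; not OpenAIEmbeddings."""

    def __init__(self, chunk_size: int = 1000) -> None:
        self.chunk_size = chunk_size
        self.batches: list[list[str]] = []  # records each simulated API call

    def _get_len_safe_embeddings(
        self, texts: list[str], chunk_size: Optional[int] = None
    ) -> list[list[float]]:
        # An explicit argument takes precedence over the instance default.
        size = chunk_size or self.chunk_size
        embeddings: list[list[float]] = []
        for start in range(0, len(texts), size):
            batch = texts[start : start + size]
            self.batches.append(batch)
            embeddings.extend([float(len(t))] for t in batch)  # fake vectors
        return embeddings

    def embed_documents(
        self, texts: list[str], chunk_size: Optional[int] = None
    ) -> list[list[float]]:
        # The fix: forward chunk_size instead of silently dropping it.
        return self._get_len_safe_embeddings(texts, chunk_size=chunk_size)


emb = SketchEmbeddings()
vectors = emb.embed_documents(["a", "bb", "ccc", "dddd", "eeeee"], chunk_size=2)
batch_sizes = [len(b) for b in emb.batches]
```

With the argument forwarded, five texts and `chunk_size=2` are sent as three batches rather than one batch sized by the instance default.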
**Description:**
partners-anthropic: ChatAnthropic supports b64 and urls in the
part[image_url][url] message variable
**Issue**:
ChatAnthropic right now only supports b64 encoded images in the
part[image_url][url] message variable. This PR enables ChatAnthropic to
also accept image urls in said variable and makes it compatible with
OpenAI messages to make model switching easier.
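A minimal sketch of the conversion, assuming an OpenAI-style `image_url` part as input. `image_url_to_anthropic` is a hypothetical helper, and the output dict shapes are assumptions modeled on Anthropic image content blocks rather than the code in this PR.

```python
import re
from typing import Any

# Matches data URLs such as "data:image/png;base64,iVBORw0..."
_DATA_URL = re.compile(r"^data:(?P<media_type>image/[\w.+-]+);base64,(?P<data>.+)$")


def image_url_to_anthropic(part: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical converter from an OpenAI-style image part."""
    url = part["image_url"]["url"]
    match = _DATA_URL.match(url)
    if match:
        # Base64 payload embedded in a data URL.
        source = {
            "type": "base64",
            "media_type": match.group("media_type"),
            "data": match.group("data"),
        }
    else:
        # Anything else is treated as a remote image URL (assumed shape).
        source = {"type": "url", "url": url}
    return {"type": "image", "source": source}


b64_block = image_url_to_anthropic(
    {"type": "image_url", "image_url": {"url": "data:image/png;base64,AAAA"}}
)
url_block = image_url_to_anthropic(
    {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}}
)
```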
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
SingleStore integration now has its own package, `langchain-singlestore`,
so the community implementation will no longer be maintained.
Added `deprecated` decorator to `SingleStoreDBChatMessageHistory`,
`SingleStoreDBSemanticCache`, and `SingleStoreDB` classes in the
community package.
**Dependencies:** https://github.com/langchain-ai/langchain/pull/30841
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
Support "usage_metadata" for LiteLLM streaming calls.
This is a follow-up to
https://github.com/langchain-ai/langchain/pull/30625, which tackled
non-streaming calls.
- [ ] **PR message**:
- **Description:** Including `metadata_field` in
`max_marginal_relevance_search()` raised an error. The logic now matches
how it is handled in `similarity_search`, where it can be any field or
simply a "*" to include every field.
* Remove unused ignores
* Add type ignore codes
* Add mypy rule `warn_unused_ignores`
* Add ruff rule PGH003
NB: some `type: ignore[unused-ignore]` are added because the ignores are
needed when `extended_testing_deps.txt` deps are installed.
We only need to rebuild model schemas if type annotation information
isn't available during declaration - that shouldn't be the case for
these types corrected here.
Need to do more thorough testing to make sure these structures have
complete schemas, but hopefully this boosts startup / import time.
- [ ] **PR title**: "docs: adding Smabbler's Galaxia integration"
- [ ] **PR message**: **Twitter handle:** @Galaxia_graph
I'm adding docs here + added the package to the packages.yml. I didn't
add a unit test, because this integration is just a thin wrapper on top
of our API. There isn't much left to test if you mock it away.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**Description:** This PR adds provider inference logic to
`init_chat_model` for Perplexity models that use the "sonar..." prefix
(`sonar`, `sonar-pro`, `sonar-reasoning`, `sonar-reasoning-pro` or
`sonar-deep-research`).
This allows users to initialize these models by simply passing the model
name, without needing to explicitly set `model_provider="perplexity"`.
The docstring for `init_chat_model` has also been updated to reflect
this new inference rule.
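The inference rule amounts to a prefix lookup. The sketch below is a simplified illustration, not the `init_chat_model` source: `infer_provider` and the table are hypothetical, and only the `sonar` entry comes from this PR.

```python
from typing import Optional

# Illustrative prefix table. Only the "sonar" entry is described in this PR;
# the "claude" entry is an assumed example of an existing rule.
_PREFIX_TO_PROVIDER = {
    "claude": "anthropic",
    "sonar": "perplexity",
}


def infer_provider(model_name: str) -> Optional[str]:
    """Return the provider implied by the model name prefix, if any."""
    for prefix, provider in _PREFIX_TO_PROVIDER.items():
        if model_name.startswith(prefix):
            return provider
    return None  # caller must pass model_provider explicitly


providers = [infer_provider(name) for name in ("sonar", "sonar-pro", "my-model")]
```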
https://github.com/langchain-ai/langchain/pull/30778 (not released)
broke all invocation modes of ChatOllama (intent was to remove
`"message"` from `generation_info`, but we turned `generation_info` into
`stream_resp["message"]`), resulting in validation errors.
On core releases, we check out the latest published package for
langchain-openai and langchain-anthropic and run their tests against the
candidate version of langchain-core.
Because these packages have a local install of langchain-tests, we also
need to check out the previous version of langchain-tests.
TL;DR: you can't optimize imports with a lazy `__getattr__` if there is
a namespace conflict with a module name and an attribute name. We should
avoid introducing conflicts like this in the future.
This PR fixes a bug introduced by my lazy imports PR:
https://github.com/langchain-ai/langchain/pull/30769.
In `langchain_core`, we have utilities for loading and dumping data.
Unfortunately, one of those utilities is a `load` function, located in
`langchain_core/load/load.py`. To make this function more visible, we
make it accessible at the top level `langchain_core.load` module via
importing the function in `langchain_core/load/__init__.py`.
So, either of these imports should work:
```py
from langchain_core.load import load
from langchain_core.load.load import load
```
As you can tell, this is already a bit confusing. You'd think that the
first import would produce the module `load`, but because of the
`__init__.py` shortcut, both produce the function `load`.
<details> More on why the lazy imports PR broke this support...
All was well, except when the absolute import was run first, see the
last snippet:
```
>>> from langchain_core.load import load
>>> load
<function load at 0x101c320c0>
```
```
>>> from langchain_core.load.load import load
>>> load
<function load at 0x1069360c0>
```
```
>>> from langchain_core.load import load
>>> load
<function load at 0x10692e0c0>
>>> from langchain_core.load.load import load
>>> load
<function load at 0x10692e0c0>
```
```
>>> from langchain_core.load.load import load
>>> load
<function load at 0x101e2e0c0>
>>> from langchain_core.load import load
>>> load
<module 'langchain_core.load.load' from '/Users/sydney_runkle/oss/langchain/libs/core/langchain_core/load/load.py'>
```
In this case, the function `load` wasn't stored in the globals cache for
the `langchain_core.load` module (by the lazy import logic), so Python
defers to a module import.
</details>
New `langchain` tongue twister 😜: we've created a problem for ourselves
because you have to load the load function from the load file in the
load module 😨.
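The conflict can be reproduced outside `langchain_core` with a throwaway package. Everything below is a hypothetical sketch: `demo_lazy_pkg` is written to a temp directory, and its `__init__.py` is a simplified lazy `__getattr__`, not the real one. `getattr(pkg, "load")` stands in for `from demo_lazy_pkg import load`, which starts with the same attribute lookup on the package.

```python
import importlib
import sys
import tempfile
import types
from pathlib import Path

PKG = "demo_lazy_pkg"  # hypothetical package name, created in a temp dir

INIT_SRC = '''
import importlib

def __getattr__(name):
    # Called only when `name` is not already an attribute of the package.
    if name == "load":
        func = importlib.import_module(__name__ + ".load").load
        globals()["load"] = func  # cache the function for later lookups
        return func
    raise AttributeError(name)
'''

LOAD_SRC = '''
def load():
    return "loaded"
'''


def resolve_load(submodule_first: bool) -> object:
    """Return what attribute lookup of `load` on the package yields."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / PKG
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text(INIT_SRC)
        (pkg_dir / "load.py").write_text(LOAD_SRC)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            if submodule_first:
                # Importing the submodule binds it as the attribute pkg.load,
                # so the lazy __getattr__ is never consulted afterwards.
                importlib.import_module(f"{PKG}.load")
            pkg = importlib.import_module(PKG)
            return getattr(pkg, "load")
        finally:
            # Clean up so repeated calls start from a fresh import state.
            sys.path.remove(tmp)
            for name in [n for n in sys.modules if n.startswith(PKG)]:
                del sys.modules[name]


lazy_first = resolve_load(submodule_first=False)
module_first = resolve_load(submodule_first=True)
print(isinstance(lazy_first, types.ModuleType), isinstance(module_first, types.ModuleType))
```

In this sketch the lazy path yields the function, while importing the submodule first leaves the module object bound to the name, which mirrors the last REPL snippet above.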
Fix CI to trigger benchmarks on `run-codspeed-benchmarks` label addition
Reduce scope of async benchmark to save time on CI
Waiting to merge this PR until we figure out how to use walltime on
local runners.
Most easily reviewed with the "hide whitespace" option toggled.
Seeing 10-50% speed ups in import time for common structures 🚀
The general purpose of this PR is to lazily import structures within
`langchain_core.XXX_module.__init__.py` so that we're not eagerly
importing expensive dependencies (`pydantic`, `requests`, etc).
Analysis of flamegraphs generated with `importtime` motivated these
changes. For example, the one below demonstrates that importing
`HumanMessage` accidentally triggered imports for `importlib.metadata`,
`requests`, etc.
There's still much more to do on this front, and we can start digging
into our own internal code for optimizations now that we're less
concerned about external imports.
<img width="1210" alt="Screenshot 2025-04-11 at 1 10 54 PM"
src="https://github.com/user-attachments/assets/112a3fe7-24a9-4294-92c1-d5ae64df839e"
/>
I've tracked the improvements with some local benchmarks:
## `pytest-benchmark` results
| Name | Before (s) | After (s) | Delta (s) | % Change |
|-----------------------------|------------|-----------|-----------|----------|
| Document | 2.8683 | 1.2775 | -1.5908 | -55.46% |
| HumanMessage | 2.2358 | 1.1673 | -1.0685 | -47.79% |
| ChatPromptTemplate | 5.5235 | 2.9709 | -2.5526 | -46.22% |
| Runnable | 2.9423 | 1.7793 | -1.163 | -39.53% |
| InMemoryVectorStore | 3.1180 | 1.8417 | -1.2763 | -40.93% |
| RunnableLambda | 2.7385 | 1.8745 | -0.864 | -31.55% |
| tool | 5.1231 | 4.0771 | -1.046 | -20.42% |
| CallbackManager | 4.2263 | 3.4099 | -0.8164 | -19.32% |
| LangChainTracer | 3.8394 | 3.3101 | -0.5293 | -13.79% |
| BaseChatModel | 4.3317 | 3.8806 | -0.4511 | -10.41% |
| PydanticOutputParser | 3.2036 | 3.2995 | 0.0959 | 2.99% |
| InMemoryRateLimiter | 0.5311 | 0.5995 | 0.0684 | 12.88% |
Note that the lack of improvement for `InMemoryRateLimiter` and
`PydanticOutputParser` is just random noise; I'm getting comparable
numbers locally.
## Local CodSpeed results
We're still working on configuring CodSpeed on CI. The local usage
produced similar results.
This PR fixes an issue where ChatPerplexity would raise an
AttributeError when the citations attribute was missing from the model
response (e.g., when using offline models like r1-1776).
The fix checks for the presence of citations, images, and
related_questions before attempting to access them, avoiding crashes in
models that don't provide these fields.
Tested locally with models that omit citations, and the fix works as
expected.
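The guard can be sketched as a presence check before each access. `extract_optional_fields` is a hypothetical helper and `SimpleNamespace` stands in for the provider response object; neither is taken from the `ChatPerplexity` code.

```python
from types import SimpleNamespace
from typing import Any

OPTIONAL_FIELDS = ("citations", "images", "related_questions")


def extract_optional_fields(response: Any) -> dict[str, Any]:
    """Hypothetical helper: copy only the optional fields the response has."""
    extras: dict[str, Any] = {}
    for field in OPTIONAL_FIELDS:
        # hasattr avoids the AttributeError raised by direct attribute access.
        if hasattr(response, field):
            extras[field] = getattr(response, field)
    return extras


with_citations = SimpleNamespace(citations=["https://example.com"], images=[])
without_citations = SimpleNamespace()  # a response that omits these fields

full = extract_optional_fields(with_citations)
empty = extract_optional_fields(without_citations)
```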
Hey LangChain community! 👋 Excited to propose official documentation for
our new openGauss integration that brings powerful vector capabilities
to the stack!
### What's Inside 📦
1. **Full Integration Guide**
Introducing
[langchain-opengauss](https://pypi.org/project/langchain-opengauss/) on
PyPI - your new toolkit for:
🔍 Native hybrid search (vectors + metadata)
🚀 Production-grade connection pooling
🧩 Automatic schema management
2. **Rigorous Testing Passed** ✅

- 100% non-async test coverage
P.S. The current implementation resides in my personal repository:
https://github.com/mpb159753/langchain-opengauss. How can I transfer it
to the langchain-ai org? *Keen to hear your thoughts and make this
integration shine!* ✨
---------
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Looks like `pyupgrade` was already used here but missed some docs and
tests.
This helps to keep our docs looking professional and up to date.
Eventually, we should lint / format our inline docs.
**Description:** Replaced the example that used the deprecated
`initialize_agent` function with `create_react_agent` from
`langgraph.prebuilt`
**Issue:** #29277
**Dependencies:** N/A
**Twitter handle:** N/A
**Description:** Adds OAuth2 support to the Jira tool by allowing a
dictionary of OAuth parameters to be passed. The documentation is also
updated to show this new behavior.
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
This PR aims to reduce import time of `langchain-core` tools by removing
the `importlib.metadata` import previously used in `__init__.py`. This
is the first in a sequence of PRs to reduce import time delays for
`langchain-core` features and structures 🚀.
Because we're now hard coding the version, we need to make sure
`version.py` and `pyproject.toml` stay in sync, so I've added a new CI
job that runs whenever either of those files are modified. [This
run](https://github.com/langchain-ai/langchain/actions/runs/14358012706/job/40251952044?pr=30744)
demonstrates the failure that occurs whenever the version gets out of
sync (thus blocking a PR).
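A sync check of this kind could look like the sketch below. It is not the CI script from this PR: `versions_match` is hypothetical, the `VERSION = "..."` assignment in `version.py` is an assumption, and the regexes are a simplification of real TOML parsing.

```python
import re

# Assumed layouts: `VERSION = "x.y.z"` in version.py and
# `version = "x.y.z"` in pyproject.toml.
_VERSION_PY = re.compile(r'^VERSION\s*=\s*"([^"]+)"', re.MULTILINE)
_PYPROJECT = re.compile(r'^version\s*=\s*"([^"]+)"', re.MULTILINE)


def versions_match(version_py_text: str, pyproject_text: str) -> bool:
    """Return True when both files declare the same version string."""
    py_match = _VERSION_PY.search(version_py_text)
    toml_match = _PYPROJECT.search(pyproject_text)
    if py_match is None or toml_match is None:
        raise ValueError("could not find a version string in one of the files")
    return py_match.group(1) == toml_match.group(1)


in_sync = versions_match(
    'VERSION = "0.3.51"\n',
    '[project]\nname = "langchain-core"\nversion = "0.3.51"\n',
)
out_of_sync = versions_match(
    'VERSION = "0.3.51"\n',
    '[project]\nname = "langchain-core"\nversion = "0.3.52"\n',
)
```

A CI job would read the two files, call the check, and exit non-zero when it returns `False`.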
Before (note the ~15% of time spent on `importlib.metadata` and related
imports):
<img width="1081" alt="Screenshot 2025-04-09 at 9 06 15 AM"
src="https://github.com/user-attachments/assets/59f405ec-ee8d-4473-89ff-45dea5befa31"
/>
After (note, lack of `importlib.metadata` time sink):
<img width="1245" alt="Screenshot 2025-04-09 at 9 01 23 AM"
src="https://github.com/user-attachments/assets/9c32e77c-27ce-485e-9b88-e365193ed58d"
/>
Description:
This PR adds documentation for the langchain-cloudflare integration
package.
Issue:
N/A
Dependencies:
No new dependencies are required.
Tests and Docs:
Added an example notebook demonstrating the usage of the
langchain-cloudflare package, located in docs/docs/integrations.
Added a new package to libs/packages.yml.
Lint and Format:
Successfully ran make format and make lint.
---------
Co-authored-by: Collier King <collier@cloudflare.com>
Co-authored-by: Collier King <collierking99@gmail.com>
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
Hi there, This is a complementary PR to #30733.
This PR introduces support for Hugging Face's serverless Inference
Providers (documentation
[here](https://huggingface.co/docs/inference-providers/index)), allowing
users to specify different providers.
This PR also removes the usage of `InferenceClient.post()` method in
`HuggingFaceEndpointEmbeddings`, in favor of the task-specific
`feature_extraction` method. `InferenceClient.post()` is deprecated and
will be removed in `huggingface_hub` v0.31.0.
## Changes made
- bumped the minimum required version of the `huggingface_hub` package
to ensure compatibility with the latest API usage.
- added a provider field to `HuggingFaceEndpointEmbeddings`, enabling
users to select the inference provider.
- replaced the deprecated `InferenceClient.post()` call in
`HuggingFaceEndpointEmbeddings` with the task-specific
`feature_extraction` method for future-proofing, since `post()` will be
removed in `huggingface-hub` v0.31.0.
✅ All changes are backward compatible.
---------
Co-authored-by: Lucain <lucainp@gmail.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
* Only run codspeed logic when `libs/core` is changed (for now, we'll
want to add other benchmarks later)
* Also run on `master` so that we can get a reference :)
The first in a sequence of PRs focusing on improving performance in
core. We're starting with reducing import times for common structures,
hence the benchmarks here.
The benchmark looks a little bit complicated - we have to use a process
so that we don't suffer from Python's import caching system. I tried
doing manual modification of `sys.modules` between runs, but that's
pretty tricky / hacky to get right, hence the subprocess approach.
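The subprocess approach can be sketched as follows. `time_import` is a hypothetical helper, not the benchmark added in this PR, and it measures wall time for a whole fresh interpreter, so interpreter startup is included in the number.

```python
import subprocess
import sys
import time


def time_import(statement: str, runs: int = 3) -> float:
    """Return the best wall time (seconds) to run `statement` in a fresh interpreter."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        # A new process starts with an empty import cache, so the import
        # is paid for in full on every run.
        subprocess.run([sys.executable, "-c", statement], check=True)
        best = min(best, time.perf_counter() - start)
    return best


elapsed = time_import("import json")
print(f"import json: {elapsed:.3f}s")
```

Swapping the statement for something like `from langchain_core.messages import HumanMessage` would time that import instead, assuming the package is installed.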
Motivated by extremely slow baseline for common imports (we're talking
2-5 seconds):
<img width="633" alt="Screenshot 2025-04-09 at 12 48 12 PM"
src="https://github.com/user-attachments/assets/994616fe-1798-404d-bcbe-48ad0eb8a9a0"
/>
Also added a `make benchmark` command to make local runs easy :).
Currently using walltimes so that we can track total time despite using
a manual process.
Google Vertex AI Search will now return the title of the found website
as part of the document metadata, if available.
- **Description**: Vertex AI Search can be used to index websites and
then develop chatbots that use these websites to answer questions. At
present, the document metadata includes an `id` and `source` (which is
the URL). While the URL is enough to create a link, the ID is not
descriptive enough to show users. Therefore, I propose we return `title`
as well, when available (e.g., it will not be available in `.txt`
documents found during the website indexing).
- **Issue**: No bug in particular, but it would be better if this was
here.
- **Dependencies**: None
- I do not use twitter.
Format, Lint and Test seem to be all good.
Generally, this PR is CI performance focused + aims to clean up some
dependencies at the same time.
1. Unpins upper bounds for `numpy` in all `pyproject.toml` files where
`numpy` is specified
2. Requires `numpy >= 2.1.0` for Python 3.13 and `numpy > v1.26.0` for
Python 3.12, plus a `numpy` min version bump for `chroma`
3. Speeds up CI by minutes - linting on Python 3.13, installing `numpy <
2.1.0` was taking [~3
minutes](https://github.com/langchain-ai/langchain/actions/runs/14316342925/job/40123305868?pr=30713),
now the entire env setup takes a few seconds
4. Deleted the `numpy` test dependency from partners where that was not
used, specifically `huggingface`, `voyageai`, `xai`, and `nomic`.
It's a bit unfortunate that `langchain-community` depends on `numpy`, we
might want to try to fix that in the future...
Closes https://github.com/langchain-ai/langchain/issues/26026
Fixes https://github.com/langchain-ai/langchain/issues/30555
Resolves https://github.com/langchain-ai/langchain/issues/30724
The [prompt in
langchain-hub](https://smith.langchain.com/hub/langchain-ai/sql-query-system-prompt)
used in this guide was composed of just a system message, but the guide
did not add a human message to it. This was incompatible with some
providers (and is generally not a typical usage pattern).
The prompt in prompt hub has been updated to split the question into a
separate HumanMessage. Here we update the guide to reflect this.
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
Tool-calling tests started intermittently failing with
> groq.APIError: Failed to call a function. Please adjust your prompt.
See 'failed_generation' for more details.
**Description:** The error message was supposed to display the missing
vector name, but instead, it includes only the existing collection
configs.
This simple PR just includes the correct variable name, so that the user
knows the requested vector does not exist in the collection.
Signed-off-by: Tin Lai <tin@tinyiu.com>
Description
This PR updates the docs for the
[langchain-hyperbrowser](https://pypi.org/project/langchain-hyperbrowser/)
package. It adds a few tools:
- Scrape Tool
- Crawl Tool
- Extract Tool
- Browser Agents
- Claude Computer Use
- OpenAI CUA
- Browser Use
[Hyperbrowser](https://hyperbrowser.ai/) is a platform for running and
scaling headless browsers. It lets you launch and manage browser
sessions at scale and provides easy-to-use solutions for any web-scraping
needs, such as scraping a single page or crawling an entire site.
Issue
None
Dependencies
None
Twitter Handle
`@hyperbrowser`
## Docs: Add Google Calendar Toolkit Documentation
### Description:
This PR adds documentation for the Google Calendar Toolkit as part of
the `langchain-google` repository. Refer to the related PR: [community:
Add Google Calendar
Toolkit](https://github.com/langchain-ai/langchain-google/pull/688).
### Issue:
N/A
### Twitter handle:
@jorgejrzz
**Description:**
Fixed a bug in `BaseCallbackManager.remove_handler()` that caused a
`ValueError` when removing a handler added via the constructor's
`handlers` parameter. The issue occurred because handlers passed to the
constructor were added only to the `handlers` list and not automatically
to `inheritable_handlers` unless explicitly specified. However,
`remove_handler()` attempted to remove the handler from both lists
unconditionally, triggering a `ValueError` when it wasn't in
`inheritable_handlers`.
The fix ensures the method checks for the handler’s presence in each
list before attempting removal, making it more robust while preserving
its original behavior.
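A minimal sketch of the guarded removal, using a hypothetical `SketchCallbackManager` that only mirrors the two lists described above. It is not the `langchain_core` class.

```python
class SketchCallbackManager:
    """Hypothetical stand-in; not the langchain_core BaseCallbackManager."""

    def __init__(self, handlers, inheritable_handlers=None):
        self.handlers = list(handlers)
        # Constructor handlers are NOT inheritable unless passed explicitly.
        self.inheritable_handlers = list(inheritable_handlers or [])

    def remove_handler(self, handler) -> None:
        # list.remove raises ValueError for a missing item, so check first.
        if handler in self.handlers:
            self.handlers.remove(handler)
        if handler in self.inheritable_handlers:
            self.inheritable_handlers.remove(handler)


handler = object()
manager = SketchCallbackManager(handlers=[handler])
manager.remove_handler(handler)  # no ValueError despite empty inheritable list
```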
**Issue:** Fixes #30640
**Dependencies:** None
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
- **Description:** We do not need to set the parser in `scrape` since
that has already been done in `_scrape`
- **Issue:** #30629, not directly related, but this makes sure the xml
parser is used
This pull request includes various changes to the `langchain_core`
library, focusing on improving compatibility with different versions of
Pydantic. The primary change involves replacing checks for Pydantic
major versions with boolean flags, which simplifies the code and
improves readability.
This also solves ruff rule checks for
[RUF048](https://docs.astral.sh/ruff/rules/map-int-version-parsing/) and
[PLR2004](https://docs.astral.sh/ruff/rules/magic-value-comparison/).
Key changes include:
### Compatibility Improvements:
*
[`libs/core/langchain_core/output_parsers/json.py`](diffhunk://#diff-5add0cf7134636ae4198a1e0df49ee332ae0c9123c3a2395101e02687c717646L22-R24):
Replaced `PYDANTIC_MAJOR_VERSION` with `IS_PYDANTIC_V1` to check for
Pydantic version 1.
*
[`libs/core/langchain_core/output_parsers/pydantic.py`](diffhunk://#diff-2364b5b4aee01c462aa5dbda5dc3a877dcd20f29df173ad540dc8adf8b192361L14-R14):
Updated version checks from `PYDANTIC_MAJOR_VERSION` to `IS_PYDANTIC_V2`
in the `PydanticOutputParser` class.
[[1]](diffhunk://#diff-2364b5b4aee01c462aa5dbda5dc3a877dcd20f29df173ad540dc8adf8b192361L14-R14)
[[2]](diffhunk://#diff-2364b5b4aee01c462aa5dbda5dc3a877dcd20f29df173ad540dc8adf8b192361L27-R27)
### Utility Enhancements:
*
[`libs/core/langchain_core/utils/pydantic.py`](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896R23):
Introduced `IS_PYDANTIC_V1` and `IS_PYDANTIC_V2` flags and deprecated
the `get_pydantic_major_version` function. Updated various functions to
use these flags instead of version numbers.
[[1]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896R23)
[[2]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896R42-R78)
[[3]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896L90-R89)
[[4]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896L104-R101)
[[5]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896L120-R122)
[[6]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896L135-R132)
[[7]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896L149-R151)
[[8]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896L164-R161)
[[9]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896L248-R250)
[[10]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896L330-R335)
[[11]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896L356-R357)
[[12]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896L393-R390)
[[13]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896L403-R400)
### Test Updates:
*
[`libs/core/tests/unit_tests/output_parsers/test_openai_tools.py`](diffhunk://#diff-694cc0318edbd6bbca34f53304934062ad59ba9f5a788252ce6c5f5452489d67L19-R22):
Updated tests to use `IS_PYDANTIC_V1` and `IS_PYDANTIC_V2` for version
checks.
[[1]](diffhunk://#diff-694cc0318edbd6bbca34f53304934062ad59ba9f5a788252ce6c5f5452489d67L19-R22)
[[2]](diffhunk://#diff-694cc0318edbd6bbca34f53304934062ad59ba9f5a788252ce6c5f5452489d67L532-R535)
[[3]](diffhunk://#diff-694cc0318edbd6bbca34f53304934062ad59ba9f5a788252ce6c5f5452489d67L567-R570)
[[4]](diffhunk://#diff-694cc0318edbd6bbca34f53304934062ad59ba9f5a788252ce6c5f5452489d67L602-R605)
*
[`libs/core/tests/unit_tests/prompts/test_chat.py`](diffhunk://#diff-3e60e744842086a4f3c4b21bc83e819c3435720eab210078e77e2430fb8c7e84R7):
Replaced version tuple checks with `PYDANTIC_VERSION` comparisons.
[[1]](diffhunk://#diff-3e60e744842086a4f3c4b21bc83e819c3435720eab210078e77e2430fb8c7e84R7)
[[2]](diffhunk://#diff-3e60e744842086a4f3c4b21bc83e819c3435720eab210078e77e2430fb8c7e84L35-R38)
[[3]](diffhunk://#diff-3e60e744842086a4f3c4b21bc83e819c3435720eab210078e77e2430fb8c7e84L924-R927)
[[4]](diffhunk://#diff-3e60e744842086a4f3c4b21bc83e819c3435720eab210078e77e2430fb8c7e84L935-R938)
*
[`libs/core/tests/unit_tests/runnables/test_graph.py`](diffhunk://#diff-99a290330ef40103d0ce02e52e21310d6fadea142bfdea13c94d23fc81c0bb5dR3):
Simplified version checks using `PYDANTIC_VERSION`.
[[1]](diffhunk://#diff-99a290330ef40103d0ce02e52e21310d6fadea142bfdea13c94d23fc81c0bb5dR3)
[[2]](diffhunk://#diff-99a290330ef40103d0ce02e52e21310d6fadea142bfdea13c94d23fc81c0bb5dL15-R18)
[[3]](diffhunk://#diff-99a290330ef40103d0ce02e52e21310d6fadea142bfdea13c94d23fc81c0bb5dL234-L239)
*
[`libs/core/tests/unit_tests/runnables/test_runnable.py`](diffhunk://#diff-06bed920c0dad0cfd41d57a8d9e47a7b56832409649c10151061a791860d5bb5L18-R20):
Introduced `PYDANTIC_VERSION_AT_LEAST_29` and
`PYDANTIC_VERSION_AT_LEAST_210` for more readable version checks.
[[1]](diffhunk://#diff-06bed920c0dad0cfd41d57a8d9e47a7b56832409649c10151061a791860d5bb5L18-R20)
[[2]](diffhunk://#diff-06bed920c0dad0cfd41d57a8d9e47a7b56832409649c10151061a791860d5bb5L92-R99)
[[3]](diffhunk://#diff-06bed920c0dad0cfd41d57a8d9e47a7b56832409649c10151061a791860d5bb5L230-R233)
[[4]](diffhunk://#diff-06bed920c0dad0cfd41d57a8d9e47a7b56832409649c10151061a791860d5bb5L652-R655)
Add ruff rules:
* FIX: https://docs.astral.sh/ruff/rules/#flake8-fixme-fix
* TD: https://docs.astral.sh/ruff/rules/#flake8-todos-td
Code cleanup:
*
[`libs/core/langchain_core/outputs/chat_generation.py`](diffhunk://#diff-a1017ee46f58fa4005b110ffd4f8e1fb08f6a2a11d6ca4c78ff8be641cbb89e5L56-R56):
Removed the "HACK" prefix from a comment in the `set_text` method.
Configuration adjustments:
*
[`libs/core/pyproject.toml`](diffhunk://#diff-06baaee12b22a370fef9f170c9ed13e2727e377d3b32f5018430f4f0a39d3537R85-R93):
Added new rules `FIX002`, `TD002`, and `TD003` to the ignore list.
*
[`libs/core/pyproject.toml`](diffhunk://#diff-06baaee12b22a370fef9f170c9ed13e2727e377d3b32f5018430f4f0a39d3537L102-L108):
Removed the `FIX` and `TD` rules from the ignore list.
Test refinement:
*
[`libs/core/tests/unit_tests/runnables/test_runnable.py`](diffhunk://#diff-06bed920c0dad0cfd41d57a8d9e47a7b56832409649c10151061a791860d5bb5L3231-R3232):
Updated a TODO comment to improve clarity in the `test_map_stream`
function.
- [ ] **PR title**: "community: Removes pandas dependency for using
DuckDB for similarity search"
- [ ] **PR message**:
- **Description:** Removes pandas dependency for using DuckDB for
similarity search. The old function still exists as
`similarity_search_pd`, while the new one is at `similarity_search` and
requires no code changes. Return format remains the same.
- **Issue:** Issue #29933 and update on PR #30435
- **Dependencies:** No dependencies
LangChain QwQ allows non-Tongyi users to access thinking models with
extra capabilities which serve as an extension to Alibaba Cloud.
Hi @ccurme I'm back with the updated PR this time with documentation and
a finished package.
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
- Example: "community: add foobar LLM"
- **Description:** adds documentation of `langchain-qwq` integration
package. Also adds it to Alibaba Cloud provider
- **Issue:** #30580, #30317, #30579
- **Dependencies:** openai, json-repair
- **Twitter handle:** YigitBekir
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, eyurtsev, ccurme, vbarda, hwchase17.
**Description:**
Adds support for Riza custom runtimes to the two Riza code interpreter
tools, allowing users to run LLM-generated code that depends on
libraries outside stdlib.
**Issue:** N/A
**Dependencies:** None
**Twitter handle:** @rizaio
## Description:
This PR adds the necessary documentation for the `langchain-runpod`
partner package integration. It includes:
* A provider page (`docs/docs/integrations/providers/runpod.ipynb`)
explaining the overall setup.
* An LLM component page (`docs/docs/integrations/llms/runpod.ipynb`)
detailing the `RunPod` class usage.
* A Chat Model component page
(`docs/docs/integrations/chat/runpod.ipynb`) detailing the `ChatRunPod`
class usage, including a feature support table.
These documentation files reflect the latest features of the
`langchain-runpod` package (v0.2.0+) such as async support and API
polling logic.
This work also addresses the review feedback provided on the previous
attempt in PR #30246 by:
* Removing all TODOs from documentation.
* Adding the required links between provider and component pages.
* Completing the feature support table in the chat documentation.
* Linking to the source code on GitHub for API reference.
Finally, it registers the `langchain-runpod` package in
`libs/packages.yml`.
## Dependencies:
None added to the core LangChain repository by these documentation
changes. The required dependency (`langchain-runpod`) is managed as a
separate package.
## Twitter handle:
@runpod_io
---------
Co-authored-by: Max Forsey <maxpod@maxpod.local>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Thank you for contributing to LangChain!
- [x] Fix Tool description of SerpAPI tool: "docs: Fix SerpAPI tool
description"
- [ ] Fix SerpAPI tool description:
- Tool description + name in example initialization of the SerpAPI tool
was still that of the python repl tool.
- @RLHoeppi
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Plus, some accompanying docs updates
Some compelling usage:
```py
from langchain_perplexity import ChatPerplexity
chat = ChatPerplexity(model="llama-3.1-sonar-small-128k-online")
response = chat.invoke(
    "What were the most significant newsworthy events that occurred in the US recently?",
    extra_body={"search_recency_filter": "week"},
)
print(response.content)
# > Here are the top significant newsworthy events in the US recently: ...
```
Also, some confirmation of structured outputs:
```py
from langchain_perplexity import ChatPerplexity
from pydantic import BaseModel

class AnswerFormat(BaseModel):
    first_name: str
    last_name: str
    year_of_birth: int
    num_seasons_in_nba: int

messages = [
    {"role": "system", "content": "Be precise and concise."},
    {
        "role": "user",
        "content": (
            "Tell me about Michael Jordan. "
            "Please output a JSON object containing the following fields: "
            "first_name, last_name, year_of_birth, num_seasons_in_nba. "
        ),
    },
]
llm = ChatPerplexity(model="llama-3.1-sonar-small-128k-online")
structured_llm = llm.with_structured_output(AnswerFormat)
response = structured_llm.invoke(messages)
print(repr(response))
#> AnswerFormat(first_name='Michael', last_name='Jordan', year_of_birth=1963, num_seasons_in_nba=15)
```
Perplexity's importance in the space has been growing, so we think it's
time to add an official integration!
Note: following the release of `langchain-perplexity` to `pypi`, we
should be able to add `perplexity` as an extra in
`libs/langchain/pyproject.toml`, but we're blocked by a circular import
for now.
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Support "usage_metadata" for LiteLLM.
Related to https://github.com/langchain-ai/langchain/issues/30344
https://github.com/langchain-ai/langchain/pull/30542 introduced an
erroneous test for token counts for o-series models. tiktoken==0.8 does
not support o-series models in
`tiktoken.encoding_for_model(model_name)`, and this is the version of
tiktoken we had in the lock file. So we would default to `cl100k_base`
for o-series, which is the wrong encoding model. The test tested against
this wrong encoding (so it passed with tiktoken 0.8).
Here we update tiktoken to 0.9 in the lock file, and fix the expected
counts in the test. Verified that we are pulling
[o200k_base](https://github.com/openai/tiktoken/blob/main/tiktoken/model.py#L8),
as expected.
Description:
This PR adds documentation for the langchain-oxylabs integration
package.
The documentation includes instructions for configuring Oxylabs
credentials and provides example code demonstrating how to use the
package.
Issue:
N/A
Dependencies:
No new dependencies are required.
Tests and Docs:
Added an example notebook demonstrating the usage of the
Langchain-Oxylabs package, located in docs/docs/integrations.
Added a provider page in docs/docs/providers.
Added a new package to libs/packages.yml.
Lint and Test:
Successfully ran make format, make lint, and make test.
- **Description:** Propagates config_factories when calling decoration
methods for RunnableBinding--e.g. bind, with_config, with_types,
with_retry, and with_listeners. This ensures that configs attached to
the original RunnableBinding are kept when creating the new
RunnableBinding and the configs are merged during invocation. Picks up
where #30551 left off.
- **Issue:** #30531
Co-authored-by: ccurme <chester.curme@gmail.com>
## Description
This PR adds a new `sitemap_url` parameter to the `GitbookLoader` class
that allows users to specify a custom sitemap URL when loading content
from a GitBook site. This is particularly useful for GitBook sites that
use non-standard sitemap file names like `sitemap-pages.xml` instead of
the default `sitemap.xml`.
The standard `GitbookLoader` assumes that the sitemap is located at
`/sitemap.xml`, but some GitBook instances (including GitBook's own
documentation) use different paths for their sitemaps. This parameter
makes the loader more flexible and helps users extract content from a
wider range of GitBook sites.
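A minimal sketch of the fallback behaviour described above, assuming an optional `sitemap_url` that overrides the default location. The helper name `resolve_sitemap_url` and the example URLs are hypothetical and are not taken from the `GitbookLoader` source.

```python
from typing import Optional
from urllib.parse import urljoin


def resolve_sitemap_url(web_page: str, sitemap_url: Optional[str] = None) -> str:
    """Pick the sitemap to crawl for a GitBook site."""
    if sitemap_url:
        # An explicit value wins, e.g. a site that serves sitemap-pages.xml.
        return sitemap_url
    # Otherwise fall back to the conventional location at the site root.
    return urljoin(web_page, "/sitemap.xml")


default_url = resolve_sitemap_url("https://docs.example.com/guide/")
custom_url = resolve_sitemap_url(
    "https://docs.example.com/guide/",
    sitemap_url="https://docs.example.com/sitemap-pages.xml",
)
```

Because the parameter defaults to `None`, existing callers keep the `/sitemap.xml` behaviour unchanged.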
## Issue
Fixes bug
[30473](https://github.com/langchain-ai/langchain/issues/30473) where
the `GitbookLoader` would fail to find pages on GitBook sites that use
custom sitemap URLs.
## Dependencies
No new dependencies required.
*I've added*:
* Unit tests to verify the parameter works correctly
* Integration tests to confirm the parameter is properly used with real
GitBook sites
* Updated docstrings with parameter documentation
The changes are fully backward compatible, as the parameter is optional
with a sensible default.
---------
Co-authored-by: andrasfe <andrasf94@gmail.com>
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
This pull request updates the `pyproject.toml` configuration file to
modify the linting rules and ignored warnings for the project. The most
important changes include switching to a more comprehensive selection of
linting rules and updating the list of ignored rules to better align
with the project's requirements.
Linting rules update:
* Changed the `select` option to include all available linting rules by
setting it to `["ALL"]`.
Ignored rules update:
* Updated the `ignore` option to include specific rules that interfere
with the formatter, are incompatible with Pydantic, or are temporarily
excluded due to project constraints.
This PR addresses two key issues:
- **Prevent history errors from failing silently**: Previously, errors
in message history were only logged and not raised, which can lead to
inconsistent state and downstream failures (e.g., ValidationError from
Bedrock due to malformed message history). This change ensures that such
errors are raised explicitly, making them easier to detect and debug.
(Side note: I’m using AWS Lambda Powertools Logger but hadn’t configured
it properly with the standard Python logger—my bad. If the error had
been raised, I would’ve seen it in the logs 😄) This is a **BREAKING
CHANGE**
- **Add messages in bulk instead of iteratively**: This introduces a
custom add_messages method to add all messages at once. The previous
approach failed silently when individual messages were too large,
resulting in partial history updates and inconsistent state. With this
change, either all messages are added successfully, or none are—helping
avoid obscure history-related errors from Bedrock.
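The all-or-nothing behaviour can be sketched with an in-memory stand-in. `BulkHistory` and its character limit are assumptions for illustration only; the real change targets a persistent message history backend whose size limits and write API are not shown here.

```python
class BulkHistory:
    """In-memory stand-in for a message history with a per-message size limit."""

    def __init__(self, max_message_chars: int) -> None:
        self.max_message_chars = max_message_chars
        self.messages: list[str] = []

    def add_messages(self, new_messages: list[str]) -> None:
        # Validate the whole batch first so a failure leaves history untouched.
        for message in new_messages:
            if len(message) > self.max_message_chars:
                # Raise instead of logging, so the caller sees the failure.
                raise ValueError("message exceeds the size limit")
        # Only reached when every message is valid: append them all at once.
        self.messages.extend(new_messages)


history = BulkHistory(max_message_chars=10)
try:
    history.add_messages(["short", "this message is far too long"])
    error_raised = False
except ValueError:
    error_raised = True
```

After the failed call, `history.messages` is still empty, whereas an iterative approach would have stored `"short"` and then failed on the second message.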
---------
Co-authored-by: Kacper Wlodarczyk <kacper.wlodarczyk@chaosgears.com>
**Description:**
Fixes a bug in the YoutubeLoader where FetchedTranscript objects were
not properly processed. The loader was only extracting the 'text'
attribute from FetchedTranscriptSnippet objects while ignoring 'start'
and 'duration' attributes. This would cause a TypeError when the code
later tried to access these missing keys, particularly when using the
CHUNKS format or any code path that needed timestamp information.
This PR modifies the conversion of FetchedTranscriptSnippet objects to
include all necessary attributes, ensuring that the loader works
correctly with all transcript formats.
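The conversion can be sketched as follows. `Snippet` is a hypothetical stand-in for `FetchedTranscriptSnippet`, carrying only the three attributes named above; the real loader code may be structured differently.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    """Hypothetical stand-in for a fetched transcript snippet."""

    text: str
    start: float
    duration: float


def snippets_to_dicts(snippets):
    """Keep text, start, and duration so timestamp-based code paths work."""
    return [
        {"text": s.text, "start": s.start, "duration": s.duration}
        for s in snippets
    ]


pieces = snippets_to_dicts([Snippet("hello", 0.0, 1.5), Snippet("world", 1.5, 2.0)])
```

Keeping only `text` would leave later lookups of `start` and `duration` without the keys they expect.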
**Issue:** Fixes #30309
**Dependencies:** None
**Testing:**
- Tested the fix with multiple YouTube videos to confirm it resolves the
issue
- Verified that both regular loading and CHUNKS format work correctly
- **Description:**
- Make Brave Search Tool consistent with other tools and allow reading
its api key from `BRAVE_SEARCH_API_KEY` instead of having to pass the
api key manually (no breaking changes)
- Improve Brave Search Tool by storing api key in `SecretStr` instead of
plain `str`.
- Add unit test for `BraveSearchWrapper`
- Reflect the changes in the documentation
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** ivan_brko
Release notes: https://pydantic.dev/articles/pydantic-v2-11-release
Covered here:
- We no longer access `model_fields` on class instances (that is now
deprecated);
- Update schema normalization for Pydantic version testing to reflect
changes to generated JSON schema (addition of `"additionalProperties":
True` for dict types with value Any or object).
## Considerations:
### Changes to JSON schema generation
#### Tool-calling / structured outputs
This may impact tool-calling + structured outputs for some providers,
but schema generation only changes if you have parameters of the form
`dict`, `dict[str, Any]`, `dict[str, object]`, etc. If dict parameters
are typed, my understanding is that there are no changes.
For OpenAI for example, untyped dicts work for structured outputs with
default settings before and after updating Pydantic, and error both
before/after if `strict=True`.
### Use of `model_fields`
There is one spot where we previously accessed `super(cls,
self).model_fields`, where `cls` is an object in the MRO. This was done
for the purpose of tracking aliases in secrets. I've updated this to
always be `type(self).model_fields`-- see comment in-line for detail.
---------
Co-authored-by: Sydney Runkle <54324534+sydney-runkle@users.noreply.github.com>
- **Description:** Add SambaNova Cloud embeddings docs. Previously only
SambaStudio embeddings were supported; the latest release of
langchain_sambanova also provides SambaNova Cloud embeddings.
Broken source/docs links for Runnable methods
### What was changed
Added the `with_config` method to the method lists in both Runnable
template files:
- docs/api_reference/templates/runnable_non_pydantic.rst
- docs/api_reference/templates/runnable_pydantic.rst
# Community: update RankLLM integration and fix LangChain deprecation
- [x] **Description:**
- Removed `ModelType` enum (`VICUNA`, `ZEPHYR`, `GPT`) to align with
RankLLM's latest implementation.
- Updated `chain({query})` to `chain.invoke({query})` to resolve
LangChain 0.1.0 deprecation warnings from
https://github.com/langchain-ai/langchain/pull/29840.
- [x] **Dependencies:** No new dependencies added.
- [x] **Tests and Docs:**
- Updated RankLLM documentation
(`docs/docs/integrations/document_transformers/rankllm-reranker.ipynb`).
- Fixed LangChain usage in related code examples.
- [x] **Lint and Test:**
- Ran `make format`, `make lint`, and verified functionality after
updates.
- No breaking changes introduced.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**Description:**
This PR addresses the loss of partially initialised variables when
composing different prompts. I.e. it allows the following snippet to
run:
```python
from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_messages([('system', 'Prompt {x} {y}')]).partial(x='1')
appendix = ChatPromptTemplate.from_messages([('system', 'Appendix {z}')])
(prompt + appendix).invoke({'y': '2', 'z': '3'})
```
Previously, this would have raised a `KeyError`, stating that variable
`x` remains undefined.
**Issue**
References issue #30049
**Todo**
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Please see PR #27678 for context
## Overview
This pull request presents a refactor of the `HTMLHeaderTextSplitter`
class aimed at improving its maintainability and readability. The
primary enhancements include simplifying the internal structure by
consolidating multiple private helper functions into a single private
method, thereby reducing complexity and making the codebase easier to
understand and extend. Importantly, all existing functionalities and
public interfaces remain unchanged.
## PR Goals
1. **Simplify Internal Logic**:
- **Consolidation of Private Methods**: The original implementation
utilized multiple private helper functions (`_header_level`,
`_dom_depth`, `_get_elements`) to manage different aspects of HTML
parsing and document generation. This fragmentation increased cognitive
load and potential maintenance overhead.
- **Streamlined Processing**: By merging these functionalities into a
single private method (`_generate_documents`), the class now offers a
more straightforward flow, making it easier for developers to trace and
understand the processing steps. (Thanks to @eyurtsev)
2. **Enhance Readability**:
- **Clearer Method Responsibilities**: With fewer private methods, each
method now has a more focused responsibility. The primary logic resides
within `_generate_documents`, which handles both HTML traversal and
document creation in a cohesive manner.
- **Reduced Redundancy**: Eliminating redundant checks and consolidating
logic reduces the code's verbosity, making it more concise without
sacrificing clarity.
3. **Improve Maintainability**:
- **Easier Debugging and Extension**: A simplified internal structure
allows for quicker identification of issues and easier implementation of
future enhancements or feature additions.
- **Consistent Header Management**: The new implementation ensures that
headers are managed consistently within a single context, reducing the
likelihood of bugs related to header scope and hierarchy.
4. **Maintain Backward Compatibility**:
- **Unchanged Public Interface**: All public methods (`split_text`,
`split_text_from_url`, `split_text_from_file`) and their signatures
remain unchanged, ensuring that existing integrations and usage patterns
are unaffected.
- **Preserved Docstrings**: Comprehensive docstrings are retained,
providing clear documentation for users and developers alike.
## Detailed Changes
1. **Removed Redundant Private Methods**:
- **Eliminated `_header_level`, `_dom_depth`, and `_get_elements`**:
These methods were merged into the `_generate_documents` method,
centralizing the logic for HTML parsing and document generation.
2. **Consolidated Document Generation Logic**:
- **Single Private Method `_generate_documents`**: This method now
handles the entire process of parsing HTML, tracking active headers,
managing document chunks, and yielding `Document` instances. This
consolidation reduces the number of moving parts and simplifies the
overall processing flow.
3. **Simplified Header Management**:
- **Immediate Header Scope Handling**: Headers are now managed within
the traversal loop of `_generate_documents`, ensuring that headers are
added or removed from the active headers dictionary in real-time based
on their DOM depth and hierarchy.
- **Removed `chunk_dom_depth` Attribute**: The need to track chunk DOM
depth separately has been eliminated, as header scopes are now directly
managed within the traversal logic.
4. **Streamlined Chunk Finalization**:
- **Enhanced `finalize_chunk` Function**: The chunk finalization process
has been simplified to directly yield a single `Document` when needed,
without maintaining an intermediate list. This change reduces
unnecessary list operations and makes the logic more straightforward.
5. **Improved Variable Naming and Flow**:
- **Descriptive Variable Names**: Variables such as `current_chunk` and
`node_text` provide clear insights into their roles within the
processing logic.
- **Direct Header Removal Logic**: Headers that are out of scope are
removed immediately during traversal, ensuring that the active headers
dictionary remains accurate and up-to-date.
6. **Preserved Comprehensive Docstrings**:
- **Unchanged Documentation**: All existing docstrings, including
class-level and method-level documentation, remain intact. This ensures
that users and developers continue to have access to detailed usage
instructions and method explanations.
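The header-scope handling described in the list above can be illustrated with a simplified sketch. It works on a flat list of `(tag, text)` pairs and uses the header level alone to decide scope, which is an assumption: the refactored splitter walks a real DOM and also considers DOM depth. The function name `split_by_headers` and the tuple-based output are hypothetical; the actual method yields `Document` instances.

```python
def split_by_headers(nodes, headers_to_split_on):
    """Group text under the headers that are in scope when it appears.

    nodes: list of (tag, text) pairs in document order.
    headers_to_split_on: mapping of header tag to metadata key.
    """
    active = {}  # header tag -> header text currently in scope
    chunk = []   # text collected since the last header
    docs = []

    def finalize_chunk():
        # Emit one (content, metadata) pair when there is pending text.
        if chunk:
            metadata = {headers_to_split_on[t]: text for t, text in active.items()}
            docs.append((" ".join(chunk), metadata))
            chunk.clear()

    for tag, text in nodes:
        if tag in headers_to_split_on:
            finalize_chunk()
            level = int(tag[1:])
            # Headers at the same or a deeper level are now out of scope.
            for old in [t for t in active if int(t[1:]) >= level]:
                del active[old]
            active[tag] = text
        else:
            chunk.append(text)
    finalize_chunk()
    return docs


docs = split_by_headers(
    [("h1", "Intro"), ("p", "a"), ("h2", "Details"), ("p", "b"),
     ("h1", "Next"), ("p", "c")],
    {"h1": "Header 1", "h2": "Header 2"},
)
```

Here the second `h1` removes both `Intro` and `Details` from the active headers before the final chunk is emitted.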
## Testing
All existing test cases from `test_html_header_text_splitter.py` have
been executed against the refactored code. The results confirm that:
- **Functionality Remains Intact**: The splitter continues to accurately
parse HTML content, respect header hierarchies, and produce the expected
`Document` objects with correct metadata.
- **Backward Compatibility is Maintained**: No changes were required in
the test cases, and all tests pass without modifications, demonstrating
that the refactor does not introduce any regressions or alter existing
behaviors.
This example remains fully operational and behaves as before, returning
a list of `Document` objects with the expected metadata and content
splits.
## Conclusion
This refactor achieves a more maintainable and readable codebase by
simplifying the internal structure of the `HTMLHeaderTextSplitter`
class. By consolidating multiple private methods into a single, cohesive
private method, the class becomes easier to understand, debug, and
extend. All existing functionalities are preserved, and comprehensive
tests confirm that the refactor maintains the expected behavior. These
changes align with LangChain’s standards for clean, maintainable, and
efficient code.
---
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
This pull request adds documentation and a tutorial for integrating the
[Vectorize](https://vectorize.io/) service with LangChain. The most
important changes include adding a new documentation page for Vectorize
and creating a Jupyter notebook that demonstrates how to use the
Vectorize retriever.
The source code for the langchain-vectorize package can be found
[here](https://github.com/vectorize-io/integrations-python/tree/main/langchain).
Previews:
*
https://langchain-git-fork-cbornet-vectorize-langchain.vercel.app/docs/integrations/providers/vectorize/
*
https://langchain-git-fork-cbornet-vectorize-langchain.vercel.app/docs/integrations/retrievers/vectorize/
Documentation updates:
*
[`docs/docs/integrations/providers/vectorize.mdx`](diffhunk://#diff-7e00d4ce4768f73b4d381a7c7b1f94d138f1b27ebd08e3666b942630a0285606R1-R40):
Added a new documentation page for Vectorize, including an overview of
its features, installation instructions, and a basic usage example.
Tutorial updates:
*
[`docs/docs/integrations/retrievers/vectorize.ipynb`](diffhunk://#diff-ba5bb9a1b4586db7740944b001bcfeadc88be357640ded0c82a329b11d8d6e29R1-R294):
Created a Jupyter notebook tutorial that shows how to set up the
Vectorize environment, create a RAG pipeline, and use the LangChain
Vectorize retriever. The notebook includes steps for account creation,
token generation, environment setup, and pipeline deployment.
This can only be reviewed by [hiding
whitespaces](https://github.com/langchain-ai/langchain/pull/30302/files?diff=unified&w=1).
The motivation behind this PR is to get my hands on the docs and make
the LangSmith teasing short and clear.
Right now I don't know how to do it, but this could be an include in the
future.
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
This PR includes support for HANA dialect in SQLDatabase, which is a
wrapper class for SQLAlchemy.
Currently, the schema name cannot be set when using HANA DB with
LangChain. No message is shown to the user either, which makes it hard
to figure out why the SQL does not work as expected.
Here is the reference document for HANA DB to set schema for the
session.
- [SET SCHEMA Statement (Session
Management)](https://help.sap.com/docs/SAP_HANA_PLATFORM/4fe29514fd584807ac9f2a04f6754767/20fd550375191014b886a338afb4cd5f.html)
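For illustration, a session-level schema switch could be built as in the sketch below. The helper `hana_set_schema_sql` and its identifier check are assumptions for this example and do not reflect how `SQLDatabase` issues the statement; only the `SET SCHEMA` statement itself comes from the referenced HANA documentation.

```python
import re


def hana_set_schema_sql(schema: str) -> str:
    """Build a SET SCHEMA statement for a simple, unquoted schema name."""
    # Accept only plain identifiers so the name cannot smuggle in extra SQL.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", schema):
        raise ValueError(f"Unsupported schema name: {schema!r}")
    return f"SET SCHEMA {schema}"


statement = hana_set_schema_sql("SALES")
```

The resulting string would be executed once per connection, before any queries that rely on the schema.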
**partners: Enable max_retries in ChatMistralAI**
**Description**
- This pull request reactivates the retry logic in the
`completion_with_retry` method of the `ChatMistralAI` class, restoring
the intended functionality of the previously ineffective `max_retries`
parameter. It adds a new unit test that mocks failed and successful
retry calls, and an integration test to confirm end-to-end
functionality.
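What an effective `max_retries` means in practice can be sketched with a plain retry loop. This is not the `ChatMistralAI` implementation: `call_with_retries` and `flaky` are hypothetical names, and a simple loop is used here in place of whatever retry machinery the class relies on.

```python
import time


def call_with_retries(func, max_retries, delay=0.0):
    """Call func, retrying up to max_retries extra times when it raises."""
    attempt = 0
    while True:
        try:
            return func()
        except Exception:
            if attempt >= max_retries:
                raise  # retry budget exhausted: surface the last error
            attempt += 1
            time.sleep(delay)


calls = {"count": 0}


def flaky():
    # Fails on the first two calls, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = call_with_retries(flaky, max_retries=5)
```

With `max_retries=0` the first failure propagates immediately, which matches the behaviour of an ineffective retry parameter.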
**Issue**
- Closes #30362
**Dependencies**
- No additional dependencies required
Co-authored-by: andrasfe <andrasf94@gmail.com>
This pull request includes fixes in documentation for PDF loaders to
correct the names of the loaders and the required installations. The
most important changes include updating the loader names and
installation instructions in the Jupyter notebooks.
Documentation fixes:
*
[`docs/docs/integrations/document_loaders/pdfminer.ipynb`](diffhunk://#diff-a4a0561cd4a6e876ea34b7182de64a452060b921bb32d37b02e6a7980a41729bL34-R34):
Changed references from `PyMuPDFLoader` to `PDFMinerLoader` and updated
the installation instructions to replace `pymupdf` with `pdfminer`.
[[1]](diffhunk://#diff-a4a0561cd4a6e876ea34b7182de64a452060b921bb32d37b02e6a7980a41729bL34-R34)
[[2]](diffhunk://#diff-a4a0561cd4a6e876ea34b7182de64a452060b921bb32d37b02e6a7980a41729bL63-R63)
[[3]](diffhunk://#diff-a4a0561cd4a6e876ea34b7182de64a452060b921bb32d37b02e6a7980a41729bL330-R330)
*
[`docs/docs/integrations/document_loaders/pymupdf.ipynb`](diffhunk://#diff-8487995f457e33daa2a08fdcff3b42e144eca069eeadfad5651c7c08cce7a5cdL292-R292):
Corrected the loader name from `PDFPlumberLoader` to `PyMuPDFLoader`.
- **Description:** Adding back a section of the Elasticsearch
vectorstore documentation that was deleted in [this
commit]([a72fddbf8d (diff-4988344c6ccc08191f89ac1ebf1caab5185e13698d7567fde5352038cd950d77))).
The only change I've made is to update the example RRF request, which
was out of date.
This pull request includes enhancements to the `perplexity.py` file in
the `chat_models` module, focusing on improving the handling of
additional keyword arguments (`additional_kwargs`) in message processing
methods. Additionally, new unit tests have been added to ensure the
correct inclusion of citations, images, and related questions in the
`additional_kwargs`.
Issue: resolves https://github.com/langchain-ai/langchain/issues/30439
Enhancements to `perplexity.py`:
*
[`libs/community/langchain_community/chat_models/perplexity.py`](diffhunk://#diff-d3e4d7b277608683913b53dcfdbd006f0f4a94d110d8b9ac7acf855f1f22207fL208-L212):
Modified the `_convert_delta_to_message_chunk`, `_stream`, and
`_generate` methods to handle `additional_kwargs`, which include
citations, images, and related questions.
[[1]](diffhunk://#diff-d3e4d7b277608683913b53dcfdbd006f0f4a94d110d8b9ac7acf855f1f22207fL208-L212)
[[2]](diffhunk://#diff-d3e4d7b277608683913b53dcfdbd006f0f4a94d110d8b9ac7acf855f1f22207fL277-L286)
[[3]](diffhunk://#diff-d3e4d7b277608683913b53dcfdbd006f0f4a94d110d8b9ac7acf855f1f22207fR324-R331)
New unit tests:
*
[`libs/community/tests/unit_tests/chat_models/test_perplexity.py`](diffhunk://#diff-dab956d79bd7d17a0f5dea3f38ceab0d583b43b63eb1b29138ee9b6b271ba1d9R119-R275):
Added new tests `test_perplexity_stream_includes_citations_and_images`
and `test_perplexity_stream_includes_citations_and_related_questions` to
verify that the `stream` method correctly includes citations, images,
and related questions in the `additional_kwargs`.
When OpenAI originally released `stream_options` to enable token usage
during streaming, it was not supported in AzureOpenAI. It is now
supported.
Like the [OpenAI
SDK](f66d2e6fdc/src/openai/resources/completions.py (L68)),
ChatOpenAI does not return usage metadata during streaming by default
(which adds an extra chunk to the stream). The OpenAI SDK requires users
to pass `stream_options={"include_usage": True}`. ChatOpenAI implements
a convenience argument `stream_usage: Optional[bool]`, and an attribute
`stream_usage: bool = False`.
Here we extend this to AzureChatOpenAI by moving the `stream_usage`
attribute and `stream_usage` kwarg (on `_(a)stream`) from ChatOpenAI to
BaseChatOpenAI.
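The decision described above (a per-call `stream_usage` overriding the instance attribute, with `stream_options` sent only when usage is requested) can be sketched as a plain function. `build_stream_payload` is a hypothetical name for illustration, not a method of `BaseChatOpenAI`.

```python
from typing import Any, Optional


def build_stream_payload(
    default_stream_usage: bool,
    stream_usage: Optional[bool] = None,
    **kwargs: Any,
) -> dict:
    """Hypothetical helper: decide whether to send `stream_options`.

    A per-call `stream_usage` overrides the instance-level default.
    """
    payload = dict(kwargs)
    effective = default_stream_usage if stream_usage is None else stream_usage
    if effective:
        # Only sent when usage is requested; a value of False is never passed on.
        payload["stream_options"] = {"include_usage": True}
    return payload


print(build_stream_payload(False, stream_usage=True))
```

Because nothing is sent when the flag is false, APIs that do not understand `stream_options` keep working unchanged.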
---
Additional consideration: we must be sensitive to the number of users
using BaseChatOpenAI to interact with other APIs that do not support the
`stream_options` parameter.
Suppose OpenAI in the future updates the default behavior to stream
token usage. Currently, BaseChatOpenAI only passes `stream_options` if
`stream_usage` is True, so there would be no way to disable this new
default behavior.
To address this, we could update the `stream_usage` attribute to
`Optional[bool] = None`, but this is technically a breaking change (as
currently values of False are not passed to the client). IMO: if / when
this change happens, we could accompany it with this update in a minor
bump.
---
Related previous PRs:
- https://github.com/langchain-ai/langchain/pull/22628
- https://github.com/langchain-ai/langchain/pull/22854
- https://github.com/langchain-ai/langchain/pull/23552
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
**PR title**: Docs update for Vectara
**Description:** Vectara has moved to a LangChain partner package; this
PR updates the docs accordingly.
- **Description:** The Azure Document Intelligence OCR solution has a
*feature* parameter that enables capabilities such as high-resolution
document analysis and key-value pair extraction. In the LangChain parser,
these could be provided as an `analysis_feature` parameter to the
constructor, which was passed on to the `DocumentIntelligenceClient`.
However, according to the `DocumentIntelligenceClient` [API
Reference](https://learn.microsoft.com/en-us/python/api/azure-ai-documentintelligence/azure.ai.documentintelligence.documentintelligenceclient?view=azure-python),
this is not a valid constructor parameter. It was therefore removed and
is instead stored as a parser property that is used in the
`begin_analyze_document`'s `features` parameter (see [API
Reference](https://learn.microsoft.com/en-us/python/api/azure-ai-formrecognizer/azure.ai.formrecognizer.documentanalysisclient?view=azure-python#azure-ai-formrecognizer-documentanalysisclient-begin-analyze-document)).
I also removed the check for "Supported features" since all features are
supported out-of-the-box. I did not check whether the provided `str`
actually corresponds to the Azure package's enumeration of features, since
the `ValueError` raised when creating the enumeration object is already
explicit.
One last caveat: some features are not supported for some kinds of
documents. This is documented in the Microsoft documentation, and the
exceptions raised are also explicit.
- **Issue:** N/A
- **Dependencies:** No
- **Twitter handle:** @Louis___A
---------
Co-authored-by: Louis Auneau <louis@handshakehealth.co>
## **Description:**
The Jupyter notebooks in the docs section are extremely useful and
critical for widespread adoption of LangChain amongst new developers.
However, because they are also converted to MDX and used to build the
HTML for the Docusaurus site, they contain JSX code that degrades
readability when opened in a "notebook" setting (local notebook server,
google colab, etc.). For instance, here we see the website, with a nice
React tab component for installation instructions (`pip` vs `conda`):

Now, here is the same notebook viewed in colab:

Note that the text following "To install LangChain run:" contains
snippets of JSX code that is (i) confusing, (ii) bad for readability,
(iii) potentially misleading for a novice developer, who might take it
literally to mean that "to install LangChain I should run `import Tabs
from...`" and then an ill-formed command which mixes the `pip` and
`conda` installation instructions.
Ideally, we would like to have a system that presents a
similar/equivalent UI when viewing the notebooks on the documentation
site, or when interacting with them in a notebook setting - or, at a
minimum, we should not present ill-formed JSX snippets to someone trying
to execute the notebooks. As the documentation itself states, running
the notebooks yourself is a great way to learn the tools. Therefore,
these distracting and ill-formed snippets are contrary to that goal.
## **Fixes:**
* Comment out the JSX code inside the notebook
`docs/tutorials/llm_chain` with a special directive `<!-- HIDE_IN_NB`
(closed with `HIDE_IN_NB -->`). This makes the JSX code "invisible" when
viewed in a notebook setting.
* Add a custom preprocessor that runs process_cell and just erases these
comment strings. This makes sure they are rendered when converted to
MDX.
* Minor tweak: Refactor some of the Markdown instructions into an
executable codeblock for better experience when running as a notebook.
* Minor tweak: Optionally try to get the environment variables from a
`.env` file in the repo so the user doesn't have to enter it every time.
Depends on the user installing `python-dotenv` and adding their own
`.env` file.
* Add an environment variable for "LANGSMITH_PROJECT"
(default="default"), per the LangSmith docs, so a local user can target
a specific project in their LangSmith account.
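The marker-stripping step in the list above can be sketched as a small function. This is a minimal illustration using a regular expression; the function name `reveal_hidden_jsx` is an assumption, and the real preprocessor in the docs build may be structured differently.

```python
import re

# Matches either marker: the opening "<!-- HIDE_IN_NB" or the closing "HIDE_IN_NB -->".
_MARKER = re.compile(r"<!--\s*HIDE_IN_NB|HIDE_IN_NB\s*-->")


def reveal_hidden_jsx(cell_source: str) -> str:
    """Erase the HIDE_IN_NB comment markers so the wrapped JSX renders in MDX."""
    return _MARKER.sub("", cell_source)


cell = "<!-- HIDE_IN_NB\n<Tabs />\nHIDE_IN_NB -->"
print(reveal_hidden_jsx(cell))
```

In a notebook the HTML comment hides the JSX; after this substitution the JSX is left in place for the MDX conversion.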
**NOTE:** If this PR is approved, and the maintainers agree with the
general goal of aligning the notebook execution experience and the doc
site UI, I would plan to implement this on the rest of the JSX snippets
that are littered in the notebooks.
**NOTE:** I wasn't able to/don't know how to run the linkcheck Makefile
commands.
---------
Co-authored-by: Really Him <hesereallyhim@proton.me>
We are implementing a token-counting callback handler in
`langchain-core` that is intended to work with all chat models
supporting usage metadata. The callback will aggregate usage metadata by
model. This requires responses to include the model name in its
metadata.
To support this, if a model `returns_usage_metadata`, we check that it
includes a string model name in its `response_metadata` in the
`"model_name"` key.
More context: https://github.com/langchain-ai/langchain/pull/30487
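A minimal sketch of the kind of check described above, written against a plain dict rather than a real message object. The helper name is hypothetical, and treating an empty string as invalid is an assumption of this example.

```python
def has_valid_model_name(response_metadata: dict) -> bool:
    """Return True if `response_metadata` carries a string under "model_name".

    Rejecting the empty string is an assumption made for this sketch.
    """
    model_name = response_metadata.get("model_name")
    return isinstance(model_name, str) and bool(model_name)


print(has_valid_model_name({"model_name": "gpt-4o-mini"}))
```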
**Description:**
Since we just implemented
[langchain-memgraph](https://pypi.org/project/langchain-memgraph/)
integration, we are adding basic docs to [your site based on this
comment](https://github.com/langchain-ai/langchain/pull/30197#pullrequestreview-2671616410)
from @ccurme .
**Twitter handle:**
[@memgraphdb](https://x.com/memgraphdb)
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Stripped-down version of
[OpenAICallbackHandler](https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/callbacks/openai_info.py)
that just tracks `AIMessage.usage_metadata`.
```python
from langchain_core.callbacks import get_usage_metadata_callback
from langgraph.prebuilt import create_react_agent


def get_weather(location: str) -> str:
    """Get the weather at a location."""
    return "It's sunny."


tools = [get_weather]
agent = create_react_agent("openai:gpt-4o-mini", tools)

with get_usage_metadata_callback() as cb:
    result = await agent.ainvoke({"messages": "What's the weather in Boston?"})
    print(cb.usage_metadata)
```
- **Description:** fix typo
- **Issue:** -
- **Dependencies:** -
- **Twitter handle:** -
Description: Extend the Gremlin graph schema to include edge
properties, grouped by their triples (i.e. `inVLabel` and `outVLabel`).
This should give more context when crafting queries to run against a
Gremlin graph DB.
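To make the grouping concrete, here is a small sketch that collects edge property names per (`outVLabel`, edge label, `inVLabel`) triple. The input shape (a list of dicts with a `properties` key) and the function name are assumptions for illustration, not the schema format the integration actually produces.

```python
from collections import defaultdict


def group_edge_properties(edges: list) -> dict:
    """Collect property names per (outVLabel, label, inVLabel) triple."""
    grouped = defaultdict(set)
    for edge in edges:
        triple = (edge["outVLabel"], edge["label"], edge["inVLabel"])
        grouped[triple].update(edge.get("properties", {}).keys())
    # Sort the names so the resulting schema is deterministic.
    return {triple: sorted(names) for triple, names in grouped.items()}


# Made-up sample edges, only for demonstration.
sample_edges = [
    {"label": "knows", "outVLabel": "person", "inVLabel": "person", "properties": {"since": 2020}},
    {"label": "knows", "outVLabel": "person", "inVLabel": "person", "properties": {"weight": 0.5}},
    {"label": "created", "outVLabel": "person", "inVLabel": "software", "properties": {}},
]
schema = group_edge_properties(sample_edges)
```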
This pull request includes extensive documentation updates for the
`ChatPerplexity` class in the
`libs/community/langchain_community/chat_models/perplexity.py` file. The
changes provide detailed setup instructions, key initialization
arguments, and usage examples for various functionalities of the
`ChatPerplexity` class.
Documentation improvements:
* Added setup instructions for installing the `openai` package and
setting the `PPLX_API_KEY` environment variable.
* Documented key initialization arguments for completion parameters and
client parameters, including `model`, `temperature`, `max_tokens`,
`streaming`, `pplx_api_key`, `request_timeout`, and `max_retries`.
* Provided examples for instantiating the `ChatPerplexity` class,
invoking it with messages, using structured output, invoking with
perplexity-specific parameters, streaming responses, and accessing token
usage and response metadata.
This pull request includes updates to the
`docs/docs/integrations/chat/perplexity.ipynb` file to enhance the
documentation for `ChatPerplexity`. The changes focus on demonstrating
the use of Perplexity-specific parameters and supporting structured
outputs for Tier 3+ users.
Enhancements to documentation:
* Added a new markdown cell explaining the use of Perplexity-specific
parameters through the `ChatPerplexity` class, including parameters like
`search_domain_filter`, `return_images`, `return_related_questions`, and
`search_recency_filter` using the `extra_body` parameter.
* Added a new code cell demonstrating how to invoke `ChatPerplexity`
with the `extra_body` parameter to filter search recency.
Support for structured outputs:
* Added a new markdown cell explaining that `ChatPerplexity` supports
structured outputs for Tier 3+ users.
* Added a new code cell demonstrating how to use `ChatPerplexity` with
structured outputs by defining a `BaseModel` class and invoking the chat
with structured output.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Hello!
I have reopened a pull request for tool integration.
Please refer to the previous
[PR](https://github.com/langchain-ai/langchain/pull/30248).
I understand that for the tool integration, a separate package should be
created, and only the documentation should be added under docs/docs/. If
there are any other procedures, please let me know.
[langchain-naver-community](https://github.com/e7217/langchain-naver-community)
cc: @ccurme
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Hi @ccurme!
Thanks so much for helping with getting the Contextual documentation
merged last time. We added the reranker to our provider's documentation!
Please let me know if there's any issues with it! Would love to also
work with your team on an announcement for this! 🙏
- **Description:** updates contextual provider documentation to include
information about our reranker, also includes documentation for
contextual's reranker in the retrievers section
- **Twitter handle:** https://x.com/ContextualAI/highlights
docs have been added
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
Description: Update vector store tab inits to match either the docs or
api_ref (whichever was more comprehensive)
List of changes per vector stores:
- In-memory
- no change
- AstraDB
- match to docs - docs/api_refs match (excluding embeddings)
- Chroma
- match to docs - api_refs is less descriptive
- FAISS
- match to docs - docs/api_refs match (excluding embeddings)
- Milvus
- match to docs to use Milvus Lite with Flat index - api_refs does not
have index_param for generalization
- MongoDB
- match to docs - api_refs are sparser
- PGVector
- match to api_ref
- changed to include docker cmd directly in code
- docs/api_ref has comment to view docker command in separate code block
- Pinecone
- match to api_refs - docs have code dispersed
- Qdrant
- match to api_ref - docs has size=3072, api_ref has size=1536
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**Description:**
A third-party package not listed in the default valid namespaces cannot
pass `test_serdes` because the `load()` call does not allow extending
`valid_namespaces`.
`test_serdes` fails with:
ValueError: Invalid namespace: {'lc': 1, 'type': 'constructor', 'id':
['langchain_other', 'chat_models', 'ChatOther'], 'kwargs':
{'model_name': '...', 'api_key': '...'}, 'name': 'ChatOther'}
This change has `test_serdes` automatically extend `valid_namespaces` based
on the namespace of the ChatModel under test.
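The idea can be sketched without any LangChain imports: take the root of the model's namespace (for example `langchain_other` in the error above) and append it to the allowed list if it is missing. The default list and the helper name below are illustrative assumptions, not the real values used by `load()`.

```python
# Illustrative subset only; the real default list in langchain-core is longer.
DEFAULT_VALID_NAMESPACES = ["langchain", "langchain_core"]


def extended_namespaces(model_namespace: list) -> list:
    """Return the default namespaces plus the root of `model_namespace`."""
    root = model_namespace[0]
    if root in DEFAULT_VALID_NAMESPACES:
        return list(DEFAULT_VALID_NAMESPACES)
    return [*DEFAULT_VALID_NAMESPACES, root]


print(extended_namespaces(["langchain_other", "chat_models"]))
```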
this_row_id previously used UUID v1. However, since UUID v1 can be
predicted if the MAC address and timestamp are known, it poses a
potential security risk. Therefore, it has been changed to UUID v4.
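A short standard-library example of the difference: `uuid.uuid4()` draws the identifier from random bits, while `uuid.uuid1()` derives it from a timestamp and a node identifier. The variable name `row_id` is only illustrative.

```python
import uuid

# Version 4 UUIDs are generated from random bits, so they do not
# encode the host's MAC address or the creation time.
row_id = str(uuid.uuid4())
print(row_id)
```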
Added a warning when DuckDB is used as a vector store without pandas
being installed (pandas is currently used for similarity search result processing).
- [ ] **PR title**: "community: added warning when duckdb is used as a
vectorstore without pandas"
- **Description:** displays a warning when using duckdb as a vector
store without pandas being installed, as it is used by the
`similarity_search` function
- **Issue:** #29933
- **Dependencies:** None
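A minimal sketch of this kind of optional-dependency warning, using only the standard library. The function name and the message text are assumptions for illustration; the actual wording in the DuckDB vector store may differ.

```python
import importlib.util
import warnings


def warn_if_missing(module_name: str = "pandas") -> bool:
    """Emit a warning and return True if `module_name` is not installed."""
    if importlib.util.find_spec(module_name) is None:
        warnings.warn(
            f"{module_name} is not installed; similarity search results "
            "cannot be processed without it.",
            stacklevel=2,
        )
        return True
    return False
```

Checking with `find_spec` avoids importing the package just to see whether it exists.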
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
The DeepSeek model does not return reasoning when hosted on OpenRouter
(Issue [30067](https://github.com/langchain-ai/langchain/issues/30067)).
The following code did not return reasoning:
```python
import os

from langchain_deepseek import ChatDeepSeek

llm = ChatDeepSeek(
    model="deepseek/deepseek-r1:nitro",
    api_base="https://openrouter.ai/api/v1",
    api_key=os.getenv("OPENROUTER_API_KEY"),
)
messages = [
    {"role": "system", "content": "You are an assistant."},
    {"role": "user", "content": "9.11 and 9.8, which is greater? Explain the reasoning behind this decision."},
]
response = llm.invoke(messages, extra_body={"include_reasoning": True})
print(response.content)
print(f"REASONING: {response.additional_kwargs.get('reasoning_content', '')}")
print(response)
```
The fix is to extract reasoning from
`response.choices[0].message["model_extra"]` and from
`choices[0].delta["reasoning"]`, and place it in the response's
`additional_kwargs`. The change is really just the addition of a couple
of one-line `if` statements.
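The shape of that fallback can be sketched on plain dicts. The key names follow the description above, but the exact layout of `model_extra` (a `reasoning` entry) and the helper name are assumptions of this example, not the code in the PR.

```python
from typing import Optional


def extract_reasoning(message: dict) -> Optional[str]:
    """Hypothetical helper: find reasoning text on a response message dict."""
    # Native DeepSeek responses expose the reasoning directly.
    direct = message.get("reasoning_content")
    if direct:
        return direct
    # Assumed OpenRouter layout: extra fields collected under "model_extra".
    extra = message.get("model_extra") or {}
    return extra.get("reasoning")


print(extract_reasoning({"model_extra": {"reasoning": "9.8 > 9.11"}}))
```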
---------
Co-authored-by: andrasfe <andrasf94@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Fix several typos in docs/docs/how_to/split_html.ipynb
* `structered` should be `structured`
* `signifcant` should be `significant`
* `seperator` should be `separator`
# Description
This PR adds reasoning model support for `langchain-ollama` by
extracting reasoning token blocks, like those used in deepseek. It was
inspired by
[ollama-deep-researcher](https://github.com/langchain-ai/ollama-deep-researcher),
specifically the parsing of [thinking
blocks](6d1aaf2139/src/assistant/graph.py (L91)):
```python
# TODO: This is a hack to remove the <think> tags w/ Deepseek models
# It appears very challenging to prompt them out of the responses
while "<think>" in running_summary and "</think>" in running_summary:
    start = running_summary.find("<think>")
    end = running_summary.find("</think>") + len("</think>")
    running_summary = running_summary[:start] + running_summary[end:]
```
The comment notes that it is very hard to prompt the reasoning block out
of the response, but we actually want the model to reason in order to increase
model performance. This implementation extracts the thinking block, so
the client can still expect a proper message to be returned by
`ChatOllama` (and use the reasoning content separately when desired).
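For a complete (non-streamed) response, the extraction idea can be sketched with a regular expression that separates the thinking block from the visible answer. `split_reasoning` is a hypothetical helper for illustration, not the code added to `ChatOllama`.

```python
import re
from typing import Tuple

_THINK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> Tuple[str, str]:
    """Return (content, reasoning) with <think>...</think> blocks pulled out."""
    reasoning = "\n".join(block.strip() for block in _THINK.findall(text))
    content = _THINK.sub("", text).strip()
    return content, reasoning


content, reasoning = split_reasoning("<think>9.11 < 9.8</think>9.8 is greater.")
```

The reasoning string would then go into `additional_kwargs["reasoning_content"]`, leaving `content` as a normal message body.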
This implementation takes the same approach as
[ChatDeepseek](5d581ba22c/libs/partners/deepseek/langchain_deepseek/chat_models.py (L215)),
which adds the reasoning content to
`chunk.additional_kwargs["reasoning_content"]`:
```python
if hasattr(response.choices[0].message, "reasoning_content"):  # type: ignore
    rtn.generations[0].message.additional_kwargs["reasoning_content"] = (
        response.choices[0].message.reasoning_content  # type: ignore
    )
```
This should probably be handled upstream in ollama + ollama-python, but
this seems like a reasonably effective solution. This is a standalone
example of what is happening:
```python
from collections.abc import AsyncIterator
from typing import Any

from langchain_core.language_models import BaseChatModel
from langchain_core.messages import BaseMessage, BaseMessageChunk
from langchain_core.runnables import RunnableConfig


async def deepseek_message_astream(
    llm: BaseChatModel,
    messages: list[BaseMessage],
    config: RunnableConfig | None = None,
    *,
    model_target: str = "deepseek-r1",
    **kwargs: Any,
) -> AsyncIterator[BaseMessageChunk]:
    """Stream responses from Deepseek models, filtering out <think> tags.

    Args:
        llm: The language model to stream from
        messages: The messages to send to the model

    Yields:
        Filtered chunks from the model response
    """
    # check if the model is deepseek based
    if (llm.name and model_target not in llm.name) or (hasattr(llm, "model") and model_target not in llm.model):
        async for chunk in llm.astream(messages, config=config, **kwargs):
            yield chunk
        return

    # Yield with a buffer, upon completing the <think></think> tags, move them to the reasoning content and start over
    buffer = ""
    async for chunk in llm.astream(messages, config=config, **kwargs):
        # start or append
        if not buffer:
            buffer = chunk.content
        else:
            buffer += chunk.content if hasattr(chunk, "content") else chunk

        # Process buffer to remove <think> tags
        if "<think>" in buffer or "</think>" in buffer:
            if hasattr(chunk, "tool_calls") and chunk.tool_calls:
                raise NotImplementedError("tool calls during reasoning should be removed?")
            if "<think>" in chunk.content or "</think>" in chunk.content:
                continue
            chunk.additional_kwargs["reasoning_content"] = chunk.content
            chunk.content = ""
        # upon block completion, reset the buffer
        if "<think>" in buffer and "</think>" in buffer:
            buffer = ""
        yield chunk
```
# Issue
Integrating reasoning models (e.g. deepseek-r1) into existing LangChain
based workflows is hard due to the thinking blocks that are included in
the message contents. To avoid this, we could match the `ChatOllama`
integration with `ChatDeepseek` to return the reasoning content inside
`message.additional_kwargs["reasoning_content"]` instead.
# Dependencies
None
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- Test if models support forcing tool calls via `tool_choice`. If they
do, they should support
- `"any"` to specify any tool
- the tool name as a string to force calling a particular tool
- Add `tool_choice` to signature of `BaseChatModel.bind_tools` in core
- Deprecate `tool_choice_value` in standard tests in favor of a boolean
`has_tool_choice`
Will follow up with PRs in external repos (tested in AWS and Google
already).
- **Description:** This PR updates the [MLflow
integration](https://python.langchain.com/docs/integrations/providers/mlflow_tracking/)
docs. This PR is based on feedback and suggestions from @efriis on
#29612 . This proposed revision is much shorter, does not contain
images, and links out to the MLflow docs rather than providing lengthy
descriptions directly within these docs. Thank you for taking another
look!
- **Issue:** NA
- **Dependencies:** NA
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**Description**
This contribution adds a retriever for the Zotero API.
[Zotero](https://www.zotero.org/) is an open source reference manager
for bibliographic data and related research materials. A retriever will
allow langchain applications to retrieve relevant documents from
personal or shared group libraries, which I believe will be helpful for
numerous applications, such as RAG systems, personal research
assistants, etc. Tests and docs were added.
The documentation provided assumes the retriever will be part of the
langchain-community package, as this seemed customary. Please let me
know if this is not the preferred way to do it. I also uploaded the
implementation to PyPI.
**Dependencies**
The retriever requires the `pyzotero` package for API access. This
dependency is stated in the docs, and the retriever will return an error
if the package is not found. However, this dependency is not added to
the langchain package itself.
**Twitter handle**
I'm no longer using Twitter, but I'd appreciate a shoutout on
[Bluesky](https://bsky.app/profile/koenigt.bsky.social) or
[LinkedIn](https://www.linkedin.com/in/dr-tim-k%C3%B6nig-534aa2324/)!
Let me know if there are any issues, I'll gladly try and sort them out!
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
This pull request includes a change to the following
- docs/docs/integrations/tools/tavily_search.ipynb
- docs/docs/integrations/tools/tavily_extract.ipynb
- added docs/docs/integrations/providers/tavily.mdx
---------
Co-authored-by: pulvedu <dustin@tavily.com>
**Description:**
Implements an additional `browser_session` parameter on
PlaywrightURLLoader which can be used to initialize the browser context
by providing a stored playwright context.
**Description:**
This PR fixes a minor typo in the comments within
`libs/partners/openai/langchain_openai/chat_models/base.py`. The word
"ben" has been corrected to "be" for clarity and professionalism.
**Issue:**
N/A
**Dependencies:**
None
---------
Signed-off-by: pudongair <744355276@qq.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**Description:**
Since `ChatLiteLLM` is forwarding most parameters to
`litellm.completion(...)`, there is no reason to set other default
values than the ones defined by `litellm`.
In the case of parameter 'n', it also provokes an issue when trying to
call a serverless endpoint on Azure, as it is considered an extra
parameter. So we need to keep it optional.
We can debate about backward compatibility of this change: in my
opinion, there should not be big issues since from my experience,
calling `litellm.completion()` without these parameters works fine.
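A minimal sketch of the "keep it optional" idea: parameters left as `None` are dropped before the call, so the downstream library's own defaults apply and no extra parameter such as `n` is sent. The helper name `completion_params` is hypothetical and this is not the actual `ChatLiteLLM` code.

```python
from typing import Any


def completion_params(**params: Any) -> dict:
    """Drop parameters that were left unset (None) before forwarding them."""
    return {key: value for key, value in params.items() if value is not None}


# `n` was not set by the user, so it is not forwarded at all.
print(completion_params(model="gpt-4o-mini", temperature=0.2, n=None))
```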
**Issue:**
- #29679
**Dependencies:** None
- **Description:** Adding keep_newlines parameter to process_pages
method with page_ids on Confluence document loader
- **Issue:** N/A (This is an enhancement rather than a bug fix)
- **Dependencies:** N/A
- **Twitter handle:** N/A
- [x] **PR title**
- [x] **PR message**:
- **Description:** Updated the sparse and hybrid vector search due to
changes in the Qdrant API, and cleaned up the notebook
- [x] **Add tests and docs**:
- N/A
- [x] **Lint and test**
Co-authored-by: Mark Perfect <mark.anthony.perfect1@gmail.com>
# Description
Adds documentation on the LangChain website for a Dell-specific document
loader for on-prem storage devices. Additional details on what the
document loader does are described in the PR as well as on our GitHub repo:
[https://github.com/dell/powerscale-rag-connector](https://github.com/dell/powerscale-rag-connector)
This PR also creates a category on the document loader webpage as no
existing category exists for on-prem. This follows the existing pattern
already established as the website has a category for cloud providers.
# Issue:
New release, no issue.
# Dependencies:
None
# Twitter handle:
DellTech
---------
Signed-off-by: Adam Brenner <adam@aeb.io>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:** I was testing out `init_chat` and saw that chat
models can now be inferred. Currently only Azure OpenAI is supported, but
we would like to add support for Azure AI, which is a different package.
This PR edits the `base.py` file to add the chat implementation.
- I don't think this adds any additional dependencies.
- Will add a test and lint, but starting with an initial draft PR.
cc @santiagxf
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
The default model for `ChatGroq`, `"mixtral-8x7b-32768"`, is being
retired on March 20, 2025. Here we remove the default, such that model
names must be explicitly specified (being explicit is a good practice
here, and avoids the need for breaking changes down the line). This
change will be released in a minor version bump to 0.3.
This follows https://github.com/langchain-ai/langchain/pull/30161
(released in version 0.2.5), where we began generating warnings to this
effect.

## Description:
- Removed deprecated `initialize_agent()` usage in AWS Lambda
integration.
- Replaced it with `AgentExecutor` for compatibility with LangChain
v0.3.
- Fixed documentation linting errors.
## Issue:
- No specific issue linked, but this resolves the use of deprecated
agent initialization.
## Dependencies:
- No new dependencies added.
## Request for Review:
- Please verify if the implementation is correct.
- If approved and merged, I will proceed with updating other related
files.
## Twitter Handle (Optional):
I don't have a Twitter but here is my LinkedIn instead
(https://www.linkedin.com/in/aryan1227/)
OpenAIWhisperParser, OpenAIWhisperParserLocal, YandexSTTParser do not
handle in-memory audio data (loaded via Blob.from_data) correctly. They
require Blob.path to be set and AudioSegment is always read from the
file system. In-memory data is handled correctly only for
FasterWhisperParser so far. I changed OpenAIWhisperParser,
OpenAIWhisperParserLocal, YandexSTTParser accordingly to match
FasterWhisperParser.
Thanks for reviewing the PR!
Co-authored-by: qonnop <qonnop@users.noreply.github.com>
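The behavior described above can be sketched as follows. `SimpleBlob`
and `open_audio_source` are hypothetical names used only for
illustration, not the actual `Blob` class or parser code: the point is
that in-memory data is wrapped in a file-like object instead of
requiring a path on disk.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO, Optional, Union


@dataclass
class SimpleBlob:
    """Hypothetical stand-in for a blob that holds either bytes or a file path."""

    data: Optional[bytes] = None
    path: Optional[str] = None


def open_audio_source(blob: SimpleBlob) -> Union[BinaryIO, str]:
    """Return something an audio reader can consume, preferring in-memory data."""
    if blob.data is not None:
        # In-memory audio: wrap the bytes in a file-like object.
        return io.BytesIO(blob.data)
    if blob.path is not None:
        # File-backed audio: hand the path to the reader.
        return blob.path
    raise ValueError("Blob has neither data nor path")


source = open_audio_source(SimpleBlob(data=b"RIFF...."))
print(type(source).__name__)
```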
**Description:** The ChatModel[Integration]Tests classes are powerful
and helpful. This change allows subclasses to add additional tests.
For instance:
```python
class TestChatMyServiceIntegration(ChatModelIntegrationTests):
    ...

    def test_myservice(self, model: BaseChatModel) -> None:
        ...
```
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
## Description
This pull request introduces a new text splitter,
`JSFrameworkTextSplitter`, to the Langchain library. The
`JSFrameworkTextSplitter` extends the `RecursiveCharacterTextSplitter`
to handle JavaScript framework code effectively, including React (JSX),
Vue, and Svelte. It identifies and utilizes framework-specific component
tags and syntax elements as splitting points, alongside standard
JavaScript syntax. This ensures that code is divided at natural
boundaries, enhancing the parsing and processing of JavaScript and
framework-specific code.
### Key Features
- Supports React (JSX), Vue, and Svelte frameworks.
- Identifies and uses framework-specific tags and syntax elements as
natural splitting points.
- Extends the existing `RecursiveCharacterTextSplitter` for seamless
integration.
## Issue
No specific issue addressed.
## Dependencies
No additional dependencies required.
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
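To illustrate what "splitting at framework-specific tags" can mean, here
is a rough, standalone sketch. It does not reproduce
`JSFrameworkTextSplitter` (which extends `RecursiveCharacterTextSplitter`);
the helper name `split_at_tags` and the tag list are assumptions made for
this example.

```python
import re


def split_at_tags(code: str, tags: list[str]) -> list[str]:
    """Split `code` just before each opening tag listed in `tags` (illustrative only)."""
    if not tags:
        return [code]
    # A zero-width lookahead keeps the tag at the start of the following chunk.
    pattern = "(?=" + "|".join(re.escape(f"<{tag}") for tag in tags) + ")"
    return [chunk for chunk in re.split(pattern, code) if chunk.strip()]


vue_source = "<template><p>Hi</p></template><script>export default {}</script>"
chunks = split_at_tags(vue_source, ["template", "script"])
print(chunks)
```

Closing tags such as `</template>` do not match because the pattern
looks for `<template`, so each chunk starts at a component boundary.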
The former link led to a site that explains that the docs have moved,
but did not redirect the user to the actual site automatically. I just
copied the provided url, checked that it works and updated the link to
the current version.
**Description:** Updated the link to Unstructured Docs at
https://docs.unstructured.io
**Issue:** #30315
**Dependencies:** None
**Twitter handle:** @lahoramaker
**Description:**
Added an 'extract' mode to FireCrawlLoader that enables structured data
extraction from web pages. This feature allows users to extract
structured data from single URLs or entire websites using Large
Language Models (LLMs).
More parameters and usage details are in the [firecrawl
docs](https://docs.firecrawl.dev/features/extract-beta).
Currently only one URL can be extracted at a time (this depends on
firecrawl's extract method).
**Dependencies:**
No new dependencies required. Uses existing FireCrawl API capabilities.
---------
Co-authored-by: chbae <chbae@gcsc.co.kr>
Co-authored-by: ccurme <chester.curme@gmail.com>
FasterWhisperParser fails on a machine without an NVIDIA GPU: "Requested
float16 compute type, but the target device or backend do not support
efficient float16 computation." This problem arises because the
WhisperModel is called with compute_type="float16", which works only for
NVIDIA GPU.
According to the [CTranslate2
docs](https://opennmt.net/CTranslate2/quantization.html#bit-floating-points-float16)
float16 is supported only on NVIDIA GPUs. Removing the compute_type
parameter solves the problem for CPUs. According to the [CTranslate2
docs](https://opennmt.net/CTranslate2/quantization.html#quantize-on-model-loading)
setting compute_type to "default" (standard when omitting the parameter)
uses the original compute type of the model or performs implicit
conversion for the specific computation device (GPU or CPU). I suggest
removing compute_type="float16".
@hulitaitai you are the original author of the FasterWhisperParser - is
there a reason for setting the parameter to float16?
Thanks for reviewing the PR!
Co-authored-by: qonnop <qonnop@users.noreply.github.com>
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
- Example: "community: add foobar LLM"
- **Description:** Do not load non-public dimensions and measures
(public: false) with Cube semantic loader
- **Issue:** Currently, non-public dimensions and measures are loaded by
the Cube document loader which leads to downstream applications using
these which is not allowed by Cube.
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
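A small sketch of the filtering described above. The metadata shape (a
list of dicts with an optional boolean `public` key) and the helper name
`public_members` are assumptions for illustration, not the loader's
actual code.

```python
from typing import Any


def public_members(members: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only members that are not explicitly marked `public: false`."""
    # Assumption: a missing "public" key means the member is public.
    return [member for member in members if member.get("public", True)]


dimensions = [
    {"name": "orders.status", "public": True},
    {"name": "orders.internal_id", "public": False},
    {"name": "orders.created_at"},
]
print([d["name"] for d in public_members(dimensions)])
```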
- Support features from recent update:
https://www.anthropic.com/news/token-saving-updates (mostly adding
support for built-in tools in `bind_tools`).
- Add documentation around prompt caching, token-efficient tool use, and
built-in tools.
- **Description:** Fix bad log message on line#56 and replace f-string
logs with format specifiers
- **Issue:** Log messages such as this one are emitted with the
placeholder left unformatted:
`INFO:langchain_community.document_loaders.cube_semantic:Loading
dimension values for: {dimension_name}...`
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
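The difference between the two logging styles can be seen in a short,
self-contained sketch. The logger name and the `ListHandler` class are
made up for this example; only the standard `logging` module is used.

```python
import logging


class ListHandler(logging.Handler):
    """Collect formatted log messages in memory so they can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


logger = logging.getLogger("cube_semantic_example")  # hypothetical logger name
logger.setLevel(logging.INFO)
handler = ListHandler()
logger.addHandler(handler)

dimension_name = "orders.status"

# Plain string with no `f` prefix: the placeholder is logged literally.
logger.info("Loading dimension values for: {dimension_name}...")

# Format specifier: logging interpolates the argument when the record is emitted.
logger.info("Loading dimension values for: %s...", dimension_name)

print(handler.messages)
```

With `%s` specifiers the interpolation is also deferred until a handler
actually formats the record, which avoids string building for messages
that are filtered out by level.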
PR Title:
community: Fix Pass API_KEY as argument
PR Message:
Description:
This PR fixes validation error "Value error, Did not find
tavily_api_key, please add an environment variable `TAVILY_API_KEY`
which contains it, or pass `tavily_api_key` as a named parameter."
Dependencies:
No new dependencies introduced.
---------
Co-authored-by: pulvedu <dustin@tavily.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Here we add a job to the release workflow that, when releasing
`langchain-core`, tests prior published versions of select packages
against the new version of core. We limit the testing to the most recent
published versions of langchain-anthropic and langchain-openai.
This is designed to catch backward-incompatible updates to core. We
sometimes update core and downstream packages simultaneously, so there
may not be any commit in the history at which tests would fail. So
although core and latest downstream packages could be consistent, we can
benefit from testing prior versions of downstream packages against core.
I tested the workflow by simulating a [breaking
change](d7287248cf)
in core and running it with publishing steps disabled:
https://github.com/langchain-ai/langchain/actions/runs/13741876345. The
workflow correctly caught the issue.
## Description
DashScope models support multiple system messages. Here is the
[Doc](https://bailian.console.aliyun.com/model_experience_center/text#/model-market/detail/qwen-long?tabKey=sdk),
and the example code from that page:
```python
import os

from openai import OpenAI

client = OpenAI(
    # If you have not configured the environment variable, replace this with your API key.
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The base_url of the DashScope service.
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
# Initialize the messages list.
completion = client.chat.completions.create(
    model="qwen-long",
    messages=[
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        # Replace 'file-fe-xxx' with the file-id used in your actual conversation.
        {'role': 'system', 'content': 'fileid://file-fe-xxx'},
        {'role': 'user', 'content': 'What is this article about?'}
    ],
    stream=True,
    stream_options={"include_usage": True}
)
full_content = ""
for chunk in completion:
    if chunk.choices and chunk.choices[0].delta.content:
        # Append the streamed content.
        full_content += chunk.choices[0].delta.content
    print(chunk.model_dump())
print(full_content)
```
Tip: The example code uses the OpenAI SDK, but the documentation says
that the DashScope API is also supported. I tested it, and it works.
```
Is the Dashscope SDK invocation method compatible?
Yes, the Dashscope SDK remains compatible for model invocation. However, file uploads and file-ID retrieval are currently only supported via the OpenAI SDK. The file-ID obtained through this method is also compatible with Dashscope for model invocation.
```
```markdown
**Description:**
This PR integrates Valthera into LangChain, introducing a framework designed to let an LLM agent send highly personalized nudges. This is modeled after Dr. BJ Fogg's Behavior Model. This integration includes:
- Custom data connectors for HubSpot, PostHog, and Snowflake.
- A unified data aggregator that consolidates user data.
- Scoring configurations to compute motivation and ability scores.
- A reasoning engine that determines the appropriate user action.
- A trigger generator to create personalized messages for user engagement.
**Issue:**
N/A
**Dependencies:**
N/A
**Twitter handle:**
- `@vselvarajijay`
**Tests and Docs:**
- `docs/docs/integrations/tools/valthera`
- `https://github.com/valthera/langchain-valthera/tree/main/tests`
```
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
## Changes
- `/Makefile` - added extra step to `make format` and `make lint` to
ensure the lint dep-group is installed before running ruff (documented
in issue #30069)
- `/pyproject.toml` - removed ruff exceptions for files that no longer
exist or no longer create formatting/linting errors in ruff
## Testing
**running `make format` on this branch/PR**
<img width="435" alt="image"
src="https://github.com/user-attachments/assets/82751788-f44e-4591-98ed-95ce893ce623"
/>
## Issue
Fixes #30069
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**Description:** adds ContextualAI's `langchain-contextual` package's
documentation
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
The OpenAI API requires function names to match the pattern
'^[a-zA-Z0-9_-]+$'. This updates the JIRA toolkit's tool names to use
underscores instead of spaces to comply with this requirement and
prevent BadRequestError when using the tools with OpenAI functions.
Error fixed:
```
File "langgraph-bug-fix/.venv/lib/python3.13/site-packages/openai/_base_client.py", line 1023, in _request
raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'error': {'message': "Invalid 'tools[0].function.name': string does not match pattern. Expected a string that matches the pattern '^[a-zA-Z0-9_-]+$'.", 'type': 'invalid_request_error', 'param': 'tools[0].function.name', 'code': 'invalid_value'}}
During task with name 'agent' and id 'aedd7537-e8d5-6678-d0c5-98129586d3ac'
```
Issue: #30182
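As a rough illustration of the naming constraint, the sketch below
checks names against the pattern quoted in the error message and
replaces disallowed characters with underscores. The helper
`to_function_name` and the sample names are assumptions for this
example; the PR itself only states that spaces were replaced with
underscores.

```python
import re

# Pattern quoted in the OpenAI error message above.
VALID_NAME = re.compile(r"^[a-zA-Z0-9_-]+$")


def to_function_name(tool_name: str) -> str:
    """Replace characters outside the allowed set with underscores (hypothetical helper)."""
    return re.sub(r"[^a-zA-Z0-9_-]", "_", tool_name.strip())


for name in ["JQL Query", "Create Issue"]:  # illustrative tool names
    safe = to_function_name(name)
    print(name, "->", safe, bool(VALID_NAME.match(safe)))
```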
- **Description:** update docs to suppress type checker complaints about
the args_schema type hint when inheriting from BaseTool
- **Issue:** #30142
- **Dependencies:** N/A
- **Twitter handle:** N/A
- [ ] **PR title**: "community: chinese doc extracting"
- [ ] **PR message**:
- **Description:** add jieba_link_extractor.py for chinese doc
extracting
- **Dependencies:** jieba
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
/doc/doc/integrations/providers/jieba.md
/doc/doc/integrations/vectorstores/jieba_link_extractor.ipynb
/libs/packages.yml
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Groq is retiring `mixtral-8x7b-32768`, which is currently the default
model for ChatGroq, on March 20. Here we emit a warning if the model is
not specified explicitly.
A version 0.3.0 will be released ahead of March 20 that removes the
default altogether.
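The warn-on-default pattern can be sketched in isolation as below. The
function `resolve_model`, the warning category, and the message text are
assumptions for illustration; they are not taken from `ChatGroq`.

```python
import warnings
from typing import Optional

RETIRING_DEFAULT = "mixtral-8x7b-32768"


def resolve_model(model: Optional[str] = None) -> str:
    """Return the model to use, warning when the caller relies on the default."""
    if model is None:
        warnings.warn(
            f"The default model '{RETIRING_DEFAULT}' is being retired; "
            "pass a model name explicitly.",
            DeprecationWarning,
            stacklevel=2,
        )
        return RETIRING_DEFAULT
    return model


# Record warnings so the effect is visible regardless of the active filters.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    chosen = resolve_model()
print(chosen, len(caught))
```

Callers that pass a model name explicitly see no warning, so they are
unaffected when the default is later removed.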
docs: New integration for LangChain - ads4gpts-langchain
Description: Tools and Toolkit for Agentic integration natively within
LangChain with ADS4GPTs, in order to help applications monetize with
advertising.
Twitter handle: @ads4gpts
Co-authored-by: knitlydevaccount <loom+github@knitly.app>
- **Description:** Fix Apify Actors tool notebook main heading text so
there is an actual description instead of "Overview" in the tool
integration description on [LangChain tools integration
page](https://python.langchain.com/docs/integrations/tools/#all-tools).
- **Description:** a notebook showing langchain and langgraph agents using
the new langchain_tableau tool
- **Twitter handle:** @joe_constantin0
---------
Co-authored-by: Joe Constantino <joe@constantino.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- Support thinking blocks in core's `convert_to_openai_messages` (pass
through instead of error)
- Ignore thinking blocks in ChatOpenAI (instead of error)
- Support Anthropic-style image blocks in ChatOpenAI
---
Standard integration tests include a `supports_anthropic_inputs`
property which is currently enabled only for tests on `ChatAnthropic`.
This test enforces compatibility with message histories of the form:
```
- system message
- human message
- AI message with tool calls specified only through `tool_use` content blocks
- human message containing `tool_result` and an additional `text` block
```
It additionally checks support for Anthropic-style image inputs if
`supports_image_inputs` is enabled.
Here we change this test, such that if you enable
`supports_anthropic_inputs`:
- You support AI messages with text and `tool_use` content blocks
- You support Anthropic-style image inputs (if `supports_image_inputs`
is enabled)
- You support thinking content blocks.
That is, we add a test case for thinking content blocks, but we also
remove the requirement of handling tool results within HumanMessages
(motivated by existing agent abstractions, which should all return
ToolMessage). We move that requirement to a ChatAnthropic-specific test.
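"Ignoring" thinking blocks can be pictured with a short sketch. The
helper `drop_thinking_blocks` is hypothetical and is not the code used
in `ChatOpenAI`; it assumes message content is either a string or a
list of strings and dict blocks carrying a `type` key.

```python
from typing import Any, Union

Content = Union[str, list[Union[str, dict[str, Any]]]]


def drop_thinking_blocks(content: Content) -> Content:
    """Return message content without blocks whose type is "thinking"."""
    if isinstance(content, str):
        return content
    return [
        block
        for block in content
        if not (isinstance(block, dict) and block.get("type") == "thinking")
    ]


content = [
    {"type": "thinking", "thinking": "..."},
    {"type": "text", "text": "Hello"},
]
print(drop_thinking_blocks(content))
```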
**Description:**
This PR adds a call to `guard_import()` to fix an AttributeError raised
when creating a LanceDB vectorstore instance with an existing LanceDB
table.
**Issue:**
This PR fixes issue #30124.
**Dependencies:**
No additional dependencies.
**Twitter handle:**
[@metadaddy](https://x.com/metadaddy), but I spend more time at
[@metadaddy.net](https://bsky.app/profile/metadaddy.net) these days.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
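For readers unfamiliar with the pattern, a guarded import resolves an
optional dependency at the point of use and fails with an actionable
message if it is missing. The sketch below is a standalone analogue
named `guarded_import`; it is an assumption-laden illustration, not the
implementation of the `guard_import()` helper mentioned above.

```python
import importlib
from types import ModuleType


def guarded_import(module_name: str, pip_name: str) -> ModuleType:
    """Import an optional dependency or raise an ImportError with install instructions."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"Could not import {module_name}. Install it with `pip install {pip_name}`."
        ) from exc


json_module = guarded_import("json", "json")  # stdlib module, always available
print(json_module.dumps({"ok": True}))
```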
## Description
Make DashScope models support Partial Mode for text continuation.
ChatTongYi supports text continuation from a prefix by adding a
"partial" argument in AIMessage. The documentation is
[Partial Mode](https://help.aliyun.com/zh/model-studio/user-guide/partial-mode?spm=a2c4g.11186623.help-menu-2400256.d_1_0_0_8.211e5b77KMH5Pn&scm=20140722.H_2862210._.OR_help-T_cn~zh-V_1).
The API example is:
```py
import os

import dashscope

messages = [
    {
        "role": "user",
        "content": "Please continue the sentence 'Spring has come, and the earth' to express the beauty of spring and the author's joy",
    },
    {
        "role": "assistant",
        "content": "Spring has come, and the earth",
        "partial": True,
    },
]
response = dashscope.Generation.call(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model='qwen-plus',
    messages=messages,
    result_format='message',
)
print(response.output.choices[0].message.content)
```
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description**: Added the request_id field to the check_response
function to improve request tracking and debugging, applicable for the
Tongyi model.
- **Issue**: None
- **Dependencies**: None
- **Twitter handle**: None
- **Add tests and docs**: None
- **Lint and test**: Ran `make format`, `make lint`, and `make test` to
ensure the code meets formatting and testing requirements.
### **Description**
Converts the boolean `jira_cloud` parameter in the Jira API Wrapper to a
string before initializing the Jira Client. Also adds tests for the
same.
### **Issue**
[Jira API Wrapper
Bug](8abb65e138/libs/community/langchain_community/utilities/jira.py (L47))
```python
jira_cloud_str = get_from_dict_or_env(values, "jira_cloud", "JIRA_CLOUD")
jira_cloud = jira_cloud_str.lower() == "true"
```
The above code has a bug where the value of `"jira_cloud"` is a boolean.
If it is passed, calling `.lower()` on a boolean raises an error.
Additionally, `False` cannot be passed explicitly since
`get_from_dict_or_env` falls back to environment variables.
Relevant code in `langchain_core`:
[Source](https://github.com/thesmallstar/langchain/blob/master/.venv/lib/python3.13/site-packages/langchain_core/utils/env.py#L46)
```python
if isinstance(key, str) and key in data and data[key]: # Here, data[key] is False
```
This PR fixes both issues.
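Both problems can be illustrated with a small standalone sketch. The
helpers `coerce_bool` and `get_flag` are hypothetical and are not the
code in the Jira wrapper; treating an unset environment variable as
`False` is also an assumption made only for this example.

```python
import os
from typing import Any, Mapping, Union


def coerce_bool(value: Union[bool, str]) -> bool:
    """Interpret a flag that may arrive as a bool (argument) or a string (environment)."""
    if isinstance(value, bool):
        return value
    return value.strip().lower() == "true"


def get_flag(values: Mapping[str, Any], key: str, env_key: str) -> bool:
    """Read a flag from `values`, falling back to the environment only if it is absent."""
    if values.get(key) is not None:
        # An explicit False is respected instead of being treated as "missing".
        return coerce_bool(values[key])
    # Assumption for this sketch: an unset environment variable means False.
    return coerce_bool(os.environ.get(env_key, "false"))


print(get_flag({"jira_cloud": False}, "jira_cloud", "JIRA_CLOUD"))
```

The key difference from a truthiness check is the `is not None` test,
which lets a caller pass `False` without triggering the fallback.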
### **Twitter Handle**
[Manthan Surkar](https://x.com/manthan_surkar)
This PR adds documentation for the langchain-taiga Tool integration,
including an example notebook at
'docs/docs/integrations/tools/taiga.ipynb' and updates to
'libs/packages.yml' to track the new package.
Issue:
N/A
Dependencies:
None
Twitter handle:
N/A
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
PR Title:
langchain: add attachments support in OpenAIAssistantRunnable
PR Description:
This PR fixes an issue with the "retrieval" tool (internally named
"file_search") in the OpenAI Assistant by adding support for the
"attachments" parameter in the invoke method. This change allows files
to be linked to messages when they are inserted into threads, which is
essential for utilizing OpenAI's Retrieval Augmented Generation (RAG)
feature.
Issue:
N/A
Dependencies:
None
Twitter handle:
N/A
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Documentation refinement: clarify the Milvus server description. The
descriptions of "milvus standalone" and "milvus server" were confusing,
so this replaces them with a more detailed explanation.
Signed-off-by: ChengZi <chen.zhang@zilliz.com>
- **Description:** Fix typo in code samples for max_tokens_for_prompt.
Code blocks had singular "token" but the method has plural "tokens".
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** N/A
**Description:**
Five fixes to the example for `with_alisteners()` in
libs/core/langchain_core/runnables/base.py. The incoherent example
output is also replaced with the output of the working example.
1. SyntaxError: unterminated string literal
    print(f"on start callback starts at {format_t(time.time())}
corrected to
    print(f"on start callback starts at {format_t(time.time())}")
2. SyntaxError: unterminated string literal
    print(f"on end callback starts at {format_t(time.time())}
corrected to
    print(f"on end callback starts at {format_t(time.time())}")
3. NameError: name 'Runnable' is not defined
fixed with
    from langchain_core.runnables import Runnable
4. NameError: name 'asyncio' is not defined
fixed with
    import asyncio
5. NameError: name 'format_t' is not defined
fixed by implementing format_t() as
    from datetime import datetime, timezone
    def format_t(timestamp: float) -> str:
        return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()
- [x] **PR title**: "docs: added proper width to sidebar content"
- [x] **PR message**: added proper width to sidebar content
- **Description:** While accessing the [LangChain Python API
Reference](https://python.langchain.com/api_reference/index.html) the
sidebar content does not display correctly.
- **Issue:** Follow-up to #30061
- **Dependencies:** None
- **Twitter handle:** https://x.com/implicitdefcnc
Add `batch_size` to fix out-of-memory (OOM) errors when embedding a
large number of texts.
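The description does not say which embeddings class is affected, so the
following is only a generic sketch of batching. `batched`, `embed_all`,
`fake_embed`, and the default of 32 are all assumptions for
illustration: the texts are embedded in fixed-size slices so that only
one batch is processed at a time.

```python
from typing import Callable, Iterator, Sequence


def batched(texts: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `texts` with at most `batch_size` items each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start : start + batch_size]


def embed_all(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], list[list[float]]],
    batch_size: int = 32,
) -> list[list[float]]:
    """Embed `texts` batch by batch and concatenate the results in order."""
    embeddings: list[list[float]] = []
    for batch in batched(texts, batch_size):
        embeddings.extend(embed_batch(batch))
    return embeddings


def fake_embed(batch: Sequence[str]) -> list[list[float]]:
    """Stand-in embedder: one-dimensional vector holding the text length."""
    return [[float(len(text))] for text in batch]


print(embed_all(["a", "bb", "ccc"], fake_embed, batch_size=2))
```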
Structured output will currently always raise a BadRequestError when
Claude 3.7 Sonnet's `thinking` is enabled, because we rely on forced
tool use for structured output and this feature is not supported when
`thinking` is enabled.
Here we:
- Emit a warning if `with_structured_output` is called when `thinking`
is enabled.
- Raise `OutputParserException` if no tool calls are generated.
This is arguably preferable to raising an error in all cases.
```python
from langchain_anthropic import ChatAnthropic
from pydantic import BaseModel


class Person(BaseModel):
    name: str
    age: int


llm = ChatAnthropic(
    model="claude-3-7-sonnet-latest",
    max_tokens=5_000,
    thinking={"type": "enabled", "budget_tokens": 2_000},
)
structured_llm = llm.with_structured_output(Person)  # <-- this generates a warning
```
```python
structured_llm.invoke("Alice is 30.") # <-- works
```
```python
structured_llm.invoke("Hello!") # <-- raises OutputParserException
```
Took a "census" of models supported by init_chat_model-- of those that
return model names in response metadata, these were the only two that
had it keyed under `"model"` instead of `"model_name"`.
- [ ] **PR title**: [langchain_community.llms.xinference]: Add
asynchronous generate interface
- [ ] **PR message**: The asynchronous generate interface supports both
streaming and non-streaming data.
    chain = prompt | llm
    async for chunk in chain.astream(input=user_input):
        yield chunk
- [ ] **Add tests and docs**:
    from langchain_community.llms import Xinference
    from langchain.prompts import PromptTemplate
    llm = Xinference(
        server_url="http://0.0.0.0:9997",  # replace with your xinference server url
        model_uid={model_uid},  # replace model_uid with the model UID returned from launching the model
        stream=True,
    )
    prompt = PromptTemplate(
        input_variables=["country"],
        template="Q: where can we visit in the capital of {country}? A:",
    )
    chain = prompt | llm
    async for chunk in chain.astream(input=user_input):
        yield chunk
- [ ] **PR title**: "docs: add xAI to ChatModelTabs"
- [ ] **PR message**:
- **Description:** Added `ChatXAI` to `ChatModelTabs` dropdown to
improve visibility of xAI chat models (e.g., "grok-2", "grok-3").
- **Issue:** Follow-up to #30010
- **Dependencies:** none
- **Twitter handle:** @tiestvangool
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Implementing the MMR algorithm for OLAP vector storage**:
- Support Apache Doris and StarRocks OLAP database.
- Example: "vectorstore.as_retriever(search_type="mmr",
search_kwargs={"k": 10})"
- **Implementing the MMR algorithm for OLAP vector storage**:
- **Apache Doris**
- **StarRocks**
- **Add tests and docs**:
- Example: `vectorstore.as_retriever(search_type="mmr", search_kwargs={"k": 10})`
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
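For readers unfamiliar with MMR (maximal marginal relevance), the selection rule behind `search_type="mmr"` can be sketched as follows. This is a minimal pure-Python illustration, not the code added for Apache Doris or StarRocks; the helper names `cosine` and `mmr` and the `lambda_mult` default are assumptions made for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query, candidates, k=2, lambda_mult=0.5):
    """Return indices of up to k candidates, trading relevance against redundancy."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def score(i):
            relevance = cosine(query, candidates[i])
            # Redundancy is the highest similarity to anything already picked.
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical 2-D embeddings: the second vector nearly duplicates the first.
vectors = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
diverse = mmr([1.0, 0.0], vectors, k=2, lambda_mult=0.3)
relevant_only = mmr([1.0, 0.0], vectors, k=2, lambda_mult=1.0)
```

With a low `lambda_mult` the near-duplicate is skipped in favour of a more diverse result, while `lambda_mult=1.0` reduces to plain similarity ranking.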
---------
Co-authored-by: fakzhao <fakzhao@cisco.com>
This pull request includes a change to the `TavilySearchResults` class
in the `tool.py` file, which updates the code block format in the
documentation.
Documentation update:
*
[`libs/community/langchain_community/tools/tavily_search/tool.py`](diffhunk://#diff-e3b6a980979268b639c6a86e9b182756b0f7c7e9e5605e613bc0a72ea6aa5301L54-R59):
Changed the code block format from Python to JSON in the example
provided in the docstring.
Thank you for contributing to LangChain!
## **Description:**
When using the Tavily retriever with include_raw_content=True, the
retriever occasionally fails with a Pydantic ValidationError because
raw_content can be None.
The Document model in langchain_core/documents/base.py requires
page_content to be a non-None value, but the Tavily API sometimes
returns None for raw_content.
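A short illustration of why a plain default does not help here: `dict.get(key, default)` only falls back to the default when the key is absent, not when it is present with a `None` value. The `result` dictionary below is a made-up stand-in for an API response, not real Tavily output.

```python
# Hypothetical API result in which the key exists but its value is None.
result = {"content": "short snippet", "raw_content": None}

# The default is ignored because the key is present, so None leaks through.
naive = result.get("raw_content", "")

# Falling back with `or` also covers the explicit-None case.
safe = result.get("raw_content") or ""
```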
This PR fixes the issue by ensuring that even when raw_content is None,
an empty string is used instead:
```python
page_content=result.get("content", "")
if not self.include_raw_content
else (result.get("raw_content") or ""),
```
This pull request includes updates to the
`libs/community/langchain_community/callbacks/bedrock_anthropic_callback.py`
file to add a new model version to the list of supported models.
Updates to supported models:
* Added support for the `anthropic.claude-3-7-sonnet-20250219-v1:0`
model with a rate of `0.003` for 1000 input tokens.
* Added support for the `anthropic.claude-3-7-sonnet-20250219-v1:0`
model with a rate of `0.015` for 1000 output tokens.
AWS Bedrock pricing reference : https://aws.amazon.com/bedrock/pricing
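As a rough illustration of how per-1000-token rates like these translate into a cost estimate, here is a minimal sketch. The function name `estimate_cost` is hypothetical, and the assumption that the rates are in USD comes from the AWS pricing page rather than from the callback code itself.

```python
# Rates quoted above, per 1,000 tokens (assumed to be USD).
INPUT_RATE_PER_1K = 0.003
OUTPUT_RATE_PER_1K = 0.015


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one call from its input and output token counts."""
    input_cost = (input_tokens / 1000) * INPUT_RATE_PER_1K
    output_cost = (output_tokens / 1000) * OUTPUT_RATE_PER_1K
    return input_cost + output_cost
```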
- [x] **PR title**:
- [x] **PR message**:
- Added a new section for how to set up and use Milvus with Docker, and
added an example of how to instantiate Milvus for hybrid retrieval
- Fixed the documentation setup to run `make lint` and `make format`
- [x] **Add tests and docs**: If you're adding a new integration, please
include
N/A
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Mark Perfect <mark.anthony.perfect1@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
## PyMuPDF4LLM integration to LangChain for PDF content extraction in
Markdown format
### Description
[PyMuPDF4LLM](https://github.com/pymupdf/RAG) makes it easier to extract
PDF content in Markdown format, needed for LLM & RAG applications.
(License: GNU Affero General Public License v3.0)
[langchain-pymupdf4llm](https://github.com/lakinduboteju/langchain-pymupdf4llm)
integrates PyMuPDF4LLM to LangChain as a Document Loader.
(License: MIT License)
This pull request introduces the integration of
[PyMuPDF4LLM](https://pymupdf.readthedocs.io/en/latest/pymupdf4llm) into
the LangChain project as an integration package:
[`langchain-pymupdf4llm`](https://github.com/lakinduboteju/langchain-pymupdf4llm).
The most important changes include adding new Jupyter notebooks to
document the integration and updating the package configuration file to
include the new package.
### Documentation:
* `docs/docs/integrations/providers/pymupdf4llm.ipynb`: Added a new
Jupyter notebook to document the integration of `PyMuPDF4LLM` with
LangChain, including installation instructions and class imports.
* `docs/docs/integrations/document_loaders/pymupdf4llm.ipynb`: Added a
new Jupyter notebook to document the usage of `langchain-pymupdf4llm` as
a LangChain integration package in detail.
### Package registration:
* `libs/packages.yml`: Updated the package configuration file to include
the `langchain-pymupdf4llm` package.
### Additional information
* Related to: https://github.com/langchain-ai/langchain/pull/29848
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:** Same changes as #26593 but for FileCallbackHandler
- **Issue:** Fixes #29941
- **Dependencies:** None
- **Twitter handle:** None
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
**Issue**: The trigger can only be used by the first table created;
additional triggers cannot be created for other tables.
**Fix**: Update the trigger name so that it can be used for new
tables.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**Description:**
Tavily search results returned from the API include useful information
such as title, score and (optionally) raw_content. The wrapper documents
these fields but drops them from its results. This PR adds this data to
the result structure.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
The documentation mentioned GitLab where it should have said GitHub,
which caused confusion when referring to it.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Resolves https://github.com/langchain-ai/langchain/issues/29951
Was able to reproduce the issue with Anthropic installing from pydantic
`main` and correct it with the fix recommended in the issue.
Thanks very much @Viicos for finding the bug and the detailed writeup!
Resolves https://github.com/langchain-ai/langchain/issues/29003,
https://github.com/langchain-ai/langchain/issues/27264
Related: https://github.com/langchain-ai/langchain-redis/issues/52
```python
from langchain.chat_models import init_chat_model
from langchain.globals import set_llm_cache
from langchain_community.cache import SQLiteCache
from pydantic import BaseModel
cache = SQLiteCache()
set_llm_cache(cache)
class Temperature(BaseModel):
    value: int
    city: str
llm = init_chat_model("openai:gpt-4o-mini")
structured_llm = llm.with_structured_output(Temperature)
```
```python
# 681 ms
response = structured_llm.invoke("What is the average temperature of Rome in May?")
```
```python
# 6.98 ms
response = structured_llm.invoke("What is the average temperature of Rome in May?")
```
Some o-series models will raise a 400 error for `"role": "system"`
(`o1-mini` and `o1-preview` will raise, `o1` and `o3-mini` will not).
Here we update `ChatOpenAI` to update the role to `"developer"` for all
model names matching `^o\d`.
We only make this change on the ChatOpenAI class (not BaseChatOpenAI).
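The name-matching rule described above can be sketched in isolation. This is only an illustration of the `^o\d` pattern, not the actual `ChatOpenAI` code; the helper name `system_role_for` is hypothetical.

```python
import re


def system_role_for(model_name: str) -> str:
    """Return the role to use for system messages, based on the model name."""
    # "^o\d" matches names that start with "o" followed by a digit (o1, o3-mini, ...).
    return "developer" if re.match(r"^o\d", model_name) else "system"
```

Names such as `gpt-4o-mini` contain an "o" but do not start with one followed by a digit, so they keep the `"system"` role.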
For context, see #29626.
DeepSeek uses `langchain_openai`, and the failure surfaces as a
`json decode error`.
I added a handler that raises a more sensible error message, stating
that the DeepSeek API returned empty or invalid JSON.
Reproducing the issue is challenging because it is inconsistent:
sometimes DeepSeek returns valid data, and at other times it returns
invalid data, which triggers the JSON decode error.
This PR only adds exception handling; it is not an ultimate fix for the issue.
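The general shape of such a handler might look like the sketch below. It uses only the standard-library `json` module; the function name `parse_response_body` and the choice of `ValueError` are assumptions for illustration and do not claim to mirror the actual change.

```python
import json


def parse_response_body(raw: str) -> dict:
    """Parse a response body, replacing the low-level decode error with a clearer one."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as e:
        # Keep the original exception chained so the root cause stays visible.
        raise ValueError(
            "DeepSeek API returned an empty or invalid JSON response."
        ) from e
```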
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**Description:** As noted in the comments on commit
[41b6a86](41b6a86bbe),
that commit introduced a bug that occurs when an embedding request
returns a non-nested list. This is typically the case for the
**_nomic-embed-text_** model.
- I added the unit test, and ran `make format`, `make lint` and `make
test` from the `community` package.
- No new dependency.
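To make the nested versus non-nested distinction concrete, here is a minimal sketch of normalizing both shapes into a list of vectors. The helper `normalize_embeddings` is hypothetical and is not the code changed in this PR.

```python
def normalize_embeddings(result):
    """Return a list of vectors whether `result` is one flat vector or a list of vectors."""
    if result and isinstance(result[0], (int, float)):
        # Flat list of numbers: treat it as a single embedding.
        return [list(result)]
    return [list(vec) for vec in result]
```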
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- [x] **PR title**: docs: (community) update ChatLiteLLM
- [x] **PR message**:
- **Description:** Updated the description of the `model_kwargs` parameter,
which wrongly described `temperature`.
- **Issue:** #29862
- **Dependencies:** N/A
- [x] **Add tests and docs**: N/A
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
See https://docs.astral.sh/ruff/rules/#flake8-annotations-ann
The advantage over using mypy alone is that ruff is very fast at
detecting missing annotations.
ANN101 and ANN102 are deprecated, so we ignore them.
ANN401 (no `Any` type) is ignored to stay in sync with the mypy config.
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
## Which area of LangChain is being modified?
- This PR adds a new "Permit" integration to the `docs/integrations/`
folder.
- Introduces two new Tools (`LangchainJWTValidationTool` and
`LangchainPermissionsCheckTool`)
- Introduces two new Retrievers (`PermitSelfQueryRetriever` and
`PermitEnsembleRetriever`)
- Adds demo scripts in `examples/` showcasing usage.
## Description of Changes
- Created `langchain_permit/tools.py` for JWT validation and permission
checks with Permit.
- Created `langchain_permit/retrievers.py` for custom Permit-based
retrievers.
- Added documentation in `docs/integrations/providers/permit.ipynb` (or
`.mdx`) to explain setup, usage, and examples.
- Provided sample scripts in `examples/demo_scripts/` to illustrate
usage of these tools and retrievers.
- Ensured all code is linted and tested locally.
Thank you again for reviewing!
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:**
Since mlx_lm 0.20, all calls to mlx crash because the way parameters are
passed to the `generate` and `generate_step` methods has been deprecated.
The parameters top_p, temp, repetition_penalty and repetition_context_size
are no longer passed directly to those methods but are wrapped into
"sampler" and "logit_processor".
- **Dependencies:** mlx_lm (optional)
- **Tests:**
I've added a new test to the existing test file:
tests/integration_tests/llms/test_mlx_pipeline.py
---------
Co-authored-by: Jean-Philippe Dournel <jp@insightkeeper.io>
# community: Fix AttributeError in RankLLMRerank (`list` object has no
attribute `candidates`)
## **Description**
This PR fixes an issue in `RankLLMRerank` where reranking fails with the
following error:
```
AttributeError: 'list' object has no attribute 'candidates'
```
The issue arises because `rerank_batch()` returns a `List[Result]`
instead of an object containing `.candidates`.
### **Changes Introduced**
- Adjusted `compress_documents()` to support both:
- Old API format: `rerank_results.candidates`
- New API format: `rerank_results` as a list
- Also fix wrong .txt location parsing while I was at it.
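A hedged sketch of how both return shapes could be handled is shown below. `FakeResult` is a stand-in dataclass invented for this example, and the assumption that each list element exposes its own `candidates` is not confirmed by the PR text.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class FakeResult:
    """Hypothetical stand-in for a rerank result object."""

    candidates: List[Any] = field(default_factory=list)


def extract_candidates(rerank_results):
    """Collect candidates from either a single result object or a list of results."""
    if hasattr(rerank_results, "candidates"):
        # Old format: one object exposing `.candidates` directly.
        return list(rerank_results.candidates)
    # New format (assumed): a list whose elements each expose `.candidates`.
    collected = []
    for result in rerank_results:
        collected.extend(result.candidates)
    return collected
```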
---
## **Issue**
Fixes **AttributeError** in `RankLLMRerank` when using
`compression_retriever.invoke()`. The issue is observed when
`rerank_batch()` returns a list instead of an object with `.candidates`.
**Relevant log:**
```
AttributeError: 'list' object has no attribute 'candidates'
```
## **Dependencies**
- No additional dependencies introduced.
---
## **Checklist**
- [x] **Backward compatible** with previous API versions
- [x] **Tested** locally with different RankLLM models
- [x] **No new dependencies introduced**
- [x] **Linted** with `make format && make lint`
- [x] **Ready for review**
---
## **Testing**
- Ran `compression_retriever.invoke(query)`
## **Reviewers**
If no review within a few days, please **@mention** one of:
- @baskaryan
- @efriis
- @eyurtsev
- @ccurme
- @vbarda
- @hwchase17
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
- Example: "community: add foobar LLM"
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
This PR adds a new cognee integration for knowledge-graph-based retrieval,
enabling developers to ingest documents into cognee’s knowledge graph,
process them, and then retrieve context via CogneeRetriever.
It includes:
- langchain_cognee package with a CogneeRetriever class
- a test for the integration, demonstrating how to create, process, and
retrieve with cognee
- an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
Followed additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
Thank you for the review!
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**Description:** Two small changes have been proposed here:
(1)
Previous code assumes that every issue has a priority field. If an issue
lacks this field, the code will raise a KeyError.
Now, the code checks if priority exists before accessing it. If priority
is missing, it assigns None instead of crashing. This prevents runtime
errors when processing issues without a priority.
(2)
Also, if the "style" field is missing, the code throws a KeyError.
Using `.get("style", None)` instead retrieves the value safely when it is
present and returns None otherwise.
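Both changes follow the same defensive pattern, sketched below on made-up data. The issue layout (`fields`, `priority`, `name`) and the helper `summarize_issue` are assumptions for illustration only and are not taken from the modified code.

```python
def summarize_issue(issue: dict) -> dict:
    """Read optional fields without raising KeyError when they are absent."""
    fields = issue.get("fields", {})
    priority = fields.get("priority")  # None when the issue has no priority
    return {
        "key": issue.get("key"),
        "priority": priority["name"] if priority else None,
        "style": fields.get("style", None),
    }


with_priority = summarize_issue(
    {"key": "ABC-1", "fields": {"priority": {"name": "High"}}}
)
without_priority = summarize_issue({"key": "ABC-2", "fields": {}})
```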
**Issue:** #29875
**Dependencies:** N/A
Thank you for contributing to LangChain!
- [ ] **Handled query records properly**: "community:
vectorstores/kinetica"
- [ ] **Bugfix for empty query results handling**:
- **Description:** checked for the number of records returned by a query
before processing further
- **Issue:** resulted in an `AttributeError` earlier which has now been
fixed
@efriis
This PR adds documentation for the Azure AI package in LangChain to the
main monorepo.
No connected issue and no updated dependencies.
Uses existing tests and updates the docs.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**Description:** Update docstring for `reasoning_effort` argument to
specify that it applies to reasoning models only (e.g., OpenAI o1 and
o3-mini), clarifying its supported models.
**Issue:** None
**Dependencies:** None
Adds an `attachment_filter_func` parameter to the ConfluenceLoader class,
which can be used to determine which files are indexed. This is useful
if you are interested in excluding files based on their media type or
other metadata.
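A filter of this kind might look like the sketch below. It assumes the callable receives one attachment as a dictionary and returns `True` to keep it, and that the media type lives under `metadata.mediaType`; both are assumptions for illustration and should be checked against the loader's documentation.

```python
def only_pdfs(attachment: dict) -> bool:
    """Keep only attachments whose media type is PDF (assumed metadata layout)."""
    media_type = attachment.get("metadata", {}).get("mediaType", "")
    return media_type == "application/pdf"


# Hypothetical attachment records, for demonstration only.
attachments = [
    {"title": "spec.pdf", "metadata": {"mediaType": "application/pdf"}},
    {"title": "logo.png", "metadata": {"mediaType": "image/png"}},
    {"title": "unknown"},
]
kept = [a["title"] for a in attachments if only_pdfs(a)]
```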
The build in #29867 is currently broken because `langchain-cli` didn't
add download stats to the provider file.
This change gracefully handles sorting packages with missing download
counts. I initially updated the build to fetch download counts on every
run, but pypistats [requests](https://pypistats.org/api/) that users not
fetch stats like this via CI.
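Sorting that tolerates missing counts can be sketched as follows. The package records and the `downloads` key are hypothetical, and treating a missing count as zero is an assumption about the intended ordering.

```python
# Hypothetical package records; "pkg-b" has no download count yet.
packages = [
    {"name": "pkg-a", "downloads": 1200},
    {"name": "pkg-b"},
    {"name": "pkg-c", "downloads": 50},
]


def sort_by_downloads(pkgs):
    """Sort by download count, descending, treating missing or None counts as 0."""
    return sorted(pkgs, key=lambda p: p.get("downloads") or 0, reverse=True)


ordered_names = [p["name"] for p in sort_by_downloads(packages)]
```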
https://docs.x.ai/docs/guides/structured-outputs
Interface appears identical to OpenAI's.
```python
from langchain.chat_models import init_chat_model
from pydantic import BaseModel
class Joke(BaseModel):
    setup: str
    punchline: str
llm = init_chat_model("xai:grok-2").with_structured_output(
    Joke, method="json_schema"
)
llm.invoke("Tell me a joke about cats.")
```
# Description
2 changes:
1. Removes `getpass` from the code example, as it reads from standard
input and causes a freeze
2. Updates the example to the latest Gemini model
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:** add deprecation warning when using weaviate from
langchain_community
- **Issue:** NA
- **Dependencies:** NA
- **Twitter handle:** NA
---------
Signed-off-by: hsm207 <hsm207@users.noreply.github.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Add a `model` property to OpenAIWhisperParser. It defaults to `whisper-1`
(the previous value).
Please help me update the docs and other related components of this
repo.
**Description:**
This PR adds a Jupyter notebook that explains the features,
installation, and usage of the
[`langchain-salesforce`](https://github.com/colesmcintosh/langchain-salesforce)
package. The notebook includes:
- Setup instructions for configuring Salesforce credentials
- Example code demonstrating common operations such as querying,
describing objects, creating, updating, and deleting records
**Issue:**
N/A
**Dependencies:**
No new dependencies are required.
**Tests and Docs:**
- Added an example notebook demonstrating the usage of the
`langchain-salesforce` package, located in `docs/docs/integrations`.
**Lint and Test:**
- Ran `make format`, `make lint`, and `make test` successfully.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Thank you for contributing to LangChain!
- [X] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
- Example: "community: add foobar LLM"
- [x] **PR message**:
This PR adds top_k as a param to the Needle Retriever. By default we use
top 10.
- [X] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [X] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Thank you for contributing to LangChain!
Rename IBM product name to `IBM watsonx`
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
- **Description**: I have added a new operator to the operator map, with
key `$in` and value `IN`, so that you can define filters using lists as
values. This was already anticipated, but because the IN operator was not
in the map, such filters could not be used.
- **Issue**: Fixes #29804.
- **Dependencies**: No extra.
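To illustrate how an operator map entry like `$in` to `IN` can drive filter rendering, here is a hedged sketch. Only the `$in`/`IN` pair comes from the description above; the other map entries, the `render_condition` helper, and the `?` placeholder style are assumptions made for this example.

```python
# Only "$in" -> "IN" is taken from the PR; the other entries are illustrative.
OPERATOR_MAP = {"$eq": "=", "$ne": "<>", "$in": "IN"}


def render_condition(field_name, operator, value):
    """Render one filter condition as (SQL fragment, parameter list)."""
    sql_op = OPERATOR_MAP[operator]
    if operator == "$in":
        # A list value expands to one placeholder per element.
        placeholders = ", ".join("?" for _ in value)
        return f"{field_name} {sql_op} ({placeholders})", list(value)
    return f"{field_name} {sql_op} ?", [value]
```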
This PR adds documentation for the `langchain-discord-shikenso`
integration, including an example notebook at
`docs/docs/integrations/tools/discord.ipynb` and updates to
`libs/packages.yml` to track the new package.
**Issue:**
N/A
**Dependencies:**
None
**Twitter handle:**
N/A
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**fix: Correct getpass usage in Google Generative AI Embedding docs
(#29809)**
- **Description:** Corrected the `getpass` usage in the Google
Generative AI Embedding documentation by replacing `getpass()` with
`getpass.getpass()` to fix the `TypeError`.
- **Issue:** #29809
- **Dependencies:** None
**Additional Notes:**
The change ensures compatibility with Google Colab and follows Python's
`getpass` module usage standards.
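The difference between the two spellings can be shown with the standard library alone. The helper `read_api_key` and its injectable `prompt_fn` parameter are hypothetical, added here only so the example can be exercised without an interactive prompt.

```python
import getpass


def read_api_key(prompt_fn=getpass.getpass) -> str:
    """Prompt for a secret; after `import getpass` the function is `getpass.getpass`."""
    return prompt_fn("Enter your API key: ")


# After `import getpass`, the name `getpass` is the module, which is not callable.
# Calling `getpass()` therefore raises TypeError, while `getpass.getpass()` works.
module_is_callable = callable(getpass)
function_is_callable = callable(getpass.getpass)
```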
docs(rag.ipynb): Add `full code` snippets, which are useful for beginners
as a complete demonstration.
Preview the change:
https://langchain-git-fork-googtech-patch-3-langchain.vercel.app/docs/tutorials/rag/
Two `full code` snippets are added, as shown below:
<details>
<summary>Full Code:</summary>
```python
import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.chat_models import init_chat_model
from langchain_openai import OpenAIEmbeddings
from langchain_core.vectorstores import InMemoryVectorStore
from google.colab import userdata
from langchain_core.prompts import PromptTemplate
from langchain_core.documents import Document
from typing_extensions import List, TypedDict
from langgraph.graph import START, StateGraph
#################################################
# 1.Initialize the ChatModel and EmbeddingModel #
#################################################
llm = init_chat_model(
    model="gpt-4o-mini",
    model_provider="openai",
    openai_api_key=userdata.get('OPENAI_API_KEY'),
    base_url=userdata.get('BASE_URL'),
)
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-large",
    openai_api_key=userdata.get('OPENAI_API_KEY'),
    base_url=userdata.get('BASE_URL'),
)
#######################
# 2.Loading documents #
#######################
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        # Only keep post title, headers, and content from the full HTML.
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()
#########################
# 3.Splitting documents #
#########################
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,  # chunk size (characters)
    chunk_overlap=200,  # chunk overlap (characters)
    add_start_index=True,  # track index in original document
)
all_splits = text_splitter.split_documents(docs)
###########################################################
# 4.Embedding documents and storing them in a vectorstore #
###########################################################
vector_store = InMemoryVectorStore(embeddings)
_ = vector_store.add_documents(documents=all_splits)
##########################################################
# 5.Customizing the prompt or loading it from Prompt Hub #
##########################################################
# prompt = hub.pull("rlm/rag-prompt") # load the prompt from the prompt-hub
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say "thanks for asking!" at the end of the answer.
{context}
Question: {question}
Helpful Answer:"""
prompt = PromptTemplate.from_template(template)
##################################################################################################
# 6.Using LangGraph to tie together the retrieval and generation steps into a single application #
##################################################################################################
# 6.1.Define the state of the application, which controls the application data
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str
# 6.2.1.Define the retrieval node, one of the application steps
def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}
# 6.2.2.Define the generation node, the other application step
def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}
# 7.Define the "control flow" of the application, which signifies the ordering of the application steps
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()
```
</details>
<details>
<summary>Full Code:</summary>
```python
import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.chat_models import init_chat_model
from langchain_openai import OpenAIEmbeddings
from langchain_core.vectorstores import InMemoryVectorStore
from google.colab import userdata
from langchain_core.prompts import PromptTemplate
from langchain_core.documents import Document
from typing_extensions import List, TypedDict
from langgraph.graph import START, StateGraph
from typing import Literal
from typing_extensions import Annotated
#################################################
# 1.Initialize the ChatModel and EmbeddingModel #
#################################################
llm = init_chat_model(
    model="gpt-4o-mini",
    model_provider="openai",
    openai_api_key=userdata.get('OPENAI_API_KEY'),
    base_url=userdata.get('BASE_URL'),
)
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-large",
    openai_api_key=userdata.get('OPENAI_API_KEY'),
    base_url=userdata.get('BASE_URL'),
)
#######################
# 2.Loading documents #
#######################
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        # Only keep post title, headers, and content from the full HTML.
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()
#########################
# 3.Splitting documents #
#########################
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,  # chunk size (characters)
    chunk_overlap=200,  # chunk overlap (characters)
    add_start_index=True,  # track index in original document
)
all_splits = text_splitter.split_documents(docs)
# Search analysis: Add some metadata to the documents in our vector store,
# so that we can filter on section later.
total_documents = len(all_splits)
third = total_documents // 3
for i, document in enumerate(all_splits):
    if i < third:
        document.metadata["section"] = "beginning"
    elif i < 2 * third:
        document.metadata["section"] = "middle"
    else:
        document.metadata["section"] = "end"
# Search analysis: Define the schema for our search query
class Search(TypedDict):
    query: Annotated[str, ..., "Search query to run."]
    section: Annotated[
        Literal["beginning", "middle", "end"], ..., "Section to query."]
###########################################################
# 4.Embedding documents and storing them in a vectorstore #
###########################################################
vector_store = InMemoryVectorStore(embeddings)
_ = vector_store.add_documents(documents=all_splits)
##########################################################
# 5.Customizing the prompt or loading it from Prompt Hub #
##########################################################
# prompt = hub.pull("rlm/rag-prompt") # load the prompt from the prompt-hub
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say "thanks for asking!" at the end of the answer.
{context}
Question: {question}
Helpful Answer:"""
prompt = PromptTemplate.from_template(template)
###################################################################
# 6.Using LangGraph to tie together the analyze_query, retrieval  #
# and generation steps into a single application                  #
###################################################################
# 6.1.Define the state of the application, which controls the application data
class State(TypedDict):
    question: str
    query: Search
    context: List[Document]
    answer: str
# Search analysis: Define the node that generates
# a structured query from the user's raw input
def analyze_query(state: State):
    structured_llm = llm.with_structured_output(Search)
    query = structured_llm.invoke(state["question"])
    return {"query": query}
# 6.2.1.Define the retrieval node, which fetches documents matching the query and section
def retrieve(state: State):
    query = state["query"]
    retrieved_docs = vector_store.similarity_search(
        query["query"],
        filter=lambda doc: doc.metadata.get("section") == query["section"],
    )
    return {"context": retrieved_docs}
# 6.2.2.Define the generation node, which answers the question from the retrieved context
def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}
# 7.Define the "control flow" of the application, which sets the ordering of the application steps
graph_builder = StateGraph(State).add_sequence([analyze_query, retrieve, generate])
graph_builder.add_edge(START, "analyze_query")
graph = graph_builder.compile()
```
</details>
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- [ ] **PR title**: langchain_community: add image support to
DuckDuckGoSearchAPIWrapper
- **Description:** This PR enhances the DuckDuckGoSearchAPIWrapper
within the langchain_community package by introducing support for image
searches. The enhancement includes:
- Adding a new method _ddgs_images to handle image search queries.
- Updating the run and results methods to process and return image
search results appropriately.
- Modifying the source parameter to accept "images" as a valid option,
alongside "text" and "news".
- **Dependencies:** No additional dependencies are required for this
change.
Thank you for contributing to LangChain!
Fix `model_id` in IBM provider on EmbeddingTabs page
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Added IBM to ChatModelTabs and EmbeddingTabs
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
- **Description:** Add a short introduction to inspecting `store` in
in_memory.py. This is useful for beginners.
```python
Check Documents:
.. code-block:: python
for doc in vector_store.store.values():
print(doc)
```
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Update presented model in `WatsonxLLM` and `ChatWatsonx` documentation.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
**Description:** Fixed and updated Apify integration documentation to
use the new [langchain-apify](https://github.com/apify/langchain-apify)
package.
**Twitter handle:** @apify
- **Description:** Small fix in `add_texts` to ensure embedding
nullability is checked properly.
- **Issue:** #29765
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
This fix ensures that the chunk size is correctly determined when
processing text embeddings. Previously, the code did not properly handle
cases where chunk_size was None, potentially leading to incorrect
chunking behavior.
Now, chunk_size_ is explicitly set to either the provided chunk_size or
the default self.chunk_size, ensuring consistent chunking. This update
improves reliability when processing large text inputs in batches and
prevents unintended behavior when chunk_size is not specified.
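The fallback described above can be sketched roughly as follows. This is only an illustration under assumptions, not the patched embeddings class: `ToyEmbedder` and `_batches` are hypothetical names, and only `chunk_size`, `chunk_size_` and `self.chunk_size` come from the description.

```python
from typing import Iterator, List, Optional


class ToyEmbedder:
    """Hypothetical stand-in for an embeddings class with a default chunk size."""

    def __init__(self, chunk_size: int = 1000) -> None:
        self.chunk_size = chunk_size

    def _batches(
        self, texts: List[str], chunk_size: Optional[int] = None
    ) -> Iterator[List[str]]:
        # Use the caller's chunk_size when given, else the instance default.
        chunk_size_ = chunk_size if chunk_size is not None else self.chunk_size
        for start in range(0, len(texts), chunk_size_):
            yield texts[start : start + chunk_size_]


embedder = ToyEmbedder(chunk_size=2)
default_batches = list(embedder._batches(["a", "b", "c"]))
explicit_batches = list(embedder._batches(["a", "b", "c"], chunk_size=3))
```

Resolving the value once into a local name keeps the batching loop free of `None` checks.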
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Add the documentation for the community package `langchain-abso`. It
provides a new Chat Model class that uses https://abso.ai
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
**Description:** Updated init_chat_model to support Granite models
deployed on IBM WatsonX
**Dependencies:**
[langchain-ibm](https://github.com/langchain-ai/langchain-ibm)
Tagging @baskaryan @efriis for review when you get a chance.
community: langchain_community/vectorstore/oraclevs.py
- **Description:** Refactored code to allow either a connection or a
connection pool.
- **Issue:** Normally an idle connection is terminated by the server-side
listener at timeout, so a user has to re-instantiate the vector store.
The timeout for a single connection is not configurable. The solution is
to use a connection pool, where a user can specify a timeout and the
connections are managed by the pool.
- **Dependencies:** None
- **Add tests and docs**: This is not a new integration. A user can
pass either a connection or a connection pool. The determination of what
is passed is made at run time. Everything should work as before.
- **Lint and test**: Already done.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
1. Make `_convert_chunk_to_generation_chunk` an instance method on
BaseChatOpenAI
2. Override on ChatDeepSeek to add `"reasoning_content"` to message
additional_kwargs.
Resolves https://github.com/langchain-ai/langchain/issues/29513
Fix the syntax for SQL-based metadata filtering in the [Google BigQuery
Vector Search
docs](https://python.langchain.com/docs/integrations/vectorstores/google_bigquery_vector_search/#searching-documents-with-metadata-filters).
Also add a link to learn more about BigQuery operators that can be used
here.
I have been using this library, and have found that this is the correct
syntax to use for the SQL-based filters.
**Issue**: no open issue.
**Dependencies**: none.
**Twitter handle**: none.
No tests as this is only a change to the documentation.
**Description:**
According to the [wikidata
documentation](https://www.wikidata.org/wiki/Wikidata_talk:REST_API),
Wikibase REST API version 1 (stable) has been available since November 11, 2024.
Their guidance is to use the new v1 API, which in almost all cases only
requires replacing v0 in the routes with v1.
So I changed WIKIDATA_REST_API_URL from v0 to v1 for stable usage.
Co-authored-by: ccurme <chester.curme@gmail.com>
## **Description:**
- Added information about the retriever that Nimble's provider exposes.
- Fixed the authentication explanation on the retriever page.
**issue**
In Langchain, the original content is generally stored under the `text`
key. However, the `PineconeHybridSearchRetriever` searches the `context`
field in the metadata and cannot change this key. To address this, I
have modified the code to allow changing the key to something other than
context.
In my opinion, following Langchain's conventions, the `text` key seems
more appropriate than `context`. However, since I wasn't sure about the
author's intent, I have left the default value as `context`.
The `.dict()` method is deprecated in Pydantic V2.0; use the `model_dump`
method instead.
- **Description:** Added some comments to the example code in the Vearch
vector database documentation and included commonly used sample code.
- **Issue:** None
- **Dependencies:** None
---------
Co-authored-by: wangchuxiong <wangchuxiong@jd.com>
## **Description**
This PR updates the LangChain documentation to address an issue where
the `HuggingFaceEndpoint` example **does not specify the required `task`
argument**. Without this argument, users on `huggingface_hub == 0.28.1`
encounter the following error:
```
ValueError: Task unknown has no recommended model. Please specify a model explicitly.
```
---
## **Issue**
Fixes #29685
---
## **Changes Made**
✅ **Updated `HuggingFaceEndpoint` documentation** to explicitly define
`task="text-generation"`:
```python
llm = HuggingFaceEndpoint(
repo_id=GEN_MODEL_ID,
huggingfacehub_api_token=HF_TOKEN,
task="text-generation" # Explicitly specify task
)
```
✅ **Added a deprecation warning note** and recommended using
`InferenceClient`:
```python
from huggingface_hub import InferenceClient
from langchain.llms.huggingface_hub import HuggingFaceHub
client = InferenceClient(model=GEN_MODEL_ID, token=HF_TOKEN)
llm = HuggingFaceHub(
repo_id=GEN_MODEL_ID,
huggingfacehub_api_token=HF_TOKEN,
client=client,
)
```
---
## **Dependencies**
- No new dependencies introduced.
- Change only affects **documentation**.
---
## **Testing**
- ✅ Verified that adding `task="text-generation"` resolves the issue.
- ✅ Tested the alternative approach with `InferenceClient` in Google
Colab.
---
## **Twitter Handle (Optional)**
If this PR gets announced, a shout-out to **@AkmalJasmin** would be
great! 🚀
---
## **Reviewers**
📌 **@langchain-maintainers** Please review this PR. Let me know if
further changes are needed.
🚀 This fix improves **developer onboarding** and ensures the **LangChain
documentation remains up to date**! 🚀
- Description: Read `cls.bulk_size` via `getattr` with a default value of
500, which prevents the error below:
Error: type object 'OpenSearchVectorSearch' has no attribute 'bulk_size'
- Issue: https://github.com/langchain-ai/langchain/issues/29071
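As a rough illustration of the `getattr` fallback described above (a sketch, not the actual `OpenSearchVectorSearch` code; `ToyVectorSearch`, `Tuned` and `resolve_bulk_size` are hypothetical names), a class-level lookup with a default avoids the `AttributeError` when the attribute was never defined:

```python
from typing import Any


class ToyVectorSearch:
    """Hypothetical class that defines no bulk_size attribute at class level."""

    @classmethod
    def resolve_bulk_size(cls, **kwargs: Any) -> int:
        # Prefer an explicit keyword, then a class attribute if present, then 500.
        return kwargs.get("bulk_size", getattr(cls, "bulk_size", 500))


class Tuned(ToyVectorSearch):
    bulk_size = 100


default_size = ToyVectorSearch.resolve_bulk_size()
tuned_size = Tuned.resolve_bulk_size()
explicit_size = ToyVectorSearch.resolve_bulk_size(bulk_size=50)
```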
Description:
This PR fixes handling of null action_input in
[langchain.agents.output_parser]. Previously, passing null to
action_input could raise an OutputParserException with an unclear error
message, leaving the LLM unable to tell how to correct the action. The
changes include:
- Added null-check validation before processing action_input
- Implemented proper fallback behavior with default values
- Maintained backward compatibility with existing implementations
Error example:
```
{
"action":"some action",
"action_input":null
}
```
Issue:
None
Dependencies:
None
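The null check and fallback described in this PR might look roughly like the sketch below. It is an assumption-laden illustration rather than the parser's real code: `parse_action` is a hypothetical helper, and the empty dict used as the default is an assumed choice.

```python
import json
from typing import Any, Dict, Tuple


def parse_action(text: str) -> Tuple[str, Any]:
    """Parse an agent action blob, tolerating a null action_input."""
    payload: Dict[str, Any] = json.loads(text)
    action = payload["action"]
    # JSON null is decoded as None; substitute a default (assumed: empty dict).
    action_input = payload.get("action_input")
    if action_input is None:
        action_input = {}
    return action, action_input


action, action_input = parse_action(
    '{"action": "some action", "action_input": null}'
)
```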
This is one part of a larger Pull Request (PR) that is too large to be
submitted all at once. This specific part focuses on updating the
PyPDFium2 parser.
For more details, see
https://github.com/langchain-ai/langchain/pull/28970.
- **Description:** Add tests for respecting max_concurrency and
implement it for abatch_as_completed so that the tests pass
- **Issue:** #29425
- **Dependencies:** none
- **Twitter handle:** keenanpepper
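One common way to respect a concurrency limit while still yielding results as they complete is a semaphore around each call. The sketch below illustrates that idea with plain `asyncio`; it is not the `abatch_as_completed` implementation, and `abatch_as_completed_sketch`, `slow_double` and `main` are hypothetical names.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List, Optional, Tuple


async def abatch_as_completed_sketch(
    func: Callable[[int], Awaitable[int]],
    inputs: List[int],
    max_concurrency: Optional[int] = None,
) -> AsyncIterator[Tuple[int, int]]:
    """Yield (index, result) pairs as calls finish, with an optional cap."""
    semaphore = asyncio.Semaphore(max_concurrency) if max_concurrency else None

    async def run_one(index: int, value: int) -> Tuple[int, int]:
        if semaphore is None:
            return index, await func(value)
        async with semaphore:  # at most max_concurrency calls run at once
            return index, await func(value)

    tasks = [asyncio.ensure_future(run_one(i, v)) for i, v in enumerate(inputs)]
    for finished in asyncio.as_completed(tasks):
        yield await finished


async def main() -> Tuple[int, List[Tuple[int, int]]]:
    running = 0
    peak = 0

    async def slow_double(x: int) -> int:
        nonlocal running, peak
        running += 1
        peak = max(peak, running)  # record the highest observed concurrency
        await asyncio.sleep(0.01)
        running -= 1
        return x * 2

    gen = abatch_as_completed_sketch(slow_double, [1, 2, 3, 4, 5], max_concurrency=2)
    results = [item async for item in gen]
    return peak, sorted(results)


peak, results = asyncio.run(main())
```

Tracking the peak number of in-flight calls, as `main` does here, is also a simple way to test that the limit is honored.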
Description:
The change allows you to use the overloaded `+` operator correctly when
`+`ing two BaseMessageChunk subclasses. Without this you *must*
instantiate a subclass for it to work.
Which feels... wrong. Base classes should be decoupled from subclasses
and should in no way depend on them.
Issue:
You can't `+` a BaseMessageChunk with a BaseMessageChunk
e.g. this will explode
```py
from langchain_core.outputs import (
ChatGenerationChunk,
)
from langchain_core.messages import BaseMessageChunk
chunk1 = ChatGenerationChunk(
message=BaseMessageChunk(
type="customChunk",
content="HI",
),
)
chunk2 = ChatGenerationChunk(
message=BaseMessageChunk(
type="customChunk",
content="HI",
),
)
# this will throw
new_chunk = chunk1 + chunk2
```
If you run into this issue yourself, it's probably best to use
AIMessageChunk instead:
```py
from langchain_core.outputs import (
ChatGenerationChunk,
)
from langchain_core.messages import AIMessageChunk
chunk1 = ChatGenerationChunk(
message=AIMessageChunk(
content="HI",
),
)
chunk2 = ChatGenerationChunk(
message=AIMessageChunk(
content="HI",
),
)
# No explosion!
new_chunk = chunk1 + chunk2
```
Dependencies:
None!
Twitter handle:
`aaron_vogler`
Keeping these for later if need be:
```
baskaryan
efriis
eyurtsev
ccurme
vbarda
hwchase17
baskaryan
efriis
```
Co-authored-by: Erick Friis <erick@langchain.dev>
- This pull request includes various changes to add a `user_agent`
parameter to Azure OpenAI, Azure Search and Whisper in the Community and
Partner packages. This helps in identifying the source of API requests
so we can better track usage and help support the community better. I
will also be adding the user_agent to the new `langchain-azure` repo as
well.
- No issue connected or updated dependencies.
- Utilises existing tests and docs
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
ONNX and OpenVINO models are available by specifying the `backend`
argument (the model is loaded using `optimum`
https://github.com/huggingface/optimum)
```python
from langchain_huggingface import HuggingFaceEmbeddings
embedding = HuggingFaceEmbeddings(
model_name=model_id,
model_kwargs={"backend": "onnx"},
)
```
With this PR we also enable the IPEX backend
```python
from langchain_huggingface import HuggingFaceEmbeddings
embedding = HuggingFaceEmbeddings(
model_name=model_id,
model_kwargs={"backend": "ipex"},
)
```
**Description**
Currently, when parsing a partial JSON, if a string ends with the escape
character, the whole key/value is removed. For example:
```
>>> from langchain_core.utils.json import parse_partial_json
>>> my_str = '{"foo": "bar", "baz": "qux\\'
>>>
>>> parse_partial_json(my_str)
{'foo': 'bar'}
```
My expectation (and with this fix) would be for `parse_partial_json()`
to return:
```
>>> from langchain_core.utils.json import parse_partial_json
>>>
>>> my_str = '{"foo": "bar", "baz": "qux\\'
>>> parse_partial_json(my_str)
{'foo': 'bar', 'baz': 'qux'}
```
Notes:
1. It could be argued that the current behavior is still desired.
2. I have experienced this issue when streaming output from an LLM
and a chunk happens to end with `\\`
3. I haven't included tests. Will do if the change is accepted.
4. This is especially troublesome when this function is used by
187131c55c/libs/core/langchain_core/output_parsers/transform.py (L111)
since, for example, if the received sequence of
chunks is: `{"foo": "b` , `ar\\` :
Then, the result of calling `self.parse_result()` the first time is:
```
{"foo": "b"}
```
and the second time:
```
{}
```
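To make the expected behavior concrete, here is a minimal sketch of one way to handle a dangling escape character. It is not the `parse_partial_json` implementation; `drop_dangling_escape` is a hypothetical helper, and closing the fragment with `"}` by hand is a simplification that only works for this particular input.

```python
import json


def drop_dangling_escape(fragment: str) -> str:
    """Remove a trailing lone backslash, i.e. an unfinished escape sequence."""
    stripped = fragment.rstrip("\\")
    trailing = len(fragment) - len(stripped)
    # An odd number of trailing backslashes means the last one starts an
    # escape sequence that never finished, so it is dropped.
    if trailing % 2 == 1:
        return fragment[:-1]
    return fragment


partial = '{"foo": "bar", "baz": "qux\\'
# Close the open string and object by hand for this specific fragment.
repaired = drop_dangling_escape(partial) + '"}'
result = json.loads(repaired)
```

With the lone backslash dropped, the `baz` key survives instead of being discarded along with its value.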
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:** Before sending a completion chunk at the end of an
OpenAI stream, remove the tool_calls, as those have already been sent
as chunks.
- **Issue:** -
- **Dependencies:** -
- **Twitter handle:** -
@ccurme as mentioned in another PR
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:** The llamacpp.ipynb notebook used a deprecated
environment variable, LLAMA_CUBLAS, for llama.cpp installation with GPU
support. This commit updates the notebook to use the correct GGML_CUDA
variable, fixing the installation error.
- **Issue:** none
- **Dependencies:** none
Added `similarity_search_with_score_by_vector()` function to the
`QdrantVectorStore` class.
It is required when we want to query multiple times with the same
embeddings. It was present in the now deprecated original `Qdrant`
vectorstore implementation, but was absent from the new one. It is also
implemented in a number of other `VectorStore` implementations.
I have added tests for this new function.
Note that I also argued in this discussion that it should be part of the
general `VectorStore`:
https://github.com/langchain-ai/langchain/discussions/29638
Co-authored-by: Erick Friis <erick@langchain.dev>
I changed how GPU support is implemented in
`FastEmbedEmbeddings` to be more consistent with the existing
`langchain-qdrant` sparse embeddings implementation.
It now lets the user directly provide the list of ONNX execution providers:
https://github.com/langchain-ai/langchain/blob/master/libs/partners/qdrant/langchain_qdrant/fastembed_sparse.py#L15
It is a bit less clear to a user who just wants to enable GPU, but
gives more flexibility to work with execution providers other than
the `CUDAExecutionProvider`, and is more future-proof.
Sorry for the disturbance @ccurme
> Nice to see you just moved to `uv`! It is so much nicer to run
format/lint/test! No need to manually rerun the `poetry install` with
all required extras now
These are set in GitHub workflows, but we forgot to add them to most
makefiles for convenience when developing locally.
`uv run` will automatically sync the lock file. Because many of our
development dependencies are local installs, it will pick up version
changes and update the lock file. Passing `--frozen` or setting this
environment variable disables the behavior.
Motivation: dedicated structured output features are becoming more
common, such that integrations can support structured output without
supporting tool calling.
Here we make two changes:
1. Update the `has_structured_output` method to default to True if a
model supports tool calling (in addition to defaulting to True if
`with_structured_output` is overridden).
2. Update structured output tests to engage if `has_structured_output`
is True.
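The defaulting rule in change 1 can be sketched with toy classes. Everything below is hypothetical scaffolding (`ToyBaseChatModel`, `ToolCallingModel`, `StructuredOnlyModel`, and the two helper functions); it only illustrates the "overridden method or tool calling" logic and is not the standard-tests code itself.

```python
class ToyBaseChatModel:
    """Hypothetical base class whose methods are unimplemented by default."""

    def bind_tools(self, tools):
        raise NotImplementedError

    def with_structured_output(self, schema):
        raise NotImplementedError


class ToolCallingModel(ToyBaseChatModel):
    def bind_tools(self, tools):
        return self


class StructuredOnlyModel(ToyBaseChatModel):
    def with_structured_output(self, schema):
        return self


def has_tool_calling(model_class: type) -> bool:
    # A model supports tool calling if it overrides bind_tools.
    return model_class.bind_tools is not ToyBaseChatModel.bind_tools


def has_structured_output(model_class: type) -> bool:
    # True if with_structured_output is overridden OR tool calling is supported.
    overridden = (
        model_class.with_structured_output
        is not ToyBaseChatModel.with_structured_output
    )
    return overridden or has_tool_calling(model_class)


flags = {
    cls.__name__: has_structured_output(cls)
    for cls in (ToyBaseChatModel, ToolCallingModel, StructuredOnlyModel)
}
```

Here `StructuredOnlyModel` represents the motivating case: an integration with a dedicated structured output feature and no tool calling.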
Deep Lake recently released version 4, which introduces significant
architectural changes, including a new on-disk storage format, enhanced
indexing mechanisms, and improved concurrency. However, LangChain's
vector store integration currently does not support Deep Lake v4 due to
breaking API changes.
Previously, the installation command was:
`pip install deeplake[enterprise]`
This installs the latest available version, which now defaults to Deep
Lake v4. Since LangChain's vector store integration is still dependent
on v3, this can lead to compatibility issues when using Deep Lake as a
vector database within LangChain.
To ensure compatibility, the installation command has been updated to:
`pip install "deeplake[enterprise]<4.0.0"`
This constraint ensures that pip installs the latest available version
of Deep Lake within the v3 series while avoiding the incompatible v4
update.
- **Description:** add a `gpu: bool = False` field to the
`FastEmbedEmbeddings` class, which enables GPU use (through the ONNX CUDA
provider) when generating embeddings with any fastembed model. It just
requires the user to install a different dependency, and we use a
different provider when instantiating `fastembed.TextEmbedding`
- **Issue:** when generating embeddings for a really large number of
documents this drastically increases performance (honestly that is a must
have in some situations, you can't just use CPU, it is way too slow)
- **Dependencies:** no direct change to dependencies, but
users will need to install `fastembed-gpu` instead of `fastembed`. I
changed the init function to let the user know
which dependency they should install depending on whether they enabled
`gpu`
cf. fastembed docs about GPU for more details:
https://qdrant.github.io/fastembed/examples/FastEmbed_GPU/
I did not add a test because it would require access to a GPU in the
testing environment
### PR Title:
**community: add latest OpenAI models pricing**
### Description:
This PR updates the OpenAI model cost calculation mapping by adding the
latest OpenAI models, **o1 (non-preview)** and **o3-mini**, based on the
pricing listed on the [OpenAI pricing
page](https://platform.openai.com/docs/pricing).
### Changes:
- Added pricing for `o1`, `o1-2024-12-17`, `o1-cached`, and
`o1-2024-12-17-cached` for input tokens.
- Added pricing for `o1-completion` and `o1-2024-12-17-completion` for
output tokens.
- Added pricing for `o3-mini`, `o3-mini-2025-01-31`, `o3-mini-cached`,
and `o3-mini-2025-01-31-cached` for input tokens.
- Added pricing for `o3-mini-completion` and
`o3-mini-2025-01-31-completion` for output tokens.
### Issue:
N/A
### Dependencies:
None
### Testing & Validation:
- No functional changes outside of updating the cost mapping.
- No tests were added or modified.
**Description:**
The response from `tool.invoke()` is always a ToolMessage, with content
and artifact fields, not a tuple.
The tuple is converted to a ToolMessage here
b6ae7ca91d/libs/core/langchain_core/tools/base.py (L726)
**Issue:**
Currently `ToolsIntegrationTests` requires `invoke()` to return a tuple
and so standard tests fail for "content_and_artifact" tools. This fixes
that to check the returned ToolMessage.
This PR also adds a test that now passes.
Description: Fixes PreFilter value handling in Azure Cosmos DB NoSQL
vectorstore. The current implementation fails to handle numeric values
in filter conditions, causing an undefined value variable error. This PR
adds support for numeric, boolean, and NULL values while maintaining the
existing string and list handling.
Changes:
- Added handling for numeric types (int/float)
- Added boolean value support
- Added NULL value handling
- Added type validation for unsupported values
- Fixed scope of value variable initialization
Issue:
Fixes #29610
Implementation Notes:
- No changes to public API
- Backwards compatible
- Maintains consistent behavior with existing MongoDB-style filtering
- Preserves SQL injection prevention through proper value handling
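The type handling listed above can be sketched as a small dispatch function. This is an illustration only, not the vectorstore's code: `format_filter_value` is a hypothetical helper, and quoting strings with `json.dumps` is an assumption made for the example (real code would normally prefer query parameters).

```python
import json
from typing import Any


def format_filter_value(value: Any) -> str:
    """Render a Python value as a literal for a SQL-style filter clause."""
    if value is None:
        return "NULL"
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        # json.dumps quotes and escapes the string (illustrative choice).
        return json.dumps(value)
    if isinstance(value, list):
        return "[" + ", ".join(format_filter_value(v) for v in value) + "]"
    raise ValueError(f"Unsupported filter value type: {type(value).__name__}")


literals = [format_filter_value(v) for v in (None, True, 3, 2.5, "a", [1, "b"])]
```

Handling every branch with an explicit return, and raising for anything else, avoids the kind of uninitialized-variable error the PR describes.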
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
This is one part of a larger Pull Request (PR) that is too large to be
submitted all at once. This specific part focuses on updating the XXX
parser.
For more details, see [PR
28970](https://github.com/langchain-ai/langchain/pull/28970).
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
## Description
- Removed broken link for the API Reference
- Added `OPENAI_API_KEY` setter for the chains to properly run
- renamed one of our examples so it won't override the original
retriever and cause confusion due to it using a different mode of
retrieving
- Moved one of our simple examples to be the first example of our
retriever :)
Failing with:
> ValueError: Provider page not found for databricks-langchain. Please
add one at docs/integrations/providers/databricks-langchain.{mdx,ipynb}
**PR title**: "community: Option to pass auth_file_location for
oci_generative_ai"
**Description:** Adds an optional `auth_file_location` parameter that
overrides the default config file location (`~/.oci/config`), where the
profile configurations are stored. This does not fix any issue; it only
exposes an option that any OCI client, including
`GenerativeAiInferenceClient`, already supports internally.
- **Description:** Add a check for `pad_token_id` and `eos_token_id` in
the model config. This appears to be the same bug as the HuggingFace TGI
bug (#29434).
- **Issue:** #29431
- **Dependencies:** none
- **Twitter handle:** tell14
Example code follows:
```python
from langchain_core.prompts import PromptTemplate
from langchain_huggingface.llms import HuggingFacePipeline

# Load the model as a local text-generation pipeline.
hf = HuggingFacePipeline.from_model_id(
    model_id="meta-llama/Llama-3.2-3B-Instruct",
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 10},
)

template = """Question: {question}
Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)

# Pipe the prompt into the model and run one question through the chain.
chain = prompt | hf
question = "What is electroencephalography?"
print(chain.invoke({"question": question}))
```
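One plausible reading of the check described above is a fallback chain for the pad token id. The sketch below is an assumption about the shape of such a check, not the code merged in this PR; `resolve_pad_token_id` is a hypothetical helper and the config is a stand-in object with no `transformers` dependency.

```python
from types import SimpleNamespace
from typing import Any, Optional


def resolve_pad_token_id(tokenizer_pad_id: Optional[int], model_config: Any) -> Optional[int]:
    """Pick a pad token id, falling back to the model config (hypothetical helper)."""
    if tokenizer_pad_id is not None:
        return tokenizer_pad_id
    config_pad = getattr(model_config, "pad_token_id", None)
    if config_pad is not None:
        return config_pad
    eos = getattr(model_config, "eos_token_id", None)
    # Assumption: a config may store several EOS ids as a list; use the first one.
    if isinstance(eos, (list, tuple)):
        return eos[0] if eos else None
    return eos


# Stand-in for a model config that has no pad token but several EOS ids.
config = SimpleNamespace(pad_token_id=None, eos_token_id=[7, 8])
print(resolve_pad_token_id(None, config))
```

The comparisons use `is not None` so that a legitimate id of `0` is not treated as missing.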
## Description:
This PR addresses issue #29429 by fixing the `_wrap_query` method in
`langchain_community/graphs/age_graph.py`. The method now correctly
handles Cypher queries with `UNION` and `EXCEPT` operators, ensuring that
the fields in the SQL query are ordered as they appear in the Cypher
query. Additionally, the method now properly handles cases where
`RETURN *` is not supported.
### Issue: #29429
### Dependencies: None
### Add tests and docs:
Added unit tests in tests/unit_tests/graphs/test_age_graph.py to
validate the changes.
No new integrations were added, so no example notebook is necessary.
Lint and test:
Ran make format, make lint, and make test to ensure code quality and
functionality.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- [x] **PR title**:
- [x] **PR message**:
- A change in the Milvus API has caused an issue with the local vector
store initialization. Having used an Ollama embedding model, the vector
store initialization results in the following error:
<img width="978" alt="image"
src="https://github.com/user-attachments/assets/d57e495c-1764-4fbe-ab8c-21ee44f1e686"
/>
- This is fixed by setting the index type explicitly:
`vector_store = Milvus(embedding_function=embeddings,
connection_args={"uri": URI}, index_params={"index_type": "FLAT",
"metric_type": "L2"},)`
Other small documentation edits were also made.
- [x] **Add tests and docs**:
N/A
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
This PR uses the [blockbuster](https://github.com/cbornet/blockbuster)
library in langchain-core to detect blocking calls made in the asyncio
event loop during unit tests.
Avoiding blocking calls is hard as these can be deeply buried in the
code or made in 3rd party libraries.
Blockbuster makes it easier to detect them by raising an exception when
a call is made to a known blocking function (eg: `time.sleep`).
Adding blockbuster uncovered a blocking call in
`aconfig_with_context` (it ends up calling `get_function_nonlocals`,
which loads function code).
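The detection idea can be illustrated with the standard library alone. The sketch below is not blockbuster's implementation or API: it only patches `time.sleep`, and `detect_blocking_sleep`, `BlockingCallError`, and `run_and_report` are hypothetical names chosen for this example.

```python
import asyncio
import time
from contextlib import contextmanager


class BlockingCallError(RuntimeError):
    """Raised when a known blocking function runs inside an event loop."""


@contextmanager
def detect_blocking_sleep():
    """Temporarily replace time.sleep with a guard (hypothetical helper)."""
    original_sleep = time.sleep

    def guarded_sleep(seconds):
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            # No event loop is running in this thread, so blocking is fine.
            return original_sleep(seconds)
        raise BlockingCallError("time.sleep called inside the asyncio event loop")

    time.sleep = guarded_sleep
    try:
        yield
    finally:
        # Always restore the real function, even if the body raised.
        time.sleep = original_sleep


async def bad_coroutine():
    time.sleep(0.01)  # blocks the event loop


async def good_coroutine():
    await asyncio.sleep(0.01)  # yields control to the event loop


def run_and_report(coro_fn) -> str:
    """Run a coroutine function under the guard and report what happened."""
    with detect_blocking_sleep():
        try:
            asyncio.run(coro_fn())
        except BlockingCallError:
            return "blocking"
    return "ok"


print(run_and_report(bad_coroutine))
print(run_and_report(good_coroutine))
```

Raising instead of logging is what makes this useful in unit tests: the test fails at the exact call site of the blocking function.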
**Dependencies:**
- blockbuster (test)
**Twitter handle:** cbornet_
This is one part of a larger Pull Request (PR) that is too large to be
submitted all at once.
This specific part focuses on updating the PyPDF parser.
For more details, see [PR
28970](https://github.com/langchain-ai/langchain/pull/28970).
**Description:**
Updates the YahooFinanceNewsTool to handle the current yfinance news
data structure. The tool was failing with a KeyError due to changes in
the yfinance API's response format. This PR updates the code to
correctly extract news URLs from the new structure.
**Issue:** #29495
**Dependencies:**
No new dependencies required. Works with existing yfinance package.
The changes maintain backwards compatibility while fixing the KeyError
that users were experiencing.
The modified code properly handles the new data structure where:
- News type is now at `content.contentType`
- News URL is now at `content.canonicalUrl.url`
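The two paths listed above can be read defensively as follows. This is a sketch rather than the tool's actual code: `extract_story_urls` is a hypothetical helper, the `"STORY"` content type is an assumption, and the sample items are made up to mirror the described structure.

```python
from typing import Any


def extract_story_urls(news: list[dict[str, Any]]) -> list[str]:
    """Collect URLs of story items from the nested news structure (hypothetical helper)."""
    urls = []
    for item in news:
        # Tolerate items where "content" is missing or None.
        content = item.get("content") or {}
        if content.get("contentType") != "STORY":  # assumed type value
            continue
        url = (content.get("canonicalUrl") or {}).get("url")
        if url:
            urls.append(url)
    return urls


# Made-up sample data shaped like the structure described above.
sample_news = [
    {"content": {"contentType": "STORY", "canonicalUrl": {"url": "https://example.com/a"}}},
    {"content": {"contentType": "VIDEO", "canonicalUrl": {"url": "https://example.com/b"}}},
    {"content": {"contentType": "STORY"}},
    {},
]
print(extract_story_urls(sample_news))
```

Using `.get` at every level avoids the `KeyError` that a direct index into a changed response format would raise.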
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
@@ -5,26 +5,31 @@ This project includes a [dev container](https://containers.dev/), which lets you
You can use the dev container configuration in this folder to build and run the app without needing to install any of its tools locally! You can use it in [GitHub Codespaces](https://github.com/features/codespaces) or the [VS Code Dev Containers extension](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers).
## GitHub Codespaces
[](https://codespaces.new/langchain-ai/langchain)
You may use the button above, or follow these steps to open this repo in a Codespace:
1. Click the **Code** drop-down menu at the top of https://github.com/langchain-ai/langchain.
1. Click the **Code** drop-down menu at the top of <https://github.com/langchain-ai/langchain>.
1. Click on the **Codespaces** tab.
1. Click **Create codespace on master**.
For more info, check out the [GitHub documentation](https://docs.github.com/en/free-pro-team@latest/github/developing-online-with-codespaces/creating-a-codespace#creating-a-codespace).
## VS Code Dev Containers
[](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/langchain-ai/langchain)
Note: If you click the link above you will open the main repo (langchain-ai/langchain) and not your local cloned repo. This is fine if you only want to run and test the library, but if you want to contribute you can use the link below and replace with your username and cloned repo name:
> If you click the link above you will open the main repo (`langchain-ai/langchain`) and *not* your local cloned repo. This is fine if you only want to run and test the library, but if you want to contribute you can use the link below and replace with your username and cloned repo name:
Then you will have a local cloned repo where you can contribute and then create pull requests.
If you already have VS Code and Docker installed, you can use the button above to get started. This will cause VSCode to automatically install the Dev Containers extension if needed, clone the source code into a container volume, and spin up a dev container for use.
If you already have VS Code and Docker installed, you can use the button above to get started. This will use VSCode to automatically install the Dev Containers extension if needed, clone the source code into a container volume, and spin up a dev container for use.
Alternatively you can also follow these steps to open this repo in a container using the VS Code Dev Containers extension:
@@ -40,5 +45,5 @@ You can learn more in the [Dev Containers documentation](https://code.visualstud
## Tips and tricks
* If you are working with the same repository folder in a container and Windows, you'll want consistent line endings (otherwise you may see hundreds of changes in the SCM view). The `.gitattributes` file in the root of this repo will disable line ending conversion and should prevent this. See [tips and tricks](https://code.visualstudio.com/docs/devcontainers/tips-and-tricks#_resolving-git-line-ending-issues-in-containers-resulting-in-many-modified-files) for more info.
* If you'd like to review the contents of the image used in this dev container, you can check it out in the [devcontainers/images](https://github.com/devcontainers/images/tree/main/src/python) repo.
- If you are working with the same repository folder in a container and Windows, you'll want consistent line endings (otherwise you may see hundreds of changes in the SCM view). The `.gitattributes` file in the root of this repo will disable line ending conversion and should prevent this. See [tips and tricks](https://code.visualstudio.com/docs/devcontainers/tips-and-tricks#_resolving-git-line-ending-issues-in-containers-resulting-in-many-modified-files) for more info.
- If you'd like to review the contents of the image used in this dev container, you can check it out in the [devcontainers/images](https://github.com/devcontainers/images/tree/main/src/python) repo.
Hi there! Thank you for even being interested in contributing to LangChain.
As an open-source project in a rapidly developing field, we are extremely open to contributions, whether they involve new features, improved infrastructure, better documentation, or bug fixes.
To learn how to contribute to LangChain, please follow the [contribution guide here](https://python.langchain.com/docs/contributing/).
To learn how to contribute to LangChain, please follow the [contribution guide here](https://docs.langchain.com/oss/python/contributing).
      description: Please confirm and check all the following options.
      options:
        - label: I searched existing ideas and did not find a similar one
          required: true
        - label: I added a very descriptive title
          required: true
        - label: I've clearly described the feature request and motivation for it
          required: true
  - type: textarea
    id: feature-request
    validations:
      required: true
    attributes:
      label: Feature request
      description: |
        A clear and concise description of the feature proposal. Please provide links to any relevant GitHub repos, papers, or other resources if relevant.
  - type: textarea
    id: motivation
    validations:
      required: true
    attributes:
      label: Motivation
      description: |
        Please outline the motivation for the proposal. Is your feature request related to a problem? e.g., I'm always frustrated when [...]. If this is related to another GitHub issue, please link here too.
  - type: textarea
    id: proposal
    validations:
      required: false
    attributes:
      label: Proposal (If applicable)
      description: |
        If you would like to propose a solution, please describe it here.
Please follow these instructions, fill every question, and do every step. 🙏
We're asking for this because answering questions and solving problems in GitHub takes a lot of time --
this is time that we cannot spend on adding new features, fixing bugs, writing documentation or reviewing pull requests.
By asking questions in a structured way (following this) it will be much easier for us to help you.
There's a high chance that by following this process, you'll find the solution on your own, eliminating the need to submit a question and wait for an answer. 😎
As there are many questions submitted every day, we will **DISCARD** and close the incomplete ones.
That will allow us (and others) to focus on helping people like you that follow the whole process. 🤓
Relevant links to check before opening a question to see if your question has already been answered, fixed or
if there's another way to solve your problem:
[LangChain documentation with the integrated search](https://python.langchain.com/docs/get_started/introduction),
      description: Please confirm and check all the following options.
      options:
        - label: I added a very descriptive title to this question.
          required: true
        - label: I searched the LangChain documentation with the integrated search.
          required: true
        - label: I used the GitHub search to find a similar question and didn't find it.
          required: true
  - type: checkboxes
    id: help
    attributes:
      label: Commit to Help
      description: |
        After submitting this, I commit to one of:
        * Read open questions until I find 2 where I can help someone and add a comment to help there.
        * I already hit the "watch" button in this repository to receive notifications and I commit to help at least 2 people that ask questions in the future.
        * Once my question is answered, I will mark the answer as "accepted".
      options:
        - label: I commit to help with one of those options 👆
          required: true
  - type: textarea
    id: example
    attributes:
      label: Example Code
      description: |
        Please add a self-contained, [minimal, reproducible, example](https://stackoverflow.com/help/minimal-reproducible-example) with your use case.
        If a maintainer can copy it, run it, and see it right away, there's a much higher chance that you'll be able to get help.
        **Important!**
        * Use code tags (e.g., ```python ... ```) to correctly [format your code](https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting).
        * INCLUDE the language label (e.g. `python`) after the first three backticks to enable syntax highlighting. (e.g., ```python rather than ```).
        * Reduce your code to the minimum required to reproduce the issue if possible. This makes it much easier for others to help you.
        * Avoid screenshots when possible, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.
      placeholder: |
        from langchain_core.runnables import RunnableLambda
        def bad_code(inputs) -> int:
            raise NotImplementedError('For demo purpose')
        chain = RunnableLambda(bad_code)
        chain.invoke('Hello!')
      render: python
    validations:
      required: true
  - type: textarea
    id: description
    attributes:
      label: Description
      description: |
        What is the problem, question, or error?
        Write a short description explaining what you are doing, what you expect to happen, and what is currently happening.
      placeholder: |
        * I'm trying to use the `langchain` library to do X.
        * I expect to see Y.
        * Instead, it does Z.
    validations:
      required: true
  - type: textarea
    id: system-info
    attributes:
      label: System Info
      description: |
        Please share your system info with us.
        "pip freeze | grep langchain"
        platform (windows / linux / mac)
        python version
        OR if you're on a recent version of langchain-core you can paste the output of:
        python -m langchain_core.sys_info
      placeholder: |
        "pip freeze | grep langchain"
        platform
        python version
        Alternatively, if you're on a recent version of langchain-core you can paste the output of:
        python -m langchain_core.sys_info
        These will only surface LangChain packages, don't forget to include any other relevant
        packages you're using (if you're not sure what's relevant, you can paste the entire output of `pip freeze`).
description:Report a bug in LangChain. To report a security issue, please instead use the security option below. For questions, please use the GitHub Discussions.
labels:["02 Bug Report"]
description:Report a bug in LangChain. To report a security issue, please instead use the security option below. For questions, please use the LangChain forum.
labels:["bug"]
type:bug
body:
- type:markdown
attributes:
value:>
Thank you for taking the time to file a bug report.
Use this to report bugs in LangChain.
If you're not certain that your issue is due to a bug in LangChain, please use [GitHub Discussions](https://github.com/langchain-ai/langchain/discussions)
to ask for help with your issue.
value:|
Thank you for taking the time to file a bug report.
Use this to report BUGS in LangChain. For usage questions, feature requests and general design questions, please use the [LangChain Forum](https://forum.langchain.com/).
Relevant links to check before filing a bug report to see if your issue has already been reported, fixed or
if there's another way to solve your problem:
[LangChain documentation with the integrated search](https://python.langchain.com/docs/get_started/introduction),
description:Please confirm and check all the following options.
options:
- label:I added a very descriptive title to this issue.
- label:This is a bug, not a usage question.
required:true
- label:I searched the LangChain documentation with the integrated search.
- label:I added a clear and descriptive title that summarizes this issue.
required:true
- label:I used the GitHub search to find a similar question and didn't find it.
required:true
@@ -37,6 +34,12 @@ body:
required:true
- label:The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
required:true
- label:This is not related to the langchain-community package.
required:true
- label:I read what a minimal reproducible example is (https://stackoverflow.com/help/minimal-reproducible-example).
required:true
- label:I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.
required:true
- type:textarea
id:reproduction
validations:
@@ -45,25 +48,25 @@ body:
label:Example Code
description:|
Please add a self-contained, [minimal, reproducible, example](https://stackoverflow.com/help/minimal-reproducible-example) with your use case.
If a maintainer can copy it, run it, and see it right away, there's a much higher chance that you'll be able to get help.
**Important!**
**Important!**
* Avoid screenshots when possible, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.
* Reduce your code to the minimum required to reproduce the issue if possible. This makes it much easier for others to help you.
* Use code tags (e.g., ```python ... ```) to correctly [format your code](https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting).
* INCLUDE the language label (e.g. `python`) after the first three backticks to enable syntax highlighting. (e.g., ```python rather than ```).
* Reduce your code to the minimum required to reproduce the issue if possible. This makes it much easier for others to help you.
* Avoid screenshots when possible, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.
placeholder:|
The following code:
The following code:
```python
from langchain_core.runnables import RunnableLambda
def bad_code(inputs) -> int:
raise NotImplementedError('For demo purpose')
chain = RunnableLambda(bad_code)
chain.invoke('Hello!')
```
@@ -99,16 +102,18 @@ body:
Please share your system info with us. Do NOT skip this step and please don't trim
the output. Most users don't include enough information here and it makes it harder
for us to help you.
Run the following command in your terminal and paste the output here:
python -m langchain_core.sys_info
`python -m langchain_core.sys_info`
or if you have an existing python interpreter running:
```python
from langchain_core import sys_info
sys_info.print_sys_info()
```
alternatively, put the entire output of `pip freeze` here.
description: Request a new feature or enhancement for LangChain. For questions, please use the LangChain forum.
labels: ["feature request"]
type: feature
body:
  - type: markdown
    attributes:
      value: |
        Thank you for taking the time to request a new feature.
        Use this to request NEW FEATURES or ENHANCEMENTS in LangChain. For bug reports, please use the bug report template. For usage questions and general design questions, please use the [LangChain Forum](https://forum.langchain.com/).
        Relevant links to check before filing a feature request to see if your request has already been made or
If you are not a LangChain maintainer or were not asked directly by a maintainer to create an issue, then please start the conversation in a [Question in GitHub Discussions](https://github.com/langchain-ai/langchain/discussions/categories/q-a) instead.
You are a LangChain maintainer if you maintain any of the packages inside of the LangChain repository
or are a regular contributor to LangChain with previous merged pull requests.
If you are not a LangChain maintainer, employee, or were not asked directly by a maintainer to create an issue, then please start the conversation on the [LangChain Forum](https://forum.langchain.com/) instead.
description: Create a task for project management and tracking by LangChain maintainers. If you are not a maintainer, please use other templates or the forum.
labels: ["task"]
type: task
body:
  - type: markdown
    attributes:
      value: |
        Thanks for creating a task to help organize LangChain development.
        This template is for **maintainer tasks** such as project management, development planning, refactoring, documentation updates, and other organizational work.
        If you are not a LangChain maintainer or were not asked directly by a maintainer to create a task, then please start the conversation on the [LangChain Forum](https://forum.langchain.com/) instead or use the appropriate bug report or feature request templates on the previous page.
  - type: checkboxes
    id: maintainer
    attributes:
      label: Maintainer task
      description: Confirm that you are allowed to create a task here.
      options:
        - label: I am a LangChain maintainer, or was asked directly by a LangChain maintainer to create a task here.
          required: true
  - type: textarea
    id: task-description
    attributes:
      label: Task Description
      description: |
        Provide a clear and detailed description of the task.
        What needs to be done? Be specific about the scope and requirements.
      placeholder: |
        This task involves...
        The goal is to...
        Specific requirements:
        - ...
        - ...
    validations:
      required: true
  - type: textarea
    id: acceptance-criteria
    attributes:
      label: Acceptance Criteria
      description: |
        Define the criteria that must be met for this task to be considered complete.
        What are the specific deliverables or outcomes expected?
      placeholder: |
        This task will be complete when:
        - [ ] ...
        - [ ] ...
        - [ ] ...
    validations:
      required: true
  - type: textarea
    id: context
    attributes:
      label: Context and Background
      description: |
        Provide any relevant context, background information, or links to related issues/PRs.
        Why is this task needed? What problem does it solve?
      placeholder: |
        Background:
        - ...
        Related issues/PRs:
        - #...
        Additional context:
        - ...
    validations:
      required: false
  - type: textarea
    id: dependencies
    attributes:
      label: Dependencies
      description: |
        List any dependencies or blockers for this task.
        Are there other tasks, issues, or external factors that need to be completed first?
- Where "package" is whichever of langchain, community, core, etc. is being modified. Use "docs: ..." for purely docs changes, "infra: ..." for CI changes.
- Example: "community: add foobar LLM"
Thank you for contributing to LangChain! Follow these steps to mark your pull request as ready for review. **If any of these steps are not completed, your PR will not be considered for review.**
- [ ] **PR title**: Follows the format: {TYPE}({SCOPE}): {DESCRIPTION}
- *Note:* the `{DESCRIPTION}` must not start with an uppercase letter.
- Once you've written the title, please delete this checklist item; do not include it in the PR.
- [ ] **PR message**: ***Delete this entire checklist*** and replace with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a mention, we'll gladly shout you out!
- **Description:** a description of the change. Include a [closing keyword](https://docs.github.com/en/issues/tracking-your-work-with-issues/using-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword) if applicable to a relevant issue.
- **Issue:** the issue # it fixes, if applicable (e.g. Fixes #123)
- **Dependencies:** any dependencies required for this change
- [ ] **Add tests and docs**: If you're adding a new integration, you must include:
1. A test for the integration, preferably unit tests that do not rely on network access,
2. An example notebook showing its use. It lives in `docs/docs/integrations` directory.
- [ ] **Add tests and docs**: If you're adding a new integration, please include
1. a test for the integration, preferably unit tests that do not rely on network access,
2. an example notebook showing its use. It lives in `docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. **We will not consider a PR unless these three are passing in CI.** See [contribution guidelines](https://python.langchain.com/docs/contributing/) for more.
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in langchain.
If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
- Most PRs should not touch more than one package.
- Please do not add dependencies to `pyproject.toml` files (even optional ones) unless they are **required** for unit tests.
- Changes should be backwards compatible.
- Make sure optional dependencies are imported within a function.
@@ -7,5 +7,5 @@ Please see the following guides for migrating LangChain code:
* Migrating from [LangChain 0.0.x Chains](https://python.langchain.com/docs/versions/migrating_chains/)
* Upgrade to [LangGraph Memory](https://python.langchain.com/docs/versions/migrating_memory/)
The [LangChain CLI](https://python.langchain.com/docs/versions/v0_3/#migrate-using-langchain-cli) can help you automatically upgrade your code to use non-deprecated imports.
The [LangChain CLI](https://python.langchain.com/docs/versions/v0_3/#migrate-using-langchain-cli) can help you automatically upgrade your code to use non-deprecated imports.
This will be especially helpful if you're still on either version 0.0.x or 0.1.x of LangChain.
[](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/langchain-ai/langchain)
[](https://codespaces.new/langchain-ai/langchain)
<img src="https://img.shields.io/static/v1?label=Dev%20Containers&message=Open&color=blue&logo=visualstudiocode&style=flat-square" alt="Open in Dev Containers">
Looking for the JS/TS library? Check out [LangChain.js](https://github.com/langchain-ai/langchainjs).
To help you ship LangChain apps to production faster, check out [LangSmith](https://smith.langchain.com).
[LangSmith](https://smith.langchain.com) is a unified developer platform for building, testing, and monitoring LLM applications.
Fill out [this form](https://www.langchain.com/contact-sales) to speak with our sales team.
## Quick Install
With pip:
LangChain is a framework for building LLM-powered applications. It helps you chain together interoperable components and third-party integrations to simplify AI application development — all while future-proofing decisions as the underlying technology evolves.
```bash
pip install langchain
pip install -U langchain
```
With conda:
---
```bash
conda install langchain -c conda-forge
```
**Documentation**: To learn more about LangChain, check out [the docs](https://python.langchain.com/docs/introduction/).
## 🤔 What is LangChain?
If you're looking for more advanced customization or agent orchestration, check out [LangGraph](https://langchain-ai.github.io/langgraph/), our framework for building controllable agent workflows.
**LangChain** is a framework for developing applications powered by large language models (LLMs).
> [!NOTE]
> Looking for the JS/TS library? Check out [LangChain.js](https://github.com/langchain-ai/langchainjs).
For these applications, LangChain simplifies the entire application lifecycle:
## Why use LangChain?
LangChain helps developers build applications powered by LLMs through a standard interface for models, embeddings, vector stores, and more.
- **Open-source libraries**: Build your applications using LangChain's open-source
[components](https://python.langchain.com/docs/concepts/) and
Use [LangGraph](https://langchain-ai.github.io/langgraph/) to build stateful agents with first-class streaming and human-in-the-loop support.
- **Productionization**: Inspect, monitor, and evaluate your apps with [LangSmith](https://docs.smith.langchain.com/) so that you can constantly optimize and deploy with confidence.
- **Deployment**: Turn your LangGraph applications into production-ready APIs and Assistants with [LangGraph Platform](https://langchain-ai.github.io/langgraph/cloud/).
Use LangChain for:
### Open-source libraries
- **Real-time data augmentation**. Easily connect LLMs to diverse data sources and external/internal systems, drawing from LangChain’s vast library of integrations with model providers, tools, vector stores, retrievers, and more.
- **Model interoperability**. Swap models in and out as your engineering team experiments to find the best choice for your application’s needs. As the industry frontier evolves, adapt quickly — LangChain’s abstractions keep you moving without losing momentum.
- **`langchain-core`**: Base abstractions.
- **Integration packages** (e.g. **`langchain-openai`**, **`langchain-anthropic`**, etc.): Important integrations have been split into lightweight packages that are co-maintained by the LangChain team and the integration developers.
- **`langchain`**: Chains, agents, and retrieval strategies that make up an application's cognitive architecture.
- **`langchain-community`**: Third-party integrations that are community maintained.
- **[LangGraph](https://langchain-ai.github.io/langgraph)**: Build robust and stateful multi-actor applications with LLMs by modeling steps as edges and nodes in a graph. Integrates smoothly with LangChain, but can be used without it. To learn more about LangGraph, check out our first LangChain Academy course, *Introduction to LangGraph*, available [here](https://academy.langchain.com/courses/intro-to-langgraph).
## LangChain’s ecosystem
### Productionization:
While the LangChain framework can be used standalone, it also integrates seamlessly with any LangChain product, giving developers a full suite of tools when building LLM applications.
- **[LangSmith](https://docs.smith.langchain.com/)**: A developer platform that lets you debug, test, evaluate, and monitor chains built on any LLM framework and seamlessly integrates with LangChain.
To improve your LLM application development, pair LangChain with:
### Deployment:
- [LangSmith](https://www.langchain.com/langsmith) - Helpful for agent evals and observability. Debug poor-performing LLM app runs, evaluate agent trajectories, gain visibility in production, and improve performance over time.
- [LangGraph](https://langchain-ai.github.io/langgraph/) - Build agents that can reliably handle complex tasks with LangGraph, our low-level agent orchestration framework. LangGraph offers customizable architecture, long-term memory, and human-in-the-loop workflows — and is trusted in production by companies like LinkedIn, Uber, Klarna, and GitLab.
- [LangGraph Platform](https://docs.langchain.com/langgraph-platform) - Deploy and scale agents effortlessly with a purpose-built deployment platform for long-running, stateful workflows. Discover, reuse, configure, and share agents across teams — and iterate quickly with visual prototyping in [LangGraph Studio](https://langchain-ai.github.io/langgraph/concepts/langgraph_studio/).
- **[LangGraph Platform](https://langchain-ai.github.io/langgraph/cloud/)**: Turn your LangGraph applications into production-ready APIs and Assistants.
## Additional resources


- End-to-end Example: [Web LangChain (web researcher chatbot)](https://weblangchain.vercel.app) and [repo](https://github.com/langchain-ai/weblangchain)
And much more! Head to the [Tutorials](https://python.langchain.com/docs/tutorials/) section of the docs for more.
## 🚀 How does LangChain help?
The main value props of the LangChain libraries are:
1. **Components**: composable building blocks, tools and integrations for working with language models. Components are modular and easy-to-use, whether you are using the rest of the LangChain framework or not.
2. **Easy orchestration with LangGraph**: [LangGraph](https://langchain-ai.github.io/langgraph/),
built on top of `langchain-core`, has built-in support for [messages](https://python.langchain.com/docs/concepts/messages/), [tools](https://python.langchain.com/docs/concepts/tools/),
and other LangChain abstractions. This makes it easy to combine components into
production-ready applications with persistence, streaming, and other key features.
Check out the LangChain [tutorials page](https://python.langchain.com/docs/tutorials/#orchestration) for examples.
## Components
Components fall into the following **modules**:
**📃 Model I/O**
This includes [prompt management](https://python.langchain.com/docs/concepts/prompt_templates/)
and a generic interface for [chat models](https://python.langchain.com/docs/concepts/chat_models/), including a consistent interface for [tool-calling](https://python.langchain.com/docs/concepts/tool_calling/) and [structured output](https://python.langchain.com/docs/concepts/structured_outputs/) across model providers.
**📚 Retrieval**
Retrieval Augmented Generation involves [loading data](https://python.langchain.com/docs/concepts/document_loaders/) from a variety of sources, [preparing it](https://python.langchain.com/docs/concepts/text_splitters/), then [searching over (a.k.a. retrieving from)](https://python.langchain.com/docs/concepts/retrievers/) it for use in the generation step.
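The prepare-then-search steps described above can be sketched in plain Python. This is a toy illustration under stated assumptions, not LangChain's API: `split_text` and `retrieve` are hypothetical helpers, and word overlap stands in for the embedding similarity a real retriever would use.

```python
from typing import List


def split_text(text: str, chunk_size: int) -> List[str]:
    """Prepare: split text into chunks of at most `chunk_size` words."""
    words = text.split()
    return [
        " ".join(words[i : i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


def retrieve(query: str, chunks: List[str], k: int = 1) -> List[str]:
    """Search: return the k chunks sharing the most words with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(
        chunks,
        key=lambda chunk: len(query_words & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:k]


document = "Ruff is a fast Python linter. Flake8 is an older linter with plugins."
chunks = split_text(document, chunk_size=6)
top_chunks = retrieve("fast python", chunks)
```

The retrieved chunks would then be placed into the prompt for the generation step.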
**🤖 Agents**
Agents allow an LLM autonomy over how a task is accomplished. Agents make decisions about which Actions to take, then take that Action, observe the result, and repeat until the task is complete. [LangGraph](https://langchain-ai.github.io/langgraph/) makes it easy to use
LangChain components to build both [custom](https://langchain-ai.github.io/langgraph/tutorials/)
and [built-in](https://langchain-ai.github.io/langgraph/how-tos/create-react-agent/)
LLM agents.
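The decide, act, observe, repeat cycle described above can be sketched without any framework. This is a simplified illustration, not how LangGraph or LangChain implement agents: `run_agent`, `toy_policy`, and the `add` tool are hypothetical names, and the rule-based policy stands in for the LLM that would normally choose the next action.

```python
from typing import Callable, Dict, List, Optional, Tuple

# One recorded step: (tool name, tool input, observation).
Step = Tuple[str, str, str]
# A policy returns (tool name, tool input), or None once the task is complete.
Policy = Callable[[str, List[Step]], Optional[Tuple[str, str]]]


def run_agent(
    task: str,
    policy: Policy,
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> List[Step]:
    """Loop: pick an action, run it, record the observation, repeat."""
    history: List[Step] = []
    for _ in range(max_steps):
        decision = policy(task, history)
        if decision is None:  # the policy considers the task complete
            break
        tool_name, tool_input = decision
        observation = tools[tool_name](tool_input)
        history.append((tool_name, tool_input, observation))
    return history


def toy_policy(task: str, history: List[Step]) -> Optional[Tuple[str, str]]:
    # Stand-in for an LLM: call the "add" tool once, then stop.
    if history:
        return None
    return ("add", task)


def add(expression: str) -> str:
    left, right = expression.split("+")
    return str(int(left) + int(right))


steps = run_agent("2+3", toy_policy, {"add": add})
```

The `max_steps` cap is a design choice worth keeping in any real agent loop, since a model-driven policy is not guaranteed to decide that it is finished.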
## 📖 Documentation
Please see [here](https://python.langchain.com) for full documentation, which includes:
- [Introduction](https://python.langchain.com/docs/introduction/): Overview of the framework and the structure of the docs.
- [Tutorials](https://python.langchain.com/docs/tutorials/): If you're looking to build something specific or are more of a hands-on learner, check out our tutorials. This is the best place to get started.
- [How-to guides](https://python.langchain.com/docs/how_to/): Answers to “How do I….?” type questions. These guides are goal-oriented and concrete; they're meant to help you complete a specific task.
- [Conceptual guide](https://python.langchain.com/docs/concepts/): Conceptual explanations of the key parts of the framework.
- [API Reference](https://python.langchain.com/api_reference/): Thorough documentation of every class and method.
## 🌐 Ecosystem
- [🦜🛠️ LangSmith](https://docs.smith.langchain.com/): Trace and evaluate your language model applications and intelligent agents to help you move from prototype to production.
- [🦜🕸️ LangGraph](https://langchain-ai.github.io/langgraph/): Create stateful, multi-actor applications with LLMs. Integrates smoothly with LangChain, but can be used without it.
- [🦜🕸️ LangGraph Platform](https://langchain-ai.github.io/langgraph/concepts/#langgraph-platform): Deploy LLM applications built with LangGraph into production.
## 💁 Contributing
As an open-source project in a rapidly developing field, we are extremely open to contributions, whether it be in the form of a new feature, improved infrastructure, or better documentation.
For detailed information on how to contribute, see [here](https://python.langchain.com/docs/contributing/).
- [Tutorials](https://python.langchain.com/docs/tutorials/): Simple walkthroughs with guided examples on getting started with LangChain.
- [How-to Guides](https://python.langchain.com/docs/how_to/): Quick, actionable code snippets for topics such as tool calling, RAG use cases, and more.
- [Conceptual Guides](https://python.langchain.com/docs/concepts/): Explanations of key concepts behind the LangChain framework.
- [LangChain Forum](https://forum.langchain.com/): Connect with the community and share all of your technical questions, ideas, and feedback.
- [API Reference](https://python.langchain.com/api_reference/): Detailed reference on navigating base packages and integrations for LangChain.
- [Chat LangChain](https://chat.langchain.com/): Ask questions & chat with our documentation.
@@ -4,13 +4,14 @@ LangChain has a large ecosystem of integrations with various external resources
## Best practices
When building such applications developers should remember to follow good security practices:
When building such applications, developers should remember to follow good security practices:
* [**Limit Permissions**](https://en.wikipedia.org/wiki/Principle_of_least_privilege): Scope permissions specifically to the application's need. Granting broad or excessive permissions can introduce significant security vulnerabilities. To avoid such vulnerabilities, consider using read-only credentials, disallowing access to sensitive resources, using sandboxing techniques (such as running inside a container), specifying proxy configurations to control external requests, etc. as appropriate for your application.
* **Anticipate Potential Misuse**: Just as humans can err, so can Large Language Models (LLMs). Always assume that any system access or credentials may be used in any way allowed by the permissions they are assigned. For example, if a pair of database credentials allows deleting data, it’s safest to assume that any LLM able to use those credentials may in fact delete data.
* [**Defense in Depth**](https://en.wikipedia.org/wiki/Defense_in_depth_(computing)): No security technique is perfect. Fine-tuning and good chain design can reduce, but not eliminate, the odds that a Large Language Model (LLM) may make a mistake. It’s best to combine multiple layered security approaches rather than relying on any single layer of defense to ensure security. For example: use both read-only permissions and sandboxing to ensure that LLMs are only able to access data that is explicitly meant for them to use.
* [**Limit Permissions**](https://en.wikipedia.org/wiki/Principle_of_least_privilege): Scope permissions specifically to the application's need. Granting broad or excessive permissions can introduce significant security vulnerabilities. To avoid such vulnerabilities, consider using read-only credentials, disallowing access to sensitive resources, using sandboxing techniques (such as running inside a container), specifying proxy configurations to control external requests, etc., as appropriate for your application.
* **Anticipate Potential Misuse**: Just as humans can err, so can Large Language Models (LLMs). Always assume that any system access or credentials may be used in any way allowed by the permissions they are assigned. For example, if a pair of database credentials allows deleting data, it's safest to assume that any LLM able to use those credentials may in fact delete data.
* [**Defense in Depth**](https://en.wikipedia.org/wiki/Defense_in_depth_(computing)): No security technique is perfect. Fine-tuning and good chain design can reduce, but not eliminate, the odds that a Large Language Model (LLM) may make a mistake. It's best to combine multiple layered security approaches rather than relying on any single layer of defense to ensure security. For example: use both read-only permissions and sandboxing to ensure that LLMs are only able to access data that is explicitly meant for them to use.
Risks of not doing so include, but are not limited to:
* Data corruption or loss.
* Unauthorized access to confidential information.
* Compromised performance or availability of critical resources.
@@ -21,65 +22,58 @@ Example scenarios with mitigation strategies:
* A user may ask an agent with write access to an external API to write malicious data to the API, or delete data from that API. To mitigate, give the agent read-only API keys, or limit it to only use endpoints that are already resistant to such misuse.
* A user may ask an agent with access to a database to drop a table or mutate the schema. To mitigate, scope the credentials to only the tables that the agent needs to access and consider issuing READ-ONLY credentials.
If you're building applications that access external resources like file systems, APIs
or databases, consider speaking with your company's security team to determine how to best
design and secure your applications.
If you're building applications that access external resources like file systems, APIs or databases, consider speaking with your company's security team to determine how to best design and secure your applications.
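As one concrete illustration of the read-only-credentials advice above, the sketch below opens a SQLite database in read-only mode so that a destructive statement fails at the database layer regardless of what the caller asks for. SQLite is used here only because it ships with Python; `open_read_only` and `demo` are hypothetical names, and a production database would rely on its own role and permission system instead.

```python
import os
import sqlite3
import tempfile
from pathlib import Path
from typing import List, Tuple


def open_read_only(db_path: str) -> sqlite3.Connection:
    """Open a SQLite database so that any write attempt fails."""
    # mode=ro is part of SQLite's URI filename syntax; uri=True enables it.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def demo() -> Tuple[List[tuple], bool]:
    # Build a throwaway database with a normal read-write connection.
    db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
    setup = sqlite3.connect(db_path)
    setup.execute("CREATE TABLE notes (body TEXT)")
    setup.execute("INSERT INTO notes VALUES ('hello')")
    setup.commit()
    setup.close()

    conn = open_read_only(db_path)
    try:
        rows = conn.execute("SELECT body FROM notes").fetchall()
        try:
            conn.execute("DROP TABLE notes")
            write_blocked = False
        except sqlite3.OperationalError:
            # SQLite rejects the write on a read-only connection.
            write_blocked = True
    finally:
        conn.close()
    return rows, write_blocked


rows, write_blocked = demo()
```

Handing an agent or tool only the read-only connection means the restriction holds even if the model produces a statement such as `DROP TABLE`.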
## Reporting OSS Vulnerabilities
LangChain is partnered with [huntr by Protect AI](https://huntr.com/) to provide
a bounty program for our open source projects.
LangChain is partnered with [huntr by Protect AI](https://huntr.com/) to provide
a bounty program for our open source projects.
Please report security vulnerabilities associated with the LangChain
open source projects by visiting the following link:
Please report security vulnerabilities associated with the LangChain
open source projects at [huntr](https://huntr.com/bounties/disclose/?target=https%3A%2F%2Fgithub.com%2Flangchain-ai%2Flangchain&validSearch=true).
Before reporting a vulnerability, please review:
1) In-Scope Targets and Out-of-Scope Targets below.
2) The [langchain-ai/langchain](https://python.langchain.com/docs/contributing/repo_structure) monorepo structure.
3) The [Best practicies](#best-practices) above to
understand what we consider to be a security vulnerability vs. developer
responsibility.
2) The [langchain-ai/langchain](https://docs.langchain.com/oss/python/contributing/code#supporting-packages) monorepo structure.
3) The [Best Practices](#best-practices) above to understand what we consider to be a security vulnerability vs. developer responsibility.
### In-Scope Targets
The following packages and repositories are eligible for bug bounties:
- langchain-core
- langchain (see exceptions)
- langchain-community (see exceptions)
- langgraph
- langserve
* langchain-core
* langchain (see exceptions)
* langchain-community (see exceptions)
* langgraph
* langserve
### Out of Scope Targets
All out of scope targets defined by huntr as well as:
- **langchain-experimental**: This repository is for experimental code and is not
* **langchain-experimental**: This repository is for experimental code and is not
eligible for bug bounties (see [package warning](https://pypi.org/project/langchain-experimental/)). Bug reports to it will be marked as interesting or as a waste of
time and published with no bounty attached.
- **tools**: Tools in either langchain or langchain-community are not eligible for bug
* **tools**: Tools in either langchain or langchain-community are not eligible for bug
bounties. This includes the following directories
- libs/langchain/langchain/tools
- libs/community/langchain_community/tools
- Please review the [best practices](#best-practices)
* libs/langchain/langchain/tools
* libs/community/langchain_community/tools
* Please review the [Best Practices](#best-practices)
for more details, but generally tools interact with the real world. Developers are
expected to understand the security implications of their code and are responsible
for the security of their tools.
- Code documented with security notices. This will be decided done on a case by
case basis, but likely will not be eligible for a bounty as the code is already
* Code documented with security notices. This will be decided on a case-by-case basis, but likely will not be eligible for a bounty as the code is already
documented with guidelines for developers that should be followed for making their
application secure.
- Any LangSmith related repositories or APIs (see [Reporting LangSmith Vulnerabilities](#reporting-langsmith-vulnerabilities)).
* Any LangSmith related repositories or APIs (see [Reporting LangSmith Vulnerabilities](#reporting-langsmith-vulnerabilities)).
## Reporting LangSmith Vulnerabilities
Please report security vulnerabilities associated with LangSmith by email to `security@langchain.dev`.
"Go to the VertexAI Model Garden on Google Cloud [console](https://pantheon.corp.google.com/vertex-ai/publishers/google/model-garden/335), and deploy the desired version of Gemma to VertexAI. It will take a few minutes, and after the endpoint it ready, you need to copy its number."
"Go to the VertexAI Model Garden on Google Cloud [console](https://pantheon.corp.google.com/vertex-ai/publishers/google/model-garden/335), and deploy the desired version of Gemma to VertexAI. It will take a few minutes, and after the endpoint is ready, you need to copy its number."
[code-analysis-deeplake.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/code-analysis-deeplake.ipynb) | Analyze its own code base with the help of gpt and activeloop's deep lake.
[custom_agent_with_plugin_retri...](https://github.com/langchain-ai/langchain/tree/master/cookbook/custom_agent_with_plugin_retrieval.ipynb) | Build a custom agent that can interact with ai plugins by retrieving tools and creating natural language wrappers around openapi endpoints.
[custom_agent_with_plugin_retri...](https://github.com/langchain-ai/langchain/tree/master/cookbook/custom_agent_with_plugin_retrieval_using_plugnplai.ipynb) | Build a custom agent with plugin retrieval functionality, utilizing ai plugins from the `plugnplai` directory.
[databricks_sql_db.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/databricks_sql_db.ipynb) | Connect to databricks runtimes and databricks sql.
[deeplake_semantic_search_over_...](https://github.com/langchain-ai/langchain/tree/master/cookbook/deeplake_semantic_search_over_chat.ipynb) | Perform semantic search and question-answering over a group chat using activeloop's deep lake with gpt4.
[elasticsearch_db_qa.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/elasticsearch_db_qa.ipynb) | Interact with elasticsearch analytics databases in natural language and build search queries via the elasticsearch dsl API.
[extraction_openai_tools.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/extraction_openai_tools.ipynb) | Structured Data Extraction with OpenAI Tools
@@ -64,4 +63,5 @@ Notebook | Description
[rag-locally-on-intel-cpu.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/rag-locally-on-intel-cpu.ipynb) | Perform Retrieval-Augmented-Generation (RAG) on locally downloaded open-source models using langchain and open source tools and execute it on Intel Xeon CPU. We showed an example of how to apply RAG on Llama 2 model and enable it to answer the queries related to Intel Q1 2024 earnings release.
[visual_RAG_vdms.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/visual_RAG_vdms.ipynb) | Performs Visual Retrieval-Augmented-Generation (RAG) using videos and scene descriptions generated by open source models.
[contextual_rag.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/contextual_rag.ipynb) | Performs contextual retrieval-augmented generation (RAG) prepending chunk-specific explanatory context to each chunk before embedding.
[rag-agents-locally-on-intel-cpu.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/local_rag_agents_intel_cpu.ipynb) | Build a RAG agent locally with open source models that routes questions through one of two paths to find answers. The agent generates answers based on documents retrieved from either the vector database or retrieved from web search. If the vector database lacks relevant information, the agent opts for web search. Open-source models for LLM and embeddings are used locally on an Intel Xeon CPU to execute this pipeline.
[rag-agents-locally-on-intel-cpu.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/local_rag_agents_intel_cpu.ipynb) | Build a RAG agent locally with open source models that routes questions through one of two paths to find answers. The agent generates answers based on documents retrieved from either the vector database or retrieved from web search. If the vector database lacks relevant information, the agent opts for web search. Open-source models for LLM and embeddings are used locally on an Intel Xeon CPU to execute this pipeline.
[rag_mlflow_tracking_evaluation.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/rag_mlflow_tracking_evaluation.ipynb) | Guide on how to create a RAG pipeline and track + evaluate it with MLflow.
" description=\"useful for when you need to answer questions about the most recent state of the union address. Input should be a fully formed question.\",\n",
" ),\n",
" Tool(\n",
" name=\"Ruff QA System\",\n",
" name=\"ruff_qa_system\",\n",
" func=ruff.run,\n",
" description=\"useful for when you need to answer questions about ruff (a python linter). Input should be a fully formed question.\",\n",
" ),\n",
@@ -186,94 +192,116 @@
},
{
"cell_type": "code",
"execution_count": 45,
"id": "fc47f230",
"execution_count": 11,
"id": "70c461d8-aaca-4f2a-9a93-bf35841cc615",
"metadata": {},
"outputs": [],
"source": [
"# Construct the agent. We will use the default agent type here.\n",
"# See documentation for a full list of options.\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I need to find out what Biden said about Ketanji Brown Jackson in the State of the Union address.\n",
"Action: State of Union QA System\n",
"Action Input: What did Biden say about Ketanji Brown Jackson in the State of the Union address?\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3m Biden said that Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: Biden said that Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.\u001b[0m\n",
" Biden said that he nominated Ketanji Brown Jackson for the United States Supreme Court and praised her as one of the nation's top legal minds who will continue Justice Breyer's legacy of excellence.\n",
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
"In the State of the Union address, Biden said that he nominated Ketanji Brown Jackson for the United States Supreme Court and praised her as one of the nation's top legal minds who will continue Justice Breyer's legacy of excellence.\n"
]
},
{
"data": {
"text/plain": [
"\"Biden said that Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.\""
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent.run(\n",
" \"What did biden say about ketanji brown jackson in the state of the union address?\"\n",
")"
"input_message = {\n",
" \"role\": \"user\",\n",
" \"content\": \"What did biden say about ketanji brown jackson in the state of the union address?\",\n",
"}\n",
"\n",
"for step in agent.stream(\n",
" {\"messages\": [input_message]},\n",
" stream_mode=\"values\",\n",
"):\n",
" step[\"messages\"][-1].pretty_print()"
]
},
{
"cell_type": "code",
"execution_count": 47,
"id": "4e91b811",
"execution_count": 13,
"id": "e836b4cd-abf7-49eb-be0e-b9ad501213f3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"================================\u001b[1m Human Message \u001b[0m=================================\n",
"\n",
"Why use ruff over flake8?\n",
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I need to find out the advantages of using ruff over flake8\n",
"Action: Ruff QA System\n",
"Action Input: What are the advantages of using ruff over flake8?\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3m Ruff can be used as a drop-in replacement for Flake8 when used (1) without or with a small number of plugins, (2) alongside Black, and (3) on Python 3 code. It also re-implements some of the most popular Flake8 plugins and related code quality tools natively, including isort, yesqa, eradicate, and most of the rules implemented in pyupgrade. Ruff also supports automatically fixing its own lint violations, which Flake8 does not.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: Ruff can be used as a drop-in replacement for Flake8 when used (1) without or with a small number of plugins, (2) alongside Black, and (3) on Python 3 code. It also re-implements some of the most popular Flake8 plugins and related code quality tools natively, including isort, yesqa, eradicate, and most of the rules implemented in pyupgrade. Ruff also supports automatically fixing its own lint violations, which Flake8 does not.\u001b[0m\n",
"There are a few reasons why someone might choose to use Ruff over Flake8:\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
"1. Larger rule set: Ruff implements over 800 rules, while Flake8 only implements around 200. This means that Ruff can catch more potential issues in your code.\n",
"\n",
"2. Better compatibility with other tools: Ruff is designed to work well with other tools like Black, isort, and type checkers like Mypy. This means that you can use Ruff alongside these tools to get more comprehensive feedback on your code.\n",
"\n",
"3. Automatic fixing of lint violations: Unlike Flake8, Ruff is capable of automatically fixing its own lint violations. This can save you time and effort when fixing issues in your code.\n",
"\n",
"4. Native implementation of popular Flake8 plugins: Ruff re-implements some of the most popular Flake8 plugins natively, which means you don't have to install and configure multiple plugins to get the same functionality.\n",
"\n",
"Overall, Ruff offers a more comprehensive and user-friendly experience compared to Flake8, making it a popular choice for many developers.\n",
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
"\n",
"You might choose to use Ruff over Flake8 for several reasons:\n",
"\n",
"1. Ruff has a much larger rule set, implementing over 800 rules compared to Flake8's roughly 200, so it can catch more potential issues.\n",
"2. Ruff is designed to work better with other tools like Black, isort, and type checkers like Mypy, providing more comprehensive code feedback.\n",
"3. Ruff can automatically fix its own lint violations, which Flake8 cannot, saving time and effort.\n",
"4. Ruff natively implements some popular Flake8 plugins, so you don't need to install and configure multiple plugins separately.\n",
"\n",
"Overall, Ruff offers a more comprehensive and user-friendly experience compared to Flake8.\n"
]
},
{
"data": {
"text/plain": [
"'Ruff can be used as a drop-in replacement for Flake8 when used (1) without or with a small number of plugins, (2) alongside Black, and (3) on Python 3 code. It also re-implements some of the most popular Flake8 plugins and related code quality tools natively, including isort, yesqa, eradicate, and most of the rules implemented in pyupgrade. Ruff also supports automatically fixing its own lint violations, which Flake8 does not.'"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent.run(\"Why use ruff over flake8?\")"
"input_message = {\n",
" \"role\": \"user\",\n",
" \"content\": \"Why use ruff over flake8?\",\n",
"}\n",
"\n",
"for step in agent.stream(\n",
" {\"messages\": [input_message]},\n",
" stream_mode=\"values\",\n",
"):\n",
" step[\"messages\"][-1].pretty_print()"
]
},
{
@@ -296,20 +324,20 @@
},
{
"cell_type": "code",
"execution_count": 48,
"execution_count": 14,
"id": "f59b377e",
"metadata": {},
"outputs": [],
"source": [
"tools = [\n",
" Tool(\n",
" name=\"Stateof Union QA System\",\n",
" name=\"state_of_union_qa_system\",\n",
" func=state_of_union.run,\n",
" description=\"useful for when you need to answer questions about the most recent state of the union address. Input should be a fully formed question.\",\n",
" return_direct=True,\n",
" ),\n",
" Tool(\n",
" name=\"Ruff QA System\",\n",
" name=\"ruff_qa_system\",\n",
" func=ruff.run,\n",
" description=\"useful for when you need to answer questions about ruff (a python linter). Input should be a fully formed question.\",\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I need to find out what Biden said about Ketanji Brown Jackson in the State of the Union address.\n",
"Action: State of Union QA System\n",
"Action Input: What did Biden say about Ketanji Brown Jackson in the State of the Union address?\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3m Biden said that Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
" Biden said that he nominated Ketanji Brown Jackson for the United States Supreme Court and praised her as one of the nation's top legal minds who will continue Justice Breyer's legacy of excellence.\n"
]
},
{
"data": {
"text/plain": [
"\" Biden said that Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.\""
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent.run(\n",
" \"What did biden say about ketanji brown jackson in the state of the union address?\"\n",
")"
"input_message = {\n",
" \"role\": \"user\",\n",
" \"content\": \"What did biden say about ketanji brown jackson in the state of the union address?\",\n",
"}\n",
"\n",
"for step in agent.stream(\n",
" {\"messages\": [input_message]},\n",
" stream_mode=\"values\",\n",
"):\n",
" step[\"messages\"][-1].pretty_print()"
]
},
{
"cell_type": "code",
"execution_count": 51,
"id": "edfd0a1a",
"execution_count": 17,
"id": "88f08d86-7972-4148-8128-3ac8898ad68a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"================================\u001b[1m Human Message \u001b[0m=================================\n",
"\n",
"Why use ruff over flake8?\n",
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I need to find out the advantages of using ruff over flake8\n",
"Action: Ruff QA System\n",
"Action Input: What are the advantages of using ruff over flake8?\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3m Ruff can be used as a drop-in replacement for Flake8 when used (1) without or with a small number of plugins, (2) alongside Black, and (3) on Python 3 code. It also re-implements some of the most popular Flake8 plugins and related code quality tools natively, including isort, yesqa, eradicate, and most of the rules implemented in pyupgrade. Ruff also supports automatically fixing its own lint violations, which Flake8 does not.\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
" Ruff has a larger rule set, supports automatic fixing of lint violations, and does not require the installation of additional plugins. It also has better compatibility with Black and can be used alongside a type checker for more comprehensive code analysis.\n"
]
},
{
"data": {
"text/plain": [
"' Ruff can be used as a drop-in replacement for Flake8 when used (1) without or with a small number of plugins, (2) alongside Black, and (3) on Python 3 code. It also re-implements some of the most popular Flake8 plugins and related code quality tools natively, including isort, yesqa, eradicate, and most of the rules implemented in pyupgrade. Ruff also supports automatically fixing its own lint violations, which Flake8 does not.'"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent.run(\"Why use ruff over flake8?\")"
"input_message = {\n",
" \"role\": \"user\",\n",
" \"content\": \"Why use ruff over flake8?\",\n",
"}\n",
"\n",
"for step in agent.stream(\n",
" {\"messages\": [input_message]},\n",
" stream_mode=\"values\",\n",
"):\n",
" step[\"messages\"][-1].pretty_print()"
]
},
{
@@ -417,19 +447,19 @@
},
{
"cell_type": "code",
"execution_count": 57,
"execution_count": 18,
"id": "d397a233",
"metadata": {},
"outputs": [],
"source": [
"tools = [\n",
" Tool(\n",
" name=\"Stateof Union QA System\",\n",
" name=\"state_of_union_qa_system\",\n",
" func=state_of_union.run,\n",
" description=\"useful for when you need to answer questions about the most recent state of the union address. Input should be a fully formed question, not referencing any obscure pronouns from the conversation before.\",\n",
" ),\n",
" Tool(\n",
" name=\"Ruff QA System\",\n",
" name=\"ruff_qa_system\",\n",
" func=ruff.run,\n",
" description=\"useful for when you need to answer questions about ruff (a python linter). Input should be a fully formed question, not referencing any obscure pronouns from the conversation before.\",\n",
" ),\n",
@@ -438,60 +468,60 @@
},
{
"cell_type": "code",
"execution_count": 58,
"id": "06157240",
"execution_count": 19,
"id": "41743f29-150d-40ba-aa8e-3a63c32216aa",
"metadata": {},
"outputs": [],
"source": [
"# Construct the agent. We will use the default agent type here.\n",
"# See documentation for a full list of options.\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I need to find out what tool ruff uses to run over Jupyter Notebooks, and if the president mentioned it in the state of the union.\n",
"Action: Ruff QA System\n",
"Action Input: What tool does ruff use to run over Jupyter Notebooks?\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3m Ruff is integrated into nbQA, a tool for running linters and code formatters over Jupyter Notebooks. After installing ruff and nbqa, you can run Ruff over a notebook like so: > nbqa ruff Untitled.html\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now need to find out if the president mentioned this tool in the state of the union.\n",
"Action: State of Union QA System\n",
"Action Input: Did the president mention nbQA in the state of the union?\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3m No, the president did not mention nbQA in the state of the union.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer.\n",
"Final Answer: No, the president did not mention nbQA in the state of the union.\u001b[0m\n",
" No, the president did not mention the tool that ruff uses to run over Jupyter Notebooks in the state of the union.\n",
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
"Ruff does not support source.organizeImports and source.fixAll code actions in Jupyter Notebooks. Additionally, the president did not mention the tool that ruff uses to run over Jupyter Notebooks in the state of the union.\n"
]
},
{
"data": {
"text/plain": [
"'No, the president did not mention nbQA in the state of the union.'"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent.run(\n",
" \"What tool does ruff use to run over Jupyter Notebooks? Did the president mention that tool in the state of the union?\"\n",
")"
"input_message = {\n",
" \"role\": \"user\",\n",
" \"content\": \"What tool does ruff use to run over Jupyter Notebooks? Did the president mention that tool in the state of the union?\",\n",
"`Optional`: You can also use Deep Lake's Managed Tensor Database as a hosting service and run queries there. In order to do so, it is necessary to specify the runtime parameter as {'tensor_db': True} during the creation of the vector store. This configuration enables the execution of queries on the Managed Tensor Database, rather than on the client side. It should be noted that this functionality is not applicable to datasets stored locally or in-memory. In the event that a vector store has already been created outside of the Managed Tensor Database, it is possible to transfer it to the Managed Tensor Database by following the prescribed steps."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"# from langchain_community.vectorstores import DeepLake\n",
"You can also specify user defined functions using [Deep Lake filters](https://docs.deeplake.ai/en/latest/deeplake.core.dataset.html#deeplake.core.dataset.Dataset.filter)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"def filter(x):\n",
" # filter based on source code\n",
" if \"something\" in x[\"text\"].data()[\"value\"]:\n",
"This notebook covers how to connect to the [Databricks runtimes](https://docs.databricks.com/runtime/index.html) and [Databricks SQL](https://www.databricks.com/product/databricks-sql) using the SQLDatabase wrapper of LangChain.\n",
"It is broken into 3 parts: installation and setup, connecting to Databricks, and examples."
]
},
{
"cell_type": "markdown",
"id": "0076d072",
"metadata": {},
"source": [
"## Installation and Setup"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "739b489b",
"metadata": {},
"outputs": [],
"source": [
"!pip install databricks-sql-connector"
]
},
{
"cell_type": "markdown",
"id": "73113163",
"metadata": {},
"source": [
"## Connecting to Databricks\n",
"\n",
"You can connect to [Databricks runtimes](https://docs.databricks.com/runtime/index.html) and [Databricks SQL](https://www.databricks.com/product/databricks-sql) using the `SQLDatabase.from_databricks()` method.\n",
"\n",
"### Syntax\n",
"```python\n",
"SQLDatabase.from_databricks(\n",
" catalog: str,\n",
" schema: str,\n",
" host: Optional[str] = None,\n",
" api_token: Optional[str] = None,\n",
" warehouse_id: Optional[str] = None,\n",
" cluster_id: Optional[str] = None,\n",
" engine_args: Optional[dict] = None,\n",
" **kwargs: Any)\n",
"```\n",
"### Required Parameters\n",
"* `catalog`: The catalog name in the Databricks database.\n",
"* `schema`: The schema name in the catalog.\n",
"\n",
"### Optional Parameters\n",
"There following parameters are optional. When executing the method in a Databricks notebook, you don't need to provide them in most of the cases.\n",
"* `host`: The Databricks workspace hostname, excluding 'https://' part. Defaults to 'DATABRICKS_HOST' environment variable or current workspace if in a Databricks notebook.\n",
"* `api_token`: The Databricks personal access token for accessing the Databricks SQL warehouse or the cluster. Defaults to 'DATABRICKS_TOKEN' environment variable or a temporary one is generated if in a Databricks notebook.\n",
"* `warehouse_id`: The warehouse ID in the Databricks SQL.\n",
"* `cluster_id`: The cluster ID in the Databricks Runtime. If running in a Databricks notebook and both 'warehouse_id' and 'cluster_id' are None, it uses the ID of the cluster the notebook is attached to.\n",
"* `engine_args`: The arguments to be used when connecting Databricks.\n",
"* `**kwargs`: Additional keyword arguments for the `SQLDatabase.from_uri` method."
]
},
{
"cell_type": "markdown",
"id": "b11c7e48",
"metadata": {},
"source": [
"## Examples"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "8102bca0",
"metadata": {},
"outputs": [],
"source": [
"# Connecting to Databricks with SQLDatabase wrapper\n",
"This example demonstrates the use of the [SQL Chain](https://python.langchain.com/en/latest/modules/chains/examples/sqlite.html) for answering a question over a Databricks database."
"Answer:\u001b[32;1m\u001b[1;3mThe average duration of taxi rides that start between midnight and 6am is 987.81 seconds.\u001b[0m\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'The average duration of taxi rides that start between midnight and 6am is 987.81 seconds.'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"db_chain.run(\n",
" \"What is the average duration of taxi rides that start between midnight and 6am?\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "e496d5e5",
"metadata": {},
"source": [
"### SQL Database Agent example\n",
"\n",
"This example demonstrates the use of the [SQL Database Agent](/docs/integrations/tools/sql_database) for answering questions over a Databricks database."
"Thought:\u001b[32;1m\u001b[1;3mI should check the schema of the trips table to see if it has the necessary columns for trip distance and duration.\n",
"Thought:\u001b[32;1m\u001b[1;3mThe trips table has the necessary columns for trip distance and duration. I will write a query to find the longest trip distance and its duration.\n",
"Action: query_checker_sql_db\n",
"Action Input: SELECT trip_distance, tpep_dropoff_datetime - tpep_pickup_datetime as duration FROM trips ORDER BY trip_distance DESC LIMIT 1\u001b[0m\n",
"Observation: \u001b[31;1m\u001b[1;3mSELECT trip_distance, tpep_dropoff_datetime - tpep_pickup_datetime as duration FROM trips ORDER BY trip_distance DESC LIMIT 1\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3mThe query is correct. I will now execute it to find the longest trip distance and its duration.\n",
"Action: query_sql_db\n",
"Action Input: SELECT trip_distance, tpep_dropoff_datetime - tpep_pickup_datetime as duration FROM trips ORDER BY trip_distance DESC LIMIT 1\u001b[0m\n",
"PROMPT_TEMPLATE = \"\"\"Given an input question, create a syntactically correct Elasticsearch query to run. Unless the user specifies in their question a specific number of examples they wish to obtain, always limit your query to at most {top_k} results. You can order the results by a relevant column to return the most interesting examples in the database.\n",
"\n",
"Unless told to do not query for all the columns from a specific index, only ask for a the few relevant columns given the question.\n",
"Unless told to do not query for all the columns from a specific index, only ask for a few relevant columns given the question.\n",
"\n",
"Pay attention to use only the column names that you can see in the mapping description. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which index. Return the query as valid json.\n",
"source": "##### You can embed text in the same VectorDB space as images, and retrieve text and images as well based on input text or image.\n##### Following link demonstrates that.\n<a> https://python.langchain.com/v0.2/docs/integrations/text_embedding/open_clip/ </a>"
},
{
"cell_type": "markdown",
@@ -358,7 +354,7 @@
"id": "6e5cd014-db86-4d6b-8399-25cae3da5570",
"metadata": {},
"source": [
"## Helper function to plot retrived similar images"
"## Helper function to plot retrieved similar images"
]
},
{
@@ -389,7 +385,7 @@
" ax = axs[idx]\n",
" ax.imshow(img)\n",
" # Assuming similarity is not available in the new data, removed sim_score\n",
"contemporary criticism of the less-than- thoughtful circumstances under which Lange photographed Thomson, the picture’s power to engage has not diminished. Artists in other countries have appropriated the image, changing the mother’s features into those of other ethnicities, but keeping her expression and the positions of her clinging children. Long after anyone could help the Thompson family, this picture has resonance in another time of national crisis, unemployment and food shortages.\n",
"\n",
"A striking, but very different picture is a 1900 portrait of the legendary Hin-mah-too-yah- lat-kekt (Chief Joseph) of the Nez Percé people. The Bureau of American Ethnology in Washington, D.C., regularly arranged for its photographer, De Lancey Gill, to photograph Native American delegations that came to the capital to confer with officials about tribal needs and concerns. Although Gill described Chief Joseph as having “an air of gentleness and quiet reserve,” the delegate skeptically appraises the photographer, which is not surprising given that the United States broke five treaties with Chief Joseph and his father between 1855 and 1885.\n",
"\n",
"More than a glance, second looks may reveal new knowledge into complex histories.\n",
"\n",
"Anne Wilkes Tucker is the photography curator emeritus of the Museum of Fine Arts, Houston and curator of the “Not an Ostrich” exhibition.\n",
"\n",
"28\n",
"\n",
"28 LIBRARY OF CONGRESS MAGAZINE\n",
"\n",
"LIBRARY OF CONGRESS MAGAZINE\n",
"THEYRE WILLING TO HAVE MEENTERTAIN THEM DURING THE DAY,BUT AS SOON AS IT STARTSGETTING DARK, THEY ALLGO OFF, AND LEAVE ME! \n",
"ROSA PARKS: IN HER OWN WORDS\n",
"\n",
"COMIC ART: 120 YEARS OF PANELS AND PAGES\n",
"\n",
"SHALL NOT BE DENIED: WOMEN FIGHT FOR THE VOTE\n",
"\n",
"More information loc.gov/exhibits\n",
"Nuestra Sefiora de las Iguanas\n",
"\n",
"Graciela Iturbide’s 1979 portrait of Zobeida Díaz in the town of Juchitán in southeastern Mexico conveys the strength of women and reflects their important contributions to the economy. Díaz, a merchant, was selling iguanas to cook and eat, carrying them on her head, as is customary.\n",
"Iturbide requested permission to take a photograph, but this proved challenging because the iguanas were constantly moving, causing Díaz to laugh. The result, however, was a brilliant portrait that the inhabitants of Juchitán claimed with pride. They have reproduced it on posters and erected a statue honoring Díaz and her iguanas. The photo now appears throughout the world, inspiring supporters of feminism, women’s rights and gender equality.\n",
"\n",
"—Adam Silvia is a curator in the Prints and Photographs Division.\n",
"\n",
"6\n",
"\n",
"6 LIBRARY OF CONGRESS MAGAZINE\n",
"\n",
"LIBRARY OF CONGRESS MAGAZINE\n",
"\n",
"‘Migrant Mother’ is Florence Owens Thompson\n",
"\n",
"The iconic portrait that became the face of the Great Depression is also the most famous photograph in the collections of the Library of Congress.\n",
"\n",
"The Library holds the original source of the photo — a nitrate negative measuring 4 by 5 inches. Do you see a faint thumb in the bottom right? The photographer, Dorothea Lange, found the thumb distracting and after a few years had the negative altered to make the thumb almost invisible. Lange’s boss at the Farm Security Administration, Roy Stryker, criticized her action because altering a negative undermines the credibility of a documentary photo.\n",
"Shrimp Picker\n",
"\n",
"The photos and evocative captions of Lewis Hine served as source material for National Child Labor Committee reports and exhibits exposing abusive child labor practices in the United States in the first decades of the 20th century.\n",
"\n",
"LEWIS WICKES HINE. “MANUEL, THE YOUNG SHRIMP-PICKER, FIVE YEARS OLD, AND A MOUNTAIN OF CHILD-LABOR OYSTER SHELLS BEHIND HIM. HE WORKED LAST YEAR. UNDERSTANDS NOT A WORD OF ENGLISH. DUNBAR, LOPEZ, DUKATE COMPANY. LOCATION: BILOXI, MISSISSIPPI.” FEBRUARY 1911. NATIONAL CHILD LABOR COMMITTEE COLLECTION. PRINTS AND PHOTOGRAPHS DIVISION.\n",
"\n",
"For 15 years, Hine\n",
"\n",
"crisscrossed the country, documenting the practices of the worst offenders. His effective use of photography made him one of the committee's greatest publicists in the campaign for legislation to ban child labor.\n",
"\n",
"Hine was a master at taking photos that catch attention and convey a message and, in this photo, he framed Manuel in a setting that drove home the boy’s small size and unsafe environment.\n",
"\n",
"Captions on photos of other shrimp pickers emphasized their long working hours as well as one hazard of the job: The acid from the shrimp made pickers’ hands sore and “eats the shoes off your feet.”\n",
"\n",
"Such images alerted viewers to all that workers, their families and the nation sacrificed when children were part of the labor force. The Library holds paper records of the National Child Labor Committee as well as over 5,000 photographs.\n",
"\n",
"—Barbara Natanson is head of the Reference Section in the Prints and Photographs Division.\n",
"\n",
"8\n",
"\n",
"LIBRARY OF CONGRESS MAGAZINE\n",
"\n",
"LIBRARY OF CONGRESS MAGAZINE\n",
"\n",
"Intergenerational Portrait\n",
"\n",
"Raised on the Apsáalooke (Crow) reservation in Montana, photographer Wendy Red Star created her “Apsáalooke Feminist” self-portrait series with her daughter Beatrice. With a dash of wry humor, mother and daughter are their own first-person narrators.\n",
"\n",
"Red Star explains the significance of their appearance: “The dress has power: You feel strong and regal wearing it. In my art, the elk tooth dress specifically symbolizes Crow womanhood and the matrilineal line connecting me to my ancestors. As a mother, I spend hours searching for the perfect elk tooth dress materials to make a prized dress for my daughter.”\n",
"\n",
"In a world that struggles with cultural identities, this photograph shows us the power and beauty of blending traditional and contemporary styles.\n",
"Gordon Parks created an iconic image with this 1942 photograph of cleaning woman Ella Watson.\n",
"\n",
"Snow blankets the U.S. Capitol in this classic image by Ernest L. Crandall.\n",
"\n",
"Start your new year out right with a poster promising good reading for months to come.\n",
"\n",
"▪ Order online: loc.gov/shop ▪ Order by phone: 888.682.3557\n",
"\n",
"26\n",
"\n",
"LIBRARY OF CONGRESS MAGAZINE\n",
"\n",
"LIBRARY OF CONGRESS MAGAZINE\n",
"\n",
"SUPPORT\n",
"\n",
"A PICTURE OF PHILANTHROPY Annenberg Foundation Gives $1 Million and a Photographic Collection to the Library.\n",
"\n",
"A major gift by Wallis Annenberg and the Annenberg Foundation in Los Angeles will support the effort to reimagine the visitor experience at the Library of Congress. The foundation also is donating 1,000 photographic prints from its Annenberg Space for Photography exhibitions to the Library.\n",
"\n",
"The Library is pursuing a multiyear plan to transform the experience of its nearly 2 million annual visitors, share more of its treasures with the public and show how Library collections connect with visitors’ own creativity and research. The project is part of a strategic plan established by Librarian of Congress Carla Hayden to make the Library more user-centered for Congress, creators and learners of all ages.\n",
"\n",
"A 2018 exhibition at the Annenberg Space for Photography in Los Angeles featured over 400 photographs from the Library. The Library is planning a future photography exhibition, based on the Annenberg-curated show, along with a documentary film on the Library and its history, produced by the Annenberg Space for Photography.\n",
"\n",
"“The nation’s library is honored to have the strong support of Wallis Annenberg and the Annenberg Foundation as we enhance the experience for our visitors,” Hayden said. “We know that visitors will find new connections to the Library through the incredible photography collections and countless other treasures held here to document our nation’s history and creativity.”\n",
"\n",
"To enhance the Library’s holdings, the foundation is giving the Library photographic prints for long-term preservation from 10 other exhibitions hosted at the Annenberg Space for Photography. The Library holds one of the world’s largest photography collections, with about 14 million photos and over 1 million images digitized and available online.\n",
"18 LIBRARY OF CONGRESS MAGAZINE\n"
]
}
],
"source": [
@@ -461,10 +577,17 @@
"name": "stdout",
"output_type": "stream",
"text": [
" The image depicts a woman with several children. The woman appears to be of Cherokee heritage, as suggested by the text provided. The image is described as having been initially regretted by the subject, Florence Owens Thompson, due to her feeling that it did not accurately represent her leadership qualities.\n",
"The historical and cultural context of the image is tied to the Great Depression and the Dust Bowl, both of which affected the Cherokee people in Oklahoma. The photograph was taken during this period, and its subject, Florence Owens Thompson, was a leader within her community who worked tirelessly to help those affected by these crises.\n",
"The image's symbolism and meaning can be interpreted as a representation of resilience and strength in the face of adversity. The woman is depicted with multiple children, which could signify her role as a caregiver and protector during difficult times.\n",
"Connections between the image and the related text include Florence Owens Thompson's leadership qualities and her regretted feelings about the photograph. Additionally, the mention of Dorothea Lange, the photographer who took this photo, ties the image to its historical context and the broader narrative of the Great Depression and Dust Bowl in Oklahoma. \n"
" The image is a black and white photograph by Dorothea Lange titled \"Destitute Pea Pickers in California. Mother of Seven Children. Age Thirty-Two. Nipomo, California.\" It was taken in March 1936 as part of the Farm Security Administration-Office of War Information Collection.\n",
"\n",
"The photograph features a woman with seven children, who appear to be in a state of poverty and hardship. The woman is seated, looking directly at the camera, while three of her children are standing behind her. They all seem to be dressed in ragged clothing, indicative of their impoverished condition.\n",
"\n",
"The historical context of this image is related to the Great Depression, which was a period of economic hardship in the United States that lasted from 1929 to 1939. During this time, many people struggled to make ends meet, and poverty was widespread. This photograph captures the plight of one such family during this difficult period.\n",
"\n",
"The symbolism of the image is multifaceted. The woman's direct gaze at the camera can be seen as a plea for help or an expression of desperation. The ragged clothing of the children serves as a stark reminder of the poverty and hardship experienced by many during this time.\n",
"\n",
"In terms of connections to the related text, it is mentioned that Florence Owens Thompson, the woman in the photograph, initially regretted having her picture taken. However, she later came to appreciate the importance of the image as a representation of the struggles faced by many during the Great Depression. The mention of Helena Zinkham suggests that she may have played a role in the creation or distribution of this photograph.\n",
"\n",
"Overall, this image is a powerful depiction of poverty and hardship during the Great Depression, capturing the resilience and struggles of one family amidst difficult times. \n"
"This guide demonstrates how Oracle AI Vector Search can be used with Langchain to serve an end-to-end RAG pipeline. This guide goes through examples of:\n",
"This guide demonstrates how Oracle AI Vector Search can be used with LangChain to serve an end-to-end RAG pipeline. This guide goes through examples of:\n",
"\n",
" * Loading the documents from various sources using OracleDocLoader\n",
" * Summarizing them within/outside the database using OracleSummary\n",
@@ -47,7 +47,21 @@
"source": [
"### Prerequisites\n",
"\n",
"Please install Oracle Python Client driver to use Langchain with Oracle AI Vector Search. "
"You'll need to install `langchain-oracledb` with `python -m pip install -U langchain-oracledb` to use this integration.\n",
"\n",
"The `python-oracledb` driver is installed automatically as a dependency of langchain-oracledb.\n",
"First, connect as a privileged user to create a demo user with all the required privileges. Change the credentials for your environment. Also set the DEMO_PY_DIR path to a directory on the database host where your model file is located:"
]
},
{
@@ -56,65 +70,30 @@
"metadata": {},
"outputs": [],
"source": [
"# pip install oracledb"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create Demo User\n",
"First, create a demo user with all the required privileges. "
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Connection successful!\n",
"User setup done!\n"
]
}
],
"source": [
"import sys\n",
"\n",
"import oracledb\n",
"\n",
"# Update with your username, password, hostname, and service_name\n",
"username = \"\"\n",
"# Please update with your SYSTEM (or privileged user) username, password, and database connection string\n",
" execute immediate 'drop user if exists testuser cascade';\n",
"\n",
" -- Create user and grant privileges\n",
" execute immediate 'create user testuser identified by testuser';\n",
" execute immediate 'grant connect, unlimited tablespace, create credential, create procedure, create any index to testuser';\n",
" execute immediate 'create or replace directory DEMO_PY_DIR as ''/scratch/hroy/view_storage/hroy_devstorage/demo/orachain''';\n",
" execute immediate 'create or replace directory DEMO_PY_DIR as ''/home/yourname/demo/orachain''';\n",
" execute immediate 'grant read, write on directory DEMO_PY_DIR to public';\n",
" execute immediate 'grant create mining model to testuser';\n",
"\n",
"\n",
" -- Network access\n",
" begin\n",
" DBMS_NETWORK_ACL_ADMIN.APPEND_HOST_ACE(\n",
@@ -127,15 +106,7 @@
" end;\n",
" \"\"\"\n",
" )\n",
" print(\"User setup done!\")\n",
" except Exception as e:\n",
" print(f\"User setup failed with error: {e}\")\n",
" finally:\n",
" cursor.close()\n",
" conn.close()\n",
"except Exception as e:\n",
" print(f\"Connection failed with error: {e}\")\n",
" sys.exit(1)"
" print(\"User setup done!\")"
]
},
{
@@ -143,13 +114,13 @@
"metadata": {},
"source": [
"## Process Documents using Oracle AI\n",
"Consider the following scenario: users possess documents stored either in an Oracle Database or a file system and intend to utilize this data with Oracle AI Vector Search powered by Langchain.\n",
"Consider the following scenario: users possess documents stored either in an Oracle Database or a file system and intend to utilize this data with Oracle AI Vector Search powered by LangChain.\n",
"\n",
"To prepare the documents for analysis, a comprehensive preprocessing workflow is necessary. Initially, the documents must be retrieved, summarized (if required), and chunked as needed. Subsequent steps involve generating embeddings for these chunks and integrating them into the Oracle AI Vector Store. Users can then conduct semantic searches on this data.\n",
"\n",
"The Oracle AI Vector Search Langchain library encompasses a suite of document processing tools that facilitate document loading, chunking, summary generation, and embedding creation.\n",
"The Oracle AI Vector Search LangChain library encompasses a suite of document processing tools that facilitate document loading, chunking, summary generation, and embedding creation.\n",
"\n",
"In the sections that follow, we will detail the utilization of Oracle AI Langchain APIs to effectively implement each of these processes."
"In the sections that follow, we will detail the utilization of Oracle AI LangChain APIs to effectively implement each of these processes."
]
},
{
@@ -157,38 +128,24 @@
"metadata": {},
"source": [
"### Connect to Demo User\n",
"The following sample code will show how to connect to Oracle Database. By default, python-oracledb runs in a ‘Thin’ mode which connects directly to Oracle Database. This mode does not need Oracle Client libraries. However, some additional functionality is available when python-oracledb uses them. Python-oracledb is said to be in ‘Thick’ mode when Oracle Client libraries are used. Both modes have comprehensive functionality supporting the Python Database API v2.0 Specification. See the following [guide](https://python-oracledb.readthedocs.io/en/latest/user_guide/appendix_a.html#featuresummary) that talks about features supported in each mode. You might want to switch to thick-mode if you are unable to use thin-mode."
"The following sample code shows how to connect to Oracle Database using the python-oracledb driver. By default, python-oracledb runs in a ‘Thin’ mode which connects directly to Oracle Database. This mode does not need Oracle Client libraries. However, some additional functionality is available when python-oracledb uses them. Python-oracledb is said to be in ‘Thick’ mode when Oracle Client libraries are used. Both modes have comprehensive functionality supporting the Python Database API v2.0 Specification. See the following [guide](https://python-oracledb.readthedocs.io/en/latest/user_guide/appendix_a.html#featuresummary) that talks about features supported in each mode. You can switch to Thickmode if you are unable to use Thinmode."
]
},
{
"cell_type": "code",
"execution_count": 45,
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Connection successful!\n"
]
}
],
"outputs": [],
"source": [
"import sys\n",
"\n",
"import oracledb\n",
"\n",
"# please update with your username, password, hostname and service_name\n",
"username = \"\"\n",
"# please update with your username, password, and database connection string\n",
"Oracle accommodates a variety of embedding providers, enabling users to choose between proprietary database solutions and third-party services such as OCIGENAI and HuggingFace. This selection dictates the methodology for generating and managing embeddings.\n",
"Oracle accommodates a variety of embedding providers, enabling you to choose between proprietary database solutions and third-party services such as Oracle Generative AI Service and HuggingFace. This selection dictates the methodology for generating and managing embeddings.\n",
"\n",
"***Important*** : Should users opt for the database option, they must upload an ONNX model into the Oracle Database. Conversely, if a third-party provider is selected for embedding generation, uploading an ONNX model to Oracle Database is not required.\n",
"***Important*** : Should you opt for the database option, you must upload an ONNX model into the Oracle Database. Conversely, if a third-party provider is selected for embedding generation, uploading an ONNX model to Oracle Database is not required.\n",
"\n",
"A significant advantage of utilizing an ONNX model directly within Oracle is the enhanced security and performance it offers by eliminating the need to transmit data to external parties. Additionally, this method avoids the latency typically associated with network or REST API calls.\n",
"A significant advantage of utilizing an ONNX model directly within Oracle Database is the enhanced security and performance it offers by eliminating the need to transmit data to external parties. Additionally, this method avoids the latency typically associated with network or REST API calls.\n",
"\n",
"Below is the example code to upload an ONNX model into Oracle Database:"
"Users have the flexibility to load documents from either the Oracle Database, a file system, or both, by appropriately configuring the loader parameters. For comprehensive details on these parameters, please consult the [Oracle AI Vector Search Guide](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-73397E89-92FB-48ED-94BB-1AD960C4EA1F).\n",
"You have the flexibility to load documents from either the Oracle Database, a file system, or both, by appropriately configuring the loader parameters. For comprehensive details on these parameters, please consult the [Oracle AI Vector Search Guide](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-73397E89-92FB-48ED-94BB-1AD960C4EA1F).\n",
"\n",
"A significant advantage of utilizing OracleDocLoader is its capability to process over 150 distinct file formats, eliminating the need for multiple loaders for different document types. For a complete list of the supported formats, please refer to the [Oracle Text Supported Document Formats](https://docs.oracle.com/en/database/oracle/oracle-database/23/ccref/oracle-text-supported-document-formats.html).\n",
"\n",
"Below is a sample code snippet that demonstrates how to use OracleDocLoader"
"Below is a sample code snippet that demonstrates how to use OracleDocLoader:"
"Now that the user loaded the documents, they may want to generate a summary for each document. The Oracle AI Vector Search Langchain library offers a suite of APIs designed for document summarization. It supports multiple summarization providers such as Database, OCIGENAI, HuggingFace, among others, allowing users to select the provider that best meets their needs. To utilize these capabilities, users must configure the summary parameters as specified. For detailed information on these parameters, please consult the [Oracle AI Vector Search Guide book](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-EC9DDB58-6A15-4B36-BA66-ECBA20D2CE57)."
"Now that you have loaded the documents, you may want to generate a summary for each document. The Oracle AI Vector Search LangChain library offers a suite of APIs designed for document summarization. It supports multiple summarization providers such as Database, Oracle Generative AI Service, HuggingFace, among others, allowing you to select the provider that best meets your needs. To utilize these capabilities, you must configure the summary parameters as specified. For detailed information on these parameters, please consult the [Oracle AI Vector Search Guide book](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-EC9DDB58-6A15-4B36-BA66-ECBA20D2CE57)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***Note:*** The users may need to set proxy if they want to use some 3rd party summary generation providers other than Oracle's in-house and default provider: 'database'. If you don't have proxy, please remove the proxy parameter when you instantiate the OracleSummary."
"***Note:*** You may need to set a proxy if you want to use third-party summary generation providers other than 'database', Oracle's in-house default provider. If you don't have a proxy, remove the proxy parameter when you instantiate OracleSummary."
]
},
{
"cell_type": "code",
"execution_count": 22,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# proxy to be used when we instantiate summary and embedder object\n",
"# proxy to be used when we instantiate summary and embedder objects\n",
"proxy = \"\""
]
},
@@ -433,24 +347,16 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The following sample code will show how to generate summary:"
"The following sample code shows how to generate a summary:"
"Now that the documents are chunked as per requirements, the users may want to generate embeddings for these chunks. Oracle AI Vector Search provides multiple methods for generating embeddings, utilizing either locally hosted ONNX models or third-party APIs. For comprehensive instructions on configuring these alternatives, please refer to the [Oracle AI Vector Search Guide](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-C6439E94-4E86-4ECD-954E-4B73D53579DE)."
"Now that the documents are chunked as per requirements, you may want to generate embeddings for these chunks. Oracle AI Vector Search provides multiple methods for generating embeddings, utilizing either locally hosted ONNX models or third-party APIs. For comprehensive instructions on configuring these alternatives, please refer to the [Oracle AI Vector Search Guide](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-C6439E94-4E86-4ECD-954E-4B73D53579DE)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***Note:*** Users may need to configure a proxy to utilize third-party embedding generation providers, excluding the 'database' provider that utilizes an ONNX model."
"***Note:*** You may need to configure a proxy to utilize third-party embedding generation providers, excluding the 'database' provider that utilizes an ONNX model."
]
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -547,24 +445,16 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The following sample code will show how to generate embeddings:"
"The following sample code shows how to generate embeddings:"
"Now that you know how to use Oracle AI Langchain library APIs individually to process the documents, let us show how to integrate with Oracle AI Vector Store to facilitate the semantic searches."
"Now that you know how to use Oracle AI LangChain library APIs individually to process the documents, let us show how to integrate with Oracle AI Vector Store to facilitate the semantic searches."
"This example demonstrates the creation of a default HNSW index on embeddings within the 'oravs' table. Users may adjust various parameters according to their specific needs. For detailed information on these parameters, please consult the [Oracle AI Vector Search Guide book](https://docs.oracle.com/en/database/oracle/oracle-database/23/vecse/manage-different-categories-vector-indexes.html).\n",
"This example demonstrates the creation of a default HNSW index on embeddings within the 'oravs' table. You may adjust various parameters according to your specific needs. For detailed information on these parameters, please consult the [Oracle AI Vector Search Guide book](https://docs.oracle.com/en/database/oracle/oracle-database/23/vecse/manage-different-categories-vector-indexes.html).\n",
"\n",
"Additionally, various types of vector indices can be created to meet diverse requirements. More details can be found in our [comprehensive guide](https://python.langchain.com/v0.1/docs/integrations/vectorstores/oracle/).\n"
]
@@ -805,44 +667,31 @@
"## Perform Semantic Search\n",
"All set!\n",
"\n",
"We have successfully processed the documents and stored them in the vector store, followed by the creation of an index to enhance query performance. We are now prepared to proceed with semantic searches.\n",
"You have successfully processed the documents and stored them in the vector store, followed by the creation of an index to enhance query performance. You are now prepared to proceed with semantic searches.\n",
"\n",
"Below is the sample code for this process:"
]
},
{
"cell_type": "code",
"execution_count": 58,
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[Document(page_content='The database stores LOBs differently from other data types. Creating a LOB column implicitly creates a LOB segment and a LOB index. The tablespace containing the LOB segment and LOB index, which are always stored together, may be different from the tablespace containing the table. Sometimes the database can store small amounts of LOB data in the table itself rather than in a separate LOB segment.', metadata={'_oid': '662f2f257677f3c2311a8ff999fd34e5', '_rowid': 'AAAR/xAAEAAAAAnAAC', 'id': '662f2f257677f3c2311a8ff999fd34e5$3$1', 'document_id': '3', 'document_summary': 'Sometimes the database can store small amounts of LOB data in the table itself rather than in a separate LOB segment.\\n\\n'})]\n",
"[]\n",
"[(Document(page_content='The database stores LOBs differently from other data types. Creating a LOB column implicitly creates a LOB segment and a LOB index. The tablespace containing the LOB segment and LOB index, which are always stored together, may be different from the tablespace containing the table. Sometimes the database can store small amounts of LOB data in the table itself rather than in a separate LOB segment.', metadata={'_oid': '662f2f257677f3c2311a8ff999fd34e5', '_rowid': 'AAAR/xAAEAAAAAnAAC', 'id': '662f2f257677f3c2311a8ff999fd34e5$3$1', 'document_id': '3', 'document_summary': 'Sometimes the database can store small amounts of LOB data in the table itself rather than in a separate LOB segment.\\n\\n'}), 0.055675752460956573)]\n",
"[]\n",
"[Document(page_content='If the answer to any preceding questions is yes, then the database stops the search and allocates space from the specified tablespace; otherwise, space is allocated from the database default shared temporary tablespace.', metadata={'_oid': '662f2f253acf96b33b430b88699490a2', '_rowid': 'AAAR/xAAEAAAAAnAAA', 'id': '662f2f253acf96b33b430b88699490a2$1$1', 'document_id': '1', 'document_summary': 'If the answer to any preceding questions is yes, then the database stops the search and allocates space from the specified tablespace; otherwise, space is allocated from the database default shared temporary tablespace.\\n\\n'})]\n",
"[Document(page_content='If the answer to any preceding questions is yes, then the database stops the search and allocates space from the specified tablespace; otherwise, space is allocated from the database default shared temporary tablespace.', metadata={'_oid': '662f2f253acf96b33b430b88699490a2', '_rowid': 'AAAR/xAAEAAAAAnAAA', 'id': '662f2f253acf96b33b430b88699490a2$1$1', 'document_id': '1', 'document_summary': 'If the answer to any preceding questions is yes, then the database stops the search and allocates space from the specified tablespace; otherwise, space is allocated from the database default shared temporary tablespace.\\n\\n'})]\n"
"# RAG Pipeline with MLflow Tracking, Tracing & Evaluation\n",
"\n",
"This notebook demonstrates how to build a complete Retrieval-Augmented Generation (RAG) pipeline using LangChain and integrate it with MLflow for experiment tracking, tracing, and evaluation.\n",
"\n",
"\n",
"- **RAG Pipeline Construction**: Build a complete RAG system using LangChain components\n",
"- **MLflow Integration**: Track experiments, parameters, and artifacts\n",
" \"system_prompt\": \"You are a helpful assistant. Use the following context to answer the question. Use three sentences maximum and keep the answer concise.\",\n",
" \"llm\": \"gpt-5-nano\",\n",
" \"temperature\": 0,\n",
"}"
]
},
{
"cell_type": "markdown",
"id": "8a2985f1",
"metadata": {},
"source": [
"#### ArXiv Document Loading and Processing"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "1f32aa36",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'Published': '2023-08-02', 'Title': 'Attention Is All You Need', 'Authors': 'Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin', 'Summary': 'The dominant sequence transduction models are based on complex recurrent or\\nconvolutional neural networks in an encoder-decoder configuration. The best\\nperforming models also connect the encoder and decoder through an attention\\nmechanism. We propose a new simple network architecture, the Transformer, based\\nsolely on attention mechanisms, dispensing with recurrence and convolutions\\nentirely. Experiments on two machine translation tasks show these models to be\\nsuperior in quality while being more parallelizable and requiring significantly\\nless time to train. Our model achieves 28.4 BLEU on the WMT 2014\\nEnglish-to-German translation task, improving over the existing best results,\\nincluding ensembles by over 2 BLEU. On the WMT 2014 English-to-French\\ntranslation task, our model establishes a new single-model state-of-the-art\\nBLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction\\nof the training costs of the best models from the literature. We show that the\\nTransformer generalizes well to other tasks by applying it successfully to\\nEnglish constituency parsing both with large and limited training data.'}\n"
]
}
],
"source": [
"# Load documents from ArXiv\n",
"loader = ArxivLoader(\n",
" query=\"1706.03762\",\n",
" load_max_docs=1,\n",
")\n",
"docs = loader.load()\n",
"print(docs[0].metadata)\n",
"\n",
"# Split documents into chunks\n",
"splitter = RecursiveCharacterTextSplitter(\n",
" chunk_size=CONFIG[\"chunk_size\"],\n",
" chunk_overlap=CONFIG[\"chunk_overlap\"],\n",
")\n",
"chunks = splitter.split_documents(docs)\n",
"\n",
"\n",
"# Join chunks into a single string\n",
"def join_chunks(chunks):\n",
" return \"\\n\\n\".join([chunk.page_content for chunk in chunks])"
"Create a prediction function decorated with `@mlflow.trace` to automatically log:\n",
"- Input queries\n",
"- Retrieved documents\n",
"- Generated responses\n",
"- Execution time\n",
"- Chain intermediate steps"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "7b45fc04",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Question: What is the main idea of the paper?\n",
"Response: The main idea is to replace recurrent/convolutional sequence models with a pure attention-based architecture called the Transformer. It uses self-attention to model dependencies between all positions in the input and output, enabling full parallelization and better handling of long-range relations. This approach achieves strong results on translation and can extend to other modalities.\n"
]
}
],
"source": [
"@mlflow.trace\n",
"def predict_fn(question: str) -> str:\n",
" return rag_chain.invoke(question)\n",
"\n",
"\n",
"# Test the prediction function\n",
"sample_question = \"What is the main idea of the paper?\"\n",
"response = predict_fn(sample_question)\n",
"print(f\"Question: {sample_question}\")\n",
"print(f\"Response: {response}\")"
]
},
{
"cell_type": "markdown",
"id": "421469de",
"metadata": {},
"source": [
"#### Evaluation Dataset and Scoring\n",
"\n",
"Define an evaluation dataset and run systematic evaluation using [MLflow's built-in scorers](https://mlflow.org/docs/latest/genai/eval-monitor/scorers/llm-judge/predefined/#available-scorers):\n",
"\n",
"<u>Evaluation Components:</u>\n",
"- **Dataset**: Questions with expected concepts and facts\n",
"- **Scorers**: \n",
" - `RelevanceToQuery`: Measures how relevant the response is to the question\n",
" - `Correctness`: Evaluates factual accuracy of the response\n",
" - `ExpectationsGuidelines`: Checks that output matches expectation guidelines\n",
"\n",
"<u>Best Practices:</u>\n",
"- Create diverse test cases covering different query types\n",
"- Include expected concepts to guide evaluation\n",
"- Use multiple scoring metrics for comprehensive assessment"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "5c1dc4f2",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2025/08/23 20:14:39 INFO mlflow.models.evaluation.utils.trace: Auto tracing is temporarily enabled during the model evaluation for computing some metrics and debugging. To disable tracing, call `mlflow.autolog(disable=True)`.\n",
"2025/08/23 20:14:39 INFO mlflow.genai.utils.data_validation: Testing model prediction with the first sample in the dataset.\n"
_DEFAULT_TEMPLATE = """Given an input question, first create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer. Unless the user specifies in his question a specific number of examples he wishes to obtain, always limit your query to at most {top_k} results. You can order the results by a relevant column to return the most interesting examples in the database.
Never query for all the columns from a specific table, only ask for a the few relevant columns given the question.
Never query for all the columns from a specific table, only ask for a few relevant columns given the question.
Pay attention to use only the column names that you can see in the schema description. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.
" AIMessage(content='The result of \\\\(3 + 5^{2.743}\\\\) is approximately 300.04, and the result of \\\\(17.24 - 918.1241\\\\) is approximately -900.88.', response_metadata={'token_usage': {'completion_tokens': 44, 'prompt_tokens': 251, 'total_tokens': 295}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': 'fp_b28b39ffa8', 'finish_reason': 'stop', 'logprobs': None}, id='run-d1161669-ed09-4b18-94bd-6d8530df5aa8-0')]}"
"{'messages': [HumanMessage(content=\"what's 3 plus 5 raised to the 2.743. also what's 17.24 - 918.1241\", additional_kwargs={}, response_metadata={}),\n",
"{'messages': [HumanMessage(content=\"what's 3 plus 5 raised to the 2.743. also what's 17.24 - 918.1241\", additional_kwargs={}, response_metadata={}),\n",
" AIMessage(content=[{'text': \"I'll solve these calculations for you.\\n\\nFor the first part, I need to calculate 3 plus 5 raised to the power of 2.743.\\n\\nLet me break this down:\\n1) First, I'll calculate 5 raised to the power of 2.743\\n2) Then add 3 to the result\", 'type': 'text'}, {'id': 'toolu_01L1mXysBQtpPLQ2AZTaCGmE', 'input': {'x': 5, 'y': 2.743}, 'name': 'exponentiate', 'type': 'tool_use'}], additional_kwargs={}, response_metadata={'id': 'msg_01HCbDmuzdg9ATMyKbnecbEE', 'model': 'claude-3-7-sonnet-20250219', 'stop_reason': 'tool_use', 'stop_sequence': None, 'usage': {'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 563, 'output_tokens': 146, 'server_tool_use': None, 'service_tier': 'standard'}, 'model_name': 'claude-3-7-sonnet-20250219'}, id='run--9f6469fb-bcbb-4c1c-9eec-79f6979c38e6-0', tool_calls=[{'name': 'exponentiate', 'args': {'x': 5, 'y': 2.743}, 'id': 'toolu_01L1mXysBQtpPLQ2AZTaCGmE', 'type': 'tool_call'}], usage_metadata={'input_tokens': 563, 'output_tokens': 146, 'total_tokens': 709, 'input_token_details': {'cache_read': 0, 'cache_creation': 0}}),\n",
"/data1/cwlacewe/apps/cwlacewe_langchain/.langchain-venv/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
" from .autonotebook import tqdm as notebook_tqdm\n"
"\t\tThere are 2 shoppers in this video. Shopper 1 is wearing a plaid shirt and a spectacle. Shopper 2 who is not completely captured in the frame seems to wear a black shirt and is moving away with his back turned towards the camera. There is a shelf towards the right of the camera frame. Shopper 2 is hanging an item back to a hanger and then quickly walks away in a similar fashion as shopper 2. Contents of the nearer side of the shelf with respect to camera seems to be camping lanterns and cleansing agents, arranged at the top. In the middle part of the shelf, various tools including grommets, a pocket saw, candles, and other helpful camping items can be observed. Midway through the shelf contains items which appear to be steel containers and items made up of plastic with red, green, orange, and yellow colors, while those at the bottom are packed in cardboard boxes. Contents at the farther part of the shelf are well stocked and organized but are not glaringly visible.\n",
"WARNING:accelerate.big_modeling:Some parameters are on the meta device because they were offloaded to the cpu.\n"
]
}
],
"source": [
@@ -555,7 +558,7 @@
"\t\tA single shopper is seen in this video standing facing the shelf and in the bottom part of the frame. He's wearing a light-colored shirt and a spectacle. The shopper is carrying a red colored basket in his left hand. The entire basket is not clearly visible, but it does seem to contain something in a blue colored package which the shopper has just placed in the basket given his right hand was seen inside the basket. Then the shopper leans towards the shelf and checks out an item in orange package. He picks this single item with his right hand and proceeds to place the item in the basket. The entire shelf looks well stocked except for the top part of the shelf which is empty. The shopper has not picked any item from this part of the shelf. The rest of the shelf looks well stocked and does not need any restocking. The contents on the farther part of the shelf consists of items, majority of which are packed in black, yellow, and green packages. No other details are visible of these items.\n",
"User : Find a man holding a red shopping basket\n",
"Assistant : Most relevant retrieved video is **clip9.mp4** \n",
"\n",
"I see a person standing in front of a well-stocked shelf, they are wearing a light-colored shirt and glasses, and they have a red shopping basket in their left hand. They are leaning forward and picking up an item from the shelf with their right hand. The item is packaged in a blue-green box. Based on the scene description, I can confirm that the person is indeed holding a red shopping basket.</s>\n"
"I see a person standing in front of a well-stocked shelf, they are wearing a light-colored shirt and glasses, and they have a red shopping basket in their left hand. They are leaning forward and picking up an item from the shelf with their right hand. The item is packaged in a blue-green box. Based on the available information, I cannot confirm whether the basket is empty or contains items. However, the rest of the\n"
For more information on contributing to our documentation, see the [Documentation Contributing Guide](https://python.langchain.com/docs/contributing/how_to/documentation)
For more information on contributing to our documentation, see the [Documentation Contributing Guide](https://python.langchain.com/docs/contributing/how_to/documentation).
## Structure
The primary documentation is located in the `docs/` directory. This directory contains
both the source files for the main documentation as well as the API reference doc
build process.
### API Reference
API reference documentation is located in `docs/api_reference/` and is generated from
the codebase using Sphinx.
The API reference has additional build steps that differ from the main documentation.
#### Deployment Process
Currently, the build process roughly follows these steps:
1. Using the `api_doc_build.yml` GitHub workflow, the API reference docs are
[built](#build-technical-details) and copied to the `langchain-api-docs-html`
repository. This workflow is triggered either (1) on a cron schedule or (2)
manually.
In short, the workflow extracts all `langchain-ai`-org-owned repos defined in
`langchain/libs/packages.yml`, clones them locally (in the workflow runner's file
system), and then builds the API reference RST files (using `create_api_rst.py`).
Following post-processing, the HTML files are pushed to the
`langchain-api-docs-html` repository.
2. After the HTML files are in the `langchain-api-docs-html` repository, they are **not**
automatically published to the [live docs site](https://python.langchain.com/api_reference/).
The docs site is served by Vercel. The Vercel deployment process copies the HTML
files from the `langchain-api-docs-html` repository and deploys them to the live
site. Deployments are triggered on each new commit pushed to `master`.
#### Build Technical Details
The build process creates a virtual monorepo by syncing multiple repositories, then generates comprehensive API documentation:
1. **Repository Sync Phase:**
   - `.github/scripts/prep_api_docs_build.py` - Clones external partner repos and organizes them into the `libs/partners/` structure to create a virtual monorepo for documentation building
2. **RST Generation Phase:**
   - `docs/api_reference/create_api_rst.py` - Main script that **generates RST files** from Python source code
   - Scans `libs/` directories and extracts classes/functions from each module (using `inspect`)
   - Creates `.rst` files using specialized templates for different object types
   - Templates in `docs/api_reference/templates/` (`pydantic.rst`, `runnable_pydantic.rst`, etc.)
3. **HTML Build Phase:**
   - Sphinx-based, uses `sphinx.ext.autodoc` (auto-extracts docstrings from the codebase)
   - `docs/api_reference/conf.py` (Sphinx config) configures `autodoc` and other extensions
   - `sphinx-build` processes the generated `.rst` files into HTML using autodoc
   - `docs/api_reference/scripts/custom_formatter.py` - Post-processes the generated HTML
   - Copies `reference.html` to `index.html` to create the default landing page (artifact? might not need to do this - just put everything in index directly?)
4. **Deployment:**
   - `.github/workflows/api_doc_build.yml` - Workflow responsible for orchestrating the entire build and deployment process
   - Built HTML files are committed and pushed to the `langchain-api-docs-html` repository
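To give a feel for the `inspect`-based scan in the RST generation phase, here is a minimal sketch. It is not the code in `create_api_rst.py`; the helper name `list_public_members` and the filtering rules are assumptions made for illustration.

```python
import inspect
import json.decoder
import types


def list_public_members(module: types.ModuleType) -> dict[str, list[str]]:
    """Collect the public classes and functions defined in ``module``.

    Hypothetical helper: the real script may filter and group differently.
    """
    classes: list[str] = []
    functions: list[str] = []
    # inspect.getmembers returns (name, value) pairs sorted by name.
    for name, obj in inspect.getmembers(module):
        if name.startswith("_"):
            continue  # skip private and dunder names
        if getattr(obj, "__module__", None) != module.__name__:
            continue  # skip objects that were only imported into the module
        if inspect.isclass(obj):
            classes.append(name)
        elif inspect.isfunction(obj):
            functions.append(name)
    return {"classes": classes, "functions": functions}


# Usage: inspect a standard-library module.
members = list_public_members(json.decoder)
print(members["classes"])
```

Each collected name could then be rendered into an `.rst` stub from a template chosen by object type, which is roughly what the template bullets above describe.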
#### Local Build
For local development and testing of API documentation, use the Makefile targets in the repository root:
```bash
# Full build
make api_docs_build
```
Like the CI process, this target:
- Installs the CLI package in editable mode
- Generates RST files for all packages using `create_api_rst.py`
- Builds HTML documentation with Sphinx
- Post-processes the HTML with `custom_formatter.py`
- Opens the built documentation (`reference.html`) in your browser
**Quick Preview:**
```bash
make api_docs_quick_preview API_PKG=openai
```
- Generates RST files for only the specified package (default: `text-splitters`)
- Builds and post-processes HTML documentation
- Opens the preview in your browser
Both targets automatically clean previous builds and handle the complete build pipeline locally, mirroring the CI process but for faster iteration during development.
#### Documentation Standards
**Docstring Format:**
The API reference uses **Google-style docstrings** with reStructuredText markup. Sphinx processes these through the `sphinx.ext.napoleon` extension to generate documentation.
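As an illustration of the format, a Google-style docstring might look like the sketch below. The function `chunk_text` is a made-up example, not part of the codebase; the `Args`, `Returns`, and `Raises` sections are the ones `sphinx.ext.napoleon` recognizes.

```python
def chunk_text(text: str, chunk_size: int = 100) -> list[str]:
    """Split text into fixed-size chunks.

    Args:
        text: The text to split.
        chunk_size: Maximum number of characters per chunk.

    Returns:
        The chunks, in order. The last chunk may be shorter than
        ``chunk_size``.

    Raises:
        ValueError: If ``chunk_size`` is not positive.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Step through the text in strides of chunk_size characters.
    return [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]
```

Inline markup inside the docstring, such as the double-backtick literal above, is reStructuredText rather than Markdown.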
Some files were not shown because too many files have changed in this diff