This pull request includes updates to the
`libs/community/langchain_community/callbacks/bedrock_anthropic_callback.py`
file to add a new model version to the list of supported models.
Updates to supported models:
* Added support for the `anthropic.claude-3-7-sonnet-20250219-v1:0`
model with a rate of `0.003` for 1000 input tokens.
* Added support for the `anthropic.claude-3-7-sonnet-20250219-v1:0`
model with a rate of `0.015` for 1000 output tokens.
AWS Bedrock pricing reference : https://aws.amazon.com/bedrock/pricing
## PyMuPDF4LLM integration to LangChain for PDF content extraction in
Markdown format
### Description
[PyMuPDF4LLM](https://github.com/pymupdf/RAG) makes it easier to extract
PDF content in Markdown format, needed for LLM & RAG applications.
(License: GNU Affero General Public License v3.0)
[langchain-pymupdf4llm](https://github.com/lakinduboteju/langchain-pymupdf4llm)
integrates PyMuPDF4LLM to LangChain as a Document Loader.
(License: MIT License)
This pull request introduces the integration of
[PyMuPDF4LLM](https://pymupdf.readthedocs.io/en/latest/pymupdf4llm) into
the LangChain project as an integration package:
[`langchain-pymupdf4llm`](https://github.com/lakinduboteju/langchain-pymupdf4llm).
The most important changes include adding new Jupyter notebooks to
document the integration and updating the package configuration file to
include the new package.
### Documentation:
* `docs/docs/integrations/providers/pymupdf4llm.ipynb`: Added a new
Jupyter notebook to document the integration of `PyMuPDF4LLM` with
LangChain, including installation instructions and class imports.
* `docs/docs/integrations/document_loaders/pymupdf4llm.ipynb`: Added a
new Jupyter notebook to document the usage of `langchain-pymupdf4llm` as
a LangChain integration package in detail.
### Package registration:
* `libs/packages.yml`: Updated the package configuration file to include
the `langchain-pymupdf4llm` package.
### Additional information
* Related to: https://github.com/langchain-ai/langchain/pull/29848
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:** Same changes as #26593 but for FileCallbackHandler
- **Issue:** Fixes#29941
- **Dependencies:** None
- **Twitter handle:** None
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
See https://docs.astral.sh/ruff/rules/#flake8-type-checking-tc
Some fixes done for TC001,TC002 and TC003 but these rules are excluded
since they don't play well with Pydantic.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**Issue**: This trigger can only be used by the first table created.
Cannot create additional triggers for other tables.
**fixed**: Update the trigger name so that it can be used for new
tables.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**Description:**
Tavily search results returned from API include useful information like
title, score and (optionally) raw_content that is missed in wrapper
although it's documented there properly. Add this data to the result
structure.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Resolves https://github.com/langchain-ai/langchain/issues/29951
Was able to reproduce the issue with Anthropic installing from pydantic
`main` and correct it with the fix recommended in the issue.
Thanks very much @Viicos for finding the bug and the detailed writeup!
Resolves https://github.com/langchain-ai/langchain/issues/29003,
https://github.com/langchain-ai/langchain/issues/27264
Related: https://github.com/langchain-ai/langchain-redis/issues/52
```python
from langchain.chat_models import init_chat_model
from langchain.globals import set_llm_cache
from langchain_community.cache import SQLiteCache
from pydantic import BaseModel
cache = SQLiteCache()
set_llm_cache(cache)
class Temperature(BaseModel):
value: int
city: str
llm = init_chat_model("openai:gpt-4o-mini")
structured_llm = llm.with_structured_output(Temperature)
```
```python
# 681 ms
response = structured_llm.invoke("What is the average temperature of Rome in May?")
```
```python
# 6.98 ms
response = structured_llm.invoke("What is the average temperature of Rome in May?")
```
Some o-series models will raise a 400 error for `"role": "system"`
(`o1-mini` and `o1-preview` will raise, `o1` and `o3-mini` will not).
Here we update `ChatOpenAI` to update the role to `"developer"` for all
model names matching `^o\d`.
We only make this change on the ChatOpenAI class (not BaseChatOpenAI).
For Context please check #29626
The Deepseek is using langchain_openai. The error happens that it show
`json decode error`.
I added a handler for this to give a more sensible error message which
is DeepSeek API returned empty/invalid json.
Reproducing the issue is a bit challenging as it is inconsistent,
sometimes DeepSeek returns valid data and in other times it returns
invalid data which triggers the JSON Decode Error.
This PR is an exception handling, but not an ultimate fix for the issue.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**Description:** As commented on the commit
[41b6a86](41b6a86bbe)
it introduced a bug for when we do an embedding request and the model
returns a non-nested list. Typically it's the case for model
**_nomic-embed-text_**.
- I added the unit test, and ran `make format`, `make lint` and `make
test` from the `community` package.
- No new dependency.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- [x] **PR title**: docs: (community) update ChatLiteLLM
- [x] **PR message**:
- **Description:** updated description of model_kwargs parameter which
was wrongly describing for temperature.
- **Issue:** #29862
- **Dependencies:** N/A
- [x] **Add tests and docs**: N/A
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
See https://docs.astral.sh/ruff/rules/#flake8-annotations-ann
The interest compared to only mypy is that ruff is very fast at
detecting missing annotations.
ANN101 and ANN102 are deprecated so we ignore them
ANN401 (no Any type) ignored to be in sync with mypy config
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
## Which area of LangChain is being modified?
- This PR adds a new "Permit" integration to the `docs/integrations/`
folder.
- Introduces two new Tools (`LangchainJWTValidationTool` and
`LangchainPermissionsCheckTool`)
- Introduces two new Retrievers (`PermitSelfQueryRetriever` and
`PermitEnsembleRetriever`)
- Adds demo scripts in `examples/` showcasing usage.
## Description of Changes
- Created `langchain_permit/tools.py` for JWT validation and permission
checks with Permit.
- Created `langchain_permit/retrievers.py` for custom Permit-based
retrievers.
- Added documentation in `docs/integrations/providers/permit.ipynb` (or
`.mdx`) to explain setup, usage, and examples.
- Provided sample scripts in `examples/demo_scripts/` to illustrate
usage of these tools and retrievers.
- Ensured all code is linted and tested locally.
Thank you again for reviewing!
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:**
Since mlx_lm 0.20, all calls to mlx crash due to deprecation of the way
parameters are passed to methods generate and generate_step.
Parameters top_p, temp, repetition_penalty and repetition_context_size
are not passed directly to those method anymore but wrapped into
"sampler" and "logit_processor".
- **Dependencies:** mlx_lm (optional)
- **Tests:**
I've had a new test to existing test file:
tests/integration_tests/llms/test_mlx_pipeline.py
---------
Co-authored-by: Jean-Philippe Dournel <jp@insightkeeper.io>
# community: Fix AttributeError in RankLLMRerank (`list` object has no
attribute `candidates`)
## **Description**
This PR fixes an issue in `RankLLMRerank` where reranking fails with the
following error:
```
AttributeError: 'list' object has no attribute 'candidates'
```
The issue arises because `rerank_batch()` returns a `List[Result]`
instead of an object containing `.candidates`.
### **Changes Introduced**
- Adjusted `compress_documents()` to support both:
- Old API format: `rerank_results.candidates`
- New API format: `rerank_results` as a list
- Also fix wrong .txt location parsing while I was at it.
---
## **Issue**
Fixes **AttributeError** in `RankLLMRerank` when using
`compression_retriever.invoke()`. The issue is observed when
`rerank_batch()` returns a list instead of an object with `.candidates`.
**Relevant log:**
```
AttributeError: 'list' object has no attribute 'candidates'
```
## **Dependencies**
- No additional dependencies introduced.
---
## **Checklist**
- [x] **Backward compatible** with previous API versions
- [x] **Tested** locally with different RankLLM models
- [x] **No new dependencies introduced**
- [x] **Linted** with `make format && make lint`
- [x] **Ready for review**
---
## **Testing**
- Ran `compression_retriever.invoke(query)`
## **Reviewers**
If no review within a few days, please **@mention** one of:
- @baskaryan
- @efriis
- @eyurtsev
- @ccurme
- @vbarda
- @hwchase17
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
- Example: "community: add foobar LLM"
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
This PR adds a new cognee integration, knowledge graph based retrieval
enabling developers to ingest documents into cognee’s knowledge graph,
process them, and then retrieve context via CogneeRetriever.
It includes:
- langchain_cognee package with a CogneeRetriever class
- a test for the integration, demonstrating how to create, process, and
retrieve with cognee
- an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
Followed additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
Thank you for the review!
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**Description:** Two small changes have been proposed here:
(1)
Previous code assumes that every issue has a priority field. If an issue
lacks this field, the code will raise a KeyError.
Now, the code checks if priority exists before accessing it. If priority
is missing, it assigns None instead of crashing. This prevents runtime
errors when processing issues without a priority.
(2)
Also If the "style" field is missing, the code throws a KeyError.
`.get("style", None)` safely retrieves the value if present.
**Issue:** #29875
**Dependencies:** N/A
Thank you for contributing to LangChain!
- [ ] **Handled query records properly**: "community:
vectorstores/kinetica"
- [ ] **Bugfix for empty query results handling**:
- **Description:** checked for the number of records returned by a query
before processing further
- **Issue:** resulted in an `AttributeError` earlier which has now been
fixed
@efriis
This PR adds documentation for the Azure AI package in Langchain to the
main mono-repo
No issue connected or updated dependencies.
Utilises existing tests and makes updates to the docs
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**Description:** Update docstring for `reasoning_effort` argument to
specify that it applies to reasoning models only (e.g., OpenAI o1 and
o3-mini), clarifying its supported models.
**Issue:** None
**Dependencies:** None
Adds a `attachment_filter_func` parameter to the ConfluenceLoader class
which can be used to determine which files are indexed. This is useful
if you are interested in excluding files based on their media type or
other metadata.
https://docs.x.ai/docs/guides/structured-outputs
Interface appears identical to OpenAI's.
```python
from langchain.chat_models import init_chat_model
from pydantic import BaseModel
class Joke(BaseModel):
setup: str
punchline: str
llm = init_chat_model("xai:grok-2").with_structured_output(
Joke, method="json_schema"
)
llm.invoke("Tell me a joke about cats.")
```
- **Description:** add deprecation warning when using weaviate from
langchain_community
- **Issue:** NA
- **Dependencies:** NA
- **Twitter handle:** NA
---------
Signed-off-by: hsm207 <hsm207@users.noreply.github.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Add `model` properties for OpenAIWhisperParser. Defaulted to `whisper-1`
(previous value).
Please help me update the docs and other related components of this
repo.
**Description:**
This PR adds a Jupyter notebook that explains the features,
installation, and usage of the
[`langchain-salesforce`](https://github.com/colesmcintosh/langchain-salesforce)
package. The notebook includes:
- Setup instructions for configuring Salesforce credentials
- Example code demonstrating common operations such as querying,
describing objects, creating, updating, and deleting records
**Issue:**
N/A
**Dependencies:**
No new dependencies are required.
**Tests and Docs:**
- Added an example notebook demonstrating the usage of the
`langchain-salesforce` package, located in `docs/docs/integrations`.
**Lint and Test:**
- Ran `make format`, `make lint`, and `make test` successfully.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Thank you for contributing to LangChain!
- [X] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
- Example: "community: add foobar LLM"
- [x] **PR message**:
This PR adds top_k as a param to the Needle Retriever. By default we use
top 10.
- [X] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [X] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
- ** Description**: I have added a new operator in the operator map with
key `$in` and value `IN`, so that you can define filters using lists as
values. This was already contemplated but as IN operator was not in the
map they cannot be used.
- **Issue**: Fixes#29804.
- **Dependencies**: No extra.
This PR adds documentation for the `langchain-discord-shikenso`
integration, including an example notebook at
`docs/docs/integrations/tools/discord.ipynb` and updates to
`libs/packages.yml` to track the new package.
**Issue:**
N/A
**Dependencies:**
None
**Twitter handle:**
N/A
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- [ ] **PR title**: langchain_community: add image support to
DuckDuckGoSearchAPIWrapper
- **Description:** This PR enhances the DuckDuckGoSearchAPIWrapper
within the langchain_community package by introducing support for image
searches. The enhancement includes:
- Adding a new method _ddgs_images to handle image search queries.
- Updating the run and results methods to process and return image
search results appropriately.
- Modifying the source parameter to accept "images" as a valid option,
alongside "text" and "news".
- **Dependencies:** No additional dependencies are required for this
change.
- **Description:** Add the new introduction about checking `store` in
in_memory.py, It’s necessary and useful for beginners.
```python
Check Documents:
.. code-block:: python
for doc in vector_store.store.values():
print(doc)
```
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**Description:** Fixed and updated Apify integration documentation to
use the new [langchain-apify](https://github.com/apify/langchain-apify)
package.
**Twitter handle:** @apify
- **Description:** Small fix in `add_texts` to make embedding
nullability is checked properly.
- **Issue:** #29765
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
This fix ensures that the chunk size is correctly determined when
processing text embeddings. Previously, the code did not properly handle
cases where chunk_size was None, potentially leading to incorrect
chunking behavior.
Now, chunk_size_ is explicitly set to either the provided chunk_size or
the default self.chunk_size, ensuring consistent chunking. This update
improves reliability when processing large text inputs in batches and
prevents unintended behavior when chunk_size is not specified.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Add the documentation for the community package `langchain-abso`. It
provides a new Chat Model class, that uses https://abso.ai
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
- **Description:** Adding Structured Support for ChatPerplexity
- **Issue:** #29357
- This is implemented as per the Perplexity official docs:
https://docs.perplexity.ai/guides/structured-outputs
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
**Description:** Updated init_chat_model to support Granite models
deployed on IBM WatsonX
**Dependencies:**
[langchain-ibm](https://github.com/langchain-ai/langchain-ibm)
Tagging @baskaryan @efriis for review when you get a chance.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
- Example: "community: add foobar LLM"
community: langchain_community/vectorstore/oraclevs.py
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** Refactored code to allow a connection or a connection
pool.
- **Issue:** Normally an idel connection is terminated by the server
side listener at timeout. A user thus has to re-instantiate the vector
store. The timeout in case of connection is not configurable. The
solution is to use a connection pool where a user can specify a user
defined timeout and the connections are managed by the pool.
- **Dependencies:** None
- **Twitter handle:**
- [ ] **Add tests and docs**: This is not a new integration. A user can
pass either a connection or a connection pool. The determination of what
is passed is made at run time. Everything should work as before.
- [ ] **Lint and test**: Already done.
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
1. Make `_convert_chunk_to_generation_chunk` an instance method on
BaseChatOpenAI
2. Override on ChatDeepSeek to add `"reasoning_content"` to message
additional_kwargs.
Resolves https://github.com/langchain-ai/langchain/issues/29513
**Description:**
According to the [wikidata
documentation](https://www.wikidata.org/wiki/Wikidata_talk:REST_API),
Wikibase REST API version 1 (stable) is released from November 11, 2024.
Their guide is to use the new v1 API and, it just requires replacing v0
in the routes with v1 in almost all cases.
So I replaced WIKIDATA_REST_API_URL from v0 to v1 for stable usage.
Co-authored-by: ccurme <chester.curme@gmail.com>
**issue**
In Langchain, the original content is generally stored under the `text`
key. However, the `PineconeHybridSearchRetriever` searches the `context`
field in the metadata and cannot change this key. To address this, I
have modified the code to allow changing the key to something other than
context.
In my opinion, following Langchain's conventions, the `text` key seems
more appropriate than `context`. However, since I wasn't sure about the
author's intent, I have left the default value as `context`.
- Description: Adding getattr methods and set default value 500 to
cls.bulk_size, it can prevent the error below:
Error: type object 'OpenSearchVectorSearch' has no attribute 'bulk_size'
- Issue: https://github.com/langchain-ai/langchain/issues/29071
Description:
This PR fixes handling of null action_input in
[langchain.agents.output_parser]. Previously, passing null to
action_input could cause OutputParserException with unclear error
message which cause LLM don't know how to modify the action. The changes
include:
Added null-check validation before processing action_input
Implemented proper fallback behavior with default values
Maintained backward compatibility with existing implementations
Error Examples:
```
{
"action":"some action",
"action_input":null
}
```
Issue:
None
Dependencies:
None
This is one part of a larger Pull Request (PR) that is too large to be
submitted all at once. This specific part focuses on updating the
PyPDFium2 parser.
For more details, see
https://github.com/langchain-ai/langchain/pull/28970.
Same as https://github.com/langchain-ai/langchain/pull/29043 for the
langchain package.
**Dependencies:**
- blockbuster (test)
**Twitter handle:** cbornet_
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:** Add tests for respecting max_concurrency and
implement it for abatch_as_completed so that test passes
- **Issue:** #29425
- **Dependencies:** none
- **Twitter handle:** keenanpepper
Description:
The change allows you to use the overloaded `+` operator correctly when
`+`ing two BaseMessageChunk subclasses. Without this you *must*
instantiate a subclass for it to work.
Which feels... wrong. Base classes should be decoupled from sub classes
and should have in no way a dependency on them.
Issue:
You can't `+` a BaseMessageChunk with a BaseMessageChunk
e.g. this will explode
```py
from langchain_core.outputs import (
ChatGenerationChunk,
)
from langchain_core.messages import BaseMessageChunk
chunk1 = ChatGenerationChunk(
message=BaseMessageChunk(
type="customChunk",
content="HI",
),
)
chunk2 = ChatGenerationChunk(
message=BaseMessageChunk(
type="customChunk",
content="HI",
),
)
# this will throw
new_chunk = chunk1 + chunk2
```
In case anyone ran into this issue themselves, it's probably best to use
the AIMessageChunk:
a la
```py
from langchain_core.outputs import (
ChatGenerationChunk,
)
from langchain_core.messages import AIMessageChunk
chunk1 = ChatGenerationChunk(
message=AIMessageChunk(
content="HI",
),
)
chunk2 = ChatGenerationChunk(
message=AIMessageChunk(
content="HI",
),
)
# No explosion!
new_chunk = chunk1 + chunk2
```
Dependencies:
None!
Twitter handle:
`aaron_vogler`
Keeping these for later if need be:
```
baskaryan
efriis
eyurtsev
ccurme
vbarda
hwchase17
baskaryan
efriis
```
Co-authored-by: Erick Friis <erick@langchain.dev>
- This pull request includes various changes to add a `user_agent`
parameter to Azure OpenAI, Azure Search and Whisper in the Community and
Partner packages. This helps in identifying the source of API requests
so we can better track usage and help support the community better. I
will also be adding the user_agent to the new `langchain-azure` repo as
well.
- No issue connected or updated dependencies.
- Utilises existing tests and docs
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
ONNX and OpenVINO models are available by specifying the `backend`
argument (the model is loaded using `optimum`
https://github.com/huggingface/optimum)
```python
from langchain_huggingface import HuggingFaceEmbeddings
embedding = HuggingFaceEmbeddings(
model_name=model_id,
model_kwargs={"backend": "onnx"},
)
```
With this PR we also enable the IPEX backend
```python
from langchain_huggingface import HuggingFaceEmbeddings
embedding = HuggingFaceEmbeddings(
model_name=model_id,
model_kwargs={"backend": "ipex"},
)
```
**Description**
Currently, when parsing a partial JSON, if a string ends with the escape
character, the whole key/value is removed. For example:
```
>>> from langchain_core.utils.json import parse_partial_json
>>> my_str = '{"foo": "bar", "baz": "qux\\'
>>>
>>> parse_partial_json(my_str)
{'foo': 'bar'}
```
My expectation (and with this fix) would be for `parse_partial_json()`
to return:
```
>>> from langchain_core.utils.json import parse_partial_json
>>>
>>> my_str = '{"foo": "bar", "baz": "qux\\'
>>> parse_partial_json(my_str)
{'foo': 'bar', 'baz': 'qux'}
```
Notes:
1. It could be argued that current behavior is still desired.
2. I have experienced this issue when the streaming output from an LLM
and the chunk happens to end with `\\`
3. I haven't included tests. Will do if change is accepted.
4. This is specially troublesome when this function is used by
187131c55c/libs/core/langchain_core/output_parsers/transform.py (L111)
since what happens is that, for example, if the received sequence of
chunks are: `{"foo": "b` , `ar\\` :
Then, the result of calling `self.parse_result()` is:
```
{"foo": "b"}
```
and the second time:
```
{}
```
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:** Before sending a completion chunk at the end of an
OpenAI stream, removing the tool_calls as those have already been sent
as chunks.
- **Issue:** -
- **Dependencies:** -
- **Twitter handle:** -
@ccurme as mentioned in another PR
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Added `similarity_search_with_score_by_vector()` function to the
`QdrantVectorStore` class.
It is required when we want to query multiple time with the same
embeddings. It was present in the now deprecated original `Qdrant`
vectorstore implementation, but was absent from the new one. It is also
implemented in a number of others `VectorStore` implementations
I have added tests for this new function
Note that I also argued in this discussion that it should be part of the
general `VectorStore`
https://github.com/langchain-ai/langchain/discussions/29638
Co-authored-by: Erick Friis <erick@langchain.dev>
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
I made a change to how was implemented the support for GPU in
`FastEmbedEmbeddings` to be more consistent with the existing
implementation `langchain-qdrant` sparse embeddings implementation
It is directly enabling to provide the list of ONNX execution providers:
https://github.com/langchain-ai/langchain/blob/master/libs/partners/qdrant/langchain_qdrant/fastembed_sparse.py#L15
It is a bit less clear to a user that just wants to enable GPU, but
gives more capabilities to work with other execution providers that are
not the `CUDAExecutionProvider`, and is more future proof
Sorry for the disturbance @ccurme
> Nice to see you just moved to `uv`! It is so much nicer to run
format/lint/test! No need to manually rerun the `poetry install` with
all required extras now
These are set in Github workflows, but forgot to add them to most
makefiles for convenience when developing locally.
`uv run` will automatically sync the lock file. Because many of our
development dependencies are local installs, it will pick up version
changes and update the lock file. Passing `--frozen` or setting this
environment variable disables the behavior.
Motivation: dedicated structured output features are becoming more
common, such that integrations can support structured output without
supporting tool calling.
Here we make two changes:
1. Update the `has_structured_output` method to default to True if a
model supports tool calling (in addition to defaulting to True if
`with_structured_output` is overridden).
2. Update structured output tests to engage if `has_structured_output`
is True.
Deep Lake recently released version 4, which introduces significant
architectural changes, including a new on-disk storage format, enhanced
indexing mechanisms, and improved concurrency. However, LangChain's
vector store integration currently does not support Deep Lake v4 due to
breaking API changes.
Previously, the installation command was:
`pip install deeplake[enterprise]`
This installs the latest available version, which now defaults to Deep
Lake v4. Since LangChain's vector store integration is still dependent
on v3, this can lead to compatibility issues when using Deep Lake as a
vector database within LangChain.
To ensure compatibility, the installation command has been updated to:
`pip install deeplake[enterprise]<4.0.0`
This constraint ensures that pip installs the latest available version
of Deep Lake within the v3 series while avoiding the incompatible v4
update.
- **Description:** add a `gpu: bool = False` field to the
`FastEmbedEmbeddings` class which enables to use GPU (through ONNX CUDA
provider) when generating embeddings with any fastembed model. It just
requires the user to install a different dependency and we use a
different provider when instantiating `fastembed.TextEmbedding`
- **Issue:** when generating embeddings for a really large amount of
documents this drastically increase performance (honestly that is a must
have in some situations, you can't just use CPU it is way too slow)
- **Dependencies:** no direct change to dependencies, but internally the
users will need to install `fastembed-gpu` instead of `fastembed`, I
made all the changes to the init function to properly let the user know
which dependency they should install depending on if they enabled `gpu`
or not
cf. fastembed docs about GPU for more details:
https://qdrant.github.io/fastembed/examples/FastEmbed_GPU/
I did not added test because it would require access to a GPU in the
testing environment
### PR Title:
**community: add latest OpenAI models pricing**
### Description:
This PR updates the OpenAI model cost calculation mapping by adding the
latest OpenAI models, **o1 (non-preview)** and **o3-mini**, based on the
pricing listed on the [OpenAI pricing
page](https://platform.openai.com/docs/pricing).
### Changes:
- Added pricing for `o1`, `o1-2024-12-17`, `o1-cached`, and
`o1-2024-12-17-cached` for input tokens.
- Added pricing for `o1-completion` and `o1-2024-12-17-completion` for
output tokens.
- Added pricing for `o3-mini`, `o3-mini-2025-01-31`, `o3-mini-cached`,
and `o3-mini-2025-01-31-cached` for input tokens.
- Added pricing for `o3-mini-completion` and
`o3-mini-2025-01-31-completion` for output tokens.
### Issue:
N/A
### Dependencies:
None
### Testing & Validation:
- No functional changes outside of updating the cost mapping.
- No tests were added or modified.
**Description:**
The response from `tool.invoke()` is always a ToolMessage, with content
and artifact fields, not a tuple.
The tuple is converted to a ToolMessage here
b6ae7ca91d/libs/core/langchain_core/tools/base.py (L726)
**Issue:**
Currently `ToolsIntegrationTests` requires `invoke()` to return a tuple
and so standard tests fail for "content_and_artifact" tools. This fixes
that to check the returned ToolMessage.
This PR also adds a test that now passes.
Description: Fixes PreFilter value handling in Azure Cosmos DB NoSQL
vectorstore. The current implementation fails to handle numeric values
in filter conditions, causing an undefined value variable error. This PR
adds support for numeric, boolean, and NULL values while maintaining the
existing string and list handling.
Changes:
Added handling for numeric types (int/float)
Added boolean value support
Added NULL value handling
Added type validation for unsupported values
Fixed scope of value variable initialization
Issue:
Fixes#29610
Implementation Notes:
No changes to public API
Backwards compatible
Maintains consistent behavior with existing MongoDB-style filtering
Preserves SQL injection prevention through proper value handling
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
This is one part of a larger Pull Request (PR) that is too large to be
submitted all at once. This specific part focuses on updating the XXX
parser.
For more details, see [PR
28970](https://github.com/langchain-ai/langchain/pull/28970).
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Failing with:
> ValueError: Provider page not found for databricks-langchain. Please
add one at docs/integrations/providers/databricks-langchain.{mdx,ipynb}
**PR title**: "community: Option to pass auth_file_location for
oci_generative_ai"
**Description:** Option to pass auth_file_location, to overwrite config
file default location "~/.oci/config" where profile name configs
present. This is not fixing any issues. Just added optional parameter
called "auth_file_location", which internally supported by any OCI
client including GenerativeAiInferenceClient.
- **Description:** Add to check pad_token_id and eos_token_id of model
config. It seems that this is the same bug as the HuggingFace TGI bug.
It's same bug as #29434
- **Issue:** #29431
- **Dependencies:** none
- **Twitter handle:** tell14
Example code is followings:
```python
from langchain_huggingface.llms import HuggingFacePipeline
hf = HuggingFacePipeline.from_model_id(
model_id="meta-llama/Llama-3.2-3B-Instruct",
task="text-generation",
pipeline_kwargs={"max_new_tokens": 10},
)
from langchain_core.prompts import PromptTemplate
template = """Question: {question}
Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)
chain = prompt | hf
question = "What is electroencephalography?"
print(chain.invoke({"question": question}))
```
## Description:
This PR addresses issue #29429 by fixing the _wrap_query method in
langchain_community/graphs/age_graph.py. The method now correctly
handles Cypher queries with UNION and EXCEPT operators, ensuring that
the fields in the SQL query are ordered as they appear in the Cypher
query. Additionally, the method now properly handles cases where RETURN
* is not supported.
### Issue: #29429
### Dependencies: None
### Add tests and docs:
Added unit tests in tests/unit_tests/graphs/test_age_graph.py to
validate the changes.
No new integrations were added, so no example notebook is necessary.
Lint and test:
Ran make format, make lint, and make test to ensure code quality and
functionality.
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
- Example: "community: add foobar LLM"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
This PR uses the [blockbuster](https://github.com/cbornet/blockbuster)
library in langchain-core to detect blocking calls made in the asyncio
event loop during unit tests.
Avoiding blocking calls is hard as these can be deeply buried in the
code or made in 3rd party libraries.
Blockbuster makes it easier to detect them by raising an exception when
a call is made to a known blocking function (eg: `time.sleep`).
Adding blockbuster allowed to find a blocking call in
`aconfig_with_context` (it ends up calling `get_function_nonlocals`
which loads function code).
**Dependencies:**
- blockbuster (test)
**Twitter handle:** cbornet_
This is one part of a larger Pull Request (PR) that is too large to be
submitted all at once.
This specific part focuses on updating the PyPDF parser.
For more details, see [PR
28970](https://github.com/langchain-ai/langchain/pull/28970).
*Description:**
Updates the YahooFinanceNewsTool to handle the current yfinance news
data structure. The tool was failing with a KeyError due to changes in
the yfinance API's response format. This PR updates the code to
correctly extract news URLs from the new structure.
**Issue:** #29495
**Dependencies:**
No new dependencies required. Works with existing yfinance package.
The changes maintain backwards compatibility while fixing the KeyError
that users were experiencing.
The modified code properly handles the new data structure where:
- News type is now at `content.contentType`
- News URL is now at `content.canonicalUrl.url`
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
This pull request addresses an issue with import statements in the
langchain_core/retrievers.py file. The following changes have been made:
Corrected the import for Document from langchain_core.documents.base.
Corrected the import for BaseRetriever from langchain_core.retrievers.
These changes ensure that the SimpleRetriever class can correctly
reference the Document and BaseRetriever classes, improving code
reliability and maintainability.
---------
Co-authored-by: Matheus Torquato <mtorquat@jaguarlandrover.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Description
This PR adds support for MongoDB-style $in operator filtering in the
Supabase vectorstore implementation. Currently, filtering with $in
operators returns no results, even when matching documents exist. This
change properly translates MongoDB-style filters to PostgreSQL syntax,
enabling efficient multi-document filtering.
Changes
Modified similarity_search_by_vector_with_relevance_scores to handle
MongoDB-style $in operators
Added automatic conversion of $in filters to PostgreSQL IN clauses
Preserved original vector type handling and numpy array conversion
Maintained compatibility with existing postgrest filters
Added support for the same filtering in
similarity_search_by_vector_returning_embeddings
Issue
Closes#27932
Implementation Notes
No changes to public API or function signatures
Backwards compatible - behavior unchanged for non-$in filters
More efficient than multiple individual queries for multi-ID searches
Preserves all existing functionality including numpy array conversion
for vector types
Dependencies
None
Additional Notes
The implementation handles proper SQL escaping for filter values
Maintains consistent behavior with other vectorstore implementations
that support MongoDB-style operators
Future extensions could support additional MongoDB-style operators ($gt,
$lt, etc.)
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
allow any credential type in AzureAIDocumentInteligence, not only
`api_key`.
This allows to use any of the credentials types integrated with AD.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Description: Fix TypeError in AzureSearch similarity_search_with_score
by removing search_type from kwargs before passing to underlying
requests.
This resolves issue #29407 where search_type was being incorrectly
passed through to Session.request().
Issue: #29407
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
- Example: "community: add foobar LLM"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
- **Description:** Add to check pad_token_id and eos_token_id of model
config. It seems that this is the same bug as the HuggingFace TGI bug.
In addition, the source code of
libs/partners/huggingface/langchain_huggingface/llms/huggingface_pipeline.py
also requires similar changes.
- **Issue:** #29431
- **Dependencies:** none
- **Twitter handle:** tell14