Thank you for contributing to LangChain!
**Description:** Added the model parameters to be passed in the OpenAI
Assistant. Enabled it at the `OpenAIAssistantV2Runnable` class.
**Issue:** NA
**Dependencies:** None
**Twitter handle:** luizf0992
Thank you for contributing to LangChain!
- **Description:** Add token_usage and model_name metadata to
ChatZhipuAI stream() and astream() response
- **Issue:** None
- **Dependencies:** None
- **Twitter handle:** None
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Co-authored-by: jianfehuang <jianfehuang@tencent.com>
**Description:**
- Add the `lora_request` parameter to the VLLM class to support LoRA
model configurations. This enhancement allows users to specify LoRA
requests directly when using VLLM, enabling more flexible and efficient
model customization.
**Issue:**
- No existing issue for `lora_adapter` in VLLM. This PR addresses the
need for configuring LoRA requests within the VLLM framework.
- Reference : [Using LoRA Adapters in
vLLM](https://docs.vllm.ai/en/stable/models/lora.html#using-lora-adapters)
**Example Code :**
Before this change, the `lora_request` parameter was not applied
correctly:
```python
ADAPTER_PATH = "/path/of/lora_adapter"
llm = VLLM(model="Bllossom/llama-3.2-Korean-Bllossom-3B",
max_new_tokens=512,
top_k=2,
top_p=0.90,
temperature=0.1,
vllm_kwargs={
"gpu_memory_utilization":0.5,
"enable_lora":True,
"max_model_len":1024,
}
)
print(llm.invoke(
["...prompt_content..."],
lora_request=LoRARequest("lora_adapter", 1, ADAPTER_PATH)
))
```
**Before Change Output:**
```bash
response was not applied lora_request
```
So, I attempted to apply the lora_adapter to
langchain_community.llms.vllm.VLLM.
**current output:**
```bash
response applied lora_request
```
**Dependencies:**
- None
**Lint and test:**
- All tests and lint checks have passed.
---------
Co-authored-by: Um Changyong <changyong.um@sfa.co.kr>
**Description:**
The `aiohttp.ClientSession` is closed at the end of the with statement,
which causes an error during a second call.
The implemented fix is to define the session directly within the with
block, exactly like in the textembed code:
c6350d636e/libs/community/langchain_community/embeddings/textembed.py (L335-L346)
**Issue:** Fix#26932
Co-authored-by: ccurme <chester.curme@gmail.com>
Reopened as a personal repo outside the organization.
## Description
- Naver HyperCLOVA X community package
- Add chat model & embeddings
- Add unit test & integration test
- Add chat model & embeddings docs
- I changed partner
package(https://github.com/langchain-ai/langchain/pull/24252) to
community package on this PR
- Could this
embeddings(https://github.com/langchain-ai/langchain/pull/21890) be
deprecated? We are trying to replace it with embedding
model(**ClovaXEmbeddings**) in this PR.
Twitter handle: None. (if needed, contact with
joonha.jeon@navercorp.com)
---
you can check our previous discussion below:
> one question on namespaces - would it make sense to have these in
.clova namespaces instead of .naver?
I would like to keep it as is, unless it is essential to unify the
package name.
(ClovaX is a branding for the model, and I plan to add other models and
components. They need to be managed as separate classes.)
> also, could you clarify the difference between ClovaEmbeddings and
ClovaXEmbeddings?
There are 3 models that are being serviced by embedding, and all are
supported in the current PR. In addition, all the functionality of CLOVA
Studio that serves actual models, such as distinguishing between test
apps and service apps, is supported. The existing PR does not support
this content because it is hard-coded.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Vadym Barda <vadym@langchain.dev>
**Issue:** : https://github.com/langchain-ai/langchain/issues/22961
**Description:**
Previously, the documentation for `DuckDuckGoSearchResults` said that it
returns a JSON string, however the code returns a regular string that
can't be parsed as is.
for example running
```python
from langchain_community.tools import DuckDuckGoSearchResults
# Create a DuckDuckGo search instance
search = DuckDuckGoSearchResults()
# Invoke the search
result = search.invoke("Obama")
# Print the result
print(result)
# Print the type of the result
print("Result Type:", type(result))
```
will return
```
snippet: Harris will hold a campaign event with former President Barack Obama in Georgia next Thursday, the first time the pair has campaigned side by side, a senior campaign official said. A week from ..., title: Obamas to hit the campaign trail in first joint appearances with Harris, link: https://www.nbcnews.com/politics/2024-election/obamas-hit-campaign-trail-first-joint-appearances-harris-rcna176034, snippet: Item 1 of 3 Former U.S. first lady Michelle Obama and her husband, former U.S. President Barack Obama, stand on stage during Day 2 of the Democratic National Convention (DNC) in Chicago, Illinois ..., title: Obamas set to hit campaign trail with Kamala Harris for first time, link: https://www.reuters.com/world/us/obamas-set-hit-campaign-trail-with-kamala-harris-first-time-2024-10-18/, snippet: Barack and Michelle Obama will make their first campaign appearances alongside Kamala Harris at rallies in Georgia and Michigan. By Reid J. Epstein Reporting from Ashwaubenon, Wis. Here come the ..., title: Harris Will Join Michelle Obama and Barack Obama on Campaign Trail, link: https://www.nytimes.com/2024/10/18/us/politics/kamala-harris-michelle-obama-barack-obama.html, snippet: Obama's leaving office was "a turning point," Mirsky said. "That was the last time anybody felt normal." A few feet over, a 64-year-old physics professor named Eric Swanson who had grown ..., title: Obama's reemergence on the campaign trail for Harris comes as he ..., link: https://www.cnn.com/2024/10/13/politics/obama-campaign-trail-harris-biden/index.html
Result Type: <class 'str'>
```
After the change in this PR, `DuckDuckGoSearchResults` takes an
additional `output_format = "list" | "json" | "string"` ("string" =
current behavior, default). For example, invoking
`DuckDuckGoSearchResults(output_format="list")` return a list of
dictionaries in the format
```
[{'snippet': '...', 'title': '...', 'link': '...'}, ...]
```
e.g.
```
[{'snippet': "Obama has in a sense been wrestling with Trump's impact since the real estate magnate broke onto the political stage in 2015. Trump's victory the next year, defeating Obama's secretary of ...", 'title': "Obama's fears about Trump drive his stepped-up campaigning", 'link': 'https://www.washingtonpost.com/politics/2024/10/18/obama-trump-anxiety-harris-campaign/'}, {'snippet': 'Harris will hold a campaign event with former President Barack Obama in Georgia next Thursday, the first time the pair has campaigned side by side, a senior campaign official said. A week from ...', 'title': 'Obamas to hit the campaign trail in first joint appearances with Harris', 'link': 'https://www.nbcnews.com/politics/2024-election/obamas-hit-campaign-trail-first-joint-appearances-harris-rcna176034'}, {'snippet': 'Item 1 of 3 Former U.S. first lady Michelle Obama and her husband, former U.S. President Barack Obama, stand on stage during Day 2 of the Democratic National Convention (DNC) in Chicago, Illinois ...', 'title': 'Obamas set to hit campaign trail with Kamala Harris for first time', 'link': 'https://www.reuters.com/world/us/obamas-set-hit-campaign-trail-with-kamala-harris-first-time-2024-10-18/'}, {'snippet': 'Barack and Michelle Obama will make their first campaign appearances alongside Kamala Harris at rallies in Georgia and Michigan. By Reid J. Epstein Reporting from Ashwaubenon, Wis. Here come the ...', 'title': 'Harris Will Join Michelle Obama and Barack Obama on Campaign Trail', 'link': 'https://www.nytimes.com/2024/10/18/us/politics/kamala-harris-michelle-obama-barack-obama.html'}]
Result Type: <class 'list'>
```
---------
Co-authored-by: vbarda <vadym@langchain.dev>
This PR introduces a new `azure_ad_async_token_provider` attribute to
the `AzureOpenAI` and `AzureChatOpenAI` classes in `partners/openai` and
`community` packages, given it's currently supported on `openai` package
as
[AsyncAzureADTokenProvider](https://github.com/openai/openai-python/blob/main/src/openai/lib/azure.py#L33)
type.
The reason for creating a new attribute is to avoid breaking changes.
Let's say you have an existing code that uses a `AzureOpenAI` or
`AzureChatOpenAI` instance to perform both sync and async operations.
The `azure_ad_token_provider` will work exactly as it is today, while
`azure_ad_async_token_provider` will override it for async requests.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
**Description:**
This PR updates `CassandraGraphVectorStore` to be based off
`CassandraVectorStore`, instead of using a custom CQL implementation.
This allows users using a `CassandraVectorStore` to upgrade to a
`GraphVectorStore` without having to change their database schema or
re-embed documents.
This PR also updates the documentation of the `GraphVectorStore` base
class and contains native async implementations for the standard graph
methods: `traversal_search` and `mmr_traversal_search` in
`CassandraVectorStore`.
**Issue:** No issue number.
**Dependencies:** https://github.com/langchain-ai/langchain/pull/27078
(already-merged)
**Lint and test**:
- Lint and tests all pass, including existing
`CassandraGraphVectorStore` tests.
- Also added numerous additional tests based of the tests in
`langchain-astradb` which cover many more scenarios than the existing
tests for `Cassandra` and `CassandraGraphVectorStore`
** BREAKING CHANGE**
Note that this is a breaking change for existing users of
`CassandraGraphVectorStore`. They will need to wipe their database table
and restart.
However:
- The interfaces have not changed. Just the underlying storage
mechanism.
- Any one using `langchain_community.vectorstores.Cassandra` can instead
use `langchain_community.graph_vectorstores.CassandraGraphVectorStore`
and they will gain Graph capabilities without having to re-embed their
existing documents. This is the primary goal of this PR.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Thank you for contributing to LangChain!
- [ X] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "templates:
..." for template changes, "infra: ..." for CI changes.
- Example: "community: add foobar LLM"
- [ X]
- **Issue:** issue #26941
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Co-authored-by: Erick Friis <erick@langchain.dev>
**Description**
This PR introduces the proxies parameter to the RecursiveUrlLoader
class, allowing the user to specify proxy servers for requests. This
update enables crawling through proxy servers, providing enhanced
flexibility for network configurations.
The key changes include:
1.Added an optional proxies parameter to the constructor (__init__).
2.Updated the documentation to explain the proxies parameter usage with
an example.
3.Modified the _get_child_links_recursive method to pass the proxies
parameter to the requests.get function.
**Sample Usage**
```python
from bs4 import BeautifulSoup as Soup
from langchain_community.document_loaders.recursive_url_loader import RecursiveUrlLoader
proxies = {
"http": "http://localhost:1080",
"https": "http://localhost:1080",
}
url = "https://python.langchain.com/docs/concepts/#langchain-expression-language-lcel"
loader = RecursiveUrlLoader(
url=url, max_depth=1, extractor=lambda x: Soup(x, "html.parser").text,proxies=proxies
)
docs = loader.load()
```
---------
Co-authored-by: root <root@thb>
We have released the
[langchain-databricks](https://github.com/langchain-ai/langchain-databricks)
package for Databricks integration. This PR deprecates the legacy
classes within `langchain-community`.
---------
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
**Description**:
This PR add support of clob/blob data type for oracle document loader,
clob/blob can only be read by oracledb package when connection is open,
so reformat code to process data before connection closes.
**Dependencies**:
oracledb package same as before. pip install oracledb
Co-authored-by: Erick Friis <erick@langchain.dev>
**Description:**
- This pull request addresses a bug in Langchain's VLLM integration,
where the use_beam_search parameter was erroneously passed to
SamplingParams. The SamplingParams class in vLLM does not support the
use_beam_search argument, which caused a TypeError.
- This PR introduces logic to filter out unsupported parameters,
ensuring that only valid parameters are passed to SamplingParams. As a
result, the integration now functions as expected without errors.
- The bug was reproduced by running the code sample from Langchain’s
documentation, which triggered the error due to the invalid parameter.
This fix resolves that error by implementing proper parameter filtering.
**VLLM Sampling Params Class:**
https://github.com/vllm-project/vllm/blob/main/vllm/sampling_params.py
**Issue:**
I could not found an Issue that belongs to this. Fixes "TypeError:
Unexpected keyword argument 'use_beam_search'" error when using VLLM
from Langchain.
**Dependencies:**
None.
**Tests and Documentation**:
Tests:
No new functionality was added, but I tested the changes by running
multiple prompts through the VLLM integration with various parameter
configurations. All tests passed successfully without breaking
compatibility.
Docs
No documentation changes were necessary as this is a bug fix.
**Reproducing the Error:**
https://python.langchain.com/docs/integrations/llms/vllm/
The code sample from the original documentation can be used to reproduce
the error I got.
from langchain_community.llms import VLLM
llm = VLLM(
model="mosaicml/mpt-7b",
trust_remote_code=True, # mandatory for hf models
max_new_tokens=128,
top_k=10,
top_p=0.95,
temperature=0.8,
)
print(llm.invoke("What is the capital of France ?"))

This PR resolves the issue by ensuring that only valid parameters are
passed to SamplingParams.
**Description**: PR fixes some formatting errors in deprecation message
in the `langchain_community.vectorstores.pgvector` module, where it was
missing spaces between a few words, and one word was misspelled.
**Issue**: n/a
**Dependencies**: n/a
Signed-off-by: mpeveler@timescale.com
Co-authored-by: Erick Friis <erick@langchain.dev>
PR message:
Description:
This PR refactors the Arxiv API wrapper by extracting the Arxiv search
logic into a helper function (_fetch_results) to reduce code duplication
and improve maintainability. The helper function is used in methods like
get_summaries_as_docs, run, and lazy_load, streamlining the code and
making it easier to maintain in the future.
Issue:
This is a minor refactor, so no specific issue is being fixed.
Dependencies:
No new dependencies are introduced with this change.
Add tests and docs:
No new integrations were added, so no additional tests or docs are
necessary for this PR.
Lint and test:
I have run make format, make lint, and make test to ensure all checks
pass successfully.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
This PR updates the integration with OCI data science model deployment
service.
- Update LLM to support streaming and async calls.
- Added chat model.
- Updated tests and docs.
- Updated `libs/community/scripts/check_pydantic.sh` since the use of
`@pre_init` is removed from existing integration.
- Updated `libs/community/extended_testing_deps.txt` as this integration
requires `langchain_openai`.
---------
Co-authored-by: MING KANG <ming.kang@oracle.com>
Co-authored-by: Dmitrii Cherkasov <dmitrii.cherkasov@oracle.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
This PR updates the Firecrawl Document Loader to use the recently
released V1 API of Firecrawl.
**Key Updates:**
**Firecrawl V1 Integration:** Updated the document loader to leverage
the new Firecrawl V1 API for improved performance, reliability, and
developer experience.
**Map Functionality Added:** Introduced the map mode for more flexible
document loading options.
These updates enhance the integration and provide access to the latest
features of Firecrawl.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "templates:
..." for template changes, "infra: ..." for CI changes.
- Example: "community: add foobar LLM"
Updated
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
twitter: @MaxHTran
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
Not needed due to small change
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Max Tran <maxtra@amazon.com>
Starting with Clickhouse version 24.8, a different type of configuration
has been introduced in the vectorized data ingestion, and if this
configuration occurs, an error occurs when generating the table. As can
be seen below:

---------
Co-authored-by: Erick Friis <erick@langchain.dev>
**Description**:
this PR enable VectorStore TLS and authentication (digest, basic) with
HTTP/2 for Infinispan server.
Based on httpx.
Added docker-compose facilities for testing
Added documentation
**Dependencies:**
requires `pip install httpx[http2]` if HTTP2 is needed
**Twitter handle:**
https://twitter.com/infinispan
**Description:** this PR adds a set of methods to deal with metadata
associated to the vector store entries. These, while essential to the
Graph-related extension of the `Cassandra` vector store, are also useful
in themselves. These are (all come in their sync+async versions):
- `[a]delete_by_metadata_filter`
- `[a]replace_metadata`
- `[a]get_by_document_id`
- `[a]metadata_search`
Additionally, a `[a]similarity_search_with_embedding_id_by_vector`
method is introduced to better serve the store's internal working (esp.
related to reranking logic).
**Issue:** no issue number, but now all Document's returned bear their
`.id` consistently (as a consequence of a slight refactoring in how the
raw entries read from DB are made back into `Document` instances).
**Dependencies:** (no new deps: packaging comes through langchain-core
already; `cassio` is now required to be version 0.1.10+)
**Add tests and docs**
Added integration tests for the relevant newly-introduced methods.
(Docs will be updated in a separate PR).
**Lint and test** Lint and (updated) test all pass.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Add timeout at client side for UCFunctionToolkit and add retry logic.
Users could specify environment variable
`UC_TOOL_CLIENT_EXECUTION_TIMEOUT` to increase the timeout value for
retrying to get the execution response if the status is pending. Default
timeout value is 120s.
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
Tested in Databricks:
<img width="1200" alt="image"
src="https://github.com/user-attachments/assets/54ab5dfc-5e57-4941-b7d9-bfe3f8ad3f62">
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Signed-off-by: serena-ruan <serena.rxy@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
- [ ] **PR title**: docs: fix typo in SQLStore import path
- [ ] **PR message**:
- **Description:** This PR corrects a typo in the docstrings for the
class SQLStore(BaseStore[str, bytes]). The import path in the docstring
currently reads from langchain_rag.storage import SQLStore, which should
be changed to langchain_community.storage import SQLStore. This typo is
also reflected in the official documentation.
- **Issue:** N/A
- **Dependencies:** None
- **Twitter handle:** N/A
Co-authored-by: Erick Friis <erick@langchain.dev>
Given the current erroring behavior, every time we've moved a kwarg from
model_kwargs and made it its own field that was a breaking change.
Updating this behavior to support the old instantiations /
serializations.
Assuming build_extra_kwargs was not something that itself is being used
externally and needs to be kept backwards compatible
These allow converting linked documents (such as those used with
GraphVectorStore) to networkx for rendering and/or in-memory graph
algorithms such as community detection.
**Description:** Moves callback to before yield for `_stream` and
`_astream` function for the textgen model in the community llm package
**Issue:** #16913
**Description**:
Adds a vector store integration with
[sqlite-vec](https://alexgarcia.xyz/sqlite-vec/), the successor to
sqlite-vss that is a single C file with no external dependencies.
Pretty straightforward, just copy-pasted the sqlite-vss integration and
made a few tweaks and added integration tests. Only question is whether
all documentation should be directed away from sqlite-vss if it is
defacto deprecated (cc @asg017).
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: philippe-oger <philippe.oger@adevinta.com>
**Description:** Moves yield to after callback for `_stream` and
`_astream` function for the gigachat model in the community llm package
**Issue:** #16913
- **Description:** This pull request addresses the validation error in
`SettingsConfigDict` due to extra fields in the `.env` file. The issue
is prevalent across multiple Langchain modules. This fix ensures that
extra fields in the `.env` file are ignored, preventing validation
errors.
**Changes include:**
- Applied fixes to modules using `SettingsConfigDict`.
- **Issue:** NA, similar
https://github.com/langchain-ai/langchain/issues/26850
- **Dependencies:** NA
- **Description:** The flag is named `anonymize_snippets`. When set to
true, the Pebblo server will anonymize snippets by redacting all
personally identifiable information (PII) from the snippets going into
VectorDB and the generated reports
- **Issue:** NA
- **Dependencies:** NA
- **docs**: Updated
**Description:** Moves yield to after callback for `_stream` and
`_astream` function for the deepsparse model in the community package
**Issue:** #16913
- **Description:** This PR fixes the response parsing logic for
`ChatDeepInfra`, more specifially `_convert_delta_to_message_chunk()`,
which is invoked when streaming via `ChatDeepInfra`.
- **Issue:** Streaming from DeepInfra via `ChatDeepInfra` is currently
broken because the response parsing logic doesn't handle that
`tool_calls` can be `None`. (There is no GitHub issue for this problem
yet.)
- **Dependencies:** –
- **Twitter handle:** –
Keeping this here as a reminder:
> If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
**Description:** Moves yield to after callback for
`_prepare_input_and_invoke_stream` and
`_aprepare_input_and_invoke_stream` for bedrock llm in community
package.
**Issue:** #16913
without this `model_config` importing this package produces warnings
about "model_name" having conflicts with protected namespace "model_".
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**Description:**
When PR body is empty `get_pull_request` method fails with bellow
exception.
**Issue:**
```
TypeError('expected string or buffer')Traceback (most recent call last):
File ".../.venv/lib/python3.9/site-packages/langchain_core/tools/base.py", line 661, in run
response = context.run(self._run, *tool_args, **tool_kwargs)
File ".../.venv/lib/python3.9/site-packages/langchain_community/tools/github/tool.py", line 52, in _run
return self.api_wrapper.run(self.mode, query)
File ".../.venv/lib/python3.9/site-packages/langchain_community/utilities/github.py", line 816, in run
return json.dumps(self.get_pull_request(int(query)))
File ".../.venv/lib/python3.9/site-packages/langchain_community/utilities/github.py", line 495, in get_pull_request
add_to_dict(response_dict, "body", pull.body)
File ".../.venv/lib/python3.9/site-packages/langchain_community/utilities/github.py", line 487, in add_to_dict
tokens = get_tokens(value)
File ".../.venv/lib/python3.9/site-packages/langchain_community/utilities/github.py", line 483, in get_tokens
return len(tiktoken.get_encoding("cl100k_base").encode(text))
File "....venv/lib/python3.9/site-packages/tiktoken/core.py", line 116, in encode
if match := _special_token_regex(disallowed_special).search(text):
TypeError: expected string or buffer
```
**Twitter:** __gorros__
Thank you for contributing to LangChain!
Fix error like
<img width="1167" alt="image"
src="https://github.com/user-attachments/assets/2e219b26-ec7e-48ef-8111-e0ff2f5ac4c0">
After the fix:
<img width="584" alt="image"
src="https://github.com/user-attachments/assets/48f36fe7-628c-48b6-81b2-7fe741e4ca85">
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Signed-off-by: serena-ruan <serena.rxy@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:** Added PebbloTextLoader for loading text in
PebbloSafeLoader.
- Since PebbloSafeLoader wraps document loaders, this new loader enables
direct loading of text into Documents using PebbloSafeLoader.
- **Issue:** NA
- **Dependencies:** NA
- [x] **Tests**: Added/Updated tests
# Description
[Vector store base
class](4cdaca67dc/libs/core/langchain_core/vectorstores/base.py (L65))
currently expects `ids` to be passed in and that is what it passes along
to the AzureSearch vector store when attempting to `add_texts()`.
However AzureSearch expects `keys` to be passed in. When they are not
present, AzureSearch `add_embeddings()` makes up new uuids. This is a
problem when trying to run indexing. [Indexing code
expects](b297af5482/libs/core/langchain_core/indexing/api.py (L371))
the documents to be uploaded using provided ids. Currently AzureSearch
ignores `ids` passed from `indexing` and makes up new ones. Later when
`indexer` attempts to delete removed file, it uses the `id` it had
stored when uploading the document, however it was uploaded under
different `id`.
**Twitter handle: @martintriska1**
Page content sometimes is empty when PyMuPDF can not find text on pages.
For example, this can happen when the text of the PDF is not copyable
"by hand". Then an OCR solution is need - which is not integrated here.
This warning should accurately warn the user that some pages are lost
during this process.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Fixes#26212: replaced the raw string with backslashes. Alternative:
raw-stringif the full docstring.
---------
Co-authored-by: Erick Friis <erickfriis@gmail.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
### Description:
This pull request significantly enhances the MongodbLoader class in the
LangChain community package by adding robust metadata customization and
improved field extraction capabilities. The updated class now allows
users to specify additional metadata fields through the metadata_names
parameter, enabling the extraction of both top-level and deeply nested
document attributes as metadata. This flexibility is crucial for users
who need to include detailed contextual information without altering the
database schema.
Moreover, the include_db_collection_in_metadata flag offers optional
inclusion of database and collection names in the metadata, allowing for
even greater customization depending on the user's needs.
The loader's field extraction logic has been refined to handle missing
or nested fields more gracefully. It now employs a safe access mechanism
that avoids the KeyError previously encountered when a specified nested
field was absent in a document. This update ensures that the loader can
handle diverse and complex data structures without failure, making it
more resilient and user-friendly.
### Issue:
This pull request addresses a critical issue where the MongodbLoader
class in the LangChain community package could throw a KeyError when
attempting to access nested fields that may not exist in some documents.
The previous implementation did not handle the absence of specified
nested fields gracefully, leading to runtime errors and interruptions in
data processing workflows.
This enhancement ensures robust error handling by safely accessing
nested document fields, using default values for missing data, thus
preventing KeyError and ensuring smoother operation across various data
structures in MongoDB. This improvement is crucial for users working
with diverse and complex data sets, ensuring the loader can adapt to
documents with varying structures without failing.
### Dependencies:
Requires motor for asynchronous MongoDB interaction.
### Twitter handle:
N/A
### Add tests and docs
Tests: Unit tests have been added to verify that the metadata inclusion
toggle works as expected and that the field extraction correctly handles
nested fields.
Docs: An example notebook demonstrating the use of the enhanced
MongodbLoader is included in the docs/docs/integrations directory. This
notebook includes setup instructions, example usage, and outputs.
(Here is the notebook link : [colab
link](https://colab.research.google.com/drive/1tp7nyUnzZa3dxEFF4Kc3KS7ACuNF6jzH?usp=sharing))
Lint and test
Before submitting, I ran make format, make lint, and make test as per
the contribution guidelines. All tests pass, and the code style adheres
to the LangChain standards.
```python
import unittest
from unittest.mock import patch, MagicMock
import asyncio
from langchain_community.document_loaders.mongodb import MongodbLoader
class TestMongodbLoader(unittest.TestCase):
def setUp(self):
"""Setup the MongodbLoader test environment by mocking the motor client
and database collection interactions."""
# Mocking the AsyncIOMotorClient
self.mock_client = MagicMock()
self.mock_db = MagicMock()
self.mock_collection = MagicMock()
self.mock_client.get_database.return_value = self.mock_db
self.mock_db.get_collection.return_value = self.mock_collection
# Initialize the MongodbLoader with test data
self.loader = MongodbLoader(
connection_string="mongodb://localhost:27017",
db_name="testdb",
collection_name="testcol"
)
@patch('langchain_community.document_loaders.mongodb.AsyncIOMotorClient', return_value=MagicMock())
def test_constructor(self, mock_motor_client):
"""Test if the constructor properly initializes with the correct database and collection names."""
loader = MongodbLoader(
connection_string="mongodb://localhost:27017",
db_name="testdb",
collection_name="testcol"
)
self.assertEqual(loader.db_name, "testdb")
self.assertEqual(loader.collection_name, "testcol")
def test_aload(self):
"""Test the aload method to ensure it correctly queries and processes documents."""
# Setup mock data and responses for the database operations
self.mock_collection.count_documents.return_value = asyncio.Future()
self.mock_collection.count_documents.return_value.set_result(1)
self.mock_collection.find.return_value = [
{"_id": "1", "content": "Test document content"}
]
# Run the aload method and check responses
loop = asyncio.get_event_loop()
results = loop.run_until_complete(self.loader.aload())
self.assertEqual(len(results), 1)
self.assertEqual(results[0].page_content, "Test document content")
def test_construct_projection(self):
"""Verify that the projection dictionary is constructed correctly based on field names."""
self.loader.field_names = ['content', 'author']
self.loader.metadata_names = ['timestamp']
expected_projection = {'content': 1, 'author': 1, 'timestamp': 1}
projection = self.loader._construct_projection()
self.assertEqual(projection, expected_projection)
if __name__ == '__main__':
unittest.main()
```
### Additional Example for Documentation
Sample Data:
```json
[
{
"_id": "1",
"title": "Artificial Intelligence in Medicine",
"content": "AI is transforming the medical industry by providing personalized medicine solutions.",
"author": {
"name": "John Doe",
"email": "john.doe@example.com"
},
"tags": ["AI", "Healthcare", "Innovation"]
},
{
"_id": "2",
"title": "Data Science in Sports",
"content": "Data science provides insights into player performance and strategic planning in sports.",
"author": {
"name": "Jane Smith",
"email": "jane.smith@example.com"
},
"tags": ["Data Science", "Sports", "Analytics"]
}
]
```
Example Code:
```python
loader = MongodbLoader(
connection_string="mongodb://localhost:27017",
db_name="example_db",
collection_name="articles",
filter_criteria={"tags": "AI"},
field_names=["title", "content"],
metadata_names=["author.name", "author.email"],
include_db_collection_in_metadata=True
)
documents = loader.load()
for doc in documents:
print("Page Content:", doc.page_content)
print("Metadata:", doc.metadata)
```
Expected Output:
```
Page Content: Artificial Intelligence in Medicine AI is transforming the medical industry by providing personalized medicine solutions.
Metadata: {'author_name': 'John Doe', 'author_email': 'john.doe@example.com', 'database': 'example_db', 'collection': 'articles'}
```
Thank you.
---
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
The object extends from
langchain_community.chat_models.openai.ChatOpenAI which doesn't have
`bind_tools` defined. I tried extending from
`langchain_openai.ChatOpenAI` in
https://github.com/langchain-ai/langchain/pull/25975 but that PR got
closed because this is not correct.
So adding our own `bind_tools` (which for now copying from ChatOpenAI is
good enough) will solve the tool calling issue we are having now.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Thank you for contributing to LangChain!
- [X ] **PR title**
- [X ] **PR message**:
**Description:** adds a handler for when delta choice is None
**Issue:** Fixes#25951
**Dependencies:** Not applicable
- [ X] **Add tests and docs**: Not applicable
- [X ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Co-authored-by: Grande <Tom.Daniel.Grande@statsbygg.no>
Co-authored-by: Erick Friis <erick@langchain.dev>
**Description:**
Change the default Neo4j username/password (when not supplied as
environment variable or in code) from `None` to `""`.
Neo4j has an option to [disable
auth](https://neo4j.com/docs/operations-manual/current/configuration/configuration-settings/#config_dbms.security.auth_enabled)
which is helpful when developing. When auth is disabled, the username /
password through the `neo4j` module should be `""` (ie an empty string).
Empty strings get marked as false in
`langchain_core.utils.env.get_from_dict_or_env` -- changing this code /
behaviour would have a wide impact and is undesirable.
In order to both _allow_ access to Neo4j with auth disabled and _not_
impact `langchain_core` this patch is presented. The downside would be
that if a user forgets to set NEO4J_USERNAME or NEO4J_PASSWORD they
would see an invalid credentials error rather than missing credentials
error. This could be mitigated but would result in a less elegant patch!
**Issue:**
Fix issue where langchain cannot communicate with Neo4j if Neo4j auth is
disabled.
- **PR title**: "community: add Jina Search tool"
- **Description:** Added the Jina Search tool for querying the Jina
search API. This includes the implementation of the JinaSearchAPIWrapper
and the JinaSearch tool, along with a Jupyter notebook example
demonstrating its usage.
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** [Twitter
handle](https://x.com/yashp3020?t=7wM0gQ7XjGciFoh9xaBtqA&s=09)
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Returns an array of results which is more specific and easier for later
use.
Tested locally:
```
resp = tool.invoke("what's the weather like in Shanghai?")
for item in resp:
print(item)
```
returns
```
{'snippet': '<b>Shanghai</b>, <b>Shanghai</b>, China <b>Weather</b> Forecast, with current conditions, wind, air quality, and what to expect for the next 3 days.', 'title': 'Shanghai, Shanghai, China Weather Forecast | AccuWeather', 'link': 'https://www.accuweather.com/en/cn/shanghai/106577/weather-forecast/106577'}
{'snippet': '5. 99 / 87 °F. 6. 99 / 86 °F. 7. Detailed forecast for 14 days. Need some help? Current <b>weather</b> <b>in Shanghai</b> and forecast for today, tomorrow, and next 14 days.', 'title': 'Weather for Shanghai, Shanghai Municipality, China - timeanddate.com', 'link': 'https://www.timeanddate.com/weather/china/shanghai'}
{'snippet': '<b>Shanghai</b> - <b>Weather</b> warnings issued 14-day forecast. <b>Weather</b> warnings issued. Forecast - <b>Shanghai</b>. Day by day forecast. Last updated Friday at 01:05. Tonight, ... Temperature feels <b>like</b> 34 ...', 'title': 'Shanghai - BBC Weather', 'link': 'https://www.bbc.com/weather/1796236'}
{'snippet': 'Current <b>weather</b> <b>in Shanghai</b>, <b>Shanghai</b>, China. Check current conditions <b>in Shanghai</b>, <b>Shanghai</b>, China with radar, hourly, and more.', 'title': 'Shanghai, Shanghai, China Current Weather | AccuWeather', 'link': 'https://www.accuweather.com/en/cn/shanghai/106577/current-weather/106577'}
13-Day Beijing, Xi'an, Chengdu, <b>Shanghai</b> Chinese Language and Culture Immersion Tour. <b>Shanghai</b> in September. Average daily temperature range: 23–29°C (73–84°F) Average rainy days: 10. Average sunny days: 20. September ushers in pleasant autumn <b>weather</b>, making it one of the best months to visit <b>Shanghai</b>. <b>Weather</b> in <b>Shanghai</b>: Climate, Seasons, and Average Monthly Temperature. <b>Shanghai</b> has a subtropical maritime monsoon climate, meaning high humidity and lots of rain. Hot muggy summers, cool falls, cold winters with little snow, and warm springs are the norm. Midsummer through early fall is the best time to visit <b>Shanghai</b>. <b>Shanghai</b>, <b>Shanghai</b>, China <b>Weather</b> Forecast, with current conditions, wind, air quality, and what to expect for the next 3 days. 1165. 45.9. 121. Winter, from December to February, is quite cold: the average January temperature is 5 °C (41 °F). There may be cold periods, with highs around 5 °C (41 °F) or below, and occasionally, even snow can fall. The temperature dropped to -10 °C (14 °F) in January 1977 and to -7 °C (19.5 °F) in January 2016. 5. 99 / 87 °F. 6. 99 / 86 °F. 7. Detailed forecast for 14 days. Need some help? Current <b>weather</b> in <b>Shanghai</b> and forecast for today, tomorrow, and next 14 days. Everything you need to know about today's <b>weather</b> in <b>Shanghai</b>, <b>Shanghai</b>, China. High/Low, Precipitation Chances, Sunrise/Sunset, and today's Temperature History. <b>Shanghai</b> - <b>Weather</b> warnings issued 14-day forecast. <b>Weather</b> warnings issued. Forecast - <b>Shanghai</b>. Day by day forecast. Last updated Friday at 01:05. Tonight, ... Temperature feels <b>like</b> 34 ... <b>Shanghai</b> 14 Day Extended Forecast. <b>Weather</b> Today <b>Weather</b> Hourly 14 Day Forecast Yesterday/Past <b>Weather</b> Climate (Averages) Currently: 84 °F. Passing clouds. (<b>Weather</b> station: <b>Shanghai</b> Hongqiao Airport, China). See more current <b>weather</b>. Current <b>weather</b> in <b>Shanghai</b>, <b>Shanghai</b>, China. Check current conditions in <b>Shanghai</b>, <b>Shanghai</b>, China with radar, hourly, and more. <b>Shanghai</b> <b>Weather</b> Forecasts. <b>Weather Underground</b> provides local & long-range <b>weather</b> forecasts, weatherreports, maps & tropical <b>weather</b> conditions for the <b>Shanghai</b> area.
```
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:**
Adding a new option to the CSVLoader that allows us to implicitly
specify the columns that are used for generating the Document content.
Currently these are implicitly set as "all fields not part of the
metadata_columns".
In some cases however it is useful to have a field both as a metadata
and as part of the document content.
- **Description:** The function `_is_assistants_builtin_tool` didn't had
support for `file_search` from OpenAI. This was creating conflict and
blocking the usage of such. OpenAI Assistant changed from`retrieval` to
`file_search`.
The following code
```
agent = OpenAIAssistantV2Runnable.create_assistant(
name="Data Analysis Assistant",
instructions=prompt[0].content,
tools={'type': 'file_search'},
model=self.chat_config.connection.deployment_name,
client=llm,
as_agent=True,
tool_resources={
"file_search": {
"vector_store_ids": vector_store_id
}
}
)
```
Was throwing the following error
```
Traceback (most recent call last):
File
"/Users/l.guedesdossantos/Documents/codes/shellai-nlp-backend/app/chat/chat_decorators.py",
line 500, in get_response
return await super().get_response(post, context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/Users/l.guedesdossantos/Documents/codes/shellai-nlp-backend/app/chat/chat_decorators.py",
line 96, in get_response
response = await self.inner_chat.get_response(post, context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/Users/l.guedesdossantos/Documents/codes/shellai-nlp-backend/app/chat/chat_decorators.py",
line 96, in get_response
response = await self.inner_chat.get_response(post, context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/Users/l.guedesdossantos/Documents/codes/shellai-nlp-backend/app/chat/chat_decorators.py",
line 96, in get_response
response = await self.inner_chat.get_response(post, context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Previous line repeated 4 more times]
File
"/Users/l.guedesdossantos/Documents/codes/shellai-nlp-backend/app/chat/azure_open_ai_chat.py",
line 147, in get_response
chain = chain_factory.get_chain(prompts, post.conversation.id,
overrides, context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/Users/l.guedesdossantos/Documents/codes/shellai-nlp-backend/app/llm_connections/chains.py",
line 1324, in get_chain
agent = OpenAIAssistantV2Runnable.create_assistant(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/Users/l.guedesdossantos/anaconda3/envs/shell-e/lib/python3.11/site-packages/langchain_community/agents/openai_assistant/base.py",
line 256, in create_assistant
tools=[_get_assistants_tool(tool) for tool in tools], # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/Users/l.guedesdossantos/anaconda3/envs/shell-e/lib/python3.11/site-packages/langchain_community/agents/openai_assistant/base.py",
line 256, in <listcomp>
tools=[_get_assistants_tool(tool) for tool in tools], # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/Users/l.guedesdossantos/anaconda3/envs/shell-e/lib/python3.11/site-packages/langchain_community/agents/openai_assistant/base.py",
line 119, in _get_assistants_tool
return convert_to_openai_tool(tool)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/Users/l.guedesdossantos/anaconda3/envs/shell-e/lib/python3.11/site-packages/langchain_core/utils/function_calling.py",
line 255, in convert_to_openai_tool
function = convert_to_openai_function(tool)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/Users/l.guedesdossantos/anaconda3/envs/shell-e/lib/python3.11/site-packages/langchain_core/utils/function_calling.py",
line 230, in convert_to_openai_function
raise ValueError(
ValueError: Unsupported function
{'type': 'file_search'}
Functions must be passed in as Dict, pydantic.BaseModel, or Callable. If
they're a dict they must either be in OpenAI function format or valid
JSON schema with top-level 'title' and 'description' keys.
```
With the proposed changes, this is fixed and the function will have support for `file_search`.
This was the only place missing the support for `file_search`.
Reference doc
https://platform.openai.com/docs/assistants/tools/file-search
- **Twitter handle:** luizf0992
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Description:
- Add system templates and user templates in integration testing
- initialize the response id field value to request_id
- Adjust the default model to hunyuan-pro
- Remove the default values of Temperature and TopP
- Add SystemMessage
all the integration tests have passed.
1、Execute integration tests for the first time
<img width="1359" alt="71ca77a2-e9be-4af6-acdc-4d665002bd9b"
src="https://github.com/user-attachments/assets/9298dc3a-aa26-4bfa-968b-c011a4e699c9">
2、Run the integration test a second time
<img width="1501" alt="image"
src="https://github.com/user-attachments/assets/61335416-4a67-4840-bb89-090ba668e237">
Issue: None
Dependencies: None
Twitter handle: None
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**Description:** [IPEX-LLM](https://github.com/intel-analytics/ipex-llm)
is a PyTorch library for running LLM on Intel CPU and GPU (e.g., local
PC with iGPU, discrete GPU such as Arc, Flex and Max) with very low
latency. This PR adds Intel GPU support to `ipex-llm` llm integration.
**Dependencies:** `ipex-llm`
**Contribution maintainer**: @ivy-lv11 @Oscilloscope98
**tests and docs**:
- Add: langchain/docs/docs/integrations/llms/ipex_llm_gpu.ipynb
- Update: langchain/docs/docs/integrations/llms/ipex_llm_gpu.ipynb
- Update: langchain/libs/community/tests/llms/test_ipex_llm.py
---------
Co-authored-by: ivy-lv11 <zhicunlv@gmail.com>
- **Description:**
Improve llamacpp embedding class by adding the `device` parameter so it
can be passed to the model and used with `gpu`, `cpu` or Apple metal
(`mps`).
Improve performance by making use of the bulk client api to compute
embeddings in batches.
- **Dependencies:** none
- **Tag maintainer:**
@hwchase17
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**Description:**
Starting from Neo4j 5.23 (22 August 2024), with vector-2.0 indexes,
`vector.dimensions` is not required to be set, which will cause it the
key not exist error in index config if it's not set.
Since the existence of vector.dimensions will only ensure additional
checks, this commit turns embedding dimension check optional, and only
do checks when it exists (not None).
https://neo4j.com/release-notes/database/neo4j-5/
**Twitter handle:** @HollowM186
Signed-off-by: Hollow Man <hollowman@opensuse.org>
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:** Added `ref` query parameter so data is not loaded
only from the default branch but any branch passed
---------
Co-authored-by: Osama Mehdi <mehdi@hm.edu>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
## Description
- Updates the self-query retriever factory to check for the new Qdrant
vector store class. i.e. `langchain_qdrant.QdrantVectorstore`.
- Deprecates `QdrantSparseVectorRetriever`, since the vector store
implementation natively supports it now.
Resolves#25798
- **Description:** When useing LLM integration moonshot,it's occurring
error "'Moonshot' object has no attribute '_client'",it's because of the
"_client" that is private in pydantic v1.0 so that we can't use it.I
turn "_client" into "client" , the error to be resolved!
- **Issue:** the issue #24390
- **Dependencies:** none
- **Twitter handle:** @Rainsubtime
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Co-authored-by: Cyue <Cyue_work2001@163.com>
- [x] **PR title - community: add neo4j query constructor for self
query**
- [x] **PR message**
- **Description:** adding a Neo4jTranslator so that the Neo4j vector
database can use SelfQueryRetriever
- **Issue:** this issue had been raised before in #19748
- **Dependencies:** none.
- **Twitter handle:** @moyi_dang
- p.s. I have not added the query constructor in BUILTIN_TRANSLATORS in
this PR, I want to make changes to only one package at a time.
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- [ ] **PR title**: community: add tests for ChatOctoAI
- [ ] **PR message**:
Description: Added unit tests for the ChatOctoAI class in the community
package to ensure proper validation and default values. These tests
verify the correct initialization of fields, the handling of missing
required parameters, and the proper setting of aliases.
Issue: N/A
Dependencies: None
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Thank you for contributing to LangChain!
community:premai[patch]: standardize init args
- updated `temperature` with Pydantic Field, updated the unit test.
- updated `max_tokens` with Pydantic Field, updated the unit test.
- updated `max_retries` with Pydantic Field, updated the unit test.
Related to #20085
---------
Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
Description: Moves yield to after callback for _astream for gigachat in
the community package
Issue: #16913
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- [x] **PR title**: "community: Patch enable to use Amazon OpenSearch
Serverless for Semantic Cache store"
- [x] **PR message**:
- **Description:** OpenSearchSemanticCache class support Amazon
OpenSearch Serverless for Semantic Cache store, it's only required to
pass auth(http_auth) parameter to initializer
- **Dependencies:** none
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Jinoos Lee <jinoos@amazon.com>
it fixes two issues:
### YGPTs are broken #25575
```
File ....conda/lib/python3.11/site-packages/langchain_community/embeddings/yandex.py:211, in _make_request(self, texts, **kwargs)
..
--> 211 res = stub.TextEmbedding(request, metadata=self._grpc_metadata) # type: ignore[attr-defined]
AttributeError: 'YandexGPTEmbeddings' object has no attribute '_grpc_metadata'
```
My gut feeling that #23841 is the cause.
I have to drop leading underscore from `_grpc_metadata` for quickfix,
but I just don't know how to do it _pydantic_ enough.
### minor issue:
if we use `api_key`, which is not the best practice the code fails with
```
File ~/git/...../python3.11/site-packages/langchain_community/embeddings/yandex.py:119, in YandexGPTEmbeddings.validate_environment(cls, values)
...
AttributeError: 'tuple' object has no attribute 'append'
```
- Added new integration test. But it requires YGPT env available and
active account. I don't know how int tests dis\enabled in CI.
- added small unit tests with mocks. Should be fine.
---------
Co-authored-by: mikhail-khludnev <mikhail_khludnev@rntgroup.com>
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
Support passing extra params when executing UC functions:
The params should be a dictionary with key EXECUTE_FUNCTION_ARG_NAME,
the assumption is that the function itself doesn't use such variable
name (starting and ending with double underscores), and if it does we
raise Exception.
If invalid params passing to the execute_statement, we raise Exception
as well.
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Signed-off-by: Serena Ruan <serena.rxy@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "community: optimize xinference llm import"
- [ ] **PR message**:
- **Description:** from xinferece_client import RESTfulClient when there
is no importing xinference.
- **Dependencies:** xinferece_client
- **Why do so:** the total xinference(pip install xinference[all]) is
too heavy for installing, let alone it is useless for langchain user
except RESTfulClient. The modification has maintained consistency with
the xinference embeddings
[embeddings/xinference](../blob/master/libs/community/langchain_community/embeddings/xinference.py#L89).
**Description:**
Adds the 'score' returned by Pinecone to the
`PineconeHybridSearchRetriever` list of returned Documents.
There is currently no way to return the score when using Pinecone hybrid
search, so in this PR I include it by default.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
### Description
adds an init method to ChatDeepInfra to set the model_name attribute
accordings to the argument
### Issue
currently, the model_name specified by the user during initialization of
the ChatDeepInfra class is never set. Therefore, it always chooses the
default model (meta-llama/Llama-2-70b-chat-hf, however probably since
this is deprecated it always uses meta-llama/Llama-3-70b-Instruct). We
stumbled across this issue and fixed it as proposed in this pull
request. Feel free to change the fix according to your coding guidelines
and style, this is just a proposal and we want to draw attention to this
problem.
### Dependencies
no additional dependencies required
Feel free to contact me or @timo282 and @finitearth if you have any
questions.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
**Description:** Make the hyperlink only appear once in the
extract_hyperlinks tool output. (for some websites output contains
meaningless '#' hyperlinks multiple times which will extend the tokens
of context window without any advantage)
**Issue:** None
**Dependencies:** None
Added Azure Search Access Token Authentication instead of API KEY auth.
Fixes Issue: https://github.com/langchain-ai/langchain/issues/24263
Dependencies: None
Twitter: @levalencia
@baskaryan
Could you please review? First time creating a PR that fixes some code.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
This addresses the issue mentioned in #25702
I have updated the endpoint used in validating the endpoint API type in
the AzureMLBaseEndpoint class from `/v1/completions` to `/completions`
and `/v1/chat/completions` to `/chat/completions`.
Co-authored-by: = <=>
- **Description:** Added langchain version while calling discover API
during both ingestion and retrieval
- **Issue:** NA
- **Dependencies:** NA
- **Tests:** NA
- **Docs** NA
---------
Co-authored-by: dristy.cd <dristy@clouddefense.io>
- **Description:** Updating source path and file path in Pebblo safe
loader for SharePoint apps during loading
- **Issue:** NA
- **Dependencies:** NA
- **Tests:** NA
- **Docs** NA
---------
Co-authored-by: dristy.cd <dristy@clouddefense.io>
- **PR message**: **Fix URL construction in newer Python versions**
- **Description:**
- Update the URL construction logic to use the .value attribute for
Routes enum members.
- This adjustment resolves an issue where the code worked correctly in
Python 3.9 but failed in Python 3.11.
- Clean up unused routes.
- **Issue:** NA
- **Dependencies:** NA
* Removed `ruff check --select I` as `I` is already selected and checked
in the main `ruff check` command
* Added checks for non-empty `PYTHON_FILES`
* Run `ruff check` only on `PYTHON_FILES`
Co-authored-by: Erick Friis <erick@langchain.dev>
This PR introduces adjustments to ensure compatibility with the recently
released preview version of [TiDB Serverless Vector
Search](https://tidb.cloud/ai), aiming to prevent user confusion.
- TiDB Vector now supports vector indexing with cosine and l2 distance
strategies, although inner_product remains unsupported.
- Changing the distance strategy is currently not supported, so the test
cased should be adjusted.
updated stop and request_timeout so they aliased to stop_sequences, and
timeout respectively. Added test that both continue to set the same
underlying attributes.
Related to
[20085](https://github.com/langchain-ai/langchain/issues/20085)
Co-authored-by: ccurme <chester.curme@gmail.com>
Issue: the `service` optional parameter was mentioned but not used.
Fix: added this parameter.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
## Description
There is a bug in the concatenation of embeddings obtained from MLflow
that does not conform to the type hint requested by the function.
``` python
def _query(self, texts: List[str]) -> List[List[float]]:
```
It is logical to expect a **List[List[float]]** for a **List[str]**.
However, the append method encapsulates the response in a global List.
To avoid this, the extend method should be used, which will add the
embeddings of all strings at the same list level.
## Testing
I have tried using OpenAI-ADA to obtain the embeddings, and the result
of executing this snippet is as follows:
``` python
embeds = await MlflowAIGatewayEmbeddings().aembed_documents(texts=["hi", "how are you?"])
print(embeds)
```
``` python
[[[-0.03512698, -0.020624293, -0.015343423, ...], [-0.021260535, -0.011461929, -0.00033121882, ...]]]
```
When in reality, the expected result should be:
``` python
[[-0.03512698, -0.020624293, -0.015343423, ...], [-0.021260535, -0.011461929, -0.00033121882, ...]]
```
The above result complies with the expected type hint:
**List[List[float]]** . As I mentioned, we can achieve that by using the
extend method instead of the append method.
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
Description: Simply pass kwargs to allow arguments like "where" to be
propagated
Issue: Previously, db.delete(where={}) wouldn't work for chroma
vectorstores
Dependencies: N/A
Twitter handle: N/A
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
Description: Send both the query and query_embedding to the Databricks
index for hybrid search.
Issue: When using hybrid search with non-Databricks managed embedding we
currently don't pass both the embedding and query_text to the index.
Hybrid search requires both of these. This change fixes this issue for
both `similarity_search` and `similarity_search_by_vector`.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
# Issue
As of late July, Perplexity [no longer supports Llama 3
models](https://docs.perplexity.ai/changelog/introducing-new-and-improved-sonar-models).
# Description
This PR updates the default model and doc examples to reflect their
latest supported model. (Mostly updating the same places changed by
#23723.)
# Twitter handle
`@acompa_` on behalf of the team at Not Diamond. Check us out
[here](https://notdiamond.ai).
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Fix handling of pipeline_kwargs to prioritize class attribute defaults.
#19770
Co-authored-by: jaizo <manuel.jaiczay@polygons.at>
Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
This PR adds tiny improvements to the `GithubFileLoader` document loader
and its code sample, addressing the following issues:
1. Currently, the `file_extension` argument of `GithubFileLoader` does
not change its behavior at all.
1. The `GithubFileLoader` sample code in
`docs/docs/integrations/document_loaders/github.ipynb` does not work as
it stands.
The respective solutions I propose are the following:
1. Remove `file_extension` argument from `GithubFileLoader`.
1. Specify the branch as `master` (not the default `main`) and rename
`documents` as `document`.
---------
Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
When I used the Neo4JGraph enhanced_schema=True option, I ran into an
error because a prop min_size of None was compared numerically with an
int.
The fix I applied is similar to the pattern of skipping embeddings
elsewhere in the file.
Co-authored-by: ccurme <chester.curme@gmail.com>
Description: DeepInfra 500 errors have useful information in the text
field that isn't being exposed to the user. I updated the error message
to fix this.
As an example, this code
```
from langchain_community.chat_models import ChatDeepInfra
from langchain_core.messages import HumanMessage
model = "meta-llama/Meta-Llama-3-70B-Instruct"
deepinfra_api_token = "..."
model = ChatDeepInfra(model=model, deepinfra_api_token=deepinfra_api_token)
messages = [HumanMessage("All work and no play makes Jack a dull boy\n" * 9000)]
response = model.invoke(messages)
```
Currently gives this error:
```
langchain_community.chat_models.deepinfra.ChatDeepInfraException: DeepInfra Server: Error 500
```
This change would give the following error:
```
langchain_community.chat_models.deepinfra.ChatDeepInfraException: DeepInfra Server error status 500: {"error":{"message":"Requested input length 99009 exceeds maximum input length 8192"}}
```
**Refactor PebbloRetrievalQA**
- Created `APIWrapper` and moved API logic into it.
- Created smaller functions/methods for better readability.
- Properly read environment variables.
- Removed unused code.
- Updated models
**Issue:** NA
**Dependencies:** NA
**tests**: NA
**Refactor PebbloSafeLoader**
- Created `APIWrapper` and moved API logic into it.
- Moved helper functions to the utility file.
- Created smaller functions and methods for better readability.
- Properly read environment variables.
- Removed unused code.
**Issue:** NA
**Dependencies:** NA
**tests**: Updated