Commit Graph

1893 Commits

Author SHA1 Message Date
Bhav Sardana
624216aa64
community:Fix for Pydantic model validator of GoogleApiYoutubeLoader (#29694)
- **Description:** Community: bugfix for pedantic model validator for
GoogleApiYoutubeLoader
- **Issue:** #29165, #27432 
Fix is similar to #29346
2025-02-10 08:57:58 -05:00
Changyong Um
60740c44c5
community: Add configurable text key for indexing and the retriever in Pinecone Hybrid Search (#29697)
**issue**

In Langchain, the original content is generally stored under the `text`
key. However, the `PineconeHybridSearchRetriever` searches the `context`
field in the metadata and cannot change this key. To address this, I
have modified the code to allow changing the key to something other than
context.

In my opinion, following Langchain's conventions, the `text` key seems
more appropriate than `context`. However, since I wasn't sure about the
author's intent, I have left the default value as `context`.
2025-02-10 08:56:37 -05:00
manukychen
3de445d521
using getattr and default value to prevent 'OpenSearchVectorSearch' has no attribute 'bulk_size' (#29682)
- Description: Adding getattr methods and set default value 500 to
cls.bulk_size, it can prevent the error below:
Error: type object 'OpenSearchVectorSearch' has no attribute 'bulk_size'

- Issue: https://github.com/langchain-ai/langchain/issues/29071
2025-02-08 14:39:57 -05:00
Philippe PRADOS
beb75b2150
community[minor]: 05 - Refactoring PyPDFium2 parser (#29625)
This is one part of a larger Pull Request (PR) that is too large to be
submitted all at once. This specific part focuses on updating the
PyPDFium2 parser.

For more details, see
https://github.com/langchain-ai/langchain/pull/28970.
2025-02-07 21:31:12 -05:00
Christophe Bornet
723031d548
community: Bump ruff version to 0.9 (#29206)
Co-authored-by: Erick Friis <erick@langchain.dev>
2025-02-08 01:21:10 +00:00
Christophe Bornet
30f6c9f5c8
community: Use Blockbuster to detect blocking calls in asyncio during tests (#29609)
Same as https://github.com/langchain-ai/langchain/pull/29043 for
langchain-community.

**Dependencies:**
- blockbuster (test)

**Twitter handle:** cbornet_

Co-authored-by: Erick Friis <erick@langchain.dev>
2025-02-08 01:10:39 +00:00
Marlene
4fa3ef0d55
Community/Partner: Adding Azure community and partner user agent to better track usage in Python (#29561)
- This pull request includes various changes to add a `user_agent`
parameter to Azure OpenAI, Azure Search and Whisper in the Community and
Partner packages. This helps in identifying the source of API requests
so we can better track usage and help support the community better. I
will also be adding the user_agent to the new `langchain-azure` repo as
well.

- No issue connected or  updated dependencies. 
- Utilises existing tests and docs

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2025-02-07 23:28:30 +00:00
ccurme
bff25b552c
community: release 0.3.17 (#29676) 2025-02-07 19:41:44 +00:00
Ikko Eltociear Ashimine
0d45ad57c1
community: update base_o365.py (#29657)
extention -> extension
2025-02-07 08:43:29 -05:00
Sunish Sheth
25ce1e211a
docs: Updating the imports for langchain-databricks to databricks-langchain (#29646)
Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2025-02-06 13:28:07 -08:00
Vincent Emonet
08b9eaaa6f
community: improve FastEmbedEmbeddings support for ONNX execution provider (e.g. GPU) (#29645)
I made a change to how was implemented the support for GPU in
`FastEmbedEmbeddings` to be more consistent with the existing
implementation `langchain-qdrant` sparse embeddings implementation

It is directly enabling to provide the list of ONNX execution providers:
https://github.com/langchain-ai/langchain/blob/master/libs/partners/qdrant/langchain_qdrant/fastembed_sparse.py#L15

It is a bit less clear to a user that just wants to enable GPU, but
gives more capabilities to work with other execution providers that are
not the `CUDAExecutionProvider`, and is more future proof

Sorry for the disturbance @ccurme

> Nice to see you just moved to `uv`! It is so much nicer to run
format/lint/test! No need to manually rerun the `poetry install` with
all required extras now
2025-02-06 15:31:23 -05:00
ccurme
d172984c91
infra: migrate to uv (#29566) 2025-02-06 13:36:26 -05:00
Vincent Emonet
db8201d4da
community: fix typo in the module imported when using GPU with FastEmbedEmbeddings (#29631)
Made a mistake in the module to import (the module stay the same only
the installed package changes), fixed it and tested it

https://github.com/langchain-ai/langchain/pull/29627

@ccurme
2025-02-06 10:26:08 -05:00
Mohammed Abbadi
f8fd65dea2
community: Update deeplake.py (#29633)
Deep Lake recently released version 4, which introduces significant
architectural changes, including a new on-disk storage format, enhanced
indexing mechanisms, and improved concurrency. However, LangChain's
vector store integration currently does not support Deep Lake v4 due to
breaking API changes.

Previously, the installation command was:
`pip install deeplake[enterprise]`
This installs the latest available version, which now defaults to Deep
Lake v4. Since LangChain's vector store integration is still dependent
on v3, this can lead to compatibility issues when using Deep Lake as a
vector database within LangChain.

To ensure compatibility, the installation command has been updated to:
`pip install deeplake[enterprise]<4.0.0`
This constraint ensures that pip installs the latest available version
of Deep Lake within the v3 series while avoiding the incompatible v4
update.
2025-02-06 10:25:13 -05:00
Vincent Emonet
0ac5536f04
community: add support for using GPUs with FastEmbedEmbeddings (#29627)
- **Description:** add a `gpu: bool = False` field to the
`FastEmbedEmbeddings` class which enables to use GPU (through ONNX CUDA
provider) when generating embeddings with any fastembed model. It just
requires the user to install a different dependency and we use a
different provider when instantiating `fastembed.TextEmbedding`
- **Issue:** when generating embeddings for a really large amount of
documents this drastically increase performance (honestly that is a must
have in some situations, you can't just use CPU it is way too slow)
- **Dependencies:** no direct change to dependencies, but internally the
users will need to install `fastembed-gpu` instead of `fastembed`, I
made all the changes to the init function to properly let the user know
which dependency they should install depending on if they enabled `gpu`
or not
 
cf. fastembed docs about GPU for more details:
https://qdrant.github.io/fastembed/examples/FastEmbed_GPU/

I did not added test because it would require access to a GPU in the
testing environment
2025-02-06 08:04:19 -05:00
Dmitrii Rashchenko
0ceda557aa
add o1 and o3-mini to pricing (#29628)
### PR Title:  
**community: add latest OpenAI models pricing**  

### Description:  
This PR updates the OpenAI model cost calculation mapping by adding the
latest OpenAI models, **o1 (non-preview)** and **o3-mini**, based on the
pricing listed on the [OpenAI pricing
page](https://platform.openai.com/docs/pricing).

### Changes:  
- Added pricing for `o1`, `o1-2024-12-17`, `o1-cached`, and
`o1-2024-12-17-cached` for input tokens.
- Added pricing for `o1-completion` and `o1-2024-12-17-completion` for
output tokens.
- Added pricing for `o3-mini`, `o3-mini-2025-01-31`, `o3-mini-cached`,
and `o3-mini-2025-01-31-cached` for input tokens.
- Added pricing for `o3-mini-completion` and
`o3-mini-2025-01-31-completion` for output tokens.

### Issue:  
N/A  

### Dependencies:  
None  

### Testing & Validation:  
- No functional changes outside of updating the cost mapping.  
- No tests were added or modified.
2025-02-06 08:02:20 -05:00
Mohammad Anash
f849305a56
fixed Bug in PreFilter of AzureCosmosDBNoSqlVectorSearch (#29613)
Description: Fixes PreFilter value handling in Azure Cosmos DB NoSQL
vectorstore. The current implementation fails to handle numeric values
in filter conditions, causing an undefined value variable error. This PR
adds support for numeric, boolean, and NULL values while maintaining the
existing string and list handling.

Changes:
Added handling for numeric types (int/float)
Added boolean value support
Added NULL value handling
Added type validation for unsupported values
Fixed scope of value variable initialization

Issue: 
Fixes #29610

Implementation Notes:
No changes to public API
Backwards compatible
Maintains consistent behavior with existing MongoDB-style filtering
Preserves SQL injection prevention through proper value handling

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-02-06 02:20:26 +00:00
Philippe PRADOS
6ff0d5c807
community[minor]: 04 - Refactoring PDFMiner parser (#29526)
This is one part of a larger Pull Request (PR) that is too large to be
submitted all at once. This specific part focuses on updating the XXX
parser.

For more details, see [PR
28970](https://github.com/langchain-ai/langchain/pull/28970).

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2025-02-05 21:08:27 -05:00
Philippe PRADOS
5771e561fb
[Bugfix langchain_community] Fix PyMuPDFLoader (#29550)
- **Description:**  add legacy properties
    - **Issue:** #29470
    - **Twitter handle:** pprados
2025-02-04 09:24:40 -05:00
Ashutosh Kumar
65b404a2d1
[oci_generative_ai] Option to pass auth_file_location (#29481)
**PR title**: "community: Option to pass auth_file_location for
oci_generative_ai"

**Description:** Option to pass auth_file_location, to overwrite config
file default location "~/.oci/config" where profile name configs
present. This is not fixing any issues. Just added optional parameter
called "auth_file_location", which internally supported by any OCI
client including GenerativeAiInferenceClient.
2025-02-03 21:44:13 -05:00
Hemant Rawat
db1693aa70
community: fix issue #29429 in age_graph.py (#29506)
## Description:

This PR addresses issue #29429 by fixing the _wrap_query method in
langchain_community/graphs/age_graph.py. The method now correctly
handles Cypher queries with UNION and EXCEPT operators, ensuring that
the fields in the SQL query are ordered as they appear in the Cypher
query. Additionally, the method now properly handles cases where RETURN
* is not supported.

### Issue: #29429

### Dependencies: None


### Add tests and docs:

Added unit tests in tests/unit_tests/graphs/test_age_graph.py to
validate the changes.
No new integrations were added, so no example notebook is necessary.
Lint and test:

Ran make format, make lint, and make test to ensure code quality and
functionality.
2025-02-01 21:24:45 -05:00
ccurme
16a422f3fa
community: add standard tests for Perplexity (#29534) 2025-02-01 17:02:57 -05:00
Philippe PRADOS
ceda8bc050
community[minor]: 03 - Refactoring PyPDF parser (#29330)
This is one part of a larger Pull Request (PR) that is too large to be
submitted all at once.
This specific part focuses on updating the PyPDF parser.

For more details, see [PR
28970](https://github.com/langchain-ai/langchain/pull/28970).
2025-01-31 10:05:07 -05:00
Julian Castro Pulgarin
b7e3e337b1
community: Fix YahooFinanceNewsTool to handle updated yfinance data structure (#29498)
*Description:**
Updates the YahooFinanceNewsTool to handle the current yfinance news
data structure. The tool was failing with a KeyError due to changes in
the yfinance API's response format. This PR updates the code to
correctly extract news URLs from the new structure.

**Issue:** #29495

**Dependencies:** 
No new dependencies required. Works with existing yfinance package.

The changes maintain backwards compatibility while fixing the KeyError
that users were experiencing.

The modified code properly handles the new data structure where:
- News type is now at `content.contentType`
- News URL is now at `content.canonicalUrl.url`

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-31 02:31:44 +00:00
Mohammad Anash
12bcc85927
added operator filter for supabase (#29475)
Description
This PR adds support for MongoDB-style $in operator filtering in the
Supabase vectorstore implementation. Currently, filtering with $in
operators returns no results, even when matching documents exist. This
change properly translates MongoDB-style filters to PostgreSQL syntax,
enabling efficient multi-document filtering.
Changes

Modified similarity_search_by_vector_with_relevance_scores to handle
MongoDB-style $in operators
Added automatic conversion of $in filters to PostgreSQL IN clauses
Preserved original vector type handling and numpy array conversion
Maintained compatibility with existing postgrest filters
Added support for the same filtering in
similarity_search_by_vector_returning_embeddings

Issue
Closes #27932

Implementation Notes
No changes to public API or function signatures
Backwards compatible - behavior unchanged for non-$in filters
More efficient than multiple individual queries for multi-ID searches
Preserves all existing functionality including numpy array conversion
for vector types

Dependencies
None

Additional Notes
The implementation handles proper SQL escaping for filter values
Maintains consistent behavior with other vectorstore implementations
that support MongoDB-style operators
Future extensions could support additional MongoDB-style operators ($gt,
$lt, etc.)

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-29 14:24:18 +00:00
Michael Chin
e120378695
community: Additional AWS deprecations (#29447)
Added deprecation warnings for a few more classes that weremoved to
`langchain-aws` package:
- [SageMaker Endpoint
LLM](https://python.langchain.com/api_reference/aws/retrievers/langchain_aws.retrievers.bedrock.AmazonKnowledgeBasesRetriever.html)
- [Amazon Kendra
retriever](https://python.langchain.com/api_reference/aws/retrievers/langchain_aws.retrievers.kendra.AmazonKendraRetriever.html)
- [Amazon Bedrock Knowledge Bases
retriever](https://python.langchain.com/api_reference/aws/retrievers/langchain_aws.retrievers.bedrock.AmazonKnowledgeBasesRetriever.html)
2025-01-28 09:50:14 -05:00
Erick Friis
2d776351af
community: release 0.3.16 (#29452) 2025-01-28 07:44:54 +00:00
Adrián Panella
1551d9750c
community(doc_loaders): allow any credential type in AzureAIDocumentI… (#29289)
allow any credential type in AzureAIDocumentInteligence, not only
`api_key`.
This allows to use any of the credentials types integrated with AD.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-27 20:56:30 +00:00
Jorge Piedrahita Ortiz
3b886cdbb2
libs: add sambanova-lagchain integration package (#29417)
- **Description:**: Add sambanova-langchain integration package as
suggested in previous PRs

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-27 20:34:55 +00:00
Mohammad Anash
aba1fd0bd4
fixed similarity search with score error #29407 (#29413)
Description: Fix TypeError in AzureSearch similarity_search_with_score
by removing search_type from kwargs before passing to underlying
requests.

This resolves issue #29407 where search_type was being incorrectly
passed through to Session.request().
Issue: #29407

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-27 20:34:42 +00:00
Teruaki Ishizaki
3fce78994e
community: Fixed the procedure of initializing pad_token_id (#29434)
- **Description:** Add to check pad_token_id and eos_token_id of model
config. It seems that this is the same bug as the HuggingFace TGI bug.
In addition, the source code of
libs/partners/huggingface/langchain_huggingface/llms/huggingface_pipeline.py
also requires similar changes.
- **Issue:** #29431
- **Dependencies:** none
- **Twitter handle:** tell14
2025-01-27 14:54:54 -05:00
Loris Alexandre
e4921239a6
community: missing mandatory parameter partition_key for AzureCosmosDBNoSqlVectorSearch (#29382)
- **Description:** the `delete` function of
AzureCosmosDBNoSqlVectorSearch is using
`self._container.delete_item(document_id)` which miss a mandatory
parameter `partition_key`
We use the class function `delete_document_by_id` to provide a default
`partition_key`
- **Issue:** #29372 
- **Dependencies:** None
- **Twitter handle:** None

Co-authored-by: Loris Alexandre <loris.alexandre@boursorama.fr>
2025-01-23 10:05:10 -05:00
Terry Tan
ec0ebb76f2
community: fix Google Scholar tool errors (#29371)
Resolve https://github.com/langchain-ai/langchain/issues/27557
2025-01-23 10:03:01 -05:00
江同学呀
a1e62070d0
community: Fix the problem of error reporting when OCR extracts text from PDF. (#29378)
- **Description:** The issue has been fixed where images could not be
recognized from ```xObject[obj]["/Filter"]``` (whose value can be either
a string or a list of strings) in the ```_extract_images_from_page()```
method. It also resolves the bug where vectorization by Faiss fails due
to the failure of image extraction from a PDF containing only
images```IndexError: list index out of range```.

![69a60f3f6bd474641b9126d74bb18f7e](https://github.com/user-attachments/assets/dc9e098d-2862-49f7-93b0-00f1056727dc)

- **Issue:** 
    Fix the following issues:
[#15227 ](https://github.com/langchain-ai/langchain/issues/15227)
[#22892 ](https://github.com/langchain-ai/langchain/issues/22892)
[#26652 ](https://github.com/langchain-ai/langchain/issues/26652)
[#27153 ](https://github.com/langchain-ai/langchain/issues/27153)
    Related issues:
[#7067 ](https://github.com/langchain-ai/langchain/issues/7067)

- **Dependencies:** None
- **Twitter handle:** None

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-23 15:01:52 +00:00
Tim Mallezie
a13faab6b7
community; allow to set gitlab url in gitlab tool in constrictor (#29380)
This pr, expands the gitlab url so it can also be set in a constructor,
instead of only through env variables.

This allows to do something like this. 
```
       # Create the GitLab API wrapper
        gitlab_api = GitLabAPIWrapper(
            gitlab_url=self.gitlab_url,
            gitlab_personal_access_token=self.gitlab_personal_access_token,
            gitlab_repository=self.gitlab_repository,
            gitlab_branch=self.gitlab_branch,
            gitlab_base_branch=self.gitlab_base_branch,
        )
```
Where before you could not set the url in the constructor.

Co-authored-by: Tim Mallezie <tim.mallezie@dropsolid.com>
2025-01-23 09:36:27 -05:00
Macs Dickinson
7378c955db
community: adds support for getting github releases for the configured repository (#29318)
**Description:** adds support for github tool to query github releases
on the configure respository
**Issue:** N/A
**Dependencies:** N/A
**Twitter handle:** @macsdickinson

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-22 15:45:52 +00:00
Bhav Sardana
d6a7aaa97d
community: Fix for Pydantic model validator of GoogleApiClient (#29346)
- [ *] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** Fix for pedantic model validator for GoogleApiHandler
    - **Issue:** the issue #29165 

- [ *] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified.

---------

Signed-off-by: Bhav Sardana <sardana.bhav@gmail.com>
2025-01-21 15:17:43 -05:00
Bagatur
536b44a47f
community[patch]: Release 0.3.15 (#29325) 2025-01-21 03:10:07 +00:00
Bagatur
923e6fb321
core[patch]: 0.3.31 (#29320) 2025-01-21 01:17:31 +00:00
Philippe PRADOS
4efc5093c1
community[minor]: Refactoring PyMuPDF parser, loader and add image blob parsers (#29063)
* Adds BlobParsers for images. These implementations can take an image
and produce one or more documents per image. This interface can be used
for exposing OCR capabilities.
* Update PyMuPDFParser and Loader to standardize metadata, handle
images, improve table extraction etc.

- **Twitter handle:** pprados

This is one part of a larger Pull Request (PR) that is too large to be
submitted all at once.
This specific part focuses to prepare the update of all parsers.

For more details, see [PR
28970](https://github.com/langchain-ai/langchain/pull/28970).

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2025-01-20 15:15:43 -05:00
CLOVA Studio 개발
7a95ffc775
community: fix some features on Naver ChatModel & embedding model 2 (#29243)
## Description
- Responding to `NCP API Key` changes.
- To fix `ChatClovaX` `astream` function to raise `SSEError` when an
error event occurs.
- To add `token length` and `ai_filter` to ChatClovaX's
`response_metadata`.
- To update document for apply NCP API Key changes.

cc. @efriis @vbarda
2025-01-20 11:01:03 -05:00
Sangyun_LEE
5d64597490
docs: fix broken Appearance of langchain_community/document_loaders/recursive_url_loader API Reference (#29305)
# PR mesesage
## Description
Fixed a broken Appearance of RecurisveUrlLoader API Reference.

### Before
<p align="center">
<img width="750" alt="image"
src="https://github.com/user-attachments/assets/f39df65d-b788-411d-88af-8bfa2607c00b"
/>
<img width="750" alt="image"
src="https://github.com/user-attachments/assets/b8a92b70-4548-4b4a-965f-026faeebd0ec"
/>
</p>

### After
<p align="center">
<img width="750" alt="image"
src="https://github.com/user-attachments/assets/8ea28146-de45-42e2-b346-3004ec4dfc55"
/>
<img width="750" alt="image"
src="https://github.com/user-attachments/assets/914c6966-4055-45d3-baeb-2d97eab06fe7"
/>
</p>

## Issue:
N/A
## Dependencies
None
## Twitter handle
N/A

# Add tests and docs
Not applicable; this change only affects documentation.

# Lint and test
Ran make format, make lint, and make test to ensure no issues.
2025-01-20 10:56:59 -05:00
Mohammad Mohtashim
b5fbebb3c8
(Community): Changing the BaseURL and Model for MiniMax (#29299)
- **Description:** Changed the Base Default Model and Base URL to
correct versions. Plus added a more explicit exception if user provides
an invalid API Key
- **Issue:** #29278
2025-01-19 14:15:02 -05:00
ThomasSaulou
e9abe583b2
chatperplexity stream-citations in additional kwargs (#29273)
chatperplexity stream-citations in additional kwargs

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-18 22:31:10 +00:00
TheSongg
1cd4d8d101
[langchain_community.llms.xinference]: Rewrite _stream() method and support stream() method in xinference.py (#29259)
- [ ] **PR title**:[langchain_community.llms.xinference]: Rewrite
_stream() method and support stream() method in xinference.py

- [ ] **PR message**: Rewrite the _stream method so that the
chain.stream() can be used to return data streams.

       chain = prompt | llm
       chain.stream(input=user_input)


- [ ] **tests**: 
      from langchain_community.llms import Xinference
      from langchain.prompts import PromptTemplate

      llm = Xinference(
server_url="http://0.0.0.0:9997", # replace your xinference server url
model_uid={model_uid} # replace model_uid with the model UID return from
launching the model
          stream = True
       )
prompt = PromptTemplate(input=['country'], template="Q: where can we
visit in the capital of {country}? A:")
      chain = prompt | llm
      chain.stream(input={'country': 'France'})
2025-01-17 20:31:59 -05:00
Luis Lopez
75663f2cae
community: Add cost per 1K tokens for fine-tuned model cached input (#29248)
### Description

- Since there is no cost per 1k input tokens for a fine-tuned cached
version of `gpt-4o-mini-2024-07-18` is not available when using the
`OpenAICallbackHandler`, it raises an error when trying to make calls
with such model.
- To add the price in the `MODEL_COST_PER_1K_TOKENS` dictionary

cc. @efriis
2025-01-16 15:19:26 -05:00
Junon
667d2a57fd
add mode arg to OBSFileLoader.load() method (#29246)
- **Description:** add mode arg to OBSFileLoader.load() method
  - **Issue:** #29245
  - **Dependencies:** no dependencies required for this change

---------

Co-authored-by: Junon_Gz <junon_gz@qq.com>
2025-01-16 11:09:04 -05:00
Nadeem Sajjad
eaf2fb287f
community(pypdfloader): added page_label in metadata for pypdf loader (#29225)
# Description

## Summary
This PR adds support for handling multi-labeled page numbers in the
**PyPDFLoader**. Some PDFs use complex page numbering systems where the
actual content may begin after multiple introductory pages. The
page_label field helps accurately reflect the document’s page structure,
making it easier to handle such cases during document parsing.

## Motivation
This feature improves document parsing accuracy by allowing users to
access the actual page labels instead of relying only on the physical
page numbers. This is particularly useful for documents where the first
few pages have roman numerals or other non-standard page labels.

## Use Case
This feature is especially useful for **Retrieval-Augmented Generation**
(RAG) systems where users may reference page numbers when asking
questions. Some PDFs have both labeled page numbers (like roman numerals
for introductory sections) and index-based page numbers.

For example, a user might ask:

	"What is mentioned on page 5?"

The system can now check both:
	•	**Index-based page number** (page)
	•	**Labeled page number** (page_label)

This dual-check helps improve retrieval accuracy. Additionally, the
results can be validated with an **agent or tool** to ensure the
retrieved pages match the user’s query contextually.

## Code Changes

- Added a page_label field to the metadata of the Document class in
**PyPDFLoader**.
- Implemented support for retrieving page_label from the
pdf_reader.page_labels.
- Created a test case (test_pypdf_loader_with_multi_label_page_numbers)
with a sample PDF containing multi-labeled pages
(geotopo-komprimiert.pdf) [[Source of
pdf](https://github.com/py-pdf/sample-files/blob/main/009-pdflatex-geotopo/GeoTopo-komprimiert.pdf)].
- Updated existing tests to ensure compatibility and verify page_label
extraction.

## Tests Added

- Added a new test case for a PDF with multi-labeled pages.
- Verified both page and page_label metadata fields are correctly
extracted.

## Screenshots

<img width="549" alt="image"
src="https://github.com/user-attachments/assets/65db9f5c-032e-4592-926f-824777c28f33"
/>
2025-01-15 14:18:07 -05:00
TheSongg
4867fe7ac8
[langchain_community.llms.xinference]: fix error in xinference.py (#29216)
- [ ] **PR title**: [langchain_community.llms.xinference]: fix error in
xinference.py

- [ ] **PR message**:
- The old code raised an ValidationError:
pydantic_core._pydantic_core.ValidationError: 1 validation error for
Xinference when import Xinference from xinference.py. This issue has
been resolved by adjusting it's type and default value.

File "/media/vdc/python/lib/python3.10/site-packages/pydantic/main.py",
line 212, in __init__
validated_self = self.__pydantic_validator__.validate_python(data,
self_instance=self)
pydantic_core._pydantic_core.ValidationError: 1 validation error for
Xinference
        client
Field required [type=missing, input_value={'server_url':
'http://10...t4', 'model_kwargs': {}}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.9/v/missing

- [ ] **tests**:

       from langchain_community.llms import Xinference
       llm = Xinference(
server_url="http://0.0.0.0:9997", # replace your xinference server url
model_uid={model_uid} # replace model_uid with the model UID return from
launching the model
         )
2025-01-15 10:11:26 -05:00
Syed Baqar Abbas
4278046329
[fix] Convert table names to list for compatibility in SQLDatabase (#29229)
- [langchain_community.utilities.SQLDatabase] **[fix] Convert table
names to list for compatibility in SQLDatabase**:
  - The issue #29227 is being fixed here
  - The "package" modified is community
  - The issue lied in this block of code:

44b41b699c/libs/community/langchain_community/utilities/sql_database.py (L72-L77)

- [langchain_community.utilities.SQLDatabase] **[fix] Convert table
names to list for compatibility in SQLDatabase**:
- **Description:** When the SQLDatabase is initialized, it runs a code
`self._inspector.get_table_names(schema=schema)` which expects an output
of list. However, with some connectors (such as snowflake) the data type
returned could be another iterable. This results in a type error when
concatenating the table_names to view_names. I have added explicit type
casting to prevent this.
    - **Issue:** The issue #29227 is being fixed here
    - **Dependencies:** None
    - **Twitter handle:** @BaqarAbbas2001

## Additional Information
When the following method is called for a Snowflake database:

44b41b699c/libs/community/langchain_community/utilities/sql_database.py (L75)

Snowflake under the hood calls:
```python
from snowflake.sqlalchemy.snowdialect import SnowflakeDialect
SnowflakeDialect.get_table_names
```

This method returns a `dict_keys()` object which is incompatible to
concatenate with a list and results in a `TypeError`

### Relevant Library Versions
- **snowflake-sqlalchemy**: 1.7.2  
- **snowflake-connector-python**: 3.12.4  
- **sqlalchemy**: 2.0.20  
- **langchain_community**: 0.3.14
2025-01-15 10:00:03 -05:00