Commit Graph

6495 Commits

Author SHA1 Message Date
Erick Friis
50d61eafa2
partners/deepseek: release 0.1.1 (#29592) 2025-02-04 23:46:38 +00:00
Erick Friis
7edfcbb090
docs: rename to langchain-deepseek in docs (#29587) 2025-02-04 14:22:17 -08:00
Erick Friis
df8fa882b2
deepseek: bump core (#29584) 2025-02-04 10:25:46 -08:00
Erick Friis
455f65947a
deepseek: rename to langchain-deepseek from langchain-deepseek-official (#29583) 2025-02-04 17:57:25 +00:00
Philippe PRADOS
5771e561fb
[Bugfix langchain_community] Fix PyMuPDFLoader (#29550)
- **Description:**  add legacy properties
    - **Issue:** #29470
    - **Twitter handle:** pprados
2025-02-04 09:24:40 -05:00
Ashutosh Kumar
65b404a2d1
[oci_generative_ai] Option to pass auth_file_location (#29481)
**PR title**: "community: Option to pass auth_file_location for
oci_generative_ai"

**Description:** Option to pass auth_file_location, to overwrite config
file default location "~/.oci/config" where profile name configs
present. This is not fixing any issues. Just added optional parameter
called "auth_file_location", which internally supported by any OCI
client including GenerativeAiInferenceClient.
2025-02-03 21:44:13 -05:00
Teruaki Ishizaki
aeb42dc900
partners: Fixed the procedure of initializing pad_token_id (#29500)
- **Description:** Add to check pad_token_id and eos_token_id of model
config. It seems that this is the same bug as the HuggingFace TGI bug.
It's same bug as #29434
- **Issue:** #29431
- **Dependencies:** none
- **Twitter handle:** tell14

Example code is followings:
```python
from langchain_huggingface.llms import HuggingFacePipeline

hf = HuggingFacePipeline.from_model_id(
    model_id="meta-llama/Llama-3.2-3B-Instruct",
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 10},
)

from langchain_core.prompts import PromptTemplate

template = """Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)

chain = prompt | hf

question = "What is electroencephalography?"

print(chain.invoke({"question": question}))
```
2025-02-03 21:40:33 -05:00
AmirPoursaberi
a6efd22ba1
Fix a tiny typo in create_retrieval_chain docstring (#29552)
Hi there!

To fix a tiny typo in `create_retrieval_chain` docstring.
2025-02-03 10:54:49 -05:00
Hemant Rawat
db1693aa70
community: fix issue #29429 in age_graph.py (#29506)
## Description:

This PR addresses issue #29429 by fixing the _wrap_query method in
langchain_community/graphs/age_graph.py. The method now correctly
handles Cypher queries with UNION and EXCEPT operators, ensuring that
the fields in the SQL query are ordered as they appear in the Cypher
query. Additionally, the method now properly handles cases where RETURN
* is not supported.

### Issue: #29429

### Dependencies: None


### Add tests and docs:

Added unit tests in tests/unit_tests/graphs/test_age_graph.py to
validate the changes.
No new integrations were added, so no example notebook is necessary.
Lint and test:

Ran make format, make lint, and make test to ensure code quality and
functionality.
2025-02-01 21:24:45 -05:00
Keenan Pepper
2f97916dea
docs: Add goodfire notebook and add to packages.yml (#29512)
- **Description:** Add Goodfire ipynb notebook and add
langchain-goodfire package to packages.yml
- **Issue:** n/a
- **Dependencies:** docs only
- **Twitter handle:** keenanpepper

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-02-01 19:43:20 -05:00
ccurme
a3c5e4d070
deepseek[patch]: bump langchain-openai and add to scheduled testing (#29535) 2025-02-01 18:40:59 -05:00
ccurme
16a422f3fa
community: add standard tests for Perplexity (#29534) 2025-02-01 17:02:57 -05:00
Amit Ghadge
0c405245c4
[Integrations][Tool] Added Jenkins tools support (#29516)
Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
  - Example: "community: add foobar LLM"


- [x] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-31 12:50:10 -05:00
Christophe Bornet
aab2e42169
core[patch]: Use Blockbuster to detect blocking calls in asyncio during tests (#29043)
This PR uses the [blockbuster](https://github.com/cbornet/blockbuster)
library in langchain-core to detect blocking calls made in the asyncio
event loop during unit tests.
Avoiding blocking calls is hard as these can be deeply buried in the
code or made in 3rd party libraries.
Blockbuster makes it easier to detect them by raising an exception when
a call is made to a known blocking function (eg: `time.sleep`).

Adding blockbuster allowed to find a blocking call in
`aconfig_with_context` (it ends up calling `get_function_nonlocals`
which loads function code).

**Dependencies:**
- blockbuster (test)

**Twitter handle:** cbornet_
2025-01-31 10:06:34 -05:00
Philippe PRADOS
ceda8bc050
community[minor]: 03 - Refactoring PyPDF parser (#29330)
This is one part of a larger Pull Request (PR) that is too large to be
submitted all at once.
This specific part focuses on updating the PyPDF parser.

For more details, see [PR
28970](https://github.com/langchain-ai/langchain/pull/28970).
2025-01-31 10:05:07 -05:00
Julian Castro Pulgarin
b7e3e337b1
community: Fix YahooFinanceNewsTool to handle updated yfinance data structure (#29498)
*Description:**
Updates the YahooFinanceNewsTool to handle the current yfinance news
data structure. The tool was failing with a KeyError due to changes in
the yfinance API's response format. This PR updates the code to
correctly extract news URLs from the new structure.

**Issue:** #29495

**Dependencies:** 
No new dependencies required. Works with existing yfinance package.

The changes maintain backwards compatibility while fixing the KeyError
that users were experiencing.

The modified code properly handles the new data structure where:
- News type is now at `content.contentType`
- News URL is now at `content.canonicalUrl.url`

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-31 02:31:44 +00:00
Erick Friis
332e303858
partners/mistralai: release 0.2.6 (#29491) 2025-01-29 22:23:14 +00:00
Erick Friis
2c795f5628
partners/openai: release 0.3.3 (#29490) 2025-01-29 22:23:03 +00:00
Erick Friis
f307b3cc5f
langchain: release 0.3.17 (#29485) 2025-01-29 22:22:49 +00:00
Erick Friis
5cad3683b4
partners/groq: release 0.2.4 (#29488) 2025-01-29 22:22:30 +00:00
Erick Friis
e074c26a6b
partners/fireworks: release 0.2.7 (#29487) 2025-01-29 22:22:18 +00:00
Erick Friis
685609e1ef
partners/anthropic: release 0.3.5 (#29486) 2025-01-29 22:22:11 +00:00
Erick Friis
ed3a5e664c
standard-tests: release 0.3.10 (#29484) 2025-01-29 22:21:05 +00:00
Erick Friis
29461b36d9
partners/ollama: release 0.2.3 (#29489) 2025-01-29 22:19:44 +00:00
Erick Friis
07e2e80fe7
core: release 0.3.33 (#29483) 2025-01-29 14:11:53 -08:00
Erick Friis
8f95da4eb1
multiple: structured output tracing standard metadata (#29421)
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-29 14:00:26 -08:00
ccurme
284c935b08
tests[patch]: improve coverage of structured output tests (#29478) 2025-01-29 14:52:09 -05:00
Matheus Torquato
7aae738296
docs:Fix Imports for Document and BaseRetriever (#29473)
This pull request addresses an issue with import statements in the
langchain_core/retrievers.py file. The following changes have been made:

Corrected the import for Document from langchain_core.documents.base.
Corrected the import for BaseRetriever from langchain_core.retrievers.
These changes ensure that the SimpleRetriever class can correctly
reference the Document and BaseRetriever classes, improving code
reliability and maintainability.

---------

Co-authored-by: Matheus Torquato <mtorquat@jaguarlandrover.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-29 14:32:05 +00:00
Mohammad Anash
12bcc85927
added operator filter for supabase (#29475)
Description
This PR adds support for MongoDB-style $in operator filtering in the
Supabase vectorstore implementation. Currently, filtering with $in
operators returns no results, even when matching documents exist. This
change properly translates MongoDB-style filters to PostgreSQL syntax,
enabling efficient multi-document filtering.
Changes

Modified similarity_search_by_vector_with_relevance_scores to handle
MongoDB-style $in operators
Added automatic conversion of $in filters to PostgreSQL IN clauses
Preserved original vector type handling and numpy array conversion
Maintained compatibility with existing postgrest filters
Added support for the same filtering in
similarity_search_by_vector_returning_embeddings

Issue
Closes #27932

Implementation Notes
No changes to public API or function signatures
Backwards compatible - behavior unchanged for non-$in filters
More efficient than multiple individual queries for multi-ID searches
Preserves all existing functionality including numpy array conversion
for vector types

Dependencies
None

Additional Notes
The implementation handles proper SQL escaping for filter values
Maintains consistent behavior with other vectorstore implementations
that support MongoDB-style operators
Future extensions could support additional MongoDB-style operators ($gt,
$lt, etc.)

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-29 14:24:18 +00:00
ccurme
585f467d4a
mistral[patch]: release 0.2.5 (#29463) 2025-01-28 18:29:54 -05:00
ccurme
ca9d4e4595
mistralai: support method="json_schema" in structured output (#29461)
https://docs.mistral.ai/capabilities/structured-output/custom_structured_output/
2025-01-28 18:17:39 -05:00
Michael Chin
e120378695
community: Additional AWS deprecations (#29447)
Added deprecation warnings for a few more classes that weremoved to
`langchain-aws` package:
- [SageMaker Endpoint
LLM](https://python.langchain.com/api_reference/aws/retrievers/langchain_aws.retrievers.bedrock.AmazonKnowledgeBasesRetriever.html)
- [Amazon Kendra
retriever](https://python.langchain.com/api_reference/aws/retrievers/langchain_aws.retrievers.kendra.AmazonKendraRetriever.html)
- [Amazon Bedrock Knowledge Bases
retriever](https://python.langchain.com/api_reference/aws/retrievers/langchain_aws.retrievers.bedrock.AmazonKnowledgeBasesRetriever.html)
2025-01-28 09:50:14 -05:00
Erick Friis
2d776351af
community: release 0.3.16 (#29452) 2025-01-28 07:44:54 +00:00
Erick Friis
737a68fcdc
langchain: release 0.3.16 (#29451) 2025-01-28 07:31:09 +00:00
Erick Friis
8bf9c71673
core: release 0.3.32 (#29450) 2025-01-28 07:20:04 +00:00
Erick Friis
ecdc881328
langchain: add deepseek provider to init chat model (#29449) 2025-01-27 23:13:59 -08:00
Erick Friis
dced0ed3fd
deepseek, docs: chatdeepseek integration added (#29445) 2025-01-28 06:32:58 +00:00
Isaac Francisco
2bb2c9bfe8
change behavior for converting a string to openai messages (#29446) 2025-01-27 18:18:54 -08:00
ccurme
b1fdac726b
groq[patch]: update model used in test (#29441)
`llama-3.1-70b-versatile` was [shut
down](https://console.groq.com/docs/deprecations).
2025-01-27 21:11:44 +00:00
Adrián Panella
1551d9750c
community(doc_loaders): allow any credential type in AzureAIDocumentI… (#29289)
allow any credential type in AzureAIDocumentInteligence, not only
`api_key`.
This allows to use any of the credentials types integrated with AD.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-27 20:56:30 +00:00
ccurme
f00c66cc1f
chroma[patch]: release 0.2.1 (#29440) 2025-01-27 20:41:35 +00:00
Jorge Piedrahita Ortiz
3b886cdbb2
libs: add sambanova-lagchain integration package (#29417)
- **Description:**: Add sambanova-langchain integration package as
suggested in previous PRs

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-27 20:34:55 +00:00
Mohammad Anash
aba1fd0bd4
fixed similarity search with score error #29407 (#29413)
Description: Fix TypeError in AzureSearch similarity_search_with_score
by removing search_type from kwargs before passing to underlying
requests.

This resolves issue #29407 where search_type was being incorrectly
passed through to Session.request().
Issue: #29407

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-27 20:34:42 +00:00
itaismith
7b404fcd37
partners[chroma]: Upgrade Chroma to 0.6.x (#29404)
Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
  - Example: "community: add foobar LLM"


- [x] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2025-01-27 15:32:21 -05:00
Teruaki Ishizaki
3fce78994e
community: Fixed the procedure of initializing pad_token_id (#29434)
- **Description:** Add to check pad_token_id and eos_token_id of model
config. It seems that this is the same bug as the HuggingFace TGI bug.
In addition, the source code of
libs/partners/huggingface/langchain_huggingface/llms/huggingface_pipeline.py
also requires similar changes.
- **Issue:** #29431
- **Dependencies:** none
- **Twitter handle:** tell14
2025-01-27 14:54:54 -05:00
Christophe Bornet
dbb6b7b103
core: Add ruff rules TRY (tryceratops) (#29388)
TRY004 ("use TypeError rather than ValueError") existing errors are
marked as ignore to preserve backward compatibility.
LMK if you prefer to fix some of them.

Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-24 05:01:40 +00:00
Erick Friis
723b603f52
docs: groq api key links (#29402) 2025-01-24 04:33:18 +00:00
ccurme
bbc50f65e7
anthropic[patch]: release 0.3.4 (#29399) 2025-01-23 23:55:58 +00:00
ccurme
ed797e17fb
anthropic[patch]: always return content blocks if citations are generated (#29398)
We currently return string (and therefore no content blocks / citations)
if the response is of the form
```
[
    {"text": "a claim", "citations": [...]},
]
```

There are other cases where we do return citations as-is:
```
[
    {"text": "a claim", "citations": [...]},
    {"text": "some other text"},
    {"text": "another claim", "citations": [...]},
]
```
Here we update to return content blocks including citations in the first
case as well.
2025-01-23 18:47:23 -05:00
Bagatur
317fb86fd9
openai[patch]: fix int test (#29395) 2025-01-23 21:23:01 +00:00
Bagatur
8d566a8fe7
openai[patch]: detect old models in with_structured_output (#29392)
Co-authored-by: ccurme <chester.curme@gmail.com>
2025-01-23 20:47:32 +00:00
Christophe Bornet
b6ae7ca91d
core: Cache RunnableLambda __repr__ (#29199)
`RunnableLambda`'s `__repr__` may do costly OS operation by calling
`get_lambda_source`.
So it's better to cache it.
See #29043

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-23 18:34:47 +00:00
Christophe Bornet
618e550f06
core: Cache RunnableLambda deps (#29200)
`RunnableLambda`'s `deps` may do costly OS operation by calling
`get_function_nonlocals`.
So it's better to cache it.
See #29043
2025-01-23 13:09:07 -05:00
ccurme
f795ab99ec
docs: fix title rendered for integration package (#29387)
"Tilores LangchAIn" -> "Tilores"
2025-01-23 12:21:19 -05:00
Stefan Berkner
8977451c76
docs: add Tilores provider and tools (#29244)
Description: This PR adds documentation for the Tilores provider and
tools.
Issue: closes #26320
2025-01-23 12:17:59 -05:00
Ahmed Tammaa
d5b8aabb32
text-splitters[patch]: delete unused html_chunks_with_headers.xslt (#29340)
This pull request removes the now-unused html_chunks_with_headers.xslt
file from the codebase. In a previous update ([PR
#27678](https://github.com/langchain-ai/langchain/pull/27678)), the
HTMLHeaderTextSplitter class was refactored to utilize BeautifulSoup
instead of lxml and XSLT for HTML processing. As a result, the
html_chunks_with_headers.xslt file is no longer necessary and can be
safely deleted to maintain code cleanliness and reduce potential
confusion.

Issue: N/A

Dependencies: N/A
2025-01-23 11:29:08 -05:00
Wang Ran (汪然)
8f2c11e17b
core[patch]: fix API reference for draw_ascii (#29370)
typo: no `draw` but `draw_ascii` and other things

now, it works:
<img width="688" alt="image"
src="https://github.com/user-attachments/assets/5b5a8cc2-cf81-4a5c-b443-da0e4426556c"
/>

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-23 16:04:58 +00:00
Loris Alexandre
e4921239a6
community: missing mandatory parameter partition_key for AzureCosmosDBNoSqlVectorSearch (#29382)
- **Description:** the `delete` function of
AzureCosmosDBNoSqlVectorSearch is using
`self._container.delete_item(document_id)` which miss a mandatory
parameter `partition_key`
We use the class function `delete_document_by_id` to provide a default
`partition_key`
- **Issue:** #29372 
- **Dependencies:** None
- **Twitter handle:** None

Co-authored-by: Loris Alexandre <loris.alexandre@boursorama.fr>
2025-01-23 10:05:10 -05:00
Terry Tan
ec0ebb76f2
community: fix Google Scholar tool errors (#29371)
Resolve https://github.com/langchain-ai/langchain/issues/27557
2025-01-23 10:03:01 -05:00
江同学呀
a1e62070d0
community: Fix the problem of error reporting when OCR extracts text from PDF. (#29378)
- **Description:** The issue has been fixed where images could not be
recognized from ```xObject[obj]["/Filter"]``` (whose value can be either
a string or a list of strings) in the ```_extract_images_from_page()```
method. It also resolves the bug where vectorization by Faiss fails due
to the failure of image extraction from a PDF containing only
images```IndexError: list index out of range```.

![69a60f3f6bd474641b9126d74bb18f7e](https://github.com/user-attachments/assets/dc9e098d-2862-49f7-93b0-00f1056727dc)

- **Issue:** 
    Fix the following issues:
[#15227 ](https://github.com/langchain-ai/langchain/issues/15227)
[#22892 ](https://github.com/langchain-ai/langchain/issues/22892)
[#26652 ](https://github.com/langchain-ai/langchain/issues/26652)
[#27153 ](https://github.com/langchain-ai/langchain/issues/27153)
    Related issues:
[#7067 ](https://github.com/langchain-ai/langchain/issues/7067)

- **Dependencies:** None
- **Twitter handle:** None

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-23 15:01:52 +00:00
Tim Mallezie
a13faab6b7
community; allow to set gitlab url in gitlab tool in constrictor (#29380)
This pr, expands the gitlab url so it can also be set in a constructor,
instead of only through env variables.

This allows to do something like this. 
```
       # Create the GitLab API wrapper
        gitlab_api = GitLabAPIWrapper(
            gitlab_url=self.gitlab_url,
            gitlab_personal_access_token=self.gitlab_personal_access_token,
            gitlab_repository=self.gitlab_repository,
            gitlab_branch=self.gitlab_branch,
            gitlab_base_branch=self.gitlab_base_branch,
        )
```
Where before you could not set the url in the constructor.

Co-authored-by: Tim Mallezie <tim.mallezie@dropsolid.com>
2025-01-23 09:36:27 -05:00
Tyllen
f2ea62f632
docs: add payman docs (#29362)
- **Description:** Adding the docs to use the payman-langchain
integration :)
2025-01-22 18:37:47 -08:00
Erick Friis
3f1d20964a
standard-tests: release 0.3.9 (#29356) 2025-01-22 09:46:19 -08:00
Macs Dickinson
7378c955db
community: adds support for getting github releases for the configured repository (#29318)
**Description:** adds support for github tool to query github releases
on the configure respository
**Issue:** N/A
**Dependencies:** N/A
**Twitter handle:** @macsdickinson

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-22 15:45:52 +00:00
Tayaa Med Amine
ef1610e24a
langchain[patch]: support ollama in init_embeddings (#29349)
Why not Ollama ?

Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-22 14:47:12 +00:00
Siddhant
9eb10a9240
langchain: added vectorstore docstring linting (#29241)
…ore.py

Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
  - Example: "community: add foobar LLM"
  
Added docstring linting in the vectorstore.py file relating to issue
#25154


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Siddhant Jain <sjain35@buffalo.edu>
Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-22 03:47:43 +00:00
Sohan
de1fc4811d
packages, docs: Pipeshift - Langchain integration of pipeshift (#29114)
Description: Added pipeshift integration. This integrates pipeshift LLM
and ChatModels APIs with langchain
Dependencies: none

Unit Tests & Integration tests are added

Documentation is added as well

This PR is w.r.t
[#27390](https://github.com/langchain-ai/langchain/pull/27390) and as
per request, a freshly minted `langchain-pipeshift` package is uploaded
to PYPI. Only changes to the docs & packages.yml are made in langchain
master branch

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-22 03:03:06 +00:00
Christophe Bornet
836c791829
text-splitters: Bump ruff version to 0.9 (#29231)
Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-22 00:27:58 +00:00
Christophe Bornet
a004dec119
langchain: Bump ruff version to 0.9 (#29211)
Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-22 00:26:39 +00:00
Christophe Bornet
2340b3154d
standard-tests: Bump ruff version to 0.9 (#29230)
Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-22 00:23:01 +00:00
Christophe Bornet
e4a78dfc2a
core: Bump ruff version to 0.9 (#29201)
Also run some preview autofix and formatting

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-22 00:20:09 +00:00
Ella Charlaix
6f95db81b7
huggingface: Add IPEX models support (#29179)
Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-22 00:16:44 +00:00
Bhav Sardana
d6a7aaa97d
community: Fix for Pydantic model validator of GoogleApiClient (#29346)
- [ *] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** Fix for pedantic model validator for GoogleApiHandler
    - **Issue:** the issue #29165 

- [ *] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified.

---------

Signed-off-by: Bhav Sardana <sardana.bhav@gmail.com>
2025-01-21 15:17:43 -05:00
Christophe Bornet
1c4ce7b42b
core: Auto-fix some docstrings (#29337) 2025-01-21 13:29:53 -05:00
ccurme
86a0720310
fireworks[patch]: update model used in integration tests (#29342)
No access to firefunction-v1 and -v2.
2025-01-21 11:05:30 -05:00
Hugo Berg
32c9c58adf
Community: fix missing f-string modifier in oai structured output parsing error (#29326)
- **Description:** The ValueError raised on certain structured-outputs
parsing errors, in langchain openai community integration, was missing a
f-string modifier and so didn't produce useful outputs. This is a
2-line, 2-character change.
- **Issue:** None open that this fixes
- **Dependencies:** Nothing changed
- **Twitter handle:** None

- [X] **Add tests and docs**: There's nothing to add for.
- [-] **Lint and test**: Happy to run this if you deem it necessary.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-21 14:26:38 +00:00
Nuno Campos
566915d7cf
core: fix call to get closure vars for partial-wrapped funcs (#29316)
Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2025-01-21 09:26:15 -05:00
ZhangShenao
33e22ccb19
[Doc] Improve api doc (#29324)
- Fix doc description
- Add static method decorator
2025-01-21 09:16:08 -05:00
Bagatur
536b44a47f
community[patch]: Release 0.3.15 (#29325) 2025-01-21 03:10:07 +00:00
Bagatur
ec5fae76d4
langchain[patch]: Release 0.3.15 (#29322) 2025-01-21 02:24:11 +00:00
Bagatur
923e6fb321
core[patch]: 0.3.31 (#29320) 2025-01-21 01:17:31 +00:00
Ahmed Tammaa
d3ed9b86be
text-splitters[minor]: Replace lxml and XSLT with BeautifulSoup in HTMLHeaderTextSplitter for Improved Large HTML File Processing (#27678)
This pull request updates the `HTMLHeaderTextSplitter` by replacing the
`split_text_from_file` method's implementation. The original method used
`lxml` and XSLT for processing HTML files, which caused
`lxml.etree.xsltapplyerror maxhead` when handling large HTML documents
due to limitations in the XSLT processor. Fixes #13149

By switching to BeautifulSoup (`bs4`), we achieve:

- **Improved Performance and Reliability:** BeautifulSoup efficiently
processes large HTML files without the errors associated with `lxml` and
XSLT.
- **Simplified Dependencies:** Removes the dependency on `lxml` and
external XSLT files, relying instead on the widely used `beautifulsoup4`
library.
- **Maintained Functionality:** The new method replicates the original
behavior, ensuring compatibility with existing code and preserving the
extraction of content and metadata.

**Issue:**

This change addresses issues related to processing large HTML files with
the existing `HTMLHeaderTextSplitter` implementation. It resolves
problems where users encounter lxml.etree.xsltapplyerror maxhead due to
large HTML documents.

**Dependencies:**

- **BeautifulSoup (`beautifulsoup4`):** The `beautifulsoup4` library is
now used for parsing HTML content.
  - Installation: `pip install beautifulsoup4`

**Code Changes:**

Updated the `split_text_from_file` method in `HTMLHeaderTextSplitter` as
follows:

```python
def split_text_from_file(self, file: Any) -> List[Document]:
    """Split HTML file using BeautifulSoup.

    Args:
        file: HTML file path or file-like object.

    Returns:
        List of Document objects with page_content and metadata.
    """
    from bs4 import BeautifulSoup
    from langchain.docstore.document import Document
    import bs4

    # Read the HTML content from the file or file-like object
    if isinstance(file, str):
        with open(file, 'r', encoding='utf-8') as f:
            html_content = f.read()
    else:
        # Assuming file is a file-like object
        html_content = file.read()

    # Parse the HTML content using BeautifulSoup
    soup = BeautifulSoup(html_content, 'html.parser')

    # Extract the header tags and their corresponding metadata keys
    headers_to_split_on = [tag[0] for tag in self.headers_to_split_on]
    header_mapping = dict(self.headers_to_split_on)

    documents = []

    # Find the body of the document
    body = soup.body if soup.body else soup

    # Find all header tags in the order they appear
    all_headers = body.find_all(headers_to_split_on)

    # If there's content before the first header, collect it
    first_header = all_headers[0] if all_headers else None
    if first_header:
        pre_header_content = ''
        for elem in first_header.find_all_previous():
            if isinstance(elem, bs4.Tag):
                text = elem.get_text(separator=' ', strip=True)
                if text:
                    pre_header_content = text + ' ' + pre_header_content
        if pre_header_content.strip():
            documents.append(Document(
                page_content=pre_header_content.strip(),
                metadata={}  # No metadata since there's no header
            ))
    else:
        # If no headers are found, return the whole content
        full_text = body.get_text(separator=' ', strip=True)
        if full_text.strip():
            documents.append(Document(
                page_content=full_text.strip(),
                metadata={}
            ))
        return documents

    # Process each header and its associated content
    for header in all_headers:
        current_metadata = {}
        header_name = header.name
        header_text = header.get_text(separator=' ', strip=True)
        current_metadata[header_mapping[header_name]] = header_text

        # Collect all sibling elements until the next header of the same or higher level
        content_elements = []
        for sibling in header.find_next_siblings():
            if sibling.name in headers_to_split_on:
                # Stop at the next header
                break
            if isinstance(sibling, bs4.Tag):
                content_elements.append(sibling)

        # Get the text content of the collected elements
        current_content = ''
        for elem in content_elements:
            text = elem.get_text(separator=' ', strip=True)
            if text:
                current_content += text + ' '

        # Create a Document if there is content
        if current_content.strip():
            documents.append(Document(
                page_content=current_content.strip(),
                metadata=current_metadata.copy()
            ))
        else:
            # If there's no content, but we have metadata, still create a Document
            documents.append(Document(
                page_content='',
                metadata=current_metadata.copy()
            ))

    return documents
```

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2025-01-20 16:10:37 -05:00
Christophe Bornet
989eec4b7b
core: Add ruff rule S101 (no assert) (#29267)
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2025-01-20 20:24:31 +00:00
Christophe Bornet
e5d62c6ce7
core: Add ruff rule W293 (whitespaces) (#29272) 2025-01-20 15:16:12 -05:00
Philippe PRADOS
4efc5093c1
community[minor]: Refactoring PyMuPDF parser, loader and add image blob parsers (#29063)
* Adds BlobParsers for images. These implementations can take an image
and produce one or more documents per image. This interface can be used
for exposing OCR capabilities.
* Update PyMuPDFParser and Loader to standardize metadata, handle
images, improve table extraction etc.

- **Twitter handle:** pprados

This is one part of a larger Pull Request (PR) that is too large to be
submitted all at once.
This specific part focuses to prepare the update of all parsers.

For more details, see [PR
28970](https://github.com/langchain-ai/langchain/pull/28970).

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2025-01-20 15:15:43 -05:00
Syed Baqar Abbas
f175319303
[feat] Added backwards compatibility for OllamaEmbeddings initialization (migration from langchain_community.embeddings to langchain_ollama.embeddings (#29296)
- [feat] **Added backwards compatibility for OllamaEmbeddings
initialization (migration from `langchain_community.embeddings` to
`langchain_ollama.embeddings`**: "langchain_ollama"
- **Description:** Given that `OllamaEmbeddings` from
`langchain_community.embeddings` is deprecated, code is being shifted to
``langchain_ollama.embeddings`. However, this does not offer backward
compatibility of initializing the parameters and `OllamaEmbeddings`
object.
    - **Issue:** #29294 
    - **Dependencies:** None
    - **Twitter handle:** @BaqarAbbas2001


## Additional Information
Previously, `OllamaEmbeddings` from `langchain_community.embeddings`
used to support the following options:

e9abe583b2/libs/community/langchain_community/embeddings/ollama.py (L125-L139)

However, in the new package `from langchain_ollama import
OllamaEmbeddings`, there is no method to set these options. I have added
these parameters to resolve this issue.

This issue was also discussed in
https://github.com/langchain-ai/langchain/discussions/29113
2025-01-20 11:16:29 -05:00
CLOVA Studio 개발
7a95ffc775
community: fix some features on Naver ChatModel & embedding model 2 (#29243)
## Description
- Responding to `NCP API Key` changes.
- To fix `ChatClovaX` `astream` function to raise `SSEError` when an
error event occurs.
- To add `token length` and `ai_filter` to ChatClovaX's
`response_metadata`.
- To update document for apply NCP API Key changes.

cc. @efriis @vbarda
2025-01-20 11:01:03 -05:00
Sangyun_LEE
5d64597490
docs: fix broken Appearance of langchain_community/document_loaders/recursive_url_loader API Reference (#29305)
# PR mesesage
## Description
Fixed a broken Appearance of RecurisveUrlLoader API Reference.

### Before
<p align="center">
<img width="750" alt="image"
src="https://github.com/user-attachments/assets/f39df65d-b788-411d-88af-8bfa2607c00b"
/>
<img width="750" alt="image"
src="https://github.com/user-attachments/assets/b8a92b70-4548-4b4a-965f-026faeebd0ec"
/>
</p>

### After
<p align="center">
<img width="750" alt="image"
src="https://github.com/user-attachments/assets/8ea28146-de45-42e2-b346-3004ec4dfc55"
/>
<img width="750" alt="image"
src="https://github.com/user-attachments/assets/914c6966-4055-45d3-baeb-2d97eab06fe7"
/>
</p>

## Issue:
N/A
## Dependencies
None
## Twitter handle
N/A

# Add tests and docs
Not applicable; this change only affects documentation.

# Lint and test
Ran make format, make lint, and make test to ensure no issues.
2025-01-20 10:56:59 -05:00
Hemant Rawat
6c52378992
Add Google-style docstring linting and update pyproject.toml (#29303)
### Description:

This PR introduces Google-style docstring linting for the
ModelLaboratory class in libs/langchain/langchain/model_laboratory.py.
It also updates the pyproject.toml file to comply with the latest Ruff
configuration standards (deprecating top-level lint settings in favor of
lint).

### Changes include:
- [x] Added detailed Google-style docstrings to all methods in
ModelLaboratory.
- [x] Updated pyproject.toml to move select and pydocstyle settings
under the [tool.ruff.lint] section.
- [x] Ensured all files pass Ruff linting.

Issue:
Closes #25154

### Dependencies:
No additional dependencies are required for this change.

### Checklist
- [x] Files passes ruff linting.
- [x] Docstrings conform to the Google-style convention.
- [x] pyproject.toml updated to avoid deprecation warnings.
- [x] My PR is ready to review, please review.
2025-01-19 14:37:21 -05:00
Mohammad Mohtashim
b5fbebb3c8
(Community): Changing the BaseURL and Model for MiniMax (#29299)
- **Description:** Changed the Base Default Model and Base URL to
correct versions. Plus added a more explicit exception if user provides
an invalid API Key
- **Issue:** #29278
2025-01-19 14:15:02 -05:00
ccurme
c20f7418c7
openai[patch]: fix Azure LLM test (#29302)
The tokens I get are:
```
['', '\n\n', 'The', ' sun', ' was', ' setting', ' over', ' the', ' horizon', ',', ' casting', '']
```
so possibly an extra empty token is included in the output.

lmk @efriis if we should look into this further.
2025-01-19 17:25:42 +00:00
ccurme
6b249a0dc2
openai[patch]: release 0.3.1 (#29301) 2025-01-19 17:04:00 +00:00
ThomasSaulou
e9abe583b2
chatperplexity stream-citations in additional kwargs (#29273)
chatperplexity stream-citations in additional kwargs

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-18 22:31:10 +00:00
TheSongg
1cd4d8d101
[langchain_community.llms.xinference]: Rewrite _stream() method and support stream() method in xinference.py (#29259)
- [ ] **PR title**:[langchain_community.llms.xinference]: Rewrite
_stream() method and support stream() method in xinference.py

- [ ] **PR message**: Rewrite the _stream method so that the
chain.stream() can be used to return data streams.

       chain = prompt | llm
       chain.stream(input=user_input)


- [ ] **tests**: 
      from langchain_community.llms import Xinference
      from langchain.prompts import PromptTemplate

      llm = Xinference(
server_url="http://0.0.0.0:9997", # replace your xinference server url
model_uid={model_uid} # replace model_uid with the model UID return from
launching the model
          stream = True
       )
prompt = PromptTemplate(input=['country'], template="Q: where can we
visit in the capital of {country}? A:")
      chain = prompt | llm
      chain.stream(input={'country': 'France'})
2025-01-17 20:31:59 -05:00
ccurme
184ea8aeb2
anthropic[patch]: update tool choice type (#29276) 2025-01-17 15:26:33 -05:00
ccurme
ac52021097
anthropic[patch]: release 0.3.2 (#29275) 2025-01-17 19:48:31 +00:00
ccurme
c616b445f2
anthropic[patch]: support parallel_tool_calls (#29257)
Need to:
- Update docs
- Decide if this is an explicit kwarg of bind_tools
- Decide if this should be in standard test with flag for supporting
2025-01-17 19:41:41 +00:00
ccurme
d5360b9bd6
core[patch]: release 0.3.30 (#29256) 2025-01-16 17:52:37 -05:00
Nuno Campos
595297e2e5
core: Add support for calls in get_function_nonlocals (#29255)
Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2025-01-16 14:43:42 -08:00
Luis Lopez
75663f2cae
community: Add cost per 1K tokens for fine-tuned model cached input (#29248)
### Description

- Since there is no cost per 1k input tokens for a fine-tuned cached
version of `gpt-4o-mini-2024-07-18` is not available when using the
`OpenAICallbackHandler`, it raises an error when trying to make calls
with such model.
- To add the price in the `MODEL_COST_PER_1K_TOKENS` dictionary

cc. @efriis
2025-01-16 15:19:26 -05:00