- **Description:** Added a link to make it easier to organize GitHub
issues for LangChain.
- **Issue:** After reading that a taxonomy of labels existed, I had to
figure out how to find it.
- **Dependencies:** None
- **Twitter handle:** None
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:**
This PR resolves an issue with the
`ExperimentalMarkdownSyntaxTextSplitter` class, which retained internal
state across multiple calls to the `split_text` method. This behaviour
caused an unintended accumulation of chunks in `self` attributes,
leading to incorrect outputs when processing multiple Markdown files
sequentially.
- Modified `libs/text-splitters/langchain_text_splitters/markdown.py` to
reset the relevant internal attributes at the start of each `split_text`
invocation, so each call processes its input independently (see the
sketch below).
- Added unit tests in
`libs/text-splitters/tests/unit_tests/test_text_splitters.py` to verify
the fix and ensure state does not persist across calls.
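A minimal sketch of the reset, with attribute names assumed for
illustration (the actual attributes are those defined in `markdown.py`):
```python
from langchain_core.documents import Document

class ExperimentalMarkdownSyntaxTextSplitter:
    def split_text(self, text: str) -> list[Document]:
        # Reset per-call state so chunks from a previous call do not
        # leak into this one (attribute names assumed for illustration).
        self.chunks: list[Document] = []
        self.current_chunk = Document(page_content="")
        self.current_header_stack: list[tuple[int, str]] = []
        # ... continue with the normal splitting logic ...
        return self.chunks
```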
- **Issue:**
Fixes [#26440](https://github.com/langchain-ai/langchain/issues/26440).
- **Dependencies:**
No additional dependencies are introduced with this change.
- [x] Unit tests were added to verify the changes.
- [x] Updated documentation where necessary.
- [x] Ran `make format`, `make lint`, and `make test` to ensure
compliance with project standards.
---------
Co-authored-by: Angel Chen <angelchen396@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:** `kwargs` was being checked against the `None`
object, which prevented the rest of the code in
`with_structured_output` from executing. This PR fixes that check (a
minimal illustration follows).
- **Issue:** #28776
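A minimal illustration of the pitfall, with hypothetical code rather
than the actual implementation: `**kwargs` always collects into a dict
and is never `None`, so comparing it to `None` is dead code.
```python
def with_structured_output(schema, **kwargs):
    # BUG (hypothetical illustration): **kwargs is always a dict,
    # possibly empty, never None, so this branch never runs as intended.
    if kwargs is None:
        ...
    # Correct: test whether any keyword arguments were supplied.
    if kwargs:
        ...
```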
- [x] **PR title**: community: add float message support to DynamoDB
- [x] **PR message**:
- **Description:** Pushing float values into DynamoDB raised an error;
this is solved by converting them to `str` before writing.
- **Issue:** Float values were not getting pushed.
- **Twitter handle:** VpkPrasanna
I have added a utility function for the string conversion (a sketch is
shown below); let me know where to place it, and I'm happy to commit it.
This PR came out of a discussion in #26543.
@hwchase17 @baskaryan @efriis
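A sketch of such a conversion utility, assuming a recursive walk over
the message payload (the function name and placement are illustrative,
not final):
```python
from typing import Any

def coerce_floats_to_str(value: Any) -> Any:
    """Recursively convert floats to strings, since DynamoDB (via
    boto3) rejects Python floats in favour of Decimal or string."""
    if isinstance(value, float):
        return str(value)
    if isinstance(value, dict):
        return {k: coerce_floats_to_str(v) for k, v in value.items()}
    if isinstance(value, list):
        return [coerce_floats_to_str(v) for v in value]
    return value
```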
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
## Description
This pull request introduces the `DocumentLoaderAsParser` class, which
acts as an adapter to transform document loaders into parsers within the
LangChain framework. The class enables document loaders that accept a
`file_path` parameter to be utilized as blob parsers. This is
particularly useful for integrating various document loading
capabilities seamlessly into the LangChain ecosystem.
When merged together with PR
https://github.com/langchain-ai/langchain/pull/27716, it opens the
option for `SharePointLoader` / `OneDriveLoader` to process any file
type that has a document loader.
### Features
- **Flexible Parsing**: The `DocumentLoaderAsParser` class can adapt any
document loader that meets the criteria of accepting a `file_path`
argument, allowing for lazy parsing of documents.
- **Compatibility**: The class has been designed to work with various
document loaders, making it versatile for different use cases.
### Usage Example
To use the `DocumentLoaderAsParser`, you would initialize it with a
suitable document loader class and any required parameters. Here’s an
example of how to do this with the `UnstructuredExcelLoader`:
```python
from langchain_community.document_loaders.excel import UnstructuredExcelLoader
from langchain_community.document_loaders.parsers.documentloader_adapter import DocumentLoaderAsParser
from langchain_community.document_loaders.parsers.generic import MimeTypeBasedParser

# Initialize the parser adapter with UnstructuredExcelLoader
xlsx_parser = DocumentLoaderAsParser(UnstructuredExcelLoader, mode="paged")

# Use the parser, e.g. by passing it to a MimeTypeBasedParser
parser = MimeTypeBasedParser(
    handlers={
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": xlsx_parser,
    }
)
```
- **Dependencies:** None
- **Twitter handle:** @martintriska1
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:** One-bit images were raising an error in
`PDFPlumberParser`; this has been fixed in this PR.
- **Issue:** #28480
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- [x] Fix for when LanceDB returns a table without a metadata column
- **Description:** Check the table schema; if it has no metadata
column, initialize the `Document` with an empty dict as the `metadata`
argument (a sketch is shown below).
- **Issue:** https://github.com/langchain-ai/langchain/issues/27005
- [x] **Add tests and docs**
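A minimal sketch of the guard, with illustrative variable names around
the LanceDB row handling:
```python
from typing import Any
from langchain_core.documents import Document

def row_to_document(row: dict[str, Any], column_names: list[str]) -> Document:
    # Fall back to an empty dict when the table schema has no
    # "metadata" column (illustrative sketch, not the exact code).
    metadata = row["metadata"] if "metadata" in column_names else {}
    return Document(page_content=row["text"], metadata=metadata)
```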
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
## Overview
This PR adds documentation for the `langchain-yt-dlp` package, a
YouTube document loader that uses `yt-dlp` for YouTube video metadata
extraction.
## Changes
- Added documentation notebook for YoutubeLoader
- Updated packages.yml to include langchain-yt-dlp
## Motivation
The existing LangChain YoutubeLoader was unable to fetch YouTube
metadata due to changes in YouTube's structure. This package resolves
those issues by leveraging the `yt-dlp` library.
## Features
- Reliable YouTube metadata extraction
## Related
- Package Repository: https://github.com/aqib0770/langchain-yt-dlp
- PyPI Package: https://pypi.org/project/langchain-yt-dlp/
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Hi, LangChain team! I'm a maintainer of
[OceanBase](https://github.com/oceanbase/oceanbase).
Following the integration guidance, I created a Python library named
[langchain-oceanbase](https://github.com/oceanbase/langchain-oceanbase)
to integrate the `OceanBase Vector Store` with LangChain.
I'd like to add the required docs; I'd appreciate your feedback.
Thank you!
---------
Signed-off-by: shanhaikang.shk <shanhaikang.shk@oceanbase.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
- [X] **PR title**:
community: Add new model and structured output support
- [X] **PR message**:
- **Description:** add support for Meta Llama 3.2 image handling, and
JSON mode for structured output
- **Issue:** NA
- **Dependencies:** NA
- **Twitter handle:** NA
- [x] **Add tests and docs**:
1. we have updated our unit tests,
2. no changes required for documentation.
- [x] **Lint and test**:
`make format`, `make lint`, and `make test` all ran successfully
---------
Co-authored-by: Arthur Cheng <arthur.cheng@oracle.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
**Description:**
Adding new AWS Bedrock models and their respective costs to match
https://aws.amazon.com/bedrock/pricing/ for the Bedrock callback.
**Issue:**
Missing models for those who wish to try them out.
**Dependencies:**
Nothing added
**Twitter handle:**
@David_Pryce and / or @JamfSoftware
**Description**:
This PR addresses an issue where the DocumentAttributeValue class
properties did not default to None. By explicitly defaulting the
Optional attributes (DateValue, LongValue, StringListValue, and
StringValue) to None, this change ensures the class functions as
expected when no value is provided for these attributes.
**Changes Made**:
Added default None values to the following properties of the
`DocumentAttributeValue` class (see the sketch below):
- `DateValue`
- `LongValue`
- `StringListValue`
- `StringValue`
Removed the invalid argument `extra="allow"` from the `BaseModel`
inheritance.
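A sketch of the resulting model, with field types assumed for
illustration:
```python
from typing import List, Optional
from pydantic import BaseModel

class DocumentAttributeValue(BaseModel):
    # Every attribute now defaults to None when Kendra returns no
    # value for it (field types assumed for illustration).
    DateValue: Optional[str] = None
    LongValue: Optional[int] = None
    StringListValue: Optional[List[str]] = None
    StringValue: Optional[str] = None
```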
Dependencies: None.
**Twitter handle (optional)**: @__korikori1021
**Checklist**
- [x] Verified that KendraRetriever works as expected after the changes.
Co-authored-by: y1u0d2a1i <y.kotani@raksul.com>
Description:
Improved the `_parse_google_docstring` function in `langchain-core` to
support parsing multi-paragraph descriptions before the `Args:` section
while maintaining compliance with Google-style docstring guidelines.
This change ensures better handling of docstrings with detailed
function descriptions; an example is shown below.
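For instance, a docstring like the following (hypothetical function)
now keeps both description paragraphs:
```python
def load_table(name: str, limit: int = 10) -> list:
    """Load rows from a table.

    This second paragraph of the description used to be dropped; with
    this change it is parsed as part of the description as well.

    Args:
        name: Name of the table to load.
        limit: Maximum number of rows to return.
    """
    return []
```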
Issue:
Fixes #28628
Dependencies:
None.
Twitter handle:
@isatyamks
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:** I am working to address an issue similar to the one
mentioned in https://github.com/langchain-ai/langchain/pull/19499.
Specifically, the `WebBaseLoader` used in open-webui fails to load the
proxy configuration. This PR resolves that issue (a usage sketch is
shown below).
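A minimal usage sketch, assuming the standard requests-style `proxies`
mapping accepted by `WebBaseLoader` (proxy addresses illustrative):
```python
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader(
    "https://example.com",
    proxies={
        "http": "http://proxy.example.com:3128",
        "https": "http://proxy.example.com:3128",
    },
)
docs = loader.load()
```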
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**Description**: Some Confluence instances don't support personal
access tokens, so cookies are a convenient way to authenticate. This PR
adds support for Confluence cookies (a usage sketch is shown below).
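A hypothetical usage sketch; the `cookies` parameter name is assumed
from this PR's description, and the cookie values are placeholders:
```python
from langchain_community.document_loaders import ConfluenceLoader

loader = ConfluenceLoader(
    url="https://example.atlassian.net/wiki",
    cookies={"JSESSIONID": "<session-id>"},  # assumed parameter
    space_key="DOCS",
)
docs = loader.load()
```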
**Twitter handle**: soulmachine
Issue: the current
[Cache](https://python.langchain.com/docs/integrations/llm_caching/)
page has inconsistent headings: mixed terms, mixed casing, and mixed
use of "selecting". Excessively long titles make the right-side ToC
hard to read and unnecessarily long.
Changes: a consistent and more readable ToC.
…ent path given.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**Description:**
- Add a `_concatenate_rich_text` method to combine all elements in rich
text arrays (see the sketch below)
- Update the `load_page` method to use `_concatenate_rich_text` for
rich text properties
- Ensure all text content is captured, including inline code and
formatted text
- Add unit tests to verify correct handling of multi-element rich text
This fix prevents truncation of content after backticks or other
formatting elements.
**Issue:**
When using the Notion DB loader, the text for `rich_text` and `title`
properties was truncated after the first element, because the loader
only read the first element.
**Dependencies:**
None.
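A minimal sketch of the concatenation, assuming Notion's rich text
payload shape where each element carries a `plain_text` field:
```python
def _concatenate_rich_text(self, rich_text: list) -> str:
    # Join the plain text of every element instead of reading only the
    # first one, so inline code and formatted text are not truncated.
    return "".join(item.get("plain_text", "") for item in rich_text)
```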
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
This PR closes #27781.
# Problem
The current implementation of `NLTKTextSplitter` uses `sent_tokenize`.
However, `sent_tokenize` drops the characters between two tokenized
sentences, so this behavior throws errors when we are using
`add_start_index=True`, as described in issue #27781. In particular:
```python
from nltk.tokenize import sent_tokenize

# One space between sentences:
output1 = sent_tokenize("Innovation drives our success. Collaboration fosters creative solutions. Efficiency enhances data management.", language="english")
print(output1)
# Two spaces between sentences -- the separators are dropped, so the
# output is identical and the original offsets can no longer be recovered:
output2 = sent_tokenize("Innovation drives our success.  Collaboration fosters creative solutions.  Efficiency enhances data management.", language="english")
print(output2)
>>> ['Innovation drives our success.', 'Collaboration fosters creative solutions.', 'Efficiency enhances data management.']
>>> ['Innovation drives our success.', 'Collaboration fosters creative solutions.', 'Efficiency enhances data management.']
```
# Solution
With the new `use_span_tokenize` parameter, we can use NLTK's
`span_tokenize` to build sentences while keeping the separator
characters, so the chunks can still be mapped back to the original text
(see the sketch below).
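A minimal sketch of the span-based approach, using NLTK's
`PunktSentenceTokenizer` (example text and offsets illustrative):
```python
from nltk.tokenize.punkt import PunktSentenceTokenizer

text = "Innovation drives our success.  Collaboration fosters creative solutions."
tokenizer = PunktSentenceTokenizer()
# span_tokenize yields (start, end) offsets into the original text, so
# the separator characters between sentences are never lost.
spans = list(tokenizer.span_tokenize(text))  # [(0, 30), (32, 74)]
sentences = [text[start:end] for start, end in spans]
```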
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Erick Friis <erickfriis@gmail.com>
**Description:** Added support for FalkorDB Vector Store, including its
implementation, unit tests, documentation, and an example notebook. The
FalkorDB integration allows users to efficiently manage and query
embeddings in a vector database, with relevance scoring and maximal
marginal relevance search. The following components were implemented:
- Core implementation for FalkorDBVector store.
- Unit tests ensuring proper functionality and edge case coverage.
- Example notebook demonstrating an end-to-end setup, search, and
retrieval using FalkorDB.
**Twitter handle:** @tariyekorogha
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Update to OpenLLM 0.6, where we decided to make use of OpenLLM's
OpenAI-compatible endpoint. OpenLLM will thus become a thin wrapper
around the OpenAI client.
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
---------
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: ccurme <chester.curme@gmail.com>
## Description
- I refactored `ChatHunyuan` using the Tencent Cloud SDK, because I
found the original implementation didn't work in my application
- I added `HunyuanEmbeddings` using the Tencent Cloud SDK
- Both extend LangChain's base classes, and I have fully tested them in
my application (a usage sketch is shown below)
## Dependencies
- tencentcloud-sdk-python
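A hypothetical usage sketch; import paths and constructor argument
names are assumed for illustration:
```python
from langchain_community.chat_models import ChatHunyuan
from langchain_community.embeddings import HunyuanEmbeddings

# Credentials come from the Tencent Cloud console (values illustrative).
chat = ChatHunyuan(
    hunyuan_secret_id="<secret-id>",
    hunyuan_secret_key="<secret-key>",
)
embeddings = HunyuanEmbeddings(
    hunyuan_secret_id="<secret-id>",
    hunyuan_secret_key="<secret-key>",
)
```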
---------
Co-authored-by: centonhuang <centonhuang@tencent.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
### About:
- **Description:** the `_get_files_from_directory` method returns a
string, but it's used in other methods that expect a `List[str]`
- **Issue:** None
- **Dependencies:** None
This pull request introduces a new method `_list_files` with the old
logic of `_get_files_from_directory`, but it returns a `List[str]`.
The behavior of `_get_files_from_directory` is unchanged.
- [ ] **PR title**: community: Add configurable `VisualFeatures` to the
`AzureAiServicesImageAnalysisTool`
- [ ] **PR message**:
- **Description:** The `AzureAiServicesImageAnalysisTool` is a useful
tool and utilises the Azure AI Vision package under the hood. However,
since the creation of this tool, new `VisualFeatures` have been added
that allow the user to request other image-specific information.
Currently, the tool offers neither configuration of which features
should be returned nor any of the newer feature types. The aim of this
PR is to address this and expose more of the Azure service in this
integration (a hypothetical usage sketch is shown below).
- **Dependencies:** no new dependencies in the main class file;
`azure.ai.vision.imageanalysis` was added to the extra test
dependencies.
- [ ] **Add tests and docs**: Although no tests exist for the already
implemented Azure Service tools, I've created three unit tests for this
class: initialisation and credentials, local file analysis, and the new
configurable-features option.
- [ ] **Lint and test**: All linting has passed.
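A hypothetical usage sketch; the `visual_features` parameter name is
assumed from this PR, and the key and endpoint are placeholders:
```python
from langchain_community.tools.azure_ai_services import (
    AzureAiServicesImageAnalysisTool,
)

tool = AzureAiServicesImageAnalysisTool(
    azure_ai_services_key="<key>",
    azure_ai_services_endpoint="https://<resource>.cognitiveservices.azure.com/",
    visual_features=["CAPTION", "TAGS", "OBJECTS"],  # assumed parameter
)
result = tool.run("path/to/image.png")
```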
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
This pull request addresses an issue with authenticating to Azure
National Clouds using tokens (RBAC) in the AzureSearch vectorstore
implementation.
## Changes
- Modified the `_get_search_client` method in `azuresearch.py` to pass
`additional_search_client_options` to the `SearchIndexClient` instance.
## Implementation Details
The patch updates the `SearchIndexClient` initialization to include the
`additional_search_client_options` parameter:
```python
index_client: SearchIndexClient = SearchIndexClient(
    endpoint=endpoint,
    credential=credential,
    user_agent=user_agent,
    **additional_search_client_options,
)
```
This change allows the `audience` parameter to be correctly passed when
using Azure National Cloud, fixing the authentication issues with
GovCloud & RBAC.
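A hypothetical usage sketch for Azure Government; the endpoint, index
name, and `audience` value are illustrative, and `embeddings` stands in
for any LangChain embeddings instance:
```python
from langchain_community.vectorstores.azuresearch import AzureSearch

vector_store = AzureSearch(
    azure_search_endpoint="https://<service>.search.azure.us",
    azure_search_key=None,  # None falls back to token (RBAC) authentication
    index_name="my-index",
    embedding_function=embeddings.embed_query,
    additional_search_client_options={"audience": "https://search.azure.us"},
)
```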
This patch was generated by [Ana - AI SDE](https://openana.ai/), an
AI-powered software development assistant.
This is a fix for [Issue
25823](https://github.com/langchain-ai/langchain/issues/25823)
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
- **PR title**: "community: Remove all other keys in ChatLiteLLM and add
api_key"
- **PR message**: Currently, no API key is passed to LiteLLM, and
LiteLLM only accepts a single `api_key` parameter. I therefore removed
all the current `*_api_key` attributes (they are not used) and added an
`api_key` attribute that is passed through to LiteLLM (a usage sketch is
shown below).
- Should fix issue #27826
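A hypothetical usage sketch after the change; the model name is
illustrative, and `api_key` is the field this PR adds:
```python
from langchain_community.chat_models import ChatLiteLLM

llm = ChatLiteLLM(
    model="gpt-4o-mini",           # any model LiteLLM can route to
    api_key="<provider-api-key>",  # forwarded to LiteLLM
)
print(llm.invoke("Hello!").content)
```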
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Description:
The current AGEGraph() implementation does some custom wrapping for
graph queries. The relevant method is `_wrap_query()`, which parses
fields from the original query in order to add some SQL context to it.
This PR improves the parsing logic to cover additional edge cases,
which are added to the test coverage: previously, any node property
name or value containing the literal "return" would break the generated
graph/SQL query (an illustrative example is shown below).
We discovered this while working with real-world datasets; it is not an
uncommon scenario, and it needs to be covered.
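An illustrative instance of the edge case, with hypothetical data;
`graph` stands in for an `AGEGraph` instance:
```python
# The property value contains the literal "return", which the old
# _wrap_query() parsing could mistake for the query's RETURN clause.
query = "MATCH (n:Person {note: 'will return the goods'}) RETURN n"
result = graph.query(query)
```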