Follow-up to https://github.com/langchain-ai/langsmith-sdk/pull/1696:
I've bumped the `langsmith` version where applicable in `uv.lock`.
The type-checking problems here arise because deps were updated in
`pyproject.toml` without running `uv lock` - we should enforce that
in the future; it goes with the other dependabot todos :).
## Description
Added support for retrieving column comments in the SQL Database
utility. This feature allows users to see comments associated with
database columns when querying table information. Column comments
provide valuable metadata that helps LLMs better understand the
semantics and purpose of database columns.
A new optional parameter `get_col_comments` was added to the
`get_table_info` method, defaulting to `False` for backward
compatibility. When set to `True`, it retrieves and formats column
comments for each table.
Currently, this feature is supported on PostgreSQL, MySQL, and Oracle
databases.
## Implementation
The table must first be created with column comments. A minimal sketch
using SQLAlchemy's `comment` argument (the table matches the sample
output below):
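```python
from sqlalchemy import Column, String, create_engine
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class TestTable(Base):
    __tablename__ = "test_table"
    name = Column(String, primary_key=True, comment="person name")
    school = Column(String, comment="school_name")


engine = create_engine("YOUR_DB_URI")
Base.metadata.create_all(engine)
```
Then retrieve the table info with comments enabled: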
```python
from langchain_community.utilities import SQLDatabase

db = SQLDatabase.from_uri("YOUR_DB_URI")
print(db.get_table_info(get_col_comments=True))
```
## Result
```
CREATE TABLE test_table (
    name VARCHAR,
    school VARCHAR
)
/*
Column Comments: {'name': 'person name', 'school': 'school_name'}
*/
/*
3 rows from test_table:
name
a
b
c
*/
```
## Benefits
1. Enhances LLM's understanding of database schema semantics
2. Preserves valuable domain knowledge embedded in database design
3. Improves accuracy of SQL query generation
4. Provides more context for data interpretation
Tests are available in
`langchain/libs/community/tests/test_sql_get_table_info.py`.
---------
Co-authored-by: chbae <chbae@gcsc.co.kr>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:**
This PR marks the `HanaDB` vector store (and related utilities) in
`langchain_community` as deprecated using the `@deprecated` annotation.
- Set `since="0.1.0"` and `removal="1.0"`
- Added a clear migration path and a link to the SAP-maintained
replacement in the
[`langchain_hana`](https://github.com/SAP/langchain-integration-for-sap-hana-cloud)
package.
Additionally, the example notebook has been updated to use the new
`HanaDB` class from `langchain_hana`, ensuring users follow the
recommended integration moving forward.
- **Issue:** None
- **Dependencies:** None
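A minimal sketch of the deprecation annotation described above, using
`langchain_core`'s helper (the real class body is unchanged; this is not
the verbatim diff):
```python
from langchain_core._api import deprecated
from langchain_core.vectorstores import VectorStore


@deprecated(
    since="0.1.0",
    removal="1.0",
    alternative_import="langchain_hana.HanaDB",
)
class HanaDB(VectorStore):
    """SAP HANA Cloud vector store, superseded by langchain_hana."""
    ...
```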
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
community: fix browserbase integration
docs: update docs
- **Description:** Updated BrowserbaseLoader to use the new python sdk.
- **Issue:** update browserbase integration with langchain
- **Dependencies:** n/a
- **Twitter handle:** @kylejeong21
- [x] **PR title**: "community: add indexname to other functions in
opensearch"
- [x] **PR message**:
- **Description:** add the ability to override the index name if
provided in the kwargs of sub-functions. When used in a WSGI
application, it's crucial to be able to change parameters dynamically
(see the sketch after this checklist).
- [ ] **Add tests and docs**:
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
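For illustration, a usage sketch of the override (store construction
elided; the index name is illustrative):
```python
# With this change, an index_name passed in kwargs takes precedence over
# the index the store was constructed with - handy for per-request
# routing in a WSGI app.
docs = opensearch_store.similarity_search(
    "customer complaints about shipping",
    k=4,
    index_name="tenant_b_documents",
)
```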
PR title:
Community: Add bind variable support for oracle adb docloader
Description:
This PR adds support for using bind variables in the Oracle ADB document
loader class, including a minor documentation change.
Issue:
N/A
Dependencies:
No new dependencies.
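A hypothetical sketch of the bind-variable usage this enables; the
`parameters` keyword and the connection arguments are assumptions, not
the exact API:
```python
from langchain_community.document_loaders import OracleAutonomousDatabaseLoader

loader = OracleAutonomousDatabaseLoader(
    query="SELECT doc_text FROM docs WHERE category = :cat",
    user="SCOTT",
    password="...",
    connection_string="myadb_high",
    parameters={"cat": "finance"},  # bind variables (assumed kwarg name)
)
docs = loader.load()
```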
This is a follow-on PR to go with the identical changes that were made
in partners/openai.
Previous PR: https://github.com/langchain-ai/langchain/pull/30757
When calling embed_documents and providing a chunk_size argument, that
argument is ignored when OpenAIEmbeddings is instantiated with its
default configuration (where check_embedding_ctx_length=True).
_get_len_safe_embeddings specifies a chunk_size parameter but it's not
being passed through in embed_documents, which is its only caller. This
appears to be an oversight, especially given that the
_get_len_safe_embeddings docstring states it should respect "the set
embedding context length and chunk size."
Developers typically expect method parameters to take effect (also, take
precedence) when explicitly provided, especially when instantiating
using defaults. I was confused as to why my API calls were being
rejected regardless of the chunk size I provided.
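A caller-side sketch of the behavior this fixes (shown with the
community class; the document count and chunk size are illustrative):
```python
from langchain_community.embeddings import OpenAIEmbeddings

# Default config, i.e. check_embedding_ctx_length=True.
embeddings = OpenAIEmbeddings()
# Before this fix, chunk_size=16 was silently ignored here; with the
# fix it is forwarded to _get_len_safe_embeddings and takes effect.
vectors = embeddings.embed_documents(["doc one", "doc two"], chunk_size=16)
```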
The SingleStore integration now has its own package,
`langchain-singlestore`, so the community implementation will no longer
be maintained.
Added `deprecated` decorator to `SingleStoreDBChatMessageHistory`,
`SingleStoreDBSemanticCache`, and `SingleStoreDB` classes in the
community package.
**Dependencies:** https://github.com/langchain-ai/langchain/pull/30841
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
Support "usage_metadata" for LiteLLM streaming calls.
This is a follow-up to
https://github.com/langchain-ai/langchain/pull/30625, which tackled
non-streaming calls.
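A usage sketch of what this enables (the model name is a placeholder):
```python
from langchain_community.chat_models import ChatLiteLLM

llm = ChatLiteLLM(model="gpt-4o-mini")
usage = None
for chunk in llm.stream("Say hello"):
    # With this change, streamed chunks can carry usage_metadata;
    # keep the last non-empty value.
    usage = chunk.usage_metadata or usage
print(usage)  # e.g. {'input_tokens': ..., 'output_tokens': ..., 'total_tokens': ...}
```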
- **Description:** Including `metadata_field` in
`max_marginal_relevance_search()` would result in an error; the logic
was changed to match how it's handled in `similarity_search`, where it
can be any field, or simply `"*"` to include every field. A usage sketch
follows below.
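A hypothetical sketch (vector store setup elided; argument values are
illustrative):
```python
# metadata_field now behaves as it does in similarity_search: any field
# name, or "*" to include every metadata field.
docs = vectorstore.max_marginal_relevance_search(
    "query text",
    k=4,
    metadata_field="*",
)
```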
**Description:** Add support for OAuth2 in the Jira tool by adding the
possibility to pass a dictionary with OAuth parameters. I also adapted
the documentation to show this new behavior; a hedged sketch follows
below.
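A hypothetical sketch; the keyword and dictionary keys below are
assumptions, not a confirmed API:
```python
from langchain_community.utilities.jira import JiraAPIWrapper

jira = JiraAPIWrapper(
    jira_instance_url="https://your-domain.atlassian.net",
    # Assumed parameter name and key layout for the OAuth2 dictionary.
    jira_oauth2={
        "client_id": "...",
        "token": {"access_token": "...", "token_type": "Bearer"},
    },
)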
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
Google vertex ai search will now return the title of the found website
as part of the document metadata, if available.
- **Description**: Vertex AI Search can be used to index websites and
then develop chatbots that use these websites to answer questions. At
present, the document metadata includes an `id` and `source` (which is
the URL). While the URL is enough to create a link, the ID is not
descriptive enough to show users. Therefore, I propose we return `title`
as well, when available (e.g., it will not be available in `.txt`
documents found during the website indexing).
- **Issue**: No bug in particular, but it would be better if this was
here.
- **Dependencies**: None
- I do not use twitter.
Format, Lint and Test seem to be all good.
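A sketch of what this surfaces (retriever construction elided; see the
Vertex AI Search integration docs for setup):
```python
docs = retriever.invoke("How do I configure the product?")
for doc in docs:
    # "title" is present when available (e.g., not for indexed .txt files).
    print(doc.metadata.get("title"), doc.metadata["source"])
```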
Generally, this PR is CI performance focused + aims to clean up some
dependencies at the same time.
1. Unpins upper bounds for `numpy` in all `pyproject.toml` files where
`numpy` is specified
2. Requires `numpy >= 2.1.0` for Python 3.13 and `numpy > 1.26.0` for
Python 3.12, plus a `numpy` min-version bump for `chroma`
3. Speeds up CI by minutes - linting on Python 3.13, installing `numpy <
2.1.0` was taking [~3
minutes](https://github.com/langchain-ai/langchain/actions/runs/14316342925/job/40123305868?pr=30713),
now the entire env setup takes a few seconds
4. Deleted the `numpy` test dependency from partners where that was not
used, specifically `huggingface`, `voyageai`, `xai`, and `nomic`.
It's a bit unfortunate that `langchain-community` depends on `numpy`; we
might want to try to fix that in the future...
Closes https://github.com/langchain-ai/langchain/issues/26026
Fixes https://github.com/langchain-ai/langchain/issues/30555
- **Description:** We do not need to set the parser in `scrape` since it
has already been done in `_scrape`
- **Issue:** #30629, not directly related but makes sure xml parser is
used
- [ ] **PR title**: "community: Removes pandas dependency for using
DuckDB for similarity search"
- [ ] **PR message**:
- **Description:** Removes the pandas dependency for using DuckDB for
similarity search. The old function still exists as
`similarity_search_pd`, while the new one is at `similarity_search` and
requires no code changes. The return format remains the same (a short
usage note follows below).
- **Issue:** Issue #29933 and update on PR #30435
- **Dependencies:** No dependencies
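A quick usage note (store setup elided):
```python
# similarity_search now returns list[Document] without importing pandas;
# the previous pandas-based implementation remains as similarity_search_pd.
docs = duckdb_store.similarity_search("what is a duck?", k=4)
legacy_docs = duckdb_store.similarity_search_pd("what is a duck?", k=4)
```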
**Description:**
Adds support for Riza custom runtimes to the two Riza code interpreter
tools, allowing users to run LLM-generated code that depends on
libraries outside stdlib.
**Issue:** N/A
**Dependencies:** None
**Twitter handle:** @rizaio
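A hypothetical sketch of pointing the code interpreter tool at a custom
runtime; the parameter name is an assumption, not a confirmed API:
```python
from langchain_community.tools.riza.command import ExecPython

# Custom runtimes let LLM-generated code use libraries beyond stdlib.
tool = ExecPython(runtime_revision_id="01J...")  # assumed parameter name
print(tool.run("import requests; print(requests.__version__)"))
```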
Plus, some accompanying docs updates
Some compelling usage:
```py
from langchain_perplexity import ChatPerplexity

chat = ChatPerplexity(model="llama-3.1-sonar-small-128k-online")
response = chat.invoke(
    "What were the most significant newsworthy events that occurred in the US recently?",
    extra_body={"search_recency_filter": "week"},
)
print(response.content)
# > Here are the top significant newsworthy events in the US recently: ...
```
Also, some confirmation of structured outputs:
```py
from langchain_perplexity import ChatPerplexity
from pydantic import BaseModel


class AnswerFormat(BaseModel):
    first_name: str
    last_name: str
    year_of_birth: int
    num_seasons_in_nba: int


messages = [
    {"role": "system", "content": "Be precise and concise."},
    {
        "role": "user",
        "content": (
            "Tell me about Michael Jordan. "
            "Please output a JSON object containing the following fields: "
            "first_name, last_name, year_of_birth, num_seasons_in_nba. "
        ),
    },
]

llm = ChatPerplexity(model="llama-3.1-sonar-small-128k-online")
structured_llm = llm.with_structured_output(AnswerFormat)
response = structured_llm.invoke(messages)
print(repr(response))
# > AnswerFormat(first_name='Michael', last_name='Jordan', year_of_birth=1963, num_seasons_in_nba=15)
```
Support "usage_metadata" for LiteLLM.
## Description
This PR adds a new `sitemap_url` parameter to the `GitbookLoader` class
that allows users to specify a custom sitemap URL when loading content
from a GitBook site. This is particularly useful for GitBook sites that
use non-standard sitemap file names like `sitemap-pages.xml` instead of
the default `sitemap.xml`.
The standard `GitbookLoader` assumes that the sitemap is located at
`/sitemap.xml`, but some GitBook instances (including GitBook's own
documentation) use different paths for their sitemaps. This parameter
makes the loader more flexible and helps users extract content from a
wider range of GitBook sites.
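For example (URLs are illustrative; GitBook's own docs use a non-default
sitemap path, per the description above):
```python
from langchain_community.document_loaders import GitbookLoader

loader = GitbookLoader(
    "https://docs.gitbook.com",
    load_all_paths=True,
    sitemap_url="https://docs.gitbook.com/sitemap-pages.xml",
)
docs = loader.load()
```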
## Issue
Fixes [#30473](https://github.com/langchain-ai/langchain/issues/30473),
where the `GitbookLoader` would fail to find pages on GitBook sites that
use custom sitemap URLs.
## Dependencies
No new dependencies required.
*I've added*:
* Unit tests to verify the parameter works correctly
* Integration tests to confirm the parameter is properly used with real
GitBook sites
* Updated docstrings with parameter documentation
The changes are fully backward compatible, as the parameter is optional
with a sensible default.
---------
Co-authored-by: andrasfe <andrasf94@gmail.com>
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
This PR addresses two key issues:
- **Prevent history errors from failing silently**: Previously, errors
in message history were only logged and not raised, which can lead to
inconsistent state and downstream failures (e.g., ValidationError from
Bedrock due to malformed message history). This change ensures that such
errors are raised explicitly, making them easier to detect and debug.
(Side note: I'm using the AWS Lambda Powertools Logger but hadn't
configured it properly with the standard Python logger - my bad. If the
error had been raised, I would've seen it in the logs 😄) This is a
**BREAKING CHANGE**.
- **Add messages in bulk instead of iteratively**: This introduces a
custom add_messages method to add all messages at once. The previous
approach failed silently when individual messages were too large,
resulting in partial history updates and inconsistent state. With this
change, either all messages are added successfully, or none are -
helping avoid obscure history-related errors from Bedrock. An
illustrative sketch follows below.
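An illustrative all-or-nothing sketch of the approach; the storage
helper and serialization details are assumptions, not the actual
implementation:
```python
from typing import Sequence

from langchain_core.messages import BaseMessage, messages_to_dict


def add_messages(self, messages: Sequence[BaseMessage]) -> None:
    # Serialize the full batch up front, then persist it in one write:
    # either every message lands or an exception propagates - no silent
    # partial history updates.
    items = messages_to_dict(list(self.messages) + list(messages))
    self._write_history(items)  # hypothetical storage helper; errors are raised
```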
---------
Co-authored-by: Kacper Wlodarczyk <kacper.wlodarczyk@chaosgears.com>
**Description:**
Fixes a bug in the YoutubeLoader where FetchedTranscript objects were
not properly processed. The loader was only extracting the 'text'
attribute from FetchedTranscriptSnippet objects while ignoring 'start'
and 'duration' attributes. This would cause a TypeError when the code
later tried to access these missing keys, particularly when using the
CHUNKS format or any code path that needed timestamp information.
This PR modifies the conversion of FetchedTranscriptSnippet objects to
include all necessary attributes, ensuring that the loader works
correctly with all transcript formats.
**Issue:** Fixes #30309
**Dependencies:** None
**Testing:**
- Tested the fix with multiple YouTube videos to confirm it resolves the
issue
- Verified that both regular loading and CHUNKS format work correctly
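For illustration, the shape of the fix described above (variable names
are illustrative, not the verbatim source):
```python
# Keep every attribute of each FetchedTranscriptSnippet, not just .text,
# so downstream code (e.g., the CHUNKS format) can read start/duration.
transcript_pieces = [
    {"text": snippet.text, "start": snippet.start, "duration": snippet.duration}
    for snippet in fetched_transcript
]
```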
- **Description:**
- Make Brave Search Tool consistent with other tools and allow reading
its api key from `BRAVE_SEARCH_API_KEY` instead of having to pass the
api key manually (no breaking changes)
- Improve Brave Search Tool by storing api key in `SecretStr` instead of
plain `str`.
- Add unit test for `BraveSearchWrapper`
- Reflect the changes in the documentation
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** ivan_brko
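A sketch of the new environment-variable behavior (the key value is a
placeholder):
```python
import os

from langchain_community.tools import BraveSearch

# The API key can now be read from the environment instead of being
# passed manually; existing explicit-key usage is unchanged.
os.environ["BRAVE_SEARCH_API_KEY"] = "bsk-..."
tool = BraveSearch()
print(tool.run("When was the Brave browser released?"))
```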
This PR includes support for HANA dialect in SQLDatabase, which is a
wrapper class for SQLAlchemy.
Currently, the schema name cannot be set when using HANA DB with
LangChain, and no message is shown to the user, which makes it hard to
figure out why the SQL does not work as expected.
Here is the reference document for HANA DB to set schema for the
session.
- [SET SCHEMA Statement (Session
Management)](https://help.sap.com/docs/SAP_HANA_PLATFORM/4fe29514fd584807ac9f2a04f6754767/20fd550375191014b886a338afb4cd5f.html)
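A hedged usage sketch (the DSN and schema name are illustrative):
```python
from langchain_community.utilities import SQLDatabase

# With HANA dialect support, the configured schema is applied for the
# session (via SET SCHEMA, per the SAP documentation linked above).
db = SQLDatabase.from_uri(
    "hana://USER:PASS@hostname:39015",
    schema="MY_SCHEMA",
)
print(db.get_usable_table_names())
```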
This pull request includes enhancements to the `perplexity.py` file in
the `chat_models` module, focusing on improving the handling of
additional keyword arguments (`additional_kwargs`) in message processing
methods. Additionally, new unit tests have been added to ensure the
correct inclusion of citations, images, and related questions in the
`additional_kwargs`.
Issue: resolves https://github.com/langchain-ai/langchain/issues/30439
Enhancements to `perplexity.py`:
* `libs/community/langchain_community/chat_models/perplexity.py`:
Modified the `_convert_delta_to_message_chunk`, `_stream`, and
`_generate` methods to handle `additional_kwargs`, which include
citations, images, and related questions.
New unit tests:
* `libs/community/tests/unit_tests/chat_models/test_perplexity.py`:
Added new tests `test_perplexity_stream_includes_citations_and_images`
and `test_perplexity_stream_includes_citations_and_related_questions` to
verify that the `stream` method correctly includes citations, images,
and related questions in the `additional_kwargs`.
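A usage sketch of what now surfaces while streaming (the model name is a
placeholder; payload shapes vary by request):
```python
from langchain_community.chat_models import ChatPerplexity

chat = ChatPerplexity(model="sonar")
for chunk in chat.stream("What is the tallest building in the world?"):
    # Citations, images, and related questions ride along in
    # additional_kwargs when the API returns them.
    if chunk.additional_kwargs.get("citations"):
        print(chunk.additional_kwargs["citations"])
```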
- **Description:** The Azure Document Intelligence OCR solution has a
*features* parameter that enables capabilities such as high-resolution
document analysis and key-value pair extraction. In the LangChain
parser, it could be provided as an `analysis_feature` parameter to the
constructor, which was then passed on to the `DocumentIntelligenceClient`.
However, according to the `DocumentIntelligenceClient` [API
Reference](https://learn.microsoft.com/en-us/python/api/azure-ai-documentintelligence/azure.ai.documentintelligence.documentintelligenceclient?view=azure-python),
this is not a valid constructor parameter. It was therefore removed and
is instead stored as a parser property that is used in
`begin_analyze_document`'s `features` parameter (see [API
Reference](https://learn.microsoft.com/en-us/python/api/azure-ai-formrecognizer/azure.ai.formrecognizer.documentanalysisclient?view=azure-python#azure-ai-formrecognizer-documentanalysisclient-begin-analyze-document)).
I also removed the check for "Supported features" since all features are
supported out-of-the-box, and I did not check whether the provided `str`
actually corresponds to the Azure package's enumeration of features,
since the `ValueError` raised when creating the enumeration object is
explicit enough.
A last caveat is that some features are not supported for some kinds of
documents. This is documented in the Microsoft documentation, and the
exceptions are also explicit. A hedged usage sketch follows after the
checklist below.
- **Issue:** N/A
- **Dependencies:** No
- **Twitter handle:** @Louis___A
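A hedged usage sketch; the loader arguments and feature names below are
assumptions, so check the Azure feature enumeration for valid values:
```python
from langchain_community.document_loaders import AzureAIDocumentIntelligenceLoader

loader = AzureAIDocumentIntelligenceLoader(
    api_endpoint="https://<resource>.cognitiveservices.azure.com/",
    api_key="...",
    file_path="report.pdf",
    # Stored on the parser and forwarded to
    # begin_analyze_document(features=...), per this PR.
    analysis_features=["ocrHighResolution", "keyValuePairs"],
)
docs = loader.load()
```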
---------
Co-authored-by: Louis Auneau <louis@handshakehealth.co>
Description: Extend the Gremlin graph schema to include the edge
properties, grouped by their triples, i.e. `inVLabel` and `outVLabel`.
This should give more context when crafting queries to run against a
Gremlin graph DB.
This pull request includes extensive documentation updates for the
`ChatPerplexity` class in the
`libs/community/langchain_community/chat_models/perplexity.py` file. The
changes provide detailed setup instructions, key initialization
arguments, and usage examples for various functionalities of the
`ChatPerplexity` class.
Documentation improvements:
* Added setup instructions for installing the `openai` package and
setting the `PPLX_API_KEY` environment variable.
* Documented key initialization arguments for completion parameters and
client parameters, including `model`, `temperature`, `max_tokens`,
`streaming`, `pplx_api_key`, `request_timeout`, and `max_retries`.
* Provided examples for instantiating the `ChatPerplexity` class,
invoking it with messages, using structured output, invoking with
Perplexity-specific parameters, streaming responses, and accessing token
usage and response metadata.