Commit Graph

7318 Commits

Author SHA1 Message Date
Mason Daugherty
116b758498 fix: bump deps for release (#32179)
forgot to bump the `pyproject.toml` files
2025-07-22 13:12:14 -04:00
Mason Daugherty
10996a2821 release(perplexity): 0.1.2 (#32176) 2025-07-22 13:02:19 -04:00
Mason Daugherty
2aed07efb6 release(deepseek): 0.1.4 (#32178) 2025-07-22 13:01:54 -04:00
Mason Daugherty
64dac1faf7 release(huggingface): 0.3.1 (#32177) 2025-07-22 13:01:34 -04:00
Mason Daugherty
58768d8aef release(xai): 0.2.5 (#32174) 2025-07-22 13:01:26 -04:00
Mason Daugherty
d65da13299 docs(ollama): add validate_model_on_init note, bump lock (#32172) 2025-07-22 10:58:45 -04:00
Copilot
2104cf0d9a fix: replace deprecated Pydantic .schema() calls with v1/v2 compatible pattern (#32162)
This PR addresses deprecation warnings users encounter when using
LangChain tools with Pydantic v2:

```
PydanticDeprecatedSince20: The `schema` method is deprecated; use `model_json_schema` instead. 
Deprecated in Pydantic V2.0 to be removed in V3.0.
```

## Root Cause

Several LangChain components were still using the deprecated `.schema()`
method directly instead of the Pydantic v1/v2 compatible approach. While
users calling `.schema()` on returned models will still see warnings
(which is correct), LangChain's internal code should not generate these
warnings.

## Changes Made

Updated 3 files to use the standard compatibility pattern:

```python
# Before (deprecated)
schema = model.schema()

# After (compatible with both v1 and v2) 
if hasattr(model, "model_json_schema"):
    schema = model.model_json_schema()  # Pydantic v2
else:
    schema = model.schema()  # Pydantic v1
```

### Files Updated:
- **`evaluation/parsing/json_schema.py`**: Fixed `_parse_json()` method
to handle Pydantic models correctly
- **`output_parsers/yaml.py`**: Fixed `get_format_instructions()` to use
compatible schema access
- **`chains/openai_functions/citation_fuzzy_match.py`**: Fixed direct
`.schema()` call on QuestionAnswer model

## Verification

 **Zero breaking changes** - all existing functionality preserved  
 **No deprecation warnings** from LangChain internal code  
 **Backward compatible** with Pydantic v1  
 **Forward compatible** with Pydantic v2  
 **Edge cases handled** (strings, plain objects, etc.)

## User Impact

LangChain users will no longer see deprecation warnings from internal
LangChain code. Users who directly call `.schema()` on schemas returned
by LangChain should adopt the same compatibility pattern:

```python
# User code should use this pattern
input_schema = tool.get_input_schema()
if hasattr(input_schema, "model_json_schema"):
    schema_result = input_schema.model_json_schema()
else:
    schema_result = input_schema.schema()
```

Fixes #31458.

<!-- START COPILOT CODING AGENT TIPS -->
---

💬 Share your feedback on Copilot coding agent for the chance to win a
$200 gift card! Click
[here](https://survey.alchemer.com/s3/8343779/Copilot-Coding-agent) to
start the survey.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
2025-07-21 21:19:53 -04:00
Copilot
18c64aed6d feat(core): add sanitize_for_postgres utility to fix PostgreSQL NUL byte DataError (#32157)
This PR fixes the PostgreSQL NUL byte issue that causes
`psycopg.DataError` when inserting documents containing `\x00` bytes
into PostgreSQL-based vector stores.

## Problem

PostgreSQL text fields cannot contain NUL (0x00) bytes. When documents
with such characters are processed by PGVector or langchain-postgres
implementations, they fail with:

```
(psycopg.DataError) PostgreSQL text fields cannot contain NUL (0x00) bytes
```

This commonly occurs when processing PDFs, documents from various
loaders, or text extracted by libraries like unstructured that may
contain embedded NUL bytes.

## Solution

Added `sanitize_for_postgres()` utility function to
`langchain_core.utils.strings` that removes or replaces NUL bytes from
text content.

### Key Features

- **Simple API**: `sanitize_for_postgres(text, replacement="")`
- **Configurable**: Replace NUL bytes with empty string (default) or
space for readability
- **Comprehensive**: Handles all problematic examples from the original
issue
- **Well-tested**: Complete unit tests with real-world examples
- **Backward compatible**: No breaking changes, purely additive

### Usage Example

```python
from langchain_core.utils import sanitize_for_postgres
from langchain_core.documents import Document

# Before: This would fail with DataError
problematic_content = "Getting\x00Started with embeddings"

# After: Clean the content before database insertion
clean_content = sanitize_for_postgres(problematic_content)
# Result: "GettingStarted with embeddings"

# Or preserve readability with spaces
readable_content = sanitize_for_postgres(problematic_content, " ")
# Result: "Getting Started with embeddings"

# Use in Document processing
doc = Document(page_content=clean_content, metadata={...})
```

### Integration Pattern

PostgreSQL vector store implementations should sanitize content before
insertion:

```python
def add_documents(self, documents: List[Document]) -> List[str]:
    # Sanitize documents before insertion
    sanitized_docs = []
    for doc in documents:
        sanitized_content = sanitize_for_postgres(doc.page_content, " ")
        sanitized_doc = Document(
            page_content=sanitized_content,
            metadata=doc.metadata,
            id=doc.id
        )
        sanitized_docs.append(sanitized_doc)
    
    return self._insert_documents_to_db(sanitized_docs)
```

## Changes Made

- Added `sanitize_for_postgres()` function in
`langchain_core/utils/strings.py`
- Updated `langchain_core/utils/__init__.py` to export the new function
- Added comprehensive unit tests in
`tests/unit_tests/utils/test_strings.py`
- Validated against all examples from the original issue report

## Testing

All tests pass, including:
- Basic NUL byte removal and replacement
- Multiple consecutive NUL bytes
- Empty string handling
- Real examples from the GitHub issue
- Backward compatibility with existing string utilities

This utility enables PostgreSQL integrations in both langchain-community
and langchain-postgres packages to handle documents with NUL bytes
reliably.

Fixes #26033.

<!-- START COPILOT CODING AGENT TIPS -->
---

💬 Share your feedback on Copilot coding agent for the chance to win a
$200 gift card! Click
[here](https://survey.alchemer.com/s3/8343779/Copilot-Coding-agent) to
start the survey.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
2025-07-21 20:33:20 -04:00
Christophe Bornet
64261449b8 feat(langchain): add ruff rules TRY (#32047)
See https://docs.astral.sh/ruff/rules/#tryceratops-try

* TRY004 (replace by TypeError) in main code is escaped with `noqa` to
not break backward compatibility. The rule is still interesting for new
code.
* TRY301 ignored at the moment. This one is quite hard to fix and I'm
not sure it's very interesting to activate it.

Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-07-21 13:41:20 -04:00
Christophe Bornet
8b8d90bea5 feat(langchain): add ruff rules PT (#32010)
See https://docs.astral.sh/ruff/rules/#flake8-pytest-style-pt
2025-07-21 13:15:05 -04:00
Mohammad Mohtashim
095f4a7c28 fix(core): fix parse_resultin case of self.first_tool_only with multiple keys matching for JsonOutputKeyToolsParser (#32106)
* **Description:** Updated `parse_result` logic to handle cases where
`self.first_tool_only` is `True` and multiple matching keys share the
same function name. Instead of returning the first match prematurely,
the method now prioritizes filtering results by the specified key to
ensure correct selection.
* **Issue:** #32100

---------

Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-07-21 12:50:22 -04:00
diego-coder
8e4396bb32 fix(ollama): robustly parse single-quoted JSON in tool calls (#32109)
**Description:**
This PR makes argument parsing for Ollama tool calls more robust. Some
LLMs—including Ollama—may return arguments as Python-style dictionaries
with single quotes (e.g., `{'a': 1}`), which are not valid JSON and
previously caused parsing to fail.
The updated `_parse_json_string` method in
`langchain_ollama.chat_models` now attempts standard JSON parsing and,
if that fails, falls back to `ast.literal_eval` for safe evaluation of
Python-style dictionaries. This improves interoperability with LLMs and
fixes a common usability issue for tool-based agents.

**Issue:**
Closes #30910

**Dependencies:**
None

**Tests:**
- Added new unit tests for double-quoted JSON, single-quoted dicts,
mixed quoting, and malformed/failure cases.
- All tests pass locally, including new coverage for single-quoted
inputs.

**Notes:**
- No breaking changes.
- No new dependencies introduced.
- Code is formatted and linted (`ruff format`, `ruff check`).
- If maintainers have suggestions for further improvements, I’m happy to
revise!

Thank you for maintaining LangChain! Looking forward to your feedback.
2025-07-21 12:11:22 -04:00
ccurme
2ef9465893 fix(anthropic): fix test (#32145) 2025-07-21 14:49:40 +00:00
ccurme
0355da3159 release(core): 0.3.70 (#32144) 2025-07-21 10:49:32 -04:00
astraszab
668c084520 docs(core): move incorrect arg limitation in rate limiter's docstring (#32118) 2025-07-20 14:28:35 -04:00
ccurme
cc076ed891 fix(huggingface): update model used in standard tests (#32116) 2025-07-20 01:50:31 +00:00
Yoshi
6d71bb83de fix(core): fix docstrings and add sleep to FakeListChatModel._call (#32108) 2025-07-19 17:30:15 -04:00
Isaac Francisco
98bfd57a76 fix(core): better error message for empty var names (#32073)
Previously, we hit an index out of range error with empty variable names
(accessing tag[0]), now we through a slightly nicer error

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-07-18 17:00:02 -04:00
Gurram Siddarth Reddy
427d2d6397 fix(core): implement sleep delay in FakeMessagesListChatModel _generate (#32014)
implement sleep delay in FakeMessagesListChatModel._generate so the
sleep parameter is respected, matching the documented behavior. This
adds artificial latency between responses for testing purposes.

Issue: closes
[#31974](https://github.com/langchain-ai/langchain/issues/31974)
following
[docs](https://python.langchain.com/api_reference/core/language_models/langchain_core.language_models.fake_chat_models.FakeMessagesListChatModel.html#langchain_core.language_models.fake_chat_models.FakeMessagesListChatModel.sleep)

Dependencies: none

Twitter handle: [@siddarthreddyg2](https://x.com/siddarthreddyg2)

---------

Signed-off-by: Siddarthreddygsr <siddarthreddygsr@gmail.com>
2025-07-18 15:54:28 -04:00
Sarah Guthals
22535eb4b3 docs: add tensorlake provider (#32046) 2025-07-17 19:28:14 -04:00
open-swe[bot]
5da986c3f6 fix(core): JSON Schema reference resolution for list indices (#32088)
Fixes #32042

## Summary
Fixes a critical bug in JSON Schema reference resolution that prevented
correctly dereferencing numeric components in JSON pointer paths,
specifically for list indices in `anyOf`, `oneOf`, and `allOf` arrays.

## Changes
- Fixed `_retrieve_ref` function in
`libs/core/langchain_core/utils/json_schema.py` to properly handle
numeric components
- Added comprehensive test function `test_dereference_refs_list_index()`
in `libs/core/tests/unit_tests/utils/test_json_schema.py`
- Resolved line length formatting issues
- Improved type checking and index validation for list and dictionary
references

## Key Improvements
- Correctly handles list index references in JSON pointer paths
- Maintains backward compatibility with existing dictionary numeric key
functionality
- Adds robust error handling for out-of-bounds and invalid indices
- Passes all test cases covering various reference scenarios

## Test Coverage
- Verified fix for `#/properties/payload/anyOf/1/properties/startDate`
reference
- Tested edge cases including out-of-bounds and negative indices
- Ensured no regression in existing reference resolution functionality

Resolves the reported issue with JSON Schema reference dereferencing for
list indices.

---------

Co-authored-by: open-swe-dev[bot] <open-swe-dev@users.noreply.github.com>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-07-17 15:54:38 -04:00
Christophe Bornet
b61ce9178c refactor(langchain): remove model_rebuild (#32080)
Since #29963 BaseCache and Callbacks are imported in BaseLanguageModel
so there's no need to import them and rebuild the models.
Note: fix is available since `langchain-core==0.3.39` and the current
langchain dependency on core is `>=0.3.66` so the fix will always be
there.
2025-07-17 10:34:41 -04:00
Mason Daugherty
491f63ca82 release(ollama): release 0.3.5 (#32076) 2025-07-16 18:45:32 -04:00
Mason Daugherty
587c213760 bump lcok 2025-07-16 18:44:56 -04:00
Copilot
98c3bbbaf0 fix(ollama): num_gpu parameter not working in async OllamaEmbeddings method (#32074)
The `num_gpu` parameter in `OllamaEmbeddings` was not being passed to
the Ollama client in the async embedding method, causing GPU
acceleration settings to be ignored when using async operations.

## Problem

The issue was in the `aembed_documents` method where the `options`
parameter (containing `num_gpu` and other configuration) was missing:

```python
# Sync method (working correctly)
return self._client.embed(
    self.model, texts, options=self._default_params, keep_alive=self.keep_alive
)["embeddings"]

# Async method (missing options parameter)
return (
    await self._async_client.embed(
        self.model, texts, keep_alive=self.keep_alive  #  No options!
    )
)["embeddings"]
```

This meant that when users specified `num_gpu=4` (or any other GPU
configuration), it would work with sync calls but be ignored with async
calls.

## Solution

Added the missing `options=self._default_params` parameter to the async
embed call to match the sync version:

```python
# Fixed async method
return (
    await self._async_client.embed(
        self.model,
        texts,
        options=self._default_params,  #  Now includes num_gpu!
        keep_alive=self.keep_alive,
    )
)["embeddings"]
```

## Validation

-  Added unit test to verify options are correctly passed in both sync
and async methods
-  All existing tests continue to pass
-  Manual testing confirms `num_gpu` parameter now works correctly
-  Code passes linting and formatting checks

The fix ensures that GPU configuration works consistently across both
synchronous and asynchronous embedding operations.

Fixes #32059.

<!-- START COPILOT CODING AGENT TIPS -->
---

💡 You can make Copilot smarter by setting up custom instructions,
customizing its development environment and configuring Model Context
Protocol (MCP) servers. Learn more [Copilot coding agent
tips](https://gh.io/copilot-coding-agent-tips) in the docs.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
2025-07-16 18:42:52 -04:00
efj-amzn
d3072e2d2e feat(core): update _import_utils.py to not mask the thrown exception (#32071) 2025-07-16 17:11:56 -04:00
Inácio Nery
ea8f2a05ba feat(perplexity): expose search_results in chat model (#31468)
Description
The Perplexity chat model already returns a search_results field, but
LangChain dropped it when mapping Perplexity responses to
additional_kwargs.
This patch adds "search_results" to the allowed attribute lists in both
_stream and _generate, so downstream code can access it just like
images, citations, or related_questions.

Dependencies
None. The change is purely internal; no new imports or optional
dependencies required.


https://community.perplexity.ai/t/new-feature-search-results-field-with-richer-metadata/398

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
2025-07-16 15:16:35 -04:00
nikk0o046
b1c7de98f5 fix(deepseek): convert tool output arrays to strings (#31913)
## Description
When ChatDeepSeek invokes a tool that returns a list, it results in an
openai.UnprocessableEntityError due to a failure in deserializing the
JSON body.

The root of the problem is that ChatDeepSeek uses BaseChatOpenAI
internally, but the APIs are not identical: OpenAI v1/chat/completions
accepts arrays as tool results, but Deepseek API does not.

As a solution added `_get_request_payload` method to ChatDeepSeek, which
inherits the behavior from BaseChatOpenAI but adds a step to stringify
tool message content in case the content is an array. I also add a unit
test for this.

From the linked issue you can find the full reproducible example the
reporter of the issue provided. After the changes it works as expected.

Source: [Deepseek
docs](https://api-docs.deepseek.com/api/create-chat-completion/)


![image](https://github.com/user-attachments/assets/a59ed3e7-6444-46d1-9dcf-97e40e4e8952)

Source: [OpenAI
docs](https://platform.openai.com/docs/api-reference/chat/create)


![image](https://github.com/user-attachments/assets/728f4fc6-e1a3-4897-b39f-6f1ade07d3dc)


## Issue
Fixes #31394

## Dependencies:
No new dependencies.

## Twitter handle:
Don't have one.

---------

Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-07-16 12:19:44 -04:00
Mohammad Mohtashim
96bf8262e2 fix: fixing missing Docstring Bug if no Docstring is provided in BaseModel class (#31608)
- **Description:** Ensure that the tool description is an empty string
when creating a Structured Tool from a Pydantic class in case no
description is provided
- **Issue:** Fixes #31606

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-07-16 11:56:05 -04:00
Casi
686a6b754c fix: issue a warning if np.nan or np.inf are in _cosine_similarity argument Matrices (#31532)
- **Description**: issues a warning if inf and nan are passed as inputs
to langchain_core.vectorstores.utils._cosine_similarity
- **Issue**: Fixes #31496
- **Dependencies**: no external dependencies added, only warnings module
imported

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-07-16 11:50:09 -04:00
Michael Li
12d370a55a fix(cli): exception to prevent swallowing unexpected errors (#31983) 2025-07-16 10:23:43 -04:00
Michael Li
5a4c0c0816 fix(cli): handle exception in remove() (#31982) 2025-07-16 10:23:02 -04:00
Mason Daugherty
ad44f0688b release(core): release 0.3.69 (#32056) 2025-07-15 17:13:46 -04:00
Mason Daugherty
8ad12f3fcf docs: add missing js providers to table (#32055)
Update to show that Cerebras, xAI, and Cloudflare now have JS/TS
equivalents
2025-07-15 17:09:35 -04:00
Jacob Lee
535ba43b0d feat(core): add an option to make deserialization more permissive (#32054)
## Description

Currently when deserializing objects that contain non-deserializable
values, we throw an error. However, there are cases (e.g. proxies that
return response fields containing extra fields like Python datetimes),
where these values are not important and we just want to drop them.

Twitter handle: @hacubu

---------

Co-authored-by: Mason Daugherty <github@mdrxy.com>
2025-07-15 17:00:01 -04:00
Mason Daugherty
3b9dd1eba0 docs(groq): cleanup (#32043) 2025-07-15 10:37:37 -04:00
Eugene Yurtsev
02d0a9af6c chore(core): unpin packaging dependency (#32032)
Unpin packaging dependency

---------

Co-authored-by: ntjohnson1 <24689722+ntjohnson1@users.noreply.github.com>
2025-07-14 21:42:32 +00:00
Christophe Bornet
953592d4f7 feat(langchain): add ruff rules G (#32029)
https://docs.astral.sh/ruff/rules/#flake8-logging-format-g
2025-07-14 15:19:36 -04:00
Christophe Bornet
19fff8cba9 feat(langchain): add ruff rules DTZ (#32021)
See https://docs.astral.sh/ruff/rules/#flake8-datetimez-dtz
2025-07-14 12:47:16 -04:00
董哥的黑板报
553ac1863b docs: add deprecation notice for PipelinePromptTemplate (#31999)
**PR title**: 
add deprecation notice for PipelinePromptTemplate

**PR message**: 
In the API documentation, PipelinePromptTemplate is marked as
deprecated, but this is not mentioned in the docs.

I'm submitting this PR to add a deprecation notice to the docs.

**Tests**:
N/A (documentation only)

---------

Co-authored-by: Mason Daugherty <github@mdrxy.com>
2025-07-14 15:27:29 +00:00
Andreas V. Jonsterhaug
6dcca35a34 fix(core): correct return type hints in BaseChatPromptTemplate (#32009)
This PR changes the return type hints of the `format_prompt` and
`aformat_prompt` methods in `BaseChatPromptTemplate` from `PromptValue`
to `ChatPromptValue`. Since both methods always return a
`ChatPromptValue`.
2025-07-14 11:00:01 -04:00
Christophe Bornet
d57216c295 feat(core): add ruff rules D to tests except D1 (#32000)
Docs are not required for tests but when there are docstrings, they
shall be correctly formatted.
See https://docs.astral.sh/ruff/rules/#pydocstyle-d
2025-07-14 10:42:03 -04:00
Christophe Bornet
58d4261875 feat(langchain): add ruff rules PTH (#32008)
See https://docs.astral.sh/ruff/rules/#flake8-use-pathlib-pth
2025-07-14 10:41:37 -04:00
Fabio Fontana
fd168e1c11 feat(text-splitters): add Visual Basic 6 support (#31173)
### **Description**

Add Visual Basic 6 support.

---

### **Issue**

No specific issue addressed.

---

### **Dependencies**

No additional dependencies required.

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-07-14 13:51:16 +00:00
ccurme
4b89483fe6 release(openai): 0.3.28 (#32015) 2025-07-14 10:38:11 +00:00
ccurme
de13f6ae4f fix(openai): support acknowledged safety checks in computer use (#31984) 2025-07-14 07:33:37 -03:00
Michael Li
a04131489e chore: update error message formatting (#31980) 2025-07-11 15:19:51 -04:00
Mason Daugherty
56d6d69ce9 release(groq): 0.3.6 (#31975) 2025-07-11 10:55:26 -04:00
Akshara
103fd6ac0c docs: add Google-style docstrings to tools and llms modules (zapier, … (#31957)
**Description:**
Added standardized Google-style docstrings to improve documentation
consistency across key modules.

Updated files:
- `tools/zapier/tool.py`
- `tools/jira/tool.py`
- `tools/json/tool.py`
- `llms/base.py`

These changes enhance readability and maintain consistency with
LangChain’s documentation style guide.

**Issue:**
Fixes #21983

**Dependencies:**
None

**Twitter handle :**
@Akshara_p_
2025-07-11 09:46:21 -04:00
ccurme
612ccf847a chore: [openai] bump sdk (#31958) 2025-07-10 15:53:41 -04:00