Commit Graph

948 Commits

Author SHA1 Message Date
Mason Daugherty
a02ad3d192 docs: formatting cleanup (#32188)
* formatting cleaning
* make `init_chat_model` more prominent in list of guides
2025-07-22 15:46:15 -04:00
ccurme
0c4054a7fc release(core): 0.3.71 (#32186) 2025-07-22 15:44:36 -04:00
ccurme
ebf2e11bcb fix(core): exclude api_key from tracing metadata (#32184)
(standard param)
2025-07-22 15:32:12 -04:00
ccurme
8acfd677bc fix(core): add type key when tracing in some cases (#31825) 2025-07-22 18:08:16 +00:00
Copilot
18c64aed6d feat(core): add sanitize_for_postgres utility to fix PostgreSQL NUL byte DataError (#32157)
This PR fixes the PostgreSQL NUL byte issue that causes
`psycopg.DataError` when inserting documents containing `\x00` bytes
into PostgreSQL-based vector stores.

## Problem

PostgreSQL text fields cannot contain NUL (0x00) bytes. When documents
with such characters are processed by PGVector or langchain-postgres
implementations, they fail with:

```
(psycopg.DataError) PostgreSQL text fields cannot contain NUL (0x00) bytes
```

This commonly occurs when processing PDFs, documents from various
loaders, or text extracted by libraries like unstructured that may
contain embedded NUL bytes.

## Solution

Added `sanitize_for_postgres()` utility function to
`langchain_core.utils.strings` that removes or replaces NUL bytes from
text content.

### Key Features

- **Simple API**: `sanitize_for_postgres(text, replacement="")`
- **Configurable**: Replace NUL bytes with empty string (default) or
space for readability
- **Comprehensive**: Handles all problematic examples from the original
issue
- **Well-tested**: Complete unit tests with real-world examples
- **Backward compatible**: No breaking changes, purely additive

### Usage Example

```python
from langchain_core.utils import sanitize_for_postgres
from langchain_core.documents import Document

# Before: This would fail with DataError
problematic_content = "Getting\x00Started with embeddings"

# After: Clean the content before database insertion
clean_content = sanitize_for_postgres(problematic_content)
# Result: "GettingStarted with embeddings"

# Or preserve readability with spaces
readable_content = sanitize_for_postgres(problematic_content, " ")
# Result: "Getting Started with embeddings"

# Use in Document processing
doc = Document(page_content=clean_content, metadata={...})
```

### Integration Pattern

PostgreSQL vector store implementations should sanitize content before
insertion:

```python
def add_documents(self, documents: List[Document]) -> List[str]:
    # Sanitize documents before insertion
    sanitized_docs = []
    for doc in documents:
        sanitized_content = sanitize_for_postgres(doc.page_content, " ")
        sanitized_doc = Document(
            page_content=sanitized_content,
            metadata=doc.metadata,
            id=doc.id
        )
        sanitized_docs.append(sanitized_doc)
    
    return self._insert_documents_to_db(sanitized_docs)
```

## Changes Made

- Added `sanitize_for_postgres()` function in
`langchain_core/utils/strings.py`
- Updated `langchain_core/utils/__init__.py` to export the new function
- Added comprehensive unit tests in
`tests/unit_tests/utils/test_strings.py`
- Validated against all examples from the original issue report

## Testing

All tests pass, including:
- Basic NUL byte removal and replacement
- Multiple consecutive NUL bytes
- Empty string handling
- Real examples from the GitHub issue
- Backward compatibility with existing string utilities

This utility enables PostgreSQL integrations in both langchain-community
and langchain-postgres packages to handle documents with NUL bytes
reliably.

Fixes #26033.

<!-- START COPILOT CODING AGENT TIPS -->
---

💬 Share your feedback on Copilot coding agent for the chance to win a
$200 gift card! Click
[here](https://survey.alchemer.com/s3/8343779/Copilot-Coding-agent) to
start the survey.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
2025-07-21 20:33:20 -04:00
Mohammad Mohtashim
095f4a7c28 fix(core): fix parse_resultin case of self.first_tool_only with multiple keys matching for JsonOutputKeyToolsParser (#32106)
* **Description:** Updated `parse_result` logic to handle cases where
`self.first_tool_only` is `True` and multiple matching keys share the
same function name. Instead of returning the first match prematurely,
the method now prioritizes filtering results by the specified key to
ensure correct selection.
* **Issue:** #32100

---------

Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-07-21 12:50:22 -04:00
ccurme
0355da3159 release(core): 0.3.70 (#32144) 2025-07-21 10:49:32 -04:00
astraszab
668c084520 docs(core): move incorrect arg limitation in rate limiter's docstring (#32118) 2025-07-20 14:28:35 -04:00
Yoshi
6d71bb83de fix(core): fix docstrings and add sleep to FakeListChatModel._call (#32108) 2025-07-19 17:30:15 -04:00
Isaac Francisco
98bfd57a76 fix(core): better error message for empty var names (#32073)
Previously, we hit an index out of range error with empty variable names
(accessing tag[0]), now we through a slightly nicer error

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-07-18 17:00:02 -04:00
Gurram Siddarth Reddy
427d2d6397 fix(core): implement sleep delay in FakeMessagesListChatModel _generate (#32014)
implement sleep delay in FakeMessagesListChatModel._generate so the
sleep parameter is respected, matching the documented behavior. This
adds artificial latency between responses for testing purposes.

Issue: closes
[#31974](https://github.com/langchain-ai/langchain/issues/31974)
following
[docs](https://python.langchain.com/api_reference/core/language_models/langchain_core.language_models.fake_chat_models.FakeMessagesListChatModel.html#langchain_core.language_models.fake_chat_models.FakeMessagesListChatModel.sleep)

Dependencies: none

Twitter handle: [@siddarthreddyg2](https://x.com/siddarthreddyg2)

---------

Signed-off-by: Siddarthreddygsr <siddarthreddygsr@gmail.com>
2025-07-18 15:54:28 -04:00
open-swe[bot]
5da986c3f6 fix(core): JSON Schema reference resolution for list indices (#32088)
Fixes #32042

## Summary
Fixes a critical bug in JSON Schema reference resolution that prevented
correctly dereferencing numeric components in JSON pointer paths,
specifically for list indices in `anyOf`, `oneOf`, and `allOf` arrays.

## Changes
- Fixed `_retrieve_ref` function in
`libs/core/langchain_core/utils/json_schema.py` to properly handle
numeric components
- Added comprehensive test function `test_dereference_refs_list_index()`
in `libs/core/tests/unit_tests/utils/test_json_schema.py`
- Resolved line length formatting issues
- Improved type checking and index validation for list and dictionary
references

## Key Improvements
- Correctly handles list index references in JSON pointer paths
- Maintains backward compatibility with existing dictionary numeric key
functionality
- Adds robust error handling for out-of-bounds and invalid indices
- Passes all test cases covering various reference scenarios

## Test Coverage
- Verified fix for `#/properties/payload/anyOf/1/properties/startDate`
reference
- Tested edge cases including out-of-bounds and negative indices
- Ensured no regression in existing reference resolution functionality

Resolves the reported issue with JSON Schema reference dereferencing for
list indices.

---------

Co-authored-by: open-swe-dev[bot] <open-swe-dev@users.noreply.github.com>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-07-17 15:54:38 -04:00
efj-amzn
d3072e2d2e feat(core): update _import_utils.py to not mask the thrown exception (#32071) 2025-07-16 17:11:56 -04:00
Mohammad Mohtashim
96bf8262e2 fix: fixing missing Docstring Bug if no Docstring is provided in BaseModel class (#31608)
- **Description:** Ensure that the tool description is an empty string
when creating a Structured Tool from a Pydantic class in case no
description is provided
- **Issue:** Fixes #31606

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-07-16 11:56:05 -04:00
Casi
686a6b754c fix: issue a warning if np.nan or np.inf are in _cosine_similarity argument Matrices (#31532)
- **Description**: issues a warning if inf and nan are passed as inputs
to langchain_core.vectorstores.utils._cosine_similarity
- **Issue**: Fixes #31496
- **Dependencies**: no external dependencies added, only warnings module
imported

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-07-16 11:50:09 -04:00
Mason Daugherty
ad44f0688b release(core): release 0.3.69 (#32056) 2025-07-15 17:13:46 -04:00
Jacob Lee
535ba43b0d feat(core): add an option to make deserialization more permissive (#32054)
## Description

Currently when deserializing objects that contain non-deserializable
values, we throw an error. However, there are cases (e.g. proxies that
return response fields containing extra fields like Python datetimes),
where these values are not important and we just want to drop them.

Twitter handle: @hacubu

---------

Co-authored-by: Mason Daugherty <github@mdrxy.com>
2025-07-15 17:00:01 -04:00
董哥的黑板报
553ac1863b docs: add deprecation notice for PipelinePromptTemplate (#31999)
**PR title**: 
add deprecation notice for PipelinePromptTemplate

**PR message**: 
In the API documentation, PipelinePromptTemplate is marked as
deprecated, but this is not mentioned in the docs.

I'm submitting this PR to add a deprecation notice to the docs.

**Tests**:
N/A (documentation only)

---------

Co-authored-by: Mason Daugherty <github@mdrxy.com>
2025-07-14 15:27:29 +00:00
Andreas V. Jonsterhaug
6dcca35a34 fix(core): correct return type hints in BaseChatPromptTemplate (#32009)
This PR changes the return type hints of the `format_prompt` and
`aformat_prompt` methods in `BaseChatPromptTemplate` from `PromptValue`
to `ChatPromptValue`. Since both methods always return a
`ChatPromptValue`.
2025-07-14 11:00:01 -04:00
Azhagammal
4d9c0b0883 fix[core]: added error message if the query vector or embedding contains NaN values (#31822)
**Description:**  
Added an explicit validation step in
`langchain_core.vectorstores.utils._cosine_similarity` to raise a
`ValueError` if the input query or any embedding contains `NaN` values.
This prevents silent failures or unstable behavior during similarity
calculations, especially when using maximal_marginal_relevance.

**Issue**:
Fixes #31806 

**Dependencies:**  
None

---------

Co-authored-by: Azhagammal S C <azhagammal@kofluence.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-07-09 18:30:26 -04:00
Chris G
65b098325b core: docs: clarify where the kwargs in on_tool_start and on_tool_end go (#31909)
**Description:**  
I traced the kwargs starting at `.invoke()` and it was not clear where
they go. it was clarified to two layers down. so I changed it to make it
more documented for the next person.


**Issue:**  
No related issue.

**Dependencies:**  
No dependency changes.

**Twitter handle:**  
Nah. We're good.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-07-08 10:35:31 -04:00
Mason Daugherty
e686a70ee0 ollama: thinking, tool streaming, docs, tests (#31772)
* New `reasoning` (bool) param to support toggling [Ollama
thinking](https://ollama.com/blog/thinking) (#31573, #31700). If
`reasoning=True`, Ollama's `thinking` content will be placed in the
model responses' `additional_kwargs.reasoning_content`.
  * Supported by:
    * ChatOllama (class level, invocation level TODO)
    * OllamaLLM (TODO)
* Added tests to ensure streaming tool calls is successful (#29129)
* Refactored tests that relied on `extract_reasoning()`
* Myriad docs additions and consistency/typo fixes
* Improved type safety in some spots

Closes #29129
Addresses #31573 and #31700
Supersedes #31701
2025-07-07 13:56:41 -04:00
Michael Li
47d330f4e6 fix: fix file open with encoding in chat_history.py (#31884)
Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, core, etc. is being
modified. Use "docs: ..." for purely docs changes, "infra: ..." for CI
changes.
  - Example: "core: add foobar LLM"


- [x] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, eyurtsev, ccurme, vbarda, hwchase17.

Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-07-07 13:30:59 -04:00
Christophe Bornet
03e8327e01 core: Ruff preview fixes (#31877)
Auto-fixes from `uv run ruff check --fix --unsafe-fixes --preview`

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-07-07 13:02:40 -04:00
Christophe Bornet
8aed3b61a9 core: Bump ruff version to 0.12 (#31846) 2025-07-07 10:02:51 -04:00
Mohammad Mohtashim
b26d2250ba core[patch]: Int Combine when Merging Dicts (#31572)
- **Description:** Combining the Int Types by adding them which makes
the most sense.
- **Issue:**  #31565
2025-07-04 14:44:16 -04:00
ccurme
2090f85789 core: release 0.3.68 (#31848)
Also add `search_result` to recognized tool message block types.
2025-07-03 12:36:25 -04:00
Eugene Yurtsev
73fefe0295 core[path]: Use context manager for FileCallbackHandler (#31813)
Recommend using context manager for FileCallbackHandler to avoid opening
too many file descriptors

---------

Co-authored-by: Mason Daugherty <github@mdrxy.com>
2025-07-02 13:31:58 -04:00
ccurme
04cc674e80 core: release 0.3.67 (#31791) 2025-06-30 12:00:39 -04:00
ccurme
46cef90f7b core: expose tool message recognized block types (#31787) 2025-06-30 11:19:34 -04:00
Mason Daugherty
9aa75eaef3 docs: enhance docstring for disable_streaming parameter in BaseChatModel (#31759)
Resolves #31758
2025-06-27 11:27:41 -04:00
Mason Daugherty
3c3320ae30 fix: update import paths for ChatOllama to use langchain_ollama instead of community (#31721) 2025-06-24 16:19:31 -04:00
Eugene Yurtsev
9164e6f906 core[patch]: Add additional hashing options to indexing API, warn on SHA-1 (#31649)
Add additional hashing options to the indexing API, warn on SHA-1

Requires:

- Bumping langchain-core version
- bumping min langchain-core in langchain

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2025-06-24 14:44:06 -04:00
Mason Daugherty
6d71b6b6ee standard-tests: refactoring and fixes (#31703)
- `libs/core/langchain_core/messages/base.py`: add model name to
examples [per
docs](https://python.langchain.com/api_reference/standard_tests/integration_tests/langchain_tests.integration_tests.chat_models.ChatModelIntegrationTests.html#langchain_tests.integration_tests.chat_models.ChatModelIntegrationTests.test_usage_metadata)
("0.3.17: Additionally check for the presence of model_name in the
response metadata, which is needed for usage tracking in callback
handlers")
- `libs/core/langchain_core/utils/function_calling.py`: correct typo
-
`libs/standard-tests/langchain_tests/integration_tests/chat_models.py`:
- `magic_function(input)` -> `magic_function(_input)` to prevent warning
about redefining built in `input`
    - relocate a few tests for better grouping and narrative flow
    - suppress some type hint warnings following suit from similar tests
    - fix a few more typos
- validate not only that `model_name` is defined, but that it is not
empty (test_usage_metadata)
2025-06-23 23:22:31 +00:00
ccurme
ee83993b91 docs: document Anthropic cache TTL count details (#31708) 2025-06-23 20:16:42 +00:00
Christophe Bornet
b1cc972567 core[patch]: Improve RunnableWithMessageHistory init arg types (#31639)
`Runnable`'s `Input` is contravariant so we need to enumerate all
possible inputs and it's not possible to put them in a `Union`.
Also, it's better to only require a runnable that
accepts`list[BaseMessage]` instead of a broader `Sequence[BaseMessage]`
as internally the runnable is only called with a list.
2025-06-23 13:45:52 -04:00
Mikhail
6105a5841b core: fix get_buffer_string output for structured message content (#31600) 2025-06-20 23:21:50 +00:00
Bagatur
5271fd76f1 core[patch]: check before removing tags (#31691) 2025-06-20 17:46:50 -04:00
ccurme
39a8a1121a core: release 0.3.66 (#31690) 2025-06-20 17:45:03 -04:00
Mohammad Mohtashim
7ff405077d core[patch]: Returning always 2D Array for _cosine_similarity (#31528)
- **Description:** Very simple change in `_cosine_similarity` which
always 2D array.
- **Issue:** #31497
2025-06-20 11:25:02 -04:00
Eugene Yurtsev
2842e0c8c1 core[patch]: Add doc-strings to tools/base.py (#31684)
Add doc-strings
2025-06-20 11:16:57 -04:00
Christophe Bornet
7e046ea848 core: Cleanup Pydantic models and handle deprecation warnings (#30799)
* Simplified Pydantic handling since Pydantic v1 is not supported
anymore.
* Replace use of deprecated v1 methods by corresponding v2 methods.
* Remove use of other deprecated methods.
* Activate mypy errors on deprecated methods use.

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2025-06-20 10:42:52 -04:00
Nuno Campos
ddc850ca72 core: In LangChainTracer, send only the first token event (#31591)
- only the first one is used for analytics
2025-06-12 14:04:23 -07:00
ccurme
b0f100af7e core: release 0.3.65 (#31557) 2025-06-10 19:39:50 +00:00
Sydney Runkle
5b165effcd core(fix): revert set_text optimization (#31555)
Revert serialization regression introduced in
https://github.com/langchain-ai/langchain/pull/31238

Fixes https://github.com/langchain-ai/langchain/issues/31486
2025-06-10 13:36:55 -04:00
lc-arjun
35ae5eab4f core: use run tree post/patch (#31500)
Use run post/patch
2025-06-05 14:05:57 -07:00
Mohammad Mohtashim
ae3551c96b core[patch]: Correct type casting of annotations in _infer_arg_descriptions (#31181)
- **Description:** 
- In _infer_arg_descriptions, the annotations dictionary contains string
representations of types instead of actual typing objects. This causes
_is_annotated_type to fail, preventing the correct description from
being generated.
- This is a simple fix using the get_type_hints method, which resolves
the annotations properly and is supported across all Python versions.

  - **Issue:** #31051

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2025-06-05 11:58:36 -04:00
ccurme
741bb1ffa1 core[patch]: revert change to stream type hint (#31501)
https://github.com/langchain-ai/langchain/pull/31286 included an update
to the return type for `BaseChatModel.(a)stream`, from
`Iterator[BaseMessageChunk]` to `Iterator[BaseMessage]`.

This change is correct, because when streaming is disabled, the stream
methods return an iterator of `BaseMessage`, and the inheritance is such
that an `BaseMessage` is not a `BaseMessageChunk` (but the reverse is
true).

However, LangChain includes a pattern throughout its docs of [summing
BaseMessageChunks](https://python.langchain.com/docs/how_to/streaming/#llms-and-chat-models)
to accumulate a chat model stream. This pattern is implemented in tests
for most integration packages and appears in application code. So
https://github.com/langchain-ai/langchain/pull/31286 introduces mypy
errors throughout the ecosystem (or maybe more accurately, it reveals
that this pattern does not account for use of the `.stream` method when
streaming is disabled).

Here we revert just the change to the stream return type to unblock
things. A fix for this should address docs + integration packages (or if
we elect to just force people to update code, be explicit about that).
2025-06-05 11:20:06 -04:00
Christophe Bornet
539e5b6936 core: Add mypy strict-equality rule (#31286) 2025-06-02 18:24:35 +00:00
Sam Zhang
2c4e0ab3bc fix: module 'defusedxml' has no attribute 'ElementTree' (#31429) (#31431)
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Co-authored-by: Christophe Bornet <cbornet@hotmail.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2025-06-02 18:09:22 +00:00