Commit Graph

1255 Commits

Author SHA1 Message Date
Chester Curme
844b8b87d7 Merge branch 'standard_outputs' into cc/openai_v1
# Conflicts:
#	libs/core/langchain_core/language_models/v1/chat_models.py
#	libs/core/langchain_core/messages/utils.py
#	libs/core/langchain_core/messages/v1.py
#	libs/partners/openai/langchain_openai/chat_models/_compat.py
#	libs/partners/openai/langchain_openai/chat_models/base.py
2025-07-28 12:38:32 -04:00
Chester Curme
61e329637b lint 2025-07-28 11:02:37 -04:00
Chester Curme
b8fed06409 move get_num_tokens_from_messages to BaseChatModel and BaseChatModelV1 2025-07-28 10:58:57 -04:00
Mason Daugherty
ef9b5a9e18
add back standard_outputs 2025-07-28 10:47:26 -04:00
Mason Daugherty
5e9eb19a83
chore: update branch with changes from master (#32277)
Co-authored-by: Maxime Grenu <69890511+cluster2600@users.noreply.github.com>
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: jmaillefaud <jonathan.maillefaud@evooq.ch>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: tanwirahmad <tanwirahmad@users.noreply.github.com>
Co-authored-by: Christophe Bornet <cbornet@hotmail.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: niceg <79145285+growmuye@users.noreply.github.com>
Co-authored-by: Chaitanya varma <varmac301@gmail.com>
Co-authored-by: dishaprakash <57954147+dishaprakash@users.noreply.github.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Kanav Bansal <13186335+bansalkanav@users.noreply.github.com>
Co-authored-by: Aleksandr Filippov <71711753+alex-feel@users.noreply.github.com>
Co-authored-by: Alex Feel <afilippov@spotware.com>
2025-07-28 10:39:41 -04:00
Chester Curme
c409f723a2 Merge branch 'standard_outputs' into cc/openai_v1
# Conflicts:
#	libs/core/langchain_core/messages/utils.py
2025-07-28 10:19:50 -04:00
ccurme
3d9e694f73
feat(core): start on v1 chat model (#32276)
Co-authored-by: Nuno Campos <nuno@langchain.dev>
2025-07-28 10:17:06 -04:00
Mason Daugherty
c921d08b18
feat(docs): add docstring to _convert_from_v1_message() 2025-07-25 11:01:48 -04:00
Mason Daugherty
3f653011e6
nit: use block instead of content_block for consistency in convert_to_openai_image_block() 2025-07-25 10:57:22 -04:00
Mason Daugherty
ee13a3b6fa
nit: rearrange index to be grouped with other always-present fields 2025-07-25 10:16:35 -04:00
Chester Curme
4899857042 start on openai 2025-07-24 17:12:22 -04:00
Chester Curme
041b196145 Revert "copy BaseChatModel to language_models.v1"
This reverts commit 2d031031e3.
2025-07-24 13:33:41 -04:00
Chester Curme
dd8057a034 remove type ignores for eugene 2025-07-24 13:31:50 -04:00
Chester Curme
b94f23883f move best-effort v1 conversion 2025-07-24 13:31:27 -04:00
Chester Curme
2d031031e3 copy BaseChatModel to language_models.v1 2025-07-24 09:56:45 -04:00
ccurme
e9b0b84675
feat: new message formats (v0.4) (#32208)
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2025-07-23 13:30:21 -04:00
Chester Curme
eb8d32aff2 output_version -> str 2025-07-23 09:38:01 -04:00
Chester Curme
78d036a093 Merge branch 'wip-v0.4' into standard_outputs 2025-07-23 09:34:20 -04:00
Chester Curme
6572656cd2 core: support both old and new data content blocks 2025-07-22 18:19:09 -04:00
Chester Curme
b1a02f971b fix tests 2025-07-22 16:45:19 -04:00
Mason Daugherty
a02ad3d192
docs: formatting cleanup (#32188)
* formatting cleanup
* make `init_chat_model` more prominent in list of guides
2025-07-22 15:46:15 -04:00
ccurme
0c4054a7fc
release(core): 0.3.71 (#32186) 2025-07-22 15:44:36 -04:00
ccurme
ebf2e11bcb
fix(core): exclude api_key from tracing metadata (#32184)
(standard param)
2025-07-22 15:32:12 -04:00
ccurme
8acfd677bc
fix(core): add type key when tracing in some cases (#31825) 2025-07-22 18:08:16 +00:00
Mason Daugherty
b24f90dabe
refactor(core): standard content blocks (#32085) 2025-07-22 09:17:55 -04:00
Copilot
18c64aed6d
feat(core): add sanitize_for_postgres utility to fix PostgreSQL NUL byte DataError (#32157)
This PR fixes the PostgreSQL NUL byte issue that causes
`psycopg.DataError` when inserting documents containing `\x00` bytes
into PostgreSQL-based vector stores.

## Problem

PostgreSQL text fields cannot contain NUL (0x00) bytes. When documents
with such characters are processed by PGVector or langchain-postgres
implementations, they fail with:

```
(psycopg.DataError) PostgreSQL text fields cannot contain NUL (0x00) bytes
```

This commonly occurs when processing PDFs, documents from various
loaders, or text extracted by libraries like unstructured that may
contain embedded NUL bytes.

## Solution

Added `sanitize_for_postgres()` utility function to
`langchain_core.utils.strings` that removes or replaces NUL bytes from
text content.
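
A single `str.replace` call is enough to capture the core behavior described above. The function below is a minimal sketch under that assumption, not the code merged in this PR, and the name `sanitize_nul_bytes` is hypothetical:

```python
def sanitize_nul_bytes(text: str, replacement: str = "") -> str:
    """Return ``text`` with every NUL (0x00) character replaced.

    Hypothetical sketch of the behavior described above: PostgreSQL text
    columns reject NUL, so each one is swapped for ``replacement``.
    """
    return text.replace("\x00", replacement)


print(sanitize_nul_bytes("Getting\x00Started"))       # GettingStarted
print(sanitize_nul_bytes("Getting\x00Started", " "))  # Getting Started
print(sanitize_nul_bytes("a\x00\x00b", " "))          # a  b (one space per NUL)
```

Each NUL is replaced individually, so consecutive NUL bytes become consecutive replacement strings.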

### Key Features

- **Simple API**: `sanitize_for_postgres(text, replacement="")`
- **Configurable**: Replace NUL bytes with empty string (default) or
space for readability
- **Comprehensive**: Handles all problematic examples from the original
issue
- **Well-tested**: Complete unit tests with real-world examples
- **Backward compatible**: No breaking changes, purely additive

### Usage Example

```python
from langchain_core.utils import sanitize_for_postgres
from langchain_core.documents import Document

# Before: This would fail with DataError
problematic_content = "Getting\x00Started with embeddings"

# After: Clean the content before database insertion
clean_content = sanitize_for_postgres(problematic_content)
# Result: "GettingStarted with embeddings"

# Or preserve readability with spaces
readable_content = sanitize_for_postgres(problematic_content, " ")
# Result: "Getting Started with embeddings"

# Use in Document processing
doc = Document(page_content=clean_content, metadata={...})
```

### Integration Pattern

PostgreSQL vector store implementations should sanitize content before
insertion:

```python
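from langchain_core.documents import Document
from langchain_core.utils import sanitize_for_postgres


# Method on a PostgreSQL-backed vector store class
def add_documents(self, documents: list[Document]) -> list[str]:
    # Sanitize documents before insertion: replace NUL bytes with spaces
    sanitized_docs = []
    for doc in documents:
        sanitized_content = sanitize_for_postgres(doc.page_content, " ")
        sanitized_doc = Document(
            page_content=sanitized_content,
            metadata=doc.metadata,
            id=doc.id,
        )
        sanitized_docs.append(sanitized_doc)

    # Hand the cleaned documents to the store's own insertion logic
    return self._insert_documents_to_db(sanitized_docs)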
```

## Changes Made

- Added `sanitize_for_postgres()` function in
`langchain_core/utils/strings.py`
- Updated `langchain_core/utils/__init__.py` to export the new function
- Added comprehensive unit tests in
`tests/unit_tests/utils/test_strings.py`
- Validated against all examples from the original issue report

## Testing

All tests pass, including:
- Basic NUL byte removal and replacement
- Multiple consecutive NUL bytes
- Empty string handling
- Real examples from the GitHub issue
- Backward compatibility with existing string utilities

This utility enables PostgreSQL integrations in both langchain-community
and langchain-postgres packages to handle documents with NUL bytes
reliably.

Fixes #26033.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
2025-07-21 20:33:20 -04:00
Mohammad Mohtashim
095f4a7c28
fix(core): fix parse_result in case of self.first_tool_only with multiple keys matching for JsonOutputKeyToolsParser (#32106)
* **Description:** Updated `parse_result` logic to handle cases where
`self.first_tool_only` is `True` and multiple matching keys share the
same function name. Instead of returning the first match prematurely,
the method now prioritizes filtering results by the specified key to
ensure correct selection.
* **Issue:** #32100
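
The ordering change (filter by key first, then take the first match) can be illustrated with a small standalone function. This is a sketch, not the parser's code: `select_tool_call` is hypothetical, and it assumes each parsed tool call is a dict with a `"type"` (tool name) and an `"args"` entry.

```python
from typing import Any


def select_tool_call(
    tool_calls: list[dict[str, Any]],
    key_name: str,
    first_tool_only: bool,
) -> Any:
    """Return the args of tool calls whose ``type`` matches ``key_name``.

    Hypothetical sketch: filtering by ``key_name`` happens before
    ``first_tool_only`` is applied, so a leading call to a different
    tool no longer hides a later matching call.
    """
    matching = [call["args"] for call in tool_calls if call["type"] == key_name]
    if first_tool_only:
        # First *matching* call, or None if no call used this tool
        return matching[0] if matching else None
    return matching


calls = [
    {"type": "search", "args": {"q": "weather"}},
    {"type": "answer", "args": {"text": "sunny"}},
]
print(select_tool_call(calls, "answer", first_tool_only=True))  # {'text': 'sunny'}
```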

---------

Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-07-21 12:50:22 -04:00
ccurme
0355da3159
release(core): 0.3.70 (#32144) 2025-07-21 10:49:32 -04:00
astraszab
668c084520
docs(core): move incorrect arg limitation in rate limiter's docstring (#32118) 2025-07-20 14:28:35 -04:00
Yoshi
6d71bb83de
fix(core): fix docstrings and add sleep to FakeListChatModel._call (#32108) 2025-07-19 17:30:15 -04:00
Isaac Francisco
98bfd57a76
fix(core): better error message for empty var names (#32073)
Previously, we hit an index out of range error with empty variable names
(accessing `tag[0]`); now we throw a slightly nicer error.
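
The pattern is to check for an empty name before indexing into it. A minimal sketch, assuming mustache-style `{{ ... }}` tags whose first character may mark a section; `parse_tag_name` is a hypothetical helper, not the function changed in this PR:

```python
def parse_tag_name(tag: str) -> str:
    """Return the variable name inside a ``{{ ... }}`` tag body.

    Hypothetical sketch: the empty-name check runs before ``name[0]`` is
    read, so callers get a descriptive ValueError instead of an IndexError.
    """
    name = tag.strip()
    if not name:
        msg = "Empty variable name in template tag: '{{" + tag + "}}'"
        raise ValueError(msg)
    if name[0] in "#^/":  # section open, inverted section, section close
        return name[1:].strip()
    return name


print(parse_tag_name(" topic "))  # topic
print(parse_tag_name("#items"))   # items
```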

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-07-18 17:00:02 -04:00
Gurram Siddarth Reddy
427d2d6397
fix(core): implement sleep delay in FakeMessagesListChatModel _generate (#32014)
Implement a sleep delay in `FakeMessagesListChatModel._generate` so the
`sleep` parameter is respected, matching the documented behavior. This
adds artificial latency between responses for testing purposes.
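
The behavior can be sketched with a small stand-in class. `FakeResponder` is hypothetical and not part of `langchain_core`; it assumes canned responses are returned in order and cycle back to the start, and it sleeps before each response only when `sleep` is set.

```python
import time
from itertools import cycle
from typing import Optional


class FakeResponder:
    """Hypothetical stand-in for a fake chat model used in tests.

    Returns canned responses in order (cycling) and, when ``sleep`` is set,
    waits that many seconds before each response to simulate latency.
    """

    def __init__(self, responses: list[str], sleep: Optional[float] = None) -> None:
        self._responses = cycle(responses)
        self.sleep = sleep

    def generate(self, prompt: str) -> str:
        if self.sleep is not None:
            time.sleep(self.sleep)  # artificial latency, honoring `sleep`
        return next(self._responses)


model = FakeResponder(["first", "second"], sleep=0.05)
start = time.perf_counter()
print(model.generate("hi"))  # first
print(time.perf_counter() - start >= 0.05)  # True
```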

Issue: closes
[#31974](https://github.com/langchain-ai/langchain/issues/31974)
following
[docs](https://python.langchain.com/api_reference/core/language_models/langchain_core.language_models.fake_chat_models.FakeMessagesListChatModel.html#langchain_core.language_models.fake_chat_models.FakeMessagesListChatModel.sleep)

Dependencies: none

Twitter handle: [@siddarthreddyg2](https://x.com/siddarthreddyg2)

---------

Signed-off-by: Siddarthreddygsr <siddarthreddygsr@gmail.com>
2025-07-18 15:54:28 -04:00
open-swe[bot]
5da986c3f6
fix(core): JSON Schema reference resolution for list indices (#32088)
Fixes #32042

## Summary
Fixes a critical bug in JSON Schema reference resolution that prevented
numeric components in JSON pointer paths from being dereferenced
correctly, specifically list indices in `anyOf`, `oneOf`, and `allOf` arrays.

## Changes
- Fixed `_retrieve_ref` function in
`libs/core/langchain_core/utils/json_schema.py` to properly handle
numeric components
- Added comprehensive test function `test_dereference_refs_list_index()`
in `libs/core/tests/unit_tests/utils/test_json_schema.py`
- Resolved line length formatting issues
- Improved type checking and index validation for list and dictionary
references

## Key Improvements
- Correctly handles list index references in JSON pointer paths
- Maintains backward compatibility with existing dictionary numeric key
functionality
- Adds robust error handling for out-of-bounds and invalid indices
- Passes all test cases covering various reference scenarios

## Test Coverage
- Verified fix for `#/properties/payload/anyOf/1/properties/startDate`
reference
- Tested edge cases including out-of-bounds and negative indices
- Ensured no regression in existing reference resolution functionality

Resolves the reported issue with JSON Schema reference dereferencing for
list indices.
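
The resolution rule described above can be sketched as follows: a numeric component indexes into a list, while in a dict it is still looked up as a string key. `retrieve_ref` here is a hypothetical standalone sketch, not the patched `_retrieve_ref`; only local `#/...` pointers are supported, and component unescaping follows RFC 6901.

```python
from typing import Any


def retrieve_ref(path: str, schema: dict[str, Any]) -> Any:
    """Resolve a local JSON pointer such as ``#/properties/a/anyOf/1``.

    Hypothetical sketch: numeric components index into lists, while dict
    lookups use the component as a string key (so ``{"1": ...}`` still works).
    """
    if path == "#":
        return schema
    if not path.startswith("#/"):
        raise ValueError(f"Only local references are supported: {path!r}")
    node: Any = schema
    for component in path[2:].split("/"):
        # RFC 6901 escapes: "~1" -> "/", then "~0" -> "~"
        key = component.replace("~1", "/").replace("~0", "~")
        if isinstance(node, list):
            # Negative or non-numeric components are rejected for lists
            if not key.isdigit():
                raise KeyError(f"Expected a list index in {path!r}, got {key!r}")
            index = int(key)
            if index >= len(node):
                raise KeyError(f"List index {index} out of range in {path!r}")
            node = node[index]
        elif isinstance(node, dict) and key in node:
            node = node[key]
        else:
            raise KeyError(f"Reference {path!r} not found at {key!r}")
    return node
```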

---------

Co-authored-by: open-swe-dev[bot] <open-swe-dev@users.noreply.github.com>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-07-17 15:54:38 -04:00
efj-amzn
d3072e2d2e
feat(core): update _import_utils.py to not mask the thrown exception (#32071) 2025-07-16 17:11:56 -04:00
Mason Daugherty
3c19cafab0
docs: improve output_version description (#31977) 2025-07-16 12:29:07 -04:00
Mohammad Mohtashim
96bf8262e2
fix: fix missing-docstring bug when no docstring is provided in a BaseModel class (#31608)
- **Description:** Ensure that the tool description is an empty string
when creating a Structured Tool from a Pydantic class in case no
description is provided
- **Issue:** Fixes #31606
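
The intended behavior (use the class's own docstring, otherwise an empty string) can be illustrated with plain classes. This is a stdlib sketch, not the code from this PR: `own_description` is hypothetical, and `Base` stands in for a library base class whose docstring should not leak into the tool description. Note that `inspect.getdoc` falls back to inherited docstrings, while `cls.__dict__` only contains the class's own `__doc__`.

```python
import inspect


class Base:
    """Library base-class docstring that should not leak into tools."""


class NoDocArgs(Base):
    pass


class DocArgs(Base):
    """Search the web."""


def own_description(cls: type) -> str:
    """Return the docstring defined on ``cls`` itself, or "" if there is none."""
    doc = cls.__dict__.get("__doc__")  # ignores docstrings on base classes
    return inspect.cleandoc(doc) if doc else ""


print(repr(own_description(NoDocArgs)))  # ''
print(own_description(DocArgs))          # Search the web.
```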

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-07-16 11:56:05 -04:00
Casi
686a6b754c
fix: issue a warning if np.nan or np.inf are in _cosine_similarity argument matrices (#31532)
- **Description**: issues a warning if inf and nan are passed as inputs
to langchain_core.vectorstores.utils._cosine_similarity
- **Issue**: Fixes #31496
- **Dependencies**: no external dependencies added, only warnings module
imported
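
The check can be illustrated without NumPy. The function below is a pure-Python sketch of the idea, not `_cosine_similarity` itself. It emits a warning when any input value is NaN or infinite and otherwise computes the similarity normally. The `RuntimeWarning` category and the convention of returning 0.0 for a zero-length vector are assumptions made for this example.

```python
import math
import warnings


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors.

    Hypothetical pure-Python sketch: warn when either input contains NaN
    or infinity, since the resulting similarity is then not meaningful.
    """
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    if not all(math.isfinite(x) for x in (*a, *b)):
        warnings.warn(
            "NaN or inf found in input vectors; the similarity may be NaN or inf.",
            RuntimeWarning,
            stacklevel=2,
        )
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed convention for zero-length vectors
    return dot / (norm_a * norm_b)
```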

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-07-16 11:50:09 -04:00
Mason Daugherty
ad44f0688b
release(core): release 0.3.69 (#32056) 2025-07-15 17:13:46 -04:00
Jacob Lee
535ba43b0d
feat(core): add an option to make deserialization more permissive (#32054)
## Description

Currently when deserializing objects that contain non-deserializable
values, we throw an error. However, there are cases (e.g. proxies that
return response fields containing extra fields like Python datetimes),
where these values are not important and we just want to drop them.
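
A permissive mode means walking the parsed payload and dropping marked values instead of raising. The following is a minimal sketch of that idea. The `{"type": "not_implemented"}` marker shape and the `permissive` flag are illustrative assumptions; they are not necessarily the option name or format added in this PR.

```python
from typing import Any

NOT_IMPLEMENTED = "not_implemented"  # assumed marker for unencodable values
_DROP = object()  # sentinel meaning "omit this value from its container"


def _revive(node: Any, permissive: bool) -> Any:
    if isinstance(node, dict):
        if node.get("type") == NOT_IMPLEMENTED:
            if not permissive:
                raise NotImplementedError(f"Cannot deserialize {node.get('repr', node)!r}")
            return _DROP
        out = {}
        for key, value in node.items():
            revived = _revive(value, permissive)
            if revived is not _DROP:  # skip dropped fields
                out[key] = revived
        return out
    if isinstance(node, list):
        items = (_revive(item, permissive) for item in node)
        return [item for item in items if item is not _DROP]
    return node


def revive(node: Any, *, permissive: bool = False) -> Any:
    """Hypothetical sketch: raise on marked values, or drop them if permissive."""
    result = _revive(node, permissive)
    return None if result is _DROP else result
```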

Twitter handle: @hacubu

---------

Co-authored-by: Mason Daugherty <github@mdrxy.com>
2025-07-15 17:00:01 -04:00
Eugene Yurtsev
02d0a9af6c
chore(core): unpin packaging dependency (#32032)
Unpin packaging dependency

---------

Co-authored-by: ntjohnson1 <24689722+ntjohnson1@users.noreply.github.com>
2025-07-14 21:42:32 +00:00
董哥的黑板报
553ac1863b
docs: add deprecation notice for PipelinePromptTemplate (#31999)
**PR title**: 
add deprecation notice for PipelinePromptTemplate

**PR message**: 
In the API documentation, PipelinePromptTemplate is marked as
deprecated, but this is not mentioned in the docs.

I'm submitting this PR to add a deprecation notice to the docs.

**Tests**:
N/A (documentation only)

---------

Co-authored-by: Mason Daugherty <github@mdrxy.com>
2025-07-14 15:27:29 +00:00
Andreas V. Jonsterhaug
6dcca35a34
fix(core): correct return type hints in BaseChatPromptTemplate (#32009)
This PR changes the return type hints of the `format_prompt` and
`aformat_prompt` methods in `BaseChatPromptTemplate` from `PromptValue`
to `ChatPromptValue`, since both methods always return a
`ChatPromptValue`.
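
Narrowing an override's return type to a subclass is valid because return types are covariant. Callers of the subclass then get the specific type without a cast. All class names below are hypothetical stand-ins, not the `langchain_core` classes:

```python
class PromptValue:
    """Hypothetical stand-in for a generic prompt value."""


class ChatPromptValue(PromptValue):
    """Hypothetical stand-in for a prompt value holding chat messages."""

    def __init__(self, messages: list[str]) -> None:
        self.messages = messages


class BasePromptTemplate:
    def format_prompt(self, **kwargs: str) -> PromptValue:
        raise NotImplementedError


class ChatTemplate(BasePromptTemplate):
    # Narrower return type than the base method; type checkers accept this
    # because every ChatPromptValue is also a PromptValue.
    def format_prompt(self, **kwargs: str) -> ChatPromptValue:
        return ChatPromptValue([f"{k}: {v}" for k, v in kwargs.items()])


value = ChatTemplate().format_prompt(topic="cats")
print(value.messages)  # ['topic: cats']
```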
2025-07-14 11:00:01 -04:00
Christophe Bornet
d57216c295
feat(core): add ruff rules D to tests except D1 (#32000)
Docstrings are not required in tests, but when present they must be
correctly formatted.
See https://docs.astral.sh/ruff/rules/#pydocstyle-d
2025-07-14 10:42:03 -04:00
Chester Curme
7c1b59d26a add test for beta content 2025-07-11 21:03:18 -04:00
Chester Curme
3460c48af6 cr 2025-07-11 15:25:07 -04:00
Chester Curme
7e740e5e1f cr 2025-07-11 15:16:37 -04:00
Chester Curme
679a9e7c8f implement beta_content 2025-07-11 14:05:45 -04:00
Chester Curme
67fc58011a remove total 2025-07-10 17:53:21 -04:00
Chester Curme
a3a95805eb revert 2025-07-10 17:53:08 -04:00
Chester Curme
354f5d1c7a NotRequired -> Required 2025-07-10 17:53:00 -04:00