Commit Graph

15076 Commits

Author SHA1 Message Date
Sydney Runkle
9bd028d04a fix: disable int tests on release temporarily (#34685) langchain-core==1.2.7 2026-01-09 12:42:25 -05:00
Mason Daugherty
2e8744559d fix(langchain,langchain-classic): more descriptive error msg when dep is not installed (#34679) 2026-01-09 12:41:55 -05:00
ccurme
19edaa8acb chore(openai): delete outdated test (#34682) 2026-01-09 12:37:44 -05:00
Sydney Runkle
b500244250 fix: rm anth test (#34684) 2026-01-09 12:37:33 -05:00
Sydney Runkle
d972d00b3a chore: dropping openai from release matrix (#34681) 2026-01-09 11:22:49 -05:00
Guofang.Tang
384158daec fix(langchain): infer provider from mixed-case prefixes (#34672)
Fix provider inference for mixed-case model prefixes and add matching
unit coverage.
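
A minimal sketch of case-insensitive prefix matching, assuming the fix amounts to normalizing the model name before comparing prefixes. The `PREFIX_TO_PROVIDER` table and the `infer_provider` helper are hypothetical names for illustration, not the actual `langchain` implementation.

```python
from typing import Optional

# Hypothetical prefix table; the real mapping in langchain may differ.
PREFIX_TO_PROVIDER = {
    "gpt-": "openai",
    "claude": "anthropic",
    "gemini": "google_genai",
}


def infer_provider(model_name: str) -> Optional[str]:
    """Return the provider whose prefix matches, ignoring case."""
    lowered = model_name.lower()  # normalize once so "GPT-4o" matches "gpt-"
    for prefix, provider in PREFIX_TO_PROVIDER.items():
        if lowered.startswith(prefix):
            return provider
    return None  # no known prefix matched
```

With this shape, `infer_provider("GPT-4o")` and `infer_provider("gpt-4o")` resolve to the same provider.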
2026-01-09 11:07:14 -05:00
Sydney Runkle
c080296bed release: langchain-core 1.2.7 (#34678) 2026-01-09 16:02:38 +00:00
Sydney Runkle
323c76504a fix: add test confirming we don't inject args based on args_schema alone (#34677)
pending exclusion from function signature
2026-01-09 11:00:13 -05:00
Sydney Runkle
ed2aa9f747 fix: don't trace injected args only found in signature (#34670)
for the case when they're not included in the `args_schema`

this was predicted by @eyurtsev's comment here:
https://github.com/langchain-ai/langchain/pull/33729/files#r2475538173

pairing w/ this PR in mcp adapters:
https://github.com/langchain-ai/langchain-mcp-adapters/pull/407
2026-01-09 09:58:34 -05:00
Mason Daugherty
76da99e022 release(langchain): 1.2.3 (#34668) langchain==1.2.3 2026-01-08 15:24:32 -05:00
Aman Gupta
2847814c70 feat(core): add more file extensions to ignore in HTML link extraction (#34552)
# feat(core): add more file extensions to ignore in HTML link extraction

## Description
This PR enhances the HTML link extraction utility in  
`libs/core/langchain_core/utils/html.py` by expanding the
`SUFFIXES_TO_IGNORE` list to include additional common binary file
extensions:

- `.webp`
- `.pdf`
- `.docx`
- `.xlsx`
- `.pptx`
- `.pptm`

These file types are non-HTML, non-crawlable resources. Ignoring them
prevents `find_all_links` and `extract_sub_links` from mistakenly
treating such binary assets as navigable links. This improves link
filtering, reduces unnecessary crawling, and aligns behavior with
typical web scraping expectations.
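
A rough sketch of suffix-based filtering, using only the extensions named in this PR. The `is_crawlable` helper is hypothetical and the real `find_all_links` logic may be implemented differently (for example with a regular expression).

```python
from urllib.parse import urlparse

# Illustrative subset: only the extensions added in this PR.
SUFFIXES_TO_IGNORE = (".webp", ".pdf", ".docx", ".xlsx", ".pptx", ".pptm")


def is_crawlable(url: str) -> bool:
    """Return False when the URL path ends in an ignored binary suffix."""
    # Compare against the path only, so query strings do not hide the suffix.
    path = urlparse(url).path.lower()
    return not path.endswith(SUFFIXES_TO_IGNORE)


links = [
    "https://example.com/docs/intro",
    "https://example.com/files/report.PDF",
    "https://example.com/img/logo.webp?size=2",
]
crawlable = [link for link in links if is_crawlable(link)]
```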

## Summary of Changes
- **Updated** `libs/core/langchain_core/utils/html.py`: Added `.webp`,
`.pdf`, `.docx`, `.xlsx`, `.pptx`, `.pptm` to `SUFFIXES_TO_IGNORE`.

## Related Issues
N/A

## Verification
- `ruff check libs/core/langchain_core/utils/html.py`: **Passed**  
- `mypy libs/core/langchain_core/utils/html.py`: **Passed**  
- `pytest libs/core/tests/unit_tests/utils/test_html.py`: **Passed** (11
tests)

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
2026-01-08 14:40:22 -05:00
ccurme
d383f00489 refactor(langchain): engage summarization based on reported usage_metadata (#34632) 2026-01-08 11:12:00 -05:00
Aman Gupta
50c5bb5607 refactor(core): improve docstrings for HTML link extraction utilities (#34550)
# refactor(core): improve docstrings for HTML link extraction utilities

## Description
This PR updates and clarifies the docstrings for `find_all_links` and
`extract_sub_links` in
`libs/core/langchain_core/utils/html.py`.

The previous return-value descriptions were vague (e.g., "all links",
"sub links"). They have now been revised to clearly describe the
behavior and output of each function:

- **find_all_links** → “A list of all links found in the HTML.”
- **extract_sub_links** → “A list of absolute paths to sub links.”

These improvements make the utilities more understandable and
developer-friendly without altering functionality.

## Verification
- `ruff check libs/core/langchain_core/utils/html.py`: **Passed**  
- `pytest libs/core/tests/unit_tests/utils/test_html.py`: **Passed**

## Checklists
- PR title follows the required format: `TYPE(SCOPE): DESCRIPTION`  
- Changes are limited to the `langchain-core` package  
- `make format`, `make lint`, and `make test` pass
2026-01-08 10:21:17 -05:00
Mason Daugherty
2b6911d9af fix(langchain): keep tool call / AIMessage pairings when summarizing (#34609)
Fixes #34282

**Before:** When using agents with tools (like file reading, web search,
etc.), the conversation looks like this:

```
[User]     "Read these 10 files and summarize them"
[AI]       "I'll read all 10 files" + [tool_call: read_file x 10]
[Tool]     "Contents of file1.txt..."
[Tool]     "Contents of file2.txt..."
[Tool]     "Contents of file3.txt..."
... (7 more tool responses)
```

When the conversation gets too long, `SummarizationMiddleware` kicks in
to compress older messages. The problem was:

If you asked to keep the last 6 messages, you'd get:

```
[Summary]  "Here's what happened before..."
[Tool]     "Contents of file5.txt..."
[Tool]     "Contents of file6.txt..."
[Tool]     "Contents of file7.txt..."
[Tool]     "Contents of file8.txt..."
[Tool]     "Contents of file9.txt..."
[Tool]     "Contents of file10.txt..."
```

The AI's original request to read the files (`[AI]` message with
`tool_calls`) was summarized away, but the tool responses remained. This
caused the error:

```
Error code: 400 - "No tool call found for function call output with call_id..."
```

Many APIs require that every tool response has a matching tool request.
Without the AI message, the tool responses are "orphaned."

## The fix

Now when the cutoff lands on tool messages, we **move backward** to
include the AI message that requested those tools:

Same scenario, keeping last 6 messages:

```
[Summary]  "Here's what happened before..."
[AI]       "I'll read all 10 files" + [tool_call: read_file x 10]
[Tool]     "Contents of file1.txt..."
[Tool]     "Contents of file2.txt..."
... (all 10 tool responses)
```

The AI message is preserved along with its tool responses, keeping them
paired together.
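
The backward walk described above can be sketched as follows. This is an illustration under simplifying assumptions, not the `SummarizationMiddleware` code: `Msg` and `safe_cutoff` are hypothetical names, and messages are reduced to a role and a content string.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    """Simplified stand-in for a chat message (hypothetical type)."""
    role: str  # "user", "ai", or "tool"
    content: str


def safe_cutoff(messages: list[Msg], keep_last: int) -> int:
    """Return the index of the first message to keep.

    Starts at len(messages) - keep_last and, if that index lands on a
    tool message, walks backward until it reaches a non-tool message
    (the AI message that issued the tool calls).
    """
    cutoff = max(len(messages) - keep_last, 0)
    while 0 < cutoff < len(messages) and messages[cutoff].role == "tool":
        cutoff -= 1
    return cutoff


# The "10 files" scenario: one user message, one AI message, ten tool results.
history = [Msg("user", "Read these 10 files"), Msg("ai", "I'll read all 10 files")]
history += [Msg("tool", f"Contents of file{i}.txt...") for i in range(1, 11)]

start = safe_cutoff(history, keep_last=6)
kept = history[start:]  # begins with the AI message, followed by all 10 tool results
```

Messages before `start` would be summarized; everything from `start` onward is kept verbatim, so no tool response is left without its request.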

## Practical examples

### Example 1: Parallel tool calls

**Scenario:** Agent reads 10 files in parallel, summarization triggers
(see above)

### Example 2: Mixed conversation

**Scenario:** User asks question, AI uses tools, user says thanks

```
[User]     "What's the weather?"
[AI]       "Let me check" + [tool_call: get_weather]
[Tool]     "72F and sunny"
[AI]       "It's 72F and sunny!"
[User]     "Thanks!"
```

Keeping last 2 messages:

| Before (Bug) | After (Fix) |
|--------------|-------------|
| Only `[User] "Thanks!"` kept | `[AI] + [Tool] + [AI] + [User]` all kept |
| Lost the weather info | Tool pair preserved with response |

### Example 3: Multiple tool sequences

```
[User]     "Search for X"
[AI]       [tool_call: search]
[Tool]     "Results for X"
[User]     "Now search for Y"
[AI]       [tool_call: search]
[Tool]     "Results for Y"
[User]     "Great!"
```

**Keeping last 3 messages:** If cutoff lands on
`[Tool] "Results for Y"`, we now include `[AI] [tool_call: search]` to
keep the pair together.
2026-01-08 10:07:56 -05:00
Guofang.Tang
f805ea9601 test(langchain): cover chat model provider inference (#34657)
Add unit coverage for chat model provider inference across common model
name prefixes. This improves regression protection without touching
runtime code.

---------

Co-authored-by: Mason Daugherty <github@mdrxy.com>
2026-01-08 09:59:12 -05:00
Stephan Günther
0276cc0290 fix(langchain): fix copy-paste error on azure_openai embedding provider map (#34655)
Fixes a bug introduced with commit 85f1ba2 (released in
`langchain == 1.2.1`).

Whenever the index embedding of the langgraph-server is configured with
the `azure_openai` provider, the wrong class is initialized, and
initialization fails unless the now-unexpected `OPENAI_API_KEY`
environment variable is set.

Example configuration file `langgraph.json` that will reproduce the
issue (see
https://docs.langchain.com/langsmith/cli#adding-semantic-search-to-the-store):

```json
{
  "dependencies": ["."],
  "graphs": {
    "chat": "src/agents/chat/graph.py:graph"
  },
  "store": {
    "index": {
      "embed": "azure_openai:text-embedding-3-small",
      "dims": 1536
    }
  },
  "python_version": "3.13",
  "image_distro": "wolfi"
}
```
2026-01-08 09:54:53 -05:00
Eugene Yurtsev
ceca38d3fe fix(langchain): add test to verify version (#34644)
verify version in langchain to avoid accidental drift
langchain==1.2.2
2026-01-07 22:36:10 +00:00
Eugene Yurtsev
5554a36ad5 release(langchain): release 1.2.2 (#34643)
Release langchain 1.2.2
2026-01-07 17:27:58 -05:00
Harrison Chase
bda22aa1d9 fix(langchain): handle parallel usage of the todo tool in planning middleware (#34637)
The agent should only make a single call to update the todo list at a
time. A parallel call doesn't make sense, but also cannot work as
there's no obvious reducer to use.

On parallel calls of the todo tool, we return a ToolMessage that guides
the LLM not to call the tool in parallel.
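
A minimal sketch of that guard, assuming tool calls arrive as dicts with `id` and `name` keys. The `write_todos` tool name, the `handle_todo_calls` helper, and the message wording are assumptions for illustration, not the planning middleware's actual code.

```python
TODO_TOOL_NAME = "write_todos"  # assumed tool name, for illustration only


def handle_todo_calls(tool_calls: list[dict]) -> list[dict]:
    """Return one tool-message-like dict per todo call.

    A single todo call succeeds. Two or more in the same turn all get
    an error reply, since there is no obvious way to merge the updates.
    """
    todo_calls = [call for call in tool_calls if call["name"] == TODO_TOOL_NAME]
    if len(todo_calls) > 1:
        return [
            {
                "tool_call_id": call["id"],
                "status": "error",
                "content": "Call the todo tool only once per turn, not in parallel.",
            }
            for call in todo_calls
        ]
    return [
        {"tool_call_id": call["id"], "status": "success", "content": "Todo list updated."}
        for call in todo_calls
    ]
```

Replying to every parallel call, rather than only the extras, keeps each tool call paired with a response.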

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2026-01-07 17:23:56 -05:00
Manas karthik
48cd13114f test(core): add edge case for empty examples in LengthBasedExampleSelector (#34641) 2026-01-07 15:26:53 -05:00
Mohammad Mohtashim
e6a9694f5d fix(core): fix strict schema generation for functions with optional args (#34599) 2026-01-07 15:13:18 -05:00
ccurme
25bb36de81 release(openai): 1.1.7 (#34640) langchain-openai==1.1.7 2026-01-07 14:34:23 -05:00
OysterMax
92afcaae60 fix(openai): raise proper exception OpenAIRefusalError on structured output refusal (#34619) 2026-01-07 14:34:02 -05:00
Sujal M H
7ad1c19d9c fix: handle empty assistant content in Responses API (#34272) (#34296) 2026-01-07 14:21:55 -05:00
Christophe Bornet
f10225184d chore(langchain): fix types in test_wrap_model_call (#34573) 2026-01-07 11:49:46 -05:00
Chris Papademetrious
0c7b7e045d feat(core): support custom message separator in get_buffer_string() (#34569) 2026-01-07 11:46:17 -05:00
Aarav Dugar
4c86e8ba39 chore(groq): document vision support (#34620) 2026-01-07 11:37:05 -05:00
Manas karthik
048de6dfb6 test(text-splitters): add edge case tests for CharacterTextSplitter (#34628) 2026-01-07 11:06:44 -05:00
Mason Daugherty
557eddfd51 refactor(core): add warning for fallback GPT-2 tokenizer usage (#34621) 2026-01-06 19:11:10 -05:00
Mason Daugherty
aa9c63b96a release(langchain): 1.2.1 (#34622) langchain==1.2.1 2026-01-06 19:10:49 -05:00
Mason Daugherty
8aeff95341 fix(core,langchain): use get_buffer_string for message summarization (#34607)
Fixes #34517

Supersedes #34557, #34570

Fixes token inflation in `SummarizationMiddleware` that caused context
window overflow during summarization.

**Root cause:** When formatting messages for the summary prompt,
`str(messages)` was implicitly called, which includes all Pydantic
metadata fields (`usage_metadata`, `response_metadata`,
`additional_kwargs`, etc.). This caused the stringified representation
to use ~2.5x more tokens than `count_tokens_approximately` estimates.

**Problem:**
- Summarization triggers at 85% of context window based on
`count_tokens_approximately`
- But `str(messages)` in the prompt uses 2.5x more tokens
- Results in `ContextLengthExceeded`

**Fix:** Use `get_buffer_string()` to format messages, which produces
compact output:

```
Human: What's the weather?
AI: Let me check...[tool_calls]
Tool: 72°F and sunny
```

Instead of verbose Pydantic repr:

```python
[HumanMessage(content="What's the weather?", additional_kwargs={}, response_metadata={}), ...]
```
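
To make the overflow concrete, here is a small stdlib-only sketch. `Message` and `compact_buffer` are hypothetical stand-ins (not `langchain_core` classes or `get_buffer_string` itself), and the 100,000-token window is an assumed figure. The 85% trigger and 2.5x inflation come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """Simplified stand-in for a chat message (hypothetical type)."""
    role: str
    content: str
    additional_kwargs: dict = field(default_factory=dict)
    response_metadata: dict = field(default_factory=dict)


def compact_buffer(messages: list[Message]) -> str:
    """Render one 'Role: content' line per message, dropping metadata."""
    return "\n".join(f"{m.role}: {m.content}" for m in messages)


messages = [
    Message("Human", "What's the weather?"),
    Message("Tool", "72°F and sunny"),
]

verbose = str(messages)              # dataclass repr, metadata fields included
compact = compact_buffer(messages)   # role and content only

# Arithmetic from the description, with an assumed 100,000-token window.
context_window = 100_000
trigger_estimate = context_window * 85 // 100  # summarization triggers here
actual_prompt = trigger_estimate * 5 // 2      # ~2.5x inflation from str(messages)
overflows = actual_prompt > context_window
```

Under these assumptions the summary prompt would need 212,500 tokens against a 100,000-token window, which is why the request fails even though the estimate was below the limit.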
2026-01-06 19:05:03 -05:00
Christophe Bornet
0438f8c277 chore(langchain): fix types in test_model_fallback (#34615) 2026-01-06 13:07:18 -05:00
Christophe Bornet
7f4f130479 chore(langchain): fix types in test_pii (#34617) 2026-01-06 13:06:25 -05:00
ccurme
6537939f53 chore(langchain): add admonition around redaction_rules (#34618) 2026-01-06 13:01:09 -05:00
Ademola Balogun
a2529cd805 fix(langchain): correct typo 'langchain experiment' to 'langchain_experimental' in error messages (#34608)
Fixed typo in ImportError messages where "langchain experiment" should
be "langchain_experimental" for consistency with the actual package
name.

This helps improve clarity for users who encounter these error messages
when trying to use deprecated tools that have moved to the
langchain_experimental package.

Related issues: #13858, #13859

Co-authored-by: Ademola <ademicho@gmail>
2026-01-05 18:10:06 -05:00
ccurme
c1f1641018 fix(anthropic): fix version (#34606) langchain-anthropic==1.3.1 2026-01-05 16:03:20 -05:00
ccurme
225e0fa8c9 release(anthropic): 1.3.1 (#34605) 2026-01-05 15:55:15 -05:00
Loganaden Velvindron
f021e899dc fix(anthropic): CVE-2025-68664 (#34563) 2026-01-05 15:51:25 -05:00
lwtaiyty
578cef9622 fix(anthropic): skip cache_control for code_execution blocks (#34579) 2026-01-05 15:40:59 -05:00
Christophe Bornet
7979fd3d9f chore(langchain): fix types in test_composition (#34580) 2026-01-05 14:49:34 -05:00
Christophe Bornet
3b65985551 chore(langchain): fix types in test_decorators (#34583) 2026-01-05 14:47:10 -05:00
Christophe Bornet
c4babed5c6 chore(langchain): fix types in test_wrap_tool_call (#34600) 2026-01-05 14:38:31 -05:00
Christophe Bornet
5ae53fdfb3 chore(langchain): fix types in test_model_call_limit_types (#34601) 2026-01-05 14:37:03 -05:00
Christophe Bornet
901690ceec chore(langchain): fix types in test_file_search and test_human_in_the_loop (#34602) 2026-01-05 14:34:35 -05:00
ゆり
be2c7f1aa8 test(core): add tests for formatting utils and merge functions (#34511)
## Summary
Add comprehensive test coverage for previously untested utilities in
`langchain-core`.

## Changes

### New file: `test_formatting.py` (18 tests)

Tests for `StrictFormatter` class:
- `test_vformat_with_keyword_args` - basic functionality
- `test_vformat_with_multiple_keyword_args` - multiple placeholders
- `test_vformat_with_empty_string` - edge case
- `test_vformat_with_no_placeholders` - literal strings
- `test_vformat_raises_on_positional_args` - error handling
- `test_vformat_raises_on_multiple_positional_args` - error handling
- `test_vformat_with_special_characters` - newlines, tabs
- `test_vformat_with_unicode` - emoji, CJK characters
- `test_vformat_with_format_spec` - format specifications
- `test_vformat_with_nested_braces` - escaped braces

Tests for `validate_input_variables`:
- `test_validate_input_variables_success` - valid input
- `test_validate_input_variables_with_extra_variables` - extra vars
allowed
- `test_validate_input_variables_with_missing_variable` - KeyError
- `test_validate_input_variables_empty_format` - edge case
- `test_validate_input_variables_no_placeholders` - edge case

Tests for `formatter` singleton:
- `test_formatter_is_strict_formatter` - type check
- `test_formatter_format_works` - functionality
- `test_formatter_rejects_positional_args` - error handling

### Extended `test_utils.py` (14 new tests)

Tests for `merge_lists`:
- Parametrized tests covering None handling, simple merge, empty lists,
index-based merging
- `test_merge_lists_multiple_others` - merging 3+ lists
- `test_merge_lists_all_none` - all None inputs

Tests for `merge_obj`:
- Parametrized tests for None, strings, dicts, lists, equal values
- `test_merge_obj_type_mismatch` - TypeError on type mismatch
- `test_merge_obj_unmergeable_values` - ValueError on different values
- `test_merge_obj_tuple_raises` - ValueError for tuples

## Test plan
- [x] Tests follow existing patterns in the codebase
- [x] All tests are unit tests (no network calls)
- [x] Tests cover happy paths and error conditions
- [x] Tests verify no mutation of input data

## AI Disclosure
This contribution was developed with AI assistance (Claude Code).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: yurekami <yurekami@users.noreply.github.com>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2026-01-05 14:20:11 -05:00
ccurme
b5c5ba0a5f release(xai): 1.2.1 (#34604) langchain-xai==1.2.1 2026-01-05 13:55:38 -05:00
ccurme
944b43dd25 fix(xai): count reasoning tokens in output total (#34603) 2026-01-05 13:25:30 -05:00
aroun-coumar
730a3676f8 fix(core): strip message IDs from cache keys using model_copy (#33915)
**Description:**  

*Closes #[33883](https://github.com/langchain-ai/langchain/issues/33883)*

Chat model cache keys are generated by serializing messages via
`dumps(messages)`. The optional `BaseMessage.id` field (a UUID used
solely for tracing/threading) is included in this serialization, causing
functionally identical messages to produce different cache keys. This
results in repeated API calls, cache bloat, and degraded performance in
production workloads (e.g., agents, RAG chains, long conversations).

This change normalizes messages **only for cache key generation** by
stripping the nonsemantic `id` field using Pydantic V2’s
`model_copy(update={"id": None})`. The normalization is applied in both
synchronous and asynchronous cache paths (`_generate_with_cache` /
`_agenerate_with_cache`) immediately before `dumps()`.

```python
normalized_messages = [
    msg.model_copy(update={"id": None})
    if getattr(msg, "id", None) is not None
    else msg
    for msg in messages
]
prompt = dumps(normalized_messages)
```

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
2026-01-05 10:37:10 -05:00
Julia (Juli) Huang
cd5b36456a fix(text-splitters): HTMLSemanticPreservingSplitter nested preserved … (#34587)
Summary
Fixes an issue where HTMLSemanticPreservingSplitter failed to preserve
elements nested inside non-container tags. With these changes, preserved
elements are now correctly detected and handled at any nesting depth.

Root Cause
`_process_element()` only recursed into a small set of hard-coded
container tags (`html`, `body`, `div`, `main`). For other tags, the
subtree was flattened into text, preventing nested preserved elements
(inside `<p>`, `<section>`, `<article>`, etc.) from being detected.


Fix
- Updated traversal logic in `_process_element` (`html.py`) to recursively
process child elements for any tag that contains nested elements
- Avoided duplicate text extraction
- Preserved correct placeholder ordering
- Treated leaf nodes as text only
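
The traversal change can be illustrated with a toy tree. This sketch assumes a simplified `Node` type and a `collect_preserved` helper, both hypothetical. It only shows why recursing into any element with children finds preserved tags at every depth, and it does not reproduce the splitter's placeholder handling.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Tiny stand-in for a parsed HTML element (hypothetical type)."""
    tag: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


PRESERVED = {"table", "code"}  # example set of tags to keep intact


def collect_preserved(node: Node) -> list[str]:
    """Return tags of preserved elements found at any nesting depth."""
    if node.tag in PRESERVED:
        return [node.tag]  # keep the element whole; do not descend into it
    found: list[str] = []
    for child in node.children:  # recurse into any tag, not just containers
        found.extend(collect_preserved(child))
    return found


doc = Node("body", children=[
    Node("section", children=[Node("table", text="a|b")]),
    Node("p", text="see ", children=[Node("code", text="x = 1")]),
])
preserved = collect_preserved(doc)
```

A traversal limited to `html`, `body`, `div`, and `main` would stop at `<section>` and `<p>` here and miss both nested elements.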

Tests
Adds regression tests covering preserved elements nested inside
non-container tags, including:
- table inside section
- nested divs
- code inside paragraph

All existing tests pass (make lint, format, test, etc).

Breaking changes
None.

Fixes
Fixes #31569

Disclaimer
GitHub Copilot was used to assist with test case design in
test_text_splitters.py and documentation comments; all code logic was
manually implemented and reviewed.

---------

Co-authored-by: julih <julih@julihs-MacBook-Pro.local>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2026-01-05 10:28:27 -05:00
Mohan Kumar S
13cfdf1676 fix(core): exclude injected args from tool schema (#34582) 2026-01-05 09:59:59 -05:00