Commit Graph

13863 Commits

Author SHA1 Message Date
Eugene Yurtsev
52daddf076 x 2025-07-24 11:05:04 -04:00
Eugene Yurtsev
37d801fb48 x 2025-07-24 11:02:52 -04:00
Eugene Yurtsev
7995c719c5
chore(langchain_v1): clean anything uncertain (#32228)
Further clean up of namespace:

- Removed prompts (we'll re-add in a separate commit)
- Remove LocalFileStore until we can review whether all the
implementation details are necessary
- Remove message processing logic from memory (we'll figure out where to
expose it)
- Remove `Tool` primitive (should be sufficient to use `BaseTool` for
typing purposes)
- Remove utilities to create kv stores. Unclear if they've had much
usage outside MultiparentRetriever
2025-07-24 14:41:05 +00:00
Mason Daugherty
bdf1cd383c
fix(langchain): update deps 2025-07-24 10:37:08 -04:00
Mason Daugherty
77c981999e
fix(text-splitters): update langchain-core version to 0.3.72 2025-07-24 10:35:07 -04:00
Mason Daugherty
7f015b6f14
fix(text-splitters): update lock for release 2025-07-24 10:32:04 -04:00
Mason Daugherty
71ad451e1f
Merge branch 'master' of github.com:langchain-ai/langchain 2025-07-24 10:24:17 -04:00
Mason Daugherty
2c42893703
fix(langchain): update langchain-core version to 0.3.72 2025-07-24 10:24:04 -04:00
Mason Daugherty
0e139fb9a6
release(langchain): 0.3.27 (#32227) 2025-07-24 10:20:20 -04:00
tanwirahmad
622bb05751
fix(langchain): class HTMLSemanticPreservingSplitter ignores the text inside the div tag (#32213)
**Description:** We collect the text from the "html", "body", "div", and
"main" nodes, if they have any.

**Issue:** Fixes #32206.
2025-07-24 10:09:03 -04:00
Eugene Yurtsev
56dde3ade3
feat(langchain): v1 scaffolding (#32166)
This PR adds scaffolding for langchain 1.0 entry package.

Most contents have been removed. 

Currently remaining entrypoints for:

* chat models
* embedding models
* memory -> trimming messages, filtering messages and counting tokens
[we may remove this]
* prompts -> we may remove some prompts
* storage: primarily to support cache backed embeddings, may remove the
kv store
* tools -> report tool primitives

Things to be added:

* Selected agent implementations
* Selected workflows
* Common primitives: messages, Document
* Primitives for type hinting: BaseChatModel, BaseEmbeddings
* Selected retrievers
* Selected text splitters

Things to be removed:

* Globals needs to be removed (needs an update in langchain core)


Todos: 

* TBD indexing api (requires sqlalchemy which we don't want as a
dependency)
* Be explicit about public/private interfaces (e.g., likely rename
chat_models.base.py to something more internal)
* Remove dockerfiles
* Update module doc-strings and README.md
2025-07-24 09:47:48 -04:00
Mason Daugherty
bd3d6496f3
release(core): 0.3.72 (#32214)
fixes #32170
2025-07-23 20:33:48 -04:00
jmaillefaud
fb5da8384e
fix(core): Dereference Refs for pydantic schema fails in tool schema generation (#32203)
The `_dereference_refs_helper` in `langchain_core.utils.json_schema`
incorrectly handled objects with a reference and other fields.

**Issue**: #32170

# Description

We change the check so that it accepts other keys in the object.
2025-07-23 20:28:27 -04:00
Maxime Grenu
a7d0e42f3f
docs: fix typos in documentation (#32201)
## Summary
- Fixed redundant word "done" in SECURITY.md line 69  
- Fixed grammar errors in Fireworks README.md line 77: "how it fares
compares" → "how it compares" and "in terms just" → "in terms of"

## Test plan
- [x] Verified changes improve readability and correct grammar
- [x] No functional changes, documentation only

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-07-23 10:43:25 -04:00
Christophe Bornet
3496e1739e
feat(langchain): add ruff rules PL (#32079)
See https://docs.astral.sh/ruff/rules/#pylint-pl
2025-07-22 23:55:32 -04:00
Jacob Lee
0f39155f62
docs: Specify environment variables for BedrockConverse (#32194) 2025-07-22 17:37:47 -04:00
ccurme
6aeda24a07
docs(chroma): update feature table (#32193)
Supports multi-tenancy.
2025-07-22 20:55:07 +00:00
Mason Daugherty
3ed804a5f3
fix(perplexity): undo xfails (#32192) 2025-07-22 16:29:37 -04:00
Mason Daugherty
ca137bfe62
. 2025-07-22 16:25:02 -04:00
Mason Daugherty
fa487fb62d
fix(perplexity): temp xfail int tests (#32191)
It appears the API has changes since the 2025-04-15 release, leading to
failed integration tests.
2025-07-22 16:20:51 -04:00
ccurme
053fb16a05
revert: drop anthropic from core test matrix (#32190)
Reverts langchain-ai/langchain#32185
2025-07-22 20:13:02 +00:00
ccurme
3672bbc71e
fix(anthropic): update integration test models (#32189)
Multiple models were
[retired](https://docs.anthropic.com/en/docs/about-claude/model-deprecations#model-status)
yesterday.

Tests remain broken until we figure out what to do with the legacy
Anthropic LLM integration— currently uses their (legacy) text
completions API, for which there appear to be no remaining supported
models.
2025-07-22 19:51:39 +00:00
Mason Daugherty
a02ad3d192
docs: formatting cleanup (#32188)
* formatting cleaning
* make `init_chat_model` more prominent in list of guides
2025-07-22 15:46:15 -04:00
ccurme
0c4054a7fc
release(core): 0.3.71 (#32186) 2025-07-22 15:44:36 -04:00
ccurme
75517c3ea9
chore(infra): drop anthropic from core test matrix (#32185) 2025-07-22 19:38:58 +00:00
ccurme
ebf2e11bcb
fix(core): exclude api_key from tracing metadata (#32184)
(standard param)
2025-07-22 15:32:12 -04:00
ccurme
e41e6ec6aa
release(chroma): 0.2.5 (#32183) 2025-07-22 15:24:03 -04:00
itaismith
09769373b3
feat(chroma): Add Chroma Cloud support (#32125)
* Adding support for more Chroma client options (`HttpClient` and
`CloundClient`). This includes adding arguments necessary for
instantiating these clients.
* Adding support for Chroma's new persisted collection configuration (we
moved index configuration into this new construct).
* Delegate `Settings` configuration to Chroma's client constructors.
2025-07-22 15:14:15 -04:00
ccurme
3fc27e7a95
docs: update feature table for Chroma (#32182) 2025-07-22 18:21:17 +00:00
ccurme
8acfd677bc
fix(core): add type key when tracing in some cases (#31825) 2025-07-22 18:08:16 +00:00
Mason Daugherty
af3789b9ed
fix(deepseek): release openai version (#32181)
used sdk version instead of langchain by accident
2025-07-22 13:29:52 -04:00
Mason Daugherty
a6896794ca
release(ollama): 0.3.6 (#32180) 2025-07-22 13:24:17 -04:00
Copilot
d40fd5a3ce
feat(ollama): warn on empty load responses (#32161)
## Problem

When using `ChatOllama` with `create_react_agent`, agents would
sometimes terminate prematurely with empty responses when Ollama
returned `done_reason: 'load'` responses with no content. This caused
agents to return empty `AIMessage` objects instead of actual generated
text.

```python
from langchain_ollama import ChatOllama
from langgraph.prebuilt import create_react_agent
from langchain_core.messages import HumanMessage

llm = ChatOllama(model='qwen2.5:7b', temperature=0)
agent = create_react_agent(model=llm, tools=[])

result = agent.invoke(HumanMessage('Hello'), {"configurable": {"thread_id": "1"}})
# Before fix: AIMessage(content='', response_metadata={'done_reason': 'load'})
# Expected: AIMessage with actual generated content
```

## Root Cause

The `_iterate_over_stream` and `_aiterate_over_stream` methods treated
any response with `done: True` as final, regardless of `done_reason`.
When Ollama returns `done_reason: 'load'` with empty content, it
indicates the model was loaded but no actual generation occurred - this
should not be considered a complete response.

## Solution

Modified the streaming logic to skip responses when:
- `done: True`
- `done_reason: 'load'` 
- Content is empty or contains only whitespace

This ensures agents only receive actual generated content while
preserving backward compatibility for load responses that do contain
content.

## Changes

- **`_iterate_over_stream`**: Skip empty load responses instead of
yielding them
- **`_aiterate_over_stream`**: Apply same fix to async streaming
- **Tests**: Added comprehensive test cases covering all edge cases

## Testing

All scenarios now work correctly:
-  Empty load responses are skipped (fixes original issue)
-  Load responses with actual content are preserved (backward
compatibility)
-  Normal stop responses work unchanged
-  Streaming behavior preserved
-  `create_react_agent` integration fixed

Fixes #31482.

<!-- START COPILOT CODING AGENT TIPS -->
---

💡 You can make Copilot smarter by setting up custom instructions,
customizing its development environment and configuring Model Context
Protocol (MCP) servers. Learn more [Copilot coding agent
tips](https://gh.io/copilot-coding-agent-tips) in the docs.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
2025-07-22 13:21:11 -04:00
Mason Daugherty
116b758498
fix: bump deps for release (#32179)
forgot to bump the `pyproject.toml` files
2025-07-22 13:12:14 -04:00
Mason Daugherty
10996a2821
release(perplexity): 0.1.2 (#32176) 2025-07-22 13:02:19 -04:00
Mason Daugherty
2aed07efb6
release(deepseek): 0.1.4 (#32178) 2025-07-22 13:01:54 -04:00
Mason Daugherty
64dac1faf7
release(huggingface): 0.3.1 (#32177) 2025-07-22 13:01:34 -04:00
Mason Daugherty
58768d8aef
release(xai): 0.2.5 (#32174) 2025-07-22 13:01:26 -04:00
Mason Daugherty
d65da13299
docs(ollama): add validate_model_on_init note, bump lock (#32172) 2025-07-22 10:58:45 -04:00
Kanav Bansal
c14bd1fcfe
fix(docs): update RAG tutorials link to point to correct path (#32169)
## **Description:** 
This PR updates the internal documentation link for the RAG tutorials to
reflect the updated path. Previously, the link pointed to the root
`/docs/tutorials/`, which was generic. It now correctly routes to the
RAG-specific tutorial page for the following text-embedding models.

1. DatabricksEmbeddings
2. IBM watsonx.ai
3. OpenAIEmbeddings
4. NomicEmbeddings
5. CohereEmbeddings
6. MistralAIEmbeddings
7. FireworksEmbeddings
8. TogetherEmbeddings
9. LindormAIEmbeddings
10. ModelScopeEmbeddings
11. ClovaXEmbeddings
12. NetmindEmbeddings
13. SambaNovaCloudEmbeddings
14. SambaStudioEmbeddings
15. ZhipuAIEmbeddings

## **Issue:** N/A
## **Dependencies:** None
## **Twitter handle:** N/A
2025-07-22 10:24:50 -04:00
Byeongjin Kang
a1ccabf85d
docs: add documentation about how to use extended thinking with ChatBedrockConverse (#32168) 2025-07-22 08:44:08 -04:00
Copilot
2104cf0d9a
fix: replace deprecated Pydantic .schema() calls with v1/v2 compatible pattern (#32162)
This PR addresses deprecation warnings users encounter when using
LangChain tools with Pydantic v2:

```
PydanticDeprecatedSince20: The `schema` method is deprecated; use `model_json_schema` instead. 
Deprecated in Pydantic V2.0 to be removed in V3.0.
```

## Root Cause

Several LangChain components were still using the deprecated `.schema()`
method directly instead of the Pydantic v1/v2 compatible approach. While
users calling `.schema()` on returned models will still see warnings
(which is correct), LangChain's internal code should not generate these
warnings.

## Changes Made

Updated 3 files to use the standard compatibility pattern:

```python
# Before (deprecated)
schema = model.schema()

# After (compatible with both v1 and v2) 
if hasattr(model, "model_json_schema"):
    schema = model.model_json_schema()  # Pydantic v2
else:
    schema = model.schema()  # Pydantic v1
```

### Files Updated:
- **`evaluation/parsing/json_schema.py`**: Fixed `_parse_json()` method
to handle Pydantic models correctly
- **`output_parsers/yaml.py`**: Fixed `get_format_instructions()` to use
compatible schema access
- **`chains/openai_functions/citation_fuzzy_match.py`**: Fixed direct
`.schema()` call on QuestionAnswer model

## Verification

 **Zero breaking changes** - all existing functionality preserved  
 **No deprecation warnings** from LangChain internal code  
 **Backward compatible** with Pydantic v1  
 **Forward compatible** with Pydantic v2  
 **Edge cases handled** (strings, plain objects, etc.)

## User Impact

LangChain users will no longer see deprecation warnings from internal
LangChain code. Users who directly call `.schema()` on schemas returned
by LangChain should adopt the same compatibility pattern:

```python
# User code should use this pattern
input_schema = tool.get_input_schema()
if hasattr(input_schema, "model_json_schema"):
    schema_result = input_schema.model_json_schema()
else:
    schema_result = input_schema.schema()
```

Fixes #31458.

<!-- START COPILOT CODING AGENT TIPS -->
---

💬 Share your feedback on Copilot coding agent for the chance to win a
$200 gift card! Click
[here](https://survey.alchemer.com/s3/8343779/Copilot-Coding-agent) to
start the survey.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
2025-07-21 21:19:53 -04:00
Copilot
18c64aed6d
feat(core): add sanitize_for_postgres utility to fix PostgreSQL NUL byte DataError (#32157)
This PR fixes the PostgreSQL NUL byte issue that causes
`psycopg.DataError` when inserting documents containing `\x00` bytes
into PostgreSQL-based vector stores.

## Problem

PostgreSQL text fields cannot contain NUL (0x00) bytes. When documents
with such characters are processed by PGVector or langchain-postgres
implementations, they fail with:

```
(psycopg.DataError) PostgreSQL text fields cannot contain NUL (0x00) bytes
```

This commonly occurs when processing PDFs, documents from various
loaders, or text extracted by libraries like unstructured that may
contain embedded NUL bytes.

## Solution

Added `sanitize_for_postgres()` utility function to
`langchain_core.utils.strings` that removes or replaces NUL bytes from
text content.

### Key Features

- **Simple API**: `sanitize_for_postgres(text, replacement="")`
- **Configurable**: Replace NUL bytes with empty string (default) or
space for readability
- **Comprehensive**: Handles all problematic examples from the original
issue
- **Well-tested**: Complete unit tests with real-world examples
- **Backward compatible**: No breaking changes, purely additive

### Usage Example

```python
from langchain_core.utils import sanitize_for_postgres
from langchain_core.documents import Document

# Before: This would fail with DataError
problematic_content = "Getting\x00Started with embeddings"

# After: Clean the content before database insertion
clean_content = sanitize_for_postgres(problematic_content)
# Result: "GettingStarted with embeddings"

# Or preserve readability with spaces
readable_content = sanitize_for_postgres(problematic_content, " ")
# Result: "Getting Started with embeddings"

# Use in Document processing
doc = Document(page_content=clean_content, metadata={...})
```

### Integration Pattern

PostgreSQL vector store implementations should sanitize content before
insertion:

```python
def add_documents(self, documents: List[Document]) -> List[str]:
    # Sanitize documents before insertion
    sanitized_docs = []
    for doc in documents:
        sanitized_content = sanitize_for_postgres(doc.page_content, " ")
        sanitized_doc = Document(
            page_content=sanitized_content,
            metadata=doc.metadata,
            id=doc.id
        )
        sanitized_docs.append(sanitized_doc)
    
    return self._insert_documents_to_db(sanitized_docs)
```

## Changes Made

- Added `sanitize_for_postgres()` function in
`langchain_core/utils/strings.py`
- Updated `langchain_core/utils/__init__.py` to export the new function
- Added comprehensive unit tests in
`tests/unit_tests/utils/test_strings.py`
- Validated against all examples from the original issue report

## Testing

All tests pass, including:
- Basic NUL byte removal and replacement
- Multiple consecutive NUL bytes
- Empty string handling
- Real examples from the GitHub issue
- Backward compatibility with existing string utilities

This utility enables PostgreSQL integrations in both langchain-community
and langchain-postgres packages to handle documents with NUL bytes
reliably.

Fixes #26033.

<!-- START COPILOT CODING AGENT TIPS -->
---

💬 Share your feedback on Copilot coding agent for the chance to win a
$200 gift card! Click
[here](https://survey.alchemer.com/s3/8343779/Copilot-Coding-agent) to
start the survey.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
2025-07-21 20:33:20 -04:00
Copilot
fc802d8f9f
docs: fix vectorstore feature table - correct "IDs in add Documents" values (#32153)
The vectorstore feature table in the documentation was showing incorrect
information for the "IDs in add Documents" capability. Most vectorstores
were marked as  (not supported) when they actually support extracting
IDs from documents.

## Problem

The issue was an inconsistency between two sources of truth:
- **JavaScript feature table** (`docs/src/theme/FeatureTables.js`):
Hardcoded `idsInAddDocuments: false` for most vectorstores
- **Python script** (`docs/scripts/vectorstore_feat_table.py`):
Correctly showed `"IDs in add Documents": True` for most vectorstores

## Root Cause

All vectorstores inherit the base `VectorStore.add_documents()` method
which automatically extracts document IDs:

```python
# From libs/core/langchain_core/vectorstores/base.py lines 277-284
if "ids" not in kwargs:
    ids = [doc.id for doc in documents]
    
    # If there's at least one valid ID, we'll assume that IDs should be used.
    if any(ids):
        kwargs["ids"] = ids
```

Since no vectorstores override `add_documents()`, they all inherit this
behavior and support IDs in documents.

## Solution

Updated `idsInAddDocuments` from `false` to `true` for 13 vectorstores:
- AstraDBVectorStore, Chroma, Clickhouse, DatabricksVectorSearch
- ElasticsearchStore, FAISS, InMemoryVectorStore,
MongoDBAtlasVectorSearch
- PGVector, PineconeVectorStore, Redis, Weaviate, SQLServer

The other 4 vectorstores (CouchbaseSearchVectorStore, Milvus, openGauss,
QdrantVectorStore) were already correctly marked as `true`.

## Impact

Users visiting
https://python.langchain.com/docs/integrations/vectorstores/ will now
see accurate information. The "IDs in add Documents" column will
correctly show  for all vectorstores instead of incorrectly showing 
for most of them.

This aligns with the API documentation which states: "if kwargs contains
ids and documents contain ids, the ids in the kwargs will receive
precedence" - clearly indicating that document IDs are supported.

Fixes #30622.

<!-- START COPILOT CODING AGENT TIPS -->
---

💬 Share your feedback on Copilot coding agent for the chance to win a
$200 gift card! Click
[here](https://survey.alchemer.com/s3/8343779/Copilot-Coding-agent) to
start the survey.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com>
2025-07-21 20:29:34 -04:00
Mason Daugherty
b4d87c709c
chore: update copilot-instructions.md (#32159) 2025-07-21 20:17:41 -04:00
ccurme
383bc8f2ef
revert: drop anthropic from core test matrix (#32152)
Reverts langchain-ai/langchain#32146
2025-07-21 20:15:27 +00:00
Christophe Bornet
64261449b8
feat(langchain): add ruff rules TRY (#32047)
See https://docs.astral.sh/ruff/rules/#tryceratops-try

* TRY004 (replace by TypeError) in main code is escaped with `noqa` to
not break backward compatibility. The rule is still interesting for new
code.
* TRY301 ignored at the moment. This one is quite hard to fix and I'm
not sure it's very interesting to activate it.

Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-07-21 13:41:20 -04:00
Christophe Bornet
8b8d90bea5
feat(langchain): add ruff rules PT (#32010)
See https://docs.astral.sh/ruff/rules/#flake8-pytest-style-pt
2025-07-21 13:15:05 -04:00
Mohammad Mohtashim
095f4a7c28
fix(core): fix parse_resultin case of self.first_tool_only with multiple keys matching for JsonOutputKeyToolsParser (#32106)
* **Description:** Updated `parse_result` logic to handle cases where
`self.first_tool_only` is `True` and multiple matching keys share the
same function name. Instead of returning the first match prematurely,
the method now prioritizes filtering results by the specified key to
ensure correct selection.
* **Issue:** #32100

---------

Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-07-21 12:50:22 -04:00
Mason Daugherty
ddaba21e83
chore: copilot instructions (#32075)
https://docs.github.com/en/copilot/how-tos/custom-instructions/adding-repository-custom-instructions-for-github-copilot
2025-07-21 12:50:07 -04:00