**TL;DR much of the provided `Makefile` targets were broken, and any
time I wanted to preview changes locally I either had to refer to a
command Chester gave me or try waiting on a Vercel preview deployment.
With this PR, everything should behave like normal.**
Significant updates to the `Makefile` and documentation files, focusing
on improving usability, adding clear messaging, and fixing/enhancing
documentation workflows.
### Updates to `Makefile`:
#### Enhanced build and cleaning processes:
- Added informative messages (e.g., "📚 Building LangChain
documentation...") to makefile targets like `docs_build`, `docs_clean`,
and `api_docs_build` for better user feedback during execution.
- Introduced a `clean-cache` target to the `docs` `Makefile` to clear
cached dependencies and ensure clean builds.
#### Improved dependency handling:
- Modified `install-py-deps` to create a `.venv/deps_installed` marker,
preventing redundant/duplicate dependency installations and improving
efficiency.
#### Streamlined file generation and infrastructure setup:
- Added caching for the LangServe README download and parallelized
feature table generation
- Added user-friendly completion messages for targets like `copy-infra`
and `render`.
#### Documentation server updates:
- Enhanced the `start` target with messages indicating server start and
URL for local documentation viewing.
---
### Documentation Improvements:
#### Content clarity and consistency:
- Standardized section titles for consistency across documentation
files.
[[1]](diffhunk://#diff-9b1a85ea8a9dcf79f58246c88692cd7a36316665d7e05a69141cfdc50794c82aL1-R1)
[[2]](diffhunk://#diff-944008ad3a79d8a312183618401fcfa71da0e69c75803eff09b779fc8e03183dL1-R1)
- Refined phrasing and formatting in sections like "Dependency
management" and "Formatting and linting" for better readability.
[[1]](diffhunk://#diff-2069d4f956ab606ae6d51b191439283798adaf3a6648542c409d258131617059L6-R6)
[[2]](diffhunk://#diff-2069d4f956ab606ae6d51b191439283798adaf3a6648542c409d258131617059L84-R82)
#### Enhanced workflows:
- Updated instructions for building and viewing documentation locally,
including tips for specifying server ports and handling API reference
previews.
[[1]](diffhunk://#diff-048deddcfd44b242e5b23aed9f2e9ec73afc672244ce14df2a0a316d95840c87L60-R94)
[[2]](diffhunk://#diff-048deddcfd44b242e5b23aed9f2e9ec73afc672244ce14df2a0a316d95840c87L82-R126)
- Expanded guidance on cleaning documentation artifacts and using
linting tools effectively.
[[1]](diffhunk://#diff-048deddcfd44b242e5b23aed9f2e9ec73afc672244ce14df2a0a316d95840c87L82-R126)
[[2]](diffhunk://#diff-048deddcfd44b242e5b23aed9f2e9ec73afc672244ce14df2a0a316d95840c87L107-R142)
#### API reference documentation:
- Improved instructions for generating and formatting in-code
documentation, highlighting best practices for docstring writing.
[[1]](diffhunk://#diff-048deddcfd44b242e5b23aed9f2e9ec73afc672244ce14df2a0a316d95840c87L107-R142)
[[2]](diffhunk://#diff-048deddcfd44b242e5b23aed9f2e9ec73afc672244ce14df2a0a316d95840c87L144-R186)
---
### Minor Changes:
- Added support for a new package name (`langchain_v1`) in the API
documentation generation script.
- Fixed minor capitalization and formatting issues in documentation
files.
[[1]](diffhunk://#diff-2069d4f956ab606ae6d51b191439283798adaf3a6648542c409d258131617059L40-R40)
[[2]](diffhunk://#diff-2069d4f956ab606ae6d51b191439283798adaf3a6648542c409d258131617059L166-R160)
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
The `_dereference_refs_helper` in `langchain_core.utils.json_schema`
incorrectly handled objects with a reference and other fields.
**Issue**: #32170
# Description
We change the check so that it accepts other keys in the object.
This PR fixes the PostgreSQL NUL byte issue that causes
`psycopg.DataError` when inserting documents containing `\x00` bytes
into PostgreSQL-based vector stores.
## Problem
PostgreSQL text fields cannot contain NUL (0x00) bytes. When documents
with such characters are processed by PGVector or langchain-postgres
implementations, they fail with:
```
(psycopg.DataError) PostgreSQL text fields cannot contain NUL (0x00) bytes
```
This commonly occurs when processing PDFs, documents from various
loaders, or text extracted by libraries like unstructured that may
contain embedded NUL bytes.
## Solution
Added `sanitize_for_postgres()` utility function to
`langchain_core.utils.strings` that removes or replaces NUL bytes from
text content.
### Key Features
- **Simple API**: `sanitize_for_postgres(text, replacement="")`
- **Configurable**: Replace NUL bytes with empty string (default) or
space for readability
- **Comprehensive**: Handles all problematic examples from the original
issue
- **Well-tested**: Complete unit tests with real-world examples
- **Backward compatible**: No breaking changes, purely additive
### Usage Example
```python
from langchain_core.utils import sanitize_for_postgres
from langchain_core.documents import Document
# Before: This would fail with DataError
problematic_content = "Getting\x00Started with embeddings"
# After: Clean the content before database insertion
clean_content = sanitize_for_postgres(problematic_content)
# Result: "GettingStarted with embeddings"
# Or preserve readability with spaces
readable_content = sanitize_for_postgres(problematic_content, " ")
# Result: "Getting Started with embeddings"
# Use in Document processing
doc = Document(page_content=clean_content, metadata={...})
```
### Integration Pattern
PostgreSQL vector store implementations should sanitize content before
insertion:
```python
def add_documents(self, documents: List[Document]) -> List[str]:
# Sanitize documents before insertion
sanitized_docs = []
for doc in documents:
sanitized_content = sanitize_for_postgres(doc.page_content, " ")
sanitized_doc = Document(
page_content=sanitized_content,
metadata=doc.metadata,
id=doc.id
)
sanitized_docs.append(sanitized_doc)
return self._insert_documents_to_db(sanitized_docs)
```
## Changes Made
- Added `sanitize_for_postgres()` function in
`langchain_core/utils/strings.py`
- Updated `langchain_core/utils/__init__.py` to export the new function
- Added comprehensive unit tests in
`tests/unit_tests/utils/test_strings.py`
- Validated against all examples from the original issue report
## Testing
All tests pass, including:
- Basic NUL byte removal and replacement
- Multiple consecutive NUL bytes
- Empty string handling
- Real examples from the GitHub issue
- Backward compatibility with existing string utilities
This utility enables PostgreSQL integrations in both langchain-community
and langchain-postgres packages to handle documents with NUL bytes
reliably.
Fixes#26033.
<!-- START COPILOT CODING AGENT TIPS -->
---
💬 Share your feedback on Copilot coding agent for the chance to win a
$200 gift card! Click
[here](https://survey.alchemer.com/s3/8343779/Copilot-Coding-agent) to
start the survey.
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Previously, we hit an index out of range error with empty variable names
(accessing tag[0]), now we through a slightly nicer error
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Fixes#32042
## Summary
Fixes a critical bug in JSON Schema reference resolution that prevented
correctly dereferencing numeric components in JSON pointer paths,
specifically for list indices in `anyOf`, `oneOf`, and `allOf` arrays.
## Changes
- Fixed `_retrieve_ref` function in
`libs/core/langchain_core/utils/json_schema.py` to properly handle
numeric components
- Added comprehensive test function `test_dereference_refs_list_index()`
in `libs/core/tests/unit_tests/utils/test_json_schema.py`
- Resolved line length formatting issues
- Improved type checking and index validation for list and dictionary
references
## Key Improvements
- Correctly handles list index references in JSON pointer paths
- Maintains backward compatibility with existing dictionary numeric key
functionality
- Adds robust error handling for out-of-bounds and invalid indices
- Passes all test cases covering various reference scenarios
## Test Coverage
- Verified fix for `#/properties/payload/anyOf/1/properties/startDate`
reference
- Tested edge cases including out-of-bounds and negative indices
- Ensured no regression in existing reference resolution functionality
Resolves the reported issue with JSON Schema reference dereferencing for
list indices.
---------
Co-authored-by: open-swe-dev[bot] <open-swe-dev@users.noreply.github.com>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
* Simplified Pydantic handling since Pydantic v1 is not supported
anymore.
* Replace use of deprecated v1 methods by corresponding v2 methods.
* Remove use of other deprecated methods.
* Activate mypy errors on deprecated methods use.
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Add image generation tool to the list of well known tools. This is needed for changes in the ChatOpenAI client.
TODO: Some of this logic needs to be moved from core directly into the client as changes in core should not be required to add a new tool to the openai chat client.
**Issue:**[
#309070](https://github.com/langchain-ai/langchain/issues/30970)
**Cause**
Arg type in python code
```
arg: Union[SubSchema1, SubSchema2]
```
is translated to `anyOf` in **json schema**
```
"anyOf" : [{sub schema 1 ...}, {sub schema 1 ...}]
```
The value of anyOf is a list sub schemas.
The bug is caused since the sub schemas inside `anyOf` list is not taken
care of.
The location where the issue happens is `convert_to_openai_function`
function -> `_recursive_set_additional_properties_false` function, that
recursively adds `"additionalProperties": false` to json schema which is
[required by OpenAI's strict function
calling](https://platform.openai.com/docs/guides/structured-outputs?api-mode=responses#additionalproperties-false-must-always-be-set-in-objects)
**Solution:**
This PR fixes this issue by iterating each sub schema inside `anyOf`
list.
A unit test is added.
**Twitter handle:** shengboma
If no one reviews your PR within a few days, please @-mention one of
baskaryan, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
Most easily reviewed with the "hide whitespace" option toggled.
Seeing 10-50% speed ups in import time for common structures 🚀
The general purpose of this PR is to lazily import structures within
`langchain_core.XXX_module.__init__.py` so that we're not eagerly
importing expensive dependencies (`pydantic`, `requests`, etc).
Analysis of flamegraphs generated with `importtime` motivated these
changes. For example, the one below demonstrates that importing
`HumanMessage` accidentally triggered imports for `importlib.metadata`,
`requests`, etc.
There's still much more to do on this front, and we can start digging
into our own internal code for optimizations now that we're less
concerned about external imports.
<img width="1210" alt="Screenshot 2025-04-11 at 1 10 54 PM"
src="https://github.com/user-attachments/assets/112a3fe7-24a9-4294-92c1-d5ae64df839e"
/>
I've tracked the improvements with some local benchmarks:
## `pytest-benchmark` results
| Name | Before (s) | After (s) | Delta (s) | % Change |
|-----------------------------|------------|-----------|-----------|----------|
| Document | 2.8683 | 1.2775 | -1.5908 | -55.46% |
| HumanMessage | 2.2358 | 1.1673 | -1.0685 | -47.79% |
| ChatPromptTemplate | 5.5235 | 2.9709 | -2.5526 | -46.22% |
| Runnable | 2.9423 | 1.7793 | -1.163 | -39.53% |
| InMemoryVectorStore | 3.1180 | 1.8417 | -1.2763 | -40.93% |
| RunnableLambda | 2.7385 | 1.8745 | -0.864 | -31.55% |
| tool | 5.1231 | 4.0771 | -1.046 | -20.42% |
| CallbackManager | 4.2263 | 3.4099 | -0.8164 | -19.32% |
| LangChainTracer | 3.8394 | 3.3101 | -0.5293 | -13.79% |
| BaseChatModel | 4.3317 | 3.8806 | -0.4511 | -10.41% |
| PydanticOutputParser | 3.2036 | 3.2995 | 0.0959 | 2.99% |
| InMemoryRateLimiter | 0.5311 | 0.5995 | 0.0684 | 12.88% |
Note the lack of change for `InMemoryRateLimiter` and
`PydanticOutputParser` is just random noise, I'm getting comparable
numbers locally.
## Local CodSpeed results
We're still working on configuring CodSpeed on CI. The local usage
produced similar results.
Looks like `pyupgrade` was already used here but missed some docs and
tests.
This helps to keep our docs looking professional and up to date.
Eventually, we should lint / format our inline docs.
This pull request includes various changes to the `langchain_core`
library, focusing on improving compatibility with different versions of
Pydantic. The primary change involves replacing checks for Pydantic
major versions with boolean flags, which simplifies the code and
improves readability.
This also solves ruff rule checks for
[RUF048](https://docs.astral.sh/ruff/rules/map-int-version-parsing/) and
[PLR2004](https://docs.astral.sh/ruff/rules/magic-value-comparison/).
Key changes include:
### Compatibility Improvements:
*
[`libs/core/langchain_core/output_parsers/json.py`](diffhunk://#diff-5add0cf7134636ae4198a1e0df49ee332ae0c9123c3a2395101e02687c717646L22-R24):
Replaced `PYDANTIC_MAJOR_VERSION` with `IS_PYDANTIC_V1` to check for
Pydantic version 1.
*
[`libs/core/langchain_core/output_parsers/pydantic.py`](diffhunk://#diff-2364b5b4aee01c462aa5dbda5dc3a877dcd20f29df173ad540dc8adf8b192361L14-R14):
Updated version checks from `PYDANTIC_MAJOR_VERSION` to `IS_PYDANTIC_V2`
in the `PydanticOutputParser` class.
[[1]](diffhunk://#diff-2364b5b4aee01c462aa5dbda5dc3a877dcd20f29df173ad540dc8adf8b192361L14-R14)
[[2]](diffhunk://#diff-2364b5b4aee01c462aa5dbda5dc3a877dcd20f29df173ad540dc8adf8b192361L27-R27)
### Utility Enhancements:
*
[`libs/core/langchain_core/utils/pydantic.py`](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896R23):
Introduced `IS_PYDANTIC_V1` and `IS_PYDANTIC_V2` flags and deprecated
the `get_pydantic_major_version` function. Updated various functions to
use these flags instead of version numbers.
[[1]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896R23)
[[2]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896R42-R78)
[[3]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896L90-R89)
[[4]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896L104-R101)
[[5]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896L120-R122)
[[6]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896L135-R132)
[[7]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896L149-R151)
[[8]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896L164-R161)
[[9]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896L248-R250)
[[10]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896L330-R335)
[[11]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896L356-R357)
[[12]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896L393-R390)
[[13]](diffhunk://#diff-ff28020c5f1073a8b63bcd9d8b756a187fd682cb81935295120c63b207071896L403-R400)
### Test Updates:
*
[`libs/core/tests/unit_tests/output_parsers/test_openai_tools.py`](diffhunk://#diff-694cc0318edbd6bbca34f53304934062ad59ba9f5a788252ce6c5f5452489d67L19-R22):
Updated tests to use `IS_PYDANTIC_V1` and `IS_PYDANTIC_V2` for version
checks.
[[1]](diffhunk://#diff-694cc0318edbd6bbca34f53304934062ad59ba9f5a788252ce6c5f5452489d67L19-R22)
[[2]](diffhunk://#diff-694cc0318edbd6bbca34f53304934062ad59ba9f5a788252ce6c5f5452489d67L532-R535)
[[3]](diffhunk://#diff-694cc0318edbd6bbca34f53304934062ad59ba9f5a788252ce6c5f5452489d67L567-R570)
[[4]](diffhunk://#diff-694cc0318edbd6bbca34f53304934062ad59ba9f5a788252ce6c5f5452489d67L602-R605)
*
[`libs/core/tests/unit_tests/prompts/test_chat.py`](diffhunk://#diff-3e60e744842086a4f3c4b21bc83e819c3435720eab210078e77e2430fb8c7e84R7):
Replaced version tuple checks with `PYDANTIC_VERSION` comparisons.
[[1]](diffhunk://#diff-3e60e744842086a4f3c4b21bc83e819c3435720eab210078e77e2430fb8c7e84R7)
[[2]](diffhunk://#diff-3e60e744842086a4f3c4b21bc83e819c3435720eab210078e77e2430fb8c7e84L35-R38)
[[3]](diffhunk://#diff-3e60e744842086a4f3c4b21bc83e819c3435720eab210078e77e2430fb8c7e84L924-R927)
[[4]](diffhunk://#diff-3e60e744842086a4f3c4b21bc83e819c3435720eab210078e77e2430fb8c7e84L935-R938)
*
[`libs/core/tests/unit_tests/runnables/test_graph.py`](diffhunk://#diff-99a290330ef40103d0ce02e52e21310d6fadea142bfdea13c94d23fc81c0bb5dR3):
Simplified version checks using `PYDANTIC_VERSION`.
[[1]](diffhunk://#diff-99a290330ef40103d0ce02e52e21310d6fadea142bfdea13c94d23fc81c0bb5dR3)
[[2]](diffhunk://#diff-99a290330ef40103d0ce02e52e21310d6fadea142bfdea13c94d23fc81c0bb5dL15-R18)
[[3]](diffhunk://#diff-99a290330ef40103d0ce02e52e21310d6fadea142bfdea13c94d23fc81c0bb5dL234-L239)
*
[`libs/core/tests/unit_tests/runnables/test_runnable.py`](diffhunk://#diff-06bed920c0dad0cfd41d57a8d9e47a7b56832409649c10151061a791860d5bb5L18-R20):
Introduced `PYDANTIC_VERSION_AT_LEAST_29` and
`PYDANTIC_VERSION_AT_LEAST_210` for more readable version checks.
[[1]](diffhunk://#diff-06bed920c0dad0cfd41d57a8d9e47a7b56832409649c10151061a791860d5bb5L18-R20)
[[2]](diffhunk://#diff-06bed920c0dad0cfd41d57a8d9e47a7b56832409649c10151061a791860d5bb5L92-R99)
[[3]](diffhunk://#diff-06bed920c0dad0cfd41d57a8d9e47a7b56832409649c10151061a791860d5bb5L230-R233)
[[4]](diffhunk://#diff-06bed920c0dad0cfd41d57a8d9e47a7b56832409649c10151061a791860d5bb5L652-R655)
See https://docs.astral.sh/ruff/rules/#flake8-annotations-ann
The interest compared to only mypy is that ruff is very fast at
detecting missing annotations.
ANN101 and ANN102 are deprecated so we ignore them
ANN401 (no Any type) ignored to be in sync with mypy config
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
**Description**
Currently, when parsing a partial JSON, if a string ends with the escape
character, the whole key/value is removed. For example:
```
>>> from langchain_core.utils.json import parse_partial_json
>>> my_str = '{"foo": "bar", "baz": "qux\\'
>>>
>>> parse_partial_json(my_str)
{'foo': 'bar'}
```
My expectation (and with this fix) would be for `parse_partial_json()`
to return:
```
>>> from langchain_core.utils.json import parse_partial_json
>>>
>>> my_str = '{"foo": "bar", "baz": "qux\\'
>>> parse_partial_json(my_str)
{'foo': 'bar', 'baz': 'qux'}
```
Notes:
1. It could be argued that current behavior is still desired.
2. I have experienced this issue when the streaming output from an LLM
and the chunk happens to end with `\\`
3. I haven't included tests. Will do if change is accepted.
4. This is specially troublesome when this function is used by
187131c55c/libs/core/langchain_core/output_parsers/transform.py (L111)
since what happens is that, for example, if the received sequence of
chunks are: `{"foo": "b` , `ar\\` :
Then, the result of calling `self.parse_result()` is:
```
{"foo": "b"}
```
and the second time:
```
{}
```
Co-authored-by: Erick Friis <erick@langchain.dev>
TRY004 ("use TypeError rather than ValueError") existing errors are
marked as ignore to preserve backward compatibility.
LMK if you prefer to fix some of them.
Co-authored-by: Erick Friis <erick@langchain.dev>
(Inspired by https://github.com/langchain-ai/langchain/issues/26918)
We rely on some deprecated public functions in the hot path for tool
binding (`convert_pydantic_to_openai_function`,
`convert_python_function_to_openai_function`, and
`format_tool_to_openai_function`). My understanding is that what is
deprecated is not the functionality they implement, but use of them in
the public API -- we expect to continue to rely on them.
Here we update these functions to be private and not deprecated. We keep
the public, deprecated functions as simple wrappers that can be safely
deleted.
The `@deprecated` wrapper adds considerable latency due to its use of
the `inspect` module. This update speeds up `bind_tools` by a factor of
~100x:
Before:

After:

---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
- Example: "community: add foobar LLM"
Description:
Improved the `_parse_google_docstring` function in `langchain/core` to
support parsing multi-paragraph descriptions before the `Args:` section
while maintaining compliance with Google-style docstring guidelines.
This change ensures better handling of docstrings with detailed function
descriptions.
Issue:
Fixes#28628
Dependencies:
None.
Twitter handle:
@isatyamks
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "core: google docstring parsing fix"
- [x] **PR message**:
- **Description:** Added a solution for invalid parsing of google
docstring such as:
Args:
net_annual_income (float): The user's net annual income (in current year
dollars).
- **Issue:** Previous code would return arg = "net_annual_income
(float)" which would cause exception in
_validate_docstring_args_against_annotations
- **Dependencies:** None
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Co-authored-by: Erick Friis <erick@langchain.dev>