Hello.
First of all, thank you for maintaining such a great project.
## Description
In https://github.com/langchain-ai/langchain/pull/25123, support for
structured_output is added. However, `"additionalProperties": false`
needs to be included at all levels when a nested object is generated.
error from current code:
https://gist.github.com/fufufukakaka/e9b475300e6934853d119428e390f204
```
BadRequestError: Error code: 400 - {'error': {'message': "Invalid schema for response_format 'JokeWithEvaluation': In context=('properties', 'self_evaluation'), 'additionalProperties' is required to be supplied and to be false", 'type': 'invalid_request_error', 'param': 'response_format', 'code': None}}
```
Reference: [Introducing Structured Outputs in the
API](https://openai.com/index/introducing-structured-outputs-in-the-api/)
```json
{
"model": "gpt-4o-2024-08-06",
"messages": [
{
"role": "system",
"content": "You are a helpful math tutor."
},
{
"role": "user",
"content": "solve 8x + 31 = 2"
}
],
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "math_response",
"strict": true,
"schema": {
"type": "object",
"properties": {
"steps": {
"type": "array",
"items": {
"type": "object",
"properties": {
"explanation": {
"type": "string"
},
"output": {
"type": "string"
}
},
"required": ["explanation", "output"],
"additionalProperties": false
}
},
"final_answer": {
"type": "string"
}
},
"required": ["steps", "final_answer"],
"additionalProperties": false
}
}
}
}
```
In the current code, `"additionalProperties": false` is only added at
the last level.
This PR introduces the `_add_additional_properties_key` function, which
recursively adds `"additionalProperties": false` to the entire JSON
schema for the request.
Twitter handle: `@fukkaa1225`
Thank you!
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
[langchain_core] Fix UnionType type var replacement
- Added types.UnionType to typing.Union mapping
Type replacement cause `TypeError: 'type' object is not subscriptable`
if any of union type comes as function `_py_38_safe_origin` return
`types.UnionType` instead of `typing.Union`
```python
>>> from types import UnionType
>>> from typing import Union, get_origin
>>> type_ = get_origin(str | None)
>>> type_
<class 'types.UnionType'>
>>> UnionType[(str, None)]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'type' object is not subscriptable
>>> Union[(str, None)]
typing.Optional[str]
```
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Add a utility that can be used as a default factory
The goal will be to start migrating from of the pydantic models to use
`from_env` as a default factory if possible.
```python
from pydantic import Field, BaseModel
from langchain_core.utils import from_env
class Foo(BaseModel):
name: str = Field(default_factory=from_env('HELLO'))
```
- **Description:** This includes Pydantic field metadata in
`_create_subset_model_v2` so that it gets included in the final
serialized form that get sent out.
- **Issue:** #25031
- **Dependencies:** n/a
- **Twitter handle:** @gramliu
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Add compatibility for pydantic 2 for a utility function.
This will help push some small changes to master, so they don't have to
be kept track of on a separate branch.
supports following UX
```python
class SubTool(TypedDict):
"""Subtool docstring"""
args: Annotated[Dict[str, Any], {}, "this does bar"]
class Tool(TypedDict):
"""Docstring
Args:
arg1: foo
"""
arg1: str
arg2: Union[int, str]
arg3: Optional[List[SubTool]]
arg4: Annotated[Literal["bar", "baz"], ..., "this does foo"]
arg5: Annotated[Optional[float], None]
```
- can parse google style docstring
- can use Annotated to specify default value (second arg)
- can use Annotated to specify arg description (third arg)
- can have nested complex types
This will allow tools and parsers to accept pydantic models from any of
the
following namespaces:
* pydantic.BaseModel with pydantic 1
* pydantic.BaseModel with pydantic 2
* pydantic.v1.BaseModel with pydantic 2
Description:
This PR fixes a KeyError: 400 that occurs in the JSON schema processing
within the reduce_openapi_spec function. The _retrieve_ref function in
json_schema.py was modified to handle missing components gracefully by
continuing to the next component if the current one is not found. This
ensures that the OpenAPI specification is fully interpreted and the
agent executes without errors.
Issue:
Fixes issue #24335
Dependencies:
No additional dependencies are required for this change.
Twitter handle:
@lunara_x
Disabled by default.
```python
from langchain_core.tools import tool
@tool(parse_docstring=True)
def foo(bar: str, baz: int) -> str:
"""The foo.
Args:
bar: this is the bar
baz: this is the baz
"""
return bar
foo.args_schema.schema()
```
```json
{
"title": "fooSchema",
"description": "The foo.",
"type": "object",
"properties": {
"bar": {
"title": "Bar",
"description": "this is the bar",
"type": "string"
},
"baz": {
"title": "Baz",
"description": "this is the baz",
"type": "integer"
}
},
"required": [
"bar",
"baz"
]
}
```
Decisions to discuss:
1. is a new attr needed or could additional_kwargs be used for this
2. is raw_output a good name for this attr
3. should raw_output default to {} or None
4. should raw_output be included in serialization
5. do we need to update repr/str to exclude raw_output
- add version of AIMessageChunk.__add__ that can add many chunks,
instead of only 2
- In agenerate_from_stream merge and parse chunks in bg thread
- In output parse base classes do more work in bg threads where
appropriate
---------
Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com>
This PR rolls out part of the new proposed interface for vectorstores
(https://github.com/langchain-ai/langchain/pull/23544) to existing store
implementations.
The PR makes the following changes:
1. Adds standard upsert, streaming_upsert, aupsert, astreaming_upsert
methods to the vectorstore.
2. Updates `add_texts` and `aadd_texts` to be non required with a
default implementation that delegates to `upsert` and `aupsert` if those
have been implemented. The original `add_texts` and `aadd_texts` methods
are problematic as they spread object specific information across
document and **kwargs. (e.g., ids are not a part of the document)
3. Adds a default implementation to `add_documents` and `aadd_documents`
that delegates to `upsert` and `aupsert` respectively.
4. Adds standard unit tests to verify that a given vectorstore
implements a correct read/write API.
A downside of this implementation is that it creates `upsert` with a
very similar signature to `add_documents`.
The reason for introducing `upsert` is to:
* Remove any ambiguities about what information is allowed in `kwargs`.
Specifically kwargs should only be used for information common to all
indexed data. (e.g., indexing timeout).
*Allow inheriting from an anticipated generalized interface for indexing
that will allow indexing `BaseMedia` (i.e., allow making a vectorstore
for images/audio etc.)
`add_documents` can be deprecated in the future in favor of `upsert` to
make sure that users have a single correct way of indexing content.
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
Use pydantic to infer nested schemas and all that fun.
Include bagatur's convenient docstring parser
Include annotation support
Previously we didn't adequately support many typehints in the
bind_tools() method on raw functions (like optionals/unions, nested
types, etc.)
Example error message:
line 206, in _get_python_function_required_args
if is_function_type and required[0] == "self":
~~~~~~~~^^^
IndexError: list index out of range
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Issues (nit):
1. `utils.guard_import` prints wrong error message when there is an
import `error.` It prints the whole `module_name` but should be only the
first part as the pip package name. E.i. `langchain_core.utils` -> print
not `langchain-core` but `langchain_core.utils`. Also replace '_' with
'-' in the pip package name.
2. it does not handle the `ModuleNotFoundError` which raised if
`guard_import("wrong_module")`
Fixed issues; added ut-s. Controversial: I've reraised
`ModuleNotFoundError` as `ImportError`, since in case of the error, the
proposed action is the same - we need to install a missed package.