Motivation: dedicated structured output features are becoming more
common, such that integrations can support structured output without
supporting tool calling.
Here we make two changes:
1. Update the `has_structured_output` method to default to True if a
model supports tool calling (in addition to defaulting to True if
`with_structured_output` is overridden).
2. Update structured output tests to engage if `has_structured_output`
is True.
**Description:**
The response from `tool.invoke()` is always a ToolMessage, with content
and artifact fields, not a tuple.
The tuple is converted to a ToolMessage here
b6ae7ca91d/libs/core/langchain_core/tools/base.py (L726)
**Issue:**
Currently `ToolsIntegrationTests` requires `invoke()` to return a tuple
and so standard tests fail for "content_and_artifact" tools. This fixes
that to check the returned ToolMessage.
This PR also adds a test that now passes.
## Goal
Solve the following problems with `langchain-openai`:
- Structured output with `o1` [breaks out of the
box](https://langchain.slack.com/archives/C050X0VTN56/p1735232400232099).
- `with_structured_output` by default does not use OpenAI’s [structured
output
feature](https://platform.openai.com/docs/guides/structured-outputs).
- We override API defaults for temperature and other parameters.
## Breaking changes:
- Default method for structured output is changing to OpenAI’s dedicated
[structured output
feature](https://platform.openai.com/docs/guides/structured-outputs).
For schemas specified via TypedDict or JSON schema, strict schema
validation is disabled by default but can be enabled by specifying
`strict=True`.
- To recover previous default, pass `method="function_calling"` into
`with_structured_output`.
- Models that don’t support `method="json_schema"` (e.g., `gpt-4` and
`gpt-3.5-turbo`, currently the default model for ChatOpenAI) will raise
an error unless `method` is explicitly specified.
- To recover previous default, pass `method="function_calling"` into
`with_structured_output`.
- Schemas specified via Pydantic `BaseModel` that have fields with
non-null defaults or metadata (like min/max constraints) will raise an
error.
- To recover previous default, pass `method="function_calling"` into
`with_structured_output`.
- `strict` now defaults to False for `method="json_schema"` when schemas
are specified via TypedDict or JSON schema.
- To recover previous behavior, use `with_structured_output(schema,
strict=True)`
- Schemas specified via Pydantic V1 will raise a warning (and use
`method="function_calling"`) unless `method` is explicitly specified.
- To remove the warning, pass `method="function_calling"` into
`with_structured_output`.
- Streaming with default structured output method / Pydantic schema no
longer generates intermediate streamed chunks.
- To recover previous behavior, pass `method="function_calling"` into
`with_structured_output`.
- We no longer override default temperature (was 0.7 in LangChain, now
will follow OpenAI, currently 1.0).
- To recover previous behavior, initialize `ChatOpenAI` or
`AzureChatOpenAI` with `temperature=0.7`.
- Note: conceptually there is a difference between forcing a tool call
and forcing a response format. Tool calls may have more concise
arguments vs. generating content adhering to a schema. Prompts may need
to be adjusted to recover desired behavior.
---------
Co-authored-by: Jacob Lee <jacoblee93@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Populate API reference for test class properties and test methods for
chat models.
Also:
- Make `standard_chat_model_params` private.
- `pytest.skip` some tests that were previously passed if features are
not supported.
We have a test
[test_structured_few_shot_examples](ad4333ca03/libs/standard-tests/langchain_tests/integration_tests/chat_models.py (L546))
in standard integration tests that implements a version of tool-calling
few shot examples that works with ~all tested providers. The formulation
supported by ~all providers is: `human message, tool call, tool message,
AI reponse`.
Here we update
`langchain_core.utils.function_calling.tool_example_to_messages` to
support this formulation.
The `tool_example_to_messages` util is undocumented outside of our API
reference. IMO, if we are testing that this function works across all
providers, it can be helpful to feature it in our guides. The structured
few-shot examples we document at the moment require users to implement
this function and can be simplified.
* Removed `ruff check --select I` as `I` is already selected and checked
in the main `ruff check` command
* Added checks for non-empty `PYTHON_FILES`
* Run `ruff check` only on `PYTHON_FILES`
Co-authored-by: Erick Friis <erick@langchain.dev>
Here we allow standard tests to specify a value for `tool_choice` via a
`tool_choice_value` property, which defaults to None.
Chat models [available in
Together](https://docs.together.ai/docs/chat-models) have issues passing
standard tool calling tests:
- llama 3.1 models currently [appear to rely on user-side
parsing](https://docs.together.ai/docs/llama-3-function-calling) in
Together;
- Mixtral-8x7B and Mistral-7B (currently tested) consistently do not
call tools in some tests.
Specifying tool_choice also lets us remove an existing `xfail` and use a
smaller model in Groq tests.
This PR deprecates the beta upsert APIs in vectorstore.
We'll introduce them in a V2 abstraction instead to keep the existing
vectorstore implementations lighter weight.
The main problem with the existing APIs is that it's a bit more
challenging to
implement the correct behavior w/ respect to IDs since ID can be present
in
both the function signature and as an optional attribute on the document
object.
But VectorStores that pass the standard tests should have implemented
the semantics properly!
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:** This includes Pydantic field metadata in
`_create_subset_model_v2` so that it gets included in the final
serialized form that get sent out.
- **Issue:** #25031
- **Dependencies:** n/a
- **Twitter handle:** @gramliu
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
This PR adds a minimal document indexer abstraction.
The goal of this abstraction is to allow developers to create custom
retrievers that also have a standard indexing API and allow updating the
document content in them.
The abstraction comes with a test suite that can verify that the indexer
implements the correct semantics.
This is an iteration over a previous PRs
(https://github.com/langchain-ai/langchain/pull/24364). The main
difference is that we're sub-classing from BaseRetriever in this
iteration and as so have consolidated the sync and async interfaces.
The main problem with the current design is that runt time search
configuration has to be specified at init rather than provided at run
time.
We will likely resolve this issue in one of the two ways:
(1) Define a method (`get_retriever`) that will allow creating a
retriever at run time with a specific configuration.. If we do this, we
will likely break the subclass on BaseRetriever
(2) Generalize base retriever so it can support structured queries
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Anthropic models (including via Bedrock and other cloud platforms)
accept a status/is_error attribute on tool messages/results
(specifically in `tool_result` content blocks for Anthropic API). Adding
a ToolMessage.status attribute so that users can set this attribute when
using those models
- Mixtral with Groq has started consistently failing tool calling tests.
Here we restrict testing to llama 3.1.
- `.schema` is deprecated in pydantic proper in favor of
`.model_json_schema`.
After this standard tests will test with the following combinations:
1. pydantic.BaseModel
2. pydantic.v1.BaseModel
If ran within a matrix, it'll covert both pydantic.BaseModel originating
from
pydantic 1 and the one defined in pydantic 2.
This PR moves the in memory implementation to langchain-core.
* The implementation remains importable from langchain-community.
* Supporting utilities are marked as private for now.
resolves https://github.com/langchain-ai/langchain/issues/23911
When an AIMessageChunk is instantiated, we attempt to parse tool calls
off of the tool_call_chunks.
Here we add a special-case to this parsing, where `""` will be parsed as
`{}`.
This is a reaction to how Anthropic streams tool calls in the case where
a function has no arguments:
```
{'id': 'toolu_01J8CgKcuUVrMqfTQWPYh64r', 'input': {}, 'name': 'magic_function', 'type': 'tool_use', 'index': 1}
{'partial_json': '', 'type': 'tool_use', 'index': 1}
```
The `partial_json` does not accumulate to a valid json string-- most
other providers tend to emit `"{}"` in this case.
This PR rolls out part of the new proposed interface for vectorstores
(https://github.com/langchain-ai/langchain/pull/23544) to existing store
implementations.
The PR makes the following changes:
1. Adds standard upsert, streaming_upsert, aupsert, astreaming_upsert
methods to the vectorstore.
2. Updates `add_texts` and `aadd_texts` to be non required with a
default implementation that delegates to `upsert` and `aupsert` if those
have been implemented. The original `add_texts` and `aadd_texts` methods
are problematic as they spread object specific information across
document and **kwargs. (e.g., ids are not a part of the document)
3. Adds a default implementation to `add_documents` and `aadd_documents`
that delegates to `upsert` and `aupsert` respectively.
4. Adds standard unit tests to verify that a given vectorstore
implements a correct read/write API.
A downside of this implementation is that it creates `upsert` with a
very similar signature to `add_documents`.
The reason for introducing `upsert` is to:
* Remove any ambiguities about what information is allowed in `kwargs`.
Specifically kwargs should only be used for information common to all
indexed data. (e.g., indexing timeout).
*Allow inheriting from an anticipated generalized interface for indexing
that will allow indexing `BaseMedia` (i.e., allow making a vectorstore
for images/audio etc.)
`add_documents` can be deprecated in the future in favor of `upsert` to
make sure that users have a single correct way of indexing content.
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
- add test for structured output
- fix bug with structured output for Azure
- better testing on Groq (break out Mixtral + Llama3 and add xfails
where needed)
Add standard tests to base store abstraction. These only work on [str,
str] right now. We'll need to check if it's possible to add
encoder/decoders to generalize
- Refactor standard test classes to make them easier to configure
- Update openai to support stop_sequences init param
- Update groq to support stop_sequences init param
- Update fireworks to support max_retries init param
- Update ChatModel.bind_tools to type tool_choice
- Update groq to handle tool_choice="any". **this may be controversial**
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
```python
class UsageMetadata(TypedDict):
"""Usage metadata for a message, such as token counts.
Attributes:
input_tokens: (int) count of input (or prompt) tokens
output_tokens: (int) count of output (or completion) tokens
total_tokens: (int) total token count
"""
input_tokens: int
output_tokens: int
total_tokens: int
```
```python
class AIMessage(BaseMessage):
...
usage_metadata: Optional[UsageMetadata] = None
"""If provided, token usage information associated with the message."""
...
```