middleware tests have gotten quite unwieldy, major restructuring, sets
the stage for coverage increase
this is super hard to review -- as a proof that we've retained important
tests, I ran coverage on `master` and this branch and confirmed
identical coverage.
* moving all middleware related tests to `agents/middleware` folder
* consolidating related test files
* adding coverage utility to makefile
- **Description:** Updated Function Signature of `create_agent`, the
system prompt can be both a list and string. I see no harm in doing
this, since SystemMessage accepts both.
- **Issue:** #33630
---------
Co-authored-by: Sydney Runkle <54324534+sydney-runkle@users.noreply.github.com>
* for run count + thread count overflow we should warn model not to call
again
* don't tally mocked tool calls in thread limit -- consider the
following
* run limit is 1
* thread limit is 3
* first run calls the tool 2 times, 1 executes, 1 is blocked
* we should only count the successful execution above towards the total
thread count
* raise more helpful warnings on invalid config
* improving typing (covariance)
* adding in support for continuing w/ tool calls not yet at threshold,
switching default to continue
* moving all logic into after model
```py
ExitBehavior = Literal["continue", "error", "end"]
"""How to handle execution when tool call limits are exceeded.
- `"continue"`: Block exceeded tools with error messages, let other tools continue (default)
- `"error"`: Raise a `ToolCallLimitExceededError` exception
- `"end"`: Stop execution immediately, injecting a ToolMessage and an AI message
for the single tool call that exceeded the limit. Raises `NotImplementedError`
if there are multiple tool calls
"""
```
- standardize on using model IDs, no more aliases - makes future
maintenance easier
- use latest models in docstrings to highlight support
- remove remaining sonnet 3-7 usage due to deprecation
Depends on #33751
While working on ToolRuntime in TS I discovered that Python still uses
`thread_model_call_count` and `run_model_call_count` in ToolNode tests
which afaik we removed.
Moving all `ToolNode` related improvements back to LangGraph and
importing them in LC!
pairing w/ https://github.com/langchain-ai/langgraph/pull/6321
this fixes a couple of things:
1. `InjectedState`, store etc will continue to work as expected no
matter where the import is from
2. `ToolRuntime` is now usable w/in langgraph, woohoo!
* attach the latest `AIMessage` to all `StructuredOutputError`s so that
relevant middleware can use as desired
* raise `StructuredOutputError` from `ProviderStrategy` logic in case of
failed parsing (so that we can retry from middleware)
* added a test suite w/ example custom middleware that retries for tool
+ provider strategy
Long term, we could add our own opinionated structured output retry
middleware, but this at least unblocks folks who want to use custom
retry logic in the short term :)
```py
class StructuredOutputRetryMiddleware(AgentMiddleware):
"""Retries model calls when structured output parsing fails."""
def __init__(self, max_retries: int) -> None:
self.max_retries = max_retries
def wrap_model_call(
self, request: ModelRequest, handler: Callable[[ModelRequest], ModelResponse]
) -> ModelResponse:
for attempt in range(self.max_retries + 1):
try:
return handler(request)
except StructuredOutputError as exc:
if attempt == self.max_retries:
raise
ai_content = exc.ai_message.content
error_message = (
f"Your previous response was:\n{ai_content}\n\n"
f"Error: {exc}. Please try again with a valid response."
)
request.messages.append(HumanMessage(content=error_message))
```
The LLM shouldn't be seeing parameters it cannot control in the
ToolMessage error it gets when it invokes a tool with incorrect args.
This fixes the behavior within langchain to address immediate issue.
We may want to change the behavior in langchain_core as well to prevent
validation of injected arguments. But this would be done in a separate
change
added some noqas, this is a quick patch to support a bug uncovered in
the quickstart, will resolve fully depending on where we centralize
ToolNode stuff.
the fact that this was broken showcases that we need significantly
better test coverage, this is literally the most minimalistic usage of
this middleware there could be 😿
will document these two gotchas better for custom middleware
```py
from langchain.agents.middleware.shell_tool import ShellToolMiddleware
from langchain.agents import create_agent
agent = create_agent(model="openai:gpt-4",middleware = [ShellToolMiddleware()])
agent.invoke({"messages":[{"role": "user", "content": "hi"}]})
```
mostly #33520
also tacking on change to make sure we're only looking at client side
calls for the jump to end
---------
Co-authored-by: Nuno Campos <nuno@boringbits.io>
- Both middleware share the same implementation, the only difference is
one uses Claude's server-side tool definition, whereas the other one
uses a generic tool definition compatible with all models
- Implemented 3 execution policies (responsible for actually running the
shell process)
- HostExecutionPolicy runs the shell as subprocess, appropriate for
already sandboxed environments, eg when run inside a dedicated docker
container
- CodexSandboxExecutionPolicy runs the shell using the sandbox command
from the Codex CLI which implements sandboxing techniques for Linux and
Mac OS.
- DockerExecutionPolicy runs the shell inside a dedicated Docker
container for isolation.
- Implements all behaviours described in
https://docs.claude.com/en/docs/agents-and-tools/tool-use/bash-tool#handle-large-outputs
including timeouts, truncation, output redaction, etc
---------
Co-authored-by: Sydney Runkle <54324534+sydney-runkle@users.noreply.github.com>
Co-authored-by: Sydney Runkle <sydneymarierunkle@gmail.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>