mirror of https://github.com/hwchase17/langchain.git
synced 2025-07-05 20:58:25 +00:00
docs[minor]: Add how to guide for rate limiting a chat model (#24686)
Add how-to guide for rate limiting a chat model.
This commit is contained in:
parent c4d2a53f18
commit e00cc74926
docs/docs/how_to/chat_model_rate_limiting.ipynb (new file, 146 lines)
@@ -0,0 +1,146 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "dcf87b32",
   "metadata": {},
   "source": [
    "# How to handle rate limits\n",
    "\n",
    ":::info Prerequisites\n",
    "\n",
    "This guide assumes familiarity with the following concepts:\n",
    "- [Chat models](/docs/concepts/#chat-models)\n",
    "- [LLMs](/docs/concepts/#llms)\n",
    ":::\n",
    "\n",
    "\n",
    "You may find yourself in a situation where you are getting rate limited by the model provider API because you're making too many requests.\n",
    "\n",
    "For example, this might happen if you are running many parallel queries to benchmark the chat model on a test dataset.\n",
    "\n",
    "If you are facing such a situation, you can use a rate limiter to help match the rate at which you're making requests to the rate allowed\n",
    "by the API.\n",
    "\n",
    ":::info Requires ``langchain-core >= 0.2.24``\n",
    "\n",
    "This functionality was added in ``langchain-core == 0.2.24``. Please make sure your package is up to date.\n",
    ":::"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cbc3c873-6109-4e03-b775-b73c1003faea",
   "metadata": {},
   "source": [
    "## Initialize a rate limiter\n",
    "\n",
    "LangChain comes with a built-in in-memory rate limiter. This rate limiter is thread safe and can be shared by multiple threads in the same process.\n",
    "\n",
    "The provided rate limiter can only limit the number of requests per unit time. It will not help if you also need to limit based on the size\n",
    "of the requests."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "aa9c3c8c-0464-4190-a8c5-d69d173505a6",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_core.rate_limiters import InMemoryRateLimiter\n",
    "\n",
    "rate_limiter = InMemoryRateLimiter(\n",
    "    requests_per_second=0.1,  # <-- Super slow! We can only make a request once every 10 seconds!!\n",
    "    check_every_n_seconds=0.1,  # Wake up every 100 ms to check whether allowed to make a request,\n",
    "    max_bucket_size=10,  # Controls the maximum burst size.\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8e058bde-9413-4b08-8cc6-0c9cb638f19f",
   "metadata": {},
   "source": [
    "## Choose a model\n",
    "\n",
    "Choose any model and pass the rate limiter to it via the `rate_limiter` attribute."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "0f880a3a-c047-4e94-a323-fff2a4c0e96d",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import time\n",
    "from getpass import getpass\n",
    "\n",
    "if \"ANTHROPIC_API_KEY\" not in os.environ:\n",
    "    os.environ[\"ANTHROPIC_API_KEY\"] = getpass()\n",
    "\n",
    "\n",
    "from langchain_anthropic import ChatAnthropic\n",
    "\n",
    "model = ChatAnthropic(model_name=\"claude-3-opus-20240229\", rate_limiter=rate_limiter)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "80c9ab3a-299a-460f-985c-90280a046f52",
   "metadata": {},
   "source": [
    "Let's confirm that the rate limiter works. We should only be able to invoke the model once per 10 seconds."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "d074265c-9f32-4c5f-b914-944148993c4d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "11.599073648452759\n",
      "10.7502121925354\n",
      "10.244257926940918\n",
      "8.83088755607605\n",
      "11.645203590393066\n"
     ]
    }
   ],
   "source": [
    "for _ in range(5):\n",
    "    tic = time.time()\n",
    "    model.invoke(\"hello\")\n",
    "    toc = time.time()\n",
    "    print(toc - tic)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
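Since the guide's stated motivation is running many parallel queries against a test dataset, and the notebook notes the in-memory limiter is thread safe, a single limiter instance can also pace concurrent calls. Below is a minimal sketch, not part of the committed notebook, that reuses the notebook's parameters; it assumes `ANTHROPIC_API_KEY` is already set and that `model.batch()` runs its inputs concurrently, with each request still waiting on the shared limiter.

# Illustrative sketch (not part of the commit): one shared limiter pacing the
# parallel requests made by `model.batch()`. Assumes ANTHROPIC_API_KEY is set.
import time

from langchain_anthropic import ChatAnthropic
from langchain_core.rate_limiters import InMemoryRateLimiter

shared_limiter = InMemoryRateLimiter(
    requests_per_second=0.1,    # same pacing as the notebook: ~1 request / 10 s
    check_every_n_seconds=0.1,  # poll every 100 ms for an available token
    max_bucket_size=10,         # cap on burst size
)

model = ChatAnthropic(
    model_name="claude-3-opus-20240229",
    rate_limiter=shared_limiter,
)

# `.batch()` runs the inputs concurrently, but every request still acquires a
# token from the shared limiter before hitting the provider API.
tic = time.time()
model.batch(["hello", "how are you?", "tell me a joke"])
print(f"3 parallel requests took {time.time() - tic:.1f} s")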
@@ -83,6 +83,7 @@ These are the core building blocks you can use when building applications.
 - [How to: track response metadata across providers](/docs/how_to/response_metadata)
 - [How to: use chat model to call tools](/docs/how_to/tool_calling)
 - [How to: stream tool calls](/docs/how_to/tool_streaming)
+- [How to: handle rate limits](/docs/how_to/chat_model_rate_limiting)
 - [How to: few shot prompt tool behavior](/docs/how_to/tools_few_shot)
 - [How to: bind model-specific formatted tools](/docs/how_to/tools_model_specific)
 - [How to: force a specific tool call](/docs/how_to/tool_choice)
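The limiter's pacing can also be observed without any model provider by calling its blocking `acquire()` method directly. This is a minimal sketch assuming only `langchain-core >= 0.2.24`; how long the very first call waits depends on how full the token bucket is when it runs.

# Minimal sketch: time the limiter itself, no API key or chat model required.
import time

from langchain_core.rate_limiters import InMemoryRateLimiter

limiter = InMemoryRateLimiter(
    requests_per_second=0.5,    # roughly one token every 2 seconds
    check_every_n_seconds=0.1,  # poll every 100 ms
    max_bucket_size=1,          # no bursting: at most one stored token
)

for i in range(3):
    tic = time.time()
    limiter.acquire()  # blocks until a token is available
    print(f"acquire {i}: waited {time.time() - tic:.2f} s")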