mirror of
https://github.com/hwchase17/langchain.git
synced 2025-09-05 13:06:03 +00:00
docs: Added Deploying LLMs into production + a new ecosystem (#4047)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com> Co-authored-by: Kamil Kaczmarek <kaczmarek.poczta@gmail.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Committed by GitHub. Parent: 74f8e603d9. Commit: 625717daa8.
@@ -6,6 +6,11 @@ This section covers several options for that. Note that these options are meant

What follows is a list of template GitHub repositories designed to be easily forked and modified to use your chain. This list is far from exhaustive, and we are EXTREMELY open to contributions here.

## [Anyscale](https://www.anyscale.com/model-serving)

Anyscale is a unified compute platform that makes it easy to develop, deploy, and manage scalable LLM applications in production using Ray.
With Anyscale you can scale the most challenging LLM-based workloads and both develop and deploy LLM-based apps on a single compute platform.

## [Streamlit](https://github.com/hwchase17/langchain-streamlit-template)

This repo serves as a template for how to deploy a LangChain app with Streamlit.

233
docs/ecosystem/ray_serve.ipynb
Normal file
@@ -0,0 +1,233 @@
{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Ray Serve\n",
    "\n",
    "[Ray Serve](https://docs.ray.io/en/latest/serve/index.html) is a scalable model serving library for building online inference APIs. Serve is particularly well suited for system composition, enabling you to build a complex inference service consisting of multiple chains and business logic all in Python code. "
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Goal of this notebook\n",
    "This notebook shows a simple example of how to deploy an OpenAI chain into production. You can extend it to deploy your own self-hosted models, where you can easily define the amount of hardware resources (GPUs and CPUs) needed to run your model efficiently in production. Read more about the available options, including autoscaling, in the Ray Serve [documentation](https://docs.ray.io/en/latest/serve/getting_started.html).\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Set up Ray Serve\n",
    "Install Ray Serve with `pip install \"ray[serve]\"`. "
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## General Skeleton"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The general skeleton for deploying a service is the following:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 0: Import ray serve and request from starlette\n",
    "from ray import serve\n",
    "from starlette.requests import Request\n",
    "\n",
    "# 1: Define a Ray Serve deployment.\n",
    "@serve.deployment\n",
    "class LLMServe:\n",
    "\n",
    "    def __init__(self) -> None:\n",
    "        # All the initialization code goes here\n",
    "        pass\n",
    "\n",
    "    async def __call__(self, request: Request) -> str:\n",
    "        # You can parse the request here\n",
    "        # and return a response\n",
    "        return \"Hello World\"\n",
    "\n",
    "# 2: Bind the model to deployment\n",
    "deployment = LLMServe.bind()\n",
    "\n",
    "# 3: Run the deployment\n",
    "serve.api.run(deployment)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Shutdown the deployment\n",
    "serve.api.shutdown()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example of deploying an OpenAI chain with custom prompts"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Get an OpenAI API key from [here](https://platform.openai.com/account/api-keys). When you run the following code, you will be prompted for your API key."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.llms import OpenAI\n",
    "from langchain import PromptTemplate, LLMChain"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from getpass import getpass\n",
    "OPENAI_API_KEY = getpass()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "@serve.deployment\n",
    "class DeployLLM:\n",
    "\n",
    "    def __init__(self):\n",
    "        # We initialize the LLM, template and the chain here\n",
    "        llm = OpenAI(openai_api_key=OPENAI_API_KEY)\n",
    "        template = \"Question: {question}\\n\\nAnswer: Let's think step by step.\"\n",
    "        prompt = PromptTemplate(template=template, input_variables=[\"question\"])\n",
    "        self.chain = LLMChain(llm=llm, prompt=prompt)\n",
    "\n",
    "    def _run_chain(self, text: str):\n",
    "        return self.chain(text)\n",
    "\n",
    "    async def __call__(self, request: Request):\n",
    "        # 1. Parse the request\n",
    "        text = request.query_params[\"text\"]\n",
    "        # 2. Run the chain\n",
    "        resp = self._run_chain(text)\n",
    "        # 3. Return the response\n",
    "        return resp[\"text\"]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can bind the deployment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Bind the model to deployment\n",
    "deployment = DeployLLM.bind()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can specify the host and port number when we run the deployment. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Example port number\n",
    "PORT_NUMBER = 8282\n",
    "# Run the deployment\n",
    "serve.api.run(deployment, port=PORT_NUMBER)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that the service is deployed at `localhost:8282`, we can send a POST request to get the results back."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "\n",
    "text = \"What NFL team won the Super Bowl in the year Justin Bieber was born?\"\n",
    "response = requests.post(f'http://localhost:{PORT_NUMBER}/', params={'text': text})\n",
    "print(response.content.decode())"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "ray",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.9"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
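
The notebook's last cell passes the question to the deployment as a `text` query parameter, and the deployment reads it back from `request.query_params["text"]`. The sketch below, which uses only the Python standard library, illustrates how such a value could be percent-encoded into the URL and decoded again on the other side. The helper names `build_url` and `read_text_param` are hypothetical and are not part of the notebook, LangChain, or Ray Serve; the decoding step only approximates what a Starlette `Request` does.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

PORT_NUMBER = 8282  # same example port as in the notebook


def build_url(text: str, port: int = PORT_NUMBER) -> str:
    """Return the endpoint URL with `text` percent-encoded in the query string.

    Hypothetical helper for illustration only.
    """
    return f"http://localhost:{port}/?{urlencode({'text': text})}"


def read_text_param(url: str) -> str:
    """Decode the `text` query parameter from a URL.

    Hypothetical helper: a rough stand-in for reading
    request.query_params["text"] inside the deployment.
    """
    return parse_qs(urlsplit(url).query)["text"][0]


question = "What NFL team won the Super Bowl in the year Justin Bieber was born?"
url = build_url(question)
print(url)
# The decoded value matches the original question, including the trailing "?".
print(read_text_param(url) == question)
```

Encoding matters because characters such as `&`, `=`, and `?` would otherwise be read as query-string syntax instead of as part of the question.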