chore(community): update to OpenLLM 0.6 (#24609)

Update to OpenLLM 0.6, for which we decided to use OpenLLM's OpenAI-compatible endpoint. As a result, the OpenLLM integration now becomes just a thin wrapper around the OpenAI wrapper.

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: ccurme <chester.curme@gmail.com>
2025-09-01 11:02:37 +00:00 · 2024-12-16 14:30:07 -05:00
parent 5c17a4ace9
commit 12fced13f4
5 changed files with 64 additions and 432 deletions
--- a/docs/docs/integrations/llms/openllm.ipynb
+++ b/docs/docs/integrations/llms/openllm.ipynb
@@ -7,7 +7,14 @@
   "source": [
    "# OpenLLM\n",
    "\n",
-    "[🦾 OpenLLM](https://github.com/bentoml/OpenLLM) is an open platform for operating large language models (LLMs) in production. It enables developers to easily run inference with any open-source LLMs, deploy to the cloud or on-premises, and build powerful AI apps."
+    "[🦾 OpenLLM](https://github.com/bentoml/OpenLLM) lets developers run any **open-source LLM** as an **OpenAI-compatible API** endpoint with **a single command**.\n",
+    "\n",
+    "- 🔬 Built for fast, production-ready usage\n",
+    "- 🚂 Supports llama3, qwen2, gemma, etc., and many **quantized** versions ([full list](https://github.com/bentoml/openllm-models))\n",
+    "- ⛓️ OpenAI-compatible API\n",
+    "- 💬 Built-in ChatGPT-like UI\n",
+    "- 🔥 Accelerated LLM decoding with state-of-the-art inference backends\n",
+    "- 🌥️ Ready for enterprise-grade cloud deployment (Kubernetes, Docker and BentoCloud)"
   ]
  },
  {
@@ -37,10 +44,10 @@
   "source": [
    "## Launch OpenLLM server locally\n",
    "\n",
-    "To start an LLM server, use `openllm start` command. For example, to start a dolly-v2 server, run the following command from a terminal:\n",
+    "To start an LLM server, use the `openllm hello` command:\n",
    "\n",
    "```bash\n",
-    "openllm start dolly-v2\n",
+    "openllm hello\n",
    "```\n",
    "\n",
    "\n",
@@ -57,74 +64,7 @@
    "from langchain_community.llms import OpenLLM\n",
    "\n",
    "server_url = \"http://localhost:3000\"  # Replace with remote host if you are running on a remote server\n",
-    "llm = OpenLLM(server_url=server_url)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "4f830f9d",
-   "metadata": {},
-   "source": [
-    "### Optional: Local LLM Inference\n",
-    "\n",
-    "You may also choose to initialize an LLM managed by OpenLLM locally from current process. This is useful for development purpose and allows developers to quickly try out different types of LLMs.\n",
-    "\n",
-    "When moving LLM applications to production, we recommend deploying the OpenLLM server separately and access via the `server_url` option demonstrated above.\n",
-    "\n",
-    "To load an LLM locally via the LangChain wrapper:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "82c392b6",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "from langchain_community.llms import OpenLLM\n",
-    "\n",
-    "llm = OpenLLM(\n",
-    "    model_name=\"dolly-v2\",\n",
-    "    model_id=\"databricks/dolly-v2-3b\",\n",
-    "    temperature=0.94,\n",
-    "    repetition_penalty=1.2,\n",
-    ")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "f15ebe0d",
-   "metadata": {},
-   "source": [
-    "### Integrate with a LLMChain"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 11,
-   "id": "8b02a97a",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "iLkb\n"
-     ]
-    }
-   ],
-   "source": [
-    "from langchain.chains import LLMChain\n",
-    "from langchain_core.prompts import PromptTemplate\n",
-    "\n",
-    "template = \"What is a good name for a company that makes {product}?\"\n",
-    "\n",
-    "prompt = PromptTemplate.from_template(template)\n",
-    "\n",
-    "llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
-    "\n",
-    "generated = llm_chain.run(product=\"mechanical keyboard\")\n",
-    "print(generated)"
+    "llm = OpenLLM(base_url=f\"{server_url}/v1\", api_key=\"na\")"
   ]
  },
  {
@@ -133,7 +73,9 @@
   "id": "56cb4bc0",
   "metadata": {},
   "outputs": [],
-   "source": []
+   "source": [
+    "llm.invoke(\"To build an LLM from scratch, the following are the steps:\")"
+   ]
  }
 ],
 "metadata": {
@@ -152,7 +94,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.10.10"
+   "version": "3.11.9"
  }
 },
 "nbformat": 4,
--- a/docs/docs/integrations/providers/openllm.mdx
+++ b/docs/docs/integrations/providers/openllm.mdx
@@ -1,11 +1,17 @@
+---
+keywords: [openllm]
+---
+
 # OpenLLM

-This page demonstrates how to use [OpenLLM](https://github.com/bentoml/OpenLLM)
-with LangChain.
+OpenLLM lets developers run any **open-source LLM** as an **OpenAI-compatible API** endpoint with **a single command**.

-`OpenLLM` is an open platform for operating large language models (LLMs) in
-production. It enables developers to easily run inference with any open-source
-LLMs, deploy to the cloud or on-premises, and build powerful AI apps.
+- 🔬 Built for fast, production-ready usage
+- 🚂 Supports llama3, qwen2, gemma, etc., and many **quantized** versions ([full list](https://github.com/bentoml/openllm-models))
+- ⛓️ OpenAI-compatible API
+- 💬 Built-in ChatGPT-like UI
+- 🔥 Accelerated LLM decoding with state-of-the-art inference backends
+- 🌥️ Ready for enterprise-grade cloud deployment (Kubernetes, Docker and BentoCloud)

 ## Installation and Setup

@@ -23,8 +29,7 @@ are pre-optimized for OpenLLM.

 ## Wrappers

-There is a OpenLLM Wrapper which supports loading LLM in-process or accessing a
-remote OpenLLM server:
+There is an OpenLLM wrapper that supports interacting with a running OpenLLM server:

 ```python
 from langchain_community.llms import OpenLLM
@@ -32,13 +37,12 @@ from langchain_community.llms import OpenLLM

 ### Wrapper for OpenLLM server

-This wrapper supports connecting to an OpenLLM server via HTTP or gRPC. The
-OpenLLM server can run either locally or on the cloud.
+This wrapper supports interacting with OpenLLM's OpenAI-compatible endpoint.

-To try it out locally, start an OpenLLM server:
+To run a model, use the `openllm hello` command:

 ```bash
-openllm start flan-t5
+openllm hello
 ```

 Wrapper usage:
@@ -46,20 +50,7 @@ Wrapper usage:
 ```python
 from langchain_community.llms import OpenLLM

-llm = OpenLLM(server_url='http://localhost:3000')
-
-llm("What is the difference between a duck and a goose? And why there are so many Goose in Canada?")
-```
-
-### Wrapper for Local Inference
-
-You can also use the OpenLLM wrapper to load LLM in current Python process for
-running inference.
-
-```python
-from langchain_community.llms import OpenLLM
-
-llm = OpenLLM(model_name="dolly-v2", model_id='databricks/dolly-v2-7b')
+llm = OpenLLM(base_url="http://localhost:3000/v1", api_key="na")

 llm("What is the difference between a duck and a goose? And why there are so many Goose in Canada?")
 ```
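
Reviewer note: since the wrapper now targets OpenLLM's OpenAI-compatible endpoint, the call above should amount to roughly an OpenAI-style completions request. The sketch below is not the wrapper's implementation. It only illustrates the request shape using the standard library, and it assumes the server exposes a `/completions` route under the `/v1` base URL shown in the docs. The helper names (`build_completion_request`, `complete`), the `max_tokens` default, and the model id are hypothetical.

```python
import json
from urllib import request


def build_completion_request(base_url, prompt, model, api_key="na", max_tokens=128):
    """Build (but do not send) a POST request for an OpenAI-style completions route.

    Assumption: the route lives at <base_url>/completions, as in the OpenAI API.
    """
    url = base_url.rstrip("/") + "/completions"
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    headers = {
        "Content-Type": "application/json",
        # A placeholder key such as "na" mirrors the api_key used in the docs.
        "Authorization": f"Bearer {api_key}",
    }
    return request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def complete(req, timeout=60):
    """Send the request and return the first completion's text."""
    with request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]


# Usage against a locally running server (model id is a placeholder):
#   req = build_completion_request("http://localhost:3000/v1", "Hello", model="<model-id>")
#   print(complete(req))
```

Building the request separately from sending it keeps the URL and payload easy to inspect without a running server.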