feat: Implement stream interface (#15875)

Major changes: - Rename `wasm_chat.py` to `llama_edge.py` - Rename the `WasmChatService` class to `ChatService` - Implement the `stream` interface for `ChatService` - Add `test_chat_wasm_service_streaming` in the integration test - Update `llama_edge.ipynb` --------- Signed-off-by: Xin Liu <sam@secondstate.io>
2025-09-13 21:47:12 +00:00 · 2024-01-12 13:32:48 +08:00
parent ec4dab0449
commit 5efec068c9
8 changed files with 299 additions and 128 deletions
--- a/docs/docs/integrations/chat/llama_edge.ipynb
+++ b/docs/docs/integrations/chat/llama_edge.ipynb
@@ -0,0 +1,135 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# LlamaEdge\n",
+    "\n",
+    "[LlamaEdge](https://github.com/second-state/LlamaEdge) allows you to chat with LLMs of [GGUF](https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/README.md) format both locally and via chat service.\n",
+    "\n",
+    "- `LlamaEdgeChatService` provides developers an OpenAI API compatible service to chat with LLMs via HTTP requests.\n",
+    "\n",
+    "- `LlamaEdgeChatLocal` enables developers to chat with LLMs locally (coming soon).\n",
+    "\n",
+    "Both `LlamaEdgeChatService` and `LlamaEdgeChatLocal` run on the infrastructure driven by [WasmEdge Runtime](https://wasmedge.org/), which provides a lightweight and portable WebAssembly container environment for LLM inference tasks.\n",
+    "\n",
+    "## Chat via API Service\n",
+    "\n",
+    "`LlamaEdgeChatService` works on the `llama-api-server`. Following the steps in [llama-api-server quick-start](https://github.com/second-state/llama-utils/tree/main/api-server#readme), you can host your own API service so that you can chat with any models you like on any device you have anywhere as long as the internet is available."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain_community.chat_models.llama_edge import LlamaEdgeChatService\n",
+    "from langchain_core.messages import HumanMessage, SystemMessage"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Chat with LLMs in the non-streaming mode"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[Bot] Hello! The capital of France is Paris.\n"
+     ]
+    }
+   ],
+   "source": [
+    "# service url\n",
+    "service_url = \"https://b008-54-186-154-209.ngrok-free.app\"\n",
+    "\n",
+    "# create wasm-chat service instance\n",
+    "chat = LlamaEdgeChatService(service_url=service_url)\n",
+    "\n",
+    "# create message sequence\n",
+    "system_message = SystemMessage(content=\"You are an AI assistant\")\n",
+    "user_message = HumanMessage(content=\"What is the capital of France?\")\n",
+    "messages = [system_message, user_message]\n",
+    "\n",
+    "# chat with wasm-chat service\n",
+    "response = chat(messages)\n",
+    "\n",
+    "print(f\"[Bot] {response.content}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Chat with LLMs in the streaming mode"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[Bot]   Hello! I'm happy to help you with your question. The capital of Norway is Oslo.\n"
+     ]
+    }
+   ],
+   "source": [
+    "# service url\n",
+    "service_url = \"https://b008-54-186-154-209.ngrok-free.app\"\n",
+    "\n",
+    "# create wasm-chat service instance\n",
+    "chat = LlamaEdgeChatService(service_url=service_url, streaming=True)\n",
+    "\n",
+    "# create message sequence\n",
+    "system_message = SystemMessage(content=\"You are an AI assistant\")\n",
+    "user_message = HumanMessage(content=\"What is the capital of Norway?\")\n",
+    "messages = [\n",
+    "    system_message,\n",
+    "    user_message,\n",
+    "]\n",
+    "\n",
+    "output = \"\"\n",
+    "for chunk in chat.stream(messages):\n",
+    "    # print(chunk.content, end=\"\", flush=True)\n",
+    "    output += chunk.content\n",
+    "\n",
+    "print(f\"[Bot] {output}\")"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.7"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
--- a/docs/docs/integrations/chat/wasm_chat.ipynb
+++ b/docs/docs/integrations/chat/wasm_chat.ipynb
@@ -1,85 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Wasm Chat\n",
-    "\n",
-    "`Wasm-chat` allows you to chat with LLMs of [GGUF](https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/README.md) format both locally and via chat service.\n",
-    "\n",
-    "- `WasmChatService` provides developers an OpenAI API compatible service to chat with LLMs via HTTP requests.\n",
-    "\n",
-    "- `WasmChatLocal` enables developers to chat with LLMs locally (coming soon).\n",
-    "\n",
-    "Both `WasmChatService` and `WasmChatLocal` run on the infrastructure driven by [WasmEdge Runtime](https://wasmedge.org/), which provides a lightweight and portable WebAssembly container environment for LLM inference tasks.\n",
-    "\n",
-    "## Chat via API Service\n",
-    "\n",
-    "`WasmChatService` provides chat services by the `llama-api-server`. Following the steps in [llama-api-server quick-start](https://github.com/second-state/llama-utils/tree/main/api-server#readme), you can host your own API service so that you can chat with any models you like on any device you have anywhere as long as the internet is available."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "from langchain_community.chat_models.wasm_chat import WasmChatService\n",
-    "from langchain_core.messages import AIMessage, HumanMessage, SystemMessage"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "[Bot] Paris\n"
-     ]
-    }
-   ],
-   "source": [
-    "# service url\n",
-    "service_url = \"https://b008-54-186-154-209.ngrok-free.app\"\n",
-    "\n",
-    "# create wasm-chat service instance\n",
-    "chat = WasmChatService(service_url=service_url)\n",
-    "\n",
-    "# create message sequence\n",
-    "system_message = SystemMessage(content=\"You are an AI assistant\")\n",
-    "user_message = HumanMessage(content=\"What is the capital of France?\")\n",
-    "messages = [system_message, user_message]\n",
-    "\n",
-    "# chat with wasm-chat service\n",
-    "response = chat(messages)\n",
-    "\n",
-    "print(f\"[Bot] {response.content}\")"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.11.7"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}