feat: Implement stream interface (#15875)
Major changes:

- Rename `wasm_chat.py` to `llama_edge.py`
- Rename the `WasmChatService` class to `ChatService`
- Implement the `stream` interface for `ChatService`
- Add `test_chat_wasm_service_streaming` to the integration tests
- Update `llama_edge.ipynb`

---------

Signed-off-by: Xin Liu <sam@secondstate.io>
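The diffs below cover only the documentation notebooks; the `stream` implementation itself (in `llama_edge.py`) is not shown on this page. For orientation, here is a minimal sketch of the general shape a `_stream` method takes on a LangChain chat model backed by an OpenAI-compatible HTTP server. Everything in it (the class name, endpoint path, payload format, and role mapping) is an illustrative assumption, not the actual PR code:

```python
# Hypothetical sketch -- NOT the code added by this PR. It only
# illustrates the usual structure of a `_stream` method on a LangChain
# chat model that talks to an OpenAI-compatible streaming endpoint.
import json
from typing import Any, Iterator, List, Optional

import requests
from langchain_core.callbacks import CallbackManagerForLLMRun
from langchain_core.language_models.chat_models import BaseChatModel
from langchain_core.messages import AIMessageChunk, BaseMessage
from langchain_core.outputs import ChatGenerationChunk


class SketchChatService(BaseChatModel):
    """Hypothetical stand-in for the PR's `ChatService` class."""

    service_url: str = "http://localhost:8080"  # assumed default

    @property
    def _llm_type(self) -> str:
        return "sketch-chat-service"

    def _generate(self, messages, stop=None, run_manager=None, **kwargs):
        raise NotImplementedError("omitted; this sketch covers streaming only")

    def _stream(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> Iterator[ChatGenerationChunk]:
        # Assumed OpenAI-style SSE endpoint emitting `data: {...}` lines.
        roles = {"system": "system", "human": "user", "ai": "assistant"}
        payload = {
            "messages": [
                {"role": roles.get(m.type, "user"), "content": m.content}
                for m in messages
            ],
            "stream": True,
        }
        response = requests.post(
            f"{self.service_url}/v1/chat/completions", json=payload, stream=True
        )
        for line in response.iter_lines():
            if not line:
                continue
            data = line.decode("utf-8").removeprefix("data: ")
            if data.strip() == "[DONE]":
                break
            # Extract the incremental token and wrap it as a chunk.
            delta = json.loads(data)["choices"][0]["delta"].get("content") or ""
            chunk = ChatGenerationChunk(message=AIMessageChunk(content=delta))
            if run_manager:
                run_manager.on_llm_new_token(delta, chunk=chunk)
            yield chunk
```

Yielding `ChatGenerationChunk` objects is what lets the base class expose the public `stream` interface exercised in the notebook below.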
docs/docs/integrations/chat/llama_edge.ipynb (new file, 135 lines)

@@ -0,0 +1,135 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# LlamaEdge\n",
    "\n",
    "[LlamaEdge](https://github.com/second-state/LlamaEdge) allows you to chat with LLMs of [GGUF](https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/README.md) format both locally and via chat service.\n",
    "\n",
    "- `LlamaEdgeChatService` provides developers an OpenAI API compatible service to chat with LLMs via HTTP requests.\n",
    "\n",
    "- `LlamaEdgeChatLocal` enables developers to chat with LLMs locally (coming soon).\n",
    "\n",
    "Both `LlamaEdgeChatService` and `LlamaEdgeChatLocal` run on the infrastructure driven by [WasmEdge Runtime](https://wasmedge.org/), which provides a lightweight and portable WebAssembly container environment for LLM inference tasks.\n",
    "\n",
    "## Chat via API Service\n",
    "\n",
    "`LlamaEdgeChatService` works on the `llama-api-server`. Following the steps in [llama-api-server quick-start](https://github.com/second-state/llama-utils/tree/main/api-server#readme), you can host your own API service so that you can chat with any models you like on any device you have anywhere as long as the internet is available."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_community.chat_models.llama_edge import LlamaEdgeChatService\n",
    "from langchain_core.messages import HumanMessage, SystemMessage"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Chat with LLMs in the non-streaming mode"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[Bot] Hello! The capital of France is Paris.\n"
     ]
    }
   ],
   "source": [
    "# service url\n",
    "service_url = \"https://b008-54-186-154-209.ngrok-free.app\"\n",
    "\n",
    "# create wasm-chat service instance\n",
    "chat = LlamaEdgeChatService(service_url=service_url)\n",
    "\n",
    "# create message sequence\n",
    "system_message = SystemMessage(content=\"You are an AI assistant\")\n",
    "user_message = HumanMessage(content=\"What is the capital of France?\")\n",
    "messages = [system_message, user_message]\n",
    "\n",
    "# chat with wasm-chat service\n",
    "response = chat(messages)\n",
    "\n",
    "print(f\"[Bot] {response.content}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Chat with LLMs in the streaming mode"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[Bot] Hello! I'm happy to help you with your question. The capital of Norway is Oslo.\n"
     ]
    }
   ],
   "source": [
    "# service url\n",
    "service_url = \"https://b008-54-186-154-209.ngrok-free.app\"\n",
    "\n",
    "# create wasm-chat service instance\n",
    "chat = LlamaEdgeChatService(service_url=service_url, streaming=True)\n",
    "\n",
    "# create message sequence\n",
    "system_message = SystemMessage(content=\"You are an AI assistant\")\n",
    "user_message = HumanMessage(content=\"What is the capital of Norway?\")\n",
    "messages = [\n",
    "    system_message,\n",
    "    user_message,\n",
    "]\n",
    "\n",
    "output = \"\"\n",
    "for chunk in chat.stream(messages):\n",
    "    # print(chunk.content, end=\"\", flush=True)\n",
    "    output += chunk.content\n",
    "\n",
    "print(f\"[Bot] {output}\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
docs/docs/integrations/chat/wasm_chat.ipynb (deleted, 85 lines)

@@ -1,85 +0,0 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Wasm Chat\n",
    "\n",
    "`Wasm-chat` allows you to chat with LLMs of [GGUF](https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/README.md) format both locally and via chat service.\n",
    "\n",
    "- `WasmChatService` provides developers an OpenAI API compatible service to chat with LLMs via HTTP requests.\n",
    "\n",
    "- `WasmChatLocal` enables developers to chat with LLMs locally (coming soon).\n",
    "\n",
    "Both `WasmChatService` and `WasmChatLocal` run on the infrastructure driven by [WasmEdge Runtime](https://wasmedge.org/), which provides a lightweight and portable WebAssembly container environment for LLM inference tasks.\n",
    "\n",
    "## Chat via API Service\n",
    "\n",
    "`WasmChatService` provides chat services by the `llama-api-server`. Following the steps in [llama-api-server quick-start](https://github.com/second-state/llama-utils/tree/main/api-server#readme), you can host your own API service so that you can chat with any models you like on any device you have anywhere as long as the internet is available."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_community.chat_models.wasm_chat import WasmChatService\n",
    "from langchain_core.messages import AIMessage, HumanMessage, SystemMessage"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[Bot] Paris\n"
     ]
    }
   ],
   "source": [
    "# service url\n",
    "service_url = \"https://b008-54-186-154-209.ngrok-free.app\"\n",
    "\n",
    "# create wasm-chat service instance\n",
    "chat = WasmChatService(service_url=service_url)\n",
    "\n",
    "# create message sequence\n",
    "system_message = SystemMessage(content=\"You are an AI assistant\")\n",
    "user_message = HumanMessage(content=\"What is the capital of France?\")\n",
    "messages = [system_message, user_message]\n",
    "\n",
    "# chat with wasm-chat service\n",
    "response = chat(messages)\n",
    "\n",
    "print(f\"[Bot] {response.content}\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
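The `test_chat_wasm_service_streaming` integration test named in the commit message is likewise not part of this diff. A plausible sketch of such a test, assuming a reachable `llama-api-server` instance (the URL below is a placeholder, not the real fixture):

```python
# Hypothetical sketch of the streaming integration test named in the
# commit message; the actual test in the community suite may differ.
from langchain_community.chat_models.llama_edge import LlamaEdgeChatService
from langchain_core.messages import HumanMessage


def test_chat_wasm_service_streaming() -> None:
    # Placeholder URL: a running llama-api-server instance is assumed.
    chat = LlamaEdgeChatService(service_url="http://localhost:8080", streaming=True)

    output = ""
    for chunk in chat.stream([HumanMessage(content="What is the capital of Norway?")]):
        assert isinstance(chunk.content, str)
        output += chunk.content

    # The service should have streamed back a non-empty answer.
    assert output
```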