{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# LlamaEdge\n",
    "\n",
    "[LlamaEdge](https://github.com/second-state/LlamaEdge) allows you to chat with LLMs of [GGUF](https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/README.md) format both locally and via chat service.\n",
    "\n",
    "- `LlamaEdgeChatService` provides developers an OpenAI API compatible service to chat with LLMs via HTTP requests.\n",
    "\n",
    "- `LlamaEdgeChatLocal` enables developers to chat with LLMs locally (coming soon).\n",
    "\n",
    "Both `LlamaEdgeChatService` and `LlamaEdgeChatLocal` run on the infrastructure driven by [WasmEdge Runtime](https://wasmedge.org/), which provides a lightweight and portable WebAssembly container environment for LLM inference tasks.\n",
    "\n",
    "## Chat via API Service\n",
    "\n",
    "`LlamaEdgeChatService` works on the `llama-api-server`. Following the steps in [llama-api-server quick-start](https://github.com/second-state/llama-utils/tree/main/api-server#readme), you can host your own API service so that you can chat with any models you like on any device you have anywhere as long as the internet is available."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_community.chat_models.llama_edge import LlamaEdgeChatService\n",
    "from langchain_core.messages import HumanMessage, SystemMessage"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Chat with LLMs in the non-streaming mode"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[Bot] Hello! The capital of France is Paris.\n"
     ]
    }
   ],
   "source": [
    "# service url\n",
    "service_url = \"https://b008-54-186-154-209.ngrok-free.app\"\n",
    "\n",
    "# create wasm-chat service instance\n",
    "chat = LlamaEdgeChatService(service_url=service_url)\n",
    "\n",
    "# create message sequence\n",
    "system_message = SystemMessage(content=\"You are an AI assistant\")\n",
    "user_message = HumanMessage(content=\"What is the capital of France?\")\n",
    "messages = [system_message, user_message]\n",
    "\n",
    "# chat with wasm-chat service\n",
    "response = chat(messages)\n",
    "\n",
    "print(f\"[Bot] {response.content}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Chat with LLMs in the streaming mode"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[Bot]   Hello! I'm happy to help you with your question. The capital of Norway is Oslo.\n"
     ]
    }
   ],
   "source": [
    "# service url\n",
    "service_url = \"https://b008-54-186-154-209.ngrok-free.app\"\n",
    "\n",
    "# create wasm-chat service instance\n",
    "chat = LlamaEdgeChatService(service_url=service_url, streaming=True)\n",
    "\n",
    "# create message sequence\n",
    "system_message = SystemMessage(content=\"You are an AI assistant\")\n",
    "user_message = HumanMessage(content=\"What is the capital of Norway?\")\n",
    "messages = [\n",
    "    system_message,\n",
    "    user_message,\n",
    "]\n",
    "\n",
    "output = \"\"\n",
    "for chunk in chat.stream(messages):\n",
    "    # print(chunk.content, end=\"\", flush=True)\n",
    "    output += chunk.content\n",
    "\n",
    "print(f\"[Bot] {output}\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}