Enable streaming for OpenAI LLM (#986)

* Support a callback `on_llm_new_token` that users can implement when `OpenAI.streaming` is set to `True`
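
A minimal sketch of what a user-implemented `on_llm_new_token` callback could look like. `TokenCollector` and `fake_stream` are hypothetical names used only for illustration. The class is deliberately standalone so the snippet runs without LangChain installed; in real use the handler would presumably subclass the callback handler base class in `langchain/callbacks/base.py` and be passed to `OpenAI` through a `CallbackManager`, as the notebook in this commit does with `StreamingStdOutCallbackHandler`.

```python
from typing import Any, Iterable, List


class TokenCollector:
    """Hypothetical handler that stores each streamed token as it arrives."""

    def __init__(self) -> None:
        self.tokens: List[str] = []

    def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
        # Invoked once per token while the completion is streaming.
        self.tokens.append(token)

    @property
    def text(self) -> str:
        # Full completion reassembled from the tokens seen so far.
        return "".join(self.tokens)


def fake_stream(handler: TokenCollector, chunks: Iterable[str]) -> None:
    # Stand-in for the LLM (an assumption for this sketch): feeds the
    # handler one chunk at a time, the way a streaming response would.
    for chunk in chunks:
        handler.on_llm_new_token(chunk)


collector = TokenCollector()
fake_stream(collector, ["Spark", "ling", " water"])
print(collector.text)
```

Collecting tokens in a list rather than printing them keeps the handler reusable, for example to forward partial output to a UI or to reassemble the full text after the stream ends.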
2025-09-05 21:12:48 +00:00 · 2023-02-14 15:06:14 -08:00
parent f05f025e41
commit caa8e4742e
26 changed files with 1311 additions and 155 deletions
--- a/docs/modules/llms/getting_started.ipynb
+++ b/docs/modules/llms/getting_started.ipynb
@@ -18,7 +18,9 @@
   "cell_type": "code",
   "execution_count": 1,
   "id": "df924055",
-   "metadata": {},
+   "metadata": {
+    "tags": []
+   },
   "outputs": [],
   "source": [
    "from langchain.llms import OpenAI"
@@ -207,14 +209,6 @@
   "source": [
    "llm.get_num_tokens(\"what a joke\")"
   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "b004ffdd",
-   "metadata": {},
-   "outputs": [],
-   "source": []
  }
 ],
 "metadata": {
--- a/docs/modules/llms/how_to_guides.rst
+++ b/docs/modules/llms/how_to_guides.rst
@@ -8,6 +8,7 @@ They are split into two categories:
 1. `Generic Functionality <./generic_how_to.html>`_: Covering generic functionality all LLMs should have.
 2. `Integrations <./integrations.html>`_: Covering integrations with various LLM providers.
 3. `Asynchronous <./async_llm.html>`_: Covering asynchronous functionality.
+4. `Streaming <./streaming_llm.html>`_: Covering streaming functionality.

 .. toctree::
   :maxdepth: 1
--- a/docs/modules/llms/streaming_llm.ipynb
+++ b/docs/modules/llms/streaming_llm.ipynb
@@ -0,0 +1,140 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "6eaf7e66-f49c-42da-8d11-22ea13bef718",
+   "metadata": {},
+   "source": [
+    "# Streaming with LLMs\n",
+    "\n",
+    "LangChain provides streaming support for LLMs. Currently, streaming is supported only for the `OpenAI` LLM implementation; support for other LLM implementations is on the roadmap. To use streaming, pass a [`CallbackHandler`](https://github.com/hwchase17/langchain/blob/master/langchain/callbacks/base.py) that implements `on_llm_new_token`. This example uses `StreamingStdOutCallbackHandler`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "4ac0ff54-540a-4f2b-8d9a-b590fec7fe07",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "\n",
+      "\n",
+      "Verse 1\n",
+      "I'm sippin' on sparkling water,\n",
+      "It's so refreshing and light,\n",
+      "It's the perfect way to quench my thirst,\n",
+      "On a hot summer night.\n",
+      "\n",
+      "Chorus\n",
+      "Sparkling water, sparkling water,\n",
+      "It's the best way to stay hydrated,\n",
+      "It's so refreshing and light,\n",
+      "It's the perfect way to stay alive.\n",
+      "\n",
+      "Verse 2\n",
+      "I'm sippin' on sparkling water,\n",
+      "It's so bubbly and bright,\n",
+      "It's the perfect way to cool me down,\n",
+      "On a hot summer night.\n",
+      "\n",
+      "Chorus\n",
+      "Sparkling water, sparkling water,\n",
+      "It's the best way to stay hydrated,\n",
+      "It's so refreshing and light,\n",
+      "It's the perfect way to stay alive.\n",
+      "\n",
+      "Verse 3\n",
+      "I'm sippin' on sparkling water,\n",
+      "It's so crisp and clean,\n",
+      "It's the perfect way to keep me going,\n",
+      "On a hot summer day.\n",
+      "\n",
+      "Chorus\n",
+      "Sparkling water, sparkling water,\n",
+      "It's the best way to stay hydrated,\n",
+      "It's so refreshing and light,\n",
+      "It's the perfect way to stay alive."
+     ]
+    }
+   ],
+   "source": [
+    "from langchain.llms import OpenAI\n",
+    "from langchain.callbacks.base import CallbackManager\n",
+    "from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
+    "\n",
+    "\n",
+    "llm = OpenAI(streaming=True, callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]), verbose=True, temperature=0)\n",
+    "resp = llm(\"Write me a song about sparkling water.\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "61fb6de7-c6c8-48d0-a48e-1204c027a23c",
+   "metadata": {
+    "tags": []
+   },
+   "source": [
+    "The final `LLMResult` is still available when calling `generate`. However, `token_usage` is not currently supported for streaming."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "id": "a35373f1-9ee6-4753-a343-5aee749b8527",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "\n",
+      "\n",
+      "Q: What did the fish say when it hit the wall?\n",
+      "A: Dam!"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "LLMResult(generations=[[Generation(text='\\n\\nQ: What did the fish say when it hit the wall?\\nA: Dam!', generation_info={'finish_reason': 'stop', 'logprobs': None})]], llm_output={'token_usage': {}})"
+      ]
+     },
+     "execution_count": 8,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "llm.generate([\"Tell me a joke.\"])"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.9"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}