Add Streaming for GPT4All (#2642)

- Adds support for callback handlers in GPT4All models
- Updates notebook and docs
2025-09-09 06:53:59 +00:00 · 2023-04-09 17:54:26 -07:00
parent c51753250d
commit 10ff1fda8e
3 changed files with 87 additions and 15 deletions
--- a/docs/ecosystem/gpt4all.md
+++ b/docs/ecosystem/gpt4all.md
@@ -1,21 +1,21 @@
 # GPT4All

-This page covers how to use the `GPT4All` wrapper within LangChain.
-It is broken into two parts: installation and setup, and then usage with an example.
+This page covers how to use the `GPT4All` wrapper within LangChain. The tutorial is divided into two parts: installation and setup, followed by usage with an example.

 ## Installation and Setup
 - Install the Python package with `pip install pyllamacpp`
-- Download a [GPT4All model](https://github.com/nomic-ai/gpt4all) and place it in your desired directory
+- Download a [GPT4All model](https://github.com/nomic-ai/pyllamacpp#supported-model) and place it in your desired directory

 ## Usage

 ### GPT4All

 To use the GPT4All wrapper, you need to provide the path to the pre-trained model file and the model's configuration.
+
 ```python
 from langchain.llms import GPT4All

-# Instantiate the model
+# Instantiate the model. Callbacks support token-wise streaming
 model = GPT4All(model="./models/gpt4all-model.bin", n_ctx=512, n_threads=8)

 # Generate text
@@ -24,14 +24,24 @@ response = model("Once upon a time, ")

 You can also customize the generation parameters, such as n_predict, temp, top_p, top_k, and others.

-Example:
+To stream the model's predictions, add in a CallbackManager.

 ```python
-model = GPT4All(model="./models/gpt4all-model.bin", n_predict=55, temp=0)
-response = model("Once upon a time, ")
+from langchain.llms import GPT4All
+from langchain.callbacks.base import CallbackManager
+from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
+# There are many CallbackHandlers supported, such as
+# from langchain.callbacks.streamlit import StreamlitCallbackHandler
+
+callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
+model = GPT4All(model="./models/gpt4all-model.bin", n_ctx=512, n_threads=8, callback_manager=callback_manager, verbose=True)
+
+# Generate text. Tokens are streamed through the callback manager.
+model("Once upon a time, ")
 ```
+
 ## Model File

-You can find links to model file downloads at the [GPT4all](https://github.com/nomic-ai/gpt4all) repository. They will need to be converted to `ggml` format to work, as specified in the [pyllamacpp](https://github.com/nomic-ai/pyllamacpp) repository.
+You can find links to model file downloads in the [pyllamacpp](https://github.com/nomic-ai/pyllamacpp) repository.

 For a more detailed walkthrough of this, see [this notebook](../modules/models/llms/integrations/gpt4all.ipynb).
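
The streaming pattern this change documents can be illustrated without the model itself. The following is a minimal, dependency-free sketch and not LangChain's implementation: `CollectingHandler`, `SimpleCallbackManager`, and `fake_generate` are hypothetical stand-ins, and only the hook name `on_llm_new_token` is borrowed from LangChain's callback interface. It shows a manager fanning each token out to its handlers before the full response exists.

```python
from typing import Iterable, List


class CollectingHandler:
    """Hypothetical handler: stores each token as it arrives."""

    def __init__(self) -> None:
        self.tokens: List[str] = []

    def on_llm_new_token(self, token: str) -> None:
        self.tokens.append(token)


class SimpleCallbackManager:
    """Hypothetical stand-in for a callback manager: forwards events to every handler."""

    def __init__(self, handlers: Iterable[CollectingHandler]) -> None:
        self.handlers = list(handlers)

    def on_llm_new_token(self, token: str) -> None:
        for handler in self.handlers:
            handler.on_llm_new_token(token)


def fake_generate(prompt: str, manager: SimpleCallbackManager) -> str:
    """Stand-in for a model call: emits canned tokens one at a time."""
    pieces = ["there ", "was ", "a ", "model."]
    for piece in pieces:
        # Each token reaches the handlers before the full text is assembled.
        manager.on_llm_new_token(piece)
    return "".join(pieces)


handler = CollectingHandler()
result = fake_generate("Once upon a time, ", SimpleCallbackManager([handler]))
```

A handler that prints each token instead of storing it would behave like the stdout streaming handler used in the docs example, under the same assumptions.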