Python Bindings: Improved unit tests, documentation and unification of API (#1090)

* Makefiles, black, isort
* Black and isort
* unit tests and generation method
* chat context provider
* context does not reset
* Current state
* Fixup
* Python bindings with unit tests
* GPT4All Python Bindings: chat contexts, tests
* New python bindings and backend fixes
* Black and Isort
* Documentation error
* preserved n_predict for backwards compat with langchain

---------

Co-authored-by: Adam Treat <treat.adam@gmail.com>
2025-09-23 04:21:45 +00:00 · 2023-06-30 16:02:02 -04:00
parent 40a3faeb05
commit 46a0762bd5
15 changed files with 437 additions and 407 deletions
--- a/gpt4all-bindings/python/docs/gpt4all_python.md
+++ b/gpt4all-bindings/python/docs/gpt4all_python.md
@@ -1,6 +1,6 @@
 # GPT4All Python API
-The `GPT4All` package provides Python bindings and an API to our C/C++ model backend libraries.
-The source code, README, and local build instructions can be found [here](https://github.com/nomic-ai/gpt4all/tree/main/gpt4all-bindings/python).
+The `GPT4All` Python package provides bindings to our C/C++ model backend libraries.
+The source code and local build instructions can be found [here](https://github.com/nomic-ai/gpt4all/tree/main/gpt4all-bindings/python).


 ## Quickstart
@@ -9,29 +9,88 @@ The source code, README, and local build instructions can be found [here](https:
 pip install gpt4all
 ```

-In Python, run the following commands to retrieve a GPT4All model and generate a response
-to a prompt.
+=== "GPT4All Example"
+    ``` py
+    from gpt4all import GPT4All
+    model = GPT4All("orca-mini-3b.ggmlv3.q4_0.bin")
+    output = model.generate("The capital of France is ", max_tokens=3)
+    print(output)
+    ```
+=== "Output"
+    ```
+    1. Paris
+    ```

-**Download Note:**
-By default, models are stored in `~/.cache/gpt4all/` (you can change this with `model_path`). If the file already exists, model download will be skipped.
+### Chatting with GPT4All
+Local LLMs can be optimized for chat conversations by reusing previous computational history.

-```python
-import gpt4all
-gptj = gpt4all.GPT4All("ggml-gpt4all-j-v1.3-groovy")
-messages = [{"role": "user", "content": "Name 3 colors"}]
-gptj.chat_completion(messages)
-```
+Use the GPT4All `chat_session` context manager to hold chat conversations with the model.

-## Give it a try!
-[Google Colab Tutorial](https://colab.research.google.com/drive/1QRFHV5lj1Kb7_tGZZGZ-E6BfX6izpeMI?usp=sharing)
+=== "GPT4All Example"
+    ``` py
+    model = GPT4All(model_name='orca-mini-3b.ggmlv3.q4_0.bin')
+    with model.chat_session():
+        response = model.generate(prompt='hello', top_k=1)
+        response = model.generate(prompt='write me a short poem', top_k=1)
+        response = model.generate(prompt='thank you', top_k=1)
+        print(model.current_chat_session)
+    ```
+=== "Output"
+    ``` py
+    [
+       {
+          'role': 'user',
+          'content': 'hello'
+       },
+       {
+          'role': 'assistant',
+          'content': 'What is your name?'
+       },
+       {
+          'role': 'user',
+          'content': 'write me a short poem'
+       },
+       {
+          'role': 'assistant',
+          'content': "I would love to help you with that! Here's a short poem I came up with:\nBeneath the autumn leaves,\nThe wind whispers through the trees.\nA gentle breeze, so at ease,\nAs if it were born to play.\nAnd as the sun sets in the sky,\nThe world around us grows still."
+       },
+       {
+          'role': 'user',
+          'content': 'thank you'
+       },
+       {
+          'role': 'assistant',
+          'content': "You're welcome! I hope this poem was helpful or inspiring for you. Let me know if there is anything else I can assist you with."
+       }
+    ]
+    ```

-## Supported Models
-Python bindings support the following ggml architectures: `gptj`, `llama`, `mpt`. See API reference for more details.
+When using GPT4All models in the `chat_session` context:

-## Best Practices
+- The model is given a prompt template which makes it chatty.
+- Internal K/V caches are preserved from previous conversation history, which speeds up inference.

-There are two methods to interface with the underlying language model, `chat_completion()` and `generate()`. Chat completion formats a user-provided message dictionary into a prompt template (see API documentation for more details and options). This will usually produce much better results and is the approach we recommend. You may also prompt the model with `generate()` which will just pass the raw input string to the model. 

-## API Reference
+### Generation Parameters
+
+::: gpt4all.gpt4all.GPT4All.generate
+
+
+### Streaming Generations
+To interact with GPT4All responses as the model generates them, pass `streaming=True` during generation.
+
+=== "GPT4All Example"
+    ``` py
+    from gpt4all import GPT4All
+    model = GPT4All("orca-mini-3b.ggmlv3.q4_0.bin")
+    tokens = []
+    for token in model.generate("The capital of France is", max_tokens=20, streaming=True):
+        tokens.append(token)
+    print(tokens)
+    ```
+=== "Output"
+    ```
+    [' Paris', ' is', ' a', ' city', ' that', ' has', ' been', ' a', ' major', ' cultural', ' and', ' economic', ' center', ' for', ' over', ' ', '2', ',', '0', '0']
+    ```

 ::: gpt4all.gpt4all.GPT4All
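
The streaming example in the diff collects tokens into a list. A common follow-up is to show text as it arrives and still keep the full response. The sketch below is one way to do that and is not part of this commit. `collect_stream` and `replay_tokens` are hypothetical names, and `replay_tokens` only replays the token list from the documented output so the snippet runs without downloading a model. With a real model you would presumably pass the iterator returned by `model.generate(..., streaming=True)` instead.

```python
from typing import Iterable, Iterator


def collect_stream(tokens: Iterable[str]) -> str:
    """Print each token as it arrives and return the concatenated text."""
    pieces = []
    for token in tokens:
        # flush=True makes partial output visible immediately
        print(token, end="", flush=True)
        pieces.append(token)
    print()  # finish the line once the stream is exhausted
    return "".join(pieces)


def replay_tokens() -> Iterator[str]:
    """Hypothetical stand-in for a streaming generate() call.

    Yields the tokens shown in the documented "Output" tab.
    """
    yield from [
        " Paris", " is", " a", " city", " that", " has", " been", " a",
        " major", " cultural", " and", " economic", " center", " for",
        " over", " ", "2", ",", "0", "0",
    ]


text = collect_stream(replay_tokens())
```

Joining the tokens with an empty string is enough here because each token in the documented output already carries its own leading space.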