Embed4All: optionally count tokens, misc fixes (#2145)

Key changes:
* python: optionally return token count in Embed4All.embed
* python and docs: models2.json -> models3.json
* Embed4All: require explicit prefix for unknown models
* llamamodel: fix shouldAddBOS for Bert and Nomic Bert
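
A minimal sketch of the first and third bullets in use from the Python bindings. The exact names are assumptions inferred from the change description, not confirmed API: a `return_dict` flag and an `n_prompt_tokens` key for the token count, plus an explicitly required `prefix` argument for models the bindings don't recognize.

```python
from gpt4all import Embed4All

# Model file name is illustrative.
embedder = Embed4All("nomic-embed-text-v1.f16.gguf")

result = embedder.embed(
    "The quick brown fox jumps over the lazy dog.",
    prefix="search_document",  # now required explicitly for unknown models
    return_dict=True,          # assumed flag that opts in to the token count
)
print(result["n_prompt_tokens"])  # assumed key holding the prompt token count
```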

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
commit 0455b80b7f (parent 271e6a529c)
Author: Jared Van Bortel
Date: 2024-03-20 11:24:02 -04:00 (committed via GitHub)

11 changed files with 105 additions and 52 deletions


@@ -7,7 +7,7 @@ It is optimized to run 7-13B parameter LLMs on the CPU's of any computer running
## Running LLMs on CPU
The GPT4All Chat UI supports models from all newer versions of `llama.cpp` with `GGUF` models including the `Mistral`, `LLaMA2`, `LLaMA`, `OpenLLaMa`, `Falcon`, `MPT`, `Replit`, `Starcoder`, and `Bert` architectures
-GPT4All maintains an official list of recommended models located in [models2.json](https://github.com/nomic-ai/gpt4all/blob/main/gpt4all-chat/metadata/models2.json). You can pull request new models to it and if accepted they will show up in the official download dialog.
+GPT4All maintains an official list of recommended models located in [models3.json](https://github.com/nomic-ai/gpt4all/blob/main/gpt4all-chat/metadata/models3.json). You can pull request new models to it and if accepted they will show up in the official download dialog.
#### Sideloading any GGUF model
If a model is compatible with the gpt4all-backend, you can sideload it into GPT4All Chat by:


@@ -61,12 +61,12 @@ or `allowDownload=true` (default), a model is automatically downloaded into `.ca
unless it already exists.
In case of connection issues or errors during the download, you might want to manually verify the model file's MD5
-checksum by comparing it with the one listed in [models2.json].
+checksum by comparing it with the one listed in [models3.json].
As an alternative to the basic downloader built into the bindings, you can choose to download from the
<https://gpt4all.io/> website instead. Scroll down to 'Model Explorer' and pick your preferred model.
-[models2.json]: https://github.com/nomic-ai/gpt4all/blob/main/gpt4all-chat/metadata/models2.json
+[models3.json]: https://github.com/nomic-ai/gpt4all/blob/main/gpt4all-chat/metadata/models3.json
#### I need the chat GUI and bindings to behave the same
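
Aside: the manual MD5 check mentioned in the hunk above needs only the standard library. A minimal sketch; the cache path is the assumed default download location and the file name is illustrative:

```python
import hashlib
from pathlib import Path

# Assumed default download location used by the bindings; file name is illustrative.
model_path = Path.home() / ".cache" / "gpt4all" / "mistral-7b-instruct-v0.1.Q4_0.gguf"

md5 = hashlib.md5()
with open(model_path, "rb") as f:
    # Hash in 1 MiB chunks to avoid loading multi-GB model files into memory.
    for chunk in iter(lambda: f.read(1 << 20), b""):
        md5.update(chunk)

# Compare the digest with the md5sum entry for the model in models3.json.
print(md5.hexdigest())
```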
@@ -93,7 +93,7 @@ The chat GUI and bindings are based on the same backend. You can make them behav
- Next you'll have to compare the templates, adjusting them as necessary, based on how you're using the bindings.
- Specifically, in Python:
- With simple `generate()` calls, the input has to be surrounded with system and prompt templates.
-- When using a chat session, it depends on whether the bindings are allowed to download [models2.json]. If yes,
+- When using a chat session, it depends on whether the bindings are allowed to download [models3.json]. If yes,
and in the chat GUI the default templates are used, it'll be handled automatically. If no, use
`chat_session()` template parameters to customize them.
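
A sketch of the `chat_session()` template parameters mentioned above; the template strings are illustrative stand-ins for whatever the chat GUI is configured with, not the GUI defaults:

```python
from gpt4all import GPT4All

model = GPT4All("mistral-7b-instruct-v0.1.Q4_0.gguf", allow_download=False)

# Pass the same system prompt and prompt template the GUI uses so both
# produce identical inputs to the model; {0} marks where the user message goes.
with model.chat_session(
    system_prompt="You are a helpful assistant.",
    prompt_template="### Human:\n{0}\n\n### Assistant:\n",
):
    print(model.generate("Name three prime numbers.", max_tokens=64))
```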


@@ -38,7 +38,7 @@ The GPT4All software ecosystem is compatible with the following Transformer arch
- `MPT` (including `Replit`)
- `GPT-J`
-You can find an exhaustive list of supported models on the [website](https://gpt4all.io) or in the [models directory](https://raw.githubusercontent.com/nomic-ai/gpt4all/main/gpt4all-chat/metadata/models2.json)
+You can find an exhaustive list of supported models on the [website](https://gpt4all.io) or in the [models directory](https://raw.githubusercontent.com/nomic-ai/gpt4all/main/gpt4all-chat/metadata/models3.json)
GPT4All models are artifacts produced through a process known as neural network quantization.