Mirror of https://github.com/nomic-ai/gpt4all.git
V3 docs max (#2488)
* new skeleton
Signed-off-by: Max Cembalest <max@nomic.ai>

* v3 docs
Signed-off-by: Max Cembalest <max@nomic.ai>

---------

Signed-off-by: Max Cembalest <max@nomic.ai>
gpt4all-bindings/python/docs/old/gpt4all_faq.md (new file)
@@ -0,0 +1,100 @@
# GPT4All FAQ

## What models are supported by the GPT4All ecosystem?

Currently, there are six different model architectures that are supported:
1. GPT-J - Based on the GPT-J architecture, with examples found [here](https://huggingface.co/EleutherAI/gpt-j-6b)
2. LLaMA - Based on the LLaMA architecture, with examples found [here](https://huggingface.co/models?sort=downloads&search=llama)
3. MPT - Based on Mosaic ML's MPT architecture, with examples found [here](https://huggingface.co/mosaicml/mpt-7b)
4. Replit - Based on Replit Inc.'s Replit architecture, with examples found [here](https://huggingface.co/replit/replit-code-v1-3b)
5. Falcon - Based on TII's Falcon architecture, with examples found [here](https://huggingface.co/tiiuae/falcon-40b)
6. StarCoder - Based on BigCode's StarCoder architecture, with examples found [here](https://huggingface.co/bigcode/starcoder)
## Why so many different architectures? What differentiates them?

One of the major differences is licensing. Currently, the LLaMA-based models are subject to a non-commercial license, whereas the GPT-J and MPT base models allow commercial usage. However, LLaMA's successor, [Llama 2, is commercially licensable](https://ai.meta.com/llama/license/), too. In the early days of the recent explosion of activity in open-source local models, the LLaMA models were generally seen as performing better, but that is changing quickly. Every week - even every day! - new models are released, with some of the GPT-J and MPT models competitive in performance/quality with LLaMA. What's more, there are some very nice architectural innovations in the MPT models that could lead to new performance/quality gains.
## How does GPT4All make these models available for CPU inference?

By leveraging the ggml library written by Georgi Gerganov and a growing community of developers. There are currently multiple different versions of this library. The original GitHub repo can be found [here](https://github.com/ggerganov/ggml), but the developer of the library has also created a LLaMA-based version [here](https://github.com/ggerganov/llama.cpp). Currently, this backend uses the latter as a submodule.
## Does that mean GPT4All is compatible with all llama.cpp models and vice versa?

Yes!

The upstream [llama.cpp](https://github.com/ggerganov/llama.cpp) project has recently introduced several [compatibility breaking] quantization methods. This is a breaking change that renders all previous models (including the ones that GPT4All uses) inoperative with newer versions of llama.cpp.

Fortunately, we have engineered a submoduling system that allows us to dynamically load different versions of the underlying library, so that GPT4All just works.

[compatibility breaking]: https://github.com/ggerganov/llama.cpp/commit/b9fd7eee57df101d4a3e3eabc9fd6c2cb13c9ca1
## What are the system requirements?

Your CPU needs to support [AVX or AVX2 instructions](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions) and you need enough RAM to load a model into memory.
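
If you're not sure whether your CPU supports AVX or AVX2, you can inspect the CPU flags directly. The snippet below is a minimal sketch, not part of the GPT4All bindings, and it assumes a Linux system where `/proc/cpuinfo` is available:

``` py
# Minimal sketch (Linux only): look for the avx / avx2 flags in /proc/cpuinfo.
def cpu_supports_avx() -> dict:
    flags = set()
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                flags.update(line.split(":", 1)[1].split())
    return {"avx": "avx" in flags, "avx2": "avx2" in flags}

print(cpu_supports_avx())
```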
## What about GPU inference?

Newer versions of llama.cpp have added support for running inference on NVIDIA GPUs. We're investigating how to incorporate this into our downloadable installers.
## OK, so bottom line... how do I make my model on Hugging Face compatible with the GPT4All ecosystem right now?

1. Check to make sure the Hugging Face model is available in one of our three supported architectures.
2. If it is, you can use the conversion script inside our pinned llama.cpp submodule for GPT-J and LLaMA based models.
3. Or, if your model is an MPT model, you can use the conversion script located directly in this backend directory, under the scripts subdirectory.
## Language Bindings

#### There's a problem with the download

Some bindings can download a model, if allowed to do so. For example, in Python or TypeScript, if `allow_download=True` or `allowDownload=true` (the default), a model is automatically downloaded into `.cache/gpt4all/` in the user's home folder, unless it already exists.
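
If you prefer to manage the model file yourself, you can turn off the automatic download and point the Python bindings at a file you have already placed in that cache folder. A minimal sketch; the model file name is only a placeholder:

``` py
from gpt4all import GPT4All

# Placeholder file name; use a model you have already downloaded into
# ~/.cache/gpt4all/ (or pass model_path= to point at a different folder).
model = GPT4All("ggml-model-gpt4all-falcon-q4_0.bin", allow_download=False)
print(model.generate("Name three colors.", temp=0))
```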
In case of connection issues or errors during the download, you might want to manually verify the model file's MD5 checksum by comparing it with the one listed in [models3.json].
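
For example, you can compute the checksum with Python's standard library and compare it against the value listed in [models3.json]; the path below is only a placeholder:

``` py
import hashlib
from pathlib import Path

# Placeholder path; point this at the model file in your cache folder.
model_file = Path.home() / ".cache" / "gpt4all" / "ggml-model-gpt4all-falcon-q4_0.bin"

md5 = hashlib.md5()
with open(model_file, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        md5.update(chunk)

print(md5.hexdigest())  # compare with the checksum listed in models3.json
```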
As an alternative to the basic downloader built into the bindings, you can choose to download from the <https://gpt4all.io/> website instead. Scroll down to 'Model Explorer' and pick your preferred model.

[models3.json]: https://github.com/nomic-ai/gpt4all/blob/main/gpt4all-chat/metadata/models3.json
#### I need the chat GUI and bindings to behave the same

The chat GUI and bindings are based on the same backend. You can make them behave the same way by following these steps:

- First of all, ensure that all parameters in the chat GUI settings match those passed to the generating API, e.g.:
    === "Python"
        ``` py
        from gpt4all import GPT4All
        model = GPT4All(...)
        model.generate("prompt text", temp=0, ...) # adjust parameters
        ```
    === "TypeScript"
        ``` ts
        import { createCompletion, loadModel } from '../src/gpt4all.js'
        const ll = await loadModel(...);
        const messages = ...
        const re = await createCompletion(ll, messages, { temp: 0, ... }); // adjust parameters
        ```
- To make comparing the output easier, set _Temperature_ in both to 0 for now. This will make the output deterministic.
- Next you'll have to compare the templates, adjusting them as necessary, based on how you're using the bindings.
    - Specifically, in Python:
        - With simple `generate()` calls, the input has to be surrounded with system and prompt templates.
        - When using a chat session, it depends on whether the bindings are allowed to download [models3.json]. If yes, and the chat GUI uses the default templates, this is handled automatically. If no, use the `chat_session()` template parameters to customize them (see the sketch after this list).
- Once you're done, remember to reset _Temperature_ to its previous value in both the chat GUI and your custom code.
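
As a rough sketch of the Python chat-session case above (the model file name and template strings are placeholders; copy the exact templates from the chat GUI's model settings so that both sides match):

``` py
from gpt4all import GPT4All

model = GPT4All("ggml-model-gpt4all-falcon-q4_0.bin")  # placeholder model name

# Placeholder templates; copy the exact strings from the chat GUI's model settings.
system_template = "### System:\nYou are a helpful assistant.\n\n"
prompt_template = "### User:\n{0}\n\n### Response:\n"

with model.chat_session(system_template, prompt_template):
    print(model.generate("prompt text", temp=0))
```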