* disable llama.cpp logging unless GPT4ALL_VERBOSE_LLAMACPP envvar is
nonempty
* make the verbose flag for retrieve_model default to false (but keep it overridable via the GPT4All constructor)
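Roughly, the two knobs look like this (a sketch; the model filename is a placeholder and I'm assuming the constructor simply forwards verbose to retrieve_model):
```python
import os

import gpt4all

# llama.cpp log output stays suppressed unless this env var is nonempty
llamacpp_verbose = bool(os.environ.get("GPT4ALL_VERBOSE_LLAMACPP"))

# retrieve_model now defaults to verbose=False; the constructor can opt back in
model = gpt4all.GPT4All("my-model.gguf", verbose=True)  # "my-model.gguf" is a placeholder
```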
You should be able to run a basic test:
```python
import gpt4all
model = gpt4all.GPT4All('/Users/aaron/Downloads/rift-coder-v0-7b-q4_0.gguf')
print(model.generate('def fib(n):'))
```
and see no non-model output when successful
- custom callbacks & session improvements PR (v1.0.6) had one too many checks
- remove the problematic config['url'] check
- add a crude test
- fixes #1261
* Added the following features:
  1) prompt_model now uses the positional callback argument to return the response tokens.
  2) Because of that callback argument, prompt_model_streaming now only manages the queue and threading, which reduces code duplication.
  3) Added an optional verbose argument to prompt_model that prints the prompt passed to the model.
  4) Chat sessions can now have a header, i.e. an instruction placed before the transcript of the conversation; the header is set when the chat session context is created.
  5) The generate function now accepts an optional callback (see the sketch after this list).
  6) When streaming inside a chat session, the user no longer needs to save the assistant's messages manually; this is done automatically.
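A rough usage sketch of the new pieces (the callback signature, the streaming keyword, and the model filename are assumptions here, not the confirmed API):
```python
import gpt4all

model = gpt4all.GPT4All("my-model.gguf")  # placeholder filename

def on_token(token_id: int, token_string: str) -> bool:
    print(token_string, end="", flush=True)
    return True  # returning False stops generation

with model.chat_session():  # the header/system prompt is fixed when the session is created
    # generate() now takes an optional callback
    model.generate("def fib(n):", callback=on_token)
    # when streaming inside a chat session, the assistant's reply is added
    # to the session history automatically
    for token in model.generate("now write a docstring for it", streaming=True):
        print(token, end="", flush=True)
```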
* added _empty_response_callback so I don't have to check if callback is None
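i.e. a no-op default so nothing has to branch on None; roughly (signature assumed):
```python
def _empty_response_callback(token_id: int, response: str) -> bool:
    # default callback: ignore the token and keep generating
    return True
```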
* added docs
* now if the callback stops generation, the last token is ignored
* fixed type hints, reimplemented chat session header as a system prompt, minor refactoring, docs: removed section about manual update of chat session for streaming
* forgot to add some type hints!
* keep the model's config (taken from models.json when downloads are allowed) in the GPT4All class
* During chat sessions, the model-specific systemPrompt and promptTemplate are applied.
* implemented the changes
* Fixed typing. Now the user can set a prompt template that will be applied even outside of a chat session. The template can also have multiple placeholders that can be filled by passing a dictionary to the generate function
* reverted some changes concerning the prompt templates and their functionality
* fixed some type hints, changed list[float] to List[Float]
* fixed type hints, changed List[Float] to List[float]
* fix typo in the comment: Pepare => Prepare
---------
Signed-off-by: 385olt <385olt@gmail.com>
* Handle edge cases when generating embeddings
* Improve Python handling & add llmodel_c.h note
- In the Python bindings, fail fast with a ValueError when the text is empty (see the sketch below)
- Advise authors of other bindings to do likewise in llmodel_c.h
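The guard amounts to something like this (a sketch; the enclosing function is illustrative):
```python
from typing import List

def embed(text: str) -> List[float]:
    if not text:
        # fail fast instead of handing an empty string to the C backend
        raise ValueError("text must not be empty")
    ...
```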
* python: do not mutate locals()
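The shape of that change, with illustrative names (not the real function):
```python
# before (fragile): collecting generation settings by mutating locals();
# per the Python docs, changes to locals() inside a function may have no effect
def generate_before(prompt, temp=0.7, top_k=40):
    settings = locals()
    settings.pop("prompt")
    return settings

# after: build the dict explicitly and leave locals() alone
def generate_after(prompt, temp=0.7, top_k=40):
    settings = {"temp": temp, "top_k": top_k}
    return settings
```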
* python: fix (some) typing complaints
* python: queue sentinel need not be a str
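i.e. a plain object() works as the end-of-stream marker and can never collide with real token text:
```python
import queue

_SENTINEL = object()  # unique marker; unlike a str, it cannot be confused with output

q: queue.Queue = queue.Queue()
q.put("token")
q.put(_SENTINEL)

while (item := q.get()) is not _SENTINEL:
    print(item)
```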
* python: make long inference tests opt in
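One way this can look with pytest (a sketch; the env var name is hypothetical and the repo's actual opt-in mechanism may differ):
```python
import os

import pytest

# long-running inference tests are skipped unless explicitly requested
inference_test = pytest.mark.skipif(
    not os.environ.get("RUN_INFERENCE_TESTS"),  # hypothetical opt-in switch
    reason="long inference test; set RUN_INFERENCE_TESTS=1 to run",
)

@inference_test
def test_long_generation():
    ...
```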
* Makefiles, black, isort
* Black and isort
* unit tests and generation method
* chat context provider
* context does not reset
* Current state
* Fixup
* Python bindings with unit tests
* GPT4All Python Bindings: chat contexts, tests
* New python bindings and backend fixes
* Black and Isort
* Documentation error
* preserved n_predict for backwards compat with langchain
---------
Co-authored-by: Adam Treat <treat.adam@gmail.com>
Most of these can just shortcut out of the model-loading logic. llama is a bit worse to deal with because we submodule it, so I have to at least parse the hparams; then I just use the size on disk as an estimate of the memory size (which seems reasonable since we mmap() the llama files anyway).
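The estimate is essentially just the file size; sketched in Python for clarity (the real check lives in the C++ backend):
```python
import os

def estimate_required_mem(model_path: str) -> int:
    # llama files are mmap()'d, so the size on disk is a reasonable
    # stand-in for how much memory the model will need
    return os.path.getsize(model_path)
```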
* generator method
* cleanup
* bump version number for clarity
* added 'replace' error handling in decode to avoid a UnicodeDecodeError
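i.e. (sketch):
```python
raw = b"\xff\xfe abc"  # deliberately invalid UTF-8
text = raw.decode("utf-8", errors="replace")  # bad bytes become U+FFFD instead of raising
print(text)
```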
* revert back to _build_prompt
* porting over replit code model to gpt4all
* replaced memory with kv_self struct
* continuing debug
* welp, it built, but a lot of sus things
* working model loading and somewhat working generate.. need to format response?
* revert back to semi working version
* finally got rid of weird formatting
* figured out the problem is with the python bindings - this is good to go for testing
* addressing PR feedback
* output refactor
* fixed prompt response collection
* cleanup
* addressing PR comments
* building replit backend with new ggmlver code
* chatllm replit and clean python files
* cleanup
* updated replit to match new llmodel api
* match llmodel api and change size_t to Token
* resolve PR comments
* replit model commit comment