Commit Graph

136 Commits

Author SHA1 Message Date
Adam Treat
d948a4f2ee Complete revamp of model loading to allow for more discrete control by
the user over the model's loading behavior.

Signed-off-by: Adam Treat <treat.adam@gmail.com>
2024-02-21 10:15:20 -06:00
Adam Treat
4461af35c7 Fix includes.
Signed-off-by: Adam Treat <treat.adam@gmail.com>
2024-02-05 16:46:16 -05:00
Jared Van Bortel
10e3f7bbf5
Fix VRAM leak when model loading fails (#1901)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-01 15:45:45 -05:00
Adam Treat
d14b95f4bd Add Nomic Embed model for atlas with localdocs. 2024-01-31 22:22:08 -05:00
Jared Van Bortel
061d1969f8
expose n_gpu_layers parameter of llama.cpp (#1890)
Also dynamically limit the GPU layers and context length fields to the maximum supported by the model.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-31 14:17:44 -05:00
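
The dynamic limiting mentioned in the commit above is essentially a clamp of the settings fields to the model's reported maximums. A minimal standalone sketch of that idea follows; the struct and function names are hypothetical, not the actual gpt4all API.

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>

// Hypothetical limits reported by the loaded model; not the real gpt4all types.
struct ModelLimits {
    int32_t maxGpuLayers;     // total layer count the model can offload
    int32_t maxContextLength; // training context length from model metadata
};

// Clamp the user-facing settings fields to what the model actually supports.
int32_t clampGpuLayers(int32_t requested, const ModelLimits &limits) {
    return std::clamp(requested, int32_t(0), limits.maxGpuLayers);
}

int32_t clampContextLength(int32_t requested, const ModelLimits &limits) {
    return std::clamp(requested, int32_t(8), limits.maxContextLength);
}

int main() {
    ModelLimits limits{33, 2048};
    std::cout << clampGpuLayers(100, limits) << "\n";      // 33
    std::cout << clampContextLength(4096, limits) << "\n"; // 2048
}
```
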
Jared Van Bortel
c7ea283f1f
chatllm: fix deserialization version mismatch (#1859)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-22 10:01:31 -05:00
Jared Van Bortel
d1c56b8b28
Implement configurable context length (#1749) 2023-12-16 17:58:15 -05:00
Jared Van Bortel
0600f551b3
chatllm: do not attempt to serialize incompatible state (#1742) 2023-12-12 11:45:03 -05:00
Adam Treat
fb3b1ceba2 Do not attempt to do a blocking retrieval if we don't have any collections. 2023-12-04 12:58:40 -05:00
Moritz Tim W
012f399639
fix typo (#1697) 2023-11-30 12:37:52 -05:00
Adam Treat
9e27a118ed Fix system prompt. 2023-11-21 10:42:12 -05:00
Adam Treat
5c0d077f74 Remove leading whitespace in responses. 2023-10-28 16:53:42 -04:00
Adam Treat
dc2e7d6e9b Don't start recalculating context immediately upon switching to a new chat
but rather wait until the first prompt. This allows users to switch between
chats quickly and to delete chats more easily.

Fixes issue #1545
2023-10-28 16:41:23 -04:00
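
A minimal sketch of deferring the context rebuild until the first prompt, as the commit above describes; the class, flag, and method names are made up for illustration and are not the actual ChatLLM implementation.

```cpp
#include <iostream>
#include <string>

// Illustrative only: defer restoring a chat's context until it is needed.
class ChatSession {
public:
    void switchTo() {
        // Switching chats is now cheap: just mark the context as stale.
        m_needsContextRestore = true;
    }

    void prompt(const std::string &text) {
        if (m_needsContextRestore) {
            restoreContext();            // pay the cost only on the first prompt
            m_needsContextRestore = false;
        }
        std::cout << "generating for: " << text << "\n";
    }

private:
    void restoreContext() {
        std::cout << "recalculating context...\n";
    }
    bool m_needsContextRestore = false;
};

int main() {
    ChatSession chat;
    chat.switchTo();      // fast, nothing recalculated yet
    chat.prompt("hello"); // context restored here, then generation runs
}
```
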
cebtenzzre
4338e72a51
MPT: use upstream llama.cpp implementation (#1515) 2023-10-19 15:25:17 -04:00
cebtenzzre
04499d1c7d
chatllm: do not write uninitialized data to stream (#1486) 2023-10-11 11:31:34 -04:00
Adam Treat
f0742c22f4 Restore state from text if necessary. 2023-10-11 09:16:02 -04:00
Adam Treat
b2cd3bdb3f Fix crasher with an empty string for prompt template. 2023-10-06 12:44:53 -04:00
Cebtenzzre
5fe685427a chat: clearer CPU fallback messages 2023-10-06 11:35:14 -04:00
Cebtenzzre
1534df3e9f backend: do not use Vulkan with non-LLaMA models 2023-10-05 18:16:19 -04:00
Cebtenzzre
672cb850f9 differentiate between init failure and unsupported models 2023-10-05 18:16:19 -04:00
Cebtenzzre
a5b93cf095 more accurate fallback descriptions 2023-10-05 18:16:19 -04:00
Cebtenzzre
75deee9adb chat: make sure to clear fallback reason on success 2023-10-05 18:16:19 -04:00
Cebtenzzre
2eb83b9f2a chat: report reason for fallback to CPU 2023-10-05 18:16:19 -04:00
Adam Treat
12f943e966 Fix regenerate button to be deterministic and bump the llama version to the latest we have for gguf. 2023-10-05 18:16:19 -04:00
Cebtenzzre
a49a1dcdf4 chatllm: grammar fix 2023-10-05 18:16:19 -04:00
Cebtenzzre
8f3abb37ca fix references to removed model types 2023-10-05 18:16:19 -04:00
Adam Treat
d90d003a1d Latest rebase on llama.cpp with gguf support. 2023-10-05 18:16:19 -04:00
Adam Treat
045f6e6cdc Link against ggml in bin so we can get the available devices without loading a model. 2023-09-15 14:45:25 -04:00
Adam Treat
aa33419c6e Fallback to CPU more robustly. 2023-09-14 16:53:11 -04:00
Adam Treat
3076e0bf26 Only show GPU when we're actually using it. 2023-09-14 09:59:19 -04:00
Adam Treat
1fa67a585c Report the actual device we're using. 2023-09-14 08:25:37 -04:00
Adam Treat
21a3244645 Fix a bug where we're not properly falling back to CPU. 2023-09-13 19:30:27 -04:00
Aaron Miller
6f038c136b init at most one vulkan device, submodule update
fixes issues with multiple instances of the same GPU
2023-09-13 12:49:53 -07:00
Adam Treat
891ddafc33 When device is Auto (the default) we will only consider discrete GPUs; otherwise fall back to CPU. 2023-09-13 11:59:36 -04:00
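
A sketch of that selection policy, assuming a hypothetical device descriptor; the real backend enumerates Vulkan devices, and the names here are purely illustrative.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Hypothetical device descriptor; the real backend queries the Vulkan layer.
struct Device {
    std::string name;
    bool isDiscreteGpu;
};

// "Auto" considers only discrete GPUs; anything else (or no match) uses the CPU.
std::string pickDevice(const std::string &setting, const std::vector<Device> &devices) {
    if (setting == "Auto") {
        for (const auto &d : devices)
            if (d.isDiscreteGpu)
                return d.name;
        return "CPU";
    }
    return setting.empty() ? "CPU" : setting;
}

int main() {
    std::vector<Device> devices{{"Integrated GPU", false}, {"Radeon RX 6800", true}};
    std::cout << pickDevice("Auto", devices) << "\n"; // Radeon RX 6800
}
```
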
Adam Treat
8f99dca70f Bring the vulkan backend to the GUI. 2023-09-13 11:26:10 -04:00
Adam Treat
987546c63b Nomic vulkan backend licensed under the Software for Open Models License (SOM), version 1.0. 2023-08-31 15:29:54 -04:00
Adam Treat
6d03b3e500 Add starcoder support. 2023-07-27 09:15:16 -04:00
Adam Treat
0efdbfcffe Bert 2023-07-13 14:21:46 -04:00
Adam Treat
315a1f2aa2 Move it back as internal class. 2023-07-13 14:21:46 -04:00
Adam Treat
1f749d7633 Clean up backend code a bit and hide impl. details. 2023-07-13 14:21:46 -04:00
Adam Treat
8eb0844277 Check if the trimmed version is empty. 2023-07-12 14:31:43 -04:00
Adam Treat
be395c12cc Make all system prompts empty by default if the model does not include one in its training data. 2023-07-12 14:31:43 -04:00
Adam Treat
34a3b9c857 Don't block on exit when not connected. 2023-07-11 12:37:21 -04:00
Adam Treat
88bbe30952 Provide a guardrail for OOM errors. 2023-07-11 12:09:33 -04:00
Adam Treat
99cd555743 Provide some guardrails for thread count. 2023-07-10 17:29:51 -04:00
Adam Treat
3e3b05a2a4 Don't process the system prompt when restoring state. 2023-07-10 16:20:19 -04:00
Adam Treat
12083fcdeb When deleting chats we sometimes have to update our modelinfo. 2023-07-09 14:52:08 -04:00
Adam Treat
59f3c093cb Stop generating anything on shutdown. 2023-07-09 14:42:11 -04:00
Adam Treat
6d9cdf228c Huge change that completely revamps the settings dialog and implements
per-model settings as well as the ability to clone a model into a "character."
This also implements system prompts as well as quite a few bugfixes; for
instance, this fixes chatgpt.
2023-07-05 15:51:42 -04:00
Adam Treat
7f252b4970 This completes the work of consolidating all settings that can be changed by the user on the new settings object. 2023-06-29 00:44:48 -03:00
Adam Treat
267601d670 Enable the force metal setting. 2023-06-27 14:23:56 -03:00
Aaron Miller
e22dd164d8 add falcon to chatllm::serialize 2023-06-27 14:06:39 -03:00
Aaron Miller
198b5e4832 add Falcon 7B model
Tested with https://huggingface.co/TheBloke/falcon-7b-instruct-GGML/blob/main/falcon7b-instruct.ggmlv3.q4_0.bin
2023-06-27 14:06:39 -03:00
Adam Treat
7f01b153b3 Modellist temp 2023-06-26 14:14:46 -04:00
Adam Treat
c8a590bc6f Get rid of last blocking operations and make the chat/llm thread safe. 2023-06-20 18:18:10 -03:00
Adam Treat
84ec4311e9 Remove duplicated state tracking for chatgpt. 2023-06-20 18:18:10 -03:00
Adam Treat
7d2ce06029 Start working on more thread safety and model load error handling. 2023-06-20 14:39:22 -03:00
Adam Treat
aa2c824258 Initialize these. 2023-06-19 15:38:01 -07:00
Adam Treat
a3a6a20146 Don't store db results in ChatLLM. 2023-06-19 15:38:01 -07:00
Adam Treat
0cfe225506 Remove this as unnecessary. 2023-06-19 15:38:01 -07:00
AT
2b6cc99a31
Show token generation speed in gui. (#1020) 2023-06-19 14:34:53 -04:00
AT
a576220b18
Support loading files if 'ggml' is found anywhere in the name, not just at (#1001)
the beginning, and add a deprecated flag to models.json so older versions will
show a model but later versions don't. This will allow us to transition
away from models < ggmlv2 and still allow older installs of gpt4all to work.
2023-06-16 11:09:33 -04:00
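
The name check described above amounts to a substring match rather than a prefix match. A small standalone sketch follows, with an assumed `deprecated` field standing in for the models.json flag; neither name is taken from the actual schema.

```cpp
#include <iostream>
#include <string>

// Accept "ggml" anywhere in the filename, not only as a prefix.
bool looksLikeGgmlFile(const std::string &filename) {
    return filename.find("ggml") != std::string::npos;
}

// Hypothetical model entry: a "deprecated" flag lets newer releases hide it
// while older releases, which ignore unknown fields, still list it.
struct ModelEntry {
    std::string filename;
    bool deprecated = false;
};

int main() {
    std::cout << looksLikeGgmlFile("nous-hermes-13b.ggmlv3.q4_0.bin") << "\n"; // 1
    std::cout << looksLikeGgmlFile("model.gguf") << "\n";                      // 0
    ModelEntry old{"ggml-gpt4all-j-v1.3-groovy.bin", true};
    std::cout << (old.deprecated ? "hidden in newer versions" : "shown") << "\n";
}
```
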
Richard Guo
c4706d0c14
Replit Model (#713)
* porting over replit code model to gpt4all

* replaced memory with kv_self struct

* continuing debug

* welp it built but lot of sus things

* working model loading and somewhat working generate.. need to format response?

* revert back to semi working version

* finally got rid of weird formatting

* figured out problem is with python bindings - this is good to go for testing

* addressing PR feedback

* output refactor

* fixed prompt response collection

* cleanup

* addressing PR comments

* building replit backend with new ggmlver code

* chatllm replit and clean python files

* cleanup

* updated replit to match new llmodel api

* match llmodel api and change size_t to Token

* resolve PR comments

* replit model commit comment
2023-06-06 17:09:00 -04:00
Andriy Mulyar
d8e821134e
Revert "Fix bug with resetting context with chatgpt model." (#859)
This reverts commit 031d7149a7.
2023-06-05 14:25:37 -04:00
Adam Treat
9f590db98d Better error handling when the model fails to load. 2023-06-04 14:55:05 -04:00
niansa/tuxifan
f3564ac6b9
Fixed tons of warnings and clazy findings (#811) 2023-06-02 15:46:41 -04:00
Adam Treat
031d7149a7 Fix bug with resetting context with chatgpt model. 2023-06-01 17:34:13 -04:00
Adam Treat
aea94f756d Better name for database results. 2023-06-01 17:14:17 -04:00
Adam Treat
f62e439a2d Make localdocs work with server mode. 2023-06-01 17:14:17 -04:00
Adam Treat
f74363bb3a Fix compile 2023-06-01 10:58:31 -04:00
niansa
a3d08cdcd5 Dlopen better implementation management (Version 2) 2023-06-01 07:44:15 -04:00
AT
48275d0dcc
Dlopen backend 5 (#779)
Major change to the backend that allows for pluggable versions of llama.cpp/ggml. This was squash-merged from dlopen_backend_5, where the history is preserved.
2023-05-31 17:04:01 -04:00
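
A minimal sketch of the dlopen approach: load a backend library at runtime and resolve an entry point, so multiple llama.cpp/ggml builds can coexist in one install. The library path and exported symbol below are invented for illustration and are not the real llmodel interface.

```cpp
#include <dlfcn.h>
#include <iostream>

// Illustrative entry point a backend library might export (hypothetical name).
using create_model_fn = void *(*)();

int main() {
    // Hypothetical library name; the real backend ships several build variants.
    void *handle = dlopen("./libllamamodel-mainline.so", RTLD_LAZY | RTLD_LOCAL);
    if (!handle) {
        std::cerr << "dlopen failed: " << dlerror() << "\n";
        return 1;
    }
    auto create = reinterpret_cast<create_model_fn>(dlsym(handle, "create_model"));
    if (!create) {
        std::cerr << "symbol not found: " << dlerror() << "\n";
        dlclose(handle);
        return 1;
    }
    void *model = create(); // construct the backend-specific model object
    std::cout << "backend loaded, model handle = " << model << "\n";
    dlclose(handle);
}
```
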
Adam Treat
912cb2a842 Get rid of blocking behavior for regenerate response. 2023-05-30 18:17:59 -04:00
Adam Treat
c800291e7f Add prompt processing and localdocs to the busy indicator in UI. 2023-05-25 11:28:06 -04:00
Adam Treat
618895f0a1 Turn off the debugging messages by default. 2023-05-25 11:28:06 -04:00
Adam Treat
7e42af5f33 localdocs 2023-05-25 11:28:06 -04:00
Adam Treat
748e7977ca Generate the new prompt/response pair before model loading in server mode. 2023-05-16 10:31:55 -04:00
Adam Treat
f931de21c5 Add save/restore to chatgpt chats and allow serialize/deserialize from disk. 2023-05-16 10:31:55 -04:00
Adam Treat
0cd509d530 Add large network icon background for chatgpt and server modes. 2023-05-16 10:31:55 -04:00
Adam Treat
dd27c10f54 Preliminary support for chatgpt models. 2023-05-16 10:31:55 -04:00
Adam Treat
b71c0ac3bd The server has different lifetime mgmt than the other chats. 2023-05-13 19:34:54 -04:00
Adam Treat
ddc24acf33 Much better memory mgmt for multi-threaded model loading/unloading. 2023-05-13 19:10:56 -04:00
Adam Treat
2989b74d43 httpserver 2023-05-13 19:07:06 -04:00
Adam Treat
76675536b0 Cleanup the chatllm properly. 2023-05-12 17:11:52 -04:00
Adam Treat
d918b02c29 Move the llmodel C API to new top-level directory and version it. 2023-05-10 11:46:40 -04:00
Adam Treat
6015154bef Moving everything to subdir for monorepo merge. 2023-05-10 10:26:55 -04:00