expose n_gpu_layers parameter of llama.cpp (#1890)

Also dynamically limit the GPU layers and context length fields to the maximums supported by the model.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
Author: Jared Van Bortel
Date:   2024-01-31 14:17:44 -05:00 (committed by GitHub)
Commit: 061d1969f8 (parent f549d5a70a)

31 changed files with 381 additions and 157 deletions
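
As a sketch of the "dynamically limit" behavior described above: the settings fields can be clamped to the maxima the loaded model reports. The C++ below is illustrative only; ModelInfo, clampGpuLayers, and clampContextLength are hypothetical names, not the actual GPT4All API.

#include <algorithm>
#include <cstdint>

// Hypothetical model metadata; in practice these values would come from
// the loaded model (e.g. its layer count and trained context size).
struct ModelInfo {
    int32_t maxGpuLayers;      // total layers the model has
    int32_t maxContextLength;  // context length the model was trained with
};

// Clamp the user-facing GPU layers field to what the model supports.
static int32_t clampGpuLayers(int32_t requested, const ModelInfo &info) {
    // A common convention in llama.cpp bindings is that a negative value
    // means "offload everything", so treat it as the maximum.
    if (requested < 0)
        return info.maxGpuLayers;
    return std::min(requested, info.maxGpuLayers);
}

// Clamp the user-facing context length field to the model's maximum.
static int32_t clampContextLength(int32_t requested, const ModelInfo &info) {
    return std::clamp(requested, 1, info.maxContextLength);
}

Because negative values are conventionally a sentinel rather than an error, the clamp has to special-case them instead of rejecting them outright.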

@@ -63,5 +63,9 @@ int main(int argc, char *argv[])
     }
 #endif
 
+    // Make sure ChatLLM threads are joined before global destructors run.
+    // Otherwise, we can get a heap-use-after-free inside of llama.cpp.
+    ChatListModel::globalInstance()->clearChats();
+
     return app.exec();
 }
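
The two-line comment in the hunk compresses a real lifetime hazard. Here is a minimal, self-contained sketch of it (assumed for illustration, not GPT4All code): a worker thread still running when main() returns can touch a global object after its destructor has already fired.

#include <chrono>
#include <thread>
#include <vector>

// Global with a heap-owning destructor, standing in for state that
// llama.cpp keeps alive for the lifetime of a chat model.
static std::vector<int> g_modelState(1024, 0);

int main() {
    std::thread worker([] {
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
        g_modelState[0] = 1; // may write into a buffer already freed by ~vector
    });
    worker.detach(); // BUG: never joined, so it can outlive main()

    // The fix is to join (or otherwise stop) the thread before returning,
    // which is what clearChats() accomplishes for the ChatLLM worker threads.
    return 0; // global destructors run after this while `worker` may still be live
}

Under AddressSanitizer this pattern can surface as exactly the heap-use-after-free the comment names; joining the worker before main() returns removes the race.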