* Adds support for GPT-NeoX, Gemma 2, OpenELM, ChatGLM, and Jais architectures (all with Kompute support)
* Also enables Kompute support for StarCoder2, XVERSE, Command R, and OLMo
* Includes a number of Kompute resource management fixes
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
This fixes a regression in commit 4fc4d94b ("fix chat-style prompt
templates (#1970)"), which moved some return statements into a new
function (LLModel::decodePrompt) without making them return from the
parent as well.
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
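A minimal sketch of the bug pattern described above, assuming a simplified, hypothetical `decodePrompt` that reports "stop here" through a `bool`; the real `LLModel::decodePrompt` signature is not shown in this log and may differ. A `return` inside a helper only leaves the helper, so the parent has to check the result and return itself.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for LLModel::decodePrompt: returns false when the
// caller should stop (here: the prompt does not fit in the context window).
bool decodePrompt(const std::vector<int> &tokens, int nCtx) {
    if (static_cast<int>(tokens.size()) > nCtx)
        return false; // before the refactor, a `return` here exited the parent directly
    // ... evaluate the prompt tokens ...
    return true;
}

// Hypothetical parent. Without the check below, a failed decode would fall
// through and start generating anyway, which is the regression being fixed.
std::string prompt(const std::vector<int> &tokens, int nCtx) {
    if (!decodePrompt(tokens, nCtx))
        return {};               // propagate the early exit to the parent
    return "<generated text>";   // placeholder for the generation step
}
```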
* chat: remove unused oscompat source files
These files are no longer needed now that the hnswlib index is gone.
This fixes an issue with the Windows build as there was a compilation
error in oscompat.cpp.
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* llm: fix pragma to be recognized by MSVC
Replaces this MSVC warning:
C:\msys64\home\Jared\gpt4all\gpt4all-chat\llm.cpp(53,21): warning C4081: expected '('; found 'string'
With this:
C:\msys64\home\Jared\gpt4all\gpt4all-chat\llm.cpp : warning : offline installer build will not check for updates!
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* usearch: fork usearch to fix `CreateFile` build error
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* dlhandle: fix incorrect assertion on Windows
SetErrorMode returns the previous value of the error mode flags, not an
indicator of success.
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* llamamodel: fix UB in LLamaModel::embedInternal
It is undefined behavior to increment an STL iterator past the end of
the container. Use offsets to do the math instead.
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* cmake: install embedding model to bundle's Resources dir on macOS
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* ci: fix macOS build by explicitly installing Rosetta
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
---------
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
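The "use offsets instead of iterators" fix for `LLamaModel::embedInternal` can be sketched with a hypothetical `chunkTokens` helper (the actual embedding code is not shown here): clamp the end *offset* first, then form iterators, so no iterator is ever advanced beyond `end()`.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Splits `tokens` into consecutive chunks of at most `chunkSize` elements.
// The UB-prone version would be:
//     auto last = first + chunkSize;              // may already be past end()
//     if (last > tokens.end()) last = tokens.end();
// Doing the arithmetic on indices keeps every iterator within [begin, end].
std::vector<std::vector<int>> chunkTokens(const std::vector<int> &tokens, std::size_t chunkSize) {
    std::vector<std::vector<int>> chunks;
    if (chunkSize == 0)
        return chunks; // avoid an infinite loop
    for (std::size_t start = 0; start < tokens.size(); start += chunkSize) {
        std::size_t end = std::min(start + chunkSize, tokens.size());
        auto first = tokens.begin() + static_cast<std::ptrdiff_t>(start);
        auto last = tokens.begin() + static_cast<std::ptrdiff_t>(end);
        chunks.emplace_back(first, last);
    }
    return chunks;
}
```

For example, five tokens with `chunkSize == 2` yield chunks of sizes 2, 2 and 1.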
As discussed on Discord, this PR was not ready to be merged. CI fails on
it.
This reverts commit a602f7fde7.
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* remove outdated comments
Signed-off-by: limez <limez@protonmail.com>
* simpler build from source
Signed-off-by: limez <limez@protonmail.com>
* update unix build script to create .so runtimes correctly
Signed-off-by: limez <limez@protonmail.com>
* configure ci build type, use RelWithDebInfo for dev build script
Signed-off-by: limez <limez@protonmail.com>
* add clean script
Signed-off-by: limez <limez@protonmail.com>
* fix streamed token decoding / emoji
Signed-off-by: limez <limez@protonmail.com>
* remove deprecated nCtx
Signed-off-by: limez <limez@protonmail.com>
* update typings
Signed-off-by: jacob <jacoobes@sern.dev>
update typings
Signed-off-by: jacob <jacoobes@sern.dev>
* readme,mspell
Signed-off-by: jacob <jacoobes@sern.dev>
* cuda/backend logic changes + name napi methods like their js counterparts
Signed-off-by: limez <limez@protonmail.com>
* convert llmodel example into a test, separate test suite that can run in ci
Signed-off-by: limez <limez@protonmail.com>
* update examples / naming
Signed-off-by: limez <limez@protonmail.com>
* update deps, remove the need for binding.ci.gyp, make node-gyp-build fallback more easily testable
Signed-off-by: limez <limez@protonmail.com>
* make sure the assert-backend-sources.js script is published, but not the others
Signed-off-by: limez <limez@protonmail.com>
* build correctly on windows (regression on node-gyp-build)
Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com>
* codespell
Signed-off-by: limez <limez@protonmail.com>
* make sure dlhandle.cpp gets linked correctly
Signed-off-by: limez <limez@protonmail.com>
* add include for check_cxx_compiler_flag call during aarch64 builds
Signed-off-by: limez <limez@protonmail.com>
* x86 > arm64 cross compilation of runtimes and bindings
Signed-off-by: limez <limez@protonmail.com>
* default to cpu instead of kompute on arm64
Signed-off-by: limez <limez@protonmail.com>
* formatting, more minimal example
Signed-off-by: limez <limez@protonmail.com>
---------
Signed-off-by: limez <limez@protonmail.com>
Signed-off-by: jacob <jacoobes@sern.dev>
Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com>
Co-authored-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com>
Co-authored-by: jacob <jacoobes@sern.dev>
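The "fix streamed token decoding / emoji" item concerns the general problem that a multi-byte UTF-8 character (such as an emoji) can be split across several tokens, so emitting each token's bytes as soon as it arrives produces broken characters. The actual fix lives in the Node.js bindings and is not shown here; the following is only a C++ sketch of the general technique, using hypothetical names: buffer the trailing bytes of an incomplete sequence and release them once the character is complete.

```cpp
#include <cstddef>
#include <string>

// Length of the UTF-8 sequence introduced by lead byte `c`,
// or 0 for a continuation byte (10xxxxxx) or an invalid lead byte.
static std::size_t seqLen(unsigned char c) {
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 0;
}

// Length of the longest prefix of `s` that ends on a character boundary.
static std::size_t completePrefix(const std::string &s) {
    std::size_t n = s.size();
    // The final sequence starts at most 4 bytes from the end.
    for (std::size_t back = 1; back <= 4 && back <= n; ++back) {
        std::size_t len = seqLen(static_cast<unsigned char>(s[n - back]));
        if (len == 0) continue;              // continuation byte: keep looking
        return back >= len ? n : n - back;   // hold back an incomplete sequence
    }
    return n; // no lead byte nearby (invalid data): pass it through
}

// Hypothetical streaming decoder fed with raw token pieces.
class Utf8StreamDecoder {
public:
    // Appends a token piece; returns only text that ends on a character boundary.
    std::string push(const std::string &piece) {
        pending_ += piece;
        std::size_t cut = completePrefix(pending_);
        std::string ready = pending_.substr(0, cut);
        pending_.erase(0, cut);
        return ready;
    }
    // Returns whatever is still buffered, e.g. when generation ends.
    std::string flush() {
        std::string rest;
        rest.swap(pending_);
        return rest;
    }
private:
    std::string pending_;
};
```

Feeding the 4-byte emoji U+1F600 in two halves returns an empty string for the first half and the full character once the second half arrives.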
* backend: refactor dlhandle.h into oscompat.{cpp,h}
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* llmodel: alias std::filesystem
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* llmodel: use wide strings for paths on Windows
Using the native path representation allows us to manipulate paths and
call LoadLibraryEx without mangling non-ASCII characters.
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* llmodel: prefer built-in std::filesystem functionality
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* oscompat: fix string type error
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* backend: rename oscompat back to dlhandle
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* dlhandle: fix #includes
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* dlhandle: remove another #include
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* dlhandle: move dlhandle #include
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* dlhandle: remove #includes that are covered by dlhandle.h
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* llmodel: fix #include order
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
---------
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
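A sketch of the "use wide strings for paths on Windows" idea, assuming hypothetical helpers `libraryPath` and `openLibrary`; the flags passed to `LoadLibraryExW` are real Win32 flags, but which ones GPT4All actually uses is not stated here. Staying in `std::filesystem::path`'s native representation avoids the lossy narrow conversion that `path::string()` performs on Windows.

```cpp
#include <filesystem>
#include <string>
#ifdef _WIN32
#  include <windows.h>
#endif

namespace fs = std::filesystem;

// Joins a directory and file name without leaving the native representation
// (std::wstring on Windows, std::string on POSIX).
fs::path::string_type libraryPath(const fs::path &dir, const fs::path &file) {
    return (dir / file).native();
}

#ifdef _WIN32
// On Windows, p.c_str() is a const wchar_t*, so it goes straight to the
// wide-character API with no conversion that could mangle non-ASCII names.
// LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR requires `p` to be an absolute path.
HMODULE openLibrary(const fs::path &p) {
    return LoadLibraryExW(p.c_str(), nullptr,
                          LOAD_LIBRARY_SEARCH_DEFAULT_DIRS | LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR);
}
#endif
```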
* rebase onto llama.cpp commit ggerganov/llama.cpp@d46dbc76f
* support for CUDA backend (enabled by default)
* partial support for Occam's Vulkan backend (disabled by default)
* partial support for HIP/ROCm backend (disabled by default)
* sync llama.cpp.cmake with upstream llama.cpp CMakeLists.txt
* changes to GPT4All backend, bindings, and chat UI to handle choice of llama.cpp backend (Kompute or CUDA)
* ship CUDA runtime with installed version
* make device selection in the UI on macOS actually do something
* model whitelist: remove dbrx, mamba, persimmon, plamo; add internlm and starcoder2
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
Other changes:
- Always display the first start dialog if privacy options are unset (e.g. if the user closed GPT4All without selecting them)
- LocalDocs scanQueue is now always deferred
- Fix a potential crash in magic_match
- LocalDocs indexing is now started after the first start dialog is dismissed so usage stats are included
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* llamamodel: only print device used in verbose mode
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* python: expose backend and device via GPT4All properties
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* backend: const correctness fixes
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* python: bump version
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* python: typing fixups
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* python: fix segfault with closed GPT4All
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
---------
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* actually submit larger batches with increased n_ctx
* fix crash when llama_tokenize returns no tokens
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
Other changes:
* fix memory leak in llmodel_available_gpu_devices
* drop model argument from llmodel_available_gpu_devices
* breaking: make GPT4All/Embed4All arguments past model_name keyword-only
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* chat: fix non-AVX CPU detection on Windows
* bindings: throw exception instead of logging to console
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
Key changes:
* honor empty system prompt argument
* current_chat_session is now read-only and defaults to None
* deprecate fallback prompt template for unknown models
* fix mistakes from #2086
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
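"Honor empty system prompt argument" hinges on distinguishing "no system prompt given" from "an explicitly empty system prompt". The Python change itself is not shown here; as an assumption about the bug's shape, this C++ sketch uses `std::optional` to model the same None-versus-empty distinction, with a hypothetical default prompt.

```cpp
#include <optional>
#include <string>

// Hypothetical default used only when the caller supplies no system prompt.
const std::string kDefaultSystemPrompt = "You are a helpful assistant.";

// Buggy pattern: an explicitly empty prompt is indistinguishable from "not given".
std::string pickSystemPromptBuggy(const std::string &arg) {
    return arg.empty() ? kDefaultSystemPrompt : arg;
}

// Fixed pattern: std::nullopt means "use the default", while an empty string
// is honored as "no system prompt at all".
std::string pickSystemPrompt(const std::optional<std::string> &arg) {
    return arg ? *arg : kDefaultSystemPrompt;
}
```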
These are Baichuan, Bert and Nomic Bert, CodeShell, GPT-2, InternLM,
MiniCPM, Orion, Qwen, and StarCoder.
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
Also dynamically limit the GPU layers and context length fields to the maximum supported by the model.
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
This fixes some issues that were seen on installed Windows builds of 2.5.0.
Only load DLLs that might actually be model impl DLLs; otherwise we pull all sorts of random junk into the process earlier than it may expect.
Signed-off-by: Aaron Miller <apage43@ninjawhale.com>
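The filtering step described above can be sketched as a pure function over directory entries. The prefix and suffix below are hypothetical; this log does not say which naming scheme identifies model implementation DLLs.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical naming scheme for model implementation libraries.
constexpr std::string_view kImplPrefix = "llamamodel-";
constexpr std::string_view kImplSuffix = ".dll";

bool startsWith(std::string_view s, std::string_view p) {
    return s.size() >= p.size() && s.substr(0, p.size()) == p;
}

bool endsWith(std::string_view s, std::string_view x) {
    return s.size() >= x.size() && s.substr(s.size() - x.size()) == x;
}

// Keeps only file names that could be model implementation libraries, so
// unrelated DLLs in the same directory are never loaded into the process.
std::vector<std::string> candidateImpls(const std::vector<std::string> &files) {
    std::vector<std::string> out;
    for (const auto &f : files)
        if (startsWith(f, kImplPrefix) && endsWith(f, kImplSuffix))
            out.push_back(f);
    return out;
}
```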
So far, my from-scratch mat*mat shaders are still slower than just running
more invocations of the existing Metal-ported mat*vec shaders. It should
theoretically be possible to write a mat*mat that is faster (for actual
mat*mat cases) than an optimal mat*vec, but it will need to be at *least*
as fast as the mat*vec op and then take special care to be cache-friendly
and save memory bandwidth, as the number of compute ops is the same.
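Why the number of compute ops is the same: an M×K by K×N product needs M·K·N multiply-adds whether it runs as one mat*mat or as N mat*vec invocations (one per column of the right-hand matrix). This CPU sketch uses hypothetical helper names and is not the Metal or Kompute shader code; it counts multiply-adds for both approaches, so any speedup from a dedicated mat*mat kernel has to come from better memory access, not from less arithmetic.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>; // row-major, rows of equal length

// y = A * x, incrementing `ops` once per multiply-add.
std::vector<float> matVec(const Matrix &A, const std::vector<float> &x, std::size_t &ops) {
    std::vector<float> y(A.size(), 0.0f);
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t k = 0; k < x.size(); ++k) {
            y[i] += A[i][k] * x[k];
            ++ops;
        }
    return y;
}

// C = A * B as one mat*vec invocation per column of B.
Matrix matMatViaVec(const Matrix &A, const Matrix &B, std::size_t &ops) {
    std::size_t K = B.size(), N = K ? B[0].size() : 0;
    Matrix C(A.size(), std::vector<float>(N, 0.0f));
    for (std::size_t j = 0; j < N; ++j) {
        std::vector<float> col(K);
        for (std::size_t k = 0; k < K; ++k) col[k] = B[k][j];
        std::vector<float> y = matVec(A, col, ops);
        for (std::size_t i = 0; i < A.size(); ++i) C[i][j] = y[i];
    }
    return C;
}

// C = A * B with a direct triple loop.
Matrix matMat(const Matrix &A, const Matrix &B, std::size_t &ops) {
    std::size_t K = B.size(), N = K ? B[0].size() : 0;
    Matrix C(A.size(), std::vector<float>(N, 0.0f));
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < N; ++j)
            for (std::size_t k = 0; k < K; ++k) {
                C[i][j] += A[i][k] * B[k][j];
                ++ops;
            }
    return C;
}
```

For a 2×3 by 3×2 product, both functions perform 2·3·2 = 12 multiply-adds and return the same matrix.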