Commit Graph

318 Commits

Author SHA1 Message Date
AT
8c834a5177
Update llama.cpp to include upstream Llama 3.1 RoPE fix. (#2758)
Signed-off-by: Adam Treat <treat.adam@gmail.com>
2024-07-27 14:14:19 -04:00
Jared Van Bortel
2a7fe95ff4
llamamodel: always print special tokens (#2701)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-07-22 13:32:17 -04:00
Jared Van Bortel
4ca1d0411f
llamamodel: add DeepSeek-V2 to whitelist (#2702)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-07-22 13:32:04 -04:00
Jared Van Bortel
290c629442
backend: rebase llama.cpp submodule on latest upstream (#2694)
* Adds support for GPT-NeoX, Gemma 2, OpenELM, ChatGLM, and Jais architectures (all with Kompute support)
* Also enables Kompute support for StarCoder2, XVERSE, Command R, and OLMo
* Includes a number of Kompute resource management fixes

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-07-19 14:52:58 -04:00
AT
ca72428783
Remove support for GPT-J models. (#2676)
Signed-off-by: Adam Treat <treat.adam@gmail.com>
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
Co-authored-by: Jared Van Bortel <jared@nomic.ai>
2024-07-17 16:07:37 -04:00
Jared Van Bortel
6cb3ddafd6
llama.cpp: update submodule for CPU fallback fix (#2640)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-07-10 17:56:19 -04:00
Jared Van Bortel
bd307abfe6
backend: fix a crash on inputs greater than n_ctx (#2498)
This fixes a regression in commit 4fc4d94b ("fix chat-style prompt
templates (#1970)"), which moved some return statements into a new
function (LLModel::decodePrompt) without making them return from the
parent as well.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-07-01 11:33:46 -04:00
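
For readers unfamiliar with this bug class, a minimal compilable sketch (hypothetical names, not the actual GPT4All code) of how an early return gets lost when it moves into a helper:

    #include <cstdio>

    // The early returns moved into a helper, but the caller did not
    // propagate them, so generation ran even when decoding bailed out.
    static bool decodePrompt(int nTokens, int nCtx) {
        if (nTokens > nCtx) {
            std::fprintf(stderr, "prompt exceeds n_ctx, aborting\n");
            return false;  // used to be a return that ended the whole call
        }
        return true;
    }

    static void prompt(int nTokens, int nCtx) {
        if (!decodePrompt(nTokens, nCtx))
            return;  // the fix: propagate the helper's early return
        std::puts("generating response...");
    }

    int main() { prompt(4096, 2048); }
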
Jared Van Bortel
01870b4a46
chat: fix blank device in UI and improve Mixpanel reporting (#2409)
Also remove LLModel::hasGPUDevice.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-06-26 15:26:27 -04:00
Jared Van Bortel
da1823ed7a
cmake: fix CMAKE_CUDA_ARCHITECTURES default (#2421)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-06-26 14:48:18 -04:00
Jared Van Bortel
88d85be0f9
chat: fix build on Windows and Nomic Embed path on macOS (#2467)
* chat: remove unused oscompat source files

These files are no longer needed now that the hnswlib index is gone.
This fixes an issue with the Windows build as there was a compilation
error in oscompat.cpp.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

* llm: fix pragma to be recognized by MSVC

Replaces this MSVC warning:
C:\msys64\home\Jared\gpt4all\gpt4all-chat\llm.cpp(53,21): warning C4081: expected '('; found 'string'

With this:
C:\msys64\home\Jared\gpt4all\gpt4all-chat\llm.cpp : warning : offline installer build will not check for updates!

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
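
For background: MSVC historically has no #warning directive, so the usual workaround is #pragma message with the text formatted like a compiler diagnostic, which matches the replacement output above. A hedged sketch of the portable pattern (not necessarily the exact llm.cpp code):

    // MSVC prints the #pragma message string verbatim; shaping it like
    // "<file> : warning : <text>" makes build tools report it as a warning.
    #ifdef _MSC_VER
    #  pragma message(__FILE__ " : warning : offline installer build will not check for updates!")
    #else
    #  warning "offline installer build will not check for updates!"
    #endif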

* usearch: fork usearch to fix `CreateFile` build error

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

* dlhandle: fix incorrect assertion on Windows

SetErrorMode returns the previous value of the error mode flags, not an
indicator of success.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
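
A minimal sketch of the corrected usage, assuming the usual save-and-restore idiom (illustrative, not the exact dlhandle code):

    #include <windows.h>

    int main() {
        // WRONG: asserting on the return value. SetErrorMode returns the
        // *previous* error-mode flags, which may legitimately be 0, so a
        // nonzero check is not a success test.
        // Right: keep the previous flags so they can be restored.
        UINT prev = SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOOPENFILEERRORBOX);
        // ... probe candidate DLLs without system error dialogs ...
        SetErrorMode(prev);  // restore the caller's error mode
    }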

* llamamodel: fix UB in LLamaModel::embedInternal

It is undefined behavior to increment an STL iterator past the end of
the container. Use offsets to do the math instead.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
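
A minimal sketch of the offset-based pattern the message describes (illustrative, not the actual embedInternal code):

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    int main() {
        std::vector<int> tokens{1, 2, 3};
        const std::size_t chunk = 8;

        // UB: tokens.begin() + chunk would move an iterator past end(),
        // which is undefined even without dereferencing it.
        // Safe: do the arithmetic on offsets, clamp, then form iterators.
        for (std::size_t pos = 0; pos < tokens.size();) {
            std::size_t n = std::min(chunk, tokens.size() - pos);
            auto first = tokens.begin() + pos;  // stays within [begin, end]
            auto last = first + n;
            // ... process the batch [first, last) ...
            pos += n;
        }
    }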

* cmake: install embedding model to bundle's Resources dir on macOS

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

* ci: fix macOS build by explicitly installing Rosetta

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

---------

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-06-25 17:22:51 -04:00
AT
9273b49b62
chat: major UI redesign for v3.0.0 (#2396)
Signed-off-by: Adam Treat <treat.adam@gmail.com>
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
Co-authored-by: Jared Van Bortel <jared@nomic.ai>
2024-06-24 18:49:23 -04:00
Jared Van Bortel
55d709862f Revert "typescript bindings maintenance (#2363)"
As discussed on Discord, this PR was not ready to be merged. CI fails on
it.

This reverts commit a602f7fde7.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-06-03 17:26:19 -04:00
Andreas Obersteiner
a602f7fde7
typescript bindings maintenance (#2363)
* remove outdated comments

Signed-off-by: limez <limez@protonmail.com>

* simpler build from source

Signed-off-by: limez <limez@protonmail.com>

* update unix build script to create .so runtimes correctly

Signed-off-by: limez <limez@protonmail.com>

* configure ci build type, use RelWithDebInfo for dev build script

Signed-off-by: limez <limez@protonmail.com>

* add clean script

Signed-off-by: limez <limez@protonmail.com>

* fix streamed token decoding / emoji

Signed-off-by: limez <limez@protonmail.com>

* remove deprecated nCtx

Signed-off-by: limez <limez@protonmail.com>

* update typings

Signed-off-by: jacob <jacoobes@sern.dev>

* readme,mspell

Signed-off-by: jacob <jacoobes@sern.dev>

* cuda/backend logic changes + name napi methods like their js counterparts

Signed-off-by: limez <limez@protonmail.com>

* convert llmodel example into a test, separate test suite that can run in ci

Signed-off-by: limez <limez@protonmail.com>

* update examples / naming

Signed-off-by: limez <limez@protonmail.com>

* update deps, remove the need for binding.ci.gyp, make node-gyp-build fallback easier testable

Signed-off-by: limez <limez@protonmail.com>

* make sure the assert-backend-sources.js script is published, but not the others

Signed-off-by: limez <limez@protonmail.com>

* build correctly on windows (regression on node-gyp-build)

Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com>

* codespell

Signed-off-by: limez <limez@protonmail.com>

* make sure dlhandle.cpp gets linked correctly

Signed-off-by: limez <limez@protonmail.com>

* add include for check_cxx_compiler_flag call during aarch64 builds

Signed-off-by: limez <limez@protonmail.com>

* x86 > arm64 cross compilation of runtimes and bindings

Signed-off-by: limez <limez@protonmail.com>

* default to cpu instead of kompute on arm64

Signed-off-by: limez <limez@protonmail.com>

* formatting, more minimal example

Signed-off-by: limez <limez@protonmail.com>

---------

Signed-off-by: limez <limez@protonmail.com>
Signed-off-by: jacob <jacoobes@sern.dev>
Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com>
Co-authored-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com>
Co-authored-by: jacob <jacoobes@sern.dev>
2024-06-03 11:12:55 -05:00
Jared Van Bortel
636307160e
backend: fix #includes with include-what-you-use (#2371)
Also fix a PARENT_SCOPE warning when building the backend.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-05-31 16:34:54 -04:00
Jared Van Bortel
8ba7ef4832
dlhandle: suppress DLL errors on Windows (#2389)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-05-31 16:33:40 -04:00
Jared Van Bortel
4e89a9c44f
backend: support non-ASCII characters in path to llmodel libs on Windows (#2388)
* backend: refactor dlhandle.h into oscompat.{cpp,h}

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

* llmodel: alias std::filesystem

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

* llmodel: use wide strings for paths on Windows

Using the native path representation allows us to manipulate paths and
call LoadLibraryEx without mangling non-ASCII characters.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
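
A Windows-only sketch of the idea (hypothetical helper, not the exact llmodel code): std::filesystem::path stores wchar_t natively on Windows, so the path can be manipulated and handed to the wide LoadLibraryExW with no lossy narrow-string round trip:

    #include <filesystem>
    #include <string>
    #include <windows.h>

    namespace fs = std::filesystem;

    HMODULE loadImplLib(const fs::path &dir, const std::wstring &name) {
        fs::path full = dir / (name + L".dll");  // wide path math, no mangling
        return LoadLibraryExW(full.c_str(), nullptr,
                              LOAD_LIBRARY_SEARCH_DEFAULT_DIRS);
    }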

* llmodel: prefer built-in std::filesystem functionality

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

* oscompat: fix string type error

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

* backend: rename oscompat back to dlhandle

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

* dlhandle: fix #includes

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

* dlhandle: remove another #include

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

* dlhandle: move dlhandle #include

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

* dlhandle: remove #includes that are covered by dlhandle.h

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

* llmodel: fix #include order

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

---------

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-05-31 13:12:28 -04:00
Jared Van Bortel
e94177ee9a
llamamodel: fix embedding crash for >512 tokens after #2310 (#2383)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-05-29 10:51:00 -04:00
Jared Van Bortel
f047f383d0
llama.cpp: update submodule for "code" model crash workaround (#2382)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-05-29 10:50:00 -04:00
Jared Van Bortel
f1b4092ca6
llamamodel: fix BERT tokenization after llama.cpp update (#2381)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-05-28 13:11:57 -04:00
Jared Van Bortel
c779d8a32d
python: init_gpu fixes (#2368)
* python: tweak GPU init failure message
* llama.cpp: update submodule for use-after-free fix

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-05-20 18:04:11 -04:00
Jared Van Bortel
2025d2d15b
llmodel: add CUDA to the DLL search path if CUDA_PATH is set (#2357)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-05-16 17:39:49 -04:00
Jared Van Bortel
a92d266cea
cmake: fix Metal build after #2310 (#2350)
I don't understand why this is needed, but it works.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-05-15 18:12:32 -04:00
Jared Van Bortel
d2a99d9bc6
support the llama.cpp CUDA backend (#2310)
* rebase onto llama.cpp commit ggerganov/llama.cpp@d46dbc76f
* support for CUDA backend (enabled by default)
* partial support for Occam's Vulkan backend (disabled by default)
* partial support for HIP/ROCm backend (disabled by default)
* sync llama.cpp.cmake with upstream llama.cpp CMakeLists.txt
* changes to GPT4All backend, bindings, and chat UI to handle choice of llama.cpp backend (Kompute or CUDA)
* ship CUDA runtime with installed version
* make device selection in the UI on macOS actually do something
* model whitelist: remove dbrx, mamba, persimmon, plamo; add internlm and starcoder2

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-05-15 15:27:50 -04:00
Jared Van Bortel
9f9d8e636f
backend: do not crash if GGUF lacks general.architecture (#2346)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-05-15 13:57:13 -04:00
Jared Van Bortel
6d8888b267
llamamodel: free the batch in embedInternal (#2348)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-05-15 12:46:12 -04:00
Jared Van Bortel
577ebd4826
mixpanel: report cpu_supports_avx2 on startup (#2299)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-05-02 16:09:41 -04:00
Jared Van Bortel
adaecb7a72
mixpanel: improved GPU device statistics (plus GPU sort order fix) (#2297)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-05-01 16:15:48 -04:00
Jared Van Bortel
c622921894
improve mixpanel usage statistics (#2238)
Other changes:
- Always display first start dialog if privacy options are unset (e.g. if the user closed GPT4All without selecting them)
- LocalDocs scanQueue is now always deferred
- Fix a potential crash in magic_match
- LocalDocs indexing is now started after the first start dialog is dismissed so usage stats are included

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-04-25 13:16:52 -04:00
Jared Van Bortel
ba53ab5da0
python: do not print GPU name with verbose=False, expose this info via properties (#2222)
* llamamodel: only print device used in verbose mode

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

* python: expose backend and device via GPT4All properties

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

* backend: const correctness fixes

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

* python: bump version

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

* python: typing fixups

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

* python: fix segfault with closed GPT4All

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

---------

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-04-18 14:52:02 -04:00
Jared Van Bortel
ac498f79ac
fix regressions in system prompt handling (#2219)
* python: fix system prompt being ignored
* fix unintended whitespace after system prompt

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-04-15 11:39:48 -04:00
Jared Van Bortel
3f8257c563
llamamodel: fix semantic typo in nomic client dynamic mode (#2216)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-04-12 17:25:15 -04:00
Jared Van Bortel
46818e466e
python: embedding cancel callback for nomic client dynamic mode (#2214)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-04-12 16:00:39 -04:00
Jared Van Bortel
459289b94c
embed4all: small fixes related to nomic client local embeddings (#2213)
* actually submit larger batches with increased n_ctx
* fix crash when llama_tokenize returns no tokens

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-04-12 10:54:15 -04:00
Jared Van Bortel
1b84a48c47
python: add list_gpus to the GPT4All API (#2194)
Other changes:
* fix memory leak in llmodel_available_gpu_devices
* drop model argument from llmodel_available_gpu_devices
* breaking: make GPT4All/Embed4All arguments past model_name keyword-only

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-04-04 14:52:13 -04:00
Jared Van Bortel
67843edc7c
backend: update llama.cpp submodule for wpm locale fix (#2163)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-03-26 11:04:22 -04:00
Jared Van Bortel
83ada4ca89
backend: update llama.cpp submodule for Unicode paths fix (#2162)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-03-26 11:01:02 -04:00
Jared Van Bortel
0455b80b7f
Embed4All: optionally count tokens, misc fixes (#2145)
Key changes:
* python: optionally return token count in Embed4All.embed
* python and docs: models2.json -> models3.json
* Embed4All: require explicit prefix for unknown models
* llamamodel: fix shouldAddBOS for Bert and Nomic Bert

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-03-20 11:24:02 -04:00
Jared Van Bortel
a1bb6084ed
python: documentation update and typing improvements (#2129)
Key changes:
* revert "python: tweak constructor docstrings"
* docs: update python GPT4All and Embed4All documentation
* breaking: require keyword args to GPT4All.generate

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-03-19 17:25:22 -04:00
Jared Van Bortel
699410014a
fix non-AVX CPU detection (#2141)
* chat: fix non-AVX CPU detection on Windows
* bindings: throw exception instead of logging to console

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-03-19 10:56:14 -04:00
Jared Van Bortel
255568fb9a
python: various fixes for GPT4All and Embed4All (#2130)
Key changes:
* honor empty system prompt argument
* current_chat_session is now read-only and defaults to None
* deprecate fallback prompt template for unknown models
* fix mistakes from #2086

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-03-15 11:49:58 -04:00
Jared Van Bortel
53f109f519
llamamodel: fix macOS build (#2125)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-03-14 12:06:07 -04:00
Jared Van Bortel
406e88b59a
implement local Nomic Embed via llama.cpp (#2086)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-03-13 18:09:24 -04:00
Jared Van Bortel
5c248dbec9
models: new MPT model file without duplicated token_embd.weight (#2006)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-03-08 17:18:38 -05:00
Jared Van Bortel
c19b763e03
llmodel_c: expose fakeReply to the bindings (#2061)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-03-06 13:32:24 -05:00
Jared Van Bortel
f500bcf6e5
llmodel: default to a blank line between reply and next prompt (#1996)
Also make some related adjustments to the provided Alpaca-style prompt templates
and system prompts.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-26 13:11:15 -05:00
Jared Van Bortel
007d469034
bert: fix layer norm epsilon value (#1946)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-26 13:09:01 -05:00
Adam Treat
f720261d46 Fix another spot vulnerable to crashes.
Signed-off-by: Adam Treat <treat.adam@gmail.com>
2024-02-26 12:04:16 -06:00
chrisbarrera
f8b1069a1c
add min_p sampling parameter (#2014)
Signed-off-by: Christopher Barrera <cb@arda.tx.rr.com>
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
2024-02-24 17:51:34 -05:00
Jared Van Bortel
e7f2ff189f fix some compilation warnings on macOS
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-22 15:09:06 -05:00
Jared Van Bortel
88e330ef0e
llama.cpp: enable Kompute support for 10 more model arches (#2005)
These are Baichuan, Bert and Nomic Bert, CodeShell, GPT-2, InternLM,
MiniCPM, Orion, Qwen, and StarCoder.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-22 14:34:42 -05:00
Jared Van Bortel
fc6c5ea0c7
llama.cpp: gemma: allow offloading the output tensor (#1997)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-22 14:06:18 -05:00
Jared Van Bortel
4fc4d94be4
fix chat-style prompt templates (#1970)
Also use a new version of Mistral OpenOrca.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-21 15:45:32 -05:00
Jared Van Bortel
7810b757c9 llamamodel: add gemma model support
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-21 13:36:31 -06:00
Adam Treat
d948a4f2ee Complete revamp of model loading to allow more discrete control by
the user over the model's loading behavior.

Signed-off-by: Adam Treat <treat.adam@gmail.com>
2024-02-21 10:15:20 -06:00
Jared Van Bortel
6fdec808b2 backend: update llama.cpp for faster state serialization
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-13 17:39:18 -05:00
Jared Van Bortel
a1471becf3 backend: update llama.cpp for Intel GPU blacklist
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-12 13:16:24 -05:00
Jared Van Bortel
eb1081d37e cmake: fix LLAMA_DIR use before set
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-09 22:00:14 -05:00
Jared Van Bortel
e60b388a2e cmake: fix backwards LLAMA_KOMPUTE default
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-09 21:53:32 -05:00
Jared Van Bortel
fc7e5f4a09
ci: fix missing Kompute support in python bindings (#1953)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-09 21:40:32 -05:00
Jared Van Bortel
bf493bb048
Mixtral crash fix and python bindings v2.2.0 (#1931)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-06 11:01:15 -05:00
Jared Van Bortel
92c025a7f6
llamamodel: add 12 new architectures for CPU inference (#1914)
Baichuan, BLOOM, CodeShell, GPT-2, Orion, Persimmon, Phi and Phi-2,
Plamo, Qwen, Qwen2, Refact, StableLM

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-05 16:49:31 -05:00
Jared Van Bortel
10e3f7bbf5
Fix VRAM leak when model loading fails (#1901)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-01 15:45:45 -05:00
Jared Van Bortel
eadc3b8d80 backend: bump llama.cpp for VRAM leak fix when switching models
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-31 17:24:01 -05:00
Jared Van Bortel
6db5307730 update llama.cpp for unhandled Vulkan OOM exception fix
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-31 16:44:58 -05:00
Jared Van Bortel
0a40e71652
Maxwell/Pascal GPU support and crash fix (#1895)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-31 16:32:32 -05:00
Jared Van Bortel
b11c3f679e bump llama.cpp-mainline for C++11 compat
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-31 15:02:34 -05:00
Jared Van Bortel
061d1969f8
expose n_gpu_layers parameter of llama.cpp (#1890)
Also dynamically limit the GPU layers and context length fields to the maximum supported by the model.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-31 14:17:44 -05:00
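
For reference, a hedged sketch of roughly what this maps to in the llama.cpp C API (struct layouts have moved between llama.cpp versions, so treat the exact field placement as an assumption):

    #include "llama.h"

    llama_model *loadPartiallyOffloaded(const char *path) {
        llama_model_params mp = llama_model_default_params();
        mp.n_gpu_layers = 32;  // offload up to 32 layers; 0 = CPU only
        return llama_load_model_from_file(path, mp);
    }
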
Jared Van Bortel
f549d5a70a backend: quick llama.cpp update to fix fallback to CPU
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-29 17:16:40 -05:00
Jared Van Bortel
38c61493d2 backend: update to latest commit of llama.cpp Vulkan PR
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-29 15:47:26 -06:00
Jared Van Bortel
26acdebafa
convert: replace GPTJConfig with AutoConfig (#1866)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-22 12:14:55 -05:00
Jared Van Bortel
a9c5f53562 update llama.cpp for nomic-ai/llama.cpp#12
Fixes #1477

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-17 14:05:33 -05:00
Jared Van Bortel
b7c92c5afd
sync llama.cpp with latest Vulkan PR and newer upstream (#1819) 2024-01-16 16:36:21 -05:00
Jared Van Bortel
7e9786fccf chat: set search path early
This fixes the issues with installed versions of v2.6.0.
2024-01-11 12:04:18 -05:00
AT
96cee4f9ac
Explicitly clear the kv cache each time we eval tokens to match n_past. (#1808) 2024-01-03 14:06:08 -05:00
ThiloteE
2d566710e5 Address review 2024-01-03 11:13:07 -06:00
ThiloteE
a0f7d7ae0e Fix for "LLModel ERROR: Could not find CPU LLaMA implementation" v2 2024-01-03 11:13:07 -06:00
ThiloteE
38d81c14d0 Fixes https://github.com/nomic-ai/gpt4all/issues/1760 ("LLModel ERROR: Could not find CPU LLaMA implementation").
Inspired by the Microsoft docs for LoadLibraryExA (https://learn.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibraryexa):
when using LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR, the lpFileName parameter must specify a fully qualified path, and the path must use backslashes (\), not forward slashes (/).
2024-01-03 11:13:07 -06:00
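
A minimal sketch of the documented requirement (hypothetical helper, not the exact fix):

    #include <filesystem>
    #include <windows.h>

    namespace fs = std::filesystem;

    HMODULE loadModelImpl(fs::path dll) {
        dll = fs::absolute(dll);  // the flag requires a fully qualified path
        dll.make_preferred();     // converts '/' separators to '\'
        return LoadLibraryExW(dll.c_str(), nullptr,
                              LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR
                                  | LOAD_LIBRARY_SEARCH_DEFAULT_DIRS);
    }
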
Jared Van Bortel
d1c56b8b28
Implement configurable context length (#1749) 2023-12-16 17:58:15 -05:00
Jared Van Bortel
3acbef14b7
fix AVX support by removing direct linking to AVX2 libs (#1750) 2023-12-13 12:11:09 -05:00
Jared Van Bortel
0600f551b3
chatllm: do not attempt to serialize incompatible state (#1742) 2023-12-12 11:45:03 -05:00
Jared Van Bortel
1df3da0a88 update llama.cpp for clang warning fix 2023-12-11 13:07:41 -05:00
Jared Van Bortel
dfd8ef0186
backend: use ggml_new_graph for GGML backend v2 (#1719) 2023-12-06 14:38:53 -05:00
Jared Van Bortel
9e28dfac9c
Update to latest llama.cpp (#1706) 2023-12-01 16:51:15 -05:00
Adam Treat
cce5fe2045 Fix macos build. 2023-11-17 11:59:31 -05:00
Adam Treat
371e2a5cbc LocalDocs version 2 with text embeddings. 2023-11-17 11:59:31 -05:00
Jared Van Bortel
d4ce9f4a7c
llmodel_c: improve quality of error messages (#1625) 2023-11-07 11:20:14 -05:00
cebtenzzre
64101d3af5 update llama.cpp-mainline 2023-11-01 09:47:39 -04:00
Adam Treat
ffef60912f Update to llama.cpp 2023-10-30 11:40:16 -04:00
Adam Treat
f5f22fdbd0 Update llama.cpp for latest bugfixes. 2023-10-28 17:47:55 -04:00
cebtenzzre
7bcd9e8089 update llama.cpp-mainline 2023-10-27 19:29:36 -04:00
cebtenzzre
fd0c501d68
backend: support GGUFv3 (#1582) 2023-10-27 17:07:23 -04:00
Adam Treat
14b410a12a Update to latest version of llama.cpp which fixes issue 1507. 2023-10-27 12:08:35 -04:00
Adam Treat
ab96035bec Update to llama.cpp submodule for some vulkan fixes. 2023-10-26 13:46:38 -04:00
cebtenzzre
e90263c23f
make scripts executable (#1555) 2023-10-24 09:28:21 -04:00
Aaron Miller
f414c28589 llmodel: whitelist library name patterns
This fixes some issues that were being seen in installed Windows builds of 2.5.0.

Only load DLLs that might actually be model implementation DLLs; otherwise we pull all sorts of random junk into the process before it expects to be loaded.

Signed-off-by: Aaron Miller <apage43@ninjawhale.com>
2023-10-23 21:40:14 -07:00
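
A minimal sketch of the whitelisting idea (the name patterns are illustrative, not the real list):

    #include <filesystem>
    #include <string>

    namespace fs = std::filesystem;

    // Only files whose names look like model implementation libraries get
    // loaded; anything else in the directory is skipped instead of being
    // pulled into the process.
    bool looksLikeImplLib(const fs::path &p) {
        std::string stem = p.stem().string();
        return p.extension() == ".dll" && stem.rfind("llamamodel-", 0) == 0;
    }
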
cebtenzzre
4338e72a51
MPT: use upstream llama.cpp implementation (#1515) 2023-10-19 15:25:17 -04:00
cebtenzzre
0fe2e19691
llamamodel: re-enable error messages by default (#1537) 2023-10-19 13:46:33 -04:00
cebtenzzre
017c3a9649
python: prepare version 2.0.0rc1 (#1529) 2023-10-18 20:24:54 -04:00
cebtenzzre
9a19c740ee
kompute: fix library loading issues with kp_logger (#1517) 2023-10-16 16:58:17 -04:00
Aaron Miller
f79557d2aa speedup: just use mat*vec shaders for mat*mat
So far my from-scratch mat*mat shaders are still slower than just running more
invocations of the existing Metal-ported mat*vec shaders. It should be
theoretically possible to make a mat*mat that is faster (for actual mat*mat
cases) than an optimal mat*vec, but it will need to be at *least* as fast as
the mat*vec op, and then take special care to be cache-friendly and save
memory bandwidth, since the number of compute ops is the same.
2023-10-16 13:45:51 -04:00
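
The reasoning rests on a mat*mat being expressible as one mat*vec per output column with an identical multiply-add count (M*N*K either way), so a dedicated mat*mat shader can only win through cache reuse and memory bandwidth. A scalar sketch of that decomposition (illustrative; assumes column-major B and C):

    // y = A * x, with A stored row-major (M x K)
    void matvec(const float *A, const float *x, float *y, int M, int K) {
        for (int i = 0; i < M; i++) {
            float acc = 0.0f;
            for (int k = 0; k < K; k++)
                acc += A[i * K + k] * x[k];
            y[i] = acc;
        }
    }

    // C = A * B computed as N mat*vec invocations, one per column of B
    void matmat(const float *A, const float *B, float *C, int M, int K, int N) {
        for (int j = 0; j < N; j++)
            matvec(A, &B[j * K], &C[j * M], M, K);
    }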