Commit Graph

318 Commits

Author SHA1 Message Date
Jared Van Bortel
fc6c5ea0c7
llama.cpp: gemma: allow offloading the output tensor (#1997)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-22 14:06:18 -05:00
Jared Van Bortel
4fc4d94be4
fix chat-style prompt templates (#1970)
Also use a new version of Mistral OpenOrca.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-21 15:45:32 -05:00
Jared Van Bortel
7810b757c9 llamamodel: add gemma model support
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-21 13:36:31 -06:00
Adam Treat
d948a4f2ee Complete revamp of model loading to allow for more discreet control by
the user of the models loading behavior.

Signed-off-by: Adam Treat <treat.adam@gmail.com>
2024-02-21 10:15:20 -06:00
Jared Van Bortel
6fdec808b2 backend: update llama.cpp for faster state serialization
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-13 17:39:18 -05:00
Jared Van Bortel
a1471becf3 backend: update llama.cpp for Intel GPU blacklist
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-12 13:16:24 -05:00
Jared Van Bortel
eb1081d37e cmake: fix LLAMA_DIR use before set
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-09 22:00:14 -05:00
Jared Van Bortel
e60b388a2e cmake: fix backwards LLAMA_KOMPUTE default
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-09 21:53:32 -05:00
Jared Van Bortel
fc7e5f4a09
ci: fix missing Kompute support in python bindings (#1953)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-09 21:40:32 -05:00
Jared Van Bortel
bf493bb048
Mixtral crash fix and python bindings v2.2.0 (#1931)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-06 11:01:15 -05:00
Jared Van Bortel
92c025a7f6
llamamodel: add 12 new architectures for CPU inference (#1914)
Baichuan, BLOOM, CodeShell, GPT-2, Orion, Persimmon, Phi and Phi-2,
Plamo, Qwen, Qwen2, Refact, StableLM

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-05 16:49:31 -05:00
Jared Van Bortel
10e3f7bbf5
Fix VRAM leak when model loading fails (#1901)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-01 15:45:45 -05:00
Jared Van Bortel
eadc3b8d80 backend: bump llama.cpp for VRAM leak fix when switching models
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-31 17:24:01 -05:00
Jared Van Bortel
6db5307730 update llama.cpp for unhandled Vulkan OOM exception fix
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-31 16:44:58 -05:00
Jared Van Bortel
0a40e71652
Maxwell/Pascal GPU support and crash fix (#1895)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-31 16:32:32 -05:00
Jared Van Bortel
b11c3f679e bump llama.cpp-mainline for C++11 compat
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-31 15:02:34 -05:00
Jared Van Bortel
061d1969f8
expose n_gpu_layers parameter of llama.cpp (#1890)
Also dynamically limit the GPU layers and context length fields to the maximum supported by the model.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-31 14:17:44 -05:00
Jared Van Bortel
f549d5a70a backend : quick llama.cpp update to fix fallback to CPU
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-29 17:16:40 -05:00
Jared Van Bortel
38c61493d2 backend: update to latest commit of llama.cpp Vulkan PR
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-29 15:47:26 -06:00
Jared Van Bortel
26acdebafa
convert: replace GPTJConfig with AutoConfig (#1866)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-22 12:14:55 -05:00
Jared Van Bortel
a9c5f53562 update llama.cpp for nomic-ai/llama.cpp#12
Fixes #1477

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-17 14:05:33 -05:00
Jared Van Bortel
b7c92c5afd
sync llama.cpp with latest Vulkan PR and newer upstream (#1819) 2024-01-16 16:36:21 -05:00
Jared Van Bortel
7e9786fccf chat: set search path early
This fixes the issues with installed versions of v2.6.0.
2024-01-11 12:04:18 -05:00
AT
96cee4f9ac
Explicitly clear the kv cache each time we eval tokens to match n_past. (#1808) 2024-01-03 14:06:08 -05:00
ThiloteE
2d566710e5 Address review 2024-01-03 11:13:07 -06:00
ThiloteE
a0f7d7ae0e Fix for "LLModel ERROR: Could not find CPU LLaMA implementation" v2 2024-01-03 11:13:07 -06:00
ThiloteE
38d81c14d0 Fixes https://github.com/nomic-ai/gpt4all/issues/1760 LLModel ERROR: Could not find CPU LLaMA implementation.
Inspired by Microsoft docs for LoadLibraryExA (https://learn.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibraryexa).
When using LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR, the lpFileName parameter must specify a fully qualified path, also it needs to be backslashes (\), not forward slashes (/).
2024-01-03 11:13:07 -06:00
Jared Van Bortel
d1c56b8b28
Implement configurable context length (#1749) 2023-12-16 17:58:15 -05:00
Jared Van Bortel
3acbef14b7
fix AVX support by removing direct linking to AVX2 libs (#1750) 2023-12-13 12:11:09 -05:00
Jared Van Bortel
0600f551b3
chatllm: do not attempt to serialize incompatible state (#1742) 2023-12-12 11:45:03 -05:00
Jared Van Bortel
1df3da0a88 update llama.cpp for clang warning fix 2023-12-11 13:07:41 -05:00
Jared Van Bortel
dfd8ef0186
backend: use ggml_new_graph for GGML backend v2 (#1719) 2023-12-06 14:38:53 -05:00
Jared Van Bortel
9e28dfac9c
Update to latest llama.cpp (#1706) 2023-12-01 16:51:15 -05:00
Adam Treat
cce5fe2045 Fix macos build. 2023-11-17 11:59:31 -05:00
Adam Treat
371e2a5cbc LocalDocs version 2 with text embeddings. 2023-11-17 11:59:31 -05:00
Jared Van Bortel
d4ce9f4a7c
llmodel_c: improve quality of error messages (#1625) 2023-11-07 11:20:14 -05:00
cebtenzzre
64101d3af5 update llama.cpp-mainline 2023-11-01 09:47:39 -04:00
Adam Treat
ffef60912f Update to llama.cpp 2023-10-30 11:40:16 -04:00
Adam Treat
f5f22fdbd0 Update llama.cpp for latest bugfixes. 2023-10-28 17:47:55 -04:00
cebtenzzre
7bcd9e8089 update llama.cpp-mainline 2023-10-27 19:29:36 -04:00
cebtenzzre
fd0c501d68
backend: support GGUFv3 (#1582) 2023-10-27 17:07:23 -04:00
Adam Treat
14b410a12a Update to latest version of llama.cpp which fixes issue 1507. 2023-10-27 12:08:35 -04:00
Adam Treat
ab96035bec Update to llama.cpp submodule for some vulkan fixes. 2023-10-26 13:46:38 -04:00
cebtenzzre
e90263c23f
make scripts executable (#1555) 2023-10-24 09:28:21 -04:00
Aaron Miller
f414c28589 llmodel: whitelist library name patterns
this fixes some issues that were being seen on installed windows builds of 2.5.0

only load dlls that actually might be model impl dlls, otherwise we pull all sorts of random junk into the process before it might expect to be

Signed-off-by: Aaron Miller <apage43@ninjawhale.com>
2023-10-23 21:40:14 -07:00
cebtenzzre
4338e72a51
MPT: use upstream llama.cpp implementation (#1515) 2023-10-19 15:25:17 -04:00
cebtenzzre
0fe2e19691
llamamodel: re-enable error messages by default (#1537) 2023-10-19 13:46:33 -04:00
cebtenzzre
017c3a9649
python: prepare version 2.0.0rc1 (#1529) 2023-10-18 20:24:54 -04:00
cebtenzzre
9a19c740ee
kompute: fix library loading issues with kp_logger (#1517) 2023-10-16 16:58:17 -04:00
Aaron Miller
f79557d2aa speedup: just use mat*vec shaders for mat*mat
so far my from-scratch mat*mats are still slower than just running more
invocations of the existing Metal ported mat*vec shaders - it should be
theoretically possible to make a mat*mat that's faster (for actual
mat*mat cases) than an optimal mat*vec, but it will need to be at
*least* as fast as the mat*vec op and then take special care to be
cache-friendly and save memory bandwidth, as the # of compute ops is the
same
2023-10-16 13:45:51 -04:00
cebtenzzre
22de3c56bd
convert scripts: fix AutoConfig typo (#1512) 2023-10-13 14:16:51 -04:00
Aaron Miller
2490977f89 q6k, q4_1 mat*mat 2023-10-12 14:56:54 -04:00
Aaron Miller
afaa291eab python bindings should be quiet by default
* disable llama.cpp logging unless GPT4ALL_VERBOSE_LLAMACPP envvar is
  nonempty
* make verbose flag for retrieve_model default false (but also be
  overridable via gpt4all constructor)

should be able to run a basic test:

```python
import gpt4all
model = gpt4all.GPT4All('/Users/aaron/Downloads/rift-coder-v0-7b-q4_0.gguf')
print(model.generate('def fib(n):'))
```

and see no non-model output when successful
2023-10-11 14:14:36 -07:00
cebtenzzre
7b611b49f2
llmodel: print an error if the CPU does not support AVX (#1499) 2023-10-11 15:09:40 -04:00
Aaron Miller
043617168e do not process prompts on gpu yet 2023-10-11 13:15:50 -04:00
Aaron Miller
64001a480a mat*mat for q4_0, q8_0 2023-10-11 13:15:50 -04:00
cebtenzzre
7a19047329
llmodel: do not call magic_match unless build variant is correct (#1488) 2023-10-11 11:30:48 -04:00
Cebtenzzre
5fe685427a chat: clearer CPU fallback messages 2023-10-06 11:35:14 -04:00
Adam Treat
eec906aa05 Speculative fix for build on mac. 2023-10-05 18:37:33 -04:00
Adam Treat
a9acdd25de Push a new version number for llmodel backend now that it is based on gguf. 2023-10-05 18:18:07 -04:00
Cebtenzzre
8bb6a6c201 rebase on newer llama.cpp 2023-10-05 18:16:19 -04:00
Cebtenzzre
d87573ea75 remove old llama.cpp submodules 2023-10-05 18:16:19 -04:00
Cebtenzzre
cc6db61c93 backend: fix build with Visual Studio generator
Use the $<CONFIG> generator expression instead of CMAKE_BUILD_TYPE. This
is needed because Visual Studio is a multi-configuration generator, so
we do not know what the build type will be until `cmake --build` is
called.

Fixes #1470
2023-10-05 18:16:19 -04:00
Adam Treat
f605a5b686 Add q8_0 kernels to kompute shaders and bump to latest llama/gguf. 2023-10-05 18:16:19 -04:00
Cebtenzzre
672cb850f9 differentiate between init failure and unsupported models 2023-10-05 18:16:19 -04:00
Adam Treat
906699e8e9 Bump to latest llama/gguf branch. 2023-10-05 18:16:19 -04:00
Cebtenzzre
088afada49 llamamodel: fix static vector in LLamaModel::endTokens 2023-10-05 18:16:19 -04:00
Adam Treat
b4d82ea289 Bump to the latest fixes for vulkan in llama. 2023-10-05 18:16:19 -04:00
Adam Treat
12f943e966 Fix regenerate button to be deterministic and bump the llama version to latest we have for gguf. 2023-10-05 18:16:19 -04:00
Adam Treat
5d346e13d7 Add q6_k kernels for vulkan. 2023-10-05 18:16:19 -04:00
Adam Treat
4eefd386d0 Refactor for subgroups on mat * vec kernel. 2023-10-05 18:16:19 -04:00
Cebtenzzre
3c2aa299d8 gptj: remove unused variables 2023-10-05 18:16:19 -04:00
Cebtenzzre
f9deb87d20 convert scripts: add feed-forward length for better compatiblilty
This GGUF key is used by all llama.cpp models with upstream support.
2023-10-05 18:16:19 -04:00
Cebtenzzre
cc7675d432 convert scripts: make gptj script executable 2023-10-05 18:16:19 -04:00
Cebtenzzre
0493e6eb07 convert scripts: use bytes_to_unicode from transformers 2023-10-05 18:16:19 -04:00
Cebtenzzre
d5d72f0361 gpt-j: update inference to match latest llama.cpp insights
- Use F16 KV cache
- Store transposed V in the cache
- Avoid unnecessary Q copy

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

ggml upstream commit 0265f0813492602fec0e1159fe61de1bf0ccaf78
2023-10-05 18:16:19 -04:00
Cebtenzzre
050e7f076e backend: port GPT-J to GGUF 2023-10-05 18:16:19 -04:00
Cebtenzzre
8f3abb37ca fix references to removed model types 2023-10-05 18:16:19 -04:00
Cebtenzzre
4219c0e2e7 convert scripts: make them directly executable 2023-10-05 18:16:19 -04:00
Cebtenzzre
ce7be1db48 backend: use llamamodel.cpp for Falcon 2023-10-05 18:16:19 -04:00
Cebtenzzre
cca9e6ce81 convert_mpt_hf_to_gguf.py: better tokenizer decoding 2023-10-05 18:16:19 -04:00
Cebtenzzre
25297786db convert scripts: load model as late as possible 2023-10-05 18:16:19 -04:00
Cebtenzzre
fd47088f2b conversion scripts: cleanup 2023-10-05 18:16:19 -04:00
Cebtenzzre
6277eac9cc backend: use llamamodel.cpp for StarCoder 2023-10-05 18:16:19 -04:00
Cebtenzzre
17fc9e3e58 backend: port Replit to GGUF 2023-10-05 18:16:19 -04:00
Cebtenzzre
7c67262a13 backend: port MPT to GGUF 2023-10-05 18:16:19 -04:00
Cebtenzzre
42bcb814b3 backend: port BERT to GGUF 2023-10-05 18:16:19 -04:00
Cebtenzzre
1d29e4696c llamamodel: metal supports all quantization types now 2023-10-05 18:16:19 -04:00
Aaron Miller
507753a37c macos build fixes 2023-10-05 18:16:19 -04:00
Adam Treat
d90d003a1d Latest rebase on llama.cpp with gguf support. 2023-10-05 18:16:19 -04:00
Adam Treat
99c106e6b5 Fix a bug seen on AMD RADEON cards with vulkan backend. 2023-09-26 11:59:47 -04:00
Jacob Nguyen
e86c63750d Update llama.cpp.cmake
Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com>
2023-09-16 11:42:56 -07:00
Adam Treat
84905aa281 Fix for crashes on systems where vulkan is not installed properly. 2023-09-16 12:19:46 -04:00
Adam Treat
045f6e6cdc Link against ggml in bin so we can get the available devices without loading a model. 2023-09-15 14:45:25 -04:00
Adam Treat
aa33419c6e Fallback to CPU more robustly. 2023-09-14 16:53:11 -04:00
Adam Treat
9013a089bd Bump to new llama with new bugfix. 2023-09-14 10:02:11 -04:00
Adam Treat
3076e0bf26 Only show GPU when we're actually using it. 2023-09-14 09:59:19 -04:00
Adam Treat
cf4eb530ce Sync to a newer version of llama.cpp with bugfix for vulkan. 2023-09-13 21:01:44 -04:00
Adam Treat
4b9a345aee Update the submodule. 2023-09-13 17:05:46 -04:00
Aaron Miller
6f038c136b init at most one vulkan device, submodule update
fixes issues w/ multiple of the same gpu
2023-09-13 12:49:53 -07:00
Adam Treat
8f99dca70f Bring the vulkan backend to the GUI. 2023-09-13 11:26:10 -04:00
Aaron Miller
f0735efa7d vulkan python bindings on windows fixes 2023-09-12 14:16:02 -07:00
Adam Treat
c953b321b7 Don't link against libvulkan. 2023-09-12 14:26:56 -04:00
Aaron Miller
c4d23512e4 remove extra dynamic linker deps when building with vulkan 2023-09-11 08:44:39 -07:00
Adam Treat
85e34598f9 more circleci 2023-08-31 15:29:54 -04:00
Adam Treat
f578fa6cdf Fix for windows. 2023-08-31 15:29:54 -04:00
Adam Treat
17d3e4976c Add a comment indicating future work. 2023-08-31 15:29:54 -04:00
Adam Treat
987546c63b Nomic vulkan backend licensed under the Software for Open Models License (SOM), version 1.0. 2023-08-31 15:29:54 -04:00
Adam Treat
d55cbbee32 Update to newer llama.cpp and disable older forks. 2023-08-31 15:29:54 -04:00
Aaron Miller
0bc2274869 bump llama.cpp version + needed fixes for that 2023-08-31 15:29:54 -04:00
aaron miller
33c22be2aa starcoder: use ggml_graph_plan 2023-08-31 15:29:54 -04:00
Cosmic Snow
108d950874 Fix Windows unable to load models on older Windows builds
- Replace high-level IsProcessorFeaturePresent
- Reintroduce low-level compiler intrinsics implementation
2023-08-09 09:27:43 +02:00
Adam Treat
6d03b3e500 Add starcoder support. 2023-07-27 09:15:16 -04:00
cosmic-snow
2d02c65177
Handle edge cases when generating embeddings (#1215)
* Handle edge cases when generating embeddings
* Improve Python handling & add llmodel_c.h note
- In the Python bindings fail fast with a ValueError when text is empty
- Advice other bindings authors to do likewise in llmodel_c.h
2023-07-17 13:21:03 -07:00
Aaron Miller
1c4a244291 bump mem allocation a bit 2023-07-14 09:48:57 -04:00
Adam Treat
ee4186d579 Fixup bert python bindings. 2023-07-14 09:48:57 -04:00
cosmic-snow
6200900677
Fix Windows MSVC arch detection (#1194)
- in llmodel.cpp to fix AVX-only handling

Signed-off-by: cosmic-snow <134004613+cosmic-snow@users.noreply.github.com>
2023-07-13 14:44:17 -04:00
Adam Treat
4963db8f43 Bump the version numbers for both python and c backend. 2023-07-13 14:21:46 -04:00
Adam Treat
0efdbfcffe Bert 2023-07-13 14:21:46 -04:00
Adam Treat
315a1f2aa2 Move it back as internal class. 2023-07-13 14:21:46 -04:00
Adam Treat
ae8eb297ac Add sbert backend. 2023-07-13 14:21:46 -04:00
Adam Treat
1f749d7633 Clean up backend code a bit and hide impl. details. 2023-07-13 14:21:46 -04:00
Adam Treat
33557b1f39 Move the implementation out of llmodel class. 2023-07-13 14:21:46 -04:00
Aaron Miller
432b7ebbd7 include windows.h just to be safe 2023-07-12 12:46:46 -04:00
Aaron Miller
95b8fb312e windows/msvc: use high level processor feature detection API
see https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-isprocessorfeaturepresent
2023-07-12 12:46:46 -04:00
Aaron Miller
f0faa23ad5
cmakelists: always export build commands (#1179)
friendly for using editors with clangd integration that don't also
manage the build themselves
2023-07-12 10:49:24 -04:00
Aaron Miller
4a24b586df llama.cpp: metal buffer freeing 2023-06-30 21:07:21 -03:00
Aaron Miller
137bc2c367 replit: free metal context 2023-06-30 21:07:21 -03:00
Aaron Miller
57dc0c8953 adjust eval buf sizes to pass long input test 2023-06-30 21:07:21 -03:00
Aaron Miller
7a5f6e4726 limit prompt batch size to 128 2023-06-30 21:07:21 -03:00
Aaron Miller
883775bc5f move 230511 submodule to nomic fork, fix alibi assert 2023-06-30 21:07:21 -03:00
Andriy Mulyar
46a0762bd5
Python Bindings: Improved unit tests, documentation and unification of API (#1090)
* Makefiles, black, isort

* Black and isort

* unit tests and generation method

* chat context provider

* context does not reset

* Current state

* Fixup

* Python bindings with unit tests

* GPT4All Python Bindings: chat contexts, tests

* New python bindings and backend fixes

* Black and Isort

* Documentation error

* preserved n_predict for backwords compat with langchain

---------

Co-authored-by: Adam Treat <treat.adam@gmail.com>
2023-06-30 16:02:02 -04:00
Aaron Miller
40a3faeb05
Use ggml scratch bufs for mpt and gptj models (#1104)
* backend/gptj: use scratch buffers

reduces total memory required and makes eval buf not grow with n_past

* backend/mpt: use scratch bufs

* fix format-related compile warnings
2023-06-30 10:53:45 -07:00
Aaron Miller
8d19ef3909
backend: factor out common elements in model code (#1089)
* backend: factor out common structs in model code

prepping to hack on these by hopefully making there be fewer places to fix the same bug

rename

* use common buffer wrapper instead of manual malloc

* fix replit compile warnings
2023-06-28 17:35:07 -07:00
Aaron Miller
28d41d4f6d
falcon: use *model-local* eval & scratch bufs (#1079)
fixes memory leaks copied from ggml/examples based implementation
2023-06-27 16:09:11 -07:00
Zach Nussbaum
2565f6a94a feat: add conversion script 2023-06-27 14:06:39 -03:00
Aaron Miller
198b5e4832 add Falcon 7B model
Tested with https://huggingface.co/TheBloke/falcon-7b-instruct-GGML/blob/main/falcon7b-instruct.ggmlv3.q4_0.bin
2023-06-27 14:06:39 -03:00
Aaron Miller
db34a2f670 llmodel: skip attempting Metal if model+kvcache > 53% of system ram 2023-06-26 19:46:49 -03:00
Aaron Miller
b19a3e5b2c add requiredMem method to llmodel impls
most of these can just shortcut out of the model loading logic llama is a bit worse to deal with because we submodule it so I have to at least parse the hparams, and then I just use the size on disk as an estimate for the mem size (which seems reasonable since we mmap() the llama files anyway)
2023-06-26 18:27:58 -03:00
Adam Treat
a0f80453e5 Use sysinfo in backend. 2023-06-26 14:14:49 -04:00
niansa/tuxifan
47323f8591 Update replit.cpp
replit_tokenizer_detokenize returnins std::string now

Signed-off-by: niansa/tuxifan <tuxifan@posteo.de>
2023-06-26 14:49:58 -03:00
niansa
0855c0df1d Fixed Replit implementation compile warnings 2023-06-26 14:49:58 -03:00
Aaron Miller
1290b32451 update to latest mainline llama.cpp
add max_size param to ggml_metal_add_buffer - introduced in https://github.com/ggerganov/llama.cpp/pull/1826
2023-06-26 14:40:52 -03:00
niansa/tuxifan
5eee16c97c Do not specify "success" as error for unsupported models
Signed-off-by: niansa/tuxifan <tuxifan@posteo.de>
2023-06-22 09:28:40 +02:00
Adam Treat
bd58c46da0 Initialize these to nullptr to prevent double deletion when a model fails to load. 2023-06-20 18:23:45 -04:00
niansa/tuxifan
68f9786ed9
Use operator ""_MiB (#991) 2023-06-16 15:56:22 -04:00
Aaron Miller
abc081e48d
fix llama.cpp k-quants (#988)
* enable k-quants on *all* mainline builds
2023-06-15 14:06:14 -07:00
Aaron Miller
c4319d2c8e
dlhandle: prevent libs from using each other's symbols (#977)
use RTLD_LOCAL so that symbols are *only* exposed via dlsym

without this all symbols exported by the libs are available for symbol
resolution, resulting in different lib versions potentially resolving
*each other's* symbols, causing incredibly cursed behavior such as
https://gist.github.com/apage43/085c1ff69f6dd05387793ebc301840f6
2023-06-13 14:52:11 -04:00
Aaron Miller
f71d8efc71
metal replit (#931)
metal+replit

makes replit work with Metal and removes its use of `mem_per_token`
in favor of fixed size scratch buffers (closer to llama.cpp)
2023-06-13 07:29:14 -07:00
Aaron Miller
85964a7635
bump llama.cpp mainline to latest (#964) 2023-06-13 08:40:38 -04:00