Commit Graph

318 Commits

Author SHA1 Message Date
Jared Van Bortel
9772027e5e WIP: provider page in the "add models" view 2025-03-19 10:49:39 -04:00
Jared Van Bortel
f7cd880f96 make it build - still plenty of TODOs 2025-03-11 17:08:11 -04:00
Jared Van Bortel
7745f208bc WIP (clang is crashing) 2025-03-11 13:33:06 -04:00
Jared Van Bortel
1dc9f22d5b WIP 2025-03-03 11:16:36 -05:00
Jared Van Bortel
1ba555a174 fix handling of responses that come in chunks, and non-200 status codes 2025-02-28 12:38:48 -05:00
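Ollama's streaming endpoints return newline-delimited JSON, so a single logical response may arrive split across arbitrary network chunks. A minimal sketch of the chunk buffering this commit implies, with hypothetical names (`LineBuffer`, `feed`) that are not the actual gpt4all code:

```cpp
#include <string>
#include <vector>

// Accumulates raw network chunks and yields complete newline-delimited
// JSON lines; a partial trailing line stays buffered until the next chunk.
class LineBuffer {
public:
    std::vector<std::string> feed(const std::string &chunk) {
        m_buf += chunk;
        std::vector<std::string> lines;
        size_t pos;
        while ((pos = m_buf.find('\n')) != std::string::npos) {
            lines.push_back(m_buf.substr(0, pos));
            m_buf.erase(0, pos + 1);
        }
        return lines;
    }
private:
    std::string m_buf;
};
```

Non-200 status codes would be checked before any of this buffering starts, since an error body is not part of the NDJSON stream.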
Jared Van Bortel
d20cfbbec9 stuff is working now 2025-02-27 18:35:33 -05:00
Jared Van Bortel
068845e1a2 don't duplicate QCoro's exception passing 2025-02-27 14:56:14 -05:00
Jared Van Bortel
ea2ced8c8b fix json EOF handling 2025-02-27 14:34:13 -05:00
Jared Van Bortel
cc6f995795 fix #includes 2025-02-27 11:50:16 -05:00
Jared Van Bortel
d4e9a6177b finished initial impl of /show and tested -> hangs! 2025-02-26 20:01:58 -05:00
Jared Van Bortel
7ce2ea57e0 implement /api/show (not tested) 2025-02-26 19:47:25 -05:00
Jared Van Bortel
85eaa41e6d base url should include /api/ 2025-02-26 17:18:56 -05:00
Jared Van Bortel
e872f1db2d underscores to dashes 2025-02-26 17:14:39 -05:00
Jared Van Bortel
86de26ead2 implement and test /api/tags 2025-02-26 16:58:00 -05:00
Jared Van Bortel
4c5dcf59ea rename the class to "OllamaClient" 2025-02-26 16:48:05 -05:00
Jared Van Bortel
06475dd113 WIP: use Boost::json for incremental parsing and reflection 2025-02-26 15:57:11 -05:00
Jared Van Bortel
927e963076 parse the JSON response 2025-02-25 16:11:40 -05:00
Jared Van Bortel
b5144decde fix #includes 2025-02-25 14:58:04 -05:00
Jared Van Bortel
407cb81725 stop using C++20 modules
2025 is too soon to use C++ features from 2020 without running into bugs
in every build tool that touches the project.
2025-02-25 12:10:40 -05:00
Jared Van Bortel
1699e77e97 WIP: get build working on macOS 2025-02-25 12:10:09 -05:00
Jared Van Bortel
ebe6352fc8 WIP (hit a clang bug causing an incorrect compiler error) 2025-02-25 12:10:08 -05:00
Jared Van Bortel
196c387bf7 WIP: bring back old backend so we can test the gpt4all-chat build 2025-02-25 12:01:37 -05:00
Jared Van Bortel
729a5b0d9f ollama-hpp immediately segfaulted. will try something else
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2025-02-25 12:00:48 -05:00
Jared Van Bortel
f4a350d606 WIP: working fmt dep
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2025-02-25 12:00:48 -05:00
Jared Van Bortel
c6d0a1f2b9 enable color diagnostics with ninja
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2025-02-25 12:00:48 -05:00
Jared Van Bortel
b194d71e86 WIP: backend dependencies
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2025-02-25 12:00:48 -05:00
Jared Van Bortel
8e94409be9 WIP: gpt4all backend stub
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2025-02-25 12:00:05 -05:00
Jared Van Bortel
96aeb44210
backend: build with CUDA compute 5.0 support by default (#3499)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2025-02-19 11:27:06 -05:00
ThiloteE
02e12089d3
Add Granite arch to model whitelist (#3487)
Signed-off-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
Co-authored-by: Jared Van Bortel <jared@nomic.ai>
2025-02-12 14:17:49 -05:00
Jared Van Bortel
22ebd42c32
Misc fixes for undefined behavior, crashes, and build failure (#3465)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2025-02-06 11:22:52 -05:00
ThiloteE
6ef0bd518e
Whitelist OLMoE and Granite MoE (#3449)
Signed-off-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
Co-authored-by: Jared Van Bortel <jared@nomic.ai>
2025-02-04 18:00:07 -05:00
Jared Van Bortel
343a4b6b6a
Support DeepSeek-R1 Qwen (#3431)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2025-01-29 09:51:50 -05:00
Jared Van Bortel
0c70b5a5f4
llamamodel: add missing softmax to fix temperature (#3202)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-12-04 10:56:19 -05:00
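Temperature only behaves as expected if the logits are renormalized after scaling; dividing by T without a softmax leaves the relative ranking untouched. A minimal sketch of temperature sampling with the softmax applied (illustrative only, not the llamamodel implementation):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Converts raw logits into a probability distribution at a given temperature.
// Lower temp sharpens the distribution; temp = 1 leaves relative odds unchanged.
std::vector<double> softmaxWithTemp(const std::vector<double> &logits, double temp) {
    std::vector<double> probs(logits.size());
    double maxLogit = *std::max_element(logits.begin(), logits.end());
    double sum = 0.0;
    for (size_t i = 0; i < logits.size(); ++i) {
        // subtract the max logit for numerical stability before exponentiating
        probs[i] = std::exp((logits[i] - maxLogit) / temp);
        sum += probs[i];
    }
    for (double &p : probs) p /= sum;
    return probs;
}
```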
Jared Van Bortel
225bf6be93
Remove binary state from high-level API and use Jinja templates (#3147)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
Signed-off-by: Adam Treat <treat.adam@gmail.com>
Co-authored-by: Adam Treat <treat.adam@gmail.com>
2024-11-25 10:04:17 -05:00
Jared Van Bortel
f07e2e63df
Use the token cache to infer greater n_past and reuse results (#3073)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-10-31 11:19:12 -04:00
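Reusing cached results amounts to finding the longest common prefix between the cached token sequence and the new prompt: everything up to that point needs no re-evaluation, and decoding can resume from that position as n_past. A hedged sketch under that assumption (hypothetical function name, not the actual API):

```cpp
#include <cstddef>
#include <vector>

// Returns how many leading tokens the cache and the new prompt share,
// i.e. the n_past value from which decoding can safely resume.
size_t reusablePrefix(const std::vector<int> &cached, const std::vector<int> &prompt) {
    size_t n = 0;
    while (n < cached.size() && n < prompt.size() && cached[n] == prompt[n])
        ++n;
    return n;
}
```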
Jared Van Bortel
c3357b7625
Enable more warning flags, and fix more warnings (#3065)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-10-18 12:11:03 -04:00
Jared Van Bortel
8e3108fe1f
Establish basic compiler warnings, and fix a few style issues (#3039)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-10-09 09:11:50 -04:00
AT
ea1ade8668
Use different language for prompt size too large. (#3004)
Signed-off-by: Adam Treat <treat.adam@gmail.com>
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
Co-authored-by: Jared Van Bortel <jared@nomic.ai>
2024-09-27 12:29:22 -04:00
Jared Van Bortel
f9d6be8afb
backend: rebase llama.cpp on upstream as of Sep 26th (#2998)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-09-27 12:05:59 -04:00
Ikko Eltociear Ashimine
1047c5e038
docs: update README.md (#2979)
Signed-off-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
Signed-off-by: AT <manyoso@users.noreply.github.com>
Co-authored-by: AT <manyoso@users.noreply.github.com>
2024-09-23 16:12:52 -04:00
Jared Van Bortel
69782cf713
chat(build): fix broken installer on macOS (#2973)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-09-20 15:34:20 -04:00
Jared Van Bortel
39005288c5
server: improve correctness of request parsing and responses (#2929)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-09-09 10:48:57 -04:00
Jared Van Bortel
ca151f3519
repo: organize sources, headers, and deps into subdirectories (#2917)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-08-27 17:22:40 -04:00
Jared Van Bortel
6518b33697
llamamodel: use greedy sampling when temp=0 (#2854)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-08-13 17:04:50 -04:00
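At temperature 0 the distribution degenerates to a point mass, so sampling reduces to an argmax over the logits with no RNG involved. A minimal illustration of that special case (not the llamamodel code itself):

```cpp
#include <cstddef>
#include <vector>

// Greedy sampling: pick the token with the highest logit. This is the
// exact limit of temperature sampling as temp approaches 0.
size_t greedySample(const std::vector<float> &logits) {
    size_t best = 0;
    for (size_t i = 1; i < logits.size(); ++i)
        if (logits[i] > logits[best])
            best = i;
    return best;
}
```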
Jared Van Bortel
7463b2170b
backend(build): set CUDA arch defaults before enable_language(CUDA) (#2855)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-08-13 14:47:48 -04:00
Jared Van Bortel
971c83d1d3
llama.cpp: pull in fix for Kompute-related nvidia-egl crash (#2843)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-08-13 11:10:10 -04:00
Jared Van Bortel
26113a17fb
don't use ranges::contains due to clang incompatibility (#2812)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-08-08 11:49:01 -04:00
Jared Van Bortel
de7cb36fcc
python: reduce size of wheels built by CI, other build tweaks (#2802)
* Read CMAKE_CUDA_ARCHITECTURES directly
* Disable CUBINs for python build in CI
* Search for CUDA 11 as well as CUDA 12

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-08-07 11:27:50 -04:00
Jared Van Bortel
be66ec8ab5
chat: faster KV shift, continue generating, fix stop sequences (#2781)
* Don't stop generating at end of context
* Use llama_kv_cache ops to shift context
* Fix and improve reverse prompt detection
* Replace prompt recalc callback with a flag to disallow context shift
2024-08-07 11:25:24 -04:00
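"Continue generating" past the end of the context requires discarding the oldest tokens (while keeping the initial BOS token) so decoding can proceed; the commit's speedup comes from shifting the existing llama_kv_cache in place rather than re-evaluating the surviving tokens. A simplified sketch of the token-side bookkeeping only, with an assumed discard-half policy:

```cpp
#include <cstddef>
#include <vector>

// When the context window fills, drop the oldest tokens after position 0
// (the BOS token) so generation can continue. The real change pairs this
// with llama_kv_cache ops that shift the cache instead of recomputing it.
void shiftContext(std::vector<int> &tokens, size_t nCtx) {
    if (tokens.size() < nCtx)
        return;
    size_t nDiscard = tokens.size() / 2;  // assumed policy: discard half
    tokens.erase(tokens.begin() + 1, tokens.begin() + 1 + nDiscard);
}
```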
Jared Van Bortel
51bd01ae05
backend: fix extra spaces in tokenization and a CUDA crash (#2778)
Also potentially improves accuracy of BOS insertion, token cache, and logit indexing.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-08-01 10:46:36 -04:00