* Read CMAKE_CUDA_ARCHITECTURES directly
* Disable CUBINs for python build in CI
* Search for CUDA 11 as well as CUDA 12
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* Don't stop generating at end of context
* Use llama_kv_cache ops to shift context (sketched below)
* Fix and improve reverse prompt detection
* Replace prompt recalc callback with a flag to disallow context shift
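A minimal sketch of the context shift described above, assuming llama.cpp's KV-cache API of that era (`llama_kv_cache_seq_rm` / `llama_kv_cache_seq_add`); `n_keep` and `n_discard` are illustrative names, not the actual chatllm code:

```cpp
#include "llama.h"

// Sketch: instead of stopping at the end of the context window, evict the
// oldest tokens after the n_keep prefix and slide the rest down, so
// generation can continue indefinitely.
static void shift_context(llama_context *ctx, int32_t &n_past,
                          int32_t n_keep, int32_t n_discard) {
    // drop tokens [n_keep, n_keep + n_discard) from sequence 0
    llama_kv_cache_seq_rm (ctx, 0, n_keep, n_keep + n_discard);
    // move the surviving tokens left by n_discard positions
    llama_kv_cache_seq_add(ctx, 0, n_keep + n_discard, n_past, -n_discard);
    n_past -= n_discard;
}
```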
* Adds support for GPT-NeoX, Gemma 2, OpenELM, ChatGLM, and Jais architectures (all with Kompute support)
* Also enables Kompute support for StarCoder2, XVERSE, Command R, and OLMo
* Includes a number of Kompute resource management fixes
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
As discussed on Discord, this PR was not ready to be merged. CI fails on
it.
This reverts commit a602f7fde7.
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* backend: refactor dlhandle.h into oscompat.{cpp,h}
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* llmodel: alias std::filesystem
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* llmodel: use wide strings for paths on Windows
Using the native path representation allows us to manipulate paths and
call LoadLibraryEx without mangling non-ASCII characters.
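For illustration, a sketch of what a wide-path load can look like; the helper name and search flags are assumptions, not the actual dlhandle code:

```cpp
#include <filesystem>
#include <windows.h>

namespace fs = std::filesystem;

// Sketch: fs::path::c_str() yields a wchar_t* on Windows, so non-ASCII
// characters survive the trip into LoadLibraryExW unmangled.
HMODULE load_native(const fs::path &libPath) {
    return LoadLibraryExW(libPath.c_str(), nullptr,
                          LOAD_LIBRARY_SEARCH_DEFAULT_DIRS |
                          LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR);
}
```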
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* llmodel: prefer built-in std::filesystem functionality
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
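Taken together, the fs alias and the preference for built-in std::filesystem functionality plausibly look like this sketch; the library naming scheme and LIB_EXT are assumptions, not the actual llmodel code:

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;  // the alias the earlier commit introduces

// Platform suffix; illustrative only (".dll" on Windows, ".dylib" on macOS).
constexpr const char *LIB_EXT = ".so";

// Sketch: build an implementation library path with fs::path operators
// instead of manual string concatenation.
fs::path impl_path(const fs::path &searchDir, const std::string &variant) {
    return (searchDir / ("llamamodel-mainline-" + variant + LIB_EXT))
        .lexically_normal();
}
```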
* oscompat: fix string type error
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* backend: rename oscompat back to dlhandle
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* dlhandle: fix #includes
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* dlhandle: remove another #include
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* dlhandle: move dlhandle #include
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* dlhandle: remove #includes that are covered by dlhandle.h
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* llmodel: fix #include order
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
---------
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* rebase onto llama.cpp commit ggerganov/llama.cpp@d46dbc76f
* support for CUDA backend (enabled by default)
* partial support for Occam's Vulkan backend (disabled by default)
* partial support for HIP/ROCm backend (disabled by default)
* sync llama.cpp.cmake with upstream llama.cpp CMakeLists.txt
* changes to GPT4All backend, bindings, and chat UI to handle choice of llama.cpp backend (Kompute or CUDA; see the sketch after this list)
* ship CUDA runtime with installed version
* make device selection in the UI on macOS actually do something
* model whitelist: remove dbrx, mamba, persimmon, plamo; add internlm and starcoder2
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
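A rough sketch of the kind of backend choice involved; the variant names follow the bullet list above, but the function itself is hypothetical, not GPT4All's actual API:

```cpp
#include <string>
#include <vector>

// Hypothetical: candidate llama.cpp build variants to try, in preference
// order, for a given device selection. Names are illustrative only.
std::vector<std::string> variant_order(const std::string &device) {
    if (device == "cuda") return {"cuda", "kompute", "cpu"};  // prefer CUDA, fall back
    if (device == "gpu")  return {"kompute", "cpu"};          // Vulkan/Kompute path
    return {"cpu"};
}
```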
* backend: factor out common structs in model code
prep for hacking on these by reducing the number of places where the same bug has to be fixed
* use common buffer wrapper instead of manual malloc
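The buffer-wrapper bullet plausibly amounts to something like this RAII sketch (shape assumed, not the actual struct):

```cpp
#include <cstddef>
#include <cstdlib>

// Sketch: RAII wrapper so model code never pairs malloc/free by hand.
struct buffer {
    void  *data = nullptr;
    size_t size = 0;

    explicit buffer(size_t n) : data(std::malloc(n)), size(n) {}
    ~buffer() { std::free(data); }

    buffer(const buffer &) = delete;             // no accidental double-free
    buffer &operator=(const buffer &) = delete;
};
```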
* fix replit compile warnings
* Initial Library Loader
* Load library as part of Model factory
* Dynamically search and find the dlls (see the sketch after this change set)
* Update tests to use locally built runtimes
* Fix dylib loading, add macos runtime support for sample/tests
* Bypass automatic loading by default.
* Only set CMAKE_OSX_ARCHITECTURES if not already set, allow cross-compile
* Switch Loading again
* Update build scripts for mac/linux
* Update bindings to support newest breaking changes
* Fix build
* Use llmodel for Windows
* Actually, it does need to be libllmodel
* Name
* Remove TFMs, bypass loading by default
* Fix script
* Delete mac script
---------
Co-authored-by: Tim Miller <innerlogic4321@ghmail.com>
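For the "dynamically search and find the dlls" step above, a hedged sketch of scanning a search directory for implementation libraries; the name prefix and directory layout are assumptions, not the bindings' actual logic:

```cpp
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Sketch: collect every regular file in dir whose filename starts with
// the given prefix (e.g. candidate llmodel implementation libraries).
std::vector<fs::path> find_impl_libs(const fs::path &dir,
                                     const std::string &prefix) {
    std::vector<fs::path> found;
    for (const auto &entry : fs::directory_iterator(dir)) {
        if (entry.is_regular_file()
            && entry.path().filename().string().rfind(prefix, 0) == 0)
            found.push_back(entry.path());
    }
    return found;
}
```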
* porting over replit code model to gpt4all
* replaced memory with kv_self struct (sketched at the end of this list)
* continuing debug
* welp, it built, but there are a lot of suspicious things
* model loading works and generation somewhat works; the response still needs formatting
* revert to the semi-working version
* finally got rid of the weird formatting
* figured out the problem is with the Python bindings - this is good to go for testing
* addressing PR feedback
* output refactor
* fixed prompt response collection
* cleanup
* addressing PR comments
* building replit backend with the new ggml version code
* chatllm replit and clean python files
* cleanup
* updated replit to match new llmodel api
* match llmodel api and change size_t to Token
* resolve PR comments
* replit model commit comment
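For the kv_self item above, a hedged sketch of what such a struct typically holds in ggml-based model code of that era; field names are illustrative, and the actual replit struct may differ:

```cpp
#include "ggml.h"

// Sketch: one key tensor and one value tensor holding the attention cache
// for all layers, plus the ggml context that owns them and a token count.
struct kv_cache {
    struct ggml_tensor  *k   = nullptr;  // cached keys,   n_embd * n_ctx * n_layer
    struct ggml_tensor  *v   = nullptr;  // cached values, n_embd * n_ctx * n_layer
    struct ggml_context *ctx = nullptr;  // owns k and v
    int n = 0;                           // tokens currently in the cache
};
```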