backend: fix extra spaces in tokenization and a CUDA crash (#2778)

Also potentially improves accuracy of BOS insertion, token cache, and logit indexing.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2025-09-06 19:10:15 +00:00 · 2024-08-01 10:46:36 -04:00
parent da59c9f5ea
commit 51bd01ae05
10 changed files with 46 additions and 36 deletions
--- a/gpt4all-backend/llmodel_c.h
+++ b/gpt4all-backend/llmodel_c.h
@@ -30,8 +30,6 @@ typedef void *llmodel_model;
 * behavior.
 */
 struct llmodel_prompt_context {
-    float *logits;          // logits of current context
-    size_t logits_size;     // the size of the raw logits vector
    int32_t *tokens;        // current tokens in the context window
    size_t tokens_size;     // the size of the raw tokens vector
    int32_t n_past;         // number of tokens in past conversation