backend: fix extra spaces in tokenization and a CUDA crash (#2778)

Also potentially improves accuracy of BOS insertion, token cache, and logit indexing.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
This commit is contained in:
Jared Van Bortel
2024-08-01 10:46:36 -04:00
committed by GitHub
parent da59c9f5ea
commit 51bd01ae05
10 changed files with 46 additions and 36 deletions

View File

@@ -611,6 +611,7 @@ std::string trim_whitespace(const std::string& input)
return std::string(first_non_whitespace, last_non_whitespace);
}
// FIXME(jared): we don't actually have to re-decode the prompt to generate a new response
void ChatLLM::regenerateResponse()
{
// ChatGPT uses a different semantic meaning for n_past than local models. For ChatGPT, the meaning