mirror of
https://github.com/nomic-ai/gpt4all.git
synced 2025-10-15 21:19:02 +00:00
Improves output quality by making these tokenizers more closely match the behavior of the huggingface `tokenizers` based BPE tokenizers these models were trained with. Featuring: * Fixed unicode handling (via ICU) * Fixed BPE token merge handling * Complete added vocabulary handling