Mirror of https://github.com/nomic-ai/gpt4all.git (synced 2025-09-02 00:57:09 +00:00)
New tokenizer implementation for MPT and GPT-J
Improves output quality by making these tokenizers more closely match the behavior of the huggingface `tokenizers`-based BPE tokenizers these models were trained with. Featuring:
* Fixed Unicode handling (via ICU)
* Fixed BPE token merge handling
* Complete added-vocabulary handling
@@ -1,4 +1,4 @@
 [codespell]
-skip = .git,*.pdf,*.svg
+skip = .git,*.pdf,*.svg,*_tokenizer_config.h
 #
 # ignore-words-list =
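The diff adds `*_tokenizer_config.h` to codespell's comma-separated `skip` list, so files matching that glob are no longer spell-checked. The following is a rough sketch of the change's effect. It assumes that each pattern is matched against the bare file name with `fnmatch`-style globbing, which may not reproduce codespell's exact matching rules. The helper `is_skipped` and the header names are hypothetical, chosen only for illustration.

```python
from fnmatch import fnmatch

# Skip lists copied from the codespell config diff: before and after this commit.
OLD_SKIP = ".git,*.pdf,*.svg"
NEW_SKIP = ".git,*.pdf,*.svg,*_tokenizer_config.h"


def is_skipped(filename: str, skip: str) -> bool:
    """Return True if filename matches any comma-separated glob in `skip`.

    Assumption: patterns are matched against the bare file name with
    fnmatch-style globbing; codespell's exact matching rules may differ.
    """
    return any(fnmatch(filename, pattern.strip()) for pattern in skip.split(","))


if __name__ == "__main__":
    # Hypothetical file names, used only to illustrate the new pattern.
    names = ["gptj_tokenizer_config.h", "mpt_tokenizer_config.h", "tokenizer.cpp", "README.pdf"]
    for name in names:
        print(f"{name}: before={is_skipped(name, OLD_SKIP)} after={is_skipped(name, NEW_SKIP)}")
```

Under this assumption, only the generated tokenizer-config headers change status. Ordinary sources such as `tokenizer.cpp` are still checked, which keeps typo detection on hand-written code while ignoring what are presumably large vocabulary tables.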