Mirror of https://github.com/nomic-ai/gpt4all.git, synced 2025-06-01 20:06:29 +00:00

Improves output quality by making these tokenizers more closely match the behavior of the Hugging Face `tokenizers`-based BPE tokenizers these models were trained with.

Featuring:

* Fixed Unicode handling (via ICU)
* Fixed BPE token merge handling
* Complete added vocabulary handling


| File |
|---|
| convert_mpt_hf_to_ggml.py |
| gen_tokenizer_include.py |
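
For readers unfamiliar with the "BPE token merge handling" mentioned in the commit message, the following is a minimal sketch of rank-ordered BPE merging in general. It is not the code from this repository: the function name `bpe_merge` and the `TOY_RANKS` table are hypothetical and exist only for illustration. A real tokenizer would load its merge ranks from the model's vocabulary files and would normally operate on bytes or pre-tokenized words rather than raw characters.

```python
def bpe_merge(symbols, ranks):
    """Repeatedly merge the adjacent pair with the lowest rank.

    `symbols` is a sequence of string pieces; `ranks` maps a
    (left, right) pair to its merge priority (lower merges first).
    Both names are illustrative assumptions, not part of gpt4all.
    """
    symbols = list(symbols)
    while len(symbols) > 1:
        # Find the adjacent pair with the best (lowest) merge rank.
        best_rank, best_index = None, None
        for i in range(len(symbols) - 1):
            rank = ranks.get((symbols[i], symbols[i + 1]))
            if rank is not None and (best_rank is None or rank < best_rank):
                best_rank, best_index = rank, i
        if best_index is None:
            break  # no known pair is left to merge

        left, right = symbols[best_index], symbols[best_index + 1]

        # Merge every occurrence of that pair, scanning left to right.
        merged = []
        i = 0
        while i < len(symbols):
            if i < len(symbols) - 1 and symbols[i] == left and symbols[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


# Hypothetical toy merge table: lower rank means the pair is merged earlier.
TOY_RANKS = {("l", "o"): 0, ("lo", "w"): 1, ("e", "r"): 2}

if __name__ == "__main__":
    print(bpe_merge("lower", TOY_RANKS))
```

The order of merges matters: applying pairs in a different order than the one used at training time can yield different token sequences for the same text, which is one way a reimplemented tokenizer can drift from the original.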