# Converting From A Trained Huggingface Model to GGML Quantized Model
Currently, converting from a Huggingface model to a GGML quantized model is a tedious process that involves several steps. The current process is outlined below.
`convert_llama_hf_to_ggml.py` is from llama.cpp and doesn't rely on Huggingface or PyTorch. The other scripts rely on Huggingface and PyTorch and are adapted from ggml.
For the following example, we will use a LLaMA-style model.
- Install the dependencies

  ```bash
  pip install -r requirements.txt
  ```
- Convert the model to `ggml` format

  ```bash
  python converter/convert_llama_hf_to_ggml.py <model_name> <output_dir> --outtype=<output_type>
  ```
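  For instance, a hypothetical invocation might look like the following. The model name, output directory, and output type here are illustrative placeholders; like similar llama.cpp converters, the script is expected to accept `f32` or `f16` as the output type, but check the script's help output for the exact options.

  ```bash
  # Illustrative example only: the model path and output directory are placeholders.
  # Convert a local LLaMA-style Huggingface checkpoint to a ggml f32 file.
  python converter/convert_llama_hf_to_ggml.py ./models/llama-7b-hf ./out --outtype=f32
  ```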
- Navigate to the `llama.cpp` directory
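  Assuming llama.cpp is checked out in your current working directory (the exact path depends on where you cloned it):

  ```bash
  cd llama.cpp
  ```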
- Build `llama.cpp`

  ```bash
  mkdir build
  cd build
  cmake ..
  cmake --build . --config Release
  ```
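  Equivalently, with CMake 3.13 or newer, the same build can be done without changing directories:

  ```bash
  # Requires CMake >= 3.13 for the -S/-B flags.
  cmake -S . -B build
  cmake --build build --config Release
  ```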
- Run the `quantize` binary

  ```bash
  ./quantize <ggmlfp32.bin> <output_model.bin> <quantization_level>
  ```
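  As a sketch, here is what producing a 4-bit model from the f32 file generated earlier could look like. The file names are placeholders, and the set of valid quantization arguments depends on the llama.cpp revision: older builds take a numeric code (e.g. 2 for q4_0), while newer ones also accept names such as `q4_0`.

  ```bash
  # Illustrative only: file names are placeholders, and the last argument
  # (the quantization level) varies by llama.cpp version; 2 historically meant q4_0.
  ./quantize ./out/ggml-model-f32.bin ./out/ggml-model-q4_0.bin 2
  ```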