Converting From A Trained Huggingface Model to GGML Quantized Model

Currently, converting a trained Huggingface model to a quantized GGML model is a tedious process that involves a few distinct steps: the model is first exported to a GGML file in fp32 or fp16, which is then compressed with llama.cpp's quantize tool. Here we outline the current process.

convert_llama_hf_to_ggml.py is from llama.cpp and doesn't rely on Huggingface or PyTorch.

The other scripts rely on Huggingface and PyTorch and are adapted from ggml.

For the following example, we will use a LLaMA-style model.

  1. Install the dependencies

    pip install -r requirements.txt
    
  2. Convert the model to ggml format

    python converter/convert_llama_hf_to_ggml.py <model_name> <output_dir> --outtype=<output_type>
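
For instance, a conversion run might look like the following. The model name is a hypothetical placeholder, and f16 is assumed to be among the supported output types; check the script's help output for the exact values your copy accepts.

    python converter/convert_llama_hf_to_ggml.py huggyllama/llama-7b ./models --outtype=f16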

  3. Navigate to the llama.cpp directory (clone https://github.com/ggerganov/llama.cpp first if you do not already have a checkout)

  4. Build llama.cpp

    mkdir build
    cd build
    cmake ..
    cmake --build . --config Release
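
Note that depending on the llama.cpp version, the resulting quantize binary may land directly in build/ or under build/bin/; adjust the path in the next step to wherever your build placed it.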
    
  5. Run the quantize binary

    ./quantize <ggml_fp32.bin> <output_model.bin> <quantization_level>
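
As a concrete (hypothetical) run, quantizing an fp32 GGML file down to 4 bits from inside the build directory might look like the following. The filenames are placeholders, and the exact set of quantization levels (e.g. q4_0, q4_1, q5_0, q5_1, q8_0) varies by llama.cpp version, so run the binary without arguments to see what your build supports.

    ./quantize ../models/ggml-model-f32.bin ../models/ggml-model-q4_0.bin q4_0

The quantized file is what you ultimately load at inference time; q4_0 shrinks the model to a small fraction of its fp32 size at some cost in accuracy.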