docs: steps to convert

# Converting a Trained Hugging Face Model to a GGML Quantized Model
Converting a Hugging Face model to a GGML quantized model is currently a tedious process involving a few distinct steps, outlined here.
`convert_llama_hf_to_ggml.py` is from [llama.cpp](https://github.com/ggerganov/llama.cpp/blob/master/convert.py) and doesn't rely on Hugging Face libraries or PyTorch.
The other scripts rely on Hugging Face and PyTorch and are adapted from [ggml](https://github.com/ggerganov/ggml).
The following example uses a LLaMA-style model.
1. Install the dependencies
```bash
pip install -r requirements.txt
```
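The authoritative dependency list is the repository's `requirements.txt`. If you prefer to install by hand, a minimal sketch based on the note above that these scripts rely on Hugging Face and PyTorch (this package set is an assumption, not the repo's pinned list):
```bash
# Assumed core dependencies for the Hugging Face/PyTorch-based converter scripts
pip install torch transformers sentencepiece
```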
2. Convert the model to `ggml` format
```bash
python converter/convert_llama_hf_to_ggml.py <model_name> <output_dir> --outtype=<output_type>
```
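As a concrete illustration, converting a local LLaMA-style checkout to 16-bit floats might look like the following; the model path, output directory, and `f16` output type are placeholders assumed here, with `f16` mirroring the output types accepted by upstream llama.cpp's converter:
```bash
# Hypothetical invocation: convert a local LLaMA-style Hugging Face checkout to fp16 GGML
python converter/convert_llama_hf_to_ggml.py ./models/llama-7b-hf ./models/ggml --outtype=f16
```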
3. Navigate to the `llama.cpp` directory
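If you don't have a local checkout of llama.cpp yet, clone it first (repository URL as linked above):
```bash
# Fetch llama.cpp and enter the directory
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
```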
4. Build `llama.cpp`
```bash
mkdir build
cd build
cmake ..
cmake --build . --config Release
```
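With a CMake build like this, the compiled tools usually land under `build/bin/`, though depending on the llama.cpp version and build setup the binary may sit at the top of the build tree instead. A quick sanity check from the `build` directory:
```bash
# Confirm the quantize tool was built (path assumed for a CMake build)
ls bin/quantize
```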
5. Run the `quantize` binary
```bash
./quantize <ggmlfp32.bin> <output_model.bin> <quantization_level>
```
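A concrete, hypothetical example, quantizing the fp32 file produced in step 2 down to 4 bits; the file names are placeholders, the binary path assumes a CMake build under `build/bin/`, and the level names assume llama.cpp's standard quantization types such as `q4_0`, `q4_1`, `q5_0`, `q5_1`, and `q8_0`:
```bash
# From the llama.cpp build tree: <input fp32 ggml> <output file> <quantization type>
./bin/quantize ../models/ggml/ggml-model-f32.bin ../models/ggml/ggml-model-q4_0.bin q4_0
```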