fix(config): make tokenizer optional and include a troubleshooting doc (#1998)

* docs: add troubleshooting

* fix: pass HF token to setup script and skip downloading the tokenizer when it is empty

* fix: improve log and disable specific tokenizer by default

* chore: change HF_TOKEN environment variable to align with the default config

* fix: mypy
Javier Martinez 2024-07-17 10:06:27 +02:00 committed by GitHub
parent 15f73dbc48
commit 01b7ccd064
6 changed files with 65 additions and 12 deletions


@@ -41,6 +41,8 @@ navigation:
           path: ./docs/pages/installation/concepts.mdx
         - page: Installation
           path: ./docs/pages/installation/installation.mdx
+        - page: Troubleshooting
+          path: ./docs/pages/installation/troubleshooting.mdx
   # Manual of privateGPT: how to use it and configure it
   - tab: manual
     layout:


@@ -81,6 +81,8 @@ set PGPT_PROFILES=ollama
 make run
 ```

+Refer to the [troubleshooting](./troubleshooting) section for specific issues you might encounter.
+
 ### Local, Ollama-powered setup - RECOMMENDED
 **The easiest way to run PrivateGPT fully locally** is to depend on Ollama for the LLM. Ollama provides local LLMs and embeddings that are super easy to install and use, abstracting away the complexity of GPU support. It's the recommended setup for local development.


@@ -0,0 +1,44 @@
# Downloading Gated and Private Models
Many models are gated or private, requiring special access to use them. Follow these steps to gain access and set up your environment for using these models.
## Accessing Gated Models
1. **Request Access:**
Follow the instructions provided [here](https://huggingface.co/docs/hub/en/models-gated) to request access to the gated model.
2. **Generate a Token:**
Once you have access, generate a token by following the instructions [here](https://huggingface.co/docs/hub/en/security-tokens).
3. **Set the Token:**
Add the generated token to your `settings.yaml` file:
```yaml
huggingface:
access_token: <your-token>
```
Alternatively, set the `HF_TOKEN` environment variable (a quick way to verify the token is sketched after these steps):
```bash
export HF_TOKEN=<your-token>
```
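
To confirm the token actually grants access before running PrivateGPT, a minimal check along the following lines can help. This is a sketch only, assuming the `huggingface_hub` package is installed; `mistralai/Mistral-7B-Instruct-v0.2` is used purely as an example of a gated repository:
```python
import os

from huggingface_hub import hf_hub_download, login

# Authenticate with the token from the environment (or settings.yaml).
login(token=os.environ["HF_TOKEN"])

# Fetching any file from a gated repo raises an error if access was not granted.
path = hf_hub_download(
    repo_id="mistralai/Mistral-7B-Instruct-v0.2",
    filename="config.json",
)
print(f"Access confirmed; downloaded {path}")
```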
# Tokenizer Setup
PrivateGPT uses HuggingFace's `AutoTokenizer` class (from the `transformers` library) to tokenize input text accurately. It connects to the HuggingFace Hub to download the appropriate tokenizer for the specified model.
## Configuring the Tokenizer
1. **Specify the Model:**
In your `settings.yaml` file, specify the model whose tokenizer you want to use:
```yaml
llm:
tokenizer: mistralai/Mistral-7B-Instruct-v0.2
```
2. **Set Access Token for Gated Models:**
If you are using a gated model, ensure the `access_token` is set as mentioned in the previous section.
This configuration ensures that PrivateGPT can download and use the correct tokenizer for the model you are working with.
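For reference, the download this configuration drives boils down to a call like the sketch below (assuming the `transformers` package is installed; the `token` argument is only needed for gated models):
```python
from transformers import AutoTokenizer

# Downloads (and caches) the tokenizer files from the HuggingFace Hub.
tokenizer = AutoTokenizer.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    token="<your-token>",  # omit for public models
)
print(tokenizer.encode("Hello, PrivateGPT!"))
```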


@@ -35,10 +35,10 @@ class LLMComponent:
                 )
             except Exception as e:
                 logger.warning(
-                    "Failed to download tokenizer %s. Falling back to "
-                    "default tokenizer.",
-                    settings.llm.tokenizer,
-                    e,
+                    f"Failed to download tokenizer {settings.llm.tokenizer}: {e!s}. "
+                    "Please follow the instructions in the documentation to download it if needed: "
+                    "https://docs.privategpt.dev/installation/getting-started/troubleshooting#tokenizer-setup. "
+                    "Falling back to default tokenizer."
                 )

         logger.info("Initializing the LLM in mode=%s", llm_mode)
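
Combined with the setup-script change below, the resulting behavior can be summarized by this condensed sketch (simplified, assumed names; not the exact module code): when no tokenizer is configured, or its download fails, PrivateGPT keeps the default llama-index tokenizer instead of aborting.
```python
import logging
from typing import Optional

from transformers import AutoTokenizer

logger = logging.getLogger(__name__)

def resolve_tokenizer(tokenizer_name: Optional[str], access_token: Optional[str]):
    """Return a model-specific tokenizer, or None to keep the default one."""
    if not tokenizer_name:
        return None  # tokenizer is optional: keep the llama-index default
    try:
        return AutoTokenizer.from_pretrained(tokenizer_name, token=access_token)
    except Exception as e:
        logger.warning(
            "Failed to download tokenizer %s: %s. Falling back to default tokenizer.",
            tokenizer_name,
            e,
        )
        return None
```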


@@ -24,6 +24,7 @@ snapshot_download(
     repo_id=settings().huggingface.embedding_hf_model_name,
     cache_dir=models_cache_path,
     local_dir=embedding_path,
+    token=settings().huggingface.access_token,
 )
 print("Embedding model downloaded!")
@@ -35,15 +36,18 @@ hf_hub_download(
     cache_dir=models_cache_path,
     local_dir=models_path,
     resume_download=resume_download,
+    token=settings().huggingface.access_token,
 )
 print("LLM model downloaded!")

 # Download Tokenizer
-print(f"Downloading tokenizer {settings().llm.tokenizer}")
-AutoTokenizer.from_pretrained(
-    pretrained_model_name_or_path=settings().llm.tokenizer,
-    cache_dir=models_cache_path,
-)
-print("Tokenizer downloaded!")
+if settings().llm.tokenizer:
+    print(f"Downloading tokenizer {settings().llm.tokenizer}")
+    AutoTokenizer.from_pretrained(
+        pretrained_model_name_or_path=settings().llm.tokenizer,
+        cache_dir=models_cache_path,
+        token=settings().huggingface.access_token,
+    )
+    print("Tokenizer downloaded!")

 print("Setup done")


@@ -40,7 +40,8 @@ llm:
   # Should be matching the selected model
   max_new_tokens: 512
   context_window: 3900
-  tokenizer: mistralai/Mistral-7B-Instruct-v0.2
+  # Select your tokenizer. Llama-index tokenizer is the default.
+  # tokenizer: mistralai/Mistral-7B-Instruct-v0.2
   temperature: 0.1 # The temperature of the model. Increasing the temperature will make the model answer more creatively. A value of 0.1 would be more factual. (Default: 0.1)

 rag:
@@ -76,7 +77,7 @@ embedding:
 huggingface:
   embedding_hf_model_name: BAAI/bge-small-en-v1.5
-  access_token: ${HUGGINGFACE_TOKEN:}
+  access_token: ${HF_TOKEN:}

 vectorstore:
   database: qdrant
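
As context for the `${HF_TOKEN:}` value: PrivateGPT's settings files expand `${VAR:default}` placeholders from the environment, falling back to the default (here, an empty string) when the variable is unset. A hypothetical, self-contained sketch of that expansion rule (not PrivateGPT's actual loader):
```python
import os
import re

_PLACEHOLDER = re.compile(r"\$\{(\w+):([^}]*)\}")

def expand(value: str) -> str:
    """Replace ${VAR:default} with the environment value, or the default if unset."""
    return _PLACEHOLDER.sub(lambda m: os.environ.get(m.group(1), m.group(2)), value)

print(expand("${HF_TOKEN:}"))   # the token if HF_TOKEN is set, else ""
print(expand("${PORT:8001}"))   # hypothetical: "8001" when PORT is unset
```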