Documentation updates and default settings reviewed

2025-07-31 07:05:02 +00:00 · 2024-02-29 14:48:55 +01:00 · 8c390812ff
commit 8c390812ff
parent 3373e80850
10 changed files with 241 additions and 100 deletions
--- a/fern/docs.yml
+++ b/fern/docs.yml
@ -30,15 +30,15 @@ navigation:
    layout:
      - section: Welcome
        contents:
-          - page: Welcome
+          - page: Introduction
            path: ./docs/pages/overview/welcome.mdx
-          - page: Quickstart
-            path: ./docs/pages/overview/quickstart.mdx
  # How to install privateGPT, with FAQ and troubleshooting
  - tab: installation
    layout:
      - section: Getting started
        contents:
+          - page: Main Concepts
+            path: ./docs/pages/installation/concepts.mdx
          - page: Installation
            path: ./docs/pages/installation/installation.mdx
  # Manual of privateGPT: how to use it and configure it
--- a/fern/docs/pages/installation/concepts.mdx
+++ b/fern/docs/pages/installation/concepts.mdx
@ -0,0 +1,56 @@
+PrivateGPT is a service that wraps a set of AI RAG primitives in a comprehensive set of APIs providing a private, secure, customizable and easy to use GenAI development framework.
+
+It uses FastAPI and LlamaIndex as its core frameworks. Those can be customized by changing the codebase itself.
+
+It supports a variety of LLM providers, embeddings providers, and vector stores, both local and remote. Those can be easily changed without changing the codebase.
+
+# Different Setups support
+
+## Setup configurations available
+You get to decide the setup for these 3 main components:
+- LLM: the large language model provider used for inference. It can be local, or remote, or even OpenAI.
+- Embeddings: the embeddings provider used to encode the input, the documents and the users' queries. Same as the LLM, it can be local, or remote, or even OpenAI.
+- Vector store: the store used to index and retrieve the documents.
+
+There is an extra component that can be enabled or disabled: the UI. It is a Gradio UI that lets you interact with the API in a more user-friendly way.
+
+### Setups and Dependencies
+Your setup will be the combination of the different options available. You'll find recommended setups in the [installation](/installation) section.
+PrivateGPT uses poetry to manage its dependencies. You can install the dependencies for the different setups by running `poetry install --extras "<extra1> <extra2>..."`.
+Extras are the different options available for each component. For example, to install the dependencies for a local setup with UI and qdrant as vector database, you would run `poetry install --extras "ui local qdrant"`.
+Refer to the [installation](/installation) section for more details.
+
+### Setups and Configuration
+PrivateGPT uses yaml to define its configuration in files named `settings-<profile>.yaml`.
+Different configuration files can be created in the root directory of the project.
+PrivateGPT will load the configuration at startup from the profile specified in the `PGPT_PROFILES` environment variable.
+For example, running:
+```bash
+PGPT_PROFILES=ollama make run
+```
+will load the configuration from `settings.yaml` and `settings-ollama.yaml`.
+- `settings.yaml` is always loaded and contains the default configuration.
+- `settings-ollama.yaml` is loaded if the `ollama` profile is specified in the `PGPT_PROFILES` environment variable. It can override configuration from the default `settings.yaml`
+
+## About Fully Local Setups
+In order to run PrivateGPT in a fully local setup, you will need to run the LLM, Embeddings and Vector Store locally.
+### Vector stores
+The 3 vector stores supported (Qdrant, ChromaDB and Postgres) run locally by default.
+### Embeddings
+For local embeddings you need to install the 'local' extra dependencies. It will use Huggingface Embeddings.
+
+Note: Ollama will support Embeddings in the short term for easier installation, but it doesn't as of today.
+
+In order for local embeddings to work, you need to download the embeddings model to the `models` folder. You can do so by running the `setup` script:
+```bash
+poetry run python scripts/setup
+```
+### LLM
+For local LLM there are two options:
+* (Recommended) You can use the 'ollama' option in PrivateGPT, which will connect to your local Ollama instance. Ollama greatly simplifies the installation of local LLMs.
+* You can use the 'local' option in PrivateGPT, which will use LlamaCPP. It works great on Mac with Metal most of the time (it leverages the Metal GPU), but it can be tricky on certain Linux and Windows distributions, depending on the GPU. In the installation document you'll find guides and troubleshooting.
+
+In order for the local LLM to work (the second option), you need to download the LLM model to the `models` folder. You can do so by running the `setup` script:
+```bash
+poetry run python scripts/setup
+```
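The profile mechanism described in `concepts.mdx` (always load `settings.yaml`, then overlay `settings-<profile>.yaml` for each profile named in `PGPT_PROFILES`) can be illustrated with a small sketch. This is not PrivateGPT's actual loader: the function names `active_profiles` and `deep_merge`, the comma-separated parsing of `PGPT_PROFILES`, and the recursive merge strategy are assumptions made for illustration, and plain dictionaries with made-up values stand in for the parsed YAML files.

```python
import os
from typing import Any, Mapping, Optional


def active_profiles(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the profile names in PGPT_PROFILES (assumed to be comma-separated)."""
    env = os.environ if env is None else env
    raw = env.get("PGPT_PROFILES", "")
    return [name.strip() for name in raw.split(",") if name.strip()]


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict in which values from `override` win over `base`, recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical stand-ins for the parsed settings.yaml and settings-ollama.yaml.
defaults = {"llm": {"mode": "local", "max_new_tokens": 256}, "ui": {"enabled": True}}
ollama_profile = {"llm": {"mode": "ollama", "max_new_tokens": 512}}

profiles = active_profiles({"PGPT_PROFILES": "ollama"})
settings = deep_merge(defaults, ollama_profile) if "ollama" in profiles else defaults
print(settings["llm"]["mode"])
```

Keys that the profile does not mention (here `ui`) keep their default values, which matches the statement that a profile file "can override configuration from the default `settings.yaml`".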
--- a/fern/docs/pages/installation/installation.mdx
+++ b/fern/docs/pages/installation/installation.mdx
@ -1,8 +1,8 @@
-## Installation and Settings
+It is important that you review the Main Concepts before you start the installation process.

-### Base requirements to run PrivateGPT
+## Base requirements to run PrivateGPT

-* Git clone PrivateGPT repository, and navigate to it:
+* Clone PrivateGPT repository, and navigate to it:

 ```bash
  git clone https://github.com/imartinez/privateGPT
@ -21,93 +21,128 @@ pyenv local 3.11

 * Install [Poetry](https://python-poetry.org/docs/#installing-with-the-official-installer) for dependency management:

-* Have a valid C++ compiler like gcc. See [Troubleshooting: C++ Compiler](#troubleshooting-c-compiler) for more details.
-
-* Install `make` for scripts:
+* Install `make` to be able to run the different scripts:
    * osx: (Using homebrew): `brew install make`
    * windows: (Using chocolatey) `choco install make`

-### Install dependencies
+## Install and run your desired setup

-Install the dependencies:
+PrivateGPT lets you customize the setup, from fully local to cloud-based, by choosing which modules to use.
+Here are the different options available:
+
+- LLM: "local" (uses LlamaCPP), "ollama", "sagemaker", "openai", "openailike"
+- Embeddings: "local" (uses HuggingFace embeddings), "openai", "sagemaker"
+- Vector stores: "qdrant", "chroma", "postgres"
+- UI: whether or not to enable UI (Gradio) or just go with the API
+
+In order to only install the required dependencies, PrivateGPT offers different `extras` that can be combined during the installation process:

 ```bash
-poetry install --with ui
+poetry install --extras "<extra1> <extra2>..."
 ```

-Verify everything is working by running `make run` (or `poetry run python -m private_gpt`) and navigate to
-http://localhost:8001. You should see a [Gradio UI](https://gradio.app/) **configured with a mock LLM** that will
-echo back the input. Below we'll see how to configure a real LLM.
+Where `<extra>` can be any of the following:

-### Settings
+- ui: adds support for UI using Gradio
+- local: adds support for local LLM and Embeddings using LlamaCPP - expect a messy installation process on some platforms
+- openai: adds support for OpenAI LLM and Embeddings, requires OpenAI API key
+- sagemaker: adds support for Amazon Sagemaker LLM and Embeddings, requires Sagemaker endpoints
+- ollama: adds support for Ollama LLM, the easiest way to get a local LLM running
+- openai-like: adds support for 3rd party LLM providers that are compatible with OpenAI's API
+- qdrant: adds support for Qdrant vector store
+- chroma: adds support for Chroma DB vector store
+- postgres: adds support for Postgres vector store

-<Callout intent="info">
-The default settings of PrivateGPT should work out-of-the-box for a 100% local setup. **However**, as is, it runs exclusively on your CPU.
-Skip this section if you just want to test PrivateGPT locally, and come back later to learn about more configuration options (and have better performances).
-</Callout>
+## Recommended Setups

-<br />
+These are just some examples of recommended setups. You can mix and match the different options to fit your needs.
+You'll find more information in the Manual section of the documentation.

-### Local LLM requirements
+### Local, Ollama-powered setup

-Install extra dependencies for local execution:
+The easiest way to run PrivateGPT fully locally is to depend on Ollama for the LLM. Ollama provides a local LLM that is easy to install and use.

+Go to [ollama.ai](https://ollama.ai/) and follow the instructions to install Ollama on your machine.
+
+Once done, you can install PrivateGPT with the following command:
 ```bash
-poetry install --with local
+poetry install --extras "ui local ollama qdrant"
 ```

-For PrivateGPT to run fully locally GPU acceleration is required
-(CPU execution is possible, but very slow), however,
-typical Macbook laptops or window desktops with mid-range GPUs lack VRAM to run
-even the smallest LLMs. For that reason
-**local execution is only supported for models compatible with [llama.cpp](https://github.com/ggerganov/llama.cpp)**
-
-These two models are known to work well:
-
-* https://huggingface.co/TheBloke/Llama-2-7B-chat-GGUF
-* https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF (recommended)
-
-To ease the installation process, use the `setup` script that will download both
-the embedding and the LLM model and place them in the correct location (under `models` folder):
-
+We are installing the "local" dependency to support local embeddings, because Ollama doesn't support embeddings just yet. But they are working on it!
+In order for local embeddings to work, you need to download the embeddings model to the `models` folder. You can do so by running the `setup` script:
 ```bash
 poetry run python scripts/setup
 ```

-If you are ok with CPU execution, you can skip the rest of this section.
+Once installed, you can run PrivateGPT. Make sure you have a working Ollama running locally before running the following command.

-As stated before, llama.cpp is required and in
+```bash
+PGPT_PROFILES=ollama make run
+```
+
+PrivateGPT will use the already existing `settings-ollama.yaml` settings file, which is already configured to use Ollama LLM, local Embeddings, and Qdrant. Review it and adapt it to your needs (different LLM model, different Ollama port, etc.)
+
+The UI will be available at http://localhost:8001
+
+### Private, Sagemaker-powered setup
+
+If you need more performance, you can run a version of PrivateGPT that relies on powerful AWS Sagemaker machines to serve the LLM and Embeddings.
+
+You need to have access to Sagemaker inference endpoints for the LLM and/or the embeddings, and have AWS credentials properly configured.
+
+Edit the `settings-sagemaker.yaml` file to include the correct Sagemaker endpoints.
+
+Then, install PrivateGPT with the following command:
+```bash
+poetry install --extras "ui sagemaker qdrant"
+```
+
+Once installed, you can run PrivateGPT. Make sure your Sagemaker endpoints are reachable and your AWS credentials are configured before running the following command.
+
+```bash
+PGPT_PROFILES=sagemaker make run
+```
+
+PrivateGPT will use the already existing `settings-sagemaker.yaml` settings file, which is already configured to use Sagemaker LLM and Embeddings endpoints, and Qdrant.
+
+The UI will be available at http://localhost:8001
+
+### Local, Llama-CPP powered setup
+
+If you want to run PrivateGPT fully locally without relying on Ollama, you can run the following command:
+
+```bash
+poetry install --extras "ui local qdrant"
+```
+
+In order for local LLM and embeddings to work, you need to download the models to the `models` folder. You can do so by running the `setup` script:
+```bash
+poetry run python scripts/setup
+```
+
+Once installed, you can run PrivateGPT with the following command:
+
+```bash
+PGPT_PROFILES=local make run
+```
+
+PrivateGPT will load the already existing `settings-local.yaml` file, which is already configured to use LlamaCPP and Qdrant.
+
+The UI will be available at http://localhost:8001
+
+#### Llama-CPP support
+
+For PrivateGPT to run fully locally without Ollama, Llama.cpp is required and in
 particular [llama-cpp-python](https://github.com/abetlen/llama-cpp-python)
 is used.

+You'll need to have a valid C++ compiler like gcc installed. See [Troubleshooting: C++ Compiler](#troubleshooting-c-compiler) for more details.
+
 > It's highly encouraged that you fully read llama-cpp and llama-cpp-python documentation relevant to your platform.
 > Running into installation issues is very likely, and you'll need to troubleshoot them yourself.

-#### Customizing low level parameters
-
-Currently, not all the parameters of `llama.cpp` and `llama-cpp-python` are available at PrivateGPT's `settings.yaml` file.
-In case you need to customize parameters such as the number of layers loaded into the GPU, you might change
-these at the `llm_component.py` file under the `private_gpt/components/llm/llm_component.py`.
-
-##### Available LLM config options
-
-The `llm` section of the settings allows for the following configurations:
-
- `mode`: how to run your llm
- `max_new_tokens`: this lets you configure the number of new tokens the LLM will generate and add to the context window (by default Llama.cpp uses `256`)
-
-Example:
-
-```yaml
-llm:
-  mode: local
-  max_new_tokens: 256
-```
-
-If you are getting an out of memory error, you might also try a smaller model or stick to the proposed
-recommended models, instead of custom tuning the parameters.
-
-#### OSX GPU support
+##### Llama-CPP OSX GPU support

 You will need to build [llama.cpp](https://github.com/ggerganov/llama.cpp) with metal support.

@ -127,7 +162,7 @@ More information is available in the documentation of the libraries themselves:
 * [llama-cpp-python's documentation](https://llama-cpp-python.readthedocs.io/en/latest/#installation-with-hardware-acceleration)
 * [llama.cpp](https://github.com/ggerganov/llama.cpp#build)

-#### Windows NVIDIA GPU support
+##### Llama-CPP Windows NVIDIA GPU support

 Windows GPU support is done through CUDA.
 Follow the instructions on the original [llama.cpp](https://github.com/ggerganov/llama.cpp) repo to install the required
@ -160,7 +195,7 @@ Note that llama.cpp offloads matrix calculations to the GPU but the performance
 still hit heavily due to latency between CPU and GPU communication. You might need to tweak
 batch sizes and other parameters to get the best performance for your particular system.

-#### Linux NVIDIA GPU support and Windows-WSL
+##### Llama-CPP Linux NVIDIA GPU support and Windows-WSL

 Linux GPU support is done through CUDA.
 Follow the instructions on the original [llama.cpp](https://github.com/ggerganov/llama.cpp) repo to install the required
@ -188,7 +223,7 @@ llama_new_context_with_model: total VRAM used: 4857.93 MB (model: 4095.05 MB, co
 AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
 ```

-### Known issues and Troubleshooting
+##### Llama-CPP Known issues and Troubleshooting

 Execution of LLMs locally still has a lot of sharp edges, especially when running on non-Linux platforms.
 You might encounter several issues:
@ -205,7 +240,7 @@ If, during your installation, something does not go as planned, retry in *verbos

 For example, when installing packages with `pip install`, you can add the option `-vvv` to show the details of the installation.

-#### Troubleshooting: C++ Compiler
+##### Llama-CPP Troubleshooting: C++ Compiler

 If you encounter an error while building a wheel during the `pip install` process, you may need to install a C++
 compiler on your computer.
@ -227,7 +262,7 @@ To install a C++ compiler on Windows 10/11, follow these steps:
   Store and search for Xcode and install it. **Or** you can install the command line tools by running `xcode-select --install`.
 2. If not, you can install clang or gcc with homebrew `brew install gcc`

-#### Troubleshooting: Mac Running Intel
+##### Llama-CPP Troubleshooting: Mac Running Intel

 When running a Mac with Intel hardware (not M1), you may run into _clang: error: the clang compiler does not support '
 -march=native'_ during pip install.
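The `--extras` mechanism introduced in `installation.mdx` above combines one or more extras into a single quoted argument. As a rough sketch, a helper could validate the chosen extras against the list given in this commit and build the command line. The helper name `poetry_install_command` and the `KNOWN_EXTRAS` set are hypothetical and are not part of PrivateGPT; only the extra names themselves come from the page.

```python
import shlex

# Extras listed in the installation page of this commit.
KNOWN_EXTRAS = {
    "ui", "local", "openai", "sagemaker", "ollama",
    "openai-like", "qdrant", "chroma", "postgres",
}


def poetry_install_command(extras: list[str]) -> str:
    """Build a `poetry install --extras ...` command line, rejecting unknown extras."""
    unknown = [extra for extra in extras if extra not in KNOWN_EXTRAS]
    if unknown:
        raise ValueError(f"unknown extras: {', '.join(unknown)}")
    # shlex.quote wraps the space-separated list so the shell sees one argument.
    return f"poetry install --extras {shlex.quote(' '.join(extras))}"


print(poetry_install_command(["ui", "local", "ollama", "qdrant"]))
```

The printed command uses single quotes rather than the double quotes shown in the docs; both pass the extras to Poetry as one argument in a POSIX shell.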
--- a/fern/docs/pages/manual/llms.mdx
+++ b/fern/docs/pages/manual/llms.mdx
@ -25,6 +25,30 @@ When the server is started it will print a log *Application startup complete*.
 Navigate to http://localhost:8001 to use the Gradio UI or to http://localhost:8001/docs (API section) to try the API
 using Swagger UI.

+#### Customizing low level parameters
+
+Currently, not all the parameters of `llama.cpp` and `llama-cpp-python` are exposed in PrivateGPT's `settings.yaml` file.
+If you need to customize parameters such as the number of layers loaded into the GPU, you can change
+them in `private_gpt/components/llm/llm_component.py`.
+
+##### Available LLM config options
+
+The `llm` section of the settings allows for the following configurations:
+
+- `mode`: how to run your llm
+- `max_new_tokens`: this lets you configure the number of new tokens the LLM will generate and add to the context window (by default Llama.cpp uses `256`)
+
+Example:
+
+```yaml
+llm:
+  mode: local
+  max_new_tokens: 256
+```
+
+If you are getting an out of memory error, you might also try a smaller model or stick to the proposed
+recommended models, instead of custom tuning the parameters.
+
 ### Using OpenAI

 If you cannot run a local model (because you don't have a GPU, for example) or for testing purposes, you may
--- a/fern/docs/pages/overview/quickstart.mdx
+++ b/fern/docs/pages/overview/quickstart.mdx
@ -1,21 +0,0 @@
-## Local Installation steps
-
-The steps in [Installation](/installation) section are better explained and cover more
-setup scenarios (macOS, Windows, Linux).
-But if you like one-liners, have python3.11 installed, and you are running a UNIX (macOS or Linux)
-system, you can get up and running on CPU in few lines:
-
-```bash
-git clone https://github.com/imartinez/privateGPT && cd privateGPT && \
-python3.11 -m venv .venv && source .venv/bin/activate && \
-pip install --upgrade pip poetry && poetry install --with ui,local && ./scripts/setup
-
-# Launch the privateGPT API server **and** the gradio UI
-poetry run python3.11 -m private_gpt
-
-# In another terminal, create a new browser window on your private GPT!
-open http://127.0.0.1:8001/
-```
-
-The above is not working, or it is too slow, so **you want to run it on GPU(s)**?
-Please check the more detailed [installation guide](/installation).
--- a/fern/docs/pages/overview/welcome.mdx
+++ b/fern/docs/pages/overview/welcome.mdx
@ -1,20 +1,19 @@
-## Introduction 👋
-
 PrivateGPT provides an **API** containing all the building blocks required to
 build **private, context-aware AI applications**.
 The API follows and extends OpenAI API standard, and supports both normal and streaming responses.
 That means that, if you can use OpenAI API in one of your tools, you can use your own PrivateGPT API instead,
-with no code changes, **and for free** if you are running privateGPT in `local` mode.
-
-Looking for the installation quickstart? [Quickstart installation guide for Linux and macOS](/overview/welcome/quickstart).
-
-Do you want to install it on Windows? Or do you want to take full advantage of your hardware for better performances?
-The installation guide will help you in the [Installation section](/installation).
+with no code changes, **and for free** if you are running privateGPT in a `local` setup.

+Get started by understanding the [Main Concepts and Installation](/installation) and then dive into the [API Reference](/api-reference).

 ## Frequently Visited Resources

 <Cards>
+  <Card
+    title="Main Concepts"
+    icon="fa-solid fa-lines-leaning"
+    href="/installation"
+  />
  <Card
    title="API Reference"
    icon="fa-solid fa-code"
@ -32,6 +31,9 @@ The installation guide will help you in the [Installation section](/installation
  />
 </Cards>

+<br />
+
+
 <Callout intent = "info">
 A working **Gradio UI client** is provided to test the API, together with a set of useful tools such as bulk
 model download script, ingestion script, documents folder watch, etc.
--- a/settings-local.yaml
+++ b/settings-local.yaml
@ -3,3 +3,22 @@ server:

 llm:
  mode: local
+  # Should be matching the selected model
+  max_new_tokens: 512
+  context_window: 3900
+  tokenizer: mistralai/Mistral-7B-Instruct-v0.2
+
+embedding:
+  mode: local
+
+vectorstore:
+  database: qdrant
+
+qdrant:
+  path: local_data/private_gpt/qdrant
+
+local:
+  prompt_style: "mistral"
+  llm_hf_repo_id: TheBloke/Mistral-7B-Instruct-v0.2-GGUF
+  llm_hf_model_file: mistral-7b-instruct-v0.2.Q4_K_M.gguf
+  embedding_hf_model_name: BAAI/bge-small-en-v1.5
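The comment "Should be matching the selected model" on `max_new_tokens` and `context_window` can be made concrete with a little arithmetic. The sketch below assumes, as the `llms.mdx` text in this commit suggests, that generated tokens are added to the same context window as the prompt; the function name `prompt_token_budget` is hypothetical, and any safety margin the real code may apply is ignored.

```python
def prompt_token_budget(context_window: int, max_new_tokens: int) -> int:
    """Tokens left for the prompt once room for the generated reply is reserved."""
    if max_new_tokens >= context_window:
        raise ValueError("max_new_tokens must be smaller than context_window")
    return context_window - max_new_tokens


# Values from settings-local.yaml in this commit.
print(prompt_token_budget(context_window=3900, max_new_tokens=512))  # 3388
```

Under that assumption, raising `max_new_tokens` without raising `context_window` shrinks the space available for retrieved context and chat history.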
--- a/settings-ollama.yaml
+++ b/settings-ollama.yaml
@ -0,0 +1,24 @@
+server:
+  env_name: ${APP_ENV:ollama}
+
+llm:
+  mode: ollama
+  max_new_tokens: 512
+  context_window: 3900
+
+ollama:
+  model: llama2
+  api_base: http://localhost:11434
+
+embedding:
+  mode: local
+
+vectorstore:
+  database: qdrant
+
+qdrant:
+  path: local_data/private_gpt/qdrant
+
+local:
+  prompt_style: "llama2"
+  embedding_hf_model_name: BAAI/bge-small-en-v1.5
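The value `${APP_ENV:ollama}` in `settings-ollama.yaml` reads like an environment-variable placeholder with a default after the colon. A minimal sketch of how such placeholders could be expanded follows. It assumes that reading of the syntax, and both the regular expression and the function name `expand_placeholders` are illustrative rather than PrivateGPT's own implementation.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME} or ${NAME:default}; the syntax is inferred from `${APP_ENV:ollama}`.
_PLACEHOLDER = re.compile(r"\$\{(\w+)(?::([^}]*))?\}")


def expand_placeholders(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace each placeholder with the variable's value, or its default if unset."""
    env = os.environ if env is None else env

    def _substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is None:
            raise KeyError(f"environment variable {name} is not set and has no default")
        return default

    return _PLACEHOLDER.sub(_substitute, value)


# With APP_ENV unset, the default after the colon is used.
print(expand_placeholders("${APP_ENV:ollama}", env={}))
```

Passing `env` explicitly keeps the function easy to test; by default it reads from `os.environ`.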
--- a/settings-sagemaker.yaml
+++ b/settings-sagemaker.yaml
@ -9,6 +9,9 @@ ui:
 llm:
  mode: sagemaker

+embedding:
+  mode: sagemaker
+
 sagemaker:
  llm_endpoint_name: huggingface-pytorch-tgi-inference-2023-09-25-19-53-32-140
  embedding_endpoint_name: huggingface-pytorch-inference-2023-11-03-07-41-36-479
--- a/settings.yaml
+++ b/settings.yaml
@ -34,7 +34,6 @@ ui:
  delete_file_button_enabled: true
  delete_all_files_button_enabled: true

-
 llm:
  mode: local
  # Should be matching the selected model