Next version of PrivateGPT (#1077)

* Dockerize private-gpt * Use port 8001 for local development * Add setup script * Add CUDA Dockerfile * Create README.md * Make the API use OpenAI response format * Truncate prompt * refactor: add models and __pycache__ to .gitignore * Better naming * Update readme * Move models ignore to it's folder * Add scaffolding * Apply formatting * Fix tests * Working sagemaker custom llm * Fix linting * Fix linting * Enable streaming * Allow all 3.11 python versions * Use llama 2 prompt format and fix completion * Restructure (#3) Co-authored-by: Pablo Orgaz <pablo@Pablos-MacBook-Pro.local> * Fix Dockerfile * Use a specific build stage * Cleanup * Add FastAPI skeleton * Cleanup openai package * Fix DI and tests * Split tests and tests with coverage * Remove old scaffolding * Add settings logic (#4) * Add settings logic * Add settings for sagemaker --------- Co-authored-by: Pablo Orgaz <pablo@Pablos-MacBook-Pro.local> * Local LLM (#5) * Add settings logic * Add settings for sagemaker * Add settings-local-example.yaml * Delete terraform files * Refactor tests to use fixtures * Join deltas * Add local model support --------- Co-authored-by: Pablo Orgaz <pablo@Pablos-MacBook-Pro.local> * Update README.md * Fix tests * Version bump * Enable simple llamaindex observability (#6) * Enable simple llamaindex observability * Improve code through linting * Update README.md * Move to async (#7) * Migrate implementation to use asyncio * Formatting * Cleanup * Linting --------- Co-authored-by: Pablo Orgaz <pablo@Pablos-MacBook-Pro.local> * Query Docs and gradio UI * Remove unnecessary files * Git ignore chromadb folder * Async migration + DI Cleanup * Fix tests * Add integration test * Use fastapi responses * Retrieval service with partial implementation * Cleanup * Run formatter * Fix types * Fetch nodes asynchronously * Install local dependencies in tests * Install ui dependencies in tests * Install dependencies for llama-cpp * Fix sudo * Attempt to fix cuda issues * Attempt to fix cuda issues * Try to reclaim some space from ubuntu machine * Retrieval with context * Fix lint and imports * Fix mypy * Make retrieval API a POST * Make Completions body a dataclass * Fix LLM chat message order * Add Query Chunks to Gradio UI * Improve rag query prompt * Rollback CI Changes * Move to sync code * Using Llamaindex abstraction for query retrieval * Fix types * Default to CONDENSED chat mode for contextualized chat * Rename route function * Add Chat endpoint * Remove webhooks * Add IntelliJ run config to gitignore * .gitignore applied * Sync chat completion * Refactor total * Typo in context_files.py * Add embeddings component and service * Remove wrong dataclass from IngestService * Filter by context file id implementation * Fix typing * Implement context_filter and separate from the bool use_context in the API * Change chunks api to avoid conceptual class of the context concept * Deprecate completions and fix tests * Remove remaining dataclasses * Use embedding component in ingest service * Fix ingestion to have multipart and local upload * Fix ingestion API * Add chunk tests * Add configurable paths * Cleaning up * Add more docs * IngestResponse includes a list of IngestedDocs * Use IngestedDoc in the Chunk document reference * Rename ingest routes to ingest_router.py * Fix test working directory for intellij * Set testpaths for pytest * Remove unused as_chat_engine * Add .fleet ide to gitignore * Make LLM and Embedding model configurable * Fix imports and checks * Let local_data folder exist empty in the repository * Don't use certain metadata in LLM * Remove long lines * Fix windows installation * Typos * Update poetry.lock * Add TODO for linux * Script and first version of docs * No jekill build * Fix relative url to openapi json * Change default docs values * Move chromadb dependency to the general group * Fix tests to use separate local_data * Create CNAME * Update CNAME * Fix openapi.json relative path * PrivateGPT logo * WIP OpenAPI documentation metadata * Add ingest script (#11) * Add ingest script * Fix broken name refactor * Add ingest docs and Makefile script * Linting * Move transformers to main dependency * Move torch to main dependencies * Don't load HuggingFaceEmbedding in tests * Fix lint --------- Co-authored-by: Pablo Orgaz <pablo@Pablos-MacBook-Pro.local> * Rename file to camel_case * Commit settings-local.yaml * Move documentation to public docs * Fix docker image for linux * Installation and Running the Server documentation * Move back to docs folder, as it is the only supported by github pages * Delete CNAME * Create CNAME * Delete CNAME * Create CNAME * Improved API documentation * Fix lint * Completions documentation * Updated openapi scheme * Ingestion API doc * Minor doc changes * Updated openapi scheme * Chunks API documentation * Embeddings and Health API, and homogeneous responses * Revamp README with new skeleton of content * More docs * PrivateGPT logo * Improve UI * Update ingestion docu * Update README with new sections * Use context window in the retriever * Gradio Documentation * Add logo to UI * Include Contributing and Community sections to README * Update links to resources in the README * Small README.md updates * Wrap lines of README.md * Don't put health under /v1 * Add copy button to Chat * Architecture documentation * Updated openapi.json * Updated openapi.json * Updated openapi.json * Change UI label * Update documentation * Add releases link to README.md * Gradio avatar and stop debug * Readme update * Clean old files * Remove unused terraform checks * Update twitter link. * Disable minimum coverage * Clean install message in README.md --------- Co-authored-by: Pablo Orgaz <pablo@Pablos-MacBook-Pro.local> Co-authored-by: Iván Martínez <ivanmartit@gmail.com> Co-authored-by: RubenGuerrero <ruben.guerrero@boopos.com> Co-authored-by: Daniel Gallego Vico <daniel.gallego@bq.com>
2025-09-15 22:59:53 +00:00 · 2023-10-19 16:04:35 +02:00
parent 78d1ef44ad
commit 51cc638758
98 changed files with 7067 additions and 3397 deletions
--- a/docs/.nojekyll
+++ b/docs/.nojekyll
--- a/docs/CNAME
+++ b/docs/CNAME
@@ -0,0 +1 @@
+docs.privategpt.dev
--- a/docs/description.md
+++ b/docs/description.md
@@ -0,0 +1,389 @@
+## Introduction
+
+PrivateGPT provides an **API** containing all the building blocks required to build
+**private, context-aware AI applications**. The API follows and extends OpenAI API standard, and supports
+both normal and streaming responses.
+
+The API is divided in two logical blocks:
+
+- High-level API, abstracting all the complexity of a RAG (Retrieval Augmented Generation) pipeline implementation:
+    - Ingestion of documents: internally managing document parsing, splitting, metadata extraction,
+      embedding generation and storage.
+    - Chat & Completions using context from ingested documents: abstracting the retrieval of context, the prompt
+      engineering and the response generation.
+- Low-level API, allowing advanced users to implement their own complex pipelines:
+    - Embeddings generation: based on a piece of text.
+    - Contextual chunks retrieval: given a query, returns the most relevant chunks of text from the ingested
+      documents.
+
+> A working **Gradio UI client** is provided to test the API, together with a set of
+> useful tools such as bulk model download script, ingestion script, documents folder
+> watch, etc.
+
+## Quick Local Installation steps
+The steps in `Installation and Settings` section are better explained and cover more 
+setup scenarios. But if you are looking for a quick setup guide, here it is:
+
+```
+# Clone the repo
+git clone https://github.com/imartinez/privateGPT
+cd privateGPT
+
+# Install Python 3.11
+pyenv install 3.11
+pyenv local 3.11
+
+# Install dependencies
+poetry install --with ui,local
+
+# Download Embedding and LLM models
+poetry run python scripts/setup
+
+# (Optional) For Mac with Metal GPU, enable it. Check Installation and Settings section 
+to know how to enable GPU on other platforms
+CMAKE_ARGS="-DLLAMA_METAL=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
+
+# Run the local server  
+PGPT_PROFILES=local make run
+
+# Note: on Mac with Metal you should see a ggml_metal_add_buffer log, stating GPU is 
+being used
+
+# Navigate to the UI and try it out! 
+http://localhost:8001/
+```
+
+
+## Installation and Settings
+
+### Base requirements to run PrivateGPT
+
+* Git clone PrivateGPT repository, and navigate to it:
+```
+  git clone https://github.com/imartinez/privateGPT
+  cd privateGPT
+```
+* Install Python 3.11. Ideally through a python version manager like `pyenv`. 
+  Python 3.12 
+  should work too. Earlier python versions are not supported.
+    * osx/linux: [pyenv](https://github.com/pyenv/pyenv)
+    * windows: [pyenv-win](https://github.com/pyenv-win/pyenv-win)
+  
+```  
+pyenv install 3.11
+pyenv local 3.11
+```
+* Install [Poetry](https://python-poetry.org/docs/#installing-with-the-official-installer) for dependency management: 
+  
+* Install `make` for scripts:
+    * osx: (Using homebrew): `brew install make`
+    * windows: (Using chocolatey) `choco install make`
+
+### Install dependencies
+
+Install the dependencies:
+
+```bash
+poetry install --with ui
+```
+
+Verify everything is working by running `make run` (or `poetry run python -m private_gpt`) and navigate to
+http://localhost:8001. You should see a [Gradio UI](https://gradio.app/) **configured with a mock LLM** that will
+echo back the input. Later we'll see how to configure a real LLM.
+
+### Settings
+
+> Note: the default settings of PrivateGPT work out-of-the-box for a 100% local setup. Skip this section if you just
+> want to test PrivateGPT locally, and come back later to learn about more configuration options.
+
+PrivateGPT is configured through *profiles* that are defined using yaml files, and selected through env variables.
+The full list of properties configurable can be found in `settings.yaml`
+
+#### env var `PGPT_SETTINGS_FOLDER`
+
+The location of the settings folder. Defaults to the root of the project.
+Should contain the default `settings.yaml` and any other `settings-{profile}.yaml`.
+
+#### env var `PGPT_PROFILES`
+
+By default, the profile definition in `settings.yaml` is loaded.
+Using this env var you can load additional profiles; format is a comma separated list of profile names.
+This will merge `settings-{profile}.yaml` on top of the base settings file.
+
+For example:
+`PGPT_PROFILES=local,cuda` will load `settings-local.yaml`
+and `settings-cuda.yaml`, their contents will be merged with
+later profiles properties overriding values of earlier ones like `settings.yaml`.
+
+During testing, the `test` profile will be active along with the default, therefore `settings-test.yaml`
+file is required.
+
+#### Environment variables expansion
+
+Configuration files can contain environment variables,
+they will be expanded at runtime.
+
+Expansion must follow the pattern `${VARIABLE_NAME:default_value}`.
+
+For example, the following configuration will use the value of the `PORT`
+environment variable or `8001` if it's not set.
+Missing variables with no default will produce an error.
+
+```yaml
+server:
+  port: ${PORT:8001}
+```
+
+### Local LLM requirements
+
+Install extra dependencies for local execution:
+
+```bash
+poetry install --with local
+```
+
+For PrivateGPT to run fully locally GPU acceleration is required
+(CPU execution is possible, but very slow), however,
+typical Macbook laptops or window desktops with mid-range GPUs lack VRAM to run
+even the smallest LLMs. For that reason
+**local execution is only supported for models compatible with [llama.cpp](https://github.com/ggerganov/llama.cpp)**
+
+These two models are known to work well:
+
+* https://huggingface.co/TheBloke/Llama-2-7B-chat-GGUF
+* https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF (recommended)
+
+To ease the installation process, use the `setup` script that will download both
+the embedding and the LLM model and place them in the correct location (under `models` folder):
+
+```bash
+poetry run python scripts/setup
+```
+
+If you are ok with CPU execution, you can skip the rest of this section.
+
+As stated before, llama.cpp is required and in
+particular [llama-cpp-python](https://github.com/abetlen/llama-cpp-python)
+is used.
+
+> It's highly encouraged that you fully read llama-cpp and llama-cpp-python documentation relevant to your platform.
+> Running into installation issues is very likely, and you'll need to troubleshoot them yourself.
+
+#### OSX GPU support
+
+You will need to build [llama.cpp](https://github.com/ggerganov/llama.cpp) with 
+metal support. To do that run:
+
+```bash
+CMAKE_ARGS="-DLLAMA_METAL=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
+```
+
+#### Windows GPU support
+
+Windows GPU support is done through CUDA or similar open source technologies.
+Follow the instructions on the original [llama.cpp](https://github.com/ggerganov/llama.cpp) repo to install the required
+dependencies.
+
+Some tips to get it working with an NVIDIA card and CUDA (Tested on Windows 10 with CUDA 11.5 RTX 3070):
+
+* Install latest VS2022 (and build tools) https://visualstudio.microsoft.com/vs/community/
+* Install CUDA toolkit https://developer.nvidia.com/cuda-downloads
+* [Optional] Install CMake to troubleshoot building issues by compiling llama.cpp directly https://cmake.org/download/
+
+If you have all required dependencies properly configured running the
+following powershell command should succeed.
+
+```powershell
+$env:CMAKE_ARGS='-DLLAMA_CUBLAS=on'; poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
+```
+
+If your installation was correct, you should see a message similar to the following next
+time you start the server `BLAS = 1`.
+
+```
+llama_new_context_with_model: total VRAM used: 4857.93 MB (model: 4095.05 MB, context: 762.87 MB)
+AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | 
+```
+
+Note that llama.cpp offloads matrix calculations to the GPU but the performance is
+still hit heavily due to latency between CPU and GPU communication. You might need to tweak
+batch sizes and other parameters to get the best performance for your particular system.
+
+#### Linux GPU support
+
+🚧 Under construction 🚧
+
+#### Known issues and Troubleshooting
+
+Execution of LLMs locally still has a lot of sharp edges, specially when running on non Linux platforms.
+You might encounter several issues:
+
+* Performance: RAM or VRAM usage is very high, your computer might experience slowdowns or even crashes.
+* GPU Virtualization on Windows and OSX: Simply not possible with docker desktop, you have to run the server directly on
+  the host.
+* Building errors: Some of PrivateGPT dependencies need to build native code, and they might fail on some platforms.
+  Most likely you are missing some dev tools in your machine (updated C++ compiler, CUDA is not on PATH, etc.).
+  If you encounter any of these issues, please open an issue and we'll try to help.
+
+#### Troubleshooting: C++ Compiler
+If you encounter an error while building a wheel during the `pip install` process, you may need to install a C++ compiler on your computer.
+
+**For Windows 10/11**
+
+To install a C++ compiler on Windows 10/11, follow these steps:
+
+1. Install Visual Studio 2022.
+2. Make sure the following components are selected:
+   * Universal Windows Platform development
+   * C++ CMake tools for Windows
+3. Download the MinGW installer from the [MinGW website](https://sourceforge.net/projects/mingw/).
+4. Run the installer and select the `gcc` component.
+
+#### Troubleshooting: Mac Running Intel
+When running a Mac with Intel hardware (not M1), you may run into _clang: error: the clang compiler does not support '-march=native'_ during pip install.
+
+If so set your archflags during pip install. eg: _ARCHFLAGS="-arch x86_64" pip3 install -r requirements.txt_
+
+## Running the Server
+
+After following the installation steps you should be ready to go. Here are some common run setups:
+
+### Running 100% locally
+
+Make sure you have followed the *Local LLM requirements* section before moving on.
+
+This command will start PrivateGPT using the `settings.yaml` (default profile) together with the `settings-local.yaml`
+configuration files. By default, it will enable both the API and the Gradio UI. Run:
+
+```
+PGPT_PROFILES=local make run
+``` 
+
+or
+
+```
+PGPT_PROFILES=local poetry run python -m private_gpt
+```
+
+When the server is started it will print a log *Application startup complete*.
+Navigate to http://localhost:8001 to use the Gradio UI or to http://localhost:8001/docs (API section) to try the API
+using Swagger UI.
+
+### Local server using OpenAI as LLM
+
+If you cannot run a local model (because you don't have a GPU, for example) or for testing purposes, you may
+decide to run PrivateGPT using OpenAI as the LLM.
+
+In order to do so, create a profile `settings-openai.yaml` with the following contents:
+
+```yaml
+llm:
+  mode: openai
+
+openai:
+  api_key: <your_openai_api_key>  # You could skip this configuration and use the OPENAI_API_KEY env var instead
+```
+
+And run PrivateGPT loading that profile you just created:
+
+```PGPT_PROFILES=openai make run```
+
+or
+
+```PGPT_PROFILES=openai poetry run python -m private_gpt```
+
+> Note this will still use the local Embeddings model, as it is ok to use it on a CPU.
+> We'll support using OpenAI embeddings in a future release.
+
+When the server is started it will print a log *Application startup complete*.
+Navigate to http://localhost:8001 to use the Gradio UI or to http://localhost:8001/docs (API section) to try the API.
+You'll notice the speed and quality of response is higher, given you are using OpenAI's servers for the heavy
+computations.
+
+### Use AWS's Sagemaker
+
+🚧 Under construction 🚧
+
+## Gradio UI user manual
+
+Gradio UI is a ready to use way of testing most of PrivateGPT API functionalities.
+
+![Gradio PrivateGPT](https://lh3.googleusercontent.com/drive-viewer/AK7aPaBTlIX9j5nsQ87XvcRgf3vhv6UG6pgy4j4IH5mYIo6dHcfJ5IUMiVHoqyQwjTnjRITxYTQ3TcF3pfPXyyWB3HS8hKMWDA=s1600)
+
+### Execution Modes
+
+It has 3 modes of execution (you can select in the top-left):
+* Query Documents: uses the context from the 
+  ingested documents to answer the questions posted in the chat. It also takes
+  into account previous chat messages as context. 
+  * Makes use of `/chat/completions` API with `use_context=true` and no 
+    `context_filter`.
+* LLM Chat: simple, non-contextual chat with the LLM. The ingested documents won't
+  be taken into account, only the previous messages.
+  * Makes use of `/chat/completions` API with `use_context=false`.
+* Context Chunks: returns the JSON representation of the 2 most related text
+  chunks, together with their metadata, source document and previous and next 
+  chunks. 
+  * Makes use of `/chunks` API with no `context_filter`, `limit=2` and 
+    `prev_next_chunks=1`. 
+
+### Document Ingestion
+
+Ingest documents by using the `Upload a File` button. You can check the progress of 
+the ingestion in the console logs of the server. 
+
+The list of ingested files is shown below the button.
+
+If you want to delete the ingested documents, refer to *Reset Local documents 
+database* section in the documentation.
+
+### Chat
+
+Normal chat interface, self-explanatory ;) 
+
+You can check the actual prompt being passed to the LLM by looking at the logs of 
+the server. We'll add better observability in future releases.
+
+## Deployment options
+
+🚧 We are working on Dockerized deployment guidelines 🚧
+
+## Observability
+
+Basic logs are enabled using LlamaIndex 
+basic logging (for example ingestion progress or LLM prompts and answers). 
+
+🚧 We are working on improved Observability. 🚧 
+
+## Ingesting & Managing Documents
+
+🚧 Document Update and Delete are still WIP. 🚧
+
+The ingestion of documents can be done in different ways:
+* Using the `/ingest` API
+* Using the Gradio UI
+* Using the Bulk Local Ingestion functionality (check next section)
+
+### Bulk Local Ingestion
+
+When you are running PrivateGPT in a fully local setup, you can ingest a complete folder for convenience (containing
+pdf, text files, etc.)
+and optionally watch changes on it with the command:
+
+```bash
+make ingest /path/to/folder -- --watch
+```
+
+After ingestion is complete, you should be able to chat with your documents
+by navigating to http://localhost:8001 and using the option `Query documents`,
+or using the completions / chat API.
+
+### Reset Local documents database
+
+When running in a local setup, you can remove all ingested documents by simply 
+deleting all contents of `local_data` folder (except .gitignore). 
+
+## API
+
+As explained in the introduction, the API contains high level APIs (ingestion and chat/completions) and low level APIs
+(embeddings and chunk retrieval). In this section the different specific API calls are explained.
--- a/docs/index.html
+++ b/docs/index.html
@@ -0,0 +1,22 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+    <meta charset="UTF-8">
+    <title>PrivateGPT Docs</title>
+    <!-- needed for adaptive design -->
+    <meta name="viewport" content="width=device-width, initial-scale=1">
+    <link href="https://fonts.googleapis.com/css?family=Montserrat:300,400,700|Roboto:300,400,700" rel="stylesheet">
+    <link rel="shortcut icon" href="https://fastapi.tiangolo.com/img/favicon.png">
+    <!-- ReDoc doesn't change outer page styles -->
+    <style>
+      body {
+        margin: 0;
+        padding: 0;
+      }
+    </style>
+</head>
+<body>
+    <noscript> ReDoc requires Javascript to function. Please enable it to browse the documentation. </noscript>
+    <redoc spec-url="/openapi.json"></redoc>
+    <script src="https://cdn.jsdelivr.net/npm/redoc@next/bundles/redoc.standalone.js"></script>
+</body>
--- a/docs/logo.png
+++ b/docs/logo.png
--- a/docs/openapi.json
+++ b/docs/openapi.json