Merge branch 'main' into update-ui-include-model-info-#1647

Commit b81bfce770, authored by Ingrid Stevens on 2024-03-16 14:51:46 +01:00, committed by GitHub
54 changed files with 2237 additions and 1451 deletions

View File

@ -25,6 +25,6 @@ runs:
python-version: ${{ inputs.python_version }}
cache: "poetry"
- name: Install Dependencies
run: poetry install --with ui --no-root
run: poetry install --extras "ui vector-stores-qdrant" --no-root
shell: bash

View File

@ -1,5 +1,13 @@
# Changelog
## [0.4.0](https://github.com/imartinez/privateGPT/compare/v0.3.0...v0.4.0) (2024-03-06)
### Features
* Upgrade to LlamaIndex to 0.10 ([#1663](https://github.com/imartinez/privateGPT/issues/1663)) ([45f0571](https://github.com/imartinez/privateGPT/commit/45f05711eb71ffccdedb26f37e680ced55795d44))
* **Vector:** support pgvector ([#1624](https://github.com/imartinez/privateGPT/issues/1624)) ([cd40e39](https://github.com/imartinez/privateGPT/commit/cd40e3982b780b548b9eea6438c759f1c22743a8))
## [0.3.0](https://github.com/imartinez/privateGPT/compare/v0.2.0...v0.3.0) (2024-02-16)

View File

@ -14,7 +14,7 @@ FROM base as dependencies
WORKDIR /home/worker/app
COPY pyproject.toml poetry.lock ./
RUN poetry install --with ui
RUN poetry install --extras "ui vector-stores-qdrant"
FROM base as app

View File

@ -24,8 +24,7 @@ FROM base as dependencies
WORKDIR /home/worker/app
COPY pyproject.toml poetry.lock ./
RUN poetry install --with local
RUN poetry install --with ui
RUN poetry install --extras "ui embeddings-huggingface llms-llama-cpp vector-stores-qdrant"
FROM base as app

View File

@ -30,15 +30,15 @@ navigation:
layout:
- section: Welcome
contents:
- page: Welcome
- page: Introduction
path: ./docs/pages/overview/welcome.mdx
- page: Quickstart
path: ./docs/pages/overview/quickstart.mdx
# How to install privateGPT, with FAQ and troubleshooting
- tab: installation
layout:
- section: Getting started
contents:
- page: Main Concepts
path: ./docs/pages/installation/concepts.mdx
- page: Installation
path: ./docs/pages/installation/installation.mdx
# Manual of privateGPT: how to use it and configure it
@ -58,6 +58,8 @@ navigation:
contents:
- page: Vector Stores
path: ./docs/pages/manual/vectordb.mdx
- page: Node Stores
path: ./docs/pages/manual/nodestore.mdx
- section: Advanced Setup
contents:
- page: LLM Backends

View File

@ -0,0 +1,60 @@
PrivateGPT is a service that wraps a set of AI RAG primitives in a comprehensive set of APIs, providing a private, secure, customizable and easy-to-use GenAI development framework.
It uses FastAPI and LlamaIndex as its core frameworks; those can be customized by changing the codebase itself.
It supports a variety of LLM providers, embeddings providers, and vector stores, both local and remote, which can be swapped without changing the codebase.
# Different Setups support
## Setup configurations available
You get to decide the setup for these 3 main components:
- LLM: the large language model provider used for inference. It can be local or remote (including OpenAI).
- Embeddings: the embeddings provider used to encode the input, the documents and the users' queries. As with the LLM, it can be local or remote (including OpenAI).
- Vector store: the store used to index and retrieve the documents.
There is an extra component that can be enabled or disabled: the UI. It is a Gradio UI that allows you to interact with the API in a more user-friendly way.
### Setups and Dependencies
Your setup will be the combination of the different options available. You'll find recommended setups in the [installation](/installation) section.
PrivateGPT uses poetry to manage its dependencies. You can install the dependencies for the different setups by running `poetry install --extras "<extra1> <extra2>..."`.
Extras are the different options available for each component. For example, to install the dependencies for a local setup with the UI, Qdrant as the vector database, Ollama as the LLM and HuggingFace as the local embeddings provider, you would run
`poetry install --extras "ui vector-stores-qdrant llms-ollama embeddings-huggingface"`.
Refer to the [installation](/installation) section for more details.
### Setups and Configuration
PrivateGPT uses yaml to define its configuration in files named `settings-<profile>.yaml`.
Different configuration files can be created in the root directory of the project.
PrivateGPT will load the configuration at startup from the profile specified in the `PGPT_PROFILES` environment variable.
For example, running:
```bash
PGPT_PROFILES=ollama make run
```
will load the configuration from `settings.yaml` and `settings-ollama.yaml`.
- `settings.yaml` is always loaded and contains the default configuration.
- `settings-ollama.yaml` is loaded if the `ollama` profile is specified in the `PGPT_PROFILES` environment variable. It can override configuration from the default `settings.yaml`, as sketched below.
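For illustration, a minimal profile override might look like the following sketch. The keys are the ones referenced elsewhere in this documentation; the `api_base` value is Ollama's usual local default and is an assumption here, not necessarily what the shipped file contains.
```yaml
# Illustrative sketch of a settings-<profile>.yaml override, not the shipped file
llm:
  mode: ollama
embedding:
  mode: ollama
ollama:
  llm_model: mistral
  embedding_model: nomic-embed-text
  api_base: http://localhost:11434  # assumed default local Ollama endpoint
```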
## About Fully Local Setups
In order to run PrivateGPT in a fully local setup, you will need to run the LLM, Embeddings and Vector Store locally.
### Vector stores
The vector stores supported (Qdrant, ChromaDB and Postgres) run locally by default.
### Embeddings
For local Embeddings there are two options:
* (Recommended) You can use the 'ollama' option in PrivateGPT, which will connect to your local Ollama instance. Ollama greatly simplifies the installation of local models.
* You can use the 'embeddings-huggingface' option in PrivateGPT, which will use HuggingFace.
In order for the HuggingFace embeddings to work (the second option), you need to download the embedding model to the `models` folder. You can do so by running the `setup` script:
```bash
poetry run python scripts/setup
```
### LLM
For a local LLM there are two options:
* (Recommended) You can use the 'ollama' option in PrivateGPT, which will connect to your local Ollama instance. Ollama greatly simplifies the installation of local LLMs.
* You can use the 'llms-llama-cpp' option in PrivateGPT, which will use LlamaCPP. It works great on Mac with Metal most of the time (it leverages the Metal GPU), but it can be tricky on some Linux and Windows setups, depending on the GPU. You'll find guides and troubleshooting in the installation document.
In order for the LlamaCPP-powered LLM to work (the second option), you need to download the LLM model to the `models` folder. You can do so by running the `setup` script:
```bash
poetry run python scripts/setup
```

View File

@ -1,8 +1,8 @@
## Installation and Settings
It is important that you review the Main Concepts before you start the installation process.
### Base requirements to run PrivateGPT
## Base requirements to run PrivateGPT
* Git clone PrivateGPT repository, and navigate to it:
* Clone the PrivateGPT repository and navigate to it:
```bash
git clone https://github.com/imartinez/privateGPT
@ -21,93 +21,205 @@ pyenv local 3.11
* Install [Poetry](https://python-poetry.org/docs/#installing-with-the-official-installer) for dependency management:
* Have a valid C++ compiler like gcc. See [Troubleshooting: C++ Compiler](#troubleshooting-c-compiler) for more details.
* Install `make` for scripts:
* Install `make` to be able to run the different scripts:
* osx (using Homebrew): `brew install make`
* windows (using Chocolatey): `choco install make`
### Install dependencies
## Install and run your desired setup
Install the dependencies:
PrivateGPT allows you to customize the setup, from fully local to cloud-based, by deciding which modules to use.
Here are the different options available:
- LLM: "llama-cpp", "ollama", "sagemaker", "openai", "openailike", "azopenai"
- Embeddings: "huggingface", "openai", "sagemaker", "azopenai"
- Vector stores: "qdrant", "chroma", "postgres"
- UI: whether to enable the UI (Gradio) or just go with the API
In order to only install the required dependencies, PrivateGPT offers different `extras` that can be combined during the installation process:
```bash
poetry install --with ui
poetry install --extras "<extra1> <extra2>..."
```
Verify everything is working by running `make run` (or `poetry run python -m private_gpt`) and navigate to
http://localhost:8001. You should see a [Gradio UI](https://gradio.app/) **configured with a mock LLM** that will
echo back the input. Below we'll see how to configure a real LLM.
Where `<extra>` can be any of the following:
### Settings
- ui: adds support for UI using Gradio
- llms-ollama: adds support for Ollama LLM, the easiest way to get a local LLM running, requires Ollama running locally
- llms-llama-cpp: adds support for local LLM using LlamaCPP - expect a messy installation process on some platforms
- llms-sagemaker: adds support for Amazon Sagemaker LLM, requires Sagemaker inference endpoints
- llms-openai: adds support for OpenAI LLM, requires OpenAI API key
- llms-openai-like: adds support for 3rd party LLM providers that are compatible with OpenAI's API
- llms-azopenai: adds support for Azure OpenAI LLM, requires Azure OpenAI inference endpoints
- embeddings-ollama: adds support for Ollama Embeddings, requires Ollama running locally
- embeddings-huggingface: adds support for local Embeddings using HuggingFace
- embeddings-sagemaker: adds support for Amazon Sagemaker Embeddings, requires Sagemaker inference endpoints
- embeddings-openai: adds support for OpenAI Embeddings, requires OpenAI API key
- embeddings-azopenai: adds support for Azure OpenAI Embeddings, requires Azure OpenAI inference endpoints
- vector-stores-qdrant: adds support for Qdrant vector store
- vector-stores-chroma: adds support for Chroma DB vector store
- vector-stores-postgres: adds support for Postgres vector store
<Callout intent="info">
The default settings of PrivateGPT should work out-of-the-box for a 100% local setup. **However**, as is, it runs exclusively on your CPU.
Skip this section if you just want to test PrivateGPT locally, and come back later to learn about more configuration options (and get better performance).
</Callout>
## Recommended Setups
<br />
These are just some examples of recommended setups. You can mix and match the different options to fit your needs.
You'll find more information in the Manual section of the documentation.
### Local LLM requirements
> **Important for Windows**: In the examples below on how to run PrivateGPT with `make run`, the `PGPT_PROFILES` env var is set inline following Unix command-line syntax (works on macOS and Linux).
If you are using Windows, you'll need to set the env var in a different way, for example:
Install extra dependencies for local execution:
```powershell
# Powershell
$env:PGPT_PROFILES="ollama"
make run
```
or
```cmd
# CMD
set PGPT_PROFILES=ollama
make run
```
### Local, Ollama-powered setup - RECOMMENDED
**The easiest way to run PrivateGPT fully locally** is to depend on Ollama for the LLM. Ollama provides local LLMs and Embeddings that are super easy to install and use, abstracting away the complexity of GPU support. It's the recommended setup for local development.
Go to [ollama.ai](https://ollama.ai/) and follow the instructions to install Ollama on your machine.
After the installation, make sure the Ollama desktop app is closed.
Install the models to be used. The default settings-ollama.yaml is configured to use the `mistral 7b` LLM (~4GB) and `nomic-embed-text` Embeddings (~275MB). Therefore:
```bash
poetry install --with local
ollama pull mistral
ollama pull nomic-embed-text
```
For PrivateGPT to run fully locally GPU acceleration is required
(CPU execution is possible, but very slow), however,
typical MacBook laptops or Windows desktops with mid-range GPUs lack the VRAM to run
even the smallest LLMs. For that reason
**local execution is only supported for models compatible with [llama.cpp](https://github.com/ggerganov/llama.cpp)**
Now, start Ollama service (it will start a local inference server, serving both the LLM and the Embeddings):
```bash
ollama serve
```
These two models are known to work well:
Once done, in a different terminal, you can install PrivateGPT with the following command:
```bash
poetry install --extras "ui llms-ollama embeddings-ollama vector-stores-qdrant"
```
* https://huggingface.co/TheBloke/Llama-2-7B-chat-GGUF
* https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF (recommended)
Once installed, you can run PrivateGPT. Make sure you have a working Ollama running locally before running the following command.
To ease the installation process, use the `setup` script that will download both
the embedding and the LLM model and place them in the correct location (under `models` folder):
```bash
PGPT_PROFILES=ollama make run
```
PrivateGPT will use the existing `settings-ollama.yaml` settings file, which is already configured to use the Ollama LLM and Embeddings, and Qdrant. Review it and adapt it to your needs (different models, a different Ollama port, etc.), as sketched below.
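For example, to point PrivateGPT at a different model or a non-default Ollama port, you might override just the `ollama` section of the profile. This is a sketch: the key names match the settings read by the Ollama components, while the values are placeholders you should adapt.
```yaml
ollama:
  llm_model: llama2                 # placeholder; any model you have pulled with `ollama pull`
  embedding_model: nomic-embed-text
  api_base: http://localhost:11435  # assumed example of a non-default Ollama port
```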
The UI will be available at http://localhost:8001
### Private, Sagemaker-powered setup
If you need more performance, you can run a version of PrivateGPT that relies on powerful AWS Sagemaker machines to serve the LLM and Embeddings.
You need to have access to Sagemaker inference endpoints for the LLM and/or the embeddings, and have AWS credentials properly configured.
Edit the `settings-sagemaker.yaml` file to include the correct Sagemaker endpoints.
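The relevant part of that file is the `sagemaker` block with the two endpoint names. This is a sketch with placeholder values, assuming the key names used by the Sagemaker LLM and embedding components:
```yaml
sagemaker:
  llm_endpoint_name: <your_llm_endpoint_name>
  embedding_endpoint_name: <your_embedding_endpoint_name>
```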
Then, install PrivateGPT with the following command:
```bash
poetry install --extras "ui llms-sagemaker embeddings-sagemaker vector-stores-qdrant"
```
Once installed, you can run PrivateGPT. Make sure your Sagemaker endpoints are reachable before running the following command.
```bash
PGPT_PROFILES=sagemaker make run
```
PrivateGPT will use the existing `settings-sagemaker.yaml` settings file, which is already configured to use the Sagemaker LLM and Embeddings endpoints, and Qdrant.
The UI will be available at http://localhost:8001
### Non-Private, OpenAI-powered test setup
If you want to test PrivateGPT with OpenAI's LLM and Embeddings (taking into account that your data is going to OpenAI!), you can use this setup.
You need an OpenAI API key to run this setup.
Edit the `settings-openai.yaml` file to include the correct API key. Never commit it! It's a secret! As an alternative to editing `settings-openai.yaml`, you can just set the env var OPENAI_API_KEY.
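A sketch of the relevant section follows. The key names match the OpenAI settings read by the LLM and embedding components; the model value is just a placeholder.
```yaml
openai:
  api_key: <your_openai_api_key>  # or leave it out and export OPENAI_API_KEY instead
  model: gpt-3.5-turbo            # placeholder; any OpenAI chat model name
```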
Then, install PrivateGPT with the following command:
```bash
poetry install --extras "ui llms-openai embeddings-openai vector-stores-qdrant"
```
Once installed, you can run PrivateGPT.
```bash
PGPT_PROFILES=openai make run
```
PrivateGPT will use the existing `settings-openai.yaml` settings file, which is already configured to use the OpenAI LLM and Embeddings endpoints, and Qdrant.
The UI will be available at http://localhost:8001
### Non-Private, Azure OpenAI-powered test setup
If you want to test PrivateGPT with Azure OpenAI's LLM and Embeddings (taking into account that your data is going to Azure OpenAI!), you can use this setup.
You need to have access to Azure OpenAI inference endpoints for the LLM and / or the embeddings, and have Azure OpenAI credentials properly configured.
Edit the `settings-azopenai.yaml` file to include the correct Azure OpenAI endpoints.
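A sketch of the `azopenai` section with placeholder values; the key names are the same ones used in the Azure OpenAI example of the LLM manual page.
```yaml
azopenai:
  api_key: <your_azopenai_api_key>
  azure_endpoint: <your_azopenai_endpoint>
  api_version: <api_version>
  llm_deployment_name: <your_model_deployment_name>
  llm_model: gpt-35-turbo                  # placeholder model name
  embedding_deployment_name: <your_embedding_deployment_name>
  embedding_model: text-embedding-ada-002  # placeholder model name
```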
Then, install PrivateGPT with the following command:
```bash
poetry install --extras "ui llms-azopenai embeddings-azopenai vector-stores-qdrant"
```
Once installed, you can run PrivateGPT.
```bash
PGPT_PROFILES=azopenai make run
```
PrivateGPT will use the existing `settings-azopenai.yaml` settings file, which is already configured to use the Azure OpenAI LLM and Embeddings endpoints, and Qdrant.
The UI will be available at http://localhost:8001
### Local, Llama-CPP powered setup
If you want to run PrivateGPT fully locally without relying on Ollama, you can run the following command:
```bash
poetry install --extras "ui llms-llama-cpp embeddings-huggingface vector-stores-qdrant"
```
In order for local LLM and embeddings to work, you need to download the models to the `models` folder. You can do so by running the `setup` script:
```bash
poetry run python scripts/setup
```
If you are ok with CPU execution, you can skip the rest of this section.
Once installed, you can run PrivateGPT with the following command:
As stated before, llama.cpp is required and in
```bash
PGPT_PROFILES=local make run
```
PrivateGPT will load the existing `settings-local.yaml` file, which is already configured to use the LlamaCPP LLM, HuggingFace embeddings and Qdrant.
The UI will be available at http://localhost:8001
#### Llama-CPP support
For PrivateGPT to run fully locally without Ollama, Llama.cpp is required and in
particular [llama-cpp-python](https://github.com/abetlen/llama-cpp-python)
is used.
You'll need to have a valid C++ compiler like gcc installed. See [Troubleshooting: C++ Compiler](#troubleshooting-c-compiler) for more details.
> It's highly encouraged that you fully read llama-cpp and llama-cpp-python documentation relevant to your platform.
> Running into installation issues is very likely, and you'll need to troubleshoot them yourself.
#### Customizing low level parameters
Currently, not all the parameters of `llama.cpp` and `llama-cpp-python` are available in PrivateGPT's `settings.yaml` file.
In case you need to customize parameters such as the number of layers loaded into the GPU, you can change
them in `private_gpt/components/llm/llm_component.py`.
##### Available LLM config options
The `llm` section of the settings allows for the following configurations:
- `mode`: how to run your LLM
- `max_new_tokens`: this lets you configure the number of new tokens the LLM will generate and add to the context window (by default Llama.cpp uses `256`)
Example:
```yaml
llm:
mode: local
max_new_tokens: 256
```
If you are getting an out-of-memory error, you might also try a smaller model or stick to the
recommended models, instead of custom-tuning the parameters.
#### OSX GPU support
##### Llama-CPP OSX GPU support
You will need to build [llama.cpp](https://github.com/ggerganov/llama.cpp) with metal support.
@ -127,7 +239,7 @@ More information is available in the documentation of the libraries themselves:
* [llama-cpp-python's documentation](https://llama-cpp-python.readthedocs.io/en/latest/#installation-with-hardware-acceleration)
* [llama.cpp](https://github.com/ggerganov/llama.cpp#build)
#### Windows NVIDIA GPU support
##### Llama-CPP Windows NVIDIA GPU support
Windows GPU support is done through CUDA.
Follow the instructions on the original [llama.cpp](https://github.com/ggerganov/llama.cpp) repo to install the required
@ -160,7 +272,7 @@ Note that llama.cpp offloads matrix calculations to the GPU but the performance
still hit heavily due to latency between CPU and GPU communication. You might need to tweak
batch sizes and other parameters to get the best performance for your particular system.
#### Linux NVIDIA GPU support and Windows-WSL
##### Llama-CPP Linux NVIDIA GPU support and Windows-WSL
Linux GPU support is done through CUDA.
Follow the instructions on the original [llama.cpp](https://github.com/ggerganov/llama.cpp) repo to install the required
@ -188,7 +300,7 @@ llama_new_context_with_model: total VRAM used: 4857.93 MB (model: 4095.05 MB, co
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
```
### Known issues and Troubleshooting
##### Llama-CPP Known issues and Troubleshooting
Execution of LLMs locally still has a lot of sharp edges, especially when running on non-Linux platforms.
You might encounter several issues:
@ -205,7 +317,7 @@ If, during your installation, something does not go as planned, retry in *verbos
For example, when installing packages with `pip install`, you can add the option `-vvv` to show the details of the installation.
#### Troubleshooting: C++ Compiler
##### Llama-CPP Troubleshooting: C++ Compiler
If you encounter an error while building a wheel during the `pip install` process, you may need to install a C++
compiler on your computer.
@ -227,7 +339,7 @@ To install a C++ compiler on Windows 10/11, follow these steps:
Store and search for Xcode and install it. **Or** you can install the command line tools by running `xcode-select --install`.
2. If not, you can install clang or gcc with homebrew `brew install gcc`
#### Troubleshooting: Mac Running Intel
##### Llama-CPP Troubleshooting: Mac Running Intel
When running a Mac with Intel hardware (not M1), you may run into _clang: error: the clang compiler does not support '
-march=native'_ during pip install.

View File

@ -25,6 +25,30 @@ When the server is started it will print a log *Application startup complete*.
Navigate to http://localhost:8001 to use the Gradio UI or to http://localhost:8001/docs (API section) to try the API
using Swagger UI.
#### Customizing low level parameters
Currently, not all the parameters of `llama.cpp` and `llama-cpp-python` are available in PrivateGPT's `settings.yaml` file.
In case you need to customize parameters such as the number of layers loaded into the GPU, you can change
them in `private_gpt/components/llm/llm_component.py`.
##### Available LLM config options
The `llm` section of the settings allows for the following configurations:
- `mode`: how to run your LLM
- `max_new_tokens`: this lets you configure the number of new tokens the LLM will generate and add to the context window (by default Llama.cpp uses `256`)
Example:
```yaml
llm:
mode: local
max_new_tokens: 256
```
If you are getting an out-of-memory error, you might also try a smaller model or stick to the
recommended models, instead of custom-tuning the parameters.
### Using OpenAI
If you cannot run a local model (because you don't have a GPU, for example) or for testing purposes, you may
@ -74,6 +98,43 @@ to run an OpenAI compatible server. Then, you can run PrivateGPT using the `sett
`PGPT_PROFILES=vllm make run`
### Using Azure OpenAI
If you cannot run a local model (because you don't have a GPU, for example) or for testing purposes, you may
decide to run PrivateGPT using Azure OpenAI as the LLM and Embeddings model.
In order to do so, create a profile `settings-azopenai.yaml` with the following contents:
```yaml
llm:
mode: azopenai
embedding:
mode: azopenai
azopenai:
api_key: <your_azopenai_api_key> # You could skip this configuration and use the AZ_OPENAI_API_KEY env var instead
azure_endpoint: <your_azopenai_endpoint> # You could skip this configuration and use the AZ_OPENAI_ENDPOINT env var instead
api_version: <api_version> # The API version to use. Default is "2023-05-15"
embedding_deployment_name: <your_embedding_deployment_name> # You could skip this configuration and use the AZ_OPENAI_EMBEDDING_DEPLOYMENT_NAME env var instead
embedding_model: <openai_embeddings_to_use> # Optional model to use. Default is "text-embedding-ada-002"
llm_deployment_name: <your_model_deployment_name> # You could skip this configuration and use the AZ_OPENAI_LLM_DEPLOYMENT_NAME env var instead
llm_model: <openai_model_to_use> # Optional model to use. Default is "gpt-35-turbo"
```
And run PrivateGPT loading that profile you just created:
`PGPT_PROFILES=azopenai make run`
or
`PGPT_PROFILES=azopenai poetry run python -m private_gpt`
When the server is started it will print a log *Application startup complete*.
Navigate to http://localhost:8001 to use the Gradio UI or to http://localhost:8001/docs (API section) to try the API.
You'll notice the speed and quality of response is higher, given you are using Azure OpenAI's servers for the heavy
computations.
### Using AWS Sagemaker
For a fully private & performant setup, you can choose to have both your LLM and Embeddings model deployed using Sagemaker.

View File

@ -0,0 +1,66 @@
## NodeStores
PrivateGPT supports **Simple** and [Postgres](https://www.postgresql.org/) providers, with Simple being the default.
In order to select one or the other, set the `nodestore.database` property in the `settings.yaml` file to `simple` or `postgres`.
```yaml
nodestore:
database: simple
```
### Simple Document Store
The simple document store persists data using in-memory structures backed by disk storage.
Enabling the simple document store is an excellent choice for small projects or proofs of concept where you need to persist data while maintaining minimal setup complexity. To get started, set the `nodestore.database` property in your `settings.yaml` file as follows:
```yaml
nodestore:
database: simple
```
The beauty of the simple document store is its flexibility and ease of implementation. It provides a solid foundation for managing and retrieving data without the need for complex setup or configuration. The combination of in-memory processing and disk persistence ensures that you can efficiently handle small to medium-sized datasets while maintaining data consistency across runs.
### Postgres Document Store
To enable Postgres, set the `nodestore.database` property in the `settings.yaml` file to `postgres` and install the `storage-nodestore-postgres` extra. Note: vector embedding storage in Postgres is configured separately.
```bash
poetry install --extras storage-nodestore-postgres
```
The available configuration options are:
| Field | Description |
|---------------|-----------------------------------------------------------|
| **host** | The server hosting the Postgres database. Default is `localhost` |
| **port** | The port on which the Postgres database is accessible. Default is `5432` |
| **database** | The specific database to connect to. Default is `postgres` |
| **user** | The username for database access. Default is `postgres` |
| **password** | The password for database access. (Required) |
| **schema_name** | The database schema to use. Default is `private_gpt` |
For example:
```yaml
nodestore:
database: postgres
postgres:
host: localhost
port: 5432
database: postgres
user: postgres
password: <PASSWORD>
schema_name: private_gpt
```
Given the above configuration, two PostgreSQL tables will be created upon successful connection: one for storing metadata related to the index and another for the document data itself.
```
postgres=# \dt private_gpt.*
List of relations
Schema | Name | Type | Owner
-------------+-----------------+-------+--------------
private_gpt | data_docstore | table | postgres
private_gpt | data_indexstore | table | postgres
postgres=#
```

View File

@ -1,7 +1,7 @@
## Vectorstores
PrivateGPT supports [Qdrant](https://qdrant.tech/), [Chroma](https://www.trychroma.com/) and [PGVector](https://github.com/pgvector/pgvector) as vectorstore providers. Qdrant being the default.
In order to select one or the other, set the `vectorstore.database` property in the `settings.yaml` file to `qdrant`, `chroma` or `pgvector`.
In order to select one or the other, set the `vectorstore.database` property in the `settings.yaml` file to `qdrant`, `chroma` or `postgres`.
```yaml
vectorstore:
@ -50,14 +50,15 @@ poetry install --extras chroma
By default `chroma` will use a disk-based database stored in `local_data_path / "chroma_db"` (with `local_data_path` defined in `settings.yaml`)
### PGVector
To use the PGVector store, a [PostgreSQL](https://www.postgresql.org/) database with the PGVector extension is required.
To enable PGVector, set the `vectorstore.database` property in the `settings.yaml` file to `pgvector` and install the `pgvector` extra.
To enable PGVector, set the `vectorstore.database` property in the `settings.yaml` file to `postgres` and install the `vector-stores-postgres` extra.
```bash
poetry install --extras pgvector
poetry install --extras vector-stores-postgres
```
PGVector settings can be configured by setting values to the `pgvector` property in the `settings.yaml` file.
PGVector settings can be configured by setting values to the `postgres` property in the `settings.yaml` file.
The available configuration options are:
| Field | Description |
@ -67,19 +68,36 @@ The available configuration options are:
| **database** | The specific database to connect to. Default is `postgres` |
| **user** | The username for database access. Default is `postgres` |
| **password** | The password for database access. (Required) |
| **embed_dim** | The dimensionality of the embedding model (Required) |
| **schema_name** | The database schema to use. Default is `private_gpt` |
| **table_name** | The database table to use. Default is `embeddings` |
For example:
```yaml
pgvector:
vectorstore:
database: postgres
postgres:
host: localhost
port: 5432
database: postgres
user: postgres
password: <PASSWORD>
embed_dim: 384 # 384 is for BAAI/bge-small-en-v1.5
schema_name: private_gpt
table_name: embeddings
```
The following table will be created in the database
```
postgres=# \d private_gpt.data_embeddings
Table "private_gpt.data_embeddings"
Column | Type | Collation | Nullable | Default
-----------+-------------------+-----------+----------+---------------------------------------------------------
id | bigint | | not null | nextval('private_gpt.data_embeddings_id_seq'::regclass)
text | character varying | | not null |
metadata_ | json | | |
node_id | character varying | | |
embedding | vector(768) | | |
Indexes:
"data_embeddings_pkey" PRIMARY KEY, btree (id)
postgres=#
```
The dimension of the embedding column is set based on the `embedding.embed_dim` value. If the embedding model changes, this table may need to be dropped and recreated to avoid a dimension mismatch.
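For example, assuming an embedding model with 384 dimensions such as `BAAI/bge-small-en-v1.5` (the model referenced in the earlier PGVector example), the corresponding setting would be the following sketch:
```yaml
embedding:
  embed_dim: 384  # 384 is the dimensionality of BAAI/bge-small-en-v1.5
```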

View File

@ -1,21 +0,0 @@
## Local Installation steps
The steps in [Installation](/installation) section are better explained and cover more
setup scenarios (macOS, Windows, Linux).
But if you like one-liners, have python3.11 installed, and you are running a UNIX (macOS or Linux)
system, you can get up and running on CPU in a few lines:
```bash
git clone https://github.com/imartinez/privateGPT && cd privateGPT && \
python3.11 -m venv .venv && source .venv/bin/activate && \
pip install --upgrade pip poetry && poetry install --with ui,local && ./scripts/setup
# Launch the privateGPT API server **and** the gradio UI
poetry run python3.11 -m private_gpt
# In another terminal, create a new browser window on your private GPT!
open http://127.0.0.1:8001/
```
Is the above not working, or is it too slow? Then **you want to run it on GPU(s)**.
Please check the more detailed [installation guide](/installation).

View File

@ -1,20 +1,19 @@
## Introduction 👋
PrivateGPT provides an **API** containing all the building blocks required to
build **private, context-aware AI applications**.
The API follows and extends the OpenAI API standard, and supports both normal and streaming responses.
That means that, if you can use the OpenAI API in one of your tools, you can use your own PrivateGPT API instead,
with no code changes, **and for free** if you are running privateGPT in `local` mode.
Looking for the installation quickstart? [Quickstart installation guide for Linux and macOS](/overview/welcome/quickstart).
Do you want to install it on Windows? Or do you want to take full advantage of your hardware for better performance?
The installation guide will help you in the [Installation section](/installation).
with no code changes, **and for free** if you are running privateGPT in a `local` setup.
Get started by understanding the [Main Concepts and Installation](/installation) and then dive into the [API Reference](/api-reference).
## Frequently Visited Resources
<Cards>
<Card
title="Main Concepts"
icon="fa-solid fa-lines-leaning"
href="/installation"
/>
<Card
title="API Reference"
icon="fa-solid fa-code"
@ -32,6 +31,9 @@ The installation guide will help you in the [Installation section](/installation
/>
</Cards>
<br />
<Callout intent = "info">
A working **Gradio UI client** is provided to test the API, together with a set of useful tools such as a bulk
model download script, an ingestion script, a documents folder watch, etc.

View File

@ -1,4 +1,4 @@
{
"organization": "privategpt",
"version": "0.15.3"
"version": "0.17.2"
}

poetry.lock (generated): 2066 lines changed. File diff suppressed because it is too large.

View File

@ -1,4 +1,5 @@
"""private-gpt."""
import logging
import os
@ -21,3 +22,6 @@ os.environ["GRADIO_ANALYTICS_ENABLED"] = "False"
# Disable chromaDB telemetry
# It is already disabled, see PR#1144
# os.environ["ANONYMIZED_TELEMETRY"] = "False"
# adding tiktoken cache path within repo to be able to run in offline environment.
os.environ["TIKTOKEN_CACHE_DIR"] = "tiktoken_cache"

View File

@ -3,7 +3,7 @@ import json
from typing import Any
import boto3
from llama_index.embeddings.base import BaseEmbedding
from llama_index.core.base.embeddings.base import BaseEmbedding
from pydantic import Field, PrivateAttr

View File

@ -1,8 +1,7 @@
import logging
from injector import inject, singleton
from llama_index import MockEmbedding
from llama_index.embeddings.base import BaseEmbedding
from llama_index.core.embeddings import BaseEmbedding, MockEmbedding
from private_gpt.paths import models_cache_path
from private_gpt.settings.settings import Settings
@ -19,27 +18,78 @@ class EmbeddingComponent:
embedding_mode = settings.embedding.mode
logger.info("Initializing the embedding model in mode=%s", embedding_mode)
match embedding_mode:
case "local":
from llama_index.embeddings import HuggingFaceEmbedding
case "huggingface":
try:
from llama_index.embeddings.huggingface import ( # type: ignore
HuggingFaceEmbedding,
)
except ImportError as e:
raise ImportError(
"Local dependencies not found, install with `poetry install --extras embeddings-huggingface`"
) from e
self.embedding_model = HuggingFaceEmbedding(
model_name=settings.local.embedding_hf_model_name,
model_name=settings.huggingface.embedding_hf_model_name,
cache_folder=str(models_cache_path),
)
case "sagemaker":
from private_gpt.components.embedding.custom.sagemaker import (
SagemakerEmbedding,
)
try:
from private_gpt.components.embedding.custom.sagemaker import (
SagemakerEmbedding,
)
except ImportError as e:
raise ImportError(
"Sagemaker dependencies not found, install with `poetry install --extras embeddings-sagemaker`"
) from e
self.embedding_model = SagemakerEmbedding(
endpoint_name=settings.sagemaker.embedding_endpoint_name,
)
case "openai":
from llama_index import OpenAIEmbedding
try:
from llama_index.embeddings.openai import ( # type: ignore
OpenAIEmbedding,
)
except ImportError as e:
raise ImportError(
"OpenAI dependencies not found, install with `poetry install --extras embeddings-openai`"
) from e
openai_settings = settings.openai.api_key
self.embedding_model = OpenAIEmbedding(api_key=openai_settings)
case "ollama":
try:
from llama_index.embeddings.ollama import ( # type: ignore
OllamaEmbedding,
)
except ImportError as e:
raise ImportError(
"Local dependencies not found, install with `poetry install --extras embeddings-ollama`"
) from e
ollama_settings = settings.ollama
self.embedding_model = OllamaEmbedding(
model_name=ollama_settings.embedding_model,
base_url=ollama_settings.api_base,
)
case "azopenai":
try:
from llama_index.embeddings.azure_openai import ( # type: ignore
AzureOpenAIEmbedding,
)
except ImportError as e:
raise ImportError(
"Azure OpenAI dependencies not found, install with `poetry install --extras embeddings-azopenai`"
) from e
azopenai_settings = settings.azopenai
self.embedding_model = AzureOpenAIEmbedding(
model=azopenai_settings.embedding_model,
deployment_name=azopenai_settings.embedding_deployment_name,
api_key=azopenai_settings.api_key,
azure_endpoint=azopenai_settings.azure_endpoint,
api_version=azopenai_settings.api_version,
)
case "mock":
# Not a random number, is the dimensionality used by
# the default embedding model

View File

@ -8,16 +8,13 @@ import threading
from pathlib import Path
from typing import Any
from llama_index import (
Document,
ServiceContext,
StorageContext,
VectorStoreIndex,
load_index_from_storage,
)
from llama_index.data_structs import IndexDict
from llama_index.indices.base import BaseIndex
from llama_index.ingestion import run_transformations
from llama_index.core.data_structs import IndexDict
from llama_index.core.embeddings.utils import EmbedType
from llama_index.core.indices import VectorStoreIndex, load_index_from_storage
from llama_index.core.indices.base import BaseIndex
from llama_index.core.ingestion import run_transformations
from llama_index.core.schema import Document, TransformComponent
from llama_index.core.storage import StorageContext
from private_gpt.components.ingest.ingest_helper import IngestionHelper
from private_gpt.paths import local_data_path
@ -30,13 +27,15 @@ class BaseIngestComponent(abc.ABC):
def __init__(
self,
storage_context: StorageContext,
service_context: ServiceContext,
embed_model: EmbedType,
transformations: list[TransformComponent],
*args: Any,
**kwargs: Any,
) -> None:
logger.debug("Initializing base ingest component type=%s", type(self).__name__)
self.storage_context = storage_context
self.service_context = service_context
self.embed_model = embed_model
self.transformations = transformations
@abc.abstractmethod
def ingest(self, file_name: str, file_data: Path) -> list[Document]:
@ -55,11 +54,12 @@ class BaseIngestComponentWithIndex(BaseIngestComponent, abc.ABC):
def __init__(
self,
storage_context: StorageContext,
service_context: ServiceContext,
embed_model: EmbedType,
transformations: list[TransformComponent],
*args: Any,
**kwargs: Any,
) -> None:
super().__init__(storage_context, service_context, *args, **kwargs)
super().__init__(storage_context, embed_model, transformations, *args, **kwargs)
self.show_progress = True
self._index_thread_lock = (
@ -73,9 +73,10 @@ class BaseIngestComponentWithIndex(BaseIngestComponent, abc.ABC):
# Load the index with store_nodes_override=True to be able to delete them
index = load_index_from_storage(
storage_context=self.storage_context,
service_context=self.service_context,
store_nodes_override=True, # Force store nodes in index and document stores
show_progress=self.show_progress,
embed_model=self.embed_model,
transformations=self.transformations,
)
except ValueError:
# There are no index in the storage context, creating a new one
@ -83,9 +84,10 @@ class BaseIngestComponentWithIndex(BaseIngestComponent, abc.ABC):
index = VectorStoreIndex.from_documents(
[],
storage_context=self.storage_context,
service_context=self.service_context,
store_nodes_override=True, # Force store nodes in index and document stores
show_progress=self.show_progress,
embed_model=self.embed_model,
transformations=self.transformations,
)
index.storage_context.persist(persist_dir=local_data_path)
return index
@ -106,11 +108,12 @@ class SimpleIngestComponent(BaseIngestComponentWithIndex):
def __init__(
self,
storage_context: StorageContext,
service_context: ServiceContext,
embed_model: EmbedType,
transformations: list[TransformComponent],
*args: Any,
**kwargs: Any,
) -> None:
super().__init__(storage_context, service_context, *args, **kwargs)
super().__init__(storage_context, embed_model, transformations, *args, **kwargs)
def ingest(self, file_name: str, file_data: Path) -> list[Document]:
logger.info("Ingesting file_name=%s", file_name)
@ -151,16 +154,17 @@ class BatchIngestComponent(BaseIngestComponentWithIndex):
def __init__(
self,
storage_context: StorageContext,
service_context: ServiceContext,
embed_model: EmbedType,
transformations: list[TransformComponent],
count_workers: int,
*args: Any,
**kwargs: Any,
) -> None:
super().__init__(storage_context, service_context, *args, **kwargs)
super().__init__(storage_context, embed_model, transformations, *args, **kwargs)
# Make an efficient use of the CPU and GPU, the embedding
# must be in the transformations
assert (
len(self.service_context.transformations) >= 2
len(self.transformations) >= 2
), "Embeddings must be in the transformations"
assert count_workers > 0, "count_workers must be > 0"
self.count_workers = count_workers
@ -197,7 +201,7 @@ class BatchIngestComponent(BaseIngestComponentWithIndex):
logger.debug("Transforming count=%s documents into nodes", len(documents))
nodes = run_transformations(
documents, # type: ignore[arg-type]
self.service_context.transformations,
self.transformations,
show_progress=self.show_progress,
)
# Locking the index to avoid concurrent writes
@ -225,16 +229,17 @@ class ParallelizedIngestComponent(BaseIngestComponentWithIndex):
def __init__(
self,
storage_context: StorageContext,
service_context: ServiceContext,
embed_model: EmbedType,
transformations: list[TransformComponent],
count_workers: int,
*args: Any,
**kwargs: Any,
) -> None:
super().__init__(storage_context, service_context, *args, **kwargs)
super().__init__(storage_context, embed_model, transformations, *args, **kwargs)
# To make an efficient use of the CPU and GPU, the embeddings
# must be in the transformations (to be computed in batches)
assert (
len(self.service_context.transformations) >= 2
len(self.transformations) >= 2
), "Embeddings must be in the transformations"
assert count_workers > 0, "count_workers must be > 0"
self.count_workers = count_workers
@ -278,7 +283,7 @@ class ParallelizedIngestComponent(BaseIngestComponentWithIndex):
logger.debug("Transforming count=%s documents into nodes", len(documents))
nodes = run_transformations(
documents, # type: ignore[arg-type]
self.service_context.transformations,
self.transformations,
show_progress=self.show_progress,
)
# Locking the index to avoid concurrent writes
@ -311,18 +316,29 @@ class ParallelizedIngestComponent(BaseIngestComponentWithIndex):
def get_ingestion_component(
storage_context: StorageContext,
service_context: ServiceContext,
embed_model: EmbedType,
transformations: list[TransformComponent],
settings: Settings,
) -> BaseIngestComponent:
"""Get the ingestion component for the given configuration."""
ingest_mode = settings.embedding.ingest_mode
if ingest_mode == "batch":
return BatchIngestComponent(
storage_context, service_context, settings.embedding.count_workers
storage_context=storage_context,
embed_model=embed_model,
transformations=transformations,
count_workers=settings.embedding.count_workers,
)
elif ingest_mode == "parallel":
return ParallelizedIngestComponent(
storage_context, service_context, settings.embedding.count_workers
storage_context=storage_context,
embed_model=embed_model,
transformations=transformations,
count_workers=settings.embedding.count_workers,
)
else:
return SimpleIngestComponent(storage_context, service_context)
return SimpleIngestComponent(
storage_context=storage_context,
embed_model=embed_model,
transformations=transformations,
)

View File

@ -1,14 +1,58 @@
import logging
from pathlib import Path
from llama_index import Document
from llama_index.readers import JSONReader, StringIterableReader
from llama_index.readers.file.base import DEFAULT_FILE_READER_CLS
from llama_index.core.readers import StringIterableReader
from llama_index.core.readers.base import BaseReader
from llama_index.core.readers.json import JSONReader
from llama_index.core.schema import Document
logger = logging.getLogger(__name__)
# Inspired by the `llama_index.core.readers.file.base` module
def _try_loading_included_file_formats() -> dict[str, type[BaseReader]]:
try:
from llama_index.readers.file.docs import ( # type: ignore
DocxReader,
HWPReader,
PDFReader,
)
from llama_index.readers.file.epub import EpubReader # type: ignore
from llama_index.readers.file.image import ImageReader # type: ignore
from llama_index.readers.file.ipynb import IPYNBReader # type: ignore
from llama_index.readers.file.markdown import MarkdownReader # type: ignore
from llama_index.readers.file.mbox import MboxReader # type: ignore
from llama_index.readers.file.slides import PptxReader # type: ignore
from llama_index.readers.file.tabular import PandasCSVReader # type: ignore
from llama_index.readers.file.video_audio import ( # type: ignore
VideoAudioReader,
)
except ImportError as e:
raise ImportError("`llama-index-readers-file` package not found") from e
default_file_reader_cls: dict[str, type[BaseReader]] = {
".hwp": HWPReader,
".pdf": PDFReader,
".docx": DocxReader,
".pptx": PptxReader,
".ppt": PptxReader,
".pptm": PptxReader,
".jpg": ImageReader,
".png": ImageReader,
".jpeg": ImageReader,
".mp3": VideoAudioReader,
".mp4": VideoAudioReader,
".csv": PandasCSVReader,
".epub": EpubReader,
".md": MarkdownReader,
".mbox": MboxReader,
".ipynb": IPYNBReader,
}
return default_file_reader_cls
# Patching the default file reader to support other file types
FILE_READER_CLS = DEFAULT_FILE_READER_CLS.copy()
FILE_READER_CLS = _try_loading_included_file_formats()
FILE_READER_CLS.update(
{
".json": JSONReader,

View File

@ -7,26 +7,20 @@ import logging
from typing import TYPE_CHECKING, Any
import boto3 # type: ignore
from llama_index.bridge.pydantic import Field
from llama_index.llms import (
from llama_index.core.base.llms.generic_utils import (
completion_response_to_chat_response,
stream_completion_response_to_chat_response,
)
from llama_index.core.bridge.pydantic import Field
from llama_index.core.llms import (
CompletionResponse,
CustomLLM,
LLMMetadata,
)
from llama_index.llms.base import (
from llama_index.core.llms.callbacks import (
llm_chat_callback,
llm_completion_callback,
)
from llama_index.llms.generic_utils import (
completion_response_to_chat_response,
stream_completion_response_to_chat_response,
)
from llama_index.llms.llama_utils import (
completion_to_prompt as generic_completion_to_prompt,
)
from llama_index.llms.llama_utils import (
messages_to_prompt as generic_messages_to_prompt,
)
if TYPE_CHECKING:
from collections.abc import Sequence
@ -161,8 +155,8 @@ class SagemakerLLM(CustomLLM):
model_kwargs = model_kwargs or {}
model_kwargs.update({"n_ctx": context_window, "verbose": verbose})
messages_to_prompt = messages_to_prompt or generic_messages_to_prompt
completion_to_prompt = completion_to_prompt or generic_completion_to_prompt
messages_to_prompt = messages_to_prompt or {}
completion_to_prompt = completion_to_prompt or {}
generate_kwargs = generate_kwargs or {}
generate_kwargs.update(

View File

@ -1,9 +1,9 @@
import logging
from injector import inject, singleton
from llama_index import set_global_tokenizer
from llama_index.llms import MockLLM
from llama_index.llms.base import LLM
from llama_index.core.llms import LLM, MockLLM
from llama_index.core.settings import Settings as LlamaIndexSettings
from llama_index.core.utils import set_global_tokenizer
from transformers import AutoTokenizer # type: ignore
from private_gpt.components.llm.prompt_helper import get_prompt_style
@ -30,19 +30,32 @@ class LLMComponent:
logger.info("Initializing the LLM in mode=%s", llm_mode)
match settings.llm.mode:
case "local":
from llama_index.llms import LlamaCPP
prompt_style = get_prompt_style(settings.local.prompt_style)
case "llamacpp":
try:
from llama_index.llms.llama_cpp import LlamaCPP # type: ignore
except ImportError as e:
raise ImportError(
"Local dependencies not found, install with `poetry install --extras llms-llama-cpp`"
) from e
prompt_style = get_prompt_style(settings.llamacpp.prompt_style)
settings_kwargs = {
"tfs_z": settings.llamacpp.tfs_z, # ollama and llama-cpp
"top_k": settings.llamacpp.top_k, # ollama and llama-cpp
"top_p": settings.llamacpp.top_p, # ollama and llama-cpp
"repeat_penalty": settings.llamacpp.repeat_penalty, # ollama llama-cpp
"n_gpu_layers": -1,
"offload_kqv": True,
}
self.llm = LlamaCPP(
model_path=str(models_path / settings.local.llm_hf_model_file),
temperature=0.1,
model_path=str(models_path / settings.llamacpp.llm_hf_model_file),
temperature=settings.llm.temperature,
max_new_tokens=settings.llm.max_new_tokens,
context_window=settings.llm.context_window,
generate_kwargs={},
callback_manager=LlamaIndexSettings.callback_manager,
# All to GPU
model_kwargs={"n_gpu_layers": -1, "offload_kqv": True},
model_kwargs=settings_kwargs,
# transform inputs into Llama2 format
messages_to_prompt=prompt_style.messages_to_prompt,
completion_to_prompt=prompt_style.completion_to_prompt,
@ -50,7 +63,12 @@ class LLMComponent:
)
case "sagemaker":
from private_gpt.components.llm.custom.sagemaker import SagemakerLLM
try:
from private_gpt.components.llm.custom.sagemaker import SagemakerLLM
except ImportError as e:
raise ImportError(
"Sagemaker dependencies not found, install with `poetry install --extras llms-sagemaker`"
) from e
self.llm = SagemakerLLM(
endpoint_name=settings.sagemaker.llm_endpoint_name,
@ -58,7 +76,12 @@ class LLMComponent:
context_window=settings.llm.context_window,
)
case "openai":
from llama_index.llms import OpenAI
try:
from llama_index.llms.openai import OpenAI # type: ignore
except ImportError as e:
raise ImportError(
"OpenAI dependencies not found, install with `poetry install --extras llms-openai`"
) from e
openai_settings = settings.openai
self.llm = OpenAI(
@ -67,7 +90,12 @@ class LLMComponent:
model=openai_settings.model,
)
case "openailike":
from llama_index.llms import OpenAILike
try:
from llama_index.llms.openai_like import OpenAILike # type: ignore
except ImportError as e:
raise ImportError(
"OpenAILike dependencies not found, install with `poetry install --extras llms-openai-like`"
) from e
openai_settings = settings.openai
self.llm = OpenAILike(
@ -78,12 +106,49 @@ class LLMComponent:
max_tokens=None,
api_version="",
)
case "mock":
self.llm = MockLLM()
case "ollama":
from llama_index.llms import Ollama
try:
from llama_index.llms.ollama import Ollama # type: ignore
except ImportError as e:
raise ImportError(
"Ollama dependencies not found, install with `poetry install --extras llms-ollama`"
) from e
ollama_settings = settings.ollama
settings_kwargs = {
"tfs_z": ollama_settings.tfs_z, # ollama and llama-cpp
"num_predict": ollama_settings.num_predict, # ollama only
"top_k": ollama_settings.top_k, # ollama and llama-cpp
"top_p": ollama_settings.top_p, # ollama and llama-cpp
"repeat_last_n": ollama_settings.repeat_last_n, # ollama
"repeat_penalty": ollama_settings.repeat_penalty, # ollama llama-cpp
}
self.llm = Ollama(
model=ollama_settings.model, base_url=ollama_settings.api_base
model=ollama_settings.llm_model,
base_url=ollama_settings.api_base,
temperature=settings.llm.temperature,
context_window=settings.llm.context_window,
additional_kwargs=settings_kwargs,
)
case "azopenai":
try:
from llama_index.llms.azure_openai import ( # type: ignore
AzureOpenAI,
)
except ImportError as e:
raise ImportError(
"Azure OpenAI dependencies not found, install with `poetry install --extras llms-azopenai`"
) from e
azopenai_settings = settings.azopenai
self.llm = AzureOpenAI(
model=azopenai_settings.llm_model,
deployment_name=azopenai_settings.llm_deployment_name,
api_key=azopenai_settings.api_key,
azure_endpoint=azopenai_settings.azure_endpoint,
api_version=azopenai_settings.api_version,
)
case "mock":
self.llm = MockLLM()

View File

@ -3,11 +3,7 @@ import logging
from collections.abc import Sequence
from typing import Any, Literal
from llama_index.llms import ChatMessage, MessageRole
from llama_index.llms.llama_utils import (
completion_to_prompt,
messages_to_prompt,
)
from llama_index.core.llms import ChatMessage, MessageRole
logger = logging.getLogger(__name__)
@ -73,7 +69,9 @@ class DefaultPromptStyle(AbstractPromptStyle):
class Llama2PromptStyle(AbstractPromptStyle):
"""Simple prompt style that just uses the default llama_utils functions.
"""Simple prompt style that uses llama 2 prompt style.
Inspired by llama_index/legacy/llms/llama_utils.py
It transforms the sequence of messages into a prompt that should look like:
```text
@ -83,11 +81,61 @@ class Llama2PromptStyle(AbstractPromptStyle):
```
"""
BOS, EOS = "<s>", "</s>"
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
DEFAULT_SYSTEM_PROMPT = """\
You are a helpful, respectful and honest assistant. \
Always answer as helpfully as possible and follow ALL given instructions. \
Do not speculate or make up information. \
Do not reference any given instructions or context. \
"""
def _messages_to_prompt(self, messages: Sequence[ChatMessage]) -> str:
return messages_to_prompt(messages)
string_messages: list[str] = []
if messages[0].role == MessageRole.SYSTEM:
# pull out the system message (if it exists in messages)
system_message_str = messages[0].content or ""
messages = messages[1:]
else:
system_message_str = self.DEFAULT_SYSTEM_PROMPT
system_message_str = f"{self.B_SYS} {system_message_str.strip()} {self.E_SYS}"
for i in range(0, len(messages), 2):
# first message should always be a user
user_message = messages[i]
assert user_message.role == MessageRole.USER
if i == 0:
# make sure system prompt is included at the start
str_message = f"{self.BOS} {self.B_INST} {system_message_str} "
else:
# end previous user-assistant interaction
string_messages[-1] += f" {self.EOS}"
# no need to include system prompt
str_message = f"{self.BOS} {self.B_INST} "
# include user message content
str_message += f"{user_message.content} {self.E_INST}"
if len(messages) > (i + 1):
# if assistant message exists, add to str_message
assistant_message = messages[i + 1]
assert assistant_message.role == MessageRole.ASSISTANT
str_message += f" {assistant_message.content}"
string_messages.append(str_message)
return "".join(string_messages)
def _completion_to_prompt(self, completion: str) -> str:
return completion_to_prompt(completion)
system_prompt_str = self.DEFAULT_SYSTEM_PROMPT
return (
f"{self.BOS} {self.B_INST} {self.B_SYS} {system_prompt_str.strip()} {self.E_SYS} "
f"{completion.strip()} {self.E_INST}"
)
class TagPromptStyle(AbstractPromptStyle):

View File

@ -1,11 +1,12 @@
import logging
from injector import inject, singleton
from llama_index.storage.docstore import BaseDocumentStore, SimpleDocumentStore
from llama_index.storage.index_store import SimpleIndexStore
from llama_index.storage.index_store.types import BaseIndexStore
from llama_index.core.storage.docstore import BaseDocumentStore, SimpleDocumentStore
from llama_index.core.storage.index_store import SimpleIndexStore
from llama_index.core.storage.index_store.types import BaseIndexStore
from private_gpt.paths import local_data_path
from private_gpt.settings.settings import Settings
logger = logging.getLogger(__name__)
@ -16,19 +17,51 @@ class NodeStoreComponent:
doc_store: BaseDocumentStore
@inject
def __init__(self) -> None:
try:
self.index_store = SimpleIndexStore.from_persist_dir(
persist_dir=str(local_data_path)
)
except FileNotFoundError:
logger.debug("Local index store not found, creating a new one")
self.index_store = SimpleIndexStore()
def __init__(self, settings: Settings) -> None:
match settings.nodestore.database:
case "simple":
try:
self.index_store = SimpleIndexStore.from_persist_dir(
persist_dir=str(local_data_path)
)
except FileNotFoundError:
logger.debug("Local index store not found, creating a new one")
self.index_store = SimpleIndexStore()
try:
self.doc_store = SimpleDocumentStore.from_persist_dir(
persist_dir=str(local_data_path)
)
except FileNotFoundError:
logger.debug("Local document store not found, creating a new one")
self.doc_store = SimpleDocumentStore()
try:
self.doc_store = SimpleDocumentStore.from_persist_dir(
persist_dir=str(local_data_path)
)
except FileNotFoundError:
logger.debug("Local document store not found, creating a new one")
self.doc_store = SimpleDocumentStore()
case "postgres":
try:
from llama_index.core.storage.docstore.postgres_docstore import (
PostgresDocumentStore,
)
from llama_index.core.storage.index_store.postgres_index_store import (
PostgresIndexStore,
)
except ImportError:
raise ImportError(
"Postgres dependencies not found, install with `poetry install --extras storage-nodestore-postgres`"
) from None
if settings.postgres is None:
raise ValueError("Postgres index/doc store settings not found.")
self.index_store = PostgresIndexStore.from_params(
**settings.postgres.model_dump(exclude_none=True)
)
self.doc_store = PostgresDocumentStore.from_params(
**settings.postgres.model_dump(exclude_none=True)
)
case _:
# Should be unreachable
# The settings validator should have caught this
raise ValueError(
f"Database {settings.nodestore.database} not supported"
)

View File

@ -1,12 +1,28 @@
from collections.abc import Generator
from typing import Any
from llama_index.schema import BaseNode, MetadataMode
from llama_index.vector_stores import ChromaVectorStore
from llama_index.vector_stores.chroma import chunk_list
from llama_index.vector_stores.utils import node_to_metadata_dict
from llama_index.core.schema import BaseNode, MetadataMode
from llama_index.core.vector_stores.utils import node_to_metadata_dict
from llama_index.vector_stores.chroma import ChromaVectorStore # type: ignore
class BatchedChromaVectorStore(ChromaVectorStore):
def chunk_list(
lst: list[BaseNode], max_chunk_size: int
) -> Generator[list[BaseNode], None, None]:
"""Yield successive max_chunk_size-sized chunks from lst.
Args:
lst (List[BaseNode]): list of nodes with embeddings
max_chunk_size (int): max chunk size
Yields:
Generator[List[BaseNode], None, None]: list of nodes with embeddings
"""
for i in range(0, len(lst), max_chunk_size):
yield lst[i : i + max_chunk_size]
class BatchedChromaVectorStore(ChromaVectorStore): # type: ignore
"""Chroma vector store, batching additions to avoid reaching the max batch limit.
In this vector store, embeddings are stored within a ChromaDB collection.
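For context, a minimal usage sketch (not part of the diff) of the relocated chunk_list helper; plain integers stand in for BaseNode objects, and the import path follows the module shown in this diff:
# chunk_list yields slices of at most max_chunk_size, which BatchedChromaVectorStore
# uses to add nodes to the ChromaDB collection without exceeding its max batch limit.
from private_gpt.components.vector_store.batched_chroma import chunk_list
nodes = list(range(10))  # stand-in for BaseNode objects with embeddings
for batch in chunk_list(nodes, max_chunk_size=4):
    print(len(batch))  # 4, 4, 2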

View File

@ -2,11 +2,14 @@ import logging
import typing
from injector import inject, singleton
from llama_index import VectorStoreIndex
from llama_index.indices.vector_store import VectorIndexRetriever
from llama_index.vector_stores.types import VectorStore
from llama_index.core.indices.vector_store import VectorIndexRetriever, VectorStoreIndex
from llama_index.core.vector_stores.types import (
FilterCondition,
MetadataFilter,
MetadataFilters,
VectorStore,
)
from private_gpt.components.vector_store.batched_chroma import BatchedChromaVectorStore
from private_gpt.open_ai.extensions.context_filter import ContextFilter
from private_gpt.paths import local_data_path
from private_gpt.settings.settings import Settings
@ -14,44 +17,48 @@ from private_gpt.settings.settings import Settings
logger = logging.getLogger(__name__)
@typing.no_type_check
def _chromadb_doc_id_metadata_filter(
def _doc_id_metadata_filter(
context_filter: ContextFilter | None,
) -> dict | None:
if context_filter is None or context_filter.docs_ids is None:
return {} # No filter
elif len(context_filter.docs_ids) < 1:
return {"doc_id": "-"} # Effectively filtering out all docs
else:
doc_filter_items = []
if len(context_filter.docs_ids) > 1:
doc_filter = {"$or": doc_filter_items}
for doc_id in context_filter.docs_ids:
doc_filter_items.append({"doc_id": doc_id})
else:
doc_filter = {"doc_id": context_filter.docs_ids[0]}
return doc_filter
) -> MetadataFilters:
filters = MetadataFilters(filters=[], condition=FilterCondition.OR)
if context_filter is not None and context_filter.docs_ids is not None:
for doc_id in context_filter.docs_ids:
filters.filters.append(MetadataFilter(key="doc_id", value=doc_id))
return filters
@singleton
class VectorStoreComponent:
settings: Settings
vector_store: VectorStore
@inject
def __init__(self, settings: Settings) -> None:
self.settings = settings
match settings.vectorstore.database:
case "pgvector":
from llama_index.vector_stores import PGVectorStore
case "postgres":
try:
from llama_index.vector_stores.postgres import ( # type: ignore
PGVectorStore,
)
except ImportError as e:
raise ImportError(
"Postgres dependencies not found, install with `poetry install --extras vector-stores-postgres`"
) from e
if settings.pgvector is None:
if settings.postgres is None:
raise ValueError(
"PGVectorStore settings not found. Please provide settings."
"Postgres settings not found. Please provide settings."
)
self.vector_store = typing.cast(
VectorStore,
PGVectorStore.from_params(
**settings.pgvector.model_dump(exclude_none=True)
**settings.postgres.model_dump(exclude_none=True),
table_name="embeddings",
embed_dim=settings.embedding.embed_dim,
),
)
@ -61,11 +68,13 @@ class VectorStoreComponent:
from chromadb.config import ( # type: ignore
Settings as ChromaSettings,
)
from private_gpt.components.vector_store.batched_chroma import (
BatchedChromaVectorStore,
)
except ImportError as e:
raise ImportError(
"'chromadb' is not installed."
"To use PrivateGPT with Chroma, install the 'chroma' extra."
"`poetry install --extras chroma`"
"ChromaDB dependencies not found, install with `poetry install --extras vector-stores-chroma`"
) from e
chroma_settings = ChromaSettings(anonymized_telemetry=False)
@ -85,8 +94,15 @@ class VectorStoreComponent:
)
case "qdrant":
from llama_index.vector_stores.qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
try:
from llama_index.vector_stores.qdrant import ( # type: ignore
QdrantVectorStore,
)
from qdrant_client import QdrantClient # type: ignore
except ImportError as e:
raise ImportError(
"Qdrant dependencies not found, install with `poetry install --extras vector-stores-qdrant`"
) from e
if settings.qdrant is None:
logger.info(
@ -112,20 +128,22 @@ class VectorStoreComponent:
f"Vectorstore database {settings.vectorstore.database} not supported"
)
@staticmethod
def get_retriever(
self,
index: VectorStoreIndex,
context_filter: ContextFilter | None = None,
similarity_top_k: int = 2,
) -> VectorIndexRetriever:
# This way we support qdrant (using doc_ids) and chroma (using where clause)
# This way we support qdrant (using doc_ids) and the rest (using filters)
return VectorIndexRetriever(
index=index,
similarity_top_k=similarity_top_k,
doc_ids=context_filter.docs_ids if context_filter else None,
vector_store_kwargs={
"where": _chromadb_doc_id_metadata_filter(context_filter)
},
filters=(
_doc_id_metadata_filter(context_filter)
if self.settings.vectorstore.database != "qdrant"
else None
),
)
def close(self) -> None:
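For context, a minimal sketch (assuming llama-index-core 0.10; the document ids are placeholders) of the MetadataFilters object that _doc_id_metadata_filter builds and that get_retriever passes to VectorIndexRetriever for every backend except qdrant:
from llama_index.core.vector_stores.types import (
    FilterCondition,
    MetadataFilter,
    MetadataFilters,
)
# Equivalent filter for two placeholder document ids:
filters = MetadataFilters(
    filters=[
        MetadataFilter(key="doc_id", value="doc-a"),
        MetadataFilter(key="doc_id", value="doc-b"),
    ],
    condition=FilterCondition.OR,  # match nodes belonging to any listed doc
)
# qdrant keeps using doc_ids directly, so filters stays None for that backend.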

View File

@ -1,9 +1,13 @@
"""FastAPI app creation, logger configuration and main API routes."""
import logging
from fastapi import Depends, FastAPI, Request
from fastapi.middleware.cors import CORSMiddleware
from injector import Injector
from llama_index.core.callbacks import CallbackManager
from llama_index.core.callbacks.global_handlers import create_global_handler
from llama_index.core.settings import Settings as LlamaIndexSettings
from private_gpt.server.chat.chat_router import chat_router
from private_gpt.server.chunks.chunks_router import chunks_router
@ -31,6 +35,10 @@ def create_app(root_injector: Injector) -> FastAPI:
app.include_router(embeddings_router)
app.include_router(health_router)
# Add LlamaIndex simple observability
global_handler = create_global_handler("simple")
LlamaIndexSettings.callback_manager = CallbackManager([global_handler])
settings = root_injector.get(Settings)
if settings.server.cors.enabled:
logger.debug("Setting up CORS middleware")
@ -45,7 +53,12 @@ def create_app(root_injector: Injector) -> FastAPI:
if settings.ui.enabled:
logger.debug("Importing the UI module")
from private_gpt.ui.ui import PrivateGptUi
try:
from private_gpt.ui.ui import PrivateGptUi
except ImportError as e:
raise ImportError(
"UI dependencies not found, install with `poetry install --extras ui`"
) from e
ui = root_injector.get(PrivateGptUi)
ui.mount_in_app(app, settings.ui.path)

View File

@ -1,11 +1,6 @@
"""FastAPI app creation, logger configuration and main API routes."""
import llama_index
from private_gpt.di import global_injector
from private_gpt.launcher import create_app
# Add LlamaIndex simple observability
llama_index.set_global_handler("simple")
app = create_app(global_injector)

View File

@ -3,7 +3,7 @@ import uuid
from collections.abc import Iterator
from typing import Literal
from llama_index.llms import ChatResponse, CompletionResponse
from llama_index.core.llms import ChatResponse, CompletionResponse
from pydantic import BaseModel, Field
from private_gpt.server.chunks.chunks_service import Chunk

View File

@ -1,5 +1,5 @@
from fastapi import APIRouter, Depends, Request
from llama_index.llms import ChatMessage, MessageRole
from llama_index.core.llms import ChatMessage, MessageRole
from pydantic import BaseModel
from starlette.responses import StreamingResponse

View File

@ -1,14 +1,15 @@
from dataclasses import dataclass
from injector import inject, singleton
from llama_index import ServiceContext, StorageContext, VectorStoreIndex
from llama_index.chat_engine import ContextChatEngine, SimpleChatEngine
from llama_index.chat_engine.types import (
from llama_index.core.chat_engine import ContextChatEngine, SimpleChatEngine
from llama_index.core.chat_engine.types import (
BaseChatEngine,
)
from llama_index.indices.postprocessor import MetadataReplacementPostProcessor
from llama_index.llms import ChatMessage, MessageRole
from llama_index.types import TokenGen
from llama_index.core.indices import VectorStoreIndex
from llama_index.core.indices.postprocessor import MetadataReplacementPostProcessor
from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.core.storage import StorageContext
from llama_index.core.types import TokenGen
from pydantic import BaseModel
from private_gpt.components.embedding.embedding_component import EmbeddingComponent
@ -75,20 +76,19 @@ class ChatService:
embedding_component: EmbeddingComponent,
node_store_component: NodeStoreComponent,
) -> None:
self.llm_service = llm_component
self.llm_component = llm_component
self.embedding_component = embedding_component
self.vector_store_component = vector_store_component
self.storage_context = StorageContext.from_defaults(
vector_store=vector_store_component.vector_store,
docstore=node_store_component.doc_store,
index_store=node_store_component.index_store,
)
self.service_context = ServiceContext.from_defaults(
llm=llm_component.llm, embed_model=embedding_component.embedding_model
)
self.index = VectorStoreIndex.from_vector_store(
vector_store_component.vector_store,
storage_context=self.storage_context,
service_context=self.service_context,
llm=llm_component.llm,
embed_model=embedding_component.embedding_model,
show_progress=True,
)
@ -105,7 +105,7 @@ class ChatService:
return ContextChatEngine.from_defaults(
system_prompt=system_prompt,
retriever=vector_index_retriever,
service_context=self.service_context,
llm=self.llm_component.llm,  # Has no effect at the moment
node_postprocessors=[
MetadataReplacementPostProcessor(target_metadata_key="window"),
],
@ -113,7 +113,7 @@ class ChatService:
else:
return SimpleChatEngine.from_defaults(
system_prompt=system_prompt,
service_context=self.service_context,
llm=self.llm_component.llm,
)
def stream_chat(

View File

@ -1,8 +1,9 @@
from typing import TYPE_CHECKING, Literal
from injector import inject, singleton
from llama_index import ServiceContext, StorageContext, VectorStoreIndex
from llama_index.schema import NodeWithScore
from llama_index.core.indices import VectorStoreIndex
from llama_index.core.schema import NodeWithScore
from llama_index.core.storage import StorageContext
from pydantic import BaseModel, Field
from private_gpt.components.embedding.embedding_component import EmbeddingComponent
@ -15,7 +16,7 @@ from private_gpt.open_ai.extensions.context_filter import ContextFilter
from private_gpt.server.ingest.model import IngestedDoc
if TYPE_CHECKING:
from llama_index.schema import RelatedNodeInfo
from llama_index.core.schema import RelatedNodeInfo
class Chunk(BaseModel):
@ -63,14 +64,13 @@ class ChunksService:
node_store_component: NodeStoreComponent,
) -> None:
self.vector_store_component = vector_store_component
self.llm_component = llm_component
self.embedding_component = embedding_component
self.storage_context = StorageContext.from_defaults(
vector_store=vector_store_component.vector_store,
docstore=node_store_component.doc_store,
index_store=node_store_component.index_store,
)
self.query_service_context = ServiceContext.from_defaults(
llm=llm_component.llm, embed_model=embedding_component.embedding_model
)
def _get_sibling_nodes_text(
self, node_with_score: NodeWithScore, related_number: int, forward: bool = True
@ -103,7 +103,8 @@ class ChunksService:
index = VectorStoreIndex.from_vector_store(
self.vector_store_component.vector_store,
storage_context=self.storage_context,
service_context=self.query_service_context,
llm=self.llm_component.llm,
embed_model=self.embedding_component.embedding_model,
show_progress=True,
)
vector_index_retriever = self.vector_store_component.get_retriever(

View File

@ -4,11 +4,8 @@ from pathlib import Path
from typing import AnyStr, BinaryIO
from injector import inject, singleton
from llama_index import (
ServiceContext,
StorageContext,
)
from llama_index.node_parser import SentenceWindowNodeParser
from llama_index.core.node_parser import SentenceWindowNodeParser
from llama_index.core.storage import StorageContext
from private_gpt.components.embedding.embedding_component import EmbeddingComponent
from private_gpt.components.ingest.ingest_component import get_ingestion_component
@ -40,17 +37,12 @@ class IngestService:
index_store=node_store_component.index_store,
)
node_parser = SentenceWindowNodeParser.from_defaults()
self.ingest_service_context = ServiceContext.from_defaults(
llm=self.llm_service.llm,
embed_model=embedding_component.embedding_model,
node_parser=node_parser,
# Embeddings done early in the pipeline of node transformations, right
# after the node parsing
transformations=[node_parser, embedding_component.embedding_model],
)
self.ingest_component = get_ingestion_component(
self.storage_context, self.ingest_service_context, settings=settings()
self.storage_context,
embed_model=embedding_component.embedding_model,
transformations=[node_parser, embedding_component.embedding_model],
settings=settings(),
)
def _ingest_data(self, file_name: str, file_data: AnyStr) -> list[IngestedDoc]:

View File

@ -3,10 +3,9 @@ from pathlib import Path
from typing import Any
from watchdog.events import (
DirCreatedEvent,
DirModifiedEvent,
FileCreatedEvent,
FileModifiedEvent,
FileSystemEvent,
FileSystemEventHandler,
)
from watchdog.observers import Observer
@ -20,11 +19,11 @@ class IngestWatcher:
self.on_file_changed = on_file_changed
class Handler(FileSystemEventHandler):
def on_modified(self, event: DirModifiedEvent | FileModifiedEvent) -> None:
def on_modified(self, event: FileSystemEvent) -> None:
if isinstance(event, FileModifiedEvent):
on_file_changed(Path(event.src_path))
def on_created(self, event: DirCreatedEvent | FileCreatedEvent) -> None:
def on_created(self, event: FileSystemEvent) -> None:
if isinstance(event, FileCreatedEvent):
on_file_changed(Path(event.src_path))

View File

@ -1,6 +1,6 @@
from typing import Any, Literal
from llama_index import Document
from llama_index.core.schema import Document
from pydantic import BaseModel, Field

View File

@ -12,6 +12,7 @@ Authorization can be done by following fastapi's guides:
* https://fastapi.tiangolo.com/tutorial/security/
* https://fastapi.tiangolo.com/tutorial/dependencies/dependencies-in-path-operation-decorators/
"""
# mypy: ignore-errors
# Disabled mypy error: All conditional function variants must have identical signatures
# We are changing the implementation of the authenticated method, based on

View File

@ -81,7 +81,9 @@ class DataSettings(BaseModel):
class LLMSettings(BaseModel):
mode: Literal["local", "openai", "openailike", "sagemaker", "mock", "ollama"]
mode: Literal[
"llamacpp", "openai", "openailike", "azopenai", "sagemaker", "mock", "ollama"
]
max_new_tokens: int = Field(
256,
description="The maximum number of token that the LLM is authorized to generate in one completion.",
@ -98,18 +100,23 @@ class LLMSettings(BaseModel):
"like `HuggingFaceH4/zephyr-7b-beta`. If not set, will load a tokenizer matching "
"gpt-3.5-turbo LLM.",
)
temperature: float = Field(
0.1,
description="The temperature of the model. Increasing the temperature will make the model answer more creatively. A value of 0.1 would be more factual.",
)
class VectorstoreSettings(BaseModel):
database: Literal["chroma", "qdrant", "pgvector"]
database: Literal["chroma", "qdrant", "postgres"]
class LocalSettings(BaseModel):
class NodeStoreSettings(BaseModel):
database: Literal["simple", "postgres"]
class LlamaCPPSettings(BaseModel):
llm_hf_repo_id: str
llm_hf_model_file: str
embedding_hf_model_name: str = Field(
description="Name of the HuggingFace model to use for embeddings"
)
prompt_style: Literal["default", "llama2", "tag", "mistral", "chatml"] = Field(
"llama2",
description=(
@ -122,9 +129,32 @@ class LocalSettings(BaseModel):
),
)
tfs_z: float = Field(
1.0,
description="Tail free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting.",
)
top_k: int = Field(
40,
description="Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. (Default: 40)",
)
top_p: float = Field(
0.9,
description="Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9)",
)
repeat_penalty: float = Field(
1.1,
description="Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1)",
)
class HuggingFaceSettings(BaseModel):
embedding_hf_model_name: str = Field(
description="Name of the HuggingFace model to use for embeddings"
)
class EmbeddingSettings(BaseModel):
mode: Literal["local", "openai", "sagemaker", "mock"]
mode: Literal["huggingface", "openai", "azopenai", "sagemaker", "ollama", "mock"]
ingest_mode: Literal["simple", "batch", "parallel"] = Field(
"simple",
description=(
@ -149,6 +179,10 @@ class EmbeddingSettings(BaseModel):
"Do not set it higher than your number of threads of your CPU."
),
)
embed_dim: int = Field(
384,
description="The dimension of the embeddings stored in the Postgres database",
)
class SagemakerSettings(BaseModel):
@ -173,10 +207,57 @@ class OllamaSettings(BaseModel):
"http://localhost:11434",
description="Base URL of Ollama API. Example: 'https://localhost:11434'.",
)
model: str = Field(
llm_model: str = Field(
None,
description="Model to use. Example: 'llama2-uncensored'.",
)
embedding_model: str = Field(
None,
description="Model to use. Example: 'nomic-embed-text'.",
)
tfs_z: float = Field(
1.0,
description="Tail free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting.",
)
num_predict: int = Field(
None,
description="Maximum number of tokens to predict when generating text. (Default: 128, -1 = infinite generation, -2 = fill context)",
)
top_k: int = Field(
40,
description="Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. (Default: 40)",
)
top_p: float = Field(
0.9,
description="Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9)",
)
repeat_last_n: int = Field(
64,
description="Sets how far back for the model to look back to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx)",
)
repeat_penalty: float = Field(
1.1,
description="Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1)",
)
class AzureOpenAISettings(BaseModel):
api_key: str
azure_endpoint: str
api_version: str = Field(
"2023_05_15",
description="The API version to use for this operation. This follows the YYYY-MM-DD format.",
)
embedding_deployment_name: str
embedding_model: str = Field(
"text-embedding-ada-002",
description="OpenAI Model to use. Example: 'text-embedding-ada-002'.",
)
llm_deployment_name: str
llm_model: str = Field(
"gpt-35-turbo",
description="OpenAI Model to use. Example: 'gpt-4'.",
)
class UISettings(BaseModel):
@ -197,7 +278,7 @@ class UISettings(BaseModel):
)
class PGVectorSettings(BaseModel):
class PostgresSettings(BaseModel):
host: str = Field(
"localhost",
description="The server hosting the Postgres database",
@ -218,17 +299,9 @@ class PGVectorSettings(BaseModel):
"postgres",
description="The database to use to connect to the Postgres database",
)
embed_dim: int = Field(
384,
description="The dimension of the embeddings stored in the Postgres database",
)
schema_name: str = Field(
"public",
description="The name of the schema in the Postgres database where the embeddings are stored",
)
table_name: str = Field(
"embeddings",
description="The name of the table in the Postgres database where the embeddings are stored",
description="The name of the schema in the Postgres database to use",
)
@ -292,13 +365,16 @@ class Settings(BaseModel):
ui: UISettings
llm: LLMSettings
embedding: EmbeddingSettings
local: LocalSettings
llamacpp: LlamaCPPSettings
huggingface: HuggingFaceSettings
sagemaker: SagemakerSettings
openai: OpenAISettings
ollama: OllamaSettings
azopenai: AzureOpenAISettings
vectorstore: VectorstoreSettings
nodestore: NodeStoreSettings
qdrant: QdrantSettings | None = None
pgvector: PGVectorSettings | None = None
postgres: PostgresSettings | None = None
"""

View File

@ -1,4 +1,5 @@
"""This file should be imported if and only if you want to run the UI locally."""
import itertools
import logging
import time
@ -10,7 +11,7 @@ import gradio as gr # type: ignore
from fastapi import FastAPI
from gradio.themes.utils.colors import slate # type: ignore
from injector import inject, singleton
from llama_index.llms import ChatMessage, ChatResponse, MessageRole
from llama_index.core.llms import ChatMessage, ChatResponse, MessageRole
from pydantic import BaseModel
from private_gpt.constants import PROJECT_ROOT_PATH
@ -44,8 +45,8 @@ class Source(BaseModel):
frozen = True
@staticmethod
def curate_sources(sources: list[Chunk]) -> set["Source"]:
curated_sources = set()
def curate_sources(sources: list[Chunk]) -> list["Source"]:
curated_sources = []
for chunk in sources:
doc_metadata = chunk.document.doc_metadata
@ -54,7 +55,10 @@ class Source(BaseModel):
page_label = doc_metadata.get("page_label", "-") if doc_metadata else "-"
source = Source(file=file_name, page=page_label, text=chunk.text)
curated_sources.add(source)
curated_sources.append(source)
curated_sources = list(
dict.fromkeys(curated_sources).keys()
) # Unique sources only
return curated_sources
@ -96,10 +100,15 @@ class PrivateGptUi:
if completion_gen.sources:
full_response += SOURCES_SEPARATOR
cur_sources = Source.curate_sources(completion_gen.sources)
sources_text = "\n\n\n".join(
f"{index}. {source.file} (page {source.page})"
for index, source in enumerate(cur_sources, start=1)
)
sources_text = "\n\n\n"
used_files = set()
for index, source in enumerate(cur_sources, start=1):
if (source.file + "-" + source.page) not in used_files:
sources_text = (
sources_text
+ f"{index}. {source.file} (page {source.page}) \n\n"
)
used_files.add(source.file + "-" + source.page)
full_response += sources_text
yield full_response
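For context, the order-preserving deduplication in curate_sources above relies on Source being a frozen (hashable) pydantic model; a minimal sketch of the dict.fromkeys idiom with strings standing in for Source objects:
# Strings stand in for hashable Source objects; dict.fromkeys keeps the first
# occurrence of each key and preserves insertion order.
sources = ["a.pdf-1", "b.pdf-2", "a.pdf-1"]
unique = list(dict.fromkeys(sources))  # ['a.pdf-1', 'b.pdf-2']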

View File

@ -1,25 +1,62 @@
[tool.poetry]
name = "private-gpt"
version = "0.2.0"
version = "0.4.0"
description = "Private GPT"
authors = ["Zylon <hi@zylon.ai>"]
[tool.poetry.dependencies]
python = ">=3.11,<3.12"
fastapi = { extras = ["all"], version = "^0.103.1" }
boto3 = "^1.28.56"
# PrivateGPT
fastapi = { extras = ["all"], version = "^0.110.0" }
python-multipart = "^0.0.9"
injector = "^0.21.0"
pyyaml = "^6.0.1"
python-multipart = "^0.0.6"
pypdf = "^3.16.2"
llama-index = { extras = ["local_models"], version = "0.9.3" }
watchdog = "^3.0.0"
qdrant-client = "^1.6.9"
chromadb = {version = "^0.4.13", optional = true}
asyncpg = {version = "^0.29.0", optional = true}
pgvector = {version = "^0.2.5", optional = true}
psycopg2-binary = {version = "^2.9.9", optional = true}
sqlalchemy = {version = "^2.0.27", optional = true}
watchdog = "^4.0.0"
transformers = "^4.38.2"
# LlamaIndex core libs
llama-index-core = "^0.10.14"
llama-index-readers-file = "^0.1.6"
# Optional LlamaIndex integration libs
llama-index-llms-llama-cpp = {version = "^0.1.3", optional = true}
llama-index-llms-openai = {version = "^0.1.6", optional = true}
llama-index-llms-openai-like = {version ="^0.1.3", optional = true}
llama-index-llms-ollama = {version ="^0.1.2", optional = true}
llama-index-llms-azure-openai = {version ="^0.1.5", optional = true}
llama-index-embeddings-ollama = {version ="^0.1.2", optional = true}
llama-index-embeddings-huggingface = {version ="^0.1.4", optional = true}
llama-index-embeddings-openai = {version ="^0.1.6", optional = true}
llama-index-embeddings-azure-openai = {version ="^0.1.6", optional = true}
llama-index-vector-stores-qdrant = {version ="^0.1.3", optional = true}
llama-index-vector-stores-chroma = {version ="^0.1.4", optional = true}
llama-index-vector-stores-postgres = {version ="^0.1.2", optional = true}
llama-index-storage-docstore-postgres = {version ="^0.1.2", optional = true}
llama-index-storage-index-store-postgres = {version ="^0.1.2", optional = true}
# Postgres
psycopg2-binary = {version ="^2.9.9", optional = true}
asyncpg = {version="^0.29.0", optional = true}
# Optional Sagemaker dependency
boto3 = {version ="^1.34.51", optional = true}
# Optional UI
gradio = {version ="^4.19.2", optional = true}
[tool.poetry.extras]
ui = ["gradio"]
llms-llama-cpp = ["llama-index-llms-llama-cpp"]
llms-openai = ["llama-index-llms-openai"]
llms-openai-like = ["llama-index-llms-openai-like"]
llms-ollama = ["llama-index-llms-ollama"]
llms-sagemaker = ["boto3"]
llms-azopenai = ["llama-index-llms-azure-openai"]
embeddings-ollama = ["llama-index-embeddings-ollama"]
embeddings-huggingface = ["llama-index-embeddings-huggingface"]
embeddings-openai = ["llama-index-embeddings-openai"]
embeddings-sagemaker = ["boto3"]
embeddings-azopenai = ["llama-index-embeddings-azure-openai"]
vector-stores-qdrant = ["llama-index-vector-stores-qdrant"]
vector-stores-chroma = ["llama-index-vector-stores-chroma"]
vector-stores-postgres = ["llama-index-vector-stores-postgres"]
storage-nodestore-postgres = ["llama-index-storage-docstore-postgres","llama-index-storage-index-store-postgres","psycopg2-binary","asyncpg"]
[tool.poetry.group.dev.dependencies]
black = "^22"
@ -31,26 +68,6 @@ ruff = "^0"
pytest-asyncio = "^0.21.1"
types-pyyaml = "^6.0.12.12"
# Dependencies for gradio UI
[tool.poetry.group.ui]
optional = true
[tool.poetry.group.ui.dependencies]
gradio = "^4.19.0"
[tool.poetry.group.local]
optional = true
[tool.poetry.group.local.dependencies]
llama-cpp-python = "^0.2.23"
numpy = "1.26.0"
sentence-transformers = "^2.2.2"
# https://stackoverflow.com/questions/76327419/valueerror-libcublas-so-0-9-not-found-in-the-system-path
torch = ">=2.0.0, !=2.0.1, !=2.1.0"
transformers = "^4.34.0"
[tool.poetry.extras]
chroma = ["chromadb"]
pgvector = ["sqlalchemy", "pgvector", "psycopg2-binary", "asyncpg"]
[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
@ -143,6 +160,9 @@ explicit_package_bases = true
warn_unused_ignores = false
exclude = ["tests"]
[tool.mypy-llama-index]
ignore_missing_imports = true
[tool.pytest.ini_options]
asyncio_mode = "auto"
testpaths = ["tests"]

View File

@ -19,19 +19,19 @@ os.makedirs(models_path, exist_ok=True)
# Download Embedding model
embedding_path = models_path / "embedding"
print(f"Downloading embedding {settings().local.embedding_hf_model_name}")
print(f"Downloading embedding {settings().huggingface.embedding_hf_model_name}")
snapshot_download(
repo_id=settings().local.embedding_hf_model_name,
repo_id=settings().huggingface.embedding_hf_model_name,
cache_dir=models_cache_path,
local_dir=embedding_path,
)
print("Embedding model downloaded!")
# Download LLM and create a symlink to the model file
print(f"Downloading LLM {settings().local.llm_hf_model_file}")
print(f"Downloading LLM {settings().llamacpp.llm_hf_model_file}")
hf_hub_download(
repo_id=settings().local.llm_hf_repo_id,
filename=settings().local.llm_hf_model_file,
repo_id=settings().llamacpp.llm_hf_repo_id,
filename=settings().llamacpp.llm_hf_model_file,
cache_dir=models_cache_path,
local_dir=models_path,
resume_download=resume_download,

settings-azopenai.yaml (new file)
View File

@ -0,0 +1,17 @@
server:
env_name: ${APP_ENV:azopenai}
llm:
mode: azopenai
embedding:
mode: azopenai
azopenai:
api_key: ${AZ_OPENAI_API_KEY:}
azure_endpoint: ${AZ_OPENAI_ENDPOINT:}
embedding_deployment_name: ${AZ_OPENAI_EMBEDDING_DEPLOYMENT_NAME:}
llm_deployment_name: ${AZ_OPENAI_LLM_DEPLOYMENT_NAME:}
api_version: "2023-05-15"
embedding_model: text-embedding-ada-002
llm_model: gpt-35-turbo

View File

@ -8,9 +8,11 @@ llm:
embedding:
mode: ${PGPT_MODE:sagemaker}
local:
llamacpp:
llm_hf_repo_id: ${PGPT_HF_REPO_ID:TheBloke/Mistral-7B-Instruct-v0.1-GGUF}
llm_hf_model_file: ${PGPT_HF_MODEL_FILE:mistral-7b-instruct-v0.1.Q4_K_M.gguf}
huggingface:
embedding_hf_model_name: ${PGPT_EMBEDDING_HF_MODEL_NAME:BAAI/bge-small-en-v1.5}
sagemaker:

View File

@ -2,4 +2,25 @@ server:
env_name: ${APP_ENV:local}
llm:
mode: local
mode: llamacpp
# Should be matching the selected model
max_new_tokens: 512
context_window: 3900
tokenizer: mistralai/Mistral-7B-Instruct-v0.2
llamacpp:
prompt_style: "mistral"
llm_hf_repo_id: TheBloke/Mistral-7B-Instruct-v0.2-GGUF
llm_hf_model_file: mistral-7b-instruct-v0.2.Q4_K_M.gguf
embedding:
mode: huggingface
huggingface:
embedding_hf_model_name: BAAI/bge-small-en-v1.5
vectorstore:
database: qdrant
qdrant:
path: local_data/private_gpt/qdrant

View File

@ -4,5 +4,6 @@ server:
# This configuration allows you to use the GPU for creating embeddings while avoiding loading the LLM into VRAM
llm:
mode: mock
embedding:
mode: local
mode: huggingface

settings-ollama-pg.yaml (new file)
View File

@ -0,0 +1,34 @@
# Using ollama and postgres for the vector, doc and index store. Ollama is also used for embeddings.
# To use install these extras:
# poetry install --extras "llms-ollama ui vector-stores-postgres embeddings-ollama storage-nodestore-postgres"
server:
env_name: ${APP_ENV:ollama}
llm:
mode: ollama
max_new_tokens: 512
context_window: 3900
embedding:
mode: ollama
embed_dim: 768 # 768 is for nomic-embed-text
ollama:
llm_model: mistral
embedding_model: nomic-embed-text
api_base: http://localhost:11434
nodestore:
database: postgres
vectorstore:
database: postgres
postgres:
host: localhost
port: 5432
database: postgres
user: postgres
password: admin
schema_name: private_gpt

settings-ollama.yaml (new file)
View File

@ -0,0 +1,27 @@
server:
env_name: ${APP_ENV:ollama}
llm:
mode: ollama
max_new_tokens: 512
context_window: 3900
temperature: 0.1 #The temperature of the model. Increasing the temperature will make the model answer more creatively. A value of 0.1 would be more factual. (Default: 0.1)
embedding:
mode: ollama
ollama:
llm_model: mistral
embedding_model: nomic-embed-text
api_base: http://localhost:11434
tfs_z: 1.0 # Tail free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting.
top_k: 40 # Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. (Default: 40)
top_p: 0.9 # Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9)
repeat_last_n: 64 # Sets how far back the model looks to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx)
repeat_penalty: 1.2 # Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1)
vectorstore:
database: qdrant
qdrant:
path: local_data/private_gpt/qdrant

settings-openai.yaml (new file)
View File

@ -0,0 +1,12 @@
server:
env_name: ${APP_ENV:openai}
llm:
mode: openai
embedding:
mode: openai
openai:
api_key: ${OPENAI_API_KEY:}
model: gpt-3.5-turbo

View File

@ -1,5 +1,5 @@
server:
env_name: ${APP_ENV:prod}
env_name: ${APP_ENV:sagemaker}
port: ${PORT:8001}
ui:
@ -9,6 +9,9 @@ ui:
llm:
mode: sagemaker
embedding:
mode: sagemaker
sagemaker:
llm_endpoint_name: huggingface-pytorch-tgi-inference-2023-09-25-19-53-32-140
embedding_endpoint_name: huggingface-pytorch-inference-2023-11-03-07-41-36-479
llm_endpoint_name: llm
embedding_endpoint_name: embedding

View File

@ -14,5 +14,8 @@ qdrant:
llm:
mode: mock
embedding:
mode: mock
ui:
enabled: false

View File

@ -1,11 +1,14 @@
server:
env_name: ${APP_ENV:vllm}
llm:
mode: openailike
embedding:
mode: local
mode: huggingface
ingest_mode: simple
local:
huggingface:
embedding_hf_model_name: BAAI/bge-small-en-v1.5
openai:

View File

@ -34,40 +34,48 @@ ui:
delete_file_button_enabled: true
delete_all_files_button_enabled: true
llm:
mode: local
mode: llamacpp
# Should be matching the selected model
max_new_tokens: 512
context_window: 3900
tokenizer: mistralai/Mistral-7B-Instruct-v0.2
temperature: 0.1 # The temperature of the model. Increasing the temperature will make the model answer more creatively. A value of 0.1 would be more factual. (Default: 0.1)
llamacpp:
prompt_style: "mistral"
llm_hf_repo_id: TheBloke/Mistral-7B-Instruct-v0.2-GGUF
llm_hf_model_file: mistral-7b-instruct-v0.2.Q4_K_M.gguf
tfs_z: 1.0 # Tail free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting
top_k: 40 # Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. (Default: 40)
top_p: 1.0 # Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9)
repeat_penalty: 1.1 # Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1)
embedding:
# Should be matching the value above in most cases
mode: local
mode: huggingface
ingest_mode: simple
embed_dim: 384 # 384 is for BAAI/bge-small-en-v1.5
huggingface:
embedding_hf_model_name: BAAI/bge-small-en-v1.5
vectorstore:
database: qdrant
nodestore:
database: simple
qdrant:
path: local_data/private_gpt/qdrant
pgvector:
postgres:
host: localhost
port: 5432
database: postgres
user: postgres
password: postgres
embed_dim: 384 # 384 is for BAAI/bge-small-en-v1.5
schema_name: private_gpt
table_name: embeddings
local:
prompt_style: "mistral"
llm_hf_repo_id: TheBloke/Mistral-7B-Instruct-v0.2-GGUF
llm_hf_model_file: mistral-7b-instruct-v0.2.Q4_K_M.gguf
embedding_hf_model_name: BAAI/bge-small-en-v1.5
sagemaker:
llm_endpoint_name: huggingface-pytorch-tgi-inference-2023-09-25-19-53-32-140
@ -78,4 +86,15 @@ openai:
model: gpt-3.5-turbo
ollama:
model: llama2-uncensored
llm_model: llama2
embedding_model: nomic-embed-text
api_base: http://localhost:11434
azopenai:
api_key: ${AZ_OPENAI_API_KEY:}
azure_endpoint: ${AZ_OPENAI_ENDPOINT:}
embedding_deployment_name: ${AZ_OPENAI_EMBEDDING_DEPLOYMENT_NAME:}
llm_deployment_name: ${AZ_OPENAI_LLM_DEPLOYMENT_NAME:}
api_version: "2023-05-15"
embedding_model: text-embedding-ada-002
llm_model: gpt-35-turbo

View File

@ -5,6 +5,7 @@ NOTE: We are not testing the switch based on the config in
is currently architecture (it is hard to patch the `settings` and the app while
the tests are directly importing them).
"""
from typing import Annotated
import pytest

View File

@ -1,5 +1,5 @@
import pytest
from llama_index.llms import ChatMessage, MessageRole
from llama_index.core.llms import ChatMessage, MessageRole
from private_gpt.components.llm.prompt_helper import (
ChatMLPromptStyle,

tiktoken_cache/.gitignore (new file, vendored)
View File

@ -0,0 +1,2 @@
*
!.gitignore

View File

@ -1 +1 @@
0.3.0
0.4.0