Mirror of https://github.com/imartinez/privateGPT.git (synced 2025-04-27 03:11:11 +00:00)
feat: Upgrade to LlamaIndex to 0.10 (#1663)
* Extract optional dependencies
* Separate local mode into llms-llama-cpp and embeddings-huggingface for clarity
* Support Ollama embeddings
* Upgrade to llamaindex 0.10.14. Remove legacy use of ServiceContext in ContextChatEngine
* Fix vector retriever filters
parent 12f3a39e8a
commit 45f05711eb
@ -25,6 +25,6 @@ runs:
|
||||
python-version: ${{ inputs.python_version }}
|
||||
cache: "poetry"
|
||||
- name: Install Dependencies
|
||||
run: poetry install --with ui --no-root
|
||||
run: poetry install --extras "ui vector-stores-qdrant" --no-root
|
||||
shell: bash
|
||||
|
||||
|
@ -14,7 +14,7 @@ FROM base as dependencies
|
||||
WORKDIR /home/worker/app
|
||||
COPY pyproject.toml poetry.lock ./
|
||||
|
||||
RUN poetry install --with ui
|
||||
RUN poetry install --extras "ui vector-stores-qdrant"
|
||||
|
||||
FROM base as app
|
||||
|
||||
|
@ -24,8 +24,7 @@ FROM base as dependencies
|
||||
WORKDIR /home/worker/app
|
||||
COPY pyproject.toml poetry.lock ./
|
||||
|
||||
RUN poetry install --with local
|
||||
RUN poetry install --with ui
|
||||
RUN poetry install --extras "ui embeddings-huggingface llms-llama-cpp vector-stores-qdrant"
|
||||
|
||||
FROM base as app
|
||||
|
||||
|
@ -30,15 +30,15 @@ navigation:
|
||||
layout:
|
||||
- section: Welcome
|
||||
contents:
|
||||
- page: Welcome
|
||||
- page: Introduction
|
||||
path: ./docs/pages/overview/welcome.mdx
|
||||
- page: Quickstart
|
||||
path: ./docs/pages/overview/quickstart.mdx
|
||||
# How to install privateGPT, with FAQ and troubleshooting
|
||||
- tab: installation
|
||||
layout:
|
||||
- section: Getting started
|
||||
contents:
|
||||
- page: Main Concepts
|
||||
path: ./docs/pages/installation/concepts.mdx
|
||||
- page: Installation
|
||||
path: ./docs/pages/installation/installation.mdx
|
||||
# Manual of privateGPT: how to use it and configure it
|
||||
|
60
fern/docs/pages/installation/concepts.mdx
Normal file
@ -0,0 +1,60 @@
PrivateGPT is a service that wraps a set of AI RAG primitives in a comprehensive set of APIs, providing a private, secure, customizable and easy-to-use GenAI development framework.

It uses FastAPI and LlamaIndex as its core frameworks. Those can be customized by changing the codebase itself.

It supports a variety of LLM providers, embeddings providers, and vector stores, both local and remote. Those can be easily changed without changing the codebase.

# Different Setups support

## Setup configurations available
You get to decide the setup for these 3 main components:
- LLM: the large language model provider used for inference. It can be local, remote, or even OpenAI.
- Embeddings: the embeddings provider used to encode the input, the documents and the users' queries. Like the LLM, it can be local, remote, or even OpenAI.
- Vector store: the store used to index and retrieve the documents.

There is an extra component that can be enabled or disabled: the UI. It is a Gradio UI that allows you to interact with the API in a more user-friendly way.

### Setups and Dependencies
Your setup will be the combination of the different options available. You'll find recommended setups in the [installation](/installation) section.
PrivateGPT uses poetry to manage its dependencies. You can install the dependencies for the different setups by running `poetry install --extras "<extra1> <extra2>..."`.
Extras are the different options available for each component. For example, to install the dependencies for a local setup with UI and Qdrant as the vector database, Ollama as the LLM and HuggingFace as the local embeddings provider, you would run:

`poetry install --extras "ui vector-stores-qdrant llms-ollama embeddings-huggingface"`

Refer to the [installation](/installation) section for more details.

### Setups and Configuration
PrivateGPT uses yaml to define its configuration in files named `settings-<profile>.yaml`.
Different configuration files can be created in the root directory of the project.
PrivateGPT will load the configuration at startup from the profile specified in the `PGPT_PROFILES` environment variable.
For example, running:
```bash
PGPT_PROFILES=ollama make run
```
will load the configuration from `settings.yaml` and `settings-ollama.yaml`:
- `settings.yaml` is always loaded and contains the default configuration.
- `settings-ollama.yaml` is loaded only if the `ollama` profile is specified in the `PGPT_PROFILES` environment variable. It can override configuration from the default `settings.yaml`.
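As an illustration, a minimal sketch of what such a profile override could look like (the field names come from this commit's settings and components; the actual `settings-ollama.yaml` shipped with the project may contain more options):

```yaml
# settings-ollama.yaml -- illustrative excerpt, not the full file shipped with the project
llm:
  mode: ollama

embedding:
  mode: ollama

ollama:
  llm_model: mistral
  embedding_model: nomic-embed-text
  api_base: http://localhost:11434
```

Only the keys declared in the profile file override `settings.yaml`; everything else keeps its default value.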
## About Fully Local Setups
In order to run PrivateGPT in a fully local setup, you will need to run the LLM, Embeddings and Vector Store locally.
### Vector stores
The supported vector stores (Qdrant, ChromaDB and Postgres) run locally by default.
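For example (a hedged sketch based on the `vectorstore.database` setting used in this commit; check your settings files for the exact values supported by your version), switching the vector store is just a configuration change in the active profile:

```yaml
# Illustrative settings excerpt -- pick one of the supported backends
vectorstore:
  database: qdrant  # alternatives: chroma, pgvector
```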
### Embeddings
For local Embeddings there are two options:
* (Recommended) You can use the 'ollama' option in PrivateGPT, which will connect to your local Ollama instance. Ollama greatly simplifies the installation of local LLMs.
* You can use the 'embeddings-huggingface' option in PrivateGPT, which will use HuggingFace.

In order for the HuggingFace embeddings to work (the second option), you need to download the embeddings model to the `models` folder. You can do so by running the `setup` script:
```bash
poetry run python scripts/setup
```

### LLM
For local LLM there are two options:
* (Recommended) You can use the 'ollama' option in PrivateGPT, which will connect to your local Ollama instance. Ollama greatly simplifies the installation of local LLMs.
* You can use the 'llms-llama-cpp' option in PrivateGPT, which will use LlamaCPP. It works great on Mac with Metal most of the time (it leverages the Metal GPU), but it can be tricky on certain Linux and Windows distributions, depending on the GPU. You'll find guides and troubleshooting in the installation document.

In order for the LlamaCPP-powered LLM to work (the second option), you need to download the LLM model to the `models` folder. You can do so by running the `setup` script:
```bash
poetry run python scripts/setup
```
@ -1,8 +1,8 @@
|
||||
## Installation and Settings
|
||||
It is important that you review the Main Concepts before you start the installation process.
|
||||
|
||||
### Base requirements to run PrivateGPT
|
||||
## Base requirements to run PrivateGPT
|
||||
|
||||
* Git clone PrivateGPT repository, and navigate to it:
|
||||
* Clone PrivateGPT repository, and navigate to it:
|
||||
|
||||
```bash
|
||||
git clone https://github.com/imartinez/privateGPT
|
||||
@ -21,93 +21,180 @@ pyenv local 3.11
|
||||
|
||||
* Install [Poetry](https://python-poetry.org/docs/#installing-with-the-official-installer) for dependency management:
|
||||
|
||||
* Have a valid C++ compiler like gcc. See [Troubleshooting: C++ Compiler](#troubleshooting-c-compiler) for more details.
|
||||
|
||||
* Install `make` for scripts:
|
||||
* Install `make` to be able to run the different scripts:
|
||||
* osx: (Using homebrew): `brew install make`
|
||||
* windows: (Using chocolatey) `choco install make`
|
||||
|
||||
### Install dependencies
|
||||
## Install and run your desired setup
|
||||
|
||||
Install the dependencies:
|
||||
PrivateGPT allows you to customize the setup (from fully local to cloud-based) by deciding which modules to use.
|
||||
Here are the different options available:
|
||||
|
||||
- LLM: "llama-cpp", "ollama", "sagemaker", "openai", "openailike"
|
||||
- Embeddings: "huggingface", "openai", "sagemaker"
|
||||
- Vector stores: "qdrant", "chroma", "postgres"
|
||||
- UI: whether or not to enable the UI (Gradio) or just use the API
|
||||
|
||||
In order to only install the required dependencies, PrivateGPT offers different `extras` that can be combined during the installation process:
|
||||
|
||||
```bash
|
||||
poetry install --with ui
|
||||
poetry install --extras "<extra1> <extra2>..."
|
||||
```
|
||||
|
||||
Verify everything is working by running `make run` (or `poetry run python -m private_gpt`) and navigate to
|
||||
http://localhost:8001. You should see a [Gradio UI](https://gradio.app/) **configured with a mock LLM** that will
|
||||
echo back the input. Below we'll see how to configure a real LLM.
|
||||
Where `<extra>` can be any of the following:
|
||||
|
||||
### Settings
|
||||
- ui: adds support for UI using Gradio
|
||||
- llms-ollama: adds support for Ollama LLM, the easiest way to get a local LLM running, requires Ollama running locally
|
||||
- llms-llama-cpp: adds support for local LLM using LlamaCPP - expect a messy installation process on some platforms
|
||||
- llms-sagemaker: adds support for Amazon Sagemaker LLM, requires Sagemaker inference endpoints
|
||||
- llms-openai: adds support for OpenAI LLM, requires OpenAI API key
|
||||
- llms-openai-like: adds support for 3rd party LLM providers that are compatible with OpenAI's API
|
||||
- embeddings-ollama: adds support for Ollama Embeddings, requires Ollama running locally
|
||||
- embeddings-huggingface: adds support for local Embeddings using HuggingFace
|
||||
- embeddings-sagemaker: adds support for Amazon Sagemaker Embeddings, requires Sagemaker inference endpoints
|
||||
- embeddings-openai: adds support for OpenAI Embeddings, requires OpenAI API key
|
||||
- vector-stores-qdrant: adds support for Qdrant vector store
|
||||
- vector-stores-chroma: adds support for Chroma DB vector store
|
||||
- vector-stores-postgres: adds support for Postgres vector store
|
||||
|
||||
<Callout intent="info">
|
||||
The default settings of PrivateGPT should work out-of-the-box for a 100% local setup. **However**, as is, it runs exclusively on your CPU.
|
||||
Skip this section if you just want to test PrivateGPT locally, and come back later to learn about more configuration options (and have better performances).
|
||||
</Callout>
|
||||
## Recommended Setups
|
||||
|
||||
<br />
|
||||
These are just some examples of recommended setups. You can mix and match the different options to fit your needs.
|
||||
You'll find more information in the Manual section of the documentation.
|
||||
|
||||
### Local LLM requirements
|
||||
> **Important for Windows**: In the examples below on how to run PrivateGPT with `make run`, the `PGPT_PROFILES` env var is being set inline following Unix command line syntax (works on MacOS and Linux).
|
||||
If you are using Windows, you'll need to set the env var in a different way, for example:
|
||||
|
||||
Install extra dependencies for local execution:
|
||||
```powershell
|
||||
# Powershell
|
||||
$env:PGPT_PROFILES="ollama"
|
||||
make run
|
||||
```
|
||||
|
||||
or
|
||||
|
||||
```cmd
|
||||
# CMD
|
||||
set PGPT_PROFILES=ollama
|
||||
make run
|
||||
```
|
||||
|
||||
### Local, Ollama-powered setup - RECOMMENDED
|
||||
|
||||
**The easiest way to run PrivateGPT fully locally** is to depend on Ollama for the LLM. Ollama makes local LLMs and Embeddings super easy to install and use, abstracting away the complexity of GPU support. It's the recommended setup for local development.
|
||||
|
||||
Go to [ollama.ai](https://ollama.ai/) and follow the instructions to install Ollama on your machine.
|
||||
|
||||
After the installation, make sure the Ollama desktop app is closed.
|
||||
|
||||
Install the models to be used; the default `settings-ollama.yaml` is configured to use the `mistral 7b` LLM (~4GB) and `nomic-embed-text` Embeddings (~275MB). Therefore:
|
||||
|
||||
```bash
|
||||
poetry install --with local
|
||||
ollama pull mistral
|
||||
ollama pull nomic-embed-text
|
||||
```
|
||||
|
||||
For PrivateGPT to run fully locally GPU acceleration is required
|
||||
(CPU execution is possible, but very slow), however,
|
||||
typical Macbook laptops or window desktops with mid-range GPUs lack VRAM to run
|
||||
even the smallest LLMs. For that reason
|
||||
**local execution is only supported for models compatible with [llama.cpp](https://github.com/ggerganov/llama.cpp)**
|
||||
Now, start the Ollama service (it will start a local inference server, serving both the LLM and the Embeddings):
|
||||
```bash
|
||||
ollama serve
|
||||
```
|
||||
|
||||
These two models are known to work well:
|
||||
Once done, in a different terminal, you can install PrivateGPT with the following command:
|
||||
```bash
|
||||
poetry install --extras "ui llms-ollama embeddings-ollama vector-stores-qdrant"
|
||||
```
|
||||
|
||||
* https://huggingface.co/TheBloke/Llama-2-7B-chat-GGUF
|
||||
* https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF (recommended)
|
||||
Once installed, you can run PrivateGPT. Make sure you have a working Ollama running locally before running the following command.
|
||||
|
||||
To ease the installation process, use the `setup` script that will download both
|
||||
the embedding and the LLM model and place them in the correct location (under `models` folder):
|
||||
```bash
|
||||
PGPT_PROFILES=ollama make run
|
||||
```
|
||||
|
||||
PrivateGPT will use the existing `settings-ollama.yaml` settings file, which is already configured to use the Ollama LLM and Embeddings, and Qdrant. Review it and adapt it to your needs (different models, different Ollama port, etc.).
|
||||
|
||||
The UI will be available at http://localhost:8001
|
||||
|
||||
### Private, Sagemaker-powered setup
|
||||
|
||||
If you need more performance, you can run a version of PrivateGPT that relies on powerful AWS Sagemaker machines to serve the LLM and Embeddings.
|
||||
|
||||
You need to have access to Sagemaker inference endpoints for the LLM and/or the embeddings, and have AWS credentials properly configured.
|
||||
|
||||
Edit the `settings-sagemaker.yaml` file to include the correct Sagemaker endpoints.
|
||||
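For reference, a minimal sketch of what that section could look like (the setting names `llm_endpoint_name` and `embedding_endpoint_name` come from this commit's Sagemaker components; the endpoint values are placeholders you must replace with your own):

```yaml
# settings-sagemaker.yaml -- illustrative excerpt, endpoint names are placeholders
sagemaker:
  llm_endpoint_name: your-llm-inference-endpoint
  embedding_endpoint_name: your-embedding-inference-endpoint
```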
|
||||
Then, install PrivateGPT with the following command:
|
||||
```bash
|
||||
poetry install --extras "ui llms-sagemaker embeddings-sagemaker vector-stores-qdrant"
|
||||
```
|
||||
|
||||
Once installed, you can run PrivateGPT with the following command:
|
||||
|
||||
```bash
|
||||
PGPT_PROFILES=sagemaker make run
|
||||
```
|
||||
|
||||
PrivateGPT will use the already existing `settings-sagemaker.yaml` settings file, which is already configured to use Sagemaker LLM and Embeddings endpoints, and Qdrant.
|
||||
|
||||
The UI will be available at http://localhost:8001
|
||||
|
||||
### Non-Private, OpenAI-powered test setup
|
||||
|
||||
If you want to test PrivateGPT with OpenAI's LLM and Embeddings (taking into account that your data is going to OpenAI!), use the following setup:
|
||||
|
||||
You need an OpenAI API key to run this setup.
|
||||
|
||||
Edit the `settings-openai.yaml` file to include the correct API key. Never commit it! It's a secret! As an alternative to editing `settings-openai.yaml`, you can just set the env var `OPENAI_API_KEY`.
|
||||
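For example (a trivial sketch; the key value is a placeholder and should never be committed to version control):

```bash
# Set the key for the current shell session only
export OPENAI_API_KEY=<your-openai-api-key>
```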
|
||||
Then, install PrivateGPT with the following command:
|
||||
```bash
|
||||
poetry install --extras "ui llms-openai embeddings-openai vector-stores-qdrant"
|
||||
```
|
||||
|
||||
Once installed, you can run PrivateGPT.
|
||||
|
||||
```bash
|
||||
PGPT_PROFILES=openai make run
|
||||
```
|
||||
|
||||
PrivateGPT will use the already existing `settings-openai.yaml` settings file, which is already configured to use OpenAI LLM and Embeddings endpoints, and Qdrant.
|
||||
|
||||
The UI will be available at http://localhost:8001
|
||||
|
||||
### Local, Llama-CPP powered setup
|
||||
|
||||
If you want to run PrivateGPT fully locally without relying on Ollama, you can run the following command:
|
||||
|
||||
```bash
|
||||
poetry install --extras "ui llms-llama-cpp embeddings-huggingface vector-stores-qdrant"
|
||||
```
|
||||
|
||||
In order for local LLM and embeddings to work, you need to download the models to the `models` folder. You can do so by running the `setup` script:
|
||||
```bash
|
||||
poetry run python scripts/setup
|
||||
```
|
||||
|
||||
If you are ok with CPU execution, you can skip the rest of this section.
|
||||
Once installed, you can run PrivateGPT with the following command:
|
||||
|
||||
As stated before, llama.cpp is required and in
|
||||
```bash
|
||||
PGPT_PROFILES=local make run
|
||||
```
|
||||
|
||||
PrivateGPT will load the existing `settings-local.yaml` file, which is already configured to use the LlamaCPP LLM, HuggingFace embeddings and Qdrant.
|
||||
|
||||
The UI will be available at http://localhost:8001
|
||||
|
||||
#### Llama-CPP support
|
||||
|
||||
For PrivateGPT to run fully locally without Ollama, Llama.cpp is required and in
|
||||
particular [llama-cpp-python](https://github.com/abetlen/llama-cpp-python)
|
||||
is used.
|
||||
|
||||
You'll need to have a valid C++ compiler like gcc installed. See [Troubleshooting: C++ Compiler](#troubleshooting-c-compiler) for more details.
|
||||
|
||||
> It's highly encouraged that you fully read llama-cpp and llama-cpp-python documentation relevant to your platform.
|
||||
> Running into installation issues is very likely, and you'll need to troubleshoot them yourself.
|
||||
|
||||
#### Customizing low level parameters
|
||||
|
||||
Currently, not all the parameters of `llama.cpp` and `llama-cpp-python` are available in PrivateGPT's `settings.yaml` file.
|
||||
In case you need to customize parameters such as the number of layers loaded into the GPU, you might change
|
||||
them directly in `private_gpt/components/llm/llm_component.py`.
|
||||
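For orientation, the call you would be editing looks roughly like this simplified excerpt of `llm_component.py` from this commit (only a subset of arguments shown); `n_gpu_layers=-1` offloads all layers to the GPU, and a smaller positive number reduces VRAM usage:

```python
# Simplified excerpt from private_gpt/components/llm/llm_component.py (this commit)
self.llm = LlamaCPP(
    model_path=str(models_path / settings.llamacpp.llm_hf_model_file),
    max_new_tokens=settings.llm.max_new_tokens,
    context_window=settings.llm.context_window,
    # Tune low-level llama.cpp options here, e.g. n_gpu_layers=20 on low-VRAM GPUs
    model_kwargs={"n_gpu_layers": -1, "offload_kqv": True},
)
```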
|
||||
##### Available LLM config options
|
||||
|
||||
The `llm` section of the settings allows for the following configurations:
|
||||
|
||||
- `mode`: how to run your llm
|
||||
- `max_new_tokens`: this lets you configure the number of new tokens the LLM will generate and add to the context window (by default Llama.cpp uses `256`)
|
||||
|
||||
Example:
|
||||
|
||||
```yaml
|
||||
llm:
|
||||
mode: local
|
||||
max_new_tokens: 256
|
||||
```
|
||||
|
||||
If you are getting an out of memory error, you might also try a smaller model or stick to the proposed
|
||||
recommended models, instead of custom tuning the parameters.
|
||||
|
||||
#### OSX GPU support
|
||||
##### Llama-CPP OSX GPU support
|
||||
|
||||
You will need to build [llama.cpp](https://github.com/ggerganov/llama.cpp) with metal support.
|
||||
|
||||
@ -127,7 +214,7 @@ More information is available in the documentation of the libraries themselves:
|
||||
* [llama-cpp-python's documentation](https://llama-cpp-python.readthedocs.io/en/latest/#installation-with-hardware-acceleration)
|
||||
* [llama.cpp](https://github.com/ggerganov/llama.cpp#build)
|
||||
|
||||
#### Windows NVIDIA GPU support
|
||||
##### Llama-CPP Windows NVIDIA GPU support
|
||||
|
||||
Windows GPU support is done through CUDA.
|
||||
Follow the instructions on the original [llama.cpp](https://github.com/ggerganov/llama.cpp) repo to install the required
|
||||
@ -160,7 +247,7 @@ Note that llama.cpp offloads matrix calculations to the GPU but the performance
|
||||
still hit heavily due to latency between CPU and GPU communication. You might need to tweak
|
||||
batch sizes and other parameters to get the best performance for your particular system.
|
||||
|
||||
#### Linux NVIDIA GPU support and Windows-WSL
|
||||
##### Llama-CPP Linux NVIDIA GPU support and Windows-WSL
|
||||
|
||||
Linux GPU support is done through CUDA.
|
||||
Follow the instructions on the original [llama.cpp](https://github.com/ggerganov/llama.cpp) repo to install the required
|
||||
@ -188,7 +275,7 @@ llama_new_context_with_model: total VRAM used: 4857.93 MB (model: 4095.05 MB, co
|
||||
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
|
||||
```
|
||||
|
||||
### Known issues and Troubleshooting
|
||||
##### Llama-CPP Known issues and Troubleshooting
|
||||
|
||||
Running LLMs locally still has a lot of sharp edges, especially on non-Linux platforms.
|
||||
You might encounter several issues:
|
||||
@ -205,7 +292,7 @@ If, during your installation, something does not go as planned, retry in *verbos
|
||||
|
||||
For example, when installing packages with `pip install`, you can add the option `-vvv` to show the details of the installation.
|
||||
|
||||
#### Troubleshooting: C++ Compiler
|
||||
##### Llama-CPP Troubleshooting: C++ Compiler
|
||||
|
||||
If you encounter an error while building a wheel during the `pip install` process, you may need to install a C++
|
||||
compiler on your computer.
|
||||
@ -227,7 +314,7 @@ To install a C++ compiler on Windows 10/11, follow these steps:
|
||||
Store and search for Xcode and install it. **Or** you can install the command line tools by running `xcode-select --install`.
|
||||
2. If not, you can install clang or gcc with homebrew `brew install gcc`
|
||||
|
||||
#### Troubleshooting: Mac Running Intel
|
||||
##### Llama-CPP Troubleshooting: Mac Running Intel
|
||||
|
||||
When running a Mac with Intel hardware (not M1), you may run into _clang: error: the clang compiler does not support '
|
||||
-march=native'_ during pip install.
|
||||
|
@ -25,6 +25,30 @@ When the server is started it will print a log *Application startup complete*.
|
||||
Navigate to http://localhost:8001 to use the Gradio UI or to http://localhost:8001/docs (API section) to try the API
|
||||
using Swagger UI.
|
||||
|
||||
#### Customizing low level parameters
|
||||
|
||||
Currently, not all the parameters of `llama.cpp` and `llama-cpp-python` are available in PrivateGPT's `settings.yaml` file.
|
||||
In case you need to customize parameters such as the number of layers loaded into the GPU, you might change
|
||||
them directly in `private_gpt/components/llm/llm_component.py`.
|
||||
|
||||
##### Available LLM config options
|
||||
|
||||
The `llm` section of the settings allows for the following configurations:
|
||||
|
||||
- `mode`: how to run your llm
|
||||
- `max_new_tokens`: this lets you configure the number of new tokens the LLM will generate and add to the context window (by default Llama.cpp uses `256`)
|
||||
|
||||
Example:
|
||||
|
||||
```yaml
|
||||
llm:
|
||||
mode: local
|
||||
max_new_tokens: 256
|
||||
```
|
||||
|
||||
If you are getting an out of memory error, you might also try a smaller model or stick to the proposed
|
||||
recommended models, instead of custom tuning the parameters.
|
||||
|
||||
### Using OpenAI
|
||||
|
||||
If you cannot run a local model (because you don't have a GPU, for example) or for testing purposes, you may
|
||||
|
@ -1,21 +0,0 @@
|
||||
## Local Installation steps
|
||||
|
||||
The steps in [Installation](/installation) section are better explained and cover more
|
||||
setup scenarios (macOS, Windows, Linux).
|
||||
But if you like one-liners, have python3.11 installed, and you are running a UNIX (macOS or Linux)
|
||||
system, you can get up and running on CPU in few lines:
|
||||
|
||||
```bash
|
||||
git clone https://github.com/imartinez/privateGPT && cd privateGPT && \
|
||||
python3.11 -m venv .venv && source .venv/bin/activate && \
|
||||
pip install --upgrade pip poetry && poetry install --with ui,local && ./scripts/setup
|
||||
|
||||
# Launch the privateGPT API server **and** the gradio UI
|
||||
poetry run python3.11 -m private_gpt
|
||||
|
||||
# In another terminal, create a new browser window on your private GPT!
|
||||
open http://127.0.0.1:8001/
|
||||
```
|
||||
|
||||
The above is not working, or it is too slow, so **you want to run it on GPU(s)**?
|
||||
Please check the more detailed [installation guide](/installation).
|
@ -1,20 +1,19 @@
|
||||
## Introduction 👋
|
||||
|
||||
PrivateGPT provides an **API** containing all the building blocks required to
|
||||
build **private, context-aware AI applications**.
|
||||
The API follows and extends the OpenAI API standard, and supports both normal and streaming responses.
|
||||
That means that, if you can use OpenAI API in one of your tools, you can use your own PrivateGPT API instead,
|
||||
with no code changes, **and for free** if you are running privateGPT in `local` mode.
|
||||
|
||||
Looking for the installation quickstart? [Quickstart installation guide for Linux and macOS](/overview/welcome/quickstart).
|
||||
|
||||
Do you want to install it on Windows? Or do you want to take full advantage of your hardware for better performances?
|
||||
The installation guide will help you in the [Installation section](/installation).
|
||||
with no code changes, **and for free** if you are running privateGPT in a `local` setup.
|
||||
|
||||
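As an illustration of that drop-in compatibility (a hedged sketch assuming the default port 8001 and the OpenAI-style chat endpoint; check the API Reference for the exact request schema):

```bash
# Point an OpenAI-style chat request at your local PrivateGPT instead of api.openai.com
curl http://localhost:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello, PrivateGPT!"}]}'
```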
Get started by understanding the [Main Concepts and Installation](/installation) and then dive into the [API Reference](/api-reference).
|
||||
|
||||
## Frequently Visited Resources
|
||||
|
||||
<Cards>
|
||||
<Card
|
||||
title="Main Concepts"
|
||||
icon="fa-solid fa-lines-leaning"
|
||||
href="/installation"
|
||||
/>
|
||||
<Card
|
||||
title="API Reference"
|
||||
icon="fa-solid fa-code"
|
||||
@ -32,6 +31,9 @@ The installation guide will help you in the [Installation section](/installation
|
||||
/>
|
||||
</Cards>
|
||||
|
||||
<br />
|
||||
|
||||
|
||||
<Callout intent = "info">
|
||||
A working **Gradio UI client** is provided to test the API, together with a set of useful tools such as bulk
|
||||
model download script, ingestion script, documents folder watch, etc.
|
||||
|
1775
poetry.lock
generated
File diff suppressed because it is too large
@ -3,7 +3,7 @@ import json
|
||||
from typing import Any
|
||||
|
||||
import boto3
|
||||
from llama_index.embeddings.base import BaseEmbedding
|
||||
from llama_index.core.base.embeddings.base import BaseEmbedding
|
||||
from pydantic import Field, PrivateAttr
|
||||
|
||||
|
||||
|
@ -1,8 +1,7 @@
|
||||
import logging
|
||||
|
||||
from injector import inject, singleton
|
||||
from llama_index import MockEmbedding
|
||||
from llama_index.embeddings.base import BaseEmbedding
|
||||
from llama_index.core.embeddings import BaseEmbedding, MockEmbedding
|
||||
|
||||
from private_gpt.paths import models_cache_path
|
||||
from private_gpt.settings.settings import Settings
|
||||
@ -19,27 +18,60 @@ class EmbeddingComponent:
|
||||
embedding_mode = settings.embedding.mode
|
||||
logger.info("Initializing the embedding model in mode=%s", embedding_mode)
|
||||
match embedding_mode:
|
||||
case "local":
|
||||
from llama_index.embeddings import HuggingFaceEmbedding
|
||||
case "huggingface":
|
||||
try:
|
||||
from llama_index.embeddings.huggingface import ( # type: ignore
|
||||
HuggingFaceEmbedding,
|
||||
)
|
||||
except ImportError as e:
|
||||
raise ImportError(
|
||||
"Local dependencies not found, install with `poetry install --extras embeddings-huggingface`"
|
||||
) from e
|
||||
|
||||
self.embedding_model = HuggingFaceEmbedding(
|
||||
model_name=settings.local.embedding_hf_model_name,
|
||||
model_name=settings.huggingface.embedding_hf_model_name,
|
||||
cache_folder=str(models_cache_path),
|
||||
)
|
||||
case "sagemaker":
|
||||
|
||||
from private_gpt.components.embedding.custom.sagemaker import (
|
||||
SagemakerEmbedding,
|
||||
)
|
||||
try:
|
||||
from private_gpt.components.embedding.custom.sagemaker import (
|
||||
SagemakerEmbedding,
|
||||
)
|
||||
except ImportError as e:
|
||||
raise ImportError(
|
||||
"Sagemaker dependencies not found, install with `poetry install --extras embeddings-sagemaker`"
|
||||
) from e
|
||||
|
||||
self.embedding_model = SagemakerEmbedding(
|
||||
endpoint_name=settings.sagemaker.embedding_endpoint_name,
|
||||
)
|
||||
case "openai":
|
||||
from llama_index import OpenAIEmbedding
|
||||
try:
|
||||
from llama_index.embeddings.openai import ( # type: ignore
|
||||
OpenAIEmbedding,
|
||||
)
|
||||
except ImportError as e:
|
||||
raise ImportError(
|
||||
"OpenAI dependencies not found, install with `poetry install --extras embeddings-openai`"
|
||||
) from e
|
||||
|
||||
openai_settings = settings.openai.api_key
|
||||
self.embedding_model = OpenAIEmbedding(api_key=openai_settings)
|
||||
case "ollama":
|
||||
try:
|
||||
from llama_index.embeddings.ollama import ( # type: ignore
|
||||
OllamaEmbedding,
|
||||
)
|
||||
except ImportError as e:
|
||||
raise ImportError(
|
||||
"Local dependencies not found, install with `poetry install --extras embeddings-ollama`"
|
||||
) from e
|
||||
|
||||
ollama_settings = settings.ollama
|
||||
self.embedding_model = OllamaEmbedding(
|
||||
model_name=ollama_settings.embedding_model,
|
||||
base_url=ollama_settings.api_base,
|
||||
)
|
||||
case "mock":
|
||||
# Not a random number, is the dimensionality used by
|
||||
# the default embedding model
|
||||
|
@ -8,16 +8,13 @@ import threading
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
from llama_index import (
|
||||
Document,
|
||||
ServiceContext,
|
||||
StorageContext,
|
||||
VectorStoreIndex,
|
||||
load_index_from_storage,
|
||||
)
|
||||
from llama_index.data_structs import IndexDict
|
||||
from llama_index.indices.base import BaseIndex
|
||||
from llama_index.ingestion import run_transformations
|
||||
from llama_index.core.data_structs import IndexDict
|
||||
from llama_index.core.embeddings.utils import EmbedType
|
||||
from llama_index.core.indices import VectorStoreIndex, load_index_from_storage
|
||||
from llama_index.core.indices.base import BaseIndex
|
||||
from llama_index.core.ingestion import run_transformations
|
||||
from llama_index.core.schema import Document, TransformComponent
|
||||
from llama_index.core.storage import StorageContext
|
||||
|
||||
from private_gpt.components.ingest.ingest_helper import IngestionHelper
|
||||
from private_gpt.paths import local_data_path
|
||||
@ -30,13 +27,15 @@ class BaseIngestComponent(abc.ABC):
|
||||
def __init__(
|
||||
self,
|
||||
storage_context: StorageContext,
|
||||
service_context: ServiceContext,
|
||||
embed_model: EmbedType,
|
||||
transformations: list[TransformComponent],
|
||||
*args: Any,
|
||||
**kwargs: Any,
|
||||
) -> None:
|
||||
logger.debug("Initializing base ingest component type=%s", type(self).__name__)
|
||||
self.storage_context = storage_context
|
||||
self.service_context = service_context
|
||||
self.embed_model = embed_model
|
||||
self.transformations = transformations
|
||||
|
||||
@abc.abstractmethod
|
||||
def ingest(self, file_name: str, file_data: Path) -> list[Document]:
|
||||
@ -55,11 +54,12 @@ class BaseIngestComponentWithIndex(BaseIngestComponent, abc.ABC):
|
||||
def __init__(
|
||||
self,
|
||||
storage_context: StorageContext,
|
||||
service_context: ServiceContext,
|
||||
embed_model: EmbedType,
|
||||
transformations: list[TransformComponent],
|
||||
*args: Any,
|
||||
**kwargs: Any,
|
||||
) -> None:
|
||||
super().__init__(storage_context, service_context, *args, **kwargs)
|
||||
super().__init__(storage_context, embed_model, transformations, *args, **kwargs)
|
||||
|
||||
self.show_progress = True
|
||||
self._index_thread_lock = (
|
||||
@ -73,9 +73,10 @@ class BaseIngestComponentWithIndex(BaseIngestComponent, abc.ABC):
|
||||
# Load the index with store_nodes_override=True to be able to delete them
|
||||
index = load_index_from_storage(
|
||||
storage_context=self.storage_context,
|
||||
service_context=self.service_context,
|
||||
store_nodes_override=True, # Force store nodes in index and document stores
|
||||
show_progress=self.show_progress,
|
||||
embed_model=self.embed_model,
|
||||
transformations=self.transformations,
|
||||
)
|
||||
except ValueError:
|
||||
# There are no index in the storage context, creating a new one
|
||||
@ -83,9 +84,10 @@ class BaseIngestComponentWithIndex(BaseIngestComponent, abc.ABC):
|
||||
index = VectorStoreIndex.from_documents(
|
||||
[],
|
||||
storage_context=self.storage_context,
|
||||
service_context=self.service_context,
|
||||
store_nodes_override=True, # Force store nodes in index and document stores
|
||||
show_progress=self.show_progress,
|
||||
embed_model=self.embed_model,
|
||||
transformations=self.transformations,
|
||||
)
|
||||
index.storage_context.persist(persist_dir=local_data_path)
|
||||
return index
|
||||
@ -106,11 +108,12 @@ class SimpleIngestComponent(BaseIngestComponentWithIndex):
|
||||
def __init__(
|
||||
self,
|
||||
storage_context: StorageContext,
|
||||
service_context: ServiceContext,
|
||||
embed_model: EmbedType,
|
||||
transformations: list[TransformComponent],
|
||||
*args: Any,
|
||||
**kwargs: Any,
|
||||
) -> None:
|
||||
super().__init__(storage_context, service_context, *args, **kwargs)
|
||||
super().__init__(storage_context, embed_model, transformations, *args, **kwargs)
|
||||
|
||||
def ingest(self, file_name: str, file_data: Path) -> list[Document]:
|
||||
logger.info("Ingesting file_name=%s", file_name)
|
||||
@ -151,16 +154,17 @@ class BatchIngestComponent(BaseIngestComponentWithIndex):
|
||||
def __init__(
|
||||
self,
|
||||
storage_context: StorageContext,
|
||||
service_context: ServiceContext,
|
||||
embed_model: EmbedType,
|
||||
transformations: list[TransformComponent],
|
||||
count_workers: int,
|
||||
*args: Any,
|
||||
**kwargs: Any,
|
||||
) -> None:
|
||||
super().__init__(storage_context, service_context, *args, **kwargs)
|
||||
super().__init__(storage_context, embed_model, transformations, *args, **kwargs)
|
||||
# Make an efficient use of the CPU and GPU, the embedding
|
||||
# must be in the transformations
|
||||
assert (
|
||||
len(self.service_context.transformations) >= 2
|
||||
len(self.transformations) >= 2
|
||||
), "Embeddings must be in the transformations"
|
||||
assert count_workers > 0, "count_workers must be > 0"
|
||||
self.count_workers = count_workers
|
||||
@ -197,7 +201,7 @@ class BatchIngestComponent(BaseIngestComponentWithIndex):
|
||||
logger.debug("Transforming count=%s documents into nodes", len(documents))
|
||||
nodes = run_transformations(
|
||||
documents, # type: ignore[arg-type]
|
||||
self.service_context.transformations,
|
||||
self.transformations,
|
||||
show_progress=self.show_progress,
|
||||
)
|
||||
# Locking the index to avoid concurrent writes
|
||||
@ -225,16 +229,17 @@ class ParallelizedIngestComponent(BaseIngestComponentWithIndex):
|
||||
def __init__(
|
||||
self,
|
||||
storage_context: StorageContext,
|
||||
service_context: ServiceContext,
|
||||
embed_model: EmbedType,
|
||||
transformations: list[TransformComponent],
|
||||
count_workers: int,
|
||||
*args: Any,
|
||||
**kwargs: Any,
|
||||
) -> None:
|
||||
super().__init__(storage_context, service_context, *args, **kwargs)
|
||||
super().__init__(storage_context, embed_model, transformations, *args, **kwargs)
|
||||
# To make an efficient use of the CPU and GPU, the embeddings
|
||||
# must be in the transformations (to be computed in batches)
|
||||
assert (
|
||||
len(self.service_context.transformations) >= 2
|
||||
len(self.transformations) >= 2
|
||||
), "Embeddings must be in the transformations"
|
||||
assert count_workers > 0, "count_workers must be > 0"
|
||||
self.count_workers = count_workers
|
||||
@ -278,7 +283,7 @@ class ParallelizedIngestComponent(BaseIngestComponentWithIndex):
|
||||
logger.debug("Transforming count=%s documents into nodes", len(documents))
|
||||
nodes = run_transformations(
|
||||
documents, # type: ignore[arg-type]
|
||||
self.service_context.transformations,
|
||||
self.transformations,
|
||||
show_progress=self.show_progress,
|
||||
)
|
||||
# Locking the index to avoid concurrent writes
|
||||
@ -311,18 +316,29 @@ class ParallelizedIngestComponent(BaseIngestComponentWithIndex):
|
||||
|
||||
def get_ingestion_component(
|
||||
storage_context: StorageContext,
|
||||
service_context: ServiceContext,
|
||||
embed_model: EmbedType,
|
||||
transformations: list[TransformComponent],
|
||||
settings: Settings,
|
||||
) -> BaseIngestComponent:
|
||||
"""Get the ingestion component for the given configuration."""
|
||||
ingest_mode = settings.embedding.ingest_mode
|
||||
if ingest_mode == "batch":
|
||||
return BatchIngestComponent(
|
||||
storage_context, service_context, settings.embedding.count_workers
|
||||
storage_context=storage_context,
|
||||
embed_model=embed_model,
|
||||
transformations=transformations,
|
||||
count_workers=settings.embedding.count_workers,
|
||||
)
|
||||
elif ingest_mode == "parallel":
|
||||
return ParallelizedIngestComponent(
|
||||
storage_context, service_context, settings.embedding.count_workers
|
||||
storage_context=storage_context,
|
||||
embed_model=embed_model,
|
||||
transformations=transformations,
|
||||
count_workers=settings.embedding.count_workers,
|
||||
)
|
||||
else:
|
||||
return SimpleIngestComponent(storage_context, service_context)
|
||||
return SimpleIngestComponent(
|
||||
storage_context=storage_context,
|
||||
embed_model=embed_model,
|
||||
transformations=transformations,
|
||||
)
|
||||
|
@ -1,14 +1,58 @@
|
||||
import logging
|
||||
from pathlib import Path
|
||||
|
||||
from llama_index import Document
|
||||
from llama_index.readers import JSONReader, StringIterableReader
|
||||
from llama_index.readers.file.base import DEFAULT_FILE_READER_CLS
|
||||
from llama_index.core.readers import StringIterableReader
|
||||
from llama_index.core.readers.base import BaseReader
|
||||
from llama_index.core.readers.json import JSONReader
|
||||
from llama_index.core.schema import Document
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
# Inspired by the `llama_index.core.readers.file.base` module
|
||||
def _try_loading_included_file_formats() -> dict[str, type[BaseReader]]:
|
||||
try:
|
||||
from llama_index.readers.file.docs import ( # type: ignore
|
||||
DocxReader,
|
||||
HWPReader,
|
||||
PDFReader,
|
||||
)
|
||||
from llama_index.readers.file.epub import EpubReader # type: ignore
|
||||
from llama_index.readers.file.image import ImageReader # type: ignore
|
||||
from llama_index.readers.file.ipynb import IPYNBReader # type: ignore
|
||||
from llama_index.readers.file.markdown import MarkdownReader # type: ignore
|
||||
from llama_index.readers.file.mbox import MboxReader # type: ignore
|
||||
from llama_index.readers.file.slides import PptxReader # type: ignore
|
||||
from llama_index.readers.file.tabular import PandasCSVReader # type: ignore
|
||||
from llama_index.readers.file.video_audio import ( # type: ignore
|
||||
VideoAudioReader,
|
||||
)
|
||||
except ImportError as e:
|
||||
raise ImportError("`llama-index-readers-file` package not found") from e
|
||||
|
||||
default_file_reader_cls: dict[str, type[BaseReader]] = {
|
||||
".hwp": HWPReader,
|
||||
".pdf": PDFReader,
|
||||
".docx": DocxReader,
|
||||
".pptx": PptxReader,
|
||||
".ppt": PptxReader,
|
||||
".pptm": PptxReader,
|
||||
".jpg": ImageReader,
|
||||
".png": ImageReader,
|
||||
".jpeg": ImageReader,
|
||||
".mp3": VideoAudioReader,
|
||||
".mp4": VideoAudioReader,
|
||||
".csv": PandasCSVReader,
|
||||
".epub": EpubReader,
|
||||
".md": MarkdownReader,
|
||||
".mbox": MboxReader,
|
||||
".ipynb": IPYNBReader,
|
||||
}
|
||||
return default_file_reader_cls
|
||||
|
||||
|
||||
# Patching the default file reader to support other file types
|
||||
FILE_READER_CLS = DEFAULT_FILE_READER_CLS.copy()
|
||||
FILE_READER_CLS = _try_loading_included_file_formats()
|
||||
FILE_READER_CLS.update(
|
||||
{
|
||||
".json": JSONReader,
|
||||
|
@ -7,26 +7,20 @@ import logging
|
||||
from typing import TYPE_CHECKING, Any
|
||||
|
||||
import boto3 # type: ignore
|
||||
from llama_index.bridge.pydantic import Field
|
||||
from llama_index.llms import (
|
||||
from llama_index.core.base.llms.generic_utils import (
|
||||
completion_response_to_chat_response,
|
||||
stream_completion_response_to_chat_response,
|
||||
)
|
||||
from llama_index.core.bridge.pydantic import Field
|
||||
from llama_index.core.llms import (
|
||||
CompletionResponse,
|
||||
CustomLLM,
|
||||
LLMMetadata,
|
||||
)
|
||||
from llama_index.llms.base import (
|
||||
from llama_index.core.llms.callbacks import (
|
||||
llm_chat_callback,
|
||||
llm_completion_callback,
|
||||
)
|
||||
from llama_index.llms.generic_utils import (
|
||||
completion_response_to_chat_response,
|
||||
stream_completion_response_to_chat_response,
|
||||
)
|
||||
from llama_index.llms.llama_utils import (
|
||||
completion_to_prompt as generic_completion_to_prompt,
|
||||
)
|
||||
from llama_index.llms.llama_utils import (
|
||||
messages_to_prompt as generic_messages_to_prompt,
|
||||
)
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import Sequence
|
||||
@ -161,8 +155,8 @@ class SagemakerLLM(CustomLLM):
|
||||
model_kwargs = model_kwargs or {}
|
||||
model_kwargs.update({"n_ctx": context_window, "verbose": verbose})
|
||||
|
||||
messages_to_prompt = messages_to_prompt or generic_messages_to_prompt
|
||||
completion_to_prompt = completion_to_prompt or generic_completion_to_prompt
|
||||
messages_to_prompt = messages_to_prompt or {}
|
||||
completion_to_prompt = completion_to_prompt or {}
|
||||
|
||||
generate_kwargs = generate_kwargs or {}
|
||||
generate_kwargs.update(
|
||||
|
@ -1,9 +1,9 @@
|
||||
import logging
|
||||
|
||||
from injector import inject, singleton
|
||||
from llama_index import set_global_tokenizer
|
||||
from llama_index.llms import MockLLM
|
||||
from llama_index.llms.base import LLM
|
||||
from llama_index.core.llms import LLM, MockLLM
|
||||
from llama_index.core.settings import Settings as LlamaIndexSettings
|
||||
from llama_index.core.utils import set_global_tokenizer
|
||||
from transformers import AutoTokenizer # type: ignore
|
||||
|
||||
from private_gpt.components.llm.prompt_helper import get_prompt_style
|
||||
@ -30,17 +30,23 @@ class LLMComponent:
|
||||
|
||||
logger.info("Initializing the LLM in mode=%s", llm_mode)
|
||||
match settings.llm.mode:
|
||||
case "local":
|
||||
from llama_index.llms import LlamaCPP
|
||||
case "llamacpp":
|
||||
try:
|
||||
from llama_index.llms.llama_cpp import LlamaCPP # type: ignore
|
||||
except ImportError as e:
|
||||
raise ImportError(
|
||||
"Local dependencies not found, install with `poetry install --extras llms-llama-cpp`"
|
||||
) from e
|
||||
|
||||
prompt_style = get_prompt_style(settings.local.prompt_style)
|
||||
prompt_style = get_prompt_style(settings.llamacpp.prompt_style)
|
||||
|
||||
self.llm = LlamaCPP(
|
||||
model_path=str(models_path / settings.local.llm_hf_model_file),
|
||||
model_path=str(models_path / settings.llamacpp.llm_hf_model_file),
|
||||
temperature=0.1,
|
||||
max_new_tokens=settings.llm.max_new_tokens,
|
||||
context_window=settings.llm.context_window,
|
||||
generate_kwargs={},
|
||||
callback_manager=LlamaIndexSettings.callback_manager,
|
||||
# All to GPU
|
||||
model_kwargs={"n_gpu_layers": -1, "offload_kqv": True},
|
||||
# transform inputs into Llama2 format
|
||||
@ -50,7 +56,12 @@ class LLMComponent:
|
||||
)
|
||||
|
||||
case "sagemaker":
|
||||
from private_gpt.components.llm.custom.sagemaker import SagemakerLLM
|
||||
try:
|
||||
from private_gpt.components.llm.custom.sagemaker import SagemakerLLM
|
||||
except ImportError as e:
|
||||
raise ImportError(
|
||||
"Sagemaker dependencies not found, install with `poetry install --extras llms-sagemaker`"
|
||||
) from e
|
||||
|
||||
self.llm = SagemakerLLM(
|
||||
endpoint_name=settings.sagemaker.llm_endpoint_name,
|
||||
@ -58,7 +69,12 @@ class LLMComponent:
|
||||
context_window=settings.llm.context_window,
|
||||
)
|
||||
case "openai":
|
||||
from llama_index.llms import OpenAI
|
||||
try:
|
||||
from llama_index.llms.openai import OpenAI # type: ignore
|
||||
except ImportError as e:
|
||||
raise ImportError(
|
||||
"OpenAI dependencies not found, install with `poetry install --extras llms-openai`"
|
||||
) from e
|
||||
|
||||
openai_settings = settings.openai
|
||||
self.llm = OpenAI(
|
||||
@ -67,7 +83,12 @@ class LLMComponent:
|
||||
model=openai_settings.model,
|
||||
)
|
||||
case "openailike":
|
||||
from llama_index.llms import OpenAILike
|
||||
try:
|
||||
from llama_index.llms.openai_like import OpenAILike # type: ignore
|
||||
except ImportError as e:
|
||||
raise ImportError(
|
||||
"OpenAILike dependencies not found, install with `poetry install --extras llms-openai-like`"
|
||||
) from e
|
||||
|
||||
openai_settings = settings.openai
|
||||
self.llm = OpenAILike(
|
||||
@ -78,12 +99,17 @@ class LLMComponent:
|
||||
max_tokens=None,
|
||||
api_version="",
|
||||
)
|
||||
case "mock":
|
||||
self.llm = MockLLM()
|
||||
case "ollama":
|
||||
from llama_index.llms import Ollama
|
||||
try:
|
||||
from llama_index.llms.ollama import Ollama # type: ignore
|
||||
except ImportError as e:
|
||||
raise ImportError(
|
||||
"Ollama dependencies not found, install with `poetry install --extras llms-ollama`"
|
||||
) from e
|
||||
|
||||
ollama_settings = settings.ollama
|
||||
self.llm = Ollama(
|
||||
model=ollama_settings.model, base_url=ollama_settings.api_base
|
||||
model=ollama_settings.llm_model, base_url=ollama_settings.api_base
|
||||
)
|
||||
case "mock":
|
||||
self.llm = MockLLM()
|
||||
|
@ -3,11 +3,7 @@ import logging
|
||||
from collections.abc import Sequence
|
||||
from typing import Any, Literal
|
||||
|
||||
from llama_index.llms import ChatMessage, MessageRole
|
||||
from llama_index.llms.llama_utils import (
|
||||
completion_to_prompt,
|
||||
messages_to_prompt,
|
||||
)
|
||||
from llama_index.core.llms import ChatMessage, MessageRole
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
@ -73,7 +69,9 @@ class DefaultPromptStyle(AbstractPromptStyle):
|
||||
|
||||
|
||||
class Llama2PromptStyle(AbstractPromptStyle):
|
||||
"""Simple prompt style that just uses the default llama_utils functions.
|
||||
"""Simple prompt style that uses llama 2 prompt style.
|
||||
|
||||
Inspired by llama_index/legacy/llms/llama_utils.py
|
||||
|
||||
It transforms the sequence of messages into a prompt that should look like:
|
||||
```text
|
||||
@ -83,11 +81,61 @@ class Llama2PromptStyle(AbstractPromptStyle):
|
||||
```
|
||||
"""
|
||||
|
||||
BOS, EOS = "<s>", "</s>"
|
||||
B_INST, E_INST = "[INST]", "[/INST]"
|
||||
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
|
||||
DEFAULT_SYSTEM_PROMPT = """\
|
||||
You are a helpful, respectful and honest assistant. \
|
||||
Always answer as helpfully as possible and follow ALL given instructions. \
|
||||
Do not speculate or make up information. \
|
||||
Do not reference any given instructions or context. \
|
||||
"""
|
||||
|
||||
def _messages_to_prompt(self, messages: Sequence[ChatMessage]) -> str:
|
||||
return messages_to_prompt(messages)
|
||||
string_messages: list[str] = []
|
||||
if messages[0].role == MessageRole.SYSTEM:
|
||||
# pull out the system message (if it exists in messages)
|
||||
system_message_str = messages[0].content or ""
|
||||
messages = messages[1:]
|
||||
else:
|
||||
system_message_str = self.DEFAULT_SYSTEM_PROMPT
|
||||
|
||||
system_message_str = f"{self.B_SYS} {system_message_str.strip()} {self.E_SYS}"
|
||||
|
||||
for i in range(0, len(messages), 2):
|
||||
# first message should always be a user
|
||||
user_message = messages[i]
|
||||
assert user_message.role == MessageRole.USER
|
||||
|
||||
if i == 0:
|
||||
# make sure system prompt is included at the start
|
||||
str_message = f"{self.BOS} {self.B_INST} {system_message_str} "
|
||||
else:
|
||||
# end previous user-assistant interaction
|
||||
string_messages[-1] += f" {self.EOS}"
|
||||
# no need to include system prompt
|
||||
str_message = f"{self.BOS} {self.B_INST} "
|
||||
|
||||
# include user message content
|
||||
str_message += f"{user_message.content} {self.E_INST}"
|
||||
|
||||
if len(messages) > (i + 1):
|
||||
# if assistant message exists, add to str_message
|
||||
assistant_message = messages[i + 1]
|
||||
assert assistant_message.role == MessageRole.ASSISTANT
|
||||
str_message += f" {assistant_message.content}"
|
||||
|
||||
string_messages.append(str_message)
|
||||
|
||||
return "".join(string_messages)
|
||||
|
||||
def _completion_to_prompt(self, completion: str) -> str:
|
||||
return completion_to_prompt(completion)
|
||||
system_prompt_str = self.DEFAULT_SYSTEM_PROMPT
|
||||
|
||||
return (
|
||||
f"{self.BOS} {self.B_INST} {self.B_SYS} {system_prompt_str.strip()} {self.E_SYS} "
|
||||
f"{completion.strip()} {self.E_INST}"
|
||||
)
|
||||
|
||||
|
||||
class TagPromptStyle(AbstractPromptStyle):
|
||||
|
@ -1,9 +1,9 @@
|
||||
import logging
|
||||
|
||||
from injector import inject, singleton
|
||||
from llama_index.storage.docstore import BaseDocumentStore, SimpleDocumentStore
|
||||
from llama_index.storage.index_store import SimpleIndexStore
|
||||
from llama_index.storage.index_store.types import BaseIndexStore
|
||||
from llama_index.core.storage.docstore import BaseDocumentStore, SimpleDocumentStore
|
||||
from llama_index.core.storage.index_store import SimpleIndexStore
|
||||
from llama_index.core.storage.index_store.types import BaseIndexStore
|
||||
|
||||
from private_gpt.paths import local_data_path
|
||||
|
||||
|
@ -1,12 +1,28 @@
|
||||
from collections.abc import Generator
|
||||
from typing import Any
|
||||
|
||||
from llama_index.schema import BaseNode, MetadataMode
|
||||
from llama_index.vector_stores import ChromaVectorStore
|
||||
from llama_index.vector_stores.chroma import chunk_list
|
||||
from llama_index.vector_stores.utils import node_to_metadata_dict
|
||||
from llama_index.core.schema import BaseNode, MetadataMode
|
||||
from llama_index.core.vector_stores.utils import node_to_metadata_dict
|
||||
from llama_index.vector_stores.chroma import ChromaVectorStore # type: ignore
|
||||
|
||||
|
||||
class BatchedChromaVectorStore(ChromaVectorStore):
|
||||
def chunk_list(
|
||||
lst: list[BaseNode], max_chunk_size: int
|
||||
) -> Generator[list[BaseNode], None, None]:
|
||||
"""Yield successive max_chunk_size-sized chunks from lst.
|
||||
|
||||
Args:
|
||||
lst (List[BaseNode]): list of nodes with embeddings
|
||||
max_chunk_size (int): max chunk size
|
||||
|
||||
Yields:
|
||||
Generator[List[BaseNode], None, None]: list of nodes with embeddings
|
||||
"""
|
||||
for i in range(0, len(lst), max_chunk_size):
|
||||
yield lst[i : i + max_chunk_size]
|
||||
|
||||
|
||||
class BatchedChromaVectorStore(ChromaVectorStore): # type: ignore
|
||||
"""Chroma vector store, batching additions to avoid reaching the max batch limit.
|
||||
|
||||
In this vector store, embeddings are stored within a ChromaDB collection.
|
||||
|
@ -2,11 +2,14 @@ import logging
|
||||
import typing
|
||||
|
||||
from injector import inject, singleton
|
||||
from llama_index import VectorStoreIndex
|
||||
from llama_index.indices.vector_store import VectorIndexRetriever
|
||||
from llama_index.vector_stores.types import VectorStore
|
||||
from llama_index.core.indices.vector_store import VectorIndexRetriever, VectorStoreIndex
|
||||
from llama_index.core.vector_stores.types import (
|
||||
FilterCondition,
|
||||
MetadataFilter,
|
||||
MetadataFilters,
|
||||
VectorStore,
|
||||
)
|
||||
|
||||
from private_gpt.components.vector_store.batched_chroma import BatchedChromaVectorStore
|
||||
from private_gpt.open_ai.extensions.context_filter import ContextFilter
|
||||
from private_gpt.paths import local_data_path
|
||||
from private_gpt.settings.settings import Settings
|
||||
@ -14,34 +17,36 @@ from private_gpt.settings.settings import Settings
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
@typing.no_type_check
|
||||
def _chromadb_doc_id_metadata_filter(
|
||||
def _doc_id_metadata_filter(
|
||||
context_filter: ContextFilter | None,
|
||||
) -> dict | None:
|
||||
if context_filter is None or context_filter.docs_ids is None:
|
||||
return {} # No filter
|
||||
elif len(context_filter.docs_ids) < 1:
|
||||
return {"doc_id": "-"} # Effectively filtering out all docs
|
||||
else:
|
||||
doc_filter_items = []
|
||||
if len(context_filter.docs_ids) > 1:
|
||||
doc_filter = {"$or": doc_filter_items}
|
||||
for doc_id in context_filter.docs_ids:
|
||||
doc_filter_items.append({"doc_id": doc_id})
|
||||
else:
|
||||
doc_filter = {"doc_id": context_filter.docs_ids[0]}
|
||||
return doc_filter
|
||||
) -> MetadataFilters:
|
||||
filters = MetadataFilters(filters=[], condition=FilterCondition.OR)
|
||||
|
||||
if context_filter is not None and context_filter.docs_ids is not None:
|
||||
for doc_id in context_filter.docs_ids:
|
||||
filters.filters.append(MetadataFilter(key="doc_id", value=doc_id))
|
||||
|
||||
return filters
|
||||
|
||||
|
||||
@singleton
|
||||
class VectorStoreComponent:
|
||||
settings: Settings
|
||||
vector_store: VectorStore
|
||||
|
||||
@inject
|
||||
def __init__(self, settings: Settings) -> None:
|
||||
self.settings = settings
|
||||
match settings.vectorstore.database:
|
||||
case "pgvector":
|
||||
from llama_index.vector_stores import PGVectorStore
|
||||
try:
|
||||
from llama_index.vector_stores.postgres import ( # type: ignore
|
||||
PGVectorStore,
|
||||
)
|
||||
except ImportError as e:
|
||||
raise ImportError(
|
||||
"Postgres dependencies not found, install with `poetry install --extras vector-stores-postgres`"
|
||||
) from e
|
||||
|
||||
if settings.pgvector is None:
|
||||
raise ValueError(
|
||||
@ -61,11 +66,13 @@ class VectorStoreComponent:
|
||||
from chromadb.config import ( # type: ignore
|
||||
Settings as ChromaSettings,
|
||||
)
|
||||
|
||||
from private_gpt.components.vector_store.batched_chroma import (
|
||||
BatchedChromaVectorStore,
|
||||
)
|
||||
except ImportError as e:
|
||||
raise ImportError(
|
||||
"'chromadb' is not installed."
|
||||
"To use PrivateGPT with Chroma, install the 'chroma' extra."
|
||||
"`poetry install --extras chroma`"
|
||||
"ChromaDB dependencies not found, install with `poetry install --extras vector-stores-chroma`"
|
||||
) from e
|
||||
|
||||
chroma_settings = ChromaSettings(anonymized_telemetry=False)
|
||||
@ -85,8 +92,15 @@ class VectorStoreComponent:
|
||||
)
|
||||
|
||||
case "qdrant":
|
||||
from llama_index.vector_stores.qdrant import QdrantVectorStore
|
||||
from qdrant_client import QdrantClient
|
||||
try:
|
||||
from llama_index.vector_stores.qdrant import ( # type: ignore
|
||||
QdrantVectorStore,
|
||||
)
|
||||
from qdrant_client import QdrantClient # type: ignore
|
||||
except ImportError as e:
|
||||
raise ImportError(
|
||||
"Qdrant dependencies not found, install with `poetry install --extras vector-stores-qdrant`"
|
||||
) from e
|
||||
|
||||
if settings.qdrant is None:
|
||||
logger.info(
|
||||
@ -112,20 +126,20 @@ class VectorStoreComponent:
|
||||
f"Vectorstore database {settings.vectorstore.database} not supported"
|
||||
)
|
||||
|
||||
@staticmethod
|
||||
def get_retriever(
|
||||
self,
|
||||
index: VectorStoreIndex,
|
||||
context_filter: ContextFilter | None = None,
|
||||
similarity_top_k: int = 2,
|
||||
) -> VectorIndexRetriever:
|
||||
# This way we support qdrant (using doc_ids) and chroma (using where clause)
|
||||
# This way we support qdrant (using doc_ids) and the rest (using filters)
|
||||
return VectorIndexRetriever(
|
||||
index=index,
|
||||
similarity_top_k=similarity_top_k,
|
||||
doc_ids=context_filter.docs_ids if context_filter else None,
|
||||
vector_store_kwargs={
|
||||
"where": _chromadb_doc_id_metadata_filter(context_filter)
|
||||
},
|
||||
filters=_doc_id_metadata_filter(context_filter)
|
||||
if self.settings.vectorstore.database != "qdrant"
|
||||
else None,
|
||||
)
|
||||
|
||||
def close(self) -> None:
|
||||
|
@@ -4,6 +4,9 @@ import logging
from fastapi import Depends, FastAPI, Request
from fastapi.middleware.cors import CORSMiddleware
from injector import Injector
from llama_index.core.callbacks import CallbackManager
from llama_index.core.callbacks.global_handlers import create_global_handler
from llama_index.core.settings import Settings as LlamaIndexSettings

from private_gpt.server.chat.chat_router import chat_router
from private_gpt.server.chunks.chunks_router import chunks_router
@@ -31,6 +34,10 @@ def create_app(root_injector: Injector) -> FastAPI:
app.include_router(embeddings_router)
app.include_router(health_router)

# Add LlamaIndex simple observability
global_handler = create_global_handler("simple")
LlamaIndexSettings.callback_manager = CallbackManager([global_handler])

settings = root_injector.get(Settings)
if settings.server.cors.enabled:
logger.debug("Setting up CORS middleware")
@@ -45,7 +52,12 @@ def create_app(root_injector: Injector) -> FastAPI:

if settings.ui.enabled:
logger.debug("Importing the UI module")
from private_gpt.ui.ui import PrivateGptUi
try:
from private_gpt.ui.ui import PrivateGptUi
except ImportError as e:
raise ImportError(
"UI dependencies not found, install with `poetry install --extras ui`"
) from e

ui = root_injector.get(PrivateGptUi)
ui.mount_in_app(app, settings.ui.path)
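Note (illustration, not part of this diff): the `llama_index.set_global_handler("simple")` call removed from main.py below is replaced by wiring the handler through the 0.10 `Settings` object inside `create_app`. Collected into a standalone sketch, the same pattern looks like this:

```python
# Sketch: register LlamaIndex's "simple" observability handler the 0.10 way,
# mirroring the calls added to create_app above.
from llama_index.core.callbacks import CallbackManager
from llama_index.core.callbacks.global_handlers import create_global_handler
from llama_index.core.settings import Settings as LlamaIndexSettings

global_handler = create_global_handler("simple")
LlamaIndexSettings.callback_manager = CallbackManager([global_handler])
```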
@@ -1,11 +1,6 @@
"""FastAPI app creation, logger configuration and main API routes."""

import llama_index

from private_gpt.di import global_injector
from private_gpt.launcher import create_app

# Add LlamaIndex simple observability
llama_index.set_global_handler("simple")

app = create_app(global_injector)

@@ -3,7 +3,7 @@ import uuid
from collections.abc import Iterator
from typing import Literal

from llama_index.llms import ChatResponse, CompletionResponse
from llama_index.core.llms import ChatResponse, CompletionResponse
from pydantic import BaseModel, Field

from private_gpt.server.chunks.chunks_service import Chunk

@@ -1,5 +1,5 @@
from fastapi import APIRouter, Depends, Request
from llama_index.llms import ChatMessage, MessageRole
from llama_index.core.llms import ChatMessage, MessageRole
from pydantic import BaseModel
from starlette.responses import StreamingResponse
@@ -1,14 +1,15 @@
from dataclasses import dataclass

from injector import inject, singleton
from llama_index import ServiceContext, StorageContext, VectorStoreIndex
from llama_index.chat_engine import ContextChatEngine, SimpleChatEngine
from llama_index.chat_engine.types import (
from llama_index.core.chat_engine import ContextChatEngine, SimpleChatEngine
from llama_index.core.chat_engine.types import (
BaseChatEngine,
)
from llama_index.indices.postprocessor import MetadataReplacementPostProcessor
from llama_index.llms import ChatMessage, MessageRole
from llama_index.types import TokenGen
from llama_index.core.indices import VectorStoreIndex
from llama_index.core.indices.postprocessor import MetadataReplacementPostProcessor
from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.core.storage import StorageContext
from llama_index.core.types import TokenGen
from pydantic import BaseModel

from private_gpt.components.embedding.embedding_component import EmbeddingComponent
@@ -75,20 +76,19 @@ class ChatService:
embedding_component: EmbeddingComponent,
node_store_component: NodeStoreComponent,
) -> None:
self.llm_service = llm_component
self.llm_component = llm_component
self.embedding_component = embedding_component
self.vector_store_component = vector_store_component
self.storage_context = StorageContext.from_defaults(
vector_store=vector_store_component.vector_store,
docstore=node_store_component.doc_store,
index_store=node_store_component.index_store,
)
self.service_context = ServiceContext.from_defaults(
llm=llm_component.llm, embed_model=embedding_component.embedding_model
)
self.index = VectorStoreIndex.from_vector_store(
vector_store_component.vector_store,
storage_context=self.storage_context,
service_context=self.service_context,
llm=llm_component.llm,
embed_model=embedding_component.embedding_model,
show_progress=True,
)

@@ -105,7 +105,7 @@ class ChatService:
return ContextChatEngine.from_defaults(
system_prompt=system_prompt,
retriever=vector_index_retriever,
service_context=self.service_context,
llm=self.llm_component.llm, # Takes no effect at the moment
node_postprocessors=[
MetadataReplacementPostProcessor(target_metadata_key="window"),
],
@@ -113,7 +113,7 @@ class ChatService:
else:
return SimpleChatEngine.from_defaults(
system_prompt=system_prompt,
service_context=self.service_context,
llm=self.llm_component.llm,
)

def stream_chat(
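Note (illustration, not part of this diff): the pattern repeated in this and the following services is that the legacy `ServiceContext` disappears and the LLM and embedding model are handed directly to the index factory. Condensed from the calls above into one sketch (hypothetical `build_index` helper, not the project's code):

```python
# Sketch of the LlamaIndex 0.10 pattern used above: no ServiceContext,
# models are passed straight to VectorStoreIndex.from_vector_store.
from llama_index.core.indices import VectorStoreIndex
from llama_index.core.storage import StorageContext


def build_index(vector_store, doc_store, index_store, llm, embed_model) -> VectorStoreIndex:
    storage_context = StorageContext.from_defaults(
        vector_store=vector_store,
        docstore=doc_store,
        index_store=index_store,
    )
    return VectorStoreIndex.from_vector_store(
        vector_store,
        storage_context=storage_context,
        llm=llm,
        embed_model=embed_model,
        show_progress=True,
    )
```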
@@ -1,8 +1,9 @@
from typing import TYPE_CHECKING, Literal

from injector import inject, singleton
from llama_index import ServiceContext, StorageContext, VectorStoreIndex
from llama_index.schema import NodeWithScore
from llama_index.core.indices import VectorStoreIndex
from llama_index.core.schema import NodeWithScore
from llama_index.core.storage import StorageContext
from pydantic import BaseModel, Field

from private_gpt.components.embedding.embedding_component import EmbeddingComponent
@@ -15,7 +16,7 @@ from private_gpt.open_ai.extensions.context_filter import ContextFilter
from private_gpt.server.ingest.model import IngestedDoc

if TYPE_CHECKING:
from llama_index.schema import RelatedNodeInfo
from llama_index.core.schema import RelatedNodeInfo


class Chunk(BaseModel):
@@ -63,14 +64,13 @@ class ChunksService:
node_store_component: NodeStoreComponent,
) -> None:
self.vector_store_component = vector_store_component
self.llm_component = llm_component
self.embedding_component = embedding_component
self.storage_context = StorageContext.from_defaults(
vector_store=vector_store_component.vector_store,
docstore=node_store_component.doc_store,
index_store=node_store_component.index_store,
)
self.query_service_context = ServiceContext.from_defaults(
llm=llm_component.llm, embed_model=embedding_component.embedding_model
)

def _get_sibling_nodes_text(
self, node_with_score: NodeWithScore, related_number: int, forward: bool = True
@@ -103,7 +103,8 @@ class ChunksService:
index = VectorStoreIndex.from_vector_store(
self.vector_store_component.vector_store,
storage_context=self.storage_context,
service_context=self.query_service_context,
llm=self.llm_component.llm,
embed_model=self.embedding_component.embedding_model,
show_progress=True,
)
vector_index_retriever = self.vector_store_component.get_retriever(
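Note (illustration, not part of this diff): chunk retrieval combines the rebuilt index with the filter-based retriever introduced earlier. A sketch of that combination, reusing the hypothetical `doc_id_metadata_filter` helper sketched above:

```python
# Sketch: retrieve nodes restricted to the given documents using the
# 0.10 filters parameter of VectorIndexRetriever.
from llama_index.core.retrievers import VectorIndexRetriever


def retrieve_chunks(index, doc_ids, text, top_k=4):
    retriever = VectorIndexRetriever(
        index=index,
        similarity_top_k=top_k,
        filters=doc_id_metadata_filter(doc_ids),  # helper sketched earlier
    )
    return retriever.retrieve(text)
```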
@@ -4,11 +4,8 @@ from pathlib import Path
from typing import AnyStr, BinaryIO

from injector import inject, singleton
from llama_index import (
ServiceContext,
StorageContext,
)
from llama_index.node_parser import SentenceWindowNodeParser
from llama_index.core.node_parser import SentenceWindowNodeParser
from llama_index.core.storage import StorageContext

from private_gpt.components.embedding.embedding_component import EmbeddingComponent
from private_gpt.components.ingest.ingest_component import get_ingestion_component
@@ -40,17 +37,12 @@ class IngestService:
index_store=node_store_component.index_store,
)
node_parser = SentenceWindowNodeParser.from_defaults()
self.ingest_service_context = ServiceContext.from_defaults(
llm=self.llm_service.llm,
embed_model=embedding_component.embedding_model,
node_parser=node_parser,
# Embeddings done early in the pipeline of node transformations, right
# after the node parsing
transformations=[node_parser, embedding_component.embedding_model],
)

self.ingest_component = get_ingestion_component(
self.storage_context, self.ingest_service_context, settings=settings()
self.storage_context,
embed_model=embedding_component.embedding_model,
transformations=[node_parser, embedding_component.embedding_model],
settings=settings(),
)

def _ingest_data(self, file_name: str, file_data: AnyStr) -> list[IngestedDoc]:
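Note (illustration, not part of this diff): the old `ServiceContext.node_parser` becomes an explicit `transformations` list (node parser first, then the embedding model), which is also the shape llama-index's ingestion pipeline consumes. A minimal sketch of that idea, assuming an already-constructed embedding model:

```python
# Sketch (not the project's ingest component): run documents through the same
# transformation order used above - parse into sentence-window nodes, then embed.
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceWindowNodeParser


def embed_documents(documents, embed_model):
    node_parser = SentenceWindowNodeParser.from_defaults()
    pipeline = IngestionPipeline(transformations=[node_parser, embed_model])
    # Returns nodes with embeddings attached, ready for a vector store.
    return pipeline.run(documents=documents)
```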
@@ -3,10 +3,9 @@ from pathlib import Path
from typing import Any

from watchdog.events import (
DirCreatedEvent,
DirModifiedEvent,
FileCreatedEvent,
FileModifiedEvent,
FileSystemEvent,
FileSystemEventHandler,
)
from watchdog.observers import Observer
@@ -20,11 +19,11 @@ class IngestWatcher:
self.on_file_changed = on_file_changed

class Handler(FileSystemEventHandler):
def on_modified(self, event: DirModifiedEvent | FileModifiedEvent) -> None:
def on_modified(self, event: FileSystemEvent) -> None:
if isinstance(event, FileModifiedEvent):
on_file_changed(Path(event.src_path))

def on_created(self, event: DirCreatedEvent | FileCreatedEvent) -> None:
def on_created(self, event: FileSystemEvent) -> None:
if isinstance(event, FileCreatedEvent):
on_file_changed(Path(event.src_path))

@@ -1,6 +1,6 @@
from typing import Any, Literal

from llama_index import Document
from llama_index.core.schema import Document
from pydantic import BaseModel, Field


@@ -81,7 +81,7 @@ class DataSettings(BaseModel):


class LLMSettings(BaseModel):
mode: Literal["local", "openai", "openailike", "sagemaker", "mock", "ollama"]
mode: Literal["llamacpp", "openai", "openailike", "sagemaker", "mock", "ollama"]
max_new_tokens: int = Field(
256,
description="The maximum number of token that the LLM is authorized to generate in one completion.",
@@ -104,12 +104,9 @@ class VectorstoreSettings(BaseModel):
database: Literal["chroma", "qdrant", "pgvector"]


class LocalSettings(BaseModel):
class LlamaCPPSettings(BaseModel):
llm_hf_repo_id: str
llm_hf_model_file: str
embedding_hf_model_name: str = Field(
description="Name of the HuggingFace model to use for embeddings"
)
prompt_style: Literal["default", "llama2", "tag", "mistral", "chatml"] = Field(
"llama2",
description=(
@@ -123,8 +120,14 @@ class LocalSettings(BaseModel):
)


class HuggingFaceSettings(BaseModel):
embedding_hf_model_name: str = Field(
description="Name of the HuggingFace model to use for embeddings"
)


class EmbeddingSettings(BaseModel):
mode: Literal["local", "openai", "sagemaker", "mock"]
mode: Literal["huggingface", "openai", "sagemaker", "ollama", "mock"]
ingest_mode: Literal["simple", "batch", "parallel"] = Field(
"simple",
description=(
@@ -173,10 +176,14 @@ class OllamaSettings(BaseModel):
"http://localhost:11434",
description="Base URL of Ollama API. Example: 'https://localhost:11434'.",
)
model: str = Field(
llm_model: str = Field(
None,
description="Model to use. Example: 'llama2-uncensored'.",
)
embedding_model: str = Field(
None,
description="Model to use. Example: 'nomic-embed-text'.",
)


class UISettings(BaseModel):
@@ -292,7 +299,8 @@ class Settings(BaseModel):
ui: UISettings
llm: LLMSettings
embedding: EmbeddingSettings
local: LocalSettings
llamacpp: LlamaCPPSettings
huggingface: HuggingFaceSettings
sagemaker: SagemakerSettings
openai: OpenAISettings
ollama: OllamaSettings

@@ -10,7 +10,7 @@ import gradio as gr # type: ignore
from fastapi import FastAPI
from gradio.themes.utils.colors import slate # type: ignore
from injector import inject, singleton
from llama_index.llms import ChatMessage, ChatResponse, MessageRole
from llama_index.core.llms import ChatMessage, ChatResponse, MessageRole
from pydantic import BaseModel

from private_gpt.constants import PROJECT_ROOT_PATH
@@ -1,25 +1,52 @@
[tool.poetry]
name = "private-gpt"
version = "0.2.0"
version = "0.4.0"
description = "Private GPT"
authors = ["Zylon <hi@zylon.ai>"]

[tool.poetry.dependencies]
python = ">=3.11,<3.12"
fastapi = { extras = ["all"], version = "^0.103.1" }
boto3 = "^1.28.56"
# PrivateGPT
fastapi = { extras = ["all"], version = "^0.110.0" }
python-multipart = "^0.0.9"
injector = "^0.21.0"
pyyaml = "^6.0.1"
python-multipart = "^0.0.6"
pypdf = "^3.16.2"
llama-index = { extras = ["local_models"], version = "0.9.3" }
watchdog = "^3.0.0"
qdrant-client = "^1.6.9"
chromadb = {version = "^0.4.13", optional = true}
asyncpg = {version = "^0.29.0", optional = true}
pgvector = {version = "^0.2.5", optional = true}
psycopg2-binary = {version = "^2.9.9", optional = true}
sqlalchemy = {version = "^2.0.27", optional = true}
watchdog = "^4.0.0"
transformers = "^4.38.2"
# LlamaIndex core libs
llama-index-core = "^0.10.14"
llama-index-readers-file = "^0.1.6"
# Optional LlamaIndex integration libs
llama-index-llms-llama-cpp = {version = "^0.1.3", optional = true}
llama-index-llms-openai = {version = "^0.1.6", optional = true}
llama-index-llms-openai-like = {version ="^0.1.3", optional = true}
llama-index-llms-ollama = {version ="^0.1.2", optional = true}
llama-index-embeddings-ollama = {version ="^0.1.2", optional = true}
llama-index-embeddings-huggingface = {version ="^0.1.4", optional = true}
llama-index-embeddings-openai = {version ="^0.1.6", optional = true}
llama-index-vector-stores-qdrant = {version ="^0.1.3", optional = true}
llama-index-vector-stores-chroma = {version ="^0.1.4", optional = true}
llama-index-vector-stores-postgres = {version ="^0.1.2", optional = true}
# Optional Sagemaker dependency
boto3 = {version ="^1.34.51", optional = true}
# Optional UI
gradio = {version ="^4.19.2", optional = true}

[tool.poetry.extras]
ui = ["gradio"]
llms-llama-cpp = ["llama-index-llms-llama-cpp"]
llms-openai = ["llama-index-llms-openai"]
llms-openai-like = ["llama-index-llms-openai-like"]
llms-ollama = ["llama-index-llms-ollama"]
llms-sagemaker = ["boto3"]
embeddings-ollama = ["llama-index-embeddings-ollama"]
embeddings-huggingface = ["llama-index-embeddings-huggingface"]
embeddings-openai = ["llama-index-embeddings-openai"]
embeddings-sagemaker = ["boto3"]
vector-stores-qdrant = ["llama-index-vector-stores-qdrant"]
vector-stores-chroma = ["llama-index-vector-stores-chroma"]
vector-stores-postgres = ["llama-index-vector-stores-postgres"]


[tool.poetry.group.dev.dependencies]
black = "^22"
@@ -31,26 +58,6 @@ ruff = "^0"
pytest-asyncio = "^0.21.1"
types-pyyaml = "^6.0.12.12"

# Dependencies for gradio UI
[tool.poetry.group.ui]
optional = true
[tool.poetry.group.ui.dependencies]
gradio = "^4.19.0"

[tool.poetry.group.local]
optional = true
[tool.poetry.group.local.dependencies]
llama-cpp-python = "^0.2.23"
numpy = "1.26.0"
sentence-transformers = "^2.2.2"
# https://stackoverflow.com/questions/76327419/valueerror-libcublas-so-0-9-not-found-in-the-system-path
torch = ">=2.0.0, !=2.0.1, !=2.1.0"
transformers = "^4.34.0"

[tool.poetry.extras]
chroma = ["chromadb"]
pgvector = ["sqlalchemy", "pgvector", "psycopg2-binary", "asyncpg"]

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
@@ -143,6 +150,9 @@ explicit_package_bases = true
warn_unused_ignores = false
exclude = ["tests"]

[tool.mypy-llama-index]
ignore_missing_imports = true

[tool.pytest.ini_options]
asyncio_mode = "auto"
testpaths = ["tests"]
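Note (illustration, not part of this diff): with the `ui` and `local` poetry groups replaced by extras, an install now combines one LLM extra, one embeddings extra and one vector-store extra. For example, a fully local setup would typically be `poetry install --extras "ui llms-llama-cpp embeddings-huggingface vector-stores-qdrant"`, while an Ollama-based setup would swap in `llms-ollama` and `embeddings-ollama`; the recommended combinations are described in the installation docs.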
@@ -19,19 +19,19 @@ os.makedirs(models_path, exist_ok=True)

# Download Embedding model
embedding_path = models_path / "embedding"
print(f"Downloading embedding {settings().local.embedding_hf_model_name}")
print(f"Downloading embedding {settings().huggingface.embedding_hf_model_name}")
snapshot_download(
repo_id=settings().local.embedding_hf_model_name,
repo_id=settings().huggingface.embedding_hf_model_name,
cache_dir=models_cache_path,
local_dir=embedding_path,
)
print("Embedding model downloaded!")

# Download LLM and create a symlink to the model file
print(f"Downloading LLM {settings().local.llm_hf_model_file}")
print(f"Downloading LLM {settings().llamacpp.llm_hf_model_file}")
hf_hub_download(
repo_id=settings().local.llm_hf_repo_id,
filename=settings().local.llm_hf_model_file,
repo_id=settings().llamacpp.llm_hf_repo_id,
filename=settings().llamacpp.llm_hf_model_file,
cache_dir=models_cache_path,
local_dir=models_path,
resume_download=resume_download,

@@ -8,9 +8,11 @@ llm:
embedding:
mode: ${PGPT_MODE:sagemaker}

local:
llamacpp:
llm_hf_repo_id: ${PGPT_HF_REPO_ID:TheBloke/Mistral-7B-Instruct-v0.1-GGUF}
llm_hf_model_file: ${PGPT_HF_MODEL_FILE:mistral-7b-instruct-v0.1.Q4_K_M.gguf}

huggingface:
embedding_hf_model_name: ${PGPT_EMBEDDING_HF_MODEL_NAME:BAAI/bge-small-en-v1.5}

sagemaker:

@@ -2,4 +2,25 @@ server:
env_name: ${APP_ENV:local}

llm:
mode: local
mode: llamacpp
# Should be matching the selected model
max_new_tokens: 512
context_window: 3900
tokenizer: mistralai/Mistral-7B-Instruct-v0.2

llamacpp:
prompt_style: "mistral"
llm_hf_repo_id: TheBloke/Mistral-7B-Instruct-v0.2-GGUF
llm_hf_model_file: mistral-7b-instruct-v0.2.Q4_K_M.gguf

embedding:
mode: huggingface

huggingface:
embedding_hf_model_name: BAAI/bge-small-en-v1.5

vectorstore:
database: qdrant

qdrant:
path: local_data/private_gpt/qdrant
@@ -4,5 +4,6 @@ server:
# This configuration allows you to use GPU for creating embeddings while avoiding loading LLM into vRAM
llm:
mode: mock

embedding:
mode: local
mode: huggingface
settings-ollama.yaml (new file, 22 lines)
@@ -0,0 +1,22 @@
server:
env_name: ${APP_ENV:ollama}

llm:
mode: ollama
max_new_tokens: 512
context_window: 3900

embedding:
mode: ollama

ollama:
llm_model: mistral
embedding_model: nomic-embed-text
api_base: http://localhost:11434

vectorstore:
database: qdrant

qdrant:
path: local_data/private_gpt/qdrant
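Note (illustration, not part of this diff): this new profile is selected through the usual settings-profile mechanism, e.g. by exporting `PGPT_PROFILES=ollama` before launching, and it assumes a local Ollama server with the `mistral` and `nomic-embed-text` models already pulled.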
settings-openai.yaml (new file, 12 lines)
@@ -0,0 +1,12 @@
server:
env_name: ${APP_ENV:openai}

llm:
mode: openai

embedding:
mode: openai

openai:
api_key: ${OPENAI_API_KEY:}
model: gpt-3.5-turbo
@@ -1,5 +1,5 @@
server:
env_name: ${APP_ENV:prod}
env_name: ${APP_ENV:sagemaker}
port: ${PORT:8001}

ui:
@@ -9,6 +9,9 @@ ui:
llm:
mode: sagemaker

embedding:
mode: sagemaker

sagemaker:
llm_endpoint_name: huggingface-pytorch-tgi-inference-2023-09-25-19-53-32-140
embedding_endpoint_name: huggingface-pytorch-inference-2023-11-03-07-41-36-479
llm_endpoint_name: llm
embedding_endpoint_name: embedding
@@ -14,5 +14,8 @@ qdrant:
llm:
mode: mock

embedding:
mode: mock

ui:
enabled: false
@@ -1,11 +1,14 @@
server:
env_name: ${APP_ENV:vllm}

llm:
mode: openailike

embedding:
mode: local
mode: huggingface
ingest_mode: simple

local:
huggingface:
embedding_hf_model_name: BAAI/bge-small-en-v1.5

openai:

@@ -34,19 +34,25 @@ ui:
delete_file_button_enabled: true
delete_all_files_button_enabled: true


llm:
mode: local
mode: llamacpp
# Should be matching the selected model
max_new_tokens: 512
context_window: 3900
tokenizer: mistralai/Mistral-7B-Instruct-v0.2

llamacpp:
prompt_style: "mistral"
llm_hf_repo_id: TheBloke/Mistral-7B-Instruct-v0.2-GGUF
llm_hf_model_file: mistral-7b-instruct-v0.2.Q4_K_M.gguf

embedding:
# Should be matching the value above in most cases
mode: local
mode: huggingface
ingest_mode: simple

huggingface:
embedding_hf_model_name: BAAI/bge-small-en-v1.5

vectorstore:
database: qdrant

@@ -63,12 +69,6 @@ pgvector:
schema_name: private_gpt
table_name: embeddings

local:
prompt_style: "mistral"
llm_hf_repo_id: TheBloke/Mistral-7B-Instruct-v0.2-GGUF
llm_hf_model_file: mistral-7b-instruct-v0.2.Q4_K_M.gguf
embedding_hf_model_name: BAAI/bge-small-en-v1.5

sagemaker:
llm_endpoint_name: huggingface-pytorch-tgi-inference-2023-09-25-19-53-32-140
embedding_endpoint_name: huggingface-pytorch-inference-2023-11-03-07-41-36-479
@@ -78,4 +78,6 @@ openai:
model: gpt-3.5-turbo

ollama:
model: llama2-uncensored
llm_model: llama2
embedding_model: nomic-embed-text
api_base: http://localhost:11434

@@ -1,5 +1,5 @@
import pytest
from llama_index.llms import ChatMessage, MessageRole
from llama_index.core.llms import ChatMessage, MessageRole

from private_gpt.components.llm.prompt_helper import (
ChatMLPromptStyle,