2 Commits

Author SHA1 Message Date
github-actions[bot]
1b03b369c0 chore(main): release 0.4.0 (#1628)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-03-06 17:53:35 +01:00
Iván Martínez
45f05711eb feat: Upgrade to LlamaIndex to 0.10 (#1663)
* Extract optional dependencies

* Separate local mode into llms-llama-cpp and embeddings-huggingface for clarity

* Support Ollama embeddings

* Upgrade to llamaindex 0.10.14. Remove legacy use of ServiceContext in ContextChatEngine

* Fix vector retriever filters
2024-03-06 17:51:30 +01:00
14 changed files with 122 additions and 276 deletions

View File

@@ -1,5 +1,13 @@
# Changelog
## [0.4.0](https://github.com/imartinez/privateGPT/compare/v0.3.0...v0.4.0) (2024-03-06)
### Features
* Upgrade to LlamaIndex to 0.10 ([#1663](https://github.com/imartinez/privateGPT/issues/1663)) ([45f0571](https://github.com/imartinez/privateGPT/commit/45f05711eb71ffccdedb26f37e680ced55795d44))
* **Vector:** support pgvector ([#1624](https://github.com/imartinez/privateGPT/issues/1624)) ([cd40e39](https://github.com/imartinez/privateGPT/commit/cd40e3982b780b548b9eea6438c759f1c22743a8))
## [0.3.0](https://github.com/imartinez/privateGPT/compare/v0.2.0...v0.3.0) (2024-02-16)

View File

@@ -40,20 +40,21 @@ In order to run PrivateGPT in a fully local setup, you will need to run the LLM,
### Vector stores
The vector stores supported (Qdrant, ChromaDB and Postgres) run locally by default.
### Embeddings
For local embeddings you need to install the 'embeddings-huggingface' extra dependencies. It will use Huggingface Embeddings.
For local Embeddings there are two options:
* (Recommended) You can use the 'ollama' option in PrivateGPT, which will connect to your local Ollama instance. Ollama greatly simplifies the installation of local LLMs.
* You can use the 'embeddings-huggingface' option in PrivateGPT, which will use HuggingFace.
Note: Ollama will support Embeddings in the short term for easier installation, but it doesn't support them as of today.
In order for local embeddings to work, you need to download the embeddings model to the `models` folder. You can do so by running the `setup` script:
In order for HuggingFace Embeddings to work (the second option), you need to download the embeddings model to the `models` folder. You can do so by running the `setup` script:
```bash
poetry run python scripts/setup
```
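As a rough, hedged illustration of what that download step amounts to (assuming the `huggingface_hub` client and the default `BAAI/bge-small-en-v1.5` model; the actual `scripts/setup` may differ in details and target path):
```python
# Hedged sketch: fetch the default HuggingFace embedding model into the local
# `models` folder, roughly what the `setup` script does for embeddings.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="BAAI/bge-small-en-v1.5",  # default embedding model in settings.yaml
    local_dir="models/embedding",      # assumed destination; the script may use another layout
)
```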
### LLM
For local LLM there are two options:
* (Recommended) You can use the 'ollama' option in PrivateGPT, which will connect to your local Ollama instance. Ollama greatly simplifies the installation of local LLMs.
* You can use the 'llms-llama-cpp' option in PrivateGPT, which will use LlamaCPP. It works great on Mac with Metal most of the time (it leverages the Metal GPU), but it can be tricky on certain Linux and Windows setups, depending on the GPU. In the installation document you'll find guides and troubleshooting.
In order for local LLM to work (the second option), you need to download the embeddings model to the `models` folder. You can do so by running the `setup` script:
In order for LlamaCPP powered LLM to work (the second option), you need to download the LLM model to the `models` folder. You can do so by running the `setup` script:
```bash
poetry run python scripts/setup
```

View File

@@ -44,12 +44,12 @@ poetry install --extras "<extra1> <extra2>..."
Where `<extra>` can be any of the following:
- ui: adds support for UI using Gradio
- llms-ollama: adds support for Ollama LLM, the easiest way to get a local LLM running
- llms-ollama: adds support for Ollama LLM, the easiest way to get a local LLM running, requires Ollama running locally
- llms-llama-cpp: adds support for local LLM using LlamaCPP - expect a messy installation process on some platforms
- llms-sagemaker: adds support for Amazon Sagemaker LLM, requires Sagemaker inference endpoints
- llms-nvidia-tensorrt: adds support for Nvidia TensorRT LLM
- llms-openai: adds support for OpenAI LLM, requires OpenAI API key
- llms-openai-like: adds support for 3rd party LLM providers that are compatible with OpenAI's API
- embeddings-ollama: adds support for Ollama Embeddings, requires Ollama running locally
- embeddings-huggingface: adds support for local Embeddings using HuggingFace
- embeddings-sagemaker: adds support for Amazon Sagemaker Embeddings, requires Sagemaker inference endpoints
- embeddings-openai: adds support for OpenAI Embeddings, requires OpenAI API key
@@ -79,21 +79,29 @@ set PGPT_PROFILES=ollama
make run
```
### Local, Ollama-powered setup
### Local, Ollama-powered setup - RECOMMENDED
The easiest way to run PrivateGPT fully locally is to depend on Ollama for the LLM. Ollama provides a local LLM that is easy to install and use.
**The easiest way to run PrivateGPT fully locally** is to depend on Ollama for the LLM. Ollama provides local LLMs and Embeddings that are super easy to install and use, abstracting away the complexity of GPU support. It's the recommended setup for local development.
Go to [ollama.ai](https://ollama.ai/) and follow the instructions to install Ollama on your machine.
Once done, you can install PrivateGPT dependencies with the following command:
After the installation, make sure the Ollama desktop app is closed.
Install the models to be used; the default settings-ollama.yaml is configured to use the `mistral 7b` LLM (~4GB) and `nomic-embed-text` Embeddings (~275MB). Therefore:
```bash
poetry install --extras "ui llms-ollama embeddings-huggingface vector-stores-qdrant"
ollama pull mistral
ollama pull nomic-embed-text
```
We are installing "embeddings-huggingface" dependency to support local embeddings, because Ollama doesn't support embeddings just yet. But they working on it!
In order for local embeddings to work, you need to download the embeddings model to the `models` folder. You can do so by running the `setup` script:
Now, start the Ollama service (it will start a local inference server, serving both the LLM and the Embeddings):
```bash
poetry run python scripts/setup
ollama serve
```
Once done, in a different terminal, you can install PrivateGPT with the following command:
```bash
poetry install --extras "ui llms-ollama embeddings-ollama vector-stores-qdrant"
```
Once installed, you can run PrivateGPT. Make sure you have a working Ollama instance running locally before running the following command.
@@ -102,7 +110,7 @@ Once installed, you can run PrivateGPT. Make sure you have a working Ollama runn
PGPT_PROFILES=ollama make run
```
PrivateGPT will use the already existing `settings-ollama.yaml` settings file, which is already configured to use Ollama LLM, local Embeddings, and Qdrant. Review it and adapt it to your needs (different LLM model, different Ollama port, etc.)
PrivateGPT will use the already existing `settings-ollama.yaml` settings file, which is already configured to use Ollama LLM and Embeddings, and Qdrant. Review it and adapt it to your needs (different models, different Ollama port, etc.)
The UI will be available at http://localhost:8001
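If the app doesn't come up, you can verify that Ollama is reachable and that both models were pulled; a minimal check, assuming the default port and Ollama's `/api/tags` listing endpoint:
```python
# Optional sanity check before `PGPT_PROFILES=ollama make run`.
# Assumes Ollama is serving on the default port 11434.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    pulled = [m["name"] for m in json.load(resp)["models"]]

print("mistral pulled:", any(n.startswith("mistral") for n in pulled))
print("nomic-embed-text pulled:", any(n.startswith("nomic-embed-text") for n in pulled))
```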
@@ -114,7 +122,7 @@ You need to have access to sagemaker inference endpoints for the LLM and / or th
Edit the `settings-sagemaker.yaml` file to include the correct Sagemaker endpoints.
Then, install PrivateGPT dependencies with the following command:
Then, install PrivateGPT with the following command:
```bash
poetry install --extras "ui llms-sagemaker embeddings-sagemaker vector-stores-qdrant"
```
@@ -129,75 +137,6 @@ PrivateGPT will use the already existing `settings-sagemaker.yaml` settings file
The UI will be available at http://localhost:8001
### Local, TensorRT-powered setup
To get the most out of NVIDIA GPUs, you can set up a fully local PrivateGPT using TensorRT as its LLM provider. For more information about Nvidia TensorRT, check the [official documentation](https://github.com/NVIDIA/TensorRT-LLM).
Follow these steps to set up a local TensorRT-powered PrivateGPT:
- NVIDIA CUDA 12.2 or higher is currently required to run TensorRT-LLM.
- Install tensorrt_llm via pip as explained [here](https://pypi.org/project/tensorrt-llm/)
```bash
pip install --no-cache-dir --extra-index-url https://pypi.nvidia.com tensorrt-llm
```
- For this example we will use Llama2. The Llama2 model files need to be created via scripts following the instructions [here](https://github.com/NVIDIA/trt-llm-rag-windows/blob/release/1.0/README.md#building-trt-engine).
The following files will be created by following the steps in the link:
* `Llama_float16_tp1_rank0.engine`: The main output of the build script, containing the executable graph of operations with the model weights embedded.
* `config.json`: Includes detailed information about the model, like its general structure and precision, as well as information about which plug-ins were incorporated into the engine.
* `model.cache`: Caches some of the timing and optimization information from model compilation, making successive builds quicker.
- Create a folder inside `models` called `tensorrt`, and move all of the files mentioned above to that directory.
Once done, you can install PrivateGPT dependencies with the following command:
```bash
poetry install --extras "ui llms-nvidia-tensorrt embeddings-huggingface vector-stores-qdrant"
```
We are installing "embeddings-huggingface" dependency to support local embeddings, because TensorRT only covers the LLM.
In order for local embeddings to work, you need to download the embeddings model to the `models` folder. You can do so by running the `setup` script:
```bash
poetry run python scripts/setup
```
Once installed, you can run PrivateGPT.
```bash
PGPT_PROFILES=tensorrt make run
```
PrivateGPT will use the already existing `settings-tensorrt.yaml` settings file, which is already configured to use Nvidia TensorRT LLM, local Embeddings, and Qdrant. Review it and adapt it to your needs (different LLM model, etc.)
The UI will be available at http://localhost:8001
### Local, Llama-CPP powered setup
If you want to run PrivateGPT fully locally without relying on Ollama, you can run the following command to install its dependencies:
```bash
poetry install --extras "ui llms-llama-cpp embeddings-huggingface vector-stores-qdrant"
```
In order for local LLM and embeddings to work, you need to download the models to the `models` folder. You can do so by running the `setup` script:
```bash
poetry run python scripts/setup
```
Once installed, you can run PrivateGPT with the following command:
```bash
PGPT_PROFILES=local make run
```
PrivateGPT will load the already existing `settings-local.yaml` file, which is already configured to use LlamaCPP LLM, HuggingFace embeddings and Qdrant.
The UI will be available at http://localhost:8001
### Non-Private, OpenAI-powered test setup
If you want to test PrivateGPT with OpenAI's LLM and Embeddings (keep in mind your data will be sent to OpenAI!), you can run the following command:
@@ -206,7 +145,7 @@ You need an OPENAI API key to run this setup.
Edit the `settings-openai.yaml` file to include the correct API KEY. Never commit it! It's a secret! As an alternative to editing `settings-openai.yaml`, you can just set the env var OPENAI_API_KEY.
Then, install PrivateGPT dependencies with the following command:
Then, install PrivateGPT with the following command:
```bash
poetry install --extras "ui llms-openai embeddings-openai vector-stores-qdrant"
```
@@ -223,7 +162,7 @@ The UI will be available at http://localhost:8001
### Local, Llama-CPP powered setup
If you want to run PrivateGPT fully locally without relying on Ollama, you can run the following command to install its dependencies:
If you want to run PrivateGPT fully locally without relying on Ollama, you can run the following command:
```bash
poetry install --extras "ui llms-llama-cpp embeddings-huggingface vector-stores-qdrant"

poetry.lock (generated)
View File

@@ -98,7 +98,6 @@ files = [
[package.dependencies]
aiosignal = ">=1.1.2"
async-timeout = {version = ">=4.0,<5.0", markers = "python_version < \"3.11\""}
attrs = ">=17.3.0"
frozenlist = ">=1.1.1"
multidict = ">=4.5,<7.0"
@@ -139,7 +138,6 @@ numpy = "*"
packaging = "*"
pandas = ">=0.25"
toolz = "*"
typing-extensions = {version = ">=4.0.1", markers = "python_version < \"3.11\""}
[package.extras]
dev = ["anywidget", "geopandas", "hatch", "ipython", "m2r", "mypy", "pandas-stubs", "pyarrow (>=11)", "pytest", "pytest-cov", "ruff (>=0.1.3)", "types-jsonschema", "types-setuptools", "vega-datasets", "vegafusion[embed] (>=1.4.0)", "vl-convert-python (>=1.1.0)"]
@@ -168,7 +166,6 @@ files = [
]
[package.dependencies]
exceptiongroup = {version = "*", markers = "python_version < \"3.11\""}
idna = ">=2.8"
sniffio = ">=1.1"
@@ -188,9 +185,6 @@ files = [
{file = "asgiref-3.7.2.tar.gz", hash = "sha256:9e0ce3aa93a819ba5b45120216b23878cf6e8525eb3848653452b4192b92afed"},
]
[package.dependencies]
typing-extensions = {version = ">=4", markers = "python_version < \"3.11\""}
[package.extras]
tests = ["mypy (>=0.800)", "pytest", "pytest-asyncio"]
@@ -198,7 +192,7 @@ tests = ["mypy (>=0.800)", "pytest", "pytest-asyncio"]
name = "async-timeout"
version = "4.0.3"
description = "Timeout context manager for asyncio programs"
optional = false
optional = true
python-versions = ">=3.7"
files = [
{file = "async-timeout-4.0.3.tar.gz", hash = "sha256:4640d96be84d82d02ed59ea2b7105a0f7b33abe8703703cd0ab0bf87c427522f"},
@@ -378,7 +372,6 @@ click = ">=8.0.0"
mypy-extensions = ">=0.4.3"
pathspec = ">=0.9.0"
platformdirs = ">=2"
tomli = {version = ">=1.1.0", markers = "python_full_version < \"3.11.0a7\""}
[package.extras]
colorama = ["colorama (>=0.4.3)"]
@@ -453,7 +446,6 @@ files = [
colorama = {version = "*", markers = "os_name == \"nt\""}
packaging = ">=19.0"
pyproject_hooks = "*"
tomli = {version = ">=1.1.0", markers = "python_version < \"3.11\""}
[package.extras]
docs = ["furo (>=2023.08.17)", "sphinx (>=7.0,<8.0)", "sphinx-argparse-cli (>=1.5)", "sphinx-autodoc-typehints (>=1.10)", "sphinx-issues (>=3.0.0)"]
@@ -837,9 +829,6 @@ files = [
{file = "coverage-7.3.3.tar.gz", hash = "sha256:df04c64e58df96b4427db8d0559e95e2df3138c9916c96f9f6a4dd220db2fdb7"},
]
[package.dependencies]
tomli = {version = "*", optional = true, markers = "python_full_version <= \"3.11.0a6\" and extra == \"toml\""}
[package.extras]
toml = ["tomli"]
@@ -968,20 +957,6 @@ files = [
dnspython = ">=2.0.0"
idna = ">=2.0.0"
[[package]]
name = "exceptiongroup"
version = "1.2.0"
description = "Backport of PEP 654 (exception groups)"
optional = false
python-versions = ">=3.7"
files = [
{file = "exceptiongroup-1.2.0-py3-none-any.whl", hash = "sha256:4bfd3996ac73b41e9b9628b04e079f193850720ea5945fc96a08633c66912f14"},
{file = "exceptiongroup-1.2.0.tar.gz", hash = "sha256:91f5c769735f051a4290d52edd0858999b57e5876e9f85937691bd4c9fa3ed68"},
]
[package.extras]
test = ["pytest (>=6)"]
[[package]]
name = "fastapi"
version = "0.110.0"
@@ -2064,13 +2039,13 @@ test = ["httpx (>=0.24.1)", "pytest (>=7.4.0)", "scipy (>=1.10)"]
[[package]]
name = "llama-index-core"
version = "0.10.13"
version = "0.10.14.post1"
description = "Interface between LLMs and your data"
optional = false
python-versions = ">=3.8.1,<4.0"
files = [
{file = "llama_index_core-0.10.13-py3-none-any.whl", hash = "sha256:40c76fc02be7cd948a333ca541f2ff38cf02774e1c960674e2b68c61943bac90"},
{file = "llama_index_core-0.10.13.tar.gz", hash = "sha256:826fded00767923fba8aca94f46c32b259e8879f517016ab7a3801b1b37187a1"},
{file = "llama_index_core-0.10.14.post1-py3-none-any.whl", hash = "sha256:7b12ebebe023e8f5e50c0fcff4af7a67e4842b2e1ca6a84b09442394d2689de6"},
{file = "llama_index_core-0.10.14.post1.tar.gz", hash = "sha256:adb931fced7bff092b26599e7f89952c171bf2994872906b5712ecc8107d4727"},
]
[package.dependencies]
@@ -2122,6 +2097,20 @@ llama-index-core = ">=0.10.1,<0.11.0"
torch = ">=2.1.2,<3.0.0"
transformers = ">=4.37.0,<5.0.0"
[[package]]
name = "llama-index-embeddings-ollama"
version = "0.1.2"
description = "llama-index embeddings ollama integration"
optional = true
python-versions = ">=3.8.1,<4.0"
files = [
{file = "llama_index_embeddings_ollama-0.1.2-py3-none-any.whl", hash = "sha256:ac7afabfa1134059af351b021e05e256bf86dd15e5176ffa5ab0305bcf03b33f"},
{file = "llama_index_embeddings_ollama-0.1.2.tar.gz", hash = "sha256:a9e0809bddd2e4ad888f249519edc7e3d339c74e4e03fc5a40c3060dc41d47a9"},
]
[package.dependencies]
llama-index-core = ">=0.10.1,<0.11.0"
[[package]]
name = "llama-index-embeddings-openai"
version = "0.1.6"
@@ -2151,22 +2140,6 @@ files = [
llama-cpp-python = ">=0.2.32,<0.3.0"
llama-index-core = ">=0.10.1,<0.11.0"
[[package]]
name = "llama-index-llms-nvidia-tensorrt"
version = "0.1.4"
description = "llama-index llms nvidia tensorrt integration"
optional = true
python-versions = ">=3.8.1,<4.0"
files = [
{file = "llama_index_llms_nvidia_tensorrt-0.1.4-py3-none-any.whl", hash = "sha256:146b249de86317985d57d1acb89e5af1ef1564462899e6711f1ec97b3ba9ce7c"},
{file = "llama_index_llms_nvidia_tensorrt-0.1.4.tar.gz", hash = "sha256:7edddbe1ad2bc8f9fc2812853b800c8ad2b610931b870d49ad7d5be920e6dbfc"},
]
[package.dependencies]
llama-index-core = ">=0.10.1,<0.11.0"
torch = ">=2.1.2,<3.0.0"
transformers = ">=4.37.0,<5.0.0"
[[package]]
name = "llama-index-llms-ollama"
version = "0.1.2"
@@ -2692,7 +2665,6 @@ files = [
[package.dependencies]
mypy-extensions = ">=1.0.0"
tomli = {version = ">=1.1.0", markers = "python_version < \"3.11\""}
typing-extensions = ">=4.1.0"
[package.extras]
@@ -3325,10 +3297,7 @@ files = [
]
[package.dependencies]
numpy = [
{version = ">=1.22.4,<2", markers = "python_version < \"3.11\""},
{version = ">=1.23.2,<2", markers = "python_version == \"3.11\""},
]
numpy = {version = ">=1.23.2,<2", markers = "python_version == \"3.11\""}
python-dateutil = ">=2.8.2"
pytz = ">=2020.1"
tzdata = ">=2022.1"
@@ -4016,9 +3985,6 @@ files = [
{file = "pyproject_hooks-1.0.0.tar.gz", hash = "sha256:f271b298b97f5955d53fb12b72c1fb1948c22c1a6b70b315c54cedaca0264ef5"},
]
[package.dependencies]
tomli = {version = ">=1.1.0", markers = "python_version < \"3.11\""}
[[package]]
name = "pyreadline3"
version = "3.4.1"
@@ -4043,11 +4009,9 @@ files = [
[package.dependencies]
colorama = {version = "*", markers = "sys_platform == \"win32\""}
exceptiongroup = {version = ">=1.0.0rc8", markers = "python_version < \"3.11\""}
iniconfig = "*"
packaging = "*"
pluggy = ">=0.12,<2.0"
tomli = {version = ">=1.0.0", markers = "python_version < \"3.11\""}
[package.extras]
testing = ["argcomplete", "attrs (>=19.2.0)", "hypothesis (>=3.56)", "mock", "nose", "pygments (>=2.7.2)", "requests", "setuptools", "xmlschema"]
@@ -5085,17 +5049,6 @@ dev = ["tokenizers[testing]"]
docs = ["setuptools_rust", "sphinx", "sphinx_rtd_theme"]
testing = ["black (==22.3)", "datasets", "numpy", "pytest", "requests"]
[[package]]
name = "tomli"
version = "2.0.1"
description = "A lil' TOML parser"
optional = false
python-versions = ">=3.7"
files = [
{file = "tomli-2.0.1-py3-none-any.whl", hash = "sha256:939de3e7a6161af0c887ef91b7d41a53e7c5a1ca976325f429cb46ea9bc30ecc"},
{file = "tomli-2.0.1.tar.gz", hash = "sha256:de526c12914f0c550d15924c62d72abc48d6fe7364aa87328337a31007fe8a4f"},
]
[[package]]
name = "tomlkit"
version = "0.12.0"
@@ -5193,13 +5146,13 @@ telegram = ["requests"]
[[package]]
name = "transformers"
version = "4.38.1"
version = "4.38.2"
description = "State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow"
optional = false
python-versions = ">=3.8.0"
files = [
{file = "transformers-4.38.1-py3-none-any.whl", hash = "sha256:a7a9265fb060183e9d975cbbadc4d531b10281589c43f6d07563f86322728973"},
{file = "transformers-4.38.1.tar.gz", hash = "sha256:86dc84ccbe36123647e84cbd50fc31618c109a41e6be92514b064ab55bf1304c"},
{file = "transformers-4.38.2-py3-none-any.whl", hash = "sha256:c4029cb9f01b3dd335e52f364c52d2b37c65b4c78e02e6a08b1919c5c928573e"},
{file = "transformers-4.38.2.tar.gz", hash = "sha256:c5fc7ad682b8a50a48b2a4c05d4ea2de5567adb1bdd00053619dbe5960857dd5"},
]
[package.dependencies]
@@ -5464,7 +5417,6 @@ h11 = ">=0.8"
httptools = {version = ">=0.5.0", optional = true, markers = "extra == \"standard\""}
python-dotenv = {version = ">=0.13", optional = true, markers = "extra == \"standard\""}
pyyaml = {version = ">=5.1", optional = true, markers = "extra == \"standard\""}
typing-extensions = {version = ">=4.0", markers = "python_version < \"3.11\""}
uvloop = {version = ">=0.14.0,<0.15.0 || >0.15.0,<0.15.1 || >0.15.1", optional = true, markers = "(sys_platform != \"win32\" and sys_platform != \"cygwin\") and platform_python_implementation != \"PyPy\" and extra == \"standard\""}
watchfiles = {version = ">=0.13", optional = true, markers = "extra == \"standard\""}
websockets = {version = ">=10.4", optional = true, markers = "extra == \"standard\""}
@@ -5958,10 +5910,10 @@ testing = ["big-O", "jaraco.functools", "jaraco.itertools", "more-itertools", "p
[extras]
embeddings-huggingface = ["llama-index-embeddings-huggingface"]
embeddings-ollama = ["llama-index-embeddings-ollama"]
embeddings-openai = ["llama-index-embeddings-openai"]
embeddings-sagemaker = ["boto3"]
llms-llama-cpp = ["llama-index-llms-llama-cpp"]
llms-nvidia-tensorrt = ["llama-index-llms-nvidia-tensorrt"]
llms-ollama = ["llama-index-llms-ollama"]
llms-openai = ["llama-index-llms-openai"]
llms-openai-like = ["llama-index-llms-openai-like"]
@@ -5973,5 +5925,5 @@ vector-stores-qdrant = ["llama-index-vector-stores-qdrant"]
[metadata]
lock-version = "2.0"
python-versions = ">=3.10,<3.12"
content-hash = "39f0ac666402807cde29f763c14dfb6b2fc9862c0cd31de398c67a1fedbb4b12"
python-versions = ">=3.11,<3.12"
content-hash = "41849a9d15848a354fd4cc0ca9d752148e76fee64d8bb5b881210c2290fc8072"

View File

@@ -57,6 +57,21 @@ class EmbeddingComponent:
openai_settings = settings.openai.api_key
self.embedding_model = OpenAIEmbedding(api_key=openai_settings)
case "ollama":
try:
from llama_index.embeddings.ollama import ( # type: ignore
OllamaEmbedding,
)
except ImportError as e:
raise ImportError(
"Local dependencies not found, install with `poetry install --extras embeddings-ollama`"
) from e
ollama_settings = settings.ollama
self.embedding_model = OllamaEmbedding(
model_name=ollama_settings.embedding_model,
base_url=ollama_settings.api_base,
)
case "mock":
# Not a random number, is the dimensionality used by
# the default embedding model
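For reference, the new `ollama` embeddings mode boils down to the following standalone usage; a minimal sketch, assuming a local Ollama instance with `nomic-embed-text` already pulled:
```python
# Minimal sketch of the new embeddings path (requires the `embeddings-ollama`
# extra and a running Ollama instance with the model pulled).
from llama_index.embeddings.ollama import OllamaEmbedding

embed_model = OllamaEmbedding(
    model_name="nomic-embed-text",      # settings-ollama.yaml: ollama.embedding_model
    base_url="http://localhost:11434",  # settings-ollama.yaml: ollama.api_base
)
vector = embed_model.get_text_embedding("What is PrivateGPT?")
print(len(vector))  # dimensionality of the returned embedding
```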

View File

@@ -109,24 +109,7 @@ class LLMComponent:
ollama_settings = settings.ollama
self.llm = Ollama(
model=ollama_settings.model, base_url=ollama_settings.api_base
)
case "tensorrt":
try:
from llama_index.llms.nvidia_tensorrt import ( # type: ignore
LocalTensorRTLLM,
)
except ImportError as e:
raise ImportError(
"Nvidia TensorRTLLM dependencies not found, install with `poetry install --extras llms-nvidia-tensorrt`"
) from e
prompt_style = get_prompt_style(settings.tensorrt.prompt_style)
self.llm = LocalTensorRTLLM(
model_path=settings.tensorrt.model_path,
engine_name=settings.tensorrt.engine_name,
tokenizer_dir=settings.llm.tokenizer,
completion_to_prompt=prompt_style.completion_to_prompt,
model=ollama_settings.llm_model, base_url=ollama_settings.api_base
)
case "mock":
self.llm = MockLLM()
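The renamed `llm_model` setting feeds the same LlamaIndex `Ollama` wrapper; a minimal hedged sketch of the equivalent standalone call, assuming a local Ollama instance with `mistral` pulled:
```python
# Minimal sketch of the Ollama LLM path (requires the `llms-ollama` extra
# and a running Ollama instance with the model pulled).
from llama_index.llms.ollama import Ollama

llm = Ollama(
    model="mistral",                    # settings-ollama.yaml: ollama.llm_model
    base_url="http://localhost:11434",  # settings-ollama.yaml: ollama.api_base
)
print(llm.complete("Summarize what PrivateGPT does in one sentence."))
```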

View File

@@ -3,7 +3,12 @@ import typing
from injector import inject, singleton
from llama_index.core.indices.vector_store import VectorIndexRetriever, VectorStoreIndex
from llama_index.core.vector_stores.types import VectorStore
from llama_index.core.vector_stores.types import (
FilterCondition,
MetadataFilter,
MetadataFilters,
VectorStore,
)
from private_gpt.open_ai.extensions.context_filter import ContextFilter
from private_gpt.paths import local_data_path
@@ -12,31 +17,26 @@ from private_gpt.settings.settings import Settings
logger = logging.getLogger(__name__)
@typing.no_type_check
def _chromadb_doc_id_metadata_filter(
def _doc_id_metadata_filter(
context_filter: ContextFilter | None,
) -> dict | None:
if context_filter is None or context_filter.docs_ids is None:
return {} # No filter
elif len(context_filter.docs_ids) < 1:
return {"doc_id": "-"} # Effectively filtering out all docs
else:
doc_filter_items = []
if len(context_filter.docs_ids) > 1:
doc_filter = {"$or": doc_filter_items}
) -> MetadataFilters:
filters = MetadataFilters(filters=[], condition=FilterCondition.OR)
if context_filter is not None and context_filter.docs_ids is not None:
for doc_id in context_filter.docs_ids:
doc_filter_items.append({"doc_id": doc_id})
else:
doc_filter = {"doc_id": context_filter.docs_ids[0]}
return doc_filter
filters.filters.append(MetadataFilter(key="doc_id", value=doc_id))
return filters
@singleton
class VectorStoreComponent:
settings: Settings
vector_store: VectorStore
@inject
def __init__(self, settings: Settings) -> None:
self.settings = settings
match settings.vectorstore.database:
case "pgvector":
try:
@@ -96,7 +96,7 @@ class VectorStoreComponent:
from llama_index.vector_stores.qdrant import ( # type: ignore
QdrantVectorStore,
)
from qdrant_client import QdrantClient
from qdrant_client import QdrantClient # type: ignore
except ImportError as e:
raise ImportError(
"Qdrant dependencies not found, install with `poetry install --extras vector-stores-qdrant`"
@@ -126,20 +126,20 @@ class VectorStoreComponent:
f"Vectorstore database {settings.vectorstore.database} not supported"
)
@staticmethod
def get_retriever(
self,
index: VectorStoreIndex,
context_filter: ContextFilter | None = None,
similarity_top_k: int = 2,
) -> VectorIndexRetriever:
# This way we support qdrant (using doc_ids) and chroma (using where clause)
# This way we support qdrant (using doc_ids) and the rest (using filters)
return VectorIndexRetriever(
index=index,
similarity_top_k=similarity_top_k,
doc_ids=context_filter.docs_ids if context_filter else None,
vector_store_kwargs={
"where": _chromadb_doc_id_metadata_filter(context_filter)
},
filters=_doc_id_metadata_filter(context_filter)
if self.settings.vectorstore.database != "qdrant"
else None,
)
def close(self) -> None:
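To make the retriever change concrete, here is a small hedged example of the filter structure `_doc_id_metadata_filter` builds for a couple of hypothetical document IDs (one `MetadataFilter` per `doc_id`, OR-combined, passed to `VectorIndexRetriever(filters=...)` for non-Qdrant stores):
```python
# Hedged illustration of the MetadataFilters structure used by the new helper.
from llama_index.core.vector_stores.types import (
    FilterCondition,
    MetadataFilter,
    MetadataFilters,
)

docs_ids = ["doc-1", "doc-2"]  # hypothetical document IDs
filters = MetadataFilters(
    filters=[MetadataFilter(key="doc_id", value=doc_id) for doc_id in docs_ids],
    condition=FilterCondition.OR,
)
print(filters)
```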

View File

@@ -102,18 +102,10 @@ class ChatService:
vector_index_retriever = self.vector_store_component.get_retriever(
index=self.index, context_filter=context_filter
)
# TODO ContextChatEngine is still not migrated by LlamaIndex to accept
# llm directly, so we are passing legacy ServiceContext until it is fixed.
from llama_index.core import ServiceContext
return ContextChatEngine.from_defaults(
system_prompt=system_prompt,
retriever=vector_index_retriever,
llm=self.llm_component.llm, # Takes no effect at the moment
service_context=ServiceContext.from_defaults(
llm=self.llm_component.llm,
embed_model=self.embedding_component.embedding_model,
),
node_postprocessors=[
MetadataReplacementPostProcessor(target_metadata_key="window"),
],

View File

@@ -81,9 +81,7 @@ class DataSettings(BaseModel):
class LLMSettings(BaseModel):
mode: Literal[
"llamacpp", "openai", "openailike", "sagemaker", "mock", "ollama", "tensorrt"
]
mode: Literal["llamacpp", "openai", "openailike", "sagemaker", "mock", "ollama"]
max_new_tokens: int = Field(
256,
description="The maximum number of token that the LLM is authorized to generate in one completion.",
@@ -122,22 +120,6 @@ class LlamaCPPSettings(BaseModel):
)
class TensorRTSettings(BaseModel):
model_path: str
engine_name: str
prompt_style: Literal["default", "llama2", "tag", "mistral", "chatml"] = Field(
"llama2",
description=(
"The prompt style to use for the chat engine. "
"If `default` - use the default prompt style from the llama_index. It should look like `role: message`.\n"
"If `llama2` - use the llama2 prompt style from the llama_index. Based on `<s>`, `[INST]` and `<<SYS>>`.\n"
"If `tag` - use the `tag` prompt style. It should look like `<|role|>: message`. \n"
"If `mistral` - use the `mistral prompt style. It shoudl look like <s>[INST] {System Prompt} [/INST]</s>[INST] { UserInstructions } [/INST]"
"`llama2` is the historic behaviour. `default` might work better with your custom models."
),
)
class HuggingFaceSettings(BaseModel):
embedding_hf_model_name: str = Field(
description="Name of the HuggingFace model to use for embeddings"
@@ -145,7 +127,7 @@ class HuggingFaceSettings(BaseModel):
class EmbeddingSettings(BaseModel):
mode: Literal["huggingface", "openai", "sagemaker", "mock"]
mode: Literal["huggingface", "openai", "sagemaker", "ollama", "mock"]
ingest_mode: Literal["simple", "batch", "parallel"] = Field(
"simple",
description=(
@@ -194,10 +176,14 @@ class OllamaSettings(BaseModel):
"http://localhost:11434",
description="Base URL of Ollama API. Example: 'https://localhost:11434'.",
)
model: str = Field(
llm_model: str = Field(
None,
description="Model to use. Example: 'llama2-uncensored'.",
)
embedding_model: str = Field(
None,
description="Model to use. Example: 'nomic-embed-text'.",
)
class UISettings(BaseModel):
@@ -314,7 +300,6 @@ class Settings(BaseModel):
llm: LLMSettings
embedding: EmbeddingSettings
llamacpp: LlamaCPPSettings
tensorrt: TensorRTSettings
huggingface: HuggingFaceSettings
sagemaker: SagemakerSettings
openai: OpenAISettings
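With the TensorRT settings gone and the Ollama block split into `llm_model` and `embedding_model`, a single `ollama:` section now configures both components; a hedged sketch of the resulting shape (illustrative class, field names from this diff, values mirroring settings-ollama.yaml):
```python
# Hedged sketch of the new Ollama settings shape; not the real OllamaSettings class.
from pydantic import BaseModel


class OllamaSettingsExample(BaseModel):
    api_base: str = "http://localhost:11434"
    llm_model: str = "mistral"                 # consumed by the LLM component
    embedding_model: str = "nomic-embed-text"  # consumed by the embeddings component


print(OllamaSettingsExample())
```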

View File

@@ -1,32 +1,32 @@
[tool.poetry]
name = "private-gpt"
version = "0.2.0"
version = "0.4.0"
description = "Private GPT"
authors = ["Zylon <hi@zylon.ai>"]
[tool.poetry.dependencies]
python = ">=3.10,<3.12"
python = ">=3.11,<3.12"
# PrivateGPT
fastapi = { extras = ["all"], version = "^0.110.0" }
python-multipart = "^0.0.9"
injector = "^0.21.0"
pyyaml = "^6.0.1"
watchdog = "^4.0.0"
transformers = "^4.38.1"
transformers = "^4.38.2"
# LlamaIndex core libs
llama-index-core = "^0.10.13"
llama-index-core = "^0.10.14"
llama-index-readers-file = "^0.1.6"
# Optional LlamaIndex integration libs
llama-index-llms-llama-cpp = {version = "^0.1.3", optional = true}
llama-index-llms-openai = {version = "^0.1.6", optional = true}
llama-index-llms-openai-like = {version ="^0.1.3", optional = true}
llama-index-llms-ollama = {version ="^0.1.2", optional = true}
llama-index-embeddings-ollama = {version ="^0.1.2", optional = true}
llama-index-embeddings-huggingface = {version ="^0.1.4", optional = true}
llama-index-embeddings-openai = {version ="^0.1.6", optional = true}
llama-index-vector-stores-qdrant = {version ="^0.1.3", optional = true}
llama-index-vector-stores-chroma = {version ="^0.1.4", optional = true}
llama-index-vector-stores-postgres = {version ="^0.1.2", optional = true}
llama-index-llms-nvidia-tensorrt = {version ="^0.1.2", optional = true}
# Optional Sagemaker dependency
boto3 = {version ="^1.34.51", optional = true}
# Optional UI
@@ -39,7 +39,7 @@ llms-openai = ["llama-index-llms-openai"]
llms-openai-like = ["llama-index-llms-openai-like"]
llms-ollama = ["llama-index-llms-ollama"]
llms-sagemaker = ["boto3"]
llms-nvidia-tensorrt = ["llama-index-llms-nvidia-tensorrt"]
embeddings-ollama = ["llama-index-embeddings-ollama"]
embeddings-huggingface = ["llama-index-embeddings-huggingface"]
embeddings-openai = ["llama-index-embeddings-openai"]
embeddings-sagemaker = ["boto3"]
@@ -47,6 +47,7 @@ vector-stores-qdrant = ["llama-index-vector-stores-qdrant"]
vector-stores-chroma = ["llama-index-vector-stores-chroma"]
vector-stores-postgres = ["llama-index-vector-stores-postgres"]
[tool.poetry.group.dev.dependencies]
black = "^22"
mypy = "^1.2"

View File

@@ -6,15 +6,13 @@ llm:
max_new_tokens: 512
context_window: 3900
ollama:
model: llama2
api_base: http://localhost:11434
embedding:
mode: huggingface
mode: ollama
huggingface:
embedding_hf_model_name: BAAI/bge-small-en-v1.5
ollama:
llm_model: mistral
embedding_model: nomic-embed-text
api_base: http://localhost:11434
vectorstore:
database: qdrant

View File

@@ -1,25 +0,0 @@
server:
env_name: ${APP_ENV:tensorrt}
llm:
mode: tensorrt
max_new_tokens: 512
context_window: 3900
tensorrt:
model_path: models/tensorrt
engine_name: llama_float16_tp1_rank0.engine
prompt_style: "llama2"
embedding:
mode: huggingface
huggingface:
embedding_hf_model_name: BAAI/bge-small-en-v1.5
vectorstore:
database: qdrant
qdrant:
path: local_data/private_gpt/qdrant

View File

@@ -78,9 +78,6 @@ openai:
model: gpt-3.5-turbo
ollama:
model: llama2-uncensored
tensorrt:
model_path: models/tensorrt
engine_name: llama_float16_tp1_rank0.engine
prompt_style: "llama2"
llm_model: llama2
embedding_model: nomic-embed-text
api_base: http://localhost:11434

View File

@@ -1 +1 @@
0.3.0
0.4.0