2 Commits

Author SHA1 Message Date
github-actions[bot]
1b03b369c0 chore(main): release 0.4.0 (#1628)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-03-06 17:53:35 +01:00
Iván Martínez
45f05711eb feat: Upgrade to LlamaIndex to 0.10 (#1663)
* Extract optional dependencies

* Separate local mode into llms-llama-cpp and embeddings-huggingface for clarity

* Support Ollama embeddings

* Upgrade to llamaindex 0.10.14. Remove legacy use of ServiceContext in ContextChatEngine

* Fix vector retriever filters
2024-03-06 17:51:30 +01:00
14 changed files with 122 additions and 276 deletions

View File

@@ -1,5 +1,13 @@
# Changelog
## [0.4.0](https://github.com/imartinez/privateGPT/compare/v0.3.0...v0.4.0) (2024-03-06)
### Features
* Upgrade to LlamaIndex to 0.10 ([#1663](https://github.com/imartinez/privateGPT/issues/1663)) ([45f0571](https://github.com/imartinez/privateGPT/commit/45f05711eb71ffccdedb26f37e680ced55795d44))
* **Vector:** support pgvector ([#1624](https://github.com/imartinez/privateGPT/issues/1624)) ([cd40e39](https://github.com/imartinez/privateGPT/commit/cd40e3982b780b548b9eea6438c759f1c22743a8))
## [0.3.0](https://github.com/imartinez/privateGPT/compare/v0.2.0...v0.3.0) (2024-02-16)

View File

@@ -40,20 +40,21 @@ In order to run PrivateGPT in a fully local setup, you will need to run the LLM,
### Vector stores
The vector stores supported (Qdrant, ChromaDB and Postgres) run locally by default.
### Embeddings
For local embeddings you need to install the 'embeddings-huggingface' extra dependencies. It will use Huggingface Embeddings.
For local Embeddings there are two options:
* (Recommended) You can use the 'ollama' option in PrivateGPT, which will connect to your local Ollama instance. Ollama greatly simplifies the installation of local LLMs.
* You can use the 'embeddings-huggingface' option in PrivateGPT, which will use HuggingFace.
Note: Ollama will support Embeddings in the short term for easier installation, but it doesn't support them as of today.
In order for local embeddings to work, you need to download the embeddings model to the `models` folder. You can do so by running the `setup` script:
In order for HuggingFace Embeddings to work (the second option), you need to download the embeddings model to the `models` folder. You can do so by running the `setup` script:
```bash
poetry run python scripts/setup
```
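As a rough, hedged illustration of what that download step amounts to (assuming the `huggingface_hub` client and the default `BAAI/bge-small-en-v1.5` model; the actual `scripts/setup` may differ in details and target path):
```python
# Hedged sketch: fetch the default HuggingFace embedding model into the local
# `models` folder, roughly what the `setup` script does for embeddings.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="BAAI/bge-small-en-v1.5",  # default embedding model in settings.yaml
    local_dir="models/embedding",      # assumed destination; the script may use another layout
)
```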
### LLM
For local LLM there are two options:
* (Recommended) You can use the 'ollama' option in PrivateGPT, which will connect to your local Ollama instance. Ollama greatly simplifies the installation of local LLMs.
* You can use the 'llms-llama-cpp' option in PrivateGPT, which will use LlamaCPP. It works great on Mac with Metal most of the time (it leverages the Metal GPU), but it can be tricky on certain Linux and Windows setups, depending on the GPU. In the installation document you'll find guides and troubleshooting.
In order for local LLM to work (the second option), you need to download the embeddings model to the `models` folder. You can do so by running the `setup` script:
In order for LlamaCPP powered LLM to work (the second option), you need to download the LLM model to the `models` folder. You can do so by running the `setup` script:
```bash
poetry run python scripts/setup
```

View File

@@ -44,12 +44,12 @@ poetry install --extras "<extra1> <extra2>..."
Where `<extra>` can be any of the following:
- ui: adds support for UI using Gradio
- llms-ollama: adds support for Ollama LLM, the easiest way to get a local LLM running
- llms-ollama: adds support for Ollama LLM, the easiest way to get a local LLM running, requires Ollama running locally
- llms-llama-cpp: adds support for local LLM using LlamaCPP - expect a messy installation process on some platforms
- llms-sagemaker: adds support for Amazon Sagemaker LLM, requires Sagemaker inference endpoints
- llms-nvidia-tensorrt: adds support for Nvidia TensorRT LLM
- llms-openai: adds support for OpenAI LLM, requires OpenAI API key
- llms-openai-like: adds support for 3rd party LLM providers that are compatible with OpenAI's API
- embeddings-ollama: adds support for Ollama Embeddings, requires Ollama running locally
- embeddings-huggingface: adds support for local Embeddings using HuggingFace
- embeddings-sagemaker: adds support for Amazon Sagemaker Embeddings, requires Sagemaker inference endpoints
- embeddings-openai: adds support for OpenAI Embeddings, requires OpenAI API key
@@ -79,21 +79,29 @@ set PGPT_PROFILES=ollama
make run
```
### Local, Ollama-powered setup
### Local, Ollama-powered setup - RECOMMENDED
The easiest way to run PrivateGPT fully locally is to depend on Ollama for the LLM. Ollama provides a local LLM that is easy to install and use.
**The easiest way to run PrivateGPT fully locally** is to depend on Ollama for the LLM. Ollama provides local LLMs and Embeddings that are super easy to install and use, abstracting away the complexity of GPU support. It's the recommended setup for local development.
Go to [ollama.ai](https://ollama.ai/) and follow the instructions to install Ollama on your machine.
Once done, you can install PrivateGPT dependencies with the following command:
After the installation, make sure the Ollama desktop app is closed.
Install the models to be used; the default settings-ollama.yaml is configured to use the `mistral 7b` LLM (~4GB) and `nomic-embed-text` Embeddings (~275MB). Therefore:
```bash
poetry install --extras "ui llms-ollama embeddings-huggingface vector-stores-qdrant"
ollama pull mistral
ollama pull nomic-embed-text
```
We are installing "embeddings-huggingface" dependency to support local embeddings, because Ollama doesn't support embeddings just yet. But they working on it!
In order for local embeddings to work, you need to download the embeddings model to the `models` folder. You can do so by running the `setup` script:
Now, start the Ollama service (it will start a local inference server, serving both the LLM and the Embeddings):
```bash
poetry run python scripts/setup
ollama serve
```
Once done, in a different terminal, you can install PrivateGPT with the following command:
```bash
poetry install --extras "ui llms-ollama embeddings-ollama vector-stores-qdrant"
```
Once installed, you can run PrivateGPT. Make sure you have a working Ollama instance running locally before running the following command.
@@ -102,7 +110,7 @@ Once installed, you can run PrivateGPT. Make sure you have a working Ollama runn
PGPT_PROFILES=ollama make run
```
PrivateGPT will use the already existing `settings-ollama.yaml` settings file, which is already configured to use Ollama LLM, local Embeddings, and Qdrant. Review it and adapt it to your needs (different LLM model, different Ollama port, etc.)
PrivateGPT will use the already existing `settings-ollama.yaml` settings file, which is already configured to use Ollama LLM and Embeddings, and Qdrant. Review it and adapt it to your needs (different models, different Ollama port, etc.)
The UI will be available at http://localhost:8001
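If the app doesn't come up, you can verify that Ollama is reachable and that both models were pulled; a minimal check, assuming the default port and Ollama's `/api/tags` listing endpoint:
```python
# Optional sanity check before `PGPT_PROFILES=ollama make run`.
# Assumes Ollama is serving on the default port 11434.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    pulled = [m["name"] for m in json.load(resp)["models"]]

print("mistral pulled:", any(n.startswith("mistral") for n in pulled))
print("nomic-embed-text pulled:", any(n.startswith("nomic-embed-text") for n in pulled))
```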
@@ -114,7 +122,7 @@ You need to have access to sagemaker inference endpoints for the LLM and / or th
Edit the `settings-sagemaker.yaml` file to include the correct Sagemaker endpoints.
Then, install PrivateGPT dependencies with the following command:
Then, install PrivateGPT with the following command:
```bash
poetry install --extras "ui llms-sagemaker embeddings-sagemaker vector-stores-qdrant"
```
@@ -129,75 +137,6 @@ PrivateGPT will use the already existing `settings-sagemaker.yaml` settings file
The UI will be available at http://localhost:8001
### Local, TensorRT-powered setup
To get the most out of NVIDIA GPUs, you can set up a fully local PrivateGPT using TensorRT as its LLM provider. For more information about Nvidia TensorRT, check the [official documentation](https://github.com/NVIDIA/TensorRT-LLM).
Follow these steps to set up a local TensorRT-powered PrivateGPT:
- NVIDIA CUDA 12.2 or higher is currently required to run TensorRT-LLM.
- Install tensorrt_llm via pip as explained [here](https://pypi.org/project/tensorrt-llm/)
```bash
pip install --no-cache-dir --extra-index-url https://pypi.nvidia.com tensorrt-llm
```
- For this example we will use Llama2. The Llama2 model files need to be created via scripts following the instructions [here](https://github.com/NVIDIA/trt-llm-rag-windows/blob/release/1.0/README.md#building-trt-engine).
The following files will be created by following the steps in the link:
* `Llama_float16_tp1_rank0.engine`: The main output of the build script, containing the executable graph of operations with the model weights embedded.
* `config.json`: Includes detailed information about the model, like its general structure and precision, as well as information about which plug-ins were incorporated into the engine.
* `model.cache`: Caches some of the timing and optimization information from model compilation, making successive builds quicker.
- Create a folder inside `models` called `tensorrt`, and move all of the files mentioned above to that directory.
Once done, you can install PrivateGPT dependencies with the following command:
```bash
poetry install --extras "ui llms-nvidia-tensorrt embeddings-huggingface vector-stores-qdrant"
```
We are installing "embeddings-huggingface" dependency to support local embeddings, because TensorRT only covers the LLM.
In order for local embeddings to work, you need to download the embeddings model to the `models` folder. You can do so by running the `setup` script:
```bash
poetry run python scripts/setup
```
Once installed, you can run PrivateGPT.
```bash
PGPT_PROFILES=tensorrt make run
```
PrivateGPT will use the already existing `settings-tensorrt.yaml` settings file, which is already configured to use Nvidia TensorRT LLM, local Embeddings, and Qdrant. Review it and adapt it to your needs (different LLM model, etc.)
The UI will be available at http://localhost:8001
### Local, Llama-CPP powered setup
If you want to run PrivateGPT fully locally without relying on Ollama, you can run the following command to install its dependencies:
```bash
poetry install --extras "ui llms-llama-cpp embeddings-huggingface vector-stores-qdrant"
```
In order for local LLM and embeddings to work, you need to download the models to the `models` folder. You can do so by running the `setup` script:
```bash
poetry run python scripts/setup
```
Once installed, you can run PrivateGPT with the following command:
```bash
PGPT_PROFILES=local make run
```
PrivateGPT will load the already existing `settings-local.yaml` file, which is already configured to use LlamaCPP LLM, HuggingFace embeddings and Qdrant.
The UI will be available at http://localhost:8001
### Non-Private, OpenAI-powered test setup
If you want to test PrivateGPT with OpenAI's LLM and Embeddings (keep in mind your data will be sent to OpenAI!), you can run the following command:
@@ -206,7 +145,7 @@ You need an OPENAI API key to run this setup.
Edit the `settings-openai.yaml` file to include the correct API KEY. Never commit it! It's a secret! As an alternative to editing `settings-openai.yaml`, you can just set the env var OPENAI_API_KEY.
Then, install PrivateGPT dependencies with the following command:
Then, install PrivateGPT with the following command:
```bash
poetry install --extras "ui llms-openai embeddings-openai vector-stores-qdrant"
```
@@ -223,7 +162,7 @@ The UI will be available at http://localhost:8001
### Local, Llama-CPP powered setup
If you want to run PrivateGPT fully locally without relying on Ollama, you can run the following command to install its dependencies:
If you want to run PrivateGPT fully locally without relying on Ollama, you can run the following command:
```bash
poetry install --extras "ui llms-llama-cpp embeddings-huggingface vector-stores-qdrant"

poetry.lock (generated)
View File

@@ -98,7 +98,6 @@ files = [
[package.dependencies]
aiosignal = ">=1.1.2"
async-timeout = {version = ">=4.0,<5.0", markers = "python_version < \"3.11\""}
attrs = ">=17.3.0"
frozenlist = ">=1.1.1"
multidict = ">=4.5,<7.0"
@@ -139,7 +138,6 @@ numpy = "*"
packaging = "*"
pandas = ">=0.25"
toolz = "*"
typing-extensions = {version = ">=4.0.1", markers = "python_version < \"3.11\""}
[package.extras]
dev = ["anywidget", "geopandas", "hatch", "ipython", "m2r", "mypy", "pandas-stubs", "pyarrow (>=11)", "pytest", "pytest-cov", "ruff (>=0.1.3)", "types-jsonschema", "types-setuptools", "vega-datasets", "vegafusion[embed] (>=1.4.0)", "vl-convert-python (>=1.1.0)"]
@@ -168,7 +166,6 @@ files = [
]
[package.dependencies]
exceptiongroup = {version = "*", markers = "python_version < \"3.11\""}
idna = ">=2.8"
sniffio = ">=1.1"
@@ -188,9 +185,6 @@ files = [
{file = "asgiref-3.7.2.tar.gz", hash = "sha256:9e0ce3aa93a819ba5b45120216b23878cf6e8525eb3848653452b4192b92afed"},
]
[package.dependencies]
typing-extensions = {version = ">=4", markers = "python_version < \"3.11\""}
[package.extras]
tests = ["mypy (>=0.800)", "pytest", "pytest-asyncio"]
@@ -198,7 +192,7 @@ tests = ["mypy (>=0.800)", "pytest", "pytest-asyncio"]
name = "async-timeout"
version = "4.0.3"
description = "Timeout context manager for asyncio programs"
optional = false
optional = true
python-versions = ">=3.7"
files = [
{file = "async-timeout-4.0.3.tar.gz", hash = "sha256:4640d96be84d82d02ed59ea2b7105a0f7b33abe8703703cd0ab0bf87c427522f"},
@@ -378,7 +372,6 @@ click = ">=8.0.0"
mypy-extensions = ">=0.4.3"
pathspec = ">=0.9.0"
platformdirs = ">=2"
tomli = {version = ">=1.1.0", markers = "python_full_version < \"3.11.0a7\""}
[package.extras]
colorama = ["colorama (>=0.4.3)"]
@@ -453,7 +446,6 @@ files = [
colorama = {version = "*", markers = "os_name == \"nt\""}
packaging = ">=19.0"
pyproject_hooks = "*"
tomli = {version = ">=1.1.0", markers = "python_version < \"3.11\""}
[package.extras]
docs = ["furo (>=2023.08.17)", "sphinx (>=7.0,<8.0)", "sphinx-argparse-cli (>=1.5)", "sphinx-autodoc-typehints (>=1.10)", "sphinx-issues (>=3.0.0)"]
@@ -837,9 +829,6 @@ files = [
{file = "coverage-7.3.3.tar.gz", hash = "sha256:df04c64e58df96b4427db8d0559e95e2df3138c9916c96f9f6a4dd220db2fdb7"},
]
[package.dependencies]
tomli = {version = "*", optional = true, markers = "python_full_version <= \"3.11.0a6\" and extra == \"toml\""}
[package.extras]
toml = ["tomli"]
@@ -968,20 +957,6 @@ files = [
dnspython = ">=2.0.0"
idna = ">=2.0.0"
[[package]]
name = "exceptiongroup"
version = "1.2.0"
description = "Backport of PEP 654 (exception groups)"
optional = false
python-versions = ">=3.7"
files = [
{file = "exceptiongroup-1.2.0-py3-none-any.whl", hash = "sha256:4bfd3996ac73b41e9b9628b04e079f193850720ea5945fc96a08633c66912f14"},
{file = "exceptiongroup-1.2.0.tar.gz", hash = "sha256:91f5c769735f051a4290d52edd0858999b57e5876e9f85937691bd4c9fa3ed68"},
]
[package.extras]
test = ["pytest (>=6)"]
[[package]]
name = "fastapi"
version = "0.110.0"
@@ -2064,13 +2039,13 @@ test = ["httpx (>=0.24.1)", "pytest (>=7.4.0)", "scipy (>=1.10)"]
[[package]]
name = "llama-index-core"
version = "0.10.13"
version = "0.10.14.post1"
description = "Interface between LLMs and your data"
optional = false
python-versions = ">=3.8.1,<4.0"
files = [
{file = "llama_index_core-0.10.13-py3-none-any.whl", hash = "sha256:40c76fc02be7cd948a333ca541f2ff38cf02774e1c960674e2b68c61943bac90"},
{file = "llama_index_core-0.10.13.tar.gz", hash = "sha256:826fded00767923fba8aca94f46c32b259e8879f517016ab7a3801b1b37187a1"},
{file = "llama_index_core-0.10.14.post1-py3-none-any.whl", hash = "sha256:7b12ebebe023e8f5e50c0fcff4af7a67e4842b2e1ca6a84b09442394d2689de6"},
{file = "llama_index_core-0.10.14.post1.tar.gz", hash = "sha256:adb931fced7bff092b26599e7f89952c171bf2994872906b5712ecc8107d4727"},
]
[package.dependencies]
@@ -2122,6 +2097,20 @@ llama-index-core = ">=0.10.1,<0.11.0"
torch = ">=2.1.2,<3.0.0"
transformers = ">=4.37.0,<5.0.0"
[[package]]
name = "llama-index-embeddings-ollama"
version = "0.1.2"
description = "llama-index embeddings ollama integration"
optional = true
python-versions = ">=3.8.1,<4.0"
files = [
{file = "llama_index_embeddings_ollama-0.1.2-py3-none-any.whl", hash = "sha256:ac7afabfa1134059af351b021e05e256bf86dd15e5176ffa5ab0305bcf03b33f"},
{file = "llama_index_embeddings_ollama-0.1.2.tar.gz", hash = "sha256:a9e0809bddd2e4ad888f249519edc7e3d339c74e4e03fc5a40c3060dc41d47a9"},
]
[package.dependencies]
llama-index-core = ">=0.10.1,<0.11.0"
[[package]]
name = "llama-index-embeddings-openai"
version = "0.1.6"
@@ -2151,22 +2140,6 @@ files = [
llama-cpp-python = ">=0.2.32,<0.3.0"
llama-index-core = ">=0.10.1,<0.11.0"
[[package]]
name = "llama-index-llms-nvidia-tensorrt"
version = "0.1.4"
description = "llama-index llms nvidia tensorrt integration"
optional = true
python-versions = ">=3.8.1,<4.0"
files = [
{file = "llama_index_llms_nvidia_tensorrt-0.1.4-py3-none-any.whl", hash = "sha256:146b249de86317985d57d1acb89e5af1ef1564462899e6711f1ec97b3ba9ce7c"},
{file = "llama_index_llms_nvidia_tensorrt-0.1.4.tar.gz", hash = "sha256:7edddbe1ad2bc8f9fc2812853b800c8ad2b610931b870d49ad7d5be920e6dbfc"},
]
[package.dependencies]
llama-index-core = ">=0.10.1,<0.11.0"
torch = ">=2.1.2,<3.0.0"
transformers = ">=4.37.0,<5.0.0"
[[package]]
name = "llama-index-llms-ollama"
version = "0.1.2"
@@ -2692,7 +2665,6 @@ files = [
[package.dependencies]
mypy-extensions = ">=1.0.0"
tomli = {version = ">=1.1.0", markers = "python_version < \"3.11\""}
typing-extensions = ">=4.1.0"
[package.extras]
@@ -3325,10 +3297,7 @@ files = [
]
[package.dependencies]
numpy = [
{version = ">=1.22.4,<2", markers = "python_version < \"3.11\""},
{version = ">=1.23.2,<2", markers = "python_version == \"3.11\""},
]
numpy = {version = ">=1.23.2,<2", markers = "python_version == \"3.11\""}
python-dateutil = ">=2.8.2"
pytz = ">=2020.1"
tzdata = ">=2022.1"
@@ -4016,9 +3985,6 @@ files = [
{file = "pyproject_hooks-1.0.0.tar.gz", hash = "sha256:f271b298b97f5955d53fb12b72c1fb1948c22c1a6b70b315c54cedaca0264ef5"},
]
[package.dependencies]
tomli = {version = ">=1.1.0", markers = "python_version < \"3.11\""}
[[package]]
name = "pyreadline3"
version = "3.4.1"
@@ -4043,11 +4009,9 @@ files = [
[package.dependencies]
colorama = {version = "*", markers = "sys_platform == \"win32\""}
exceptiongroup = {version = ">=1.0.0rc8", markers = "python_version < \"3.11\""}
iniconfig = "*"
packaging = "*"
pluggy = ">=0.12,<2.0"
tomli = {version = ">=1.0.0", markers = "python_version < \"3.11\""}
[package.extras]
testing = ["argcomplete", "attrs (>=19.2.0)", "hypothesis (>=3.56)", "mock", "nose", "pygments (>=2.7.2)", "requests", "setuptools", "xmlschema"]
@@ -5085,17 +5049,6 @@ dev = ["tokenizers[testing]"]
docs = ["setuptools_rust", "sphinx", "sphinx_rtd_theme"]
testing = ["black (==22.3)", "datasets", "numpy", "pytest", "requests"]
[[package]]
name = "tomli"
version = "2.0.1"
description = "A lil' TOML parser"
optional = false
python-versions = ">=3.7"
files = [
{file = "tomli-2.0.1-py3-none-any.whl", hash = "sha256:939de3e7a6161af0c887ef91b7d41a53e7c5a1ca976325f429cb46ea9bc30ecc"},
{file = "tomli-2.0.1.tar.gz", hash = "sha256:de526c12914f0c550d15924c62d72abc48d6fe7364aa87328337a31007fe8a4f"},
]
[[package]]
name = "tomlkit"
version = "0.12.0"
@@ -5193,13 +5146,13 @@ telegram = ["requests"]
[[package]]
name = "transformers"
version = "4.38.1"
version = "4.38.2"
description = "State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow"
optional = false
python-versions = ">=3.8.0"
files = [
{file = "transformers-4.38.1-py3-none-any.whl", hash = "sha256:a7a9265fb060183e9d975cbbadc4d531b10281589c43f6d07563f86322728973"},
{file = "transformers-4.38.1.tar.gz", hash = "sha256:86dc84ccbe36123647e84cbd50fc31618c109a41e6be92514b064ab55bf1304c"},
{file = "transformers-4.38.2-py3-none-any.whl", hash = "sha256:c4029cb9f01b3dd335e52f364c52d2b37c65b4c78e02e6a08b1919c5c928573e"},
{file = "transformers-4.38.2.tar.gz", hash = "sha256:c5fc7ad682b8a50a48b2a4c05d4ea2de5567adb1bdd00053619dbe5960857dd5"},
]
[package.dependencies]
@@ -5464,7 +5417,6 @@ h11 = ">=0.8"
httptools = {version = ">=0.5.0", optional = true, markers = "extra == \"standard\""}
python-dotenv = {version = ">=0.13", optional = true, markers = "extra == \"standard\""}
pyyaml = {version = ">=5.1", optional = true, markers = "extra == \"standard\""}
typing-extensions = {version = ">=4.0", markers = "python_version < \"3.11\""}
uvloop = {version = ">=0.14.0,<0.15.0 || >0.15.0,<0.15.1 || >0.15.1", optional = true, markers = "(sys_platform != \"win32\" and sys_platform != \"cygwin\") and platform_python_implementation != \"PyPy\" and extra == \"standard\""}
watchfiles = {version = ">=0.13", optional = true, markers = "extra == \"standard\""}
websockets = {version = ">=10.4", optional = true, markers = "extra == \"standard\""}
@@ -5958,10 +5910,10 @@ testing = ["big-O", "jaraco.functools", "jaraco.itertools", "more-itertools", "p
[extras]
embeddings-huggingface = ["llama-index-embeddings-huggingface"]
embeddings-ollama = ["llama-index-embeddings-ollama"]
embeddings-openai = ["llama-index-embeddings-openai"]
embeddings-sagemaker = ["boto3"]
llms-llama-cpp = ["llama-index-llms-llama-cpp"]
llms-nvidia-tensorrt = ["llama-index-llms-nvidia-tensorrt"]
llms-ollama = ["llama-index-llms-ollama"]
llms-openai = ["llama-index-llms-openai"]
llms-openai-like = ["llama-index-llms-openai-like"]
@@ -5973,5 +5925,5 @@ vector-stores-qdrant = ["llama-index-vector-stores-qdrant"]
[metadata]
lock-version = "2.0"
python-versions = ">=3.10,<3.12"
content-hash = "39f0ac666402807cde29f763c14dfb6b2fc9862c0cd31de398c67a1fedbb4b12"
python-versions = ">=3.11,<3.12"
content-hash = "41849a9d15848a354fd4cc0ca9d752148e76fee64d8bb5b881210c2290fc8072"

View File

@@ -57,6 +57,21 @@ class EmbeddingComponent:
openai_settings = settings.openai.api_key
self.embedding_model = OpenAIEmbedding(api_key=openai_settings)
case "ollama":
try:
from llama_index.embeddings.ollama import ( # type: ignore
OllamaEmbedding,
)
except ImportError as e:
raise ImportError(
"Local dependencies not found, install with `poetry install --extras embeddings-ollama`"
) from e
ollama_settings = settings.ollama
self.embedding_model = OllamaEmbedding(
model_name=ollama_settings.embedding_model,
base_url=ollama_settings.api_base,
)
case "mock":
# Not a random number, is the dimensionality used by
# the default embedding model
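For reference, the new `ollama` embeddings mode boils down to the following standalone usage; a minimal sketch, assuming a local Ollama instance with `nomic-embed-text` already pulled:
```python
# Minimal sketch of the new embeddings path (requires the `embeddings-ollama`
# extra and a running Ollama instance with the model pulled).
from llama_index.embeddings.ollama import OllamaEmbedding

embed_model = OllamaEmbedding(
    model_name="nomic-embed-text",      # settings-ollama.yaml: ollama.embedding_model
    base_url="http://localhost:11434",  # settings-ollama.yaml: ollama.api_base
)
vector = embed_model.get_text_embedding("What is PrivateGPT?")
print(len(vector))  # dimensionality of the returned embedding
```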

View File

@@ -109,24 +109,7 @@ class LLMComponent:
ollama_settings = settings.ollama
self.llm = Ollama(
model=ollama_settings.model, base_url=ollama_settings.api_base
)
case "tensorrt":
try:
from llama_index.llms.nvidia_tensorrt import ( # type: ignore
LocalTensorRTLLM,
)
except ImportError as e:
raise ImportError(
"Nvidia TensorRTLLM dependencies not found, install with `poetry install --extras llms-nvidia-tensorrt`"
) from e
prompt_style = get_prompt_style(settings.tensorrt.prompt_style)
self.llm = LocalTensorRTLLM(
model_path=settings.tensorrt.model_path,
engine_name=settings.tensorrt.engine_name,
tokenizer_dir=settings.llm.tokenizer,
completion_to_prompt=prompt_style.completion_to_prompt,
model=ollama_settings.llm_model, base_url=ollama_settings.api_base
)
case "mock":
self.llm = MockLLM()
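The renamed `llm_model` setting feeds the same LlamaIndex `Ollama` wrapper; a minimal hedged sketch of the equivalent standalone call, assuming a local Ollama instance with `mistral` pulled:
```python
# Minimal sketch of the Ollama LLM path (requires the `llms-ollama` extra
# and a running Ollama instance with the model pulled).
from llama_index.llms.ollama import Ollama

llm = Ollama(
    model="mistral",                    # settings-ollama.yaml: ollama.llm_model
    base_url="http://localhost:11434",  # settings-ollama.yaml: ollama.api_base
)
print(llm.complete("Summarize what PrivateGPT does in one sentence."))
```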

View File

@@ -3,7 +3,12 @@ import typing
from injector import inject, singleton
from llama_index.core.indices.vector_store import VectorIndexRetriever, VectorStoreIndex
from llama_index.core.vector_stores.types import VectorStore
from llama_index.core.vector_stores.types import (
FilterCondition,
MetadataFilter,
MetadataFilters,
VectorStore,
)
from private_gpt.open_ai.extensions.context_filter import ContextFilter
from private_gpt.paths import local_data_path
@@ -12,31 +17,26 @@ from private_gpt.settings.settings import Settings
logger = logging.getLogger(__name__)
@typing.no_type_check
def _chromadb_doc_id_metadata_filter(
def _doc_id_metadata_filter(
context_filter: ContextFilter | None,
) -> dict | None:
if context_filter is None or context_filter.docs_ids is None:
return {} # No filter
elif len(context_filter.docs_ids) < 1:
return {"doc_id": "-"} # Effectively filtering out all docs
else:
doc_filter_items = []
if len(context_filter.docs_ids) > 1:
doc_filter = {"$or": doc_filter_items}
) -> MetadataFilters:
filters = MetadataFilters(filters=[], condition=FilterCondition.OR)
if context_filter is not None and context_filter.docs_ids is not None:
for doc_id in context_filter.docs_ids:
doc_filter_items.append({"doc_id": doc_id})
else:
doc_filter = {"doc_id": context_filter.docs_ids[0]}
return doc_filter
filters.filters.append(MetadataFilter(key="doc_id", value=doc_id))
return filters
@singleton
class VectorStoreComponent:
settings: Settings
vector_store: VectorStore
@inject
def __init__(self, settings: Settings) -> None:
self.settings = settings
match settings.vectorstore.database:
case "pgvector":
try:
@@ -96,7 +96,7 @@ class VectorStoreComponent:
from llama_index.vector_stores.qdrant import ( # type: ignore
QdrantVectorStore,
)
from qdrant_client import QdrantClient
from qdrant_client import QdrantClient # type: ignore
except ImportError as e:
raise ImportError(
"Qdrant dependencies not found, install with `poetry install --extras vector-stores-qdrant`"
@@ -126,20 +126,20 @@ class VectorStoreComponent:
f"Vectorstore database {settings.vectorstore.database} not supported"
)
@staticmethod
def get_retriever(
self,
index: VectorStoreIndex,
context_filter: ContextFilter | None = None,
similarity_top_k: int = 2,
) -> VectorIndexRetriever:
# This way we support qdrant (using doc_ids) and chroma (using where clause)
# This way we support qdrant (using doc_ids) and the rest (using filters)
return VectorIndexRetriever(
index=index,
similarity_top_k=similarity_top_k,
doc_ids=context_filter.docs_ids if context_filter else None,
vector_store_kwargs={
"where": _chromadb_doc_id_metadata_filter(context_filter)
},
filters=_doc_id_metadata_filter(context_filter)
if self.settings.vectorstore.database != "qdrant"
else None,
)
def close(self) -> None:
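To make the retriever change concrete, here is a small hedged example of the filter structure `_doc_id_metadata_filter` builds for a couple of hypothetical document IDs (one `MetadataFilter` per `doc_id`, OR-combined, passed to `VectorIndexRetriever(filters=...)` for non-Qdrant stores):
```python
# Hedged illustration of the MetadataFilters structure used by the new helper.
from llama_index.core.vector_stores.types import (
    FilterCondition,
    MetadataFilter,
    MetadataFilters,
)

docs_ids = ["doc-1", "doc-2"]  # hypothetical document IDs
filters = MetadataFilters(
    filters=[MetadataFilter(key="doc_id", value=doc_id) for doc_id in docs_ids],
    condition=FilterCondition.OR,
)
print(filters)
```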

View File

@@ -102,18 +102,10 @@ class ChatService:
vector_index_retriever = self.vector_store_component.get_retriever(
index=self.index, context_filter=context_filter
)
# TODO ContextChatEngine is still not migrated by LlamaIndex to accept
# llm directly, so we are passing legacy ServiceContext until it is fixed.
from llama_index.core import ServiceContext
return ContextChatEngine.from_defaults(
system_prompt=system_prompt,
retriever=vector_index_retriever,
llm=self.llm_component.llm, # Takes no effect at the moment
service_context=ServiceContext.from_defaults(
llm=self.llm_component.llm,
embed_model=self.embedding_component.embedding_model,
),
node_postprocessors=[
MetadataReplacementPostProcessor(target_metadata_key="window"),
],

View File

@@ -81,9 +81,7 @@ class DataSettings(BaseModel):
class LLMSettings(BaseModel):
mode: Literal[
"llamacpp", "openai", "openailike", "sagemaker", "mock", "ollama", "tensorrt"
]
mode: Literal["llamacpp", "openai", "openailike", "sagemaker", "mock", "ollama"]
max_new_tokens: int = Field(
256,
description="The maximum number of token that the LLM is authorized to generate in one completion.",
@@ -122,22 +120,6 @@ class LlamaCPPSettings(BaseModel):
)
class TensorRTSettings(BaseModel):
model_path: str
engine_name: str
prompt_style: Literal["default", "llama2", "tag", "mistral", "chatml"] = Field(
"llama2",
description=(
"The prompt style to use for the chat engine. "
"If `default` - use the default prompt style from the llama_index. It should look like `role: message`.\n"
"If `llama2` - use the llama2 prompt style from the llama_index. Based on `<s>`, `[INST]` and `<<SYS>>`.\n"
"If `tag` - use the `tag` prompt style. It should look like `<|role|>: message`. \n"
"If `mistral` - use the `mistral prompt style. It shoudl look like <s>[INST] {System Prompt} [/INST]</s>[INST] { UserInstructions } [/INST]"
"`llama2` is the historic behaviour. `default` might work better with your custom models."
),
)
class HuggingFaceSettings(BaseModel):
embedding_hf_model_name: str = Field(
description="Name of the HuggingFace model to use for embeddings"
@@ -145,7 +127,7 @@ class HuggingFaceSettings(BaseModel):
class EmbeddingSettings(BaseModel):
mode: Literal["huggingface", "openai", "sagemaker", "mock"]
mode: Literal["huggingface", "openai", "sagemaker", "ollama", "mock"]
ingest_mode: Literal["simple", "batch", "parallel"] = Field(
"simple",
description=(
@@ -194,10 +176,14 @@ class OllamaSettings(BaseModel):
"http://localhost:11434",
description="Base URL of Ollama API. Example: 'https://localhost:11434'.",
)
model: str = Field(
llm_model: str = Field(
None,
description="Model to use. Example: 'llama2-uncensored'.",
)
embedding_model: str = Field(
None,
description="Model to use. Example: 'nomic-embed-text'.",
)
class UISettings(BaseModel):
@@ -314,7 +300,6 @@ class Settings(BaseModel):
llm: LLMSettings
embedding: EmbeddingSettings
llamacpp: LlamaCPPSettings
tensorrt: TensorRTSettings
huggingface: HuggingFaceSettings
sagemaker: SagemakerSettings
openai: OpenAISettings
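With the TensorRT settings gone and the Ollama block split into `llm_model` and `embedding_model`, a single `ollama:` section now configures both components; a hedged sketch of the resulting shape (illustrative class, field names from this diff, values mirroring settings-ollama.yaml):
```python
# Hedged sketch of the new Ollama settings shape; not the real OllamaSettings class.
from pydantic import BaseModel


class OllamaSettingsExample(BaseModel):
    api_base: str = "http://localhost:11434"
    llm_model: str = "mistral"                 # consumed by the LLM component
    embedding_model: str = "nomic-embed-text"  # consumed by the embeddings component


print(OllamaSettingsExample())
```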

View File

@@ -1,32 +1,32 @@
[tool.poetry]
name = "private-gpt"
version = "0.2.0"
version = "0.4.0"
description = "Private GPT"
authors = ["Zylon <hi@zylon.ai>"]
[tool.poetry.dependencies]
python = ">=3.10,<3.12"
python = ">=3.11,<3.12"
# PrivateGPT
fastapi = { extras = ["all"], version = "^0.110.0" }
python-multipart = "^0.0.9"
injector = "^0.21.0"
pyyaml = "^6.0.1"
watchdog = "^4.0.0"
transformers = "^4.38.1"
transformers = "^4.38.2"
# LlamaIndex core libs
llama-index-core = "^0.10.13"
llama-index-core = "^0.10.14"
llama-index-readers-file = "^0.1.6"
# Optional LlamaIndex integration libs
llama-index-llms-llama-cpp = {version = "^0.1.3", optional = true}
llama-index-llms-openai = {version = "^0.1.6", optional = true}
llama-index-llms-openai-like = {version ="^0.1.3", optional = true}
llama-index-llms-ollama = {version ="^0.1.2", optional = true}
llama-index-embeddings-ollama = {version ="^0.1.2", optional = true}
llama-index-embeddings-huggingface = {version ="^0.1.4", optional = true}
llama-index-embeddings-openai = {version ="^0.1.6", optional = true}
llama-index-vector-stores-qdrant = {version ="^0.1.3", optional = true}
llama-index-vector-stores-chroma = {version ="^0.1.4", optional = true}
llama-index-vector-stores-postgres = {version ="^0.1.2", optional = true}
llama-index-llms-nvidia-tensorrt = {version ="^0.1.2", optional = true}
# Optional Sagemaker dependency
boto3 = {version ="^1.34.51", optional = true}
# Optional UI
@@ -39,7 +39,7 @@ llms-openai = ["llama-index-llms-openai"]
llms-openai-like = ["llama-index-llms-openai-like"]
llms-ollama = ["llama-index-llms-ollama"]
llms-sagemaker = ["boto3"]
llms-nvidia-tensorrt = ["llama-index-llms-nvidia-tensorrt"]
embeddings-ollama = ["llama-index-embeddings-ollama"]
embeddings-huggingface = ["llama-index-embeddings-huggingface"]
embeddings-openai = ["llama-index-embeddings-openai"]
embeddings-sagemaker = ["boto3"]
@@ -47,6 +47,7 @@ vector-stores-qdrant = ["llama-index-vector-stores-qdrant"]
vector-stores-chroma = ["llama-index-vector-stores-chroma"]
vector-stores-postgres = ["llama-index-vector-stores-postgres"]
[tool.poetry.group.dev.dependencies]
black = "^22"
mypy = "^1.2"

View File

@@ -6,15 +6,13 @@ llm:
max_new_tokens: 512
context_window: 3900
ollama:
model: llama2
api_base: http://localhost:11434
embedding:
mode: huggingface
mode: ollama
huggingface:
embedding_hf_model_name: BAAI/bge-small-en-v1.5
ollama:
llm_model: mistral
embedding_model: nomic-embed-text
api_base: http://localhost:11434
vectorstore:
database: qdrant

View File

@@ -1,25 +0,0 @@
server:
env_name: ${APP_ENV:tensorrt}
llm:
mode: tensorrt
max_new_tokens: 512
context_window: 3900
tensorrt:
model_path: models/tensorrt
engine_name: llama_float16_tp1_rank0.engine
prompt_style: "llama2"
embedding:
mode: huggingface
huggingface:
embedding_hf_model_name: BAAI/bge-small-en-v1.5
vectorstore:
database: qdrant
qdrant:
path: local_data/private_gpt/qdrant

View File

@@ -78,9 +78,6 @@ openai:
model: gpt-3.5-turbo
ollama:
model: llama2-uncensored
tensorrt:
model_path: models/tensorrt
engine_name: llama_float16_tp1_rank0.engine
prompt_style: "llama2"
llm_model: llama2
embedding_model: nomic-embed-text
api_base: http://localhost:11434

View File

@@ -1 +1 @@
0.3.0
0.4.0