Compare commits

...

21 Commits
v0.6.0 ... main

Author SHA1 Message Date
Iván Martínez
b7ee43788d
Update README.md 2024-11-13 20:29:56 +01:00
meng-hui
940bdd49af
fix: 503 when private gpt gets ollama service (#2104)
When running private gpt with external ollama API, ollama service
returns 503 on startup because ollama service (traefik) might not be
ready.

- Add healthcheck to ollama service to test for connection to external
ollama
- private-gpt-ollama service depends on ollama being service_healthy

Co-authored-by: Koh Meng Hui <kohmh@duck.com>
2024-10-17 12:44:28 +02:00
Javier Martinez
5851b02378
feat: update llama-index + dependencies (#2092)
* chore: update libraries

* fix: mypy

* chore: more updates

* fix: mypy/black

* chore: fix docker warnings

* fix: mypy

* fix: black
2024-09-26 16:29:52 +02:00
Dmitri Qiu
5fbb402477
fix: Sanitize null bytes before ingestion (#2090)
* Sanitize null bytes before ingestion

* Added comments
2024-09-25 12:00:03 +02:00
J
fa3c30661d
fix: Add default mode option to settings (#2078)
* Add default mode option to settings

* Revise default_mode to Literal (enum) and add to settings.yaml

* Revise to pass make check/test

* Default mode: RAG

---------

Co-authored-by: Jason <jason@sowinsight.solutions>
2024-09-24 08:33:02 +02:00
Liam Dowd
f9182b3a86
feat: Adding MistralAI mode (#2065)
* Adding MistralAI mode

* Update embedding_component.py

* Update ui.py

* Update settings.py

* Update embedding_component.py

* Update settings.py

* Update settings.py

* Update settings-mistral.yaml

* Update llm_component.py

* Update settings-mistral.yaml

* Update settings.py

* Update settings.py

* Update ui.py

* Update embedding_component.py

* Delete settings-mistral.yaml

---------

Co-authored-by: SkiingIsFun123 <101684827+SkiingIsFun123@users.noreply.github.com>
Co-authored-by: Javier Martinez <javiermartinezalvarez98@gmail.com>
2024-09-24 08:31:30 +02:00
Javier Martinez
8c12c6830b
fix: docker permissions (#2059)
* fix: missing depends_on

* chore: update copy permissions

* chore: update entrypoint

* Revert "chore: update entrypoint"

This reverts commit f73a36af2f.

* Revert "chore: update copy permissions"

This reverts commit fabc3f66bb.

* style: fix docker warning

* fix: multiples fixes

* fix: user permissions writing local_data folder
2024-09-24 08:30:58 +02:00
Javier Martinez
77461b96cf
feat: add retry connection to ollama (#2084)
* feat: add retry connection to ollama

When Ollama is running in the docker-compose, traefik is not ready sometimes to route the request, and it fails

* fix: mypy
2024-09-16 16:43:05 +02:00
Trivikram Kamat
42628596b2
ci: bump actions/checkout to v4 (#2077) 2024-09-09 08:53:13 +02:00
Artur Martins
7603b3627d
fix: Rectify ffmpy poetry config; update version from 0.3.2 to 0.4.0 (#2062)
* Fix: Rectify ffmpy 0.3.2 poetry config

* keep optional set to false for ffmpy

* Updating ffmpy to version 0.4.0

* Remove comment about a fix
2024-08-21 10:39:58 +02:00
Javier Martinez
89477ea9d3
fix: naming image and ollama-cpu (#2056) 2024-08-12 08:23:16 +02:00
github-actions[bot]
22904ca8ad
chore(main): release 0.6.2 (#2049)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-08-08 18:16:41 +02:00
Javier Martinez
7fefe408b4
fix: auto-update version (#2052) 2024-08-08 16:50:42 +02:00
Javier Martinez
b1acf9dc2c
fix: publish image name (#2043) 2024-08-07 17:39:32 +02:00
Javier Martinez
4ca6d0cb55
fix: add numpy issue to troubleshooting (#2048)
* docs: add numpy issue to troubleshooting

* fix: troubleshooting link

...
2024-08-07 12:16:03 +02:00
Javier Martinez
b16abbefe4
fix: update matplotlib to 3.9.1-post1 to fix win install
* chore: block matplotlib to fix installation in window machines

* chore: remove workaround, just update poetry.lock

* fix: update matplotlib to last version
2024-08-07 11:26:42 +02:00
github-actions[bot]
ca2b8da69c
chore(main): release 0.6.1 (#2041)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-08-05 17:17:34 +02:00
Javier Martinez
f09f6dd255
fix: add built image from DockerHub (#2042)
* chore: update docker-compose with profiles

* docs: add quick start doc

* chore: generate docker release when new version is released

* chore: add dockerhub image in docker-compose

* docs: update quickstart with local/remote images

* chore: update docker tag

* chore: refactor dockerfile names

* chore: update docker-compose names

* docs: update llamacpp naming

* fix: naming

* docs: fix llamacpp command
2024-08-05 17:15:38 +02:00
Liam Dowd
1c665f7900
fix: Adding azopenai to model list (#2035)
Fixing the error I encountered while using the azopenai mode
2024-08-05 16:30:10 +02:00
Javier Martinez
1d4c14d7a3
fix(deploy): generate docker release when new version is released (#2038) 2024-08-05 16:28:19 +02:00
Javier Martinez
dae0727a1b
fix(deploy): improve Docker-Compose and quickstart on Docker (#2037)
* chore: update docker-compose with profiles

* docs: add quick start doc
2024-08-05 16:28:19 +02:00
37 changed files with 3338 additions and 2505 deletions

16
.docker/router.yml Normal file
View File

@ -0,0 +1,16 @@
http:
services:
ollama:
loadBalancer:
healthCheck:
interval: 5s
path: /
servers:
- url: http://ollama-cpu:11434
- url: http://ollama-cuda:11434
- url: http://host.docker.internal:11434
routers:
ollama-router:
rule: "PathPrefix(`/`)"
service: ollama

View File

@ -0,0 +1,19 @@
{
"$schema": "https://raw.githubusercontent.com/googleapis/release-please/main/schemas/config.json",
"release-type": "simple",
"version-file": "version.txt",
"extra-files": [
{
"type": "toml",
"path": "pyproject.toml",
"jsonpath": "$.tool.poetry.version"
},
{
"type": "generic",
"path": "docker-compose.yaml"
}
],
"packages": {
".": {}
}
}

View File

@ -0,0 +1,3 @@
{
".": "0.6.2"
}

View File

@ -1,45 +0,0 @@
name: docker
on:
release:
types: [ published ]
workflow_dispatch:
env:
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}
jobs:
build-and-push-image:
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Log in to the Container registry
uses: docker/login-action@v3
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Extract metadata (tags, labels) for Docker
id: meta
uses: docker/metadata-action@v5
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
tags: |
type=ref,event=branch
type=ref,event=pr
type=semver,pattern={{version}}
type=semver,pattern={{major}}.{{minor}}
type=sha
- name: Build and push Docker image
uses: docker/build-push-action@v5
with:
context: .
file: Dockerfile.external
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}

83
.github/workflows/generate-release.yml vendored Normal file
View File

@ -0,0 +1,83 @@
name: generate-release
on:
release:
types: [ published ]
workflow_dispatch:
env:
REGISTRY: docker.io
IMAGE_NAME: zylonai/private-gpt
platforms: linux/amd64,linux/arm64
DEFAULT_TYPE: "ollama"
jobs:
build-and-push-image:
runs-on: ubuntu-latest
strategy:
matrix:
type: [ llamacpp-cpu, ollama ]
permissions:
contents: read
packages: write
outputs:
version: ${{ steps.version.outputs.version }}
steps:
- name: Free Disk Space (Ubuntu)
uses: jlumbroso/free-disk-space@main
with:
tool-cache: false
android: true
dotnet: true
haskell: true
large-packages: true
docker-images: false
swap-storage: true
- name: Checkout repository
uses: actions/checkout@v4
- name: Set up QEMU
uses: docker/setup-qemu-action@v3
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Log in to Docker Hub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_PASSWORD }}
- name: Extract metadata (tags, labels) for Docker
id: meta
uses: docker/metadata-action@v5
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
tags: |
type=semver,pattern={{version}},enable=${{ matrix.type == env.DEFAULT_TYPE }}
type=semver,pattern={{version}}-${{ matrix.type }}
type=semver,pattern={{major}}.{{minor}},enable=${{ matrix.type == env.DEFAULT_TYPE }}
type=semver,pattern={{major}}.{{minor}}-${{ matrix.type }}
type=raw,value=latest,enable=${{ matrix.type == env.DEFAULT_TYPE }}
type=sha
flavor: |
latest=false
- name: Build and push Docker image
uses: docker/build-push-action@v6
with:
context: .
file: Dockerfile.${{ matrix.type }}
platforms: ${{ env.platforms }}
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
- name: Version output
id: version
run: echo "version=${{ steps.meta.outputs.version }}" >> "$GITHUB_OUTPUT"

View File

@ -13,7 +13,8 @@ jobs:
release-please:
runs-on: ubuntu-latest
steps:
- uses: google-github-actions/release-please-action@v3
- uses: google-github-actions/release-please-action@v4
id: release
with:
release-type: simple
version-file: version.txt
config-file: .github/release_please/.release-please-config.json
manifest-file: .github/release_please/.release-please-manifest.json

View File

@ -14,7 +14,7 @@ jobs:
setup:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- uses: ./.github/workflows/actions/install_dependencies
checks:
@ -28,7 +28,7 @@ jobs:
- ruff
- mypy
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- uses: ./.github/workflows/actions/install_dependencies
- name: run ${{ matrix.quality-command }}
run: make ${{ matrix.quality-command }}
@ -38,7 +38,7 @@ jobs:
runs-on: ubuntu-latest
name: test
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- uses: ./.github/workflows/actions/install_dependencies
- name: run test
run: make test-coverage

View File

@ -1,5 +1,25 @@
# Changelog
## [0.6.2](https://github.com/zylon-ai/private-gpt/compare/v0.6.1...v0.6.2) (2024-08-08)
### Bug Fixes
* add numpy issue to troubleshooting ([#2048](https://github.com/zylon-ai/private-gpt/issues/2048)) ([4ca6d0c](https://github.com/zylon-ai/private-gpt/commit/4ca6d0cb556be7a598f7d3e3b00d2a29214ee1e8))
* auto-update version ([#2052](https://github.com/zylon-ai/private-gpt/issues/2052)) ([7fefe40](https://github.com/zylon-ai/private-gpt/commit/7fefe408b4267684c6e3c1a43c5dc2b73ec61fe4))
* publish image name ([#2043](https://github.com/zylon-ai/private-gpt/issues/2043)) ([b1acf9d](https://github.com/zylon-ai/private-gpt/commit/b1acf9dc2cbca2047cd0087f13254ff5cda6e570))
* update matplotlib to 3.9.1-post1 to fix win install ([b16abbe](https://github.com/zylon-ai/private-gpt/commit/b16abbefe49527ac038d235659854b98345d5387))
## [0.6.1](https://github.com/zylon-ai/private-gpt/compare/v0.6.0...v0.6.1) (2024-08-05)
### Bug Fixes
* add built image from DockerHub ([#2042](https://github.com/zylon-ai/private-gpt/issues/2042)) ([f09f6dd](https://github.com/zylon-ai/private-gpt/commit/f09f6dd2553077d4566dbe6b48a450e05c2f049e))
* Adding azopenai to model list ([#2035](https://github.com/zylon-ai/private-gpt/issues/2035)) ([1c665f7](https://github.com/zylon-ai/private-gpt/commit/1c665f7900658144f62814b51f6e3434a6d7377f))
* **deploy:** generate docker release when new version is released ([#2038](https://github.com/zylon-ai/private-gpt/issues/2038)) ([1d4c14d](https://github.com/zylon-ai/private-gpt/commit/1d4c14d7a3c383c874b323d934be01afbaca899e))
* **deploy:** improve Docker-Compose and quickstart on Docker ([#2037](https://github.com/zylon-ai/private-gpt/issues/2037)) ([dae0727](https://github.com/zylon-ai/private-gpt/commit/dae0727a1b4abd35d2b0851fe30e0a4ed67e0fbb))
## [0.6.0](https://github.com/zylon-ai/private-gpt/compare/v0.5.0...v0.6.0) (2024-08-02)

View File

@ -1,6 +1,6 @@
### IMPORTANT, THIS IMAGE CAN ONLY BE RUN IN LINUX DOCKER
### You will run into a segfault in mac
FROM python:3.11.6-slim-bookworm as base
FROM python:3.11.6-slim-bookworm AS base
# Install poetry
RUN pip install pipx
@ -20,14 +20,14 @@ RUN apt update && apt install -y \
# https://python-poetry.org/docs/configuration/#virtualenvsin-project
ENV POETRY_VIRTUALENVS_IN_PROJECT=true
FROM base as dependencies
FROM base AS dependencies
WORKDIR /home/worker/app
COPY pyproject.toml poetry.lock ./
ARG POETRY_EXTRAS="ui embeddings-huggingface llms-llama-cpp vector-stores-qdrant"
RUN poetry install --no-root --extras "${POETRY_EXTRAS}"
FROM base as app
FROM base AS app
ENV PYTHONUNBUFFERED=1
ENV PORT=8080

View File

@ -1,4 +1,4 @@
FROM python:3.11.6-slim-bookworm as base
FROM python:3.11.6-slim-bookworm AS base
# Install poetry
RUN pip install pipx
@ -10,14 +10,14 @@ ENV PATH=".venv/bin/:$PATH"
# https://python-poetry.org/docs/configuration/#virtualenvsin-project
ENV POETRY_VIRTUALENVS_IN_PROJECT=true
FROM base as dependencies
FROM base AS dependencies
WORKDIR /home/worker/app
COPY pyproject.toml poetry.lock ./
ARG POETRY_EXTRAS="ui vector-stores-qdrant llms-ollama embeddings-ollama"
RUN poetry install --no-root --extras "${POETRY_EXTRAS}"
FROM base as app
FROM base AS app
ENV PYTHONUNBUFFERED=1
ENV PORT=8080
ENV APP_ENV=prod

View File

@ -1,4 +1,6 @@
# 🔒 PrivateGPT 📑
# PrivateGPT
<a href="https://trendshift.io/repositories/2601" target="_blank"><img src="https://trendshift.io/api/badge/repositories/2601" alt="imartinez%2FprivateGPT | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
[![Tests](https://github.com/zylon-ai/private-gpt/actions/workflows/tests.yml/badge.svg)](https://github.com/zylon-ai/private-gpt/actions/workflows/tests.yml?query=branch%3Amain)
[![Website](https://img.shields.io/website?up_message=check%20it&down_message=down&url=https%3A%2F%2Fdocs.privategpt.dev%2F&label=Documentation)](https://docs.privategpt.dev/)

View File

@ -1,19 +1,116 @@
services:
private-gpt:
#-----------------------------------
#---- Private-GPT services ---------
#-----------------------------------
# Private-GPT service for the Ollama CPU and GPU modes
# This service builds from an external Dockerfile and runs the Ollama mode.
private-gpt-ollama:
image: ${PGPT_IMAGE:-zylonai/private-gpt}:${PGPT_TAG:-0.6.2}-ollama # x-release-please-version
user: root
build:
dockerfile: Dockerfile.external
context: .
dockerfile: Dockerfile.ollama
volumes:
- ./local_data/:/home/worker/app/local_data
- ./local_data:/home/worker/app/local_data
ports:
- 8001:8001
- "8001:8001"
environment:
PORT: 8001
PGPT_PROFILES: docker
PGPT_MODE: ollama
PGPT_EMBED_MODE: ollama
PGPT_OLLAMA_API_BASE: http://ollama:11434
HF_TOKEN: ${HF_TOKEN:-}
profiles:
- ""
- ollama-cpu
- ollama-cuda
- ollama-api
depends_on:
ollama:
condition: service_healthy
# Private-GPT service for the local mode
# This service builds from a local Dockerfile and runs the application in local mode.
private-gpt-llamacpp-cpu:
image: ${PGPT_IMAGE:-zylonai/private-gpt}:${PGPT_TAG:-0.6.2}-llamacpp-cpu # x-release-please-version
user: root
build:
context: .
dockerfile: Dockerfile.llamacpp-cpu
volumes:
- ./local_data/:/home/worker/app/local_data
- ./models/:/home/worker/app/models
entrypoint: sh -c ".venv/bin/python scripts/setup && .venv/bin/python -m private_gpt"
ports:
- "8001:8001"
environment:
PORT: 8001
PGPT_PROFILES: local
HF_TOKEN: ${HF_TOKEN:-}
profiles:
- llamacpp-cpu
#-----------------------------------
#---- Ollama services --------------
#-----------------------------------
# Traefik reverse proxy for the Ollama service
# This will route requests to the Ollama service based on the profile.
ollama:
image: traefik:v2.10
healthcheck:
test: ["CMD", "sh", "-c", "wget -q --spider http://ollama:11434 || exit 1"]
interval: 10s
retries: 3
start_period: 5s
timeout: 5s
ports:
- "8080:8080"
command:
- "--providers.file.filename=/etc/router.yml"
- "--log.level=ERROR"
- "--api.insecure=true"
- "--providers.docker=true"
- "--providers.docker.exposedbydefault=false"
- "--entrypoints.web.address=:11434"
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
- ./.docker/router.yml:/etc/router.yml:ro
extra_hosts:
- "host.docker.internal:host-gateway"
profiles:
- ""
- ollama-cpu
- ollama-cuda
- ollama-api
# Ollama service for the CPU mode
ollama-cpu:
image: ollama/ollama:latest
ports:
- 11434:11434
- "11434:11434"
volumes:
- ./models:/root/.ollama
profiles:
- ""
- ollama-cpu
# Ollama service for the CUDA mode
ollama-cuda:
image: ollama/ollama:latest
ports:
- "11434:11434"
volumes:
- ./models:/root/.ollama
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
profiles:
- ollama-cuda

View File

@ -10,6 +10,9 @@ tabs:
overview:
display-name: Overview
icon: "fa-solid fa-home"
quickstart:
display-name: Quickstart
icon: "fa-solid fa-rocket"
installation:
display-name: Installation
icon: "fa-solid fa-download"
@ -32,6 +35,12 @@ navigation:
contents:
- page: Introduction
path: ./docs/pages/overview/welcome.mdx
- tab: quickstart
layout:
- section: Getting started
contents:
- page: Quickstart
path: ./docs/pages/quickstart/quickstart.mdx
# How to install PrivateGPT, with FAQ and troubleshooting
- tab: installation
layout:

View File

@ -307,11 +307,12 @@ If you have all required dependencies properly configured running the
following powershell command should succeed.
```powershell
$env:CMAKE_ARGS='-DLLAMA_CUBLAS=on'; poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
$env:CMAKE_ARGS='-DLLAMA_CUBLAS=on'; poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python numpy==1.26.0
```
If your installation was correct, you should see a message similar to the following next
time you start the server `BLAS = 1`.
time you start the server `BLAS = 1`. If there is some issue, please refer to the
[troubleshooting](/installation/getting-started/troubleshooting#building-llama-cpp-with-nvidia-gpu-support) section.
```console
llama_new_context_with_model: total VRAM used: 4857.93 MB (model: 4095.05 MB, context: 762.87 MB)
@ -339,11 +340,12 @@ Some tips:
After that running the following command in the repository will install llama.cpp with GPU support:
```bash
CMAKE_ARGS='-DLLAMA_CUBLAS=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
CMAKE_ARGS='-DLLAMA_CUBLAS=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python numpy==1.26.0
```
If your installation was correct, you should see a message similar to the following next
time you start the server `BLAS = 1`.
time you start the server `BLAS = 1`. If there is some issue, please refer to the
[troubleshooting](/installation/getting-started/troubleshooting#building-llama-cpp-with-nvidia-gpu-support) section.
```
llama_new_context_with_model: total VRAM used: 4857.93 MB (model: 4095.05 MB, context: 762.87 MB)

View File

@ -46,4 +46,19 @@ huggingface:
embedding:
embed_dim: 384
```
</Callout>
</Callout>
# Building Llama-cpp with NVIDIA GPU support
## Out-of-memory error
If you encounter an out-of-memory error while running `llama-cpp` with CUDA, you can try the following steps to resolve the issue:
1. **Set the next environment:**
```bash
TOKENIZERS_PARALLELISM=true
```
2. **Run PrivateGPT:**
```bash
poetry run python -m privategpt
```
Give thanks to [MarioRossiGithub](https://github.com/MarioRossiGithub) for providing the following solution.

View File

@ -0,0 +1,105 @@
This guide provides a quick start for running different profiles of PrivateGPT using Docker Compose.
The profiles cater to various environments, including Ollama setups (CPU, CUDA, MacOS), and a fully local setup.
By default, Docker Compose will download pre-built images from a remote registry when starting the services. However, you have the option to build the images locally if needed. Details on building Docker image locally are provided at the end of this guide.
If you want to run PrivateGPT locally without Docker, refer to the [Local Installation Guide](/installation).
## Prerequisites
- **Docker and Docker Compose:** Ensure both are installed on your system.
[Installation Guide for Docker](https://docs.docker.com/get-docker/), [Installation Guide for Docker Compose](https://docs.docker.com/compose/install/).
- **Clone PrivateGPT Repository:** Clone the PrivateGPT repository to your machine and navigate to the directory:
```sh
git clone https://github.com/zylon-ai/private-gpt.git
cd private-gpt
```
## Setups
### Ollama Setups (Recommended)
#### 1. Default/Ollama CPU
**Description:**
This profile runs the Ollama service using CPU resources. It is the standard configuration for running Ollama-based Private-GPT services without GPU acceleration.
**Run:**
To start the services using pre-built images, run:
```sh
docker-compose up
```
or with a specific profile:
```sh
docker-compose --profile ollama-cpu up
```
#### 2. Ollama Nvidia CUDA
**Description:**
This profile leverages GPU acceleration with CUDA support, suitable for computationally intensive tasks that benefit from GPU resources.
**Requirements:**
Ensure that your system has compatible GPU hardware and the necessary NVIDIA drivers installed. The installation process is detailed [here](https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html).
**Run:**
To start the services with CUDA support using pre-built images, run:
```sh
docker-compose --profile ollama-cuda up
```
#### 3. Ollama External API
**Description:**
This profile is designed for running PrivateGPT using Ollama installed on the host machine. This setup is particularly useful for MacOS users, as Docker does not yet support Metal GPU.
**Requirements:**
Install Ollama on your machine by following the instructions at [ollama.ai](https://ollama.ai/).
**Run:**
To start the Ollama service, use:
```sh
OLLAMA_HOST=0.0.0.0 ollama serve
```
To start the services with the host configuration using pre-built images, run:
```sh
docker-compose --profile ollama-api up
```
### Fully Local Setups
#### 1. LlamaCPP CPU
**Description:**
This profile runs the Private-GPT services locally using `llama-cpp` and Hugging Face models.
**Requirements:**
A **Hugging Face Token (HF_TOKEN)** is required for accessing Hugging Face models. Obtain your token following [this guide](/installation/getting-started/troubleshooting#downloading-gated-and-private-models).
**Run:**
Start the services with your Hugging Face token using pre-built images:
```sh
HF_TOKEN=<your_hf_token> docker-compose --profile llamacpp-cpu up
```
Replace `<your_hf_token>` with your actual Hugging Face token.
## Building Locally
If you prefer to build Docker images locally, which is useful when making changes to the codebase or the Dockerfiles, follow these steps:
### Building Locally
To build the Docker images locally, navigate to the cloned repository directory and run:
```sh
docker-compose build
```
This command compiles the necessary Docker images based on the current codebase and Dockerfile configurations.
### Forcing a Rebuild with --build
If you have made changes and need to ensure these changes are reflected in the Docker images, you can force a rebuild before starting the services:
```sh
docker-compose up --build
```
or with a specific profile:
```sh
docker-compose --profile <profile_name> up --build
```
Replace `<profile_name>` with the desired profile.

5072
poetry.lock generated

File diff suppressed because it is too large Load Diff

View File

@ -144,6 +144,23 @@ class EmbeddingComponent:
api_key=settings.gemini.api_key,
model_name=settings.gemini.embedding_model,
)
case "mistralai":
try:
from llama_index.embeddings.mistralai import ( # type: ignore
MistralAIEmbedding,
)
except ImportError as e:
raise ImportError(
"Mistral dependencies not found, install with `poetry install --extras embeddings-mistral`"
) from e
api_key = settings.openai.api_key
model = settings.openai.embedding_model
self.embedding_model = MistralAIEmbedding(
api_key=api_key,
model=model,
)
case "mock":
# Not a random number, is the dimensionality used by
# the default embedding model

View File

@ -403,7 +403,7 @@ class PipelineIngestComponent(BaseIngestComponentWithIndex):
self.transformations,
show_progress=self.show_progress,
)
self.node_q.put(("process", file_name, documents, nodes))
self.node_q.put(("process", file_name, documents, list(nodes)))
finally:
self.doc_semaphore.release()
self.doc_q.task_done() # unblock Q joins

View File

@ -92,7 +92,13 @@ class IngestionHelper:
return string_reader.load_data([file_data.read_text()])
logger.debug("Specific reader found for extension=%s", extension)
return reader_cls().load_data(file_data)
documents = reader_cls().load_data(file_data)
# Sanitize NUL bytes in text which can't be stored in Postgres
for i in range(len(documents)):
documents[i].text = documents[i].text.replace("\u0000", "")
return documents
@staticmethod
def _exclude_metadata(documents: list[Document]) -> None:

View File

@ -120,7 +120,6 @@ class LLMComponent:
api_version="",
temperature=settings.llm.temperature,
context_window=settings.llm.context_window,
max_new_tokens=settings.llm.max_new_tokens,
messages_to_prompt=prompt_style.messages_to_prompt,
completion_to_prompt=prompt_style.completion_to_prompt,
tokenizer=settings.llm.tokenizer,
@ -184,10 +183,10 @@ class LLMComponent:
return wrapper
Ollama.chat = add_keep_alive(Ollama.chat)
Ollama.stream_chat = add_keep_alive(Ollama.stream_chat)
Ollama.complete = add_keep_alive(Ollama.complete)
Ollama.stream_complete = add_keep_alive(Ollama.stream_complete)
Ollama.chat = add_keep_alive(Ollama.chat) # type: ignore
Ollama.stream_chat = add_keep_alive(Ollama.stream_chat) # type: ignore
Ollama.complete = add_keep_alive(Ollama.complete) # type: ignore
Ollama.stream_complete = add_keep_alive(Ollama.stream_complete) # type: ignore
self.llm = llm

View File

@ -40,7 +40,8 @@ class AbstractPromptStyle(abc.ABC):
logger.debug("Got for messages='%s' the prompt='%s'", messages, prompt)
return prompt
def completion_to_prompt(self, completion: str) -> str:
def completion_to_prompt(self, prompt: str) -> str:
completion = prompt # Fix: Llama-index parameter has to be named as prompt
prompt = self._completion_to_prompt(completion)
logger.debug("Got for completion='%s' the prompt='%s'", completion, prompt)
return prompt
@ -285,8 +286,9 @@ class ChatMLPromptStyle(AbstractPromptStyle):
def get_prompt_style(
prompt_style: Literal["default", "llama2", "llama3", "tag", "mistral", "chatml"]
| None
prompt_style: (
Literal["default", "llama2", "llama3", "tag", "mistral", "chatml"] | None
)
) -> AbstractPromptStyle:
"""Get the prompt style to use from the given string.

View File

@ -38,10 +38,10 @@ class NodeStoreComponent:
case "postgres":
try:
from llama_index.core.storage.docstore.postgres_docstore import (
from llama_index.storage.docstore.postgres import ( # type: ignore
PostgresDocumentStore,
)
from llama_index.core.storage.index_store.postgres_index_store import (
from llama_index.storage.index_store.postgres import ( # type: ignore
PostgresIndexStore,
)
except ImportError:
@ -55,6 +55,7 @@ class NodeStoreComponent:
self.index_store = PostgresIndexStore.from_params(
**settings.postgres.model_dump(exclude_none=True)
)
self.doc_store = PostgresDocumentStore.from_params(
**settings.postgres.model_dump(exclude_none=True)
)

View File

@ -1,14 +1,17 @@
from collections.abc import Generator
from typing import Any
from collections.abc import Generator, Sequence
from typing import TYPE_CHECKING, Any
from llama_index.core.schema import BaseNode, MetadataMode
from llama_index.core.vector_stores.utils import node_to_metadata_dict
from llama_index.vector_stores.chroma import ChromaVectorStore # type: ignore
if TYPE_CHECKING:
from collections.abc import Mapping
def chunk_list(
lst: list[BaseNode], max_chunk_size: int
) -> Generator[list[BaseNode], None, None]:
lst: Sequence[BaseNode], max_chunk_size: int
) -> Generator[Sequence[BaseNode], None, None]:
"""Yield successive max_chunk_size-sized chunks from lst.
Args:
@ -60,7 +63,7 @@ class BatchedChromaVectorStore(ChromaVectorStore): # type: ignore
)
self.chroma_client = chroma_client
def add(self, nodes: list[BaseNode], **add_kwargs: Any) -> list[str]:
def add(self, nodes: Sequence[BaseNode], **add_kwargs: Any) -> list[str]:
"""Add nodes to index, batching the insertion to avoid issues.
Args:
@ -78,8 +81,8 @@ class BatchedChromaVectorStore(ChromaVectorStore): # type: ignore
all_ids = []
for node_chunk in node_chunks:
embeddings = []
metadatas = []
embeddings: list[Sequence[float]] = []
metadatas: list[Mapping[str, Any]] = []
ids = []
documents = []
for node in node_chunk:

View File

@ -1,4 +1,5 @@
from dataclasses import dataclass
from typing import TYPE_CHECKING
from injector import inject, singleton
from llama_index.core.chat_engine import ContextChatEngine, SimpleChatEngine
@ -26,6 +27,9 @@ from private_gpt.open_ai.extensions.context_filter import ContextFilter
from private_gpt.server.chunks.chunks_service import Chunk
from private_gpt.settings.settings import Settings
if TYPE_CHECKING:
from llama_index.core.postprocessor.types import BaseNodePostprocessor
class Completion(BaseModel):
response: str
@ -114,12 +118,15 @@ class ChatService:
context_filter=context_filter,
similarity_top_k=self.settings.rag.similarity_top_k,
)
node_postprocessors = [
node_postprocessors: list[BaseNodePostprocessor] = [
MetadataReplacementPostProcessor(target_metadata_key="window"),
SimilarityPostprocessor(
similarity_cutoff=settings.rag.similarity_value
),
]
if settings.rag.similarity_value:
node_postprocessors.append(
SimilarityPostprocessor(
similarity_cutoff=settings.rag.similarity_value
)
)
if settings.rag.rerank.enabled:
rerank_postprocessor = SentenceTransformerRerank(

View File

@ -90,9 +90,9 @@ class SummarizeService:
# Add context documents to summarize
if use_context:
# 1. Recover all ref docs
ref_docs: dict[
str, RefDocInfo
] | None = self.storage_context.docstore.get_all_ref_doc_info()
ref_docs: dict[str, RefDocInfo] | None = (
self.storage_context.docstore.get_all_ref_doc_info()
)
if ref_docs is None:
raise ValueError("No documents have been ingested yet.")

View File

@ -136,19 +136,19 @@ class LLMSettings(BaseModel):
0.1,
description="The temperature of the model. Increasing the temperature will make the model answer more creatively. A value of 0.1 would be more factual.",
)
prompt_style: Literal[
"default", "llama2", "llama3", "tag", "mistral", "chatml"
] = Field(
"llama2",
description=(
"The prompt style to use for the chat engine. "
"If `default` - use the default prompt style from the llama_index. It should look like `role: message`.\n"
"If `llama2` - use the llama2 prompt style from the llama_index. Based on `<s>`, `[INST]` and `<<SYS>>`.\n"
"If `llama3` - use the llama3 prompt style from the llama_index."
"If `tag` - use the `tag` prompt style. It should look like `<|role|>: message`. \n"
"If `mistral` - use the `mistral prompt style. It shoudl look like <s>[INST] {System Prompt} [/INST]</s>[INST] { UserInstructions } [/INST]"
"`llama2` is the historic behaviour. `default` might work better with your custom models."
),
prompt_style: Literal["default", "llama2", "llama3", "tag", "mistral", "chatml"] = (
Field(
"llama2",
description=(
"The prompt style to use for the chat engine. "
"If `default` - use the default prompt style from the llama_index. It should look like `role: message`.\n"
"If `llama2` - use the llama2 prompt style from the llama_index. Based on `<s>`, `[INST]` and `<<SYS>>`.\n"
"If `llama3` - use the llama3 prompt style from the llama_index."
"If `tag` - use the `tag` prompt style. It should look like `<|role|>: message`. \n"
"If `mistral` - use the `mistral prompt style. It shoudl look like <s>[INST] {System Prompt} [/INST]</s>[INST] { UserInstructions } [/INST]"
"`llama2` is the historic behaviour. `default` might work better with your custom models."
),
)
)
@ -197,7 +197,14 @@ class HuggingFaceSettings(BaseModel):
class EmbeddingSettings(BaseModel):
mode: Literal[
"huggingface", "openai", "azopenai", "sagemaker", "ollama", "mock", "gemini"
"huggingface",
"openai",
"azopenai",
"sagemaker",
"ollama",
"mock",
"gemini",
"mistralai",
]
ingest_mode: Literal["simple", "batch", "parallel", "pipeline"] = Field(
"simple",
@ -350,6 +357,10 @@ class AzureOpenAISettings(BaseModel):
class UISettings(BaseModel):
enabled: bool
path: str
default_mode: Literal["RAG", "Search", "Basic", "Summarize"] = Field(
"RAG",
description="The default mode.",
)
default_chat_system_prompt: str = Field(
None,
description="The default system prompt to use for the chat mode.",

View File

@ -1,4 +1,5 @@
"""This file should be imported if and only if you want to run the UI locally."""
import base64
import logging
import time
@ -99,8 +100,11 @@ class PrivateGptUi:
self._selected_filename = None
# Initialize system prompt based on default mode
self.mode = MODES[0]
self._system_prompt = self._get_default_system_prompt(self.mode)
default_mode_map = {mode.value: mode for mode in Modes}
self._default_mode = default_mode_map.get(
settings().ui.default_mode, Modes.RAG_MODE
)
self._system_prompt = self._get_default_system_prompt(self._default_mode)
def _chat(
self, message: str, history: list[list[str]], mode: Modes, *_: Any
@ -390,7 +394,7 @@ class PrivateGptUi:
with gr.Row(equal_height=False):
with gr.Column(scale=3):
default_mode = MODES[0]
default_mode = self._default_mode
mode = gr.Radio(
[mode.value for mode in MODES],
label="Mode",
@ -519,6 +523,7 @@ class PrivateGptUi:
"llamacpp": config_settings.llamacpp.llm_hf_model_file,
"openai": config_settings.openai.model,
"openailike": config_settings.openai.model,
"azopenai": config_settings.azopenai.llm_model,
"sagemaker": config_settings.sagemaker.llm_endpoint_name,
"mock": llm_mode,
"ollama": config_settings.ollama.llm_model,

View File

@ -3,10 +3,13 @@ from collections import deque
from collections.abc import Iterator, Mapping
from typing import Any
from httpx import ConnectError
from tqdm import tqdm # type: ignore
from private_gpt.utils.retry import retry
try:
from ollama import Client # type: ignore
from ollama import Client, ResponseError # type: ignore
except ImportError as e:
raise ImportError(
"Ollama dependencies not found, install with `poetry install --extras llms-ollama or embeddings-ollama`"
@ -14,13 +17,25 @@ except ImportError as e:
logger = logging.getLogger(__name__)
_MAX_RETRIES = 5
_JITTER = (3.0, 10.0)
@retry(
is_async=False,
exceptions=(ConnectError, ResponseError),
tries=_MAX_RETRIES,
jitter=_JITTER,
logger=logger,
)
def check_connection(client: Client) -> bool:
try:
client.list()
return True
except (ConnectError, ResponseError) as e:
raise e
except Exception as e:
logger.error(f"Failed to connect to Ollama: {e!s}")
logger.error(f"Failed to connect to Ollama: {type(e).__name__}: {e!s}")
return False

View File

@ -0,0 +1,31 @@
import logging
from collections.abc import Callable
from typing import Any
from retry_async import retry as retry_untyped # type: ignore
retry_logger = logging.getLogger(__name__)
def retry(
exceptions: Any = Exception,
*,
is_async: bool = False,
tries: int = -1,
delay: float = 0,
max_delay: float | None = None,
backoff: float = 1,
jitter: float | tuple[float, float] = 0,
logger: logging.Logger = retry_logger,
) -> Callable[..., Any]:
wrapped = retry_untyped(
exceptions=exceptions,
is_async=is_async,
tries=tries,
delay=delay,
max_delay=max_delay,
backoff=backoff,
jitter=jitter,
logger=logger,
)
return wrapped # type: ignore

View File

@ -1,88 +1,81 @@
[tool.poetry]
name = "private-gpt"
version = "0.6.0"
version = "0.6.2"
description = "Private GPT"
authors = ["Zylon <hi@zylon.ai>"]
[tool.poetry.dependencies]
python = ">=3.11,<3.12"
# PrivateGPT
fastapi = { extras = ["all"], version = "^0.111.0" }
python-multipart = "^0.0.9"
injector = "^0.21.0"
pyyaml = "^6.0.1"
fastapi = { extras = ["all"], version = "^0.115.0" }
python-multipart = "^0.0.10"
injector = "^0.22.0"
pyyaml = "^6.0.2"
watchdog = "^4.0.1"
transformers = "^4.42.3"
transformers = "^4.44.2"
docx2txt = "^0.8"
cryptography = "^3.1"
# LlamaIndex core libs
llama-index-core = "^0.10.52"
llama-index-readers-file = "^0.1.27"
llama-index-core = ">=0.11.2,<0.12.0"
llama-index-readers-file = "*"
# Optional LlamaIndex integration libs
llama-index-llms-llama-cpp = {version = "^0.1.4", optional = true}
llama-index-llms-openai = {version = "^0.1.25", optional = true}
llama-index-llms-openai-like = {version ="^0.1.3", optional = true}
llama-index-llms-ollama = {version ="^0.2.2", optional = true}
llama-index-llms-azure-openai = {version ="^0.1.8", optional = true}
llama-index-llms-gemini = {version ="^0.1.11", optional = true}
llama-index-embeddings-ollama = {version ="^0.1.2", optional = true}
llama-index-embeddings-huggingface = {version ="^0.2.2", optional = true}
llama-index-embeddings-openai = {version ="^0.1.10", optional = true}
llama-index-embeddings-azure-openai = {version ="^0.1.10", optional = true}
llama-index-embeddings-gemini = {version ="^0.1.8", optional = true}
llama-index-vector-stores-qdrant = {version ="^0.2.10", optional = true}
llama-index-vector-stores-milvus = {version ="^0.1.20", optional = true}
llama-index-vector-stores-chroma = {version ="^0.1.10", optional = true}
llama-index-vector-stores-postgres = {version ="^0.1.11", optional = true}
llama-index-vector-stores-clickhouse = {version ="^0.1.3", optional = true}
llama-index-storage-docstore-postgres = {version ="^0.1.3", optional = true}
llama-index-storage-index-store-postgres = {version ="^0.1.4", optional = true}
llama-index-llms-llama-cpp = {version = "*", optional = true}
llama-index-llms-openai = {version ="*", optional = true}
llama-index-llms-openai-like = {version ="*", optional = true}
llama-index-llms-ollama = {version ="*", optional = true}
llama-index-llms-azure-openai = {version ="*", optional = true}
llama-index-llms-gemini = {version ="*", optional = true}
llama-index-embeddings-ollama = {version ="*", optional = true}
llama-index-embeddings-huggingface = {version ="*", optional = true}
llama-index-embeddings-openai = {version ="*", optional = true}
llama-index-embeddings-azure-openai = {version ="*", optional = true}
llama-index-embeddings-gemini = {version ="*", optional = true}
llama-index-embeddings-mistralai = {version ="*", optional = true}
llama-index-vector-stores-qdrant = {version ="*", optional = true}
llama-index-vector-stores-milvus = {version ="*", optional = true}
llama-index-vector-stores-chroma = {version ="*", optional = true}
llama-index-vector-stores-postgres = {version ="*", optional = true}
llama-index-vector-stores-clickhouse = {version ="*", optional = true}
llama-index-storage-docstore-postgres = {version ="*", optional = true}
llama-index-storage-index-store-postgres = {version ="*", optional = true}
# Postgres
psycopg2-binary = {version ="^2.9.9", optional = true}
asyncpg = {version="^0.29.0", optional = true}
# ClickHouse
clickhouse-connect = {version = "^0.7.15", optional = true}
clickhouse-connect = {version = "^0.7.19", optional = true}
# Optional Sagemaker dependency
boto3 = {version ="^1.34.139", optional = true}
# Optional Qdrant client
qdrant-client = {version ="^1.9.0", optional = true}
boto3 = {version ="^1.35.26", optional = true}
# Optional Reranker dependencies
torch = {version ="^2.3.1", optional = true}
sentence-transformers = {version ="^3.0.1", optional = true}
torch = {version ="^2.4.1", optional = true}
sentence-transformers = {version ="^3.1.1", optional = true}
# Optional UI
gradio = {version ="^4.37.2", optional = true}
# Fix: https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/16289#issuecomment-2255106490
ffmpy = {git = "https://github.com/EuDs63/ffmpy.git", rev = "333a19ee4d21f32537c0508aa1942ef1aa7afe24", optional = true}
# Optional Google Gemini dependency
google-generativeai = {version ="^0.5.4", optional = true}
# Optional Ollama client
ollama = {version ="^0.3.0", optional = true}
gradio = {version ="^4.44.0", optional = true}
ffmpy = {version ="^0.4.0", optional = true}
# Optional HF Transformers
einops = {version = "^0.8.0", optional = true}
retry-async = "^0.1.4"
[tool.poetry.extras]
ui = ["gradio", "ffmpy"]
llms-llama-cpp = ["llama-index-llms-llama-cpp"]
llms-openai = ["llama-index-llms-openai"]
llms-openai-like = ["llama-index-llms-openai-like"]
llms-ollama = ["llama-index-llms-ollama", "ollama"]
llms-ollama = ["llama-index-llms-ollama"]
llms-sagemaker = ["boto3"]
llms-azopenai = ["llama-index-llms-azure-openai"]
llms-gemini = ["llama-index-llms-gemini", "google-generativeai"]
embeddings-ollama = ["llama-index-embeddings-ollama", "ollama"]
llms-gemini = ["llama-index-llms-gemini"]
embeddings-ollama = ["llama-index-embeddings-ollama"]
embeddings-huggingface = ["llama-index-embeddings-huggingface", "einops"]
embeddings-openai = ["llama-index-embeddings-openai"]
embeddings-sagemaker = ["boto3"]
embeddings-azopenai = ["llama-index-embeddings-azure-openai"]
embeddings-gemini = ["llama-index-embeddings-gemini"]
embeddings-mistral = ["llama-index-embeddings-mistralai"]
vector-stores-qdrant = ["llama-index-vector-stores-qdrant"]
vector-stores-clickhouse = ["llama-index-vector-stores-clickhouse", "clickhouse_connect"]
vector-stores-chroma = ["llama-index-vector-stores-chroma"]
@ -92,14 +85,14 @@ storage-nodestore-postgres = ["llama-index-storage-docstore-postgres","llama-ind
rerank-sentence-transformers = ["torch", "sentence-transformers"]
[tool.poetry.group.dev.dependencies]
black = "^22"
mypy = "^1.2"
pre-commit = "^2"
pytest = "^7"
pytest-cov = "^3"
black = "^24"
mypy = "^1.11"
pre-commit = "^3"
pytest = "^8"
pytest-cov = "^5"
ruff = "^0"
pytest-asyncio = "^0.21.1"
types-pyyaml = "^6.0.12.12"
pytest-asyncio = "^0.24.0"
types-pyyaml = "^6.0.12.20240917"
[build-system]
requires = ["poetry-core>=1.0.0"]

View File

@ -25,21 +25,23 @@ data:
ui:
enabled: true
path: /
# "RAG", "Search", "Basic", or "Summarize"
default_mode: "RAG"
default_chat_system_prompt: >
You are a helpful, respectful and honest assistant.
You are a helpful, respectful and honest assistant.
Always answer as helpfully as possible and follow ALL given instructions.
Do not speculate or make up information.
Do not reference any given instructions or context.
default_query_system_prompt: >
You can only answer questions about the provided context.
If you know the answer but it is not based in the provided context, don't provide
You can only answer questions about the provided context.
If you know the answer but it is not based in the provided context, don't provide
the answer, just state the answer is not in the context provided.
default_summarization_system_prompt: >
Provide a comprehensive summary of the provided context information.
Provide a comprehensive summary of the provided context information.
The summary should cover all the key points and main ideas presented in
the original text, while also condensing the information into a concise
the original text, while also condensing the information into a concise
and easy-to-understand format. Please ensure that the summary includes
relevant details and examples that support the main ideas, while avoiding
relevant details and examples that support the main ideas, while avoiding
any unnecessary information or repetition.
delete_file_button_enabled: true
delete_all_files_button_enabled: true

View File

@ -5,7 +5,7 @@ from private_gpt.launcher import create_app
from tests.fixtures.mock_injector import MockInjector
@pytest.fixture()
@pytest.fixture
def test_client(request: pytest.FixtureRequest, injector: MockInjector) -> TestClient:
if request is not None and hasattr(request, "param"):
injector.bind_settings(request.param or {})

View File

@ -19,6 +19,6 @@ class IngestHelper:
return ingest_result
@pytest.fixture()
@pytest.fixture
def ingest_helper(test_client: TestClient) -> IngestHelper:
return IngestHelper(test_client)

View File

@ -37,6 +37,6 @@ class MockInjector:
return self.test_injector.get(interface)
@pytest.fixture()
@pytest.fixture
def injector() -> MockInjector:
return MockInjector()

View File

@ -6,7 +6,7 @@ import pytest
from fastapi.testclient import TestClient
@pytest.fixture()
@pytest.fixture
def file_path() -> str:
return "test.txt"

View File

@ -1 +1 @@
0.6.0
0.6.2