Compare commits

...

40 Commits
0.1.1 ... main

Author SHA1 Message Date
Iván Martínez
b7ee43788d
Update README.md 2024-11-13 20:29:56 +01:00
meng-hui
940bdd49af
fix: 503 when private gpt gets ollama service (#2104)
When running private-gpt with an external Ollama API, the ollama service returns 503 on startup because the ollama service (traefik) might not be ready yet.

- Add a healthcheck to the ollama service to test the connection to the external ollama
- The private-gpt-ollama service now depends on ollama being service_healthy

Co-authored-by: Koh Meng Hui <kohmh@duck.com>
2024-10-17 12:44:28 +02:00
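For reference, the compose change described above boils down to the following fragment (condensed from the docker-compose.yaml diff further down this page):

```yaml
# docker-compose.yaml (condensed): healthcheck on the ollama proxy service,
# plus a dependency so private-gpt-ollama only starts once ollama is healthy
services:
  ollama:
    healthcheck:
      test: ["CMD", "sh", "-c", "wget -q --spider http://ollama:11434 || exit 1"]
      interval: 10s
      retries: 3
      start_period: 5s
      timeout: 5s
  private-gpt-ollama:
    depends_on:
      ollama:
        condition: service_healthy
```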
Javier Martinez
5851b02378
feat: update llama-index + dependencies (#2092)
* chore: update libraries

* fix: mypy

* chore: more updates

* fix: mypy/black

* chore: fix docker warnings

* fix: mypy

* fix: black
2024-09-26 16:29:52 +02:00
Dmitri Qiu
5fbb402477
fix: Sanitize null bytes before ingestion (#2090)
* Sanitize null bytes before ingestion

* Added comments
2024-09-25 12:00:03 +02:00
J
fa3c30661d
fix: Add default mode option to settings (#2078)
* Add default mode option to settings

* Revise default_mode to Literal (enum) and add to settings.yaml

* Revise to pass make check/test

* Default mode: RAG

---------

Co-authored-by: Jason <jason@sowinsight.solutions>
2024-09-24 08:33:02 +02:00
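As a rough illustration, the resulting setting looks something like the sketch below (assuming `default_mode` lives under the `ui` section of settings.yaml; the exact location and allowed values may differ in your version):

```yaml
# settings.yaml (sketch): choose the mode the UI starts in
ui:
  default_mode: "RAG"  # e.g. RAG, Search, Basic or Summarize
```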
Liam Dowd
f9182b3a86
feat: Adding MistralAI mode (#2065)
* Adding MistralAI mode

* Update embedding_component.py

* Update ui.py

* Update settings.py

* Update embedding_component.py

* Update settings.py

* Update settings.py

* Update settings-mistral.yaml

* Update llm_component.py

* Update settings-mistral.yaml

* Update settings.py

* Update settings.py

* Update ui.py

* Update embedding_component.py

* Delete settings-mistral.yaml

---------

Co-authored-by: SkiingIsFun123 <101684827+SkiingIsFun123@users.noreply.github.com>
Co-authored-by: Javier Martinez <javiermartinezalvarez98@gmail.com>
2024-09-24 08:31:30 +02:00
Javier Martinez
8c12c6830b
fix: docker permissions (#2059)
* fix: missing depends_on

* chore: update copy permissions

* chore: update entrypoint

* Revert "chore: update entrypoint"

This reverts commit f73a36af2f.

* Revert "chore: update copy permissions"

This reverts commit fabc3f66bb.

* style: fix docker warning

* fix: multiple fixes

* fix: user permissions writing local_data folder
2024-09-24 08:30:58 +02:00
Javier Martinez
77461b96cf
feat: add retry connection to ollama (#2084)
* feat: add retry connection to ollama

When Ollama is running in the docker-compose setup, traefik is sometimes not ready to route the request, and it fails

* fix: mypy
2024-09-16 16:43:05 +02:00
Trivikram Kamat
42628596b2
ci: bump actions/checkout to v4 (#2077) 2024-09-09 08:53:13 +02:00
Artur Martins
7603b3627d
fix: Rectify ffmpy poetry config; update version from 0.3.2 to 0.4.0 (#2062)
* Fix: Rectify ffmpy 0.3.2 poetry config

* keep optional set to false for ffmpy

* Updating ffmpy to version 0.4.0

* Remove comment about a fix
2024-08-21 10:39:58 +02:00
Javier Martinez
89477ea9d3
fix: naming image and ollama-cpu (#2056) 2024-08-12 08:23:16 +02:00
github-actions[bot]
22904ca8ad
chore(main): release 0.6.2 (#2049)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-08-08 18:16:41 +02:00
Javier Martinez
7fefe408b4
fix: auto-update version (#2052) 2024-08-08 16:50:42 +02:00
Javier Martinez
b1acf9dc2c
fix: publish image name (#2043) 2024-08-07 17:39:32 +02:00
Javier Martinez
4ca6d0cb55
fix: add numpy issue to troubleshooting (#2048)
* docs: add numpy issue to troubleshooting

* fix: troubleshooting link

...
2024-08-07 12:16:03 +02:00
Javier Martinez
b16abbefe4
fix: update matplotlib to 3.9.1-post1 to fix win install
* chore: block matplotlib to fix installation on Windows machines

* chore: remove workaround, just update poetry.lock

* fix: update matplotlib to last version
2024-08-07 11:26:42 +02:00
github-actions[bot]
ca2b8da69c
chore(main): release 0.6.1 (#2041)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-08-05 17:17:34 +02:00
Javier Martinez
f09f6dd255
fix: add built image from DockerHub (#2042)
* chore: update docker-compose with profiles

* docs: add quick start doc

* chore: generate docker release when new version is released

* chore: add dockerhub image in docker-compose

* docs: update quickstart with local/remote images

* chore: update docker tag

* chore: refactor dockerfile names

* chore: update docker-compose names

* docs: update llamacpp naming

* fix: naming

* docs: fix llamacpp command
2024-08-05 17:15:38 +02:00
Liam Dowd
1c665f7900
fix: Adding azopenai to model list (#2035)
Fixing the error I encountered while using the azopenai mode
2024-08-05 16:30:10 +02:00
Javier Martinez
1d4c14d7a3
fix(deploy): generate docker release when new version is released (#2038) 2024-08-05 16:28:19 +02:00
Javier Martinez
dae0727a1b
fix(deploy): improve Docker-Compose and quickstart on Docker (#2037)
* chore: update docker-compose with profiles

* docs: add quick start doc
2024-08-05 16:28:19 +02:00
github-actions[bot]
6674b46fea
chore(main): release 0.6.0 (#1834)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-08-02 11:28:22 +02:00
Javier Martinez
e44a7f5773
chore: bump version (#2033) 2024-08-02 11:26:03 +02:00
Javier Martinez
cf61bf780f
feat(llm): add progress bar when ollama is pulling models (#2031)
* fix: add ollama progress bar when pulling models

* feat: add ollama queue

* fix: mypy
2024-08-01 19:14:26 +02:00
Javier Martinez
50b3027a24
docs: update docs and capture (#2029)
* docs: update Readme

* style: refactor image

* docs: change important to tip
2024-08-01 10:01:22 +02:00
Javier Martinez
54659588b5
fix: nomic embeddings (#2030)
* fix: allow to configure trust_remote_code

based on: https://github.com/zylon-ai/private-gpt/issues/1893#issuecomment-2118629391

* fix: nomic hf embeddings
2024-08-01 09:43:30 +02:00
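A minimal sketch of the option this commit exposes is shown below; the placement under the `huggingface` section is an assumption, so check the settings reference for your version:

```yaml
# settings.yaml (sketch): allow embedding models that ship custom code (e.g. nomic)
huggingface:
  trust_remote_code: true
```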
Javier Martinez
8119842ae6
feat(recipe): add our first recipe Summarize (#2028)
* feat: add summary recipe

* test: add summary tests

* docs: move all recipes docs

* docs: add recipes and summarize doc

* docs: update openapi reference

* refactor: split method in two method (summary)

* feat: add initial summarize ui

* feat: add mode explanation

* fix: mypy

* feat: allow to configure async property in summarize

* refactor: move modes to enum and update mode explanations

* docs: fix url

* docs: remove list-llm pages

* docs: remove double header

* fix: summary description
2024-07-31 16:53:27 +02:00
Javier Martinez
40638a18a5
fix: unify embedding models (#2027)
* feat: unify embedding model to nomic

* docs: add embedding dimensions mismatch

* docs: fix fern
2024-07-31 14:35:46 +02:00
Javier Martinez
9027d695c1
feat: make llama3.1 as default (#2022)
* feat: change ollama default model to llama3.1

* chore: bump versions

* feat: Change default model in local mode to llama3.1

* chore: make sure last poetry version is used

* fix: mypy

* fix: do not add BOS (with last llamacpp-python version)
2024-07-31 14:35:36 +02:00
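For illustration only, the kind of settings change this implies (key names assumed from the Ollama settings profile; verify against your settings-ollama.yaml):

```yaml
# settings-ollama.yaml (sketch): llama3.1 as the default Ollama models
ollama:
  llm_model: llama3.1
  embedding_model: nomic-embed-text
```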
Javier Martinez
e54a8fe043
fix: prevent to ingest local files (by default) (#2010)
* feat: prevent to local ingestion (by default) and add white-list

* docs: add local ingestion warning

* docs: add missing comment

* fix: update exception error

* fix: black
2024-07-31 14:33:46 +02:00
Javier Martinez
1020cd5328
fix: light mode (#2025) 2024-07-31 12:59:31 +02:00
Quentin McGaw
65c5a1708b
chore(docker): dockerfiles improvements and fixes (#1792)
* `UID` and `GID` build arguments for `worker` user

* `POETRY_EXTRAS` build argument with default values

* Copy `Makefile` for `make ingest` command

* Do NOT copy markdown files
I doubt anyone reads a markdown file within a Docker image

* Fix PYTHONPATH value

* Set home directory to `/home/worker` when creating user

* Combine `ENV` instructions together

* Define environment variables with their defaults
- For documentation purposes
- Reflect defaults set in settings-docker.yml

* `PGPT_EMBEDDING_MODE` to define embedding mode

* Remove ineffective `python3 -m pipx ensurepath`

* Use `&&` instead of `;` to chain commands to detect failure better

* Add `--no-root` flag to poetry install commands

* Set PGPT_PROFILES to docker

* chore: remove envs

* chore: update to use ollama in docker-compose

* chore: don't copy makefile

* chore: don't copy fern

* fix: tiktoken cache

* fix: docker compose port

* fix: ffmpy dependency (#2020)

* fix: ffmpy dependency

* fix: block ffmpy to commit sha

* feat(llm): autopull ollama models (#2019)

* chore: update ollama (llm)

* feat: allow to autopull ollama models

* fix: mypy

* chore: install always ollama client

* refactor: check connection and pull ollama method to utils

* docs: update ollama config with autopulling info

...

* chore: autopull ollama models

* chore: add GID/UID comment

...

---------

Co-authored-by: Javier Martinez <javiermartinezalvarez98@gmail.com>
2024-07-30 17:59:38 +02:00
Robert Hirsch
d080969407
added llama3 prompt (#1962)
* added llama3 prompt

* more fixes to pass tests; changed type VectorStore -> BasePydanticVectorStore, see https://github.com/run-llama/llama_index/blob/main/CHANGELOG.md#2024-05-14

* fix: new llama3 prompt

---------

Co-authored-by: Javier Martinez <javiermartinezalvarez98@gmail.com>
2024-07-29 17:28:00 +02:00
Javier Martinez
d4375d078f
fix(ui): gradio bug fixes (#2021)
* fix: when two user messages were sent

* fix: add source divider

* fix: add favicon

* fix: add zylon link

* refactor: update label
2024-07-29 16:48:16 +02:00
Javier Martinez
20bad17c98
feat(llm): autopull ollama models (#2019)
* chore: update ollama (llm)

* feat: allow to autopull ollama models

* fix: mypy

* chore: install always ollama client

* refactor: check connection and pull ollama method to utils

* docs: update ollama config with autopulling info
2024-07-29 13:25:42 +02:00
Javier Martinez
dabf556dae
fix: ffmpy dependency (#2020)
* fix: ffmpy dependency

* fix: block ffmpy to commit sha
2024-07-29 11:56:57 +02:00
Iván Martínez
05a986231c
Add proper param to demo urls (#2007) 2024-07-22 14:44:03 +02:00
Javier Martinez
b62669784b
docs: update welcome page (#2004) 2024-07-18 14:42:39 +02:00
Javier Martinez
2c78bb2958
docs: add PR and issue templates (#2002)
* chore: add pull request template

* chore: add issue templates

* chore: require more information in bugs
2024-07-18 12:56:10 +02:00
Iván Martínez
90d211c5cd
Update README.md (#2003)
* Update README.md

Remove the outdated contact form and point to Zylon website for those looking for a ready-to-use enterprise solution built on top of PrivateGPT

* Update README.md

Update text to address the comments

* Update README.md

Improve text
2024-07-18 12:11:24 +02:00
64 changed files with 5605 additions and 3603 deletions

16
.docker/router.yml Normal file

@ -0,0 +1,16 @@
http:
services:
ollama:
loadBalancer:
healthCheck:
interval: 5s
path: /
servers:
- url: http://ollama-cpu:11434
- url: http://ollama-cuda:11434
- url: http://host.docker.internal:11434
routers:
ollama-router:
rule: "PathPrefix(`/`)"
service: ollama

105
.github/ISSUE_TEMPLATE/bug.yml vendored Normal file

@ -0,0 +1,105 @@
name: Bug Report
description: Report a bug or issue with the project.
title: "[BUG] "
labels: ["bug"]
body:
- type: markdown
attributes:
value: |
**Please describe the bug you encountered.**
- type: checkboxes
id: pre-check
attributes:
label: Pre-check
description: Please confirm that you have searched for duplicate issues before creating this one.
options:
- label: I have searched the existing issues and none cover this bug.
required: true
- type: textarea
id: description
attributes:
label: Description
description: Provide a detailed description of the bug.
placeholder: "Detailed description of the bug"
validations:
required: true
- type: textarea
id: steps
attributes:
label: Steps to Reproduce
description: Provide the steps to reproduce the bug.
placeholder: "1. Step one\n2. Step two\n3. Step three"
validations:
required: true
- type: input
id: expected
attributes:
label: Expected Behavior
description: Describe what you expected to happen.
placeholder: "Expected behavior"
validations:
required: true
- type: input
id: actual
attributes:
label: Actual Behavior
description: Describe what actually happened.
placeholder: "Actual behavior"
validations:
required: true
- type: input
id: environment
attributes:
label: Environment
description: Provide details about your environment (e.g., OS, GPU, profile, etc.).
placeholder: "Environment details"
validations:
required: true
- type: input
id: additional
attributes:
label: Additional Information
description: Provide any additional information that may be relevant (e.g., logs, screenshots).
placeholder: "Any additional information that may be relevant"
- type: input
id: version
attributes:
label: Version
description: Provide the version of the project where you encountered the bug.
placeholder: "Version number"
- type: markdown
attributes:
value: |
**Please ensure the following setup checklist has been reviewed before submitting the bug report.**
- type: checkboxes
id: general-setup-checklist
attributes:
label: Setup Checklist
description: Verify the following general aspects of your setup.
options:
- label: Confirm that you have followed the installation instructions in the project's documentation.
- label: Check that you are using the latest version of the project.
- label: Verify disk space availability for model storage and data processing.
- label: Ensure that you have the necessary permissions to run the project.
- type: checkboxes
id: nvidia-setup-checklist
attributes:
label: NVIDIA GPU Setup Checklist
description: Verify the following aspects of your NVIDIA GPU setup.
options:
- label: Check that all CUDA dependencies are installed and are compatible with your GPU (refer to [CUDA's documentation](https://docs.nvidia.com/deploy/cuda-compatibility/#frequently-asked-questions))
- label: Ensure an NVIDIA GPU is installed and recognized by the system (run `nvidia-smi` to verify).
- label: Ensure proper permissions are set for accessing GPU resources.
- label: Docker users - Verify that the NVIDIA Container Toolkit is configured correctly (e.g. run `sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi`)

8
.github/ISSUE_TEMPLATE/config.yml vendored Normal file

@ -0,0 +1,8 @@
blank_issues_enabled: false
contact_links:
- name: Documentation
url: https://docs.privategpt.dev
about: Please refer to our documentation for more details and guidance.
- name: Discord
url: https://discord.gg/bK6mRVpErU
about: Join our Discord community to ask questions and get help.

19
.github/ISSUE_TEMPLATE/docs.yml vendored Normal file

@ -0,0 +1,19 @@
name: Documentation
description: Suggest a change or addition to the documentation.
title: "[DOCS] "
labels: ["documentation"]
body:
- type: markdown
attributes:
value: |
**Please describe the documentation change or addition you would like to suggest.**
- type: textarea
id: description
attributes:
label: Description
description: Provide a detailed description of the documentation change.
placeholder: "Detailed description of the documentation change"
validations:
required: true

37
.github/ISSUE_TEMPLATE/feature.yml vendored Normal file

@ -0,0 +1,37 @@
name: Enhancement
description: Suggest an enhancement or improvement to the project.
title: "[FEATURE] "
labels: ["enhancement"]
body:
- type: markdown
attributes:
value: |
**Please describe the enhancement or improvement you would like to suggest.**
- type: textarea
id: feature_description
attributes:
label: Feature Description
description: Provide a detailed description of the enhancement.
placeholder: "Detailed description of the enhancement"
validations:
required: true
- type: textarea
id: reason
attributes:
label: Reason
description: Explain the reason for this enhancement.
placeholder: "Reason for the enhancement"
validations:
required: true
- type: textarea
id: value
attributes:
label: Value of Feature
description: Describe the value or benefits this feature will bring.
placeholder: "Value or benefits of the feature"
validations:
required: true

19
.github/ISSUE_TEMPLATE/question.yml vendored Normal file

@ -0,0 +1,19 @@
name: Question
description: Ask a question about the project.
title: "[QUESTION] "
labels: ["question"]
body:
- type: markdown
attributes:
value: |
**Please describe your question in detail.**
- type: textarea
id: question
attributes:
label: Question
description: Provide a detailed description of your question.
placeholder: "Detailed description of the question"
validations:
required: true

37
.github/pull_request_template.md vendored Normal file

@ -0,0 +1,37 @@
# Description
Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.
## Type of Change
Please delete options that are not relevant.
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] This change requires a documentation update
## How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration
- [ ] Added new unit/integration tests
- [ ] I stared at the code and made sure it makes sense
**Test Configuration**:
* Firmware version:
* Hardware:
* Toolchain:
* SDK:
## Checklist:
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged and published in downstream modules
- [ ] I ran `make check; make test` to ensure mypy and tests pass


@ -0,0 +1,19 @@
{
"$schema": "https://raw.githubusercontent.com/googleapis/release-please/main/schemas/config.json",
"release-type": "simple",
"version-file": "version.txt",
"extra-files": [
{
"type": "toml",
"path": "pyproject.toml",
"jsonpath": "$.tool.poetry.version"
},
{
"type": "generic",
"path": "docker-compose.yaml"
}
],
"packages": {
".": {}
}
}


@ -0,0 +1,3 @@
{
".": "0.6.2"
}


@ -8,7 +8,7 @@ inputs:
poetry_version:
required: true
type: string
default: "1.5.1"
default: "1.8.3"
runs:
using: composite


@ -1,45 +0,0 @@
name: docker
on:
release:
types: [ published ]
workflow_dispatch:
env:
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}
jobs:
build-and-push-image:
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Log in to the Container registry
uses: docker/login-action@v3
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Extract metadata (tags, labels) for Docker
id: meta
uses: docker/metadata-action@v5
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
tags: |
type=ref,event=branch
type=ref,event=pr
type=semver,pattern={{version}}
type=semver,pattern={{major}}.{{minor}}
type=sha
- name: Build and push Docker image
uses: docker/build-push-action@v5
with:
context: .
file: Dockerfile.external
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}

83
.github/workflows/generate-release.yml vendored Normal file

@ -0,0 +1,83 @@
name: generate-release
on:
release:
types: [ published ]
workflow_dispatch:
env:
REGISTRY: docker.io
IMAGE_NAME: zylonai/private-gpt
platforms: linux/amd64,linux/arm64
DEFAULT_TYPE: "ollama"
jobs:
build-and-push-image:
runs-on: ubuntu-latest
strategy:
matrix:
type: [ llamacpp-cpu, ollama ]
permissions:
contents: read
packages: write
outputs:
version: ${{ steps.version.outputs.version }}
steps:
- name: Free Disk Space (Ubuntu)
uses: jlumbroso/free-disk-space@main
with:
tool-cache: false
android: true
dotnet: true
haskell: true
large-packages: true
docker-images: false
swap-storage: true
- name: Checkout repository
uses: actions/checkout@v4
- name: Set up QEMU
uses: docker/setup-qemu-action@v3
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Log in to Docker Hub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_PASSWORD }}
- name: Extract metadata (tags, labels) for Docker
id: meta
uses: docker/metadata-action@v5
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
tags: |
type=semver,pattern={{version}},enable=${{ matrix.type == env.DEFAULT_TYPE }}
type=semver,pattern={{version}}-${{ matrix.type }}
type=semver,pattern={{major}}.{{minor}},enable=${{ matrix.type == env.DEFAULT_TYPE }}
type=semver,pattern={{major}}.{{minor}}-${{ matrix.type }}
type=raw,value=latest,enable=${{ matrix.type == env.DEFAULT_TYPE }}
type=sha
flavor: |
latest=false
- name: Build and push Docker image
uses: docker/build-push-action@v6
with:
context: .
file: Dockerfile.${{ matrix.type }}
platforms: ${{ env.platforms }}
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
- name: Version output
id: version
run: echo "version=${{ steps.meta.outputs.version }}" >> "$GITHUB_OUTPUT"


@ -13,7 +13,8 @@ jobs:
release-please:
runs-on: ubuntu-latest
steps:
- uses: google-github-actions/release-please-action@v3
- uses: google-github-actions/release-please-action@v4
id: release
with:
release-type: simple
version-file: version.txt
config-file: .github/release_please/.release-please-config.json
manifest-file: .github/release_please/.release-please-manifest.json


@ -14,7 +14,7 @@ jobs:
setup:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- uses: ./.github/workflows/actions/install_dependencies
checks:
@ -28,7 +28,7 @@ jobs:
- ruff
- mypy
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- uses: ./.github/workflows/actions/install_dependencies
- name: run ${{ matrix.quality-command }}
run: make ${{ matrix.quality-command }}
@ -38,7 +38,7 @@ jobs:
runs-on: ubuntu-latest
name: test
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- uses: ./.github/workflows/actions/install_dependencies
- name: run test
run: make test-coverage


@ -1,5 +1,63 @@
# Changelog
## [0.6.2](https://github.com/zylon-ai/private-gpt/compare/v0.6.1...v0.6.2) (2024-08-08)
### Bug Fixes
* add numpy issue to troubleshooting ([#2048](https://github.com/zylon-ai/private-gpt/issues/2048)) ([4ca6d0c](https://github.com/zylon-ai/private-gpt/commit/4ca6d0cb556be7a598f7d3e3b00d2a29214ee1e8))
* auto-update version ([#2052](https://github.com/zylon-ai/private-gpt/issues/2052)) ([7fefe40](https://github.com/zylon-ai/private-gpt/commit/7fefe408b4267684c6e3c1a43c5dc2b73ec61fe4))
* publish image name ([#2043](https://github.com/zylon-ai/private-gpt/issues/2043)) ([b1acf9d](https://github.com/zylon-ai/private-gpt/commit/b1acf9dc2cbca2047cd0087f13254ff5cda6e570))
* update matplotlib to 3.9.1-post1 to fix win install ([b16abbe](https://github.com/zylon-ai/private-gpt/commit/b16abbefe49527ac038d235659854b98345d5387))
## [0.6.1](https://github.com/zylon-ai/private-gpt/compare/v0.6.0...v0.6.1) (2024-08-05)
### Bug Fixes
* add built image from DockerHub ([#2042](https://github.com/zylon-ai/private-gpt/issues/2042)) ([f09f6dd](https://github.com/zylon-ai/private-gpt/commit/f09f6dd2553077d4566dbe6b48a450e05c2f049e))
* Adding azopenai to model list ([#2035](https://github.com/zylon-ai/private-gpt/issues/2035)) ([1c665f7](https://github.com/zylon-ai/private-gpt/commit/1c665f7900658144f62814b51f6e3434a6d7377f))
* **deploy:** generate docker release when new version is released ([#2038](https://github.com/zylon-ai/private-gpt/issues/2038)) ([1d4c14d](https://github.com/zylon-ai/private-gpt/commit/1d4c14d7a3c383c874b323d934be01afbaca899e))
* **deploy:** improve Docker-Compose and quickstart on Docker ([#2037](https://github.com/zylon-ai/private-gpt/issues/2037)) ([dae0727](https://github.com/zylon-ai/private-gpt/commit/dae0727a1b4abd35d2b0851fe30e0a4ed67e0fbb))
## [0.6.0](https://github.com/zylon-ai/private-gpt/compare/v0.5.0...v0.6.0) (2024-08-02)
### Features
* bump dependencies ([#1987](https://github.com/zylon-ai/private-gpt/issues/1987)) ([b687dc8](https://github.com/zylon-ai/private-gpt/commit/b687dc852413404c52d26dcb94536351a63b169d))
* **docs:** add privategpt-ts sdk ([#1924](https://github.com/zylon-ai/private-gpt/issues/1924)) ([d13029a](https://github.com/zylon-ai/private-gpt/commit/d13029a046f6e19e8ee65bef3acd96365c738df2))
* **docs:** Fix setup docu ([#1926](https://github.com/zylon-ai/private-gpt/issues/1926)) ([067a5f1](https://github.com/zylon-ai/private-gpt/commit/067a5f144ca6e605c99d7dbe9ca7d8207ac8808d))
* **docs:** update doc for ipex-llm ([#1968](https://github.com/zylon-ai/private-gpt/issues/1968)) ([19a7c06](https://github.com/zylon-ai/private-gpt/commit/19a7c065ef7f42b37f289dd28ac945f7afc0e73a))
* **docs:** update documentation and fix preview-docs ([#2000](https://github.com/zylon-ai/private-gpt/issues/2000)) ([4523a30](https://github.com/zylon-ai/private-gpt/commit/4523a30c8f004aac7a7ae224671e2c45ec0cb973))
* **llm:** add progress bar when ollama is pulling models ([#2031](https://github.com/zylon-ai/private-gpt/issues/2031)) ([cf61bf7](https://github.com/zylon-ai/private-gpt/commit/cf61bf780f8d122e4057d002abf03563bb45614a))
* **llm:** autopull ollama models ([#2019](https://github.com/zylon-ai/private-gpt/issues/2019)) ([20bad17](https://github.com/zylon-ai/private-gpt/commit/20bad17c9857809158e689e9671402136c1e3d84))
* **llm:** Support for Google Gemini LLMs and Embeddings ([#1965](https://github.com/zylon-ai/private-gpt/issues/1965)) ([fc13368](https://github.com/zylon-ai/private-gpt/commit/fc13368bc72d1f4c27644677431420ed77731c03))
* make llama3.1 as default ([#2022](https://github.com/zylon-ai/private-gpt/issues/2022)) ([9027d69](https://github.com/zylon-ai/private-gpt/commit/9027d695c11fbb01e62424b855665de71d513417))
* prompt_style applied to all LLMs + extra LLM params. ([#1835](https://github.com/zylon-ai/private-gpt/issues/1835)) ([e21bf20](https://github.com/zylon-ai/private-gpt/commit/e21bf20c10938b24711d9f2c765997f44d7e02a9))
* **recipe:** add our first recipe `Summarize` ([#2028](https://github.com/zylon-ai/private-gpt/issues/2028)) ([8119842](https://github.com/zylon-ai/private-gpt/commit/8119842ae6f1f5ecfaf42b06fa0d1ffec675def4))
* **vectordb:** Milvus vector db Integration ([#1996](https://github.com/zylon-ai/private-gpt/issues/1996)) ([43cc31f](https://github.com/zylon-ai/private-gpt/commit/43cc31f74015f8d8fcbf7a8ea7d7d9ecc66cf8c9))
* **vectorstore:** Add clickhouse support as vectore store ([#1883](https://github.com/zylon-ai/private-gpt/issues/1883)) ([2612928](https://github.com/zylon-ai/private-gpt/commit/26129288394c7483e6fc0496a11dc35679528cc1))
### Bug Fixes
* "no such group" error in Dockerfile, added docx2txt and cryptography deps ([#1841](https://github.com/zylon-ai/private-gpt/issues/1841)) ([947e737](https://github.com/zylon-ai/private-gpt/commit/947e737f300adf621d2261d527192f36f3387f8e))
* **config:** make tokenizer optional and include a troubleshooting doc ([#1998](https://github.com/zylon-ai/private-gpt/issues/1998)) ([01b7ccd](https://github.com/zylon-ai/private-gpt/commit/01b7ccd0648be032846647c9a184925d3682f612))
* **docs:** Fix concepts.mdx referencing to installation page ([#1779](https://github.com/zylon-ai/private-gpt/issues/1779)) ([dde0224](https://github.com/zylon-ai/private-gpt/commit/dde02245bcd51a7ede7b6789c82ae217cac53d92))
* **docs:** Update installation.mdx ([#1866](https://github.com/zylon-ai/private-gpt/issues/1866)) ([c1802e7](https://github.com/zylon-ai/private-gpt/commit/c1802e7cf0e56a2603213ec3b6a4af8fadb8a17a))
* ffmpy dependency ([#2020](https://github.com/zylon-ai/private-gpt/issues/2020)) ([dabf556](https://github.com/zylon-ai/private-gpt/commit/dabf556dae9cb00fe0262270e5138d982585682e))
* light mode ([#2025](https://github.com/zylon-ai/private-gpt/issues/2025)) ([1020cd5](https://github.com/zylon-ai/private-gpt/commit/1020cd53288af71a17882781f392512568f1b846))
* **LLM:** mistral ignoring assistant messages ([#1954](https://github.com/zylon-ai/private-gpt/issues/1954)) ([c7212ac](https://github.com/zylon-ai/private-gpt/commit/c7212ac7cc891f9e3c713cc206ae9807c5dfdeb6))
* **llm:** special tokens and leading space ([#1831](https://github.com/zylon-ai/private-gpt/issues/1831)) ([347be64](https://github.com/zylon-ai/private-gpt/commit/347be643f7929c56382a77c3f45f0867605e0e0a))
* make embedding_api_base match api_base when on docker ([#1859](https://github.com/zylon-ai/private-gpt/issues/1859)) ([2a432bf](https://github.com/zylon-ai/private-gpt/commit/2a432bf9c5582a94eb4052b1e80cabdb118d298e))
* nomic embeddings ([#2030](https://github.com/zylon-ai/private-gpt/issues/2030)) ([5465958](https://github.com/zylon-ai/private-gpt/commit/54659588b5b109a3dd17cca835e275240464d275))
* prevent to ingest local files (by default) ([#2010](https://github.com/zylon-ai/private-gpt/issues/2010)) ([e54a8fe](https://github.com/zylon-ai/private-gpt/commit/e54a8fe0433252808d0a60f6a08a43c9f5a42f3b))
* Replacing unsafe `eval()` with `json.loads()` ([#1890](https://github.com/zylon-ai/private-gpt/issues/1890)) ([9d0d614](https://github.com/zylon-ai/private-gpt/commit/9d0d614706581a8bfa57db45f62f84ab23d26f15))
* **settings:** enable cors by default so it will work when using ts sdk (spa) ([#1925](https://github.com/zylon-ai/private-gpt/issues/1925)) ([966af47](https://github.com/zylon-ai/private-gpt/commit/966af4771dbe5cf3fdf554b5fdf8f732407859c4))
* **ui:** gradio bug fixes ([#2021](https://github.com/zylon-ai/private-gpt/issues/2021)) ([d4375d0](https://github.com/zylon-ai/private-gpt/commit/d4375d078f18ba53562fd71651159f997fff865f))
* unify embedding models ([#2027](https://github.com/zylon-ai/private-gpt/issues/2027)) ([40638a1](https://github.com/zylon-ai/private-gpt/commit/40638a18a5713d60fec8fe52796dcce66d88258c))
## [0.5.0](https://github.com/zylon-ai/private-gpt/compare/v0.4.0...v0.5.0) (2024-04-02)


@ -1,40 +0,0 @@
FROM python:3.11.6-slim-bookworm as base
# Install poetry
RUN pip install pipx
RUN python3 -m pipx ensurepath
RUN pipx install poetry
ENV PATH="/root/.local/bin:$PATH"
ENV PATH=".venv/bin/:$PATH"
# https://python-poetry.org/docs/configuration/#virtualenvsin-project
ENV POETRY_VIRTUALENVS_IN_PROJECT=true
FROM base as dependencies
WORKDIR /home/worker/app
COPY pyproject.toml poetry.lock ./
RUN poetry install --extras "ui vector-stores-qdrant llms-ollama embeddings-ollama"
FROM base as app
ENV PYTHONUNBUFFERED=1
ENV PORT=8080
EXPOSE 8080
# Prepare a non-root user
RUN adduser --system worker
WORKDIR /home/worker/app
RUN mkdir local_data; chown worker local_data
RUN mkdir models; chown worker models
COPY --chown=worker --from=dependencies /home/worker/app/.venv/ .venv
COPY --chown=worker private_gpt/ private_gpt
COPY --chown=worker fern/ fern
COPY --chown=worker *.yaml *.md ./
COPY --chown=worker scripts/ scripts
ENV PYTHONPATH="$PYTHONPATH:/private_gpt/"
USER worker
ENTRYPOINT python -m private_gpt

62
Dockerfile.llamacpp-cpu Normal file

@ -0,0 +1,62 @@
### IMPORTANT, THIS IMAGE CAN ONLY BE RUN IN LINUX DOCKER
### You will run into a segfault in mac
FROM python:3.11.6-slim-bookworm AS base
# Install poetry
RUN pip install pipx
RUN python3 -m pipx ensurepath
RUN pipx install poetry==1.8.3
ENV PATH="/root/.local/bin:$PATH"
ENV PATH=".venv/bin/:$PATH"
# Dependencies to build llama-cpp
RUN apt update && apt install -y \
libopenblas-dev\
ninja-build\
build-essential\
pkg-config\
wget
# https://python-poetry.org/docs/configuration/#virtualenvsin-project
ENV POETRY_VIRTUALENVS_IN_PROJECT=true
FROM base AS dependencies
WORKDIR /home/worker/app
COPY pyproject.toml poetry.lock ./
ARG POETRY_EXTRAS="ui embeddings-huggingface llms-llama-cpp vector-stores-qdrant"
RUN poetry install --no-root --extras "${POETRY_EXTRAS}"
FROM base AS app
ENV PYTHONUNBUFFERED=1
ENV PORT=8080
ENV APP_ENV=prod
ENV PYTHONPATH="$PYTHONPATH:/home/worker/app/private_gpt/"
EXPOSE 8080
# Prepare a non-root user
# More info about how to configure UIDs and GIDs in Docker:
# https://github.com/systemd/systemd/blob/main/docs/UIDS-GIDS.md
# Define the User ID (UID) for the non-root user
# UID 100 is chosen to avoid conflicts with existing system users
ARG UID=100
# Define the Group ID (GID) for the non-root user
# GID 65534 is often used for the 'nogroup' or 'nobody' group
ARG GID=65534
RUN adduser --system --gid ${GID} --uid ${UID} --home /home/worker worker
WORKDIR /home/worker/app
RUN chown worker /home/worker/app
RUN mkdir local_data && chown worker local_data
RUN mkdir models && chown worker models
COPY --chown=worker --from=dependencies /home/worker/app/.venv/ .venv
COPY --chown=worker private_gpt/ private_gpt
COPY --chown=worker *.yaml ./
COPY --chown=worker scripts/ scripts
USER worker
ENTRYPOINT python -m private_gpt


@ -1,51 +0,0 @@
### IMPORTANT, THIS IMAGE CAN ONLY BE RUN IN LINUX DOCKER
### You will run into a segfault in mac
FROM python:3.11.6-slim-bookworm as base
# Install poetry
RUN pip install pipx
RUN python3 -m pipx ensurepath
RUN pipx install poetry
ENV PATH="/root/.local/bin:$PATH"
ENV PATH=".venv/bin/:$PATH"
# Dependencies to build llama-cpp
RUN apt update && apt install -y \
libopenblas-dev\
ninja-build\
build-essential\
pkg-config\
wget
# https://python-poetry.org/docs/configuration/#virtualenvsin-project
ENV POETRY_VIRTUALENVS_IN_PROJECT=true
FROM base as dependencies
WORKDIR /home/worker/app
COPY pyproject.toml poetry.lock ./
RUN poetry install --extras "ui embeddings-huggingface llms-llama-cpp vector-stores-qdrant"
FROM base as app
ENV PYTHONUNBUFFERED=1
ENV PORT=8080
EXPOSE 8080
# Prepare a non-root user
RUN adduser --group worker
RUN adduser --system --ingroup worker worker
WORKDIR /home/worker/app
RUN mkdir local_data; chown worker local_data
RUN mkdir models; chown worker models
COPY --chown=worker --from=dependencies /home/worker/app/.venv/ .venv
COPY --chown=worker private_gpt/ private_gpt
COPY --chown=worker fern/ fern
COPY --chown=worker *.yaml *.md ./
COPY --chown=worker scripts/ scripts
ENV PYTHONPATH="$PYTHONPATH:/private_gpt/"
USER worker
ENTRYPOINT python -m private_gpt

51
Dockerfile.ollama Normal file

@ -0,0 +1,51 @@
FROM python:3.11.6-slim-bookworm AS base
# Install poetry
RUN pip install pipx
RUN python3 -m pipx ensurepath
RUN pipx install poetry==1.8.3
ENV PATH="/root/.local/bin:$PATH"
ENV PATH=".venv/bin/:$PATH"
# https://python-poetry.org/docs/configuration/#virtualenvsin-project
ENV POETRY_VIRTUALENVS_IN_PROJECT=true
FROM base AS dependencies
WORKDIR /home/worker/app
COPY pyproject.toml poetry.lock ./
ARG POETRY_EXTRAS="ui vector-stores-qdrant llms-ollama embeddings-ollama"
RUN poetry install --no-root --extras "${POETRY_EXTRAS}"
FROM base AS app
ENV PYTHONUNBUFFERED=1
ENV PORT=8080
ENV APP_ENV=prod
ENV PYTHONPATH="$PYTHONPATH:/home/worker/app/private_gpt/"
EXPOSE 8080
# Prepare a non-root user
# More info about how to configure UIDs and GIDs in Docker:
# https://github.com/systemd/systemd/blob/main/docs/UIDS-GIDS.md
# Define the User ID (UID) for the non-root user
# UID 100 is chosen to avoid conflicts with existing system users
ARG UID=100
# Define the Group ID (GID) for the non-root user
# GID 65534 is often used for the 'nogroup' or 'nobody' group
ARG GID=65534
RUN adduser --system --gid ${GID} --uid ${UID} --home /home/worker worker
WORKDIR /home/worker/app
RUN chown worker /home/worker/app
RUN mkdir local_data && chown worker local_data
RUN mkdir models && chown worker models
COPY --chown=worker --from=dependencies /home/worker/app/.venv/ .venv
COPY --chown=worker private_gpt/ private_gpt
COPY --chown=worker *.yaml .
COPY --chown=worker scripts/ scripts
USER worker
ENTRYPOINT python -m private_gpt


@ -1,22 +1,24 @@
# 🔒 PrivateGPT 📑
# PrivateGPT
<a href="https://trendshift.io/repositories/2601" target="_blank"><img src="https://trendshift.io/api/badge/repositories/2601" alt="imartinez%2FprivateGPT | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
[![Tests](https://github.com/zylon-ai/private-gpt/actions/workflows/tests.yml/badge.svg)](https://github.com/zylon-ai/private-gpt/actions/workflows/tests.yml?query=branch%3Amain)
[![Website](https://img.shields.io/website?up_message=check%20it&down_message=down&url=https%3A%2F%2Fdocs.privategpt.dev%2F&label=Documentation)](https://docs.privategpt.dev/)
[![Discord](https://img.shields.io/discord/1164200432894234644?logo=discord&label=PrivateGPT)](https://discord.gg/bK6mRVpErU)
[![X (formerly Twitter) Follow](https://img.shields.io/twitter/follow/ZylonPrivateGPT)](https://twitter.com/ZylonPrivateGPT)
> Install & usage docs: https://docs.privategpt.dev/
>
> Join the community: [Twitter](https://twitter.com/ZylonPrivateGPT) & [Discord](https://discord.gg/bK6mRVpErU)
![Gradio UI](/fern/docs/assets/ui.png?raw=true)
PrivateGPT is a production-ready AI project that allows you to ask questions about your documents using the power
of Large Language Models (LLMs), even in scenarios without an Internet connection. 100% private, no data leaves your
execution environment at any point.
>[!TIP]
> If you are looking for an **enterprise-ready, fully private AI workspace**
> check out [Zylon's website](https://zylon.ai) or [request a demo](https://cal.com/zylon/demo?source=pgpt-readme).
> Crafted by the team behind PrivateGPT, Zylon is a best-in-class AI collaborative
> workspace that can be easily deployed on-premise (data center, bare metal...) or in your private cloud (AWS, GCP, Azure...).
The project provides an API offering all the primitives required to build private, context-aware AI applications.
It follows and extends the [OpenAI API standard](https://openai.com/blog/openai-api),
and supports both normal and streaming responses.
@ -38,13 +40,10 @@ In addition to this, a working [Gradio UI](https://www.gradio.app/)
client is provided to test the API, together with a set of useful tools such as bulk model
download script, ingestion script, documents folder watch, etc.
> 👂 **Need help applying PrivateGPT to your specific use case?**
> [Let us know more about it](https://forms.gle/4cSDmH13RZBHV9at7)
> and we'll try to help! We are refining PrivateGPT through your feedback.
## 🎞️ Overview
DISCLAIMER: This README is not updated as frequently as the [documentation](https://docs.privategpt.dev/).
Please check it out for the latest updates!
>[!WARNING]
> This README is not updated as frequently as the [documentation](https://docs.privategpt.dev/).
> Please check it out for the latest updates!
### Motivation behind PrivateGPT
Generative AI is a game changer for our society, but adoption in companies of all sizes and data-sensitive


@ -1,16 +1,116 @@
services:
private-gpt:
#-----------------------------------
#---- Private-GPT services ---------
#-----------------------------------
# Private-GPT service for the Ollama CPU and GPU modes
# This service builds from an external Dockerfile and runs the Ollama mode.
private-gpt-ollama:
image: ${PGPT_IMAGE:-zylonai/private-gpt}:${PGPT_TAG:-0.6.2}-ollama # x-release-please-version
user: root
build:
dockerfile: Dockerfile.external
context: .
dockerfile: Dockerfile.ollama
volumes:
- ./local_data/:/home/worker/app/local_data
- ./local_data:/home/worker/app/local_data
ports:
- 8001:8080
- "8001:8001"
environment:
PORT: 8080
PORT: 8001
PGPT_PROFILES: docker
PGPT_MODE: ollama
PGPT_EMBED_MODE: ollama
PGPT_OLLAMA_API_BASE: http://ollama:11434
HF_TOKEN: ${HF_TOKEN:-}
profiles:
- ""
- ollama-cpu
- ollama-cuda
- ollama-api
depends_on:
ollama:
condition: service_healthy
# Private-GPT service for the local mode
# This service builds from a local Dockerfile and runs the application in local mode.
private-gpt-llamacpp-cpu:
image: ${PGPT_IMAGE:-zylonai/private-gpt}:${PGPT_TAG:-0.6.2}-llamacpp-cpu # x-release-please-version
user: root
build:
context: .
dockerfile: Dockerfile.llamacpp-cpu
volumes:
- ./local_data/:/home/worker/app/local_data
- ./models/:/home/worker/app/models
entrypoint: sh -c ".venv/bin/python scripts/setup && .venv/bin/python -m private_gpt"
ports:
- "8001:8001"
environment:
PORT: 8001
PGPT_PROFILES: local
HF_TOKEN: ${HF_TOKEN:-}
profiles:
- llamacpp-cpu
#-----------------------------------
#---- Ollama services --------------
#-----------------------------------
# Traefik reverse proxy for the Ollama service
# This will route requests to the Ollama service based on the profile.
ollama:
image: traefik:v2.10
healthcheck:
test: ["CMD", "sh", "-c", "wget -q --spider http://ollama:11434 || exit 1"]
interval: 10s
retries: 3
start_period: 5s
timeout: 5s
ports:
- "8080:8080"
command:
- "--providers.file.filename=/etc/router.yml"
- "--log.level=ERROR"
- "--api.insecure=true"
- "--providers.docker=true"
- "--providers.docker.exposedbydefault=false"
- "--entrypoints.web.address=:11434"
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
- ./.docker/router.yml:/etc/router.yml:ro
extra_hosts:
- "host.docker.internal:host-gateway"
profiles:
- ""
- ollama-cpu
- ollama-cuda
- ollama-api
# Ollama service for the CPU mode
ollama-cpu:
image: ollama/ollama:latest
ports:
- "11434:11434"
volumes:
- ./models:/root/.ollama
profiles:
- ""
- ollama-cpu
# Ollama service for the CUDA mode
ollama-cuda:
image: ollama/ollama:latest
ports:
- "11434:11434"
volumes:
- ./models:/root/.ollama
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
profiles:
- ollama-cuda


@ -10,6 +10,9 @@ tabs:
overview:
display-name: Overview
icon: "fa-solid fa-home"
quickstart:
display-name: Quickstart
icon: "fa-solid fa-rocket"
installation:
display-name: Installation
icon: "fa-solid fa-download"
@ -32,6 +35,12 @@ navigation:
contents:
- page: Introduction
path: ./docs/pages/overview/welcome.mdx
- tab: quickstart
layout:
- section: Getting started
contents:
- page: Quickstart
path: ./docs/pages/quickstart/quickstart.mdx
# How to install PrivateGPT, with FAQ and troubleshooting
- tab: installation
layout:
@ -74,14 +83,16 @@ navigation:
path: ./docs/pages/ui/gradio.mdx
- page: Alternatives
path: ./docs/pages/ui/alternatives.mdx
# Small code snippet or example of usage to help users
- tab: recipes
layout:
- section: Choice of LLM
- section: Getting started
contents:
# TODO: add recipes
- page: List of LLMs
path: ./docs/pages/recipes/list-llm.mdx
- page: Quickstart
path: ./docs/pages/recipes/quickstart.mdx
- section: General use cases
contents:
- page: Summarize
path: ./docs/pages/recipes/summarize.mdx
# More advanced usage of PrivateGPT, by API
- tab: api-reference
layout:

Binary file not shown (image changed: 212 KiB before, 154 KiB after).


@ -28,6 +28,11 @@ pyenv local 3.11
Install [Poetry](https://python-poetry.org/docs/#installing-with-the-official-installer) for dependency management:
Follow the instructions on the official Poetry website to install it.
<Callout intent="warning">
A bug exists in Poetry versions 1.7.0 and earlier. We strongly recommend upgrading to a tested version.
To upgrade Poetry to the latest tested version, run `poetry self update 1.8.3` after installing it.
</Callout>
### 4. Optional: Install `make`
To run various scripts, you need to install `make`. Follow the instructions for your operating system:
#### macOS
@ -130,18 +135,22 @@ Go to [ollama.ai](https://ollama.ai/) and follow the instructions to install Oll
After the installation, make sure the Ollama desktop app is closed.
Install the models to be used; the default settings-ollama.yaml is configured to use the `mistral 7b` LLM (~4GB) and `nomic-embed-text` Embeddings (~275MB). Therefore:
```bash
ollama pull mistral
ollama pull nomic-embed-text
```
Now, start Ollama service (it will start a local inference server, serving both the LLM and the Embeddings):
```bash
ollama serve
```
Install the models to be used; the default settings-ollama.yaml is configured to use the llama3.1 8b LLM (~4GB) and nomic-embed-text Embeddings (~275MB).
By default, PGPT will automatically pull models as needed. This behavior can be changed by modifying the `ollama.autopull_models` property.
In any case, if you want to manually pull models, run the following commands:
```bash
ollama pull llama3.1
ollama pull nomic-embed-text
```
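If you prefer to manage models yourself, the autopull behavior mentioned above can be turned off in your settings file. A minimal sketch, assuming the property keeps the `ollama.autopull_models` name referenced above:

```yaml
# settings-ollama.yaml (sketch): disable automatic model pulling
ollama:
  autopull_models: false  # pull models manually with `ollama pull <model>` instead
```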
Once done, on a different terminal, you can install PrivateGPT with the following command:
```bash
poetry install --extras "ui llms-ollama embeddings-ollama vector-stores-qdrant"
@ -298,11 +307,12 @@ If you have all required dependencies properly configured running the
following powershell command should succeed.
```powershell
$env:CMAKE_ARGS='-DLLAMA_CUBLAS=on'; poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
$env:CMAKE_ARGS='-DLLAMA_CUBLAS=on'; poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python numpy==1.26.0
```
If your installation was correct, you should see a message similar to the following next
time you start the server `BLAS = 1`.
time you start the server `BLAS = 1`. If there is some issue, please refer to the
[troubleshooting](/installation/getting-started/troubleshooting#building-llama-cpp-with-nvidia-gpu-support) section.
```console
llama_new_context_with_model: total VRAM used: 4857.93 MB (model: 4095.05 MB, context: 762.87 MB)
@ -330,11 +340,12 @@ Some tips:
After that running the following command in the repository will install llama.cpp with GPU support:
```bash
CMAKE_ARGS='-DLLAMA_CUBLAS=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
CMAKE_ARGS='-DLLAMA_CUBLAS=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python numpy==1.26.0
```
If your installation was correct, you should see a message similar to the following next
time you start the server `BLAS = 1`.
time you start the server `BLAS = 1`. If there is some issue, please refer to the
[troubleshooting](/installation/getting-started/troubleshooting#building-llama-cpp-with-nvidia-gpu-support) section.
```
llama_new_context_with_model: total VRAM used: 4857.93 MB (model: 4095.05 MB, context: 762.87 MB)


@ -24,8 +24,41 @@ PrivateGPT uses the `AutoTokenizer` library to tokenize input text accurately. I
In your `settings.yaml` file, specify the model you want to use:
```yaml
llm:
tokenizer: mistralai/Mistral-7B-Instruct-v0.2
tokenizer: meta-llama/Meta-Llama-3.1-8B-Instruct
```
2. **Set Access Token for Gated Models:**
If you are using a gated model, ensure the `access_token` is set as mentioned in the previous section.
This configuration ensures that PrivateGPT can download and use the correct tokenizer for the model you are working with.
# Embedding dimensions mismatch
If you encounter an error message like `Embedding dimensions mismatch`, it is likely caused by a mismatch between the embedding model and the current vector dimension. To resolve this issue, ensure that the model and the input data have the same vector dimensions.
By default, PrivateGPT uses `nomic-embed-text` embeddings, which have a vector dimension of 768.
If you are using a different embedding model, ensure that the vector dimensions match the model's output.
<Callout intent = "warning">
In versions prior to 0.6.0, the default embedding model was `BAAI/bge-small-en-v1.5` in the `huggingface` setup.
If you plan to reuse the previously generated embeddings, you need to update the `settings.yaml` file to use the correct embedding model:
```yaml
huggingface:
embedding_hf_model_name: BAAI/bge-small-en-v1.5
embedding:
embed_dim: 384
```
</Callout>
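Conversely, if you stay on the current default `nomic-embed-text` embeddings, a minimal sketch of matching settings is shown below (the `ollama.embedding_model` key is an assumption; adjust it to however your setup selects the embedding model):

```yaml
# settings.yaml (sketch): keep embed_dim in sync with the embedding model
ollama:
  embedding_model: nomic-embed-text  # produces 768-dimensional vectors
embedding:
  embed_dim: 768
```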
# Building Llama-cpp with NVIDIA GPU support
## Out-of-memory error
If you encounter an out-of-memory error while running `llama-cpp` with CUDA, you can try the following steps to resolve the issue:
1. **Set the following environment variable:**
```bash
TOKENIZERS_PARALLELISM=true
```
2. **Run PrivateGPT:**
```bash
poetry run python -m private_gpt
```
Thanks to [MarioRossiGithub](https://github.com/MarioRossiGithub) for providing this solution.


@ -8,6 +8,14 @@ The ingestion of documents can be done in different ways:
## Bulk Local Ingestion
You will need to activate `data.local_ingestion.enabled` in your setting file to use this feature. Additionally,
it is probably a good idea to set `data.local_ingestion.allow_ingest_from` to specify which folders are allowed to be ingested.
<Callout intent = "warning">
Be careful enabling this feature in a production environment: it can be a security risk, since it allows users to ingest any local file they have permission to access.
</Callout>
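A minimal sketch of the settings involved, using the key names referenced above and an illustrative folder path:

```yaml
# settings.yaml (sketch): enable bulk local ingestion from an allow-listed folder
data:
  local_ingestion:
    enabled: true
    allow_ingest_from: ["/path/to/allowed/documents"]  # illustrative path
```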
When you are running PrivateGPT in a fully local setup, you can ingest a complete folder for convenience (containing
pdf, text files, etc.)
and optionally watch changes on it with the command:


@ -1,5 +1,13 @@
PrivateGPT provides an **API** containing all the building blocks required to
build **private, context-aware AI applications**.
<Callout intent = "tip">
If you are looking for an **enterprise-ready, fully private AI workspace**
check out [Zylon's website](https://zylon.ai) or [request a demo](https://cal.com/zylon/demo?source=pgpt-docs).
Crafted by the team behind PrivateGPT, Zylon is a best-in-class AI collaborative
workspace that can be easily deployed on-premise (data center, bare metal...) or in your private cloud (AWS, GCP, Azure...).
</Callout>
The API follows and extends OpenAI API standard, and supports both normal and streaming responses.
That means that, if you can use OpenAI API in one of your tools, you can use your own PrivateGPT API instead,
with no code changes, **and for free** if you are running PrivateGPT in a `local` setup.


@ -0,0 +1,105 @@
This guide provides a quick start for running different profiles of PrivateGPT using Docker Compose.
The profiles cater to various environments, including Ollama setups (CPU, CUDA, MacOS), and a fully local setup.
By default, Docker Compose will download pre-built images from a remote registry when starting the services. However, you have the option to build the images locally if needed. Details on building the Docker images locally are provided at the end of this guide.
If you want to run PrivateGPT locally without Docker, refer to the [Local Installation Guide](/installation).
## Prerequisites
- **Docker and Docker Compose:** Ensure both are installed on your system.
[Installation Guide for Docker](https://docs.docker.com/get-docker/), [Installation Guide for Docker Compose](https://docs.docker.com/compose/install/).
- **Clone PrivateGPT Repository:** Clone the PrivateGPT repository to your machine and navigate to the directory:
```sh
git clone https://github.com/zylon-ai/private-gpt.git
cd private-gpt
```
## Setups
### Ollama Setups (Recommended)
#### 1. Default/Ollama CPU
**Description:**
This profile runs the Ollama service using CPU resources. It is the standard configuration for running Ollama-based Private-GPT services without GPU acceleration.
**Run:**
To start the services using pre-built images, run:
```sh
docker-compose up
```
or with a specific profile:
```sh
docker-compose --profile ollama-cpu up
```
#### 2. Ollama Nvidia CUDA
**Description:**
This profile leverages GPU acceleration with CUDA support, suitable for computationally intensive tasks that benefit from GPU resources.
**Requirements:**
Ensure that your system has compatible GPU hardware and the necessary NVIDIA drivers installed. The installation process is detailed [here](https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html).
**Run:**
To start the services with CUDA support using pre-built images, run:
```sh
docker-compose --profile ollama-cuda up
```
#### 3. Ollama External API
**Description:**
This profile is designed for running PrivateGPT using Ollama installed on the host machine. This setup is particularly useful for MacOS users, as Docker does not yet support Metal GPU.
**Requirements:**
Install Ollama on your machine by following the instructions at [ollama.ai](https://ollama.ai/).
**Run:**
To start the Ollama service, use:
```sh
OLLAMA_HOST=0.0.0.0 ollama serve
```
To start the services with the host configuration using pre-built images, run:
```sh
docker-compose --profile ollama-api up
```
### Fully Local Setups
#### 1. LlamaCPP CPU
**Description:**
This profile runs the Private-GPT services locally using `llama-cpp` and Hugging Face models.
**Requirements:**
A **Hugging Face Token (HF_TOKEN)** is required for accessing Hugging Face models. Obtain your token following [this guide](/installation/getting-started/troubleshooting#downloading-gated-and-private-models).
**Run:**
Start the services with your Hugging Face token using pre-built images:
```sh
HF_TOKEN=<your_hf_token> docker-compose --profile llamacpp-cpu up
```
Replace `<your_hf_token>` with your actual Hugging Face token.
## Building Locally
If you prefer to build Docker images locally, which is useful when making changes to the codebase or the Dockerfiles, follow these steps:
### Building Locally
To build the Docker images locally, navigate to the cloned repository directory and run:
```sh
docker-compose build
```
This command compiles the necessary Docker images based on the current codebase and Dockerfile configurations.
### Forcing a Rebuild with --build
If you have made changes and need to ensure these changes are reflected in the Docker images, you can force a rebuild before starting the services:
```sh
docker-compose up --build
```
or with a specific profile:
```sh
docker-compose --profile <profile_name> up --build
```
Replace `<profile_name>` with the desired profile.


@ -1,122 +0,0 @@
# List of working LLM
**Do you have any working combination of LLM and embeddings?**
Please open a PR to add it to the list, and come on our Discord to tell us about it!
## Prompt style
LLMs might have been trained with different prompt styles.
The prompt style is the way the prompt is written, and how the system message is injected in the prompt.
For example, `llama2` looks like this:
```text
<s>[INST] <<SYS>>
{{ system_prompt }}
<</SYS>>
{{ user_message }} [/INST]
```
While `default` (the `llama_index` default) looks like this:
```text
system: {{ system_prompt }}
user: {{ user_message }}
assistant: {{ assistant_message }}
```
The "`tag`" style looks like this:
```text
<|system|>: {{ system_prompt }}
<|user|>: {{ user_message }}
<|assistant|>: {{ assistant_message }}
```
The "`mistral`" style looks like this:
```text
<s>[INST] You are an AI assistant. [/INST]</s>[INST] Hello, how are you doing? [/INST]
```
The "`chatml`" style looks like this:
```text
<|im_start|>system
{{ system_prompt }}<|im_end|>
<|im_start|>user"
{{ user_message }}<|im_end|>
<|im_start|>assistant
{{ assistant_message }}
```
Some LLMs will not understand these prompt styles, and will not work (returning nothing).
You can try to change the prompt style to `default` (or `tag`) in the settings, and it will
change the way the messages are formatted to be passed to the LLM.
## Example of configuration
You might want to change the prompt depending on the language and model you are using.
### English, with instructions
`settings-en.yaml`:
```yml
local:
llm_hf_repo_id: TheBloke/Mistral-7B-Instruct-v0.1-GGUF
llm_hf_model_file: mistral-7b-instruct-v0.1.Q4_K_M.gguf
embedding_hf_model_name: BAAI/bge-small-en-v1.5
prompt_style: "llama2"
```
### French, with instructions
`settings-fr.yaml`:
```yml
local:
llm_hf_repo_id: TheBloke/Vigogne-2-7B-Instruct-GGUF
llm_hf_model_file: vigogne-2-7b-instruct.Q4_K_M.gguf
embedding_hf_model_name: dangvantuan/sentence-camembert-base
prompt_style: "default"
# prompt_style: "tag" # also works
# The default system prompt is injected only when the `prompt_style` != default, and there are no system message in the discussion
# default_system_prompt: Vous êtes un assistant IA qui répond à la question posée à la fin en utilisant le contexte suivant. Si vous ne connaissez pas la réponse, dites simplement que vous ne savez pas, n'essayez pas d'inventer une réponse. Veuillez répondre exclusivement en français.
```
You might want to change the prompt, as the one above might not directly answer your question.
You can read online about how to write a good prompt, but in a nutshell, make it (extremely) directive.
You can troubleshoot your prompt by writing multiline requests in the UI, spelling out
your interaction with the model, for example:
```text
Tu es un programmeur senior qui programme en python et utilise le framework fastapi. Ecrit moi un serveur qui retourne "hello world".
```
Another example:
```text
Context: None
Situation: tu es au milieu d'un champ.
Tache: va a la rivière, en bas du champ.
Décrit comment aller a la rivière.
```
### Optimised Models
GodziLLa2-70B LLM (English, rank 2 on HuggingFace OpenLLM Leaderboard), bge large Embedding Model (rank 1 on HuggingFace MTEB Leaderboard)
`settings-optimised.yaml`:
```yml
local:
llm_hf_repo_id: TheBloke/GodziLLa2-70B-GGUF
llm_hf_model_file: godzilla2-70b.Q4_K_M.gguf
embedding_hf_model_name: BAAI/bge-large-en
prompt_style: "llama2"
```
### German speaking model
`settings-de.yaml`:
```yml
local:
llm_hf_repo_id: TheBloke/em_german_leo_mistral-GGUF
llm_hf_model_file: em_german_leo_mistral.Q4_K_M.gguf
embedding_hf_model_name: T-Systems-onsite/german-roberta-sentence-transformer-v2
# llama2, default or tag
prompt_style: "default"
```


@ -0,0 +1,23 @@
# Recipes
Recipes are predefined use cases that help users solve very specific tasks using PrivateGPT.
They provide a streamlined approach to achieve common goals with the platform, offering both a starting point and inspiration for further exploration.
The main goal of Recipes is to empower the community to create and share solutions, expanding the capabilities of PrivateGPT.
## How to Create a New Recipe
1. **Identify the Task**: Define a specific task or problem that the Recipe will address.
2. **Develop the Solution**: Create a clear and concise guide, including any necessary code snippets or configurations.
3. **Submit a PR**: Fork the PrivateGPT repository, add your Recipe to the appropriate section, and submit a PR for review.
We encourage you to be creative and think outside the box! Your contributions help shape the future of PrivateGPT.
## Available Recipes
<Cards>
<Card
title="Summarize"
icon="fa-solid fa-file-alt"
href="/recipes/general-use-cases/summarize"
/>
</Cards>


@ -0,0 +1,20 @@
The Summarize Recipe provides a method to extract concise summaries from ingested documents or texts using PrivateGPT.
This tool is particularly useful for quickly understanding large volumes of information by distilling key points and main ideas.
## Use Case
The primary use case for the `Summarize` tool is to automate the summarization of lengthy documents,
making it easier for users to grasp the essential information without reading through entire texts.
This can be applied in various scenarios, such as summarizing research papers, news articles, or business reports.
## Key Features
1. **Ingestion-compatible**: The user provides the text to be summarized. The text can be supplied directly or retrieved from documents already ingested in the system.
2. **Customization**: The summary generation can be influenced by providing specific `instructions` or a `prompt`. These inputs guide the model on how to frame the summary, allowing for customization according to user needs.
3. **Streaming Support**: The tool supports streaming, allowing for real-time summary generation, which is particularly useful for handling large texts or providing immediate feedback. A sketch of a request to the summarization endpoint is shown below.
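For illustration only, the following minimal sketch calls the summarization API over HTTP. It assumes a PrivateGPT server reachable at `http://localhost:8001` with authentication disabled (adjust the URL, port, and headers to your deployment); the request fields mirror the `SummarizeBody` schema (`text`, `instructions`, `use_context`, `stream`):
```python
# Illustrative sketch, not an official client. Assumes PrivateGPT is
# running locally on port 8001 with authentication disabled.
import json
import urllib.request

payload = {
    "text": (
        "PrivateGPT lets you ask questions about your documents "
        "without data leaving your execution environment."
    ),
    "instructions": "Summarize in one sentence.",
    "use_context": False,  # set True to summarize ingested documents instead
    "stream": False,
}

req = urllib.request.Request(
    "http://localhost:8001/v1/summarize",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["summary"])
```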
## Contributing
If you have ideas for improving the Summarize recipe or want to add new features, feel free to contribute!
You can submit your enhancements via a pull request on our [GitHub repository](https://github.com/zylon-ai/private-gpt).


@ -339,6 +339,48 @@
}
}
},
"/v1/summarize": {
"post": {
"tags": [
"Recipes"
],
"summary": "Summarize",
"description": "Given a text, the model will return a summary.\n\nOptionally include `instructions` to influence the way the summary is generated.\n\nIf `use_context`\nis set to `true`, the model will also use the content coming from the ingested\ndocuments in the summary. The documents being used can\nbe filtered by their metadata using the `context_filter`.\nIngested documents metadata can be found using `/ingest/list` endpoint.\nIf you want all ingested documents to be used, remove `context_filter` altogether.\n\nIf `prompt` is set, it will be used as the prompt for the summarization,\notherwise the default prompt will be used.\n\nWhen using `'stream': true`, the API will return data chunks following [OpenAI's\nstreaming model](https://platform.openai.com/docs/api-reference/chat/streaming):\n```\n{\"id\":\"12345\",\"object\":\"completion.chunk\",\"created\":1694268190,\n\"model\":\"private-gpt\",\"choices\":[{\"index\":0,\"delta\":{\"content\":\"Hello\"},\n\"finish_reason\":null}]}\n```",
"operationId": "summarize_v1_summarize_post",
"requestBody": {
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/SummarizeBody"
}
}
},
"required": true
},
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/SummarizeResponse"
}
}
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
}
}
},
"/v1/embeddings": {
"post": {
"tags": [
@ -500,6 +542,10 @@
"Chunk": {
"properties": {
"object": {
"type": "string",
"enum": [
"context.chunk"
],
"const": "context.chunk",
"title": "Object"
},
@ -612,10 +658,18 @@
"ChunksResponse": {
"properties": {
"object": {
"type": "string",
"enum": [
"list"
],
"const": "list",
"title": "Object"
},
"model": {
"type": "string",
"enum": [
"private-gpt"
],
"const": "private-gpt",
"title": "Model"
},
@ -728,6 +782,10 @@
"title": "Index"
},
"object": {
"type": "string",
"enum": [
"embedding"
],
"const": "embedding",
"title": "Object"
},
@ -779,10 +837,18 @@
"EmbeddingsResponse": {
"properties": {
"object": {
"type": "string",
"enum": [
"list"
],
"const": "list",
"title": "Object"
},
"model": {
"type": "string",
"enum": [
"private-gpt"
],
"const": "private-gpt",
"title": "Model"
},
@ -818,6 +884,10 @@
"HealthResponse": {
"properties": {
"status": {
"type": "string",
"enum": [
"ok"
],
"const": "ok",
"title": "Status",
"default": "ok"
@ -829,10 +899,18 @@
"IngestResponse": {
"properties": {
"object": {
"type": "string",
"enum": [
"list"
],
"const": "list",
"title": "Object"
},
"model": {
"type": "string",
"enum": [
"private-gpt"
],
"const": "private-gpt",
"title": "Model"
},
@ -879,6 +957,10 @@
"IngestedDoc": {
"properties": {
"object": {
"type": "string",
"enum": [
"ingest.document"
],
"const": "ingest.document",
"title": "Object"
},
@ -1001,6 +1083,10 @@
]
},
"model": {
"type": "string",
"enum": [
"private-gpt"
],
"const": "private-gpt",
"title": "Model"
},
@ -1074,6 +1160,78 @@
"title": "OpenAIMessage",
"description": "Inference result, with the source of the message.\n\nRole could be the assistant or system\n(providing a default response, not AI generated)."
},
"SummarizeBody": {
"properties": {
"text": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"title": "Text"
},
"use_context": {
"type": "boolean",
"title": "Use Context",
"default": false
},
"context_filter": {
"anyOf": [
{
"$ref": "#/components/schemas/ContextFilter"
},
{
"type": "null"
}
]
},
"prompt": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"title": "Prompt"
},
"instructions": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"title": "Instructions"
},
"stream": {
"type": "boolean",
"title": "Stream",
"default": false
}
},
"type": "object",
"title": "SummarizeBody"
},
"SummarizeResponse": {
"properties": {
"summary": {
"type": "string",
"title": "Summary"
}
},
"type": "object",
"required": [
"summary"
],
"title": "SummarizeResponse"
},
"ValidationError": {
"properties": {
"loc": {

poetry.lock (generated, 6528 lines changed) — file diff suppressed because it is too large.


@ -31,6 +31,7 @@ class EmbeddingComponent:
self.embedding_model = HuggingFaceEmbedding(
model_name=settings.huggingface.embedding_hf_model_name,
cache_folder=str(models_cache_path),
trust_remote_code=settings.huggingface.trust_remote_code,
)
case "sagemaker":
try:
@ -71,16 +72,46 @@ class EmbeddingComponent:
from llama_index.embeddings.ollama import ( # type: ignore
OllamaEmbedding,
)
from ollama import Client # type: ignore
except ImportError as e:
raise ImportError(
"Local dependencies not found, install with `poetry install --extras embeddings-ollama`"
) from e
ollama_settings = settings.ollama
# Calculate the embedding model name; if no tag is provided, ":latest" is appended
model_name = (
ollama_settings.embedding_model + ":latest"
if ":" not in ollama_settings.embedding_model
else ollama_settings.embedding_model
)
self.embedding_model = OllamaEmbedding(
model_name=ollama_settings.embedding_model,
model_name=model_name,
base_url=ollama_settings.embedding_api_base,
)
if ollama_settings.autopull_models:
if ollama_settings.autopull_models:
from private_gpt.utils.ollama import (
check_connection,
pull_model,
)
# TODO: Reuse llama-index client when llama-index is updated
client = Client(
host=ollama_settings.embedding_api_base,
timeout=ollama_settings.request_timeout,
)
if not check_connection(client):
raise ValueError(
f"Failed to connect to Ollama, "
f"check if Ollama server is running on {ollama_settings.api_base}"
)
pull_model(client, model_name)
case "azopenai":
try:
from llama_index.embeddings.azure_openai import ( # type: ignore
@ -113,6 +144,23 @@ class EmbeddingComponent:
api_key=settings.gemini.api_key,
model_name=settings.gemini.embedding_model,
)
case "mistralai":
try:
from llama_index.embeddings.mistralai import ( # type: ignore
MistralAIEmbedding,
)
except ImportError as e:
raise ImportError(
"Mistral dependencies not found, install with `poetry install --extras embeddings-mistral`"
) from e
api_key = settings.openai.api_key
model = settings.openai.embedding_model
self.embedding_model = MistralAIEmbedding(
api_key=api_key,
model=model,
)
case "mock":
# Not a random number, is the dimensionality used by
# the default embedding model


@ -403,7 +403,7 @@ class PipelineIngestComponent(BaseIngestComponentWithIndex):
self.transformations,
show_progress=self.show_progress,
)
self.node_q.put(("process", file_name, documents, nodes))
self.node_q.put(("process", file_name, documents, list(nodes)))
finally:
self.doc_semaphore.release()
self.doc_q.task_done() # unblock Q joins


@ -92,7 +92,13 @@ class IngestionHelper:
return string_reader.load_data([file_data.read_text()])
logger.debug("Specific reader found for extension=%s", extension)
return reader_cls().load_data(file_data)
documents = reader_cls().load_data(file_data)
# Sanitize NUL bytes in text which can't be stored in Postgres
for i in range(len(documents)):
documents[i].text = documents[i].text.replace("\u0000", "")
return documents
@staticmethod
def _exclude_metadata(documents: list[Document]) -> None:


@ -120,7 +120,6 @@ class LLMComponent:
api_version="",
temperature=settings.llm.temperature,
context_window=settings.llm.context_window,
max_new_tokens=settings.llm.max_new_tokens,
messages_to_prompt=prompt_style.messages_to_prompt,
completion_to_prompt=prompt_style.completion_to_prompt,
tokenizer=settings.llm.tokenizer,
@ -146,8 +145,15 @@ class LLMComponent:
"repeat_penalty": ollama_settings.repeat_penalty, # ollama llama-cpp
}
self.llm = Ollama(
model=ollama_settings.llm_model,
# Calculate the LLM model name; if no tag is provided, ":latest" is appended
model_name = (
ollama_settings.llm_model + ":latest"
if ":" not in ollama_settings.llm_model
else ollama_settings.llm_model
)
llm = Ollama(
model=model_name,
base_url=ollama_settings.api_base,
temperature=settings.llm.temperature,
context_window=settings.llm.context_window,
@ -155,6 +161,16 @@ class LLMComponent:
request_timeout=ollama_settings.request_timeout,
)
if ollama_settings.autopull_models:
from private_gpt.utils.ollama import check_connection, pull_model
if not check_connection(llm.client):
raise ValueError(
f"Failed to connect to Ollama, "
f"check if Ollama server is running on {ollama_settings.api_base}"
)
pull_model(llm.client, model_name)
if (
ollama_settings.keep_alive
!= ollama_settings.model_fields["keep_alive"].default
@ -167,10 +183,12 @@ class LLMComponent:
return wrapper
Ollama.chat = add_keep_alive(Ollama.chat)
Ollama.stream_chat = add_keep_alive(Ollama.stream_chat)
Ollama.complete = add_keep_alive(Ollama.complete)
Ollama.stream_complete = add_keep_alive(Ollama.stream_complete)
Ollama.chat = add_keep_alive(Ollama.chat) # type: ignore
Ollama.stream_chat = add_keep_alive(Ollama.stream_chat) # type: ignore
Ollama.complete = add_keep_alive(Ollama.complete) # type: ignore
Ollama.stream_complete = add_keep_alive(Ollama.stream_complete) # type: ignore
self.llm = llm
case "azopenai":
try:


@ -40,7 +40,8 @@ class AbstractPromptStyle(abc.ABC):
logger.debug("Got for messages='%s' the prompt='%s'", messages, prompt)
return prompt
def completion_to_prompt(self, completion: str) -> str:
def completion_to_prompt(self, prompt: str) -> str:
completion = prompt # Fix: Llama-index parameter has to be named as prompt
prompt = self._completion_to_prompt(completion)
logger.debug("Got for completion='%s' the prompt='%s'", completion, prompt)
return prompt
@ -138,6 +139,72 @@ class Llama2PromptStyle(AbstractPromptStyle):
)
class Llama3PromptStyle(AbstractPromptStyle):
r"""Template for Meta's Llama 3.1.
The format follows this structure:
<|begin_of_text|>
<|start_header_id|>system<|end_header_id|>
[System message content]<|eot_id|>
<|start_header_id|>user<|end_header_id|>
[User message content]<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
[Assistant message content]<|eot_id|>
...
(Repeat for each message, including possible 'ipython' role)
"""
BOS, EOS = "<|begin_of_text|>", "<|end_of_text|>"
B_INST, E_INST = "<|start_header_id|>", "<|end_header_id|>"
EOT = "<|eot_id|>"
B_SYS, E_SYS = "<|start_header_id|>system<|end_header_id|>", "<|eot_id|>"
ASSISTANT_INST = "<|start_header_id|>assistant<|end_header_id|>"
DEFAULT_SYSTEM_PROMPT = """\
You are a helpful, respectful and honest assistant. \
Always answer as helpfully as possible and follow ALL given instructions. \
Do not speculate or make up information. \
Do not reference any given instructions or context. \
"""
def _messages_to_prompt(self, messages: Sequence[ChatMessage]) -> str:
prompt = ""
has_system_message = False
for i, message in enumerate(messages):
if not message or message.content is None:
continue
if message.role == MessageRole.SYSTEM:
prompt += f"{self.B_SYS}\n\n{message.content.strip()}{self.E_SYS}"
has_system_message = True
else:
role_header = f"{self.B_INST}{message.role.value}{self.E_INST}"
prompt += f"{role_header}\n\n{message.content.strip()}{self.EOT}"
# Add assistant header if the last message is not from the assistant
if i == len(messages) - 1 and message.role != MessageRole.ASSISTANT:
prompt += f"{self.ASSISTANT_INST}\n\n"
# Add default system prompt if no system message was provided
if not has_system_message:
prompt = (
f"{self.B_SYS}\n\n{self.DEFAULT_SYSTEM_PROMPT}{self.E_SYS}" + prompt
)
# TODO: Implement tool handling logic
return prompt
def _completion_to_prompt(self, completion: str) -> str:
return (
f"{self.B_SYS}\n\n{self.DEFAULT_SYSTEM_PROMPT}{self.E_SYS}"
f"{self.B_INST}user{self.E_INST}\n\n{completion.strip()}{self.EOT}"
f"{self.ASSISTANT_INST}\n\n"
)
class TagPromptStyle(AbstractPromptStyle):
"""Tag prompt style (used by Vigogne) that uses the prompt style `<|ROLE|>`.
@ -219,7 +286,9 @@ class ChatMLPromptStyle(AbstractPromptStyle):
def get_prompt_style(
prompt_style: Literal["default", "llama2", "tag", "mistral", "chatml"] | None
prompt_style: (
Literal["default", "llama2", "llama3", "tag", "mistral", "chatml"] | None
)
) -> AbstractPromptStyle:
"""Get the prompt style to use from the given string.
@ -230,6 +299,8 @@ def get_prompt_style(
return DefaultPromptStyle()
elif prompt_style == "llama2":
return Llama2PromptStyle()
elif prompt_style == "llama3":
return Llama3PromptStyle()
elif prompt_style == "tag":
return TagPromptStyle()
elif prompt_style == "mistral":


@ -38,10 +38,10 @@ class NodeStoreComponent:
case "postgres":
try:
from llama_index.core.storage.docstore.postgres_docstore import (
from llama_index.storage.docstore.postgres import ( # type: ignore
PostgresDocumentStore,
)
from llama_index.core.storage.index_store.postgres_index_store import (
from llama_index.storage.index_store.postgres import ( # type: ignore
PostgresIndexStore,
)
except ImportError:
@ -55,6 +55,7 @@ class NodeStoreComponent:
self.index_store = PostgresIndexStore.from_params(
**settings.postgres.model_dump(exclude_none=True)
)
self.doc_store = PostgresDocumentStore.from_params(
**settings.postgres.model_dump(exclude_none=True)
)


@ -1,14 +1,17 @@
from collections.abc import Generator
from typing import Any
from collections.abc import Generator, Sequence
from typing import TYPE_CHECKING, Any
from llama_index.core.schema import BaseNode, MetadataMode
from llama_index.core.vector_stores.utils import node_to_metadata_dict
from llama_index.vector_stores.chroma import ChromaVectorStore # type: ignore
if TYPE_CHECKING:
from collections.abc import Mapping
def chunk_list(
lst: list[BaseNode], max_chunk_size: int
) -> Generator[list[BaseNode], None, None]:
lst: Sequence[BaseNode], max_chunk_size: int
) -> Generator[Sequence[BaseNode], None, None]:
"""Yield successive max_chunk_size-sized chunks from lst.
Args:
@ -60,7 +63,7 @@ class BatchedChromaVectorStore(ChromaVectorStore): # type: ignore
)
self.chroma_client = chroma_client
def add(self, nodes: list[BaseNode], **add_kwargs: Any) -> list[str]:
def add(self, nodes: Sequence[BaseNode], **add_kwargs: Any) -> list[str]:
"""Add nodes to index, batching the insertion to avoid issues.
Args:
@ -78,8 +81,8 @@ class BatchedChromaVectorStore(ChromaVectorStore): # type: ignore
all_ids = []
for node_chunk in node_chunks:
embeddings = []
metadatas = []
embeddings: list[Sequence[float]] = []
metadatas: list[Mapping[str, Any]] = []
ids = []
documents = []
for node in node_chunk:


@ -15,6 +15,7 @@ from private_gpt.server.completions.completions_router import completions_router
from private_gpt.server.embeddings.embeddings_router import embeddings_router
from private_gpt.server.health.health_router import health_router
from private_gpt.server.ingest.ingest_router import ingest_router
from private_gpt.server.recipes.summarize.summarize_router import summarize_router
from private_gpt.settings.settings import Settings
logger = logging.getLogger(__name__)
@ -32,12 +33,14 @@ def create_app(root_injector: Injector) -> FastAPI:
app.include_router(chat_router)
app.include_router(chunks_router)
app.include_router(ingest_router)
app.include_router(summarize_router)
app.include_router(embeddings_router)
app.include_router(health_router)
# Add LlamaIndex simple observability
global_handler = create_global_handler("simple")
LlamaIndexSettings.callback_manager = CallbackManager([global_handler])
if global_handler:
LlamaIndexSettings.callback_manager = CallbackManager([global_handler])
settings = root_injector.get(Settings)
if settings.server.cors.enabled:


@ -1,4 +1,5 @@
from dataclasses import dataclass
from typing import TYPE_CHECKING
from injector import inject, singleton
from llama_index.core.chat_engine import ContextChatEngine, SimpleChatEngine
@ -26,6 +27,9 @@ from private_gpt.open_ai.extensions.context_filter import ContextFilter
from private_gpt.server.chunks.chunks_service import Chunk
from private_gpt.settings.settings import Settings
if TYPE_CHECKING:
from llama_index.core.postprocessor.types import BaseNodePostprocessor
class Completion(BaseModel):
response: str
@ -114,12 +118,15 @@ class ChatService:
context_filter=context_filter,
similarity_top_k=self.settings.rag.similarity_top_k,
)
node_postprocessors = [
node_postprocessors: list[BaseNodePostprocessor] = [
MetadataReplacementPostProcessor(target_metadata_key="window"),
SimilarityPostprocessor(
similarity_cutoff=settings.rag.similarity_value
),
]
if settings.rag.similarity_value:
node_postprocessors.append(
SimilarityPostprocessor(
similarity_cutoff=settings.rag.similarity_value
)
)
if settings.rag.rerank.enabled:
rerank_postprocessor = SentenceTransformerRerank(


@ -0,0 +1,86 @@
from fastapi import APIRouter, Depends, Request
from pydantic import BaseModel
from starlette.responses import StreamingResponse
from private_gpt.open_ai.extensions.context_filter import ContextFilter
from private_gpt.open_ai.openai_models import (
to_openai_sse_stream,
)
from private_gpt.server.recipes.summarize.summarize_service import SummarizeService
from private_gpt.server.utils.auth import authenticated
summarize_router = APIRouter(prefix="/v1", dependencies=[Depends(authenticated)])
class SummarizeBody(BaseModel):
text: str | None = None
use_context: bool = False
context_filter: ContextFilter | None = None
prompt: str | None = None
instructions: str | None = None
stream: bool = False
class SummarizeResponse(BaseModel):
summary: str
@summarize_router.post(
"/summarize",
response_model=None,
summary="Summarize",
responses={200: {"model": SummarizeResponse}},
tags=["Recipes"],
)
def summarize(
request: Request, body: SummarizeBody
) -> SummarizeResponse | StreamingResponse:
"""Given a text, the model will return a summary.
Optionally include `instructions` to influence the way the summary is generated.
If `use_context`
is set to `true`, the model will also use the content coming from the ingested
documents in the summary. The documents being used can
be filtered by their metadata using the `context_filter`.
Ingested documents metadata can be found using `/ingest/list` endpoint.
If you want all ingested documents to be used, remove `context_filter` altogether.
If `prompt` is set, it will be used as the prompt for the summarization,
otherwise the default prompt will be used.
When using `'stream': true`, the API will return data chunks following [OpenAI's
streaming model](https://platform.openai.com/docs/api-reference/chat/streaming):
```
{"id":"12345","object":"completion.chunk","created":1694268190,
"model":"private-gpt","choices":[{"index":0,"delta":{"content":"Hello"},
"finish_reason":null}]}
```
"""
service: SummarizeService = request.state.injector.get(SummarizeService)
if body.stream:
completion_gen = service.stream_summarize(
text=body.text,
instructions=body.instructions,
use_context=body.use_context,
context_filter=body.context_filter,
prompt=body.prompt,
)
return StreamingResponse(
to_openai_sse_stream(
response_generator=completion_gen,
),
media_type="text/event-stream",
)
else:
completion = service.summarize(
text=body.text,
instructions=body.instructions,
use_context=body.use_context,
context_filter=body.context_filter,
prompt=body.prompt,
)
return SummarizeResponse(
summary=completion,
)


@ -0,0 +1,172 @@
from itertools import chain
from injector import inject, singleton
from llama_index.core import (
Document,
StorageContext,
SummaryIndex,
)
from llama_index.core.base.response.schema import Response, StreamingResponse
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.response_synthesizers import ResponseMode
from llama_index.core.storage.docstore.types import RefDocInfo
from llama_index.core.types import TokenGen
from private_gpt.components.embedding.embedding_component import EmbeddingComponent
from private_gpt.components.llm.llm_component import LLMComponent
from private_gpt.components.node_store.node_store_component import NodeStoreComponent
from private_gpt.components.vector_store.vector_store_component import (
VectorStoreComponent,
)
from private_gpt.open_ai.extensions.context_filter import ContextFilter
from private_gpt.settings.settings import Settings
DEFAULT_SUMMARIZE_PROMPT = (
"Provide a comprehensive summary of the provided context information. "
"The summary should cover all the key points and main ideas presented in "
"the original text, while also condensing the information into a concise "
"and easy-to-understand format. Please ensure that the summary includes "
"relevant details and examples that support the main ideas, while avoiding "
"any unnecessary information or repetition."
)
@singleton
class SummarizeService:
@inject
def __init__(
self,
settings: Settings,
llm_component: LLMComponent,
node_store_component: NodeStoreComponent,
vector_store_component: VectorStoreComponent,
embedding_component: EmbeddingComponent,
) -> None:
self.settings = settings
self.llm_component = llm_component
self.node_store_component = node_store_component
self.vector_store_component = vector_store_component
self.embedding_component = embedding_component
self.storage_context = StorageContext.from_defaults(
vector_store=vector_store_component.vector_store,
docstore=node_store_component.doc_store,
index_store=node_store_component.index_store,
)
@staticmethod
def _filter_ref_docs(
ref_docs: dict[str, RefDocInfo], context_filter: ContextFilter | None
) -> list[RefDocInfo]:
if context_filter is None or not context_filter.docs_ids:
return list(ref_docs.values())
return [
ref_doc
for doc_id, ref_doc in ref_docs.items()
if doc_id in context_filter.docs_ids
]
def _summarize(
self,
use_context: bool = False,
stream: bool = False,
text: str | None = None,
instructions: str | None = None,
context_filter: ContextFilter | None = None,
prompt: str | None = None,
) -> str | TokenGen:
nodes_to_summarize = []
# Add text to summarize
if text:
text_documents = [Document(text=text)]
nodes_to_summarize += (
SentenceSplitter.from_defaults().get_nodes_from_documents(
text_documents
)
)
# Add context documents to summarize
if use_context:
# 1. Recover all ref docs
ref_docs: dict[str, RefDocInfo] | None = (
self.storage_context.docstore.get_all_ref_doc_info()
)
if ref_docs is None:
raise ValueError("No documents have been ingested yet.")
# 2. Filter documents based on context_filter (if provided)
filtered_ref_docs = self._filter_ref_docs(ref_docs, context_filter)
# 3. Get all nodes from the filtered documents
filtered_node_ids = chain.from_iterable(
[ref_doc.node_ids for ref_doc in filtered_ref_docs]
)
filtered_nodes = self.storage_context.docstore.get_nodes(
node_ids=list(filtered_node_ids),
)
nodes_to_summarize += filtered_nodes
# Create a SummaryIndex to summarize the nodes
summary_index = SummaryIndex(
nodes=nodes_to_summarize,
storage_context=StorageContext.from_defaults(), # In memory SummaryIndex
show_progress=True,
)
# Make a tree summarization query
# above the set of all candidate nodes
query_engine = summary_index.as_query_engine(
llm=self.llm_component.llm,
response_mode=ResponseMode.TREE_SUMMARIZE,
streaming=stream,
use_async=self.settings.summarize.use_async,
)
prompt = prompt or DEFAULT_SUMMARIZE_PROMPT
summarize_query = prompt + "\n" + (instructions or "")
response = query_engine.query(summarize_query)
if isinstance(response, Response):
return response.response or ""
elif isinstance(response, StreamingResponse):
return response.response_gen
else:
raise TypeError(f"The result is not of a supported type: {type(response)}")
def summarize(
self,
use_context: bool = False,
text: str | None = None,
instructions: str | None = None,
context_filter: ContextFilter | None = None,
prompt: str | None = None,
) -> str:
return self._summarize(
use_context=use_context,
stream=False,
text=text,
instructions=instructions,
context_filter=context_filter,
prompt=prompt,
) # type: ignore
def stream_summarize(
self,
use_context: bool = False,
text: str | None = None,
instructions: str | None = None,
context_filter: ContextFilter | None = None,
prompt: str | None = None,
) -> TokenGen:
return self._summarize(
use_context=use_context,
stream=True,
text=text,
instructions=instructions,
context_filter=context_filter,
prompt=prompt,
) # type: ignore


@ -59,6 +59,27 @@ class AuthSettings(BaseModel):
)
class IngestionSettings(BaseModel):
"""Ingestion configuration.
This configuration is used to control the ingestion of data into the system
using non-server methods. This is useful for local development and testing;
or to ingest in bulk from a folder.
Please note that this configuration is not secure and should be used in
a controlled environment only (setting right permissions, etc.).
"""
enabled: bool = Field(
description="Flag indicating if local ingestion is enabled or not.",
default=False,
)
allow_ingest_from: list[str] = Field(
description="A list of folders that should be permitted to make ingest requests.",
default=[],
)
class ServerSettings(BaseModel):
env_name: str = Field(
description="Name of the environment (prod, staging, local...)"
@ -74,6 +95,10 @@ class ServerSettings(BaseModel):
class DataSettings(BaseModel):
local_ingestion: IngestionSettings = Field(
description="Ingestion configuration",
default_factory=lambda: IngestionSettings(allow_ingest_from=["*"]),
)
local_data_folder: str = Field(
description="Path to local storage."
"It will be treated as an absolute path if it starts with /"
@ -111,16 +136,19 @@ class LLMSettings(BaseModel):
0.1,
description="The temperature of the model. Increasing the temperature will make the model answer more creatively. A value of 0.1 would be more factual.",
)
prompt_style: Literal["default", "llama2", "tag", "mistral", "chatml"] = Field(
"llama2",
description=(
"The prompt style to use for the chat engine. "
"If `default` - use the default prompt style from the llama_index. It should look like `role: message`.\n"
"If `llama2` - use the llama2 prompt style from the llama_index. Based on `<s>`, `[INST]` and `<<SYS>>`.\n"
"If `tag` - use the `tag` prompt style. It should look like `<|role|>: message`. \n"
"If `mistral` - use the `mistral prompt style. It shoudl look like <s>[INST] {System Prompt} [/INST]</s>[INST] { UserInstructions } [/INST]"
"`llama2` is the historic behaviour. `default` might work better with your custom models."
),
prompt_style: Literal["default", "llama2", "llama3", "tag", "mistral", "chatml"] = (
Field(
"llama2",
description=(
"The prompt style to use for the chat engine. "
"If `default` - use the default prompt style from the llama_index. It should look like `role: message`.\n"
"If `llama2` - use the llama2 prompt style from the llama_index. Based on `<s>`, `[INST]` and `<<SYS>>`.\n"
"If `llama3` - use the llama3 prompt style from the llama_index."
"If `tag` - use the `tag` prompt style. It should look like `<|role|>: message`. \n"
"If `mistral` - use the `mistral prompt style. It shoudl look like <s>[INST] {System Prompt} [/INST]</s>[INST] { UserInstructions } [/INST]"
"`llama2` is the historic behaviour. `default` might work better with your custom models."
),
)
)
@ -161,11 +189,22 @@ class HuggingFaceSettings(BaseModel):
None,
description="Huggingface access token, required to download some models",
)
trust_remote_code: bool = Field(
False,
description="If set to True, the code from the remote model will be trusted and executed.",
)
class EmbeddingSettings(BaseModel):
mode: Literal[
"huggingface", "openai", "azopenai", "sagemaker", "ollama", "mock", "gemini"
"huggingface",
"openai",
"azopenai",
"sagemaker",
"ollama",
"mock",
"gemini",
"mistralai",
]
ingest_mode: Literal["simple", "batch", "parallel", "pipeline"] = Field(
"simple",
@ -290,6 +329,10 @@ class OllamaSettings(BaseModel):
120.0,
description="Time elapsed until ollama times out the request. Default is 120s. Format is float. ",
)
autopull_models: bool = Field(
False,
description="If set to True, the Ollama will automatically pull the models from the API base.",
)
class AzureOpenAISettings(BaseModel):
@ -314,6 +357,10 @@ class AzureOpenAISettings(BaseModel):
class UISettings(BaseModel):
enabled: bool
path: str
default_mode: Literal["RAG", "Search", "Basic", "Summarize"] = Field(
"RAG",
description="The default mode.",
)
default_chat_system_prompt: str = Field(
None,
description="The default system prompt to use for the chat mode.",
@ -321,6 +368,10 @@ class UISettings(BaseModel):
default_query_system_prompt: str = Field(
None, description="The default system prompt to use for the query mode."
)
default_summarization_system_prompt: str = Field(
None,
description="The default system prompt to use for the summarization mode.",
)
delete_file_button_enabled: bool = Field(
True, description="If the button to delete a file is enabled or not."
)
@ -356,6 +407,13 @@ class RagSettings(BaseModel):
rerank: RerankSettings
class SummarizeSettings(BaseModel):
use_async: bool = Field(
True,
description="If set to True, the summarization will be done asynchronously.",
)
class ClickHouseSettings(BaseModel):
host: str = Field(
"localhost",
@ -545,6 +603,7 @@ class Settings(BaseModel):
vectorstore: VectorstoreSettings
nodestore: NodeStoreSettings
rag: RagSettings
summarize: SummarizeSettings
qdrant: QdrantSettings | None = None
postgres: PostgresSettings | None = None
clickhouse: ClickHouseSettings | None = None


@ -1,9 +1,10 @@
"""This file should be imported if and only if you want to run the UI locally."""
import itertools
import base64
import logging
import time
from collections.abc import Iterable
from enum import Enum
from pathlib import Path
from typing import Any
@ -12,6 +13,7 @@ from fastapi import FastAPI
from gradio.themes.utils.colors import slate # type: ignore
from injector import inject, singleton
from llama_index.core.llms import ChatMessage, ChatResponse, MessageRole
from llama_index.core.types import TokenGen
from pydantic import BaseModel
from private_gpt.constants import PROJECT_ROOT_PATH
@ -20,6 +22,7 @@ from private_gpt.open_ai.extensions.context_filter import ContextFilter
from private_gpt.server.chat.chat_service import ChatService, CompletionGen
from private_gpt.server.chunks.chunks_service import Chunk, ChunksService
from private_gpt.server.ingest.ingest_service import IngestService
from private_gpt.server.recipes.summarize.summarize_service import SummarizeService
from private_gpt.settings.settings import settings
from private_gpt.ui.images import logo_svg
@ -31,9 +34,22 @@ AVATAR_BOT = THIS_DIRECTORY_RELATIVE / "avatar-bot.ico"
UI_TAB_TITLE = "My Private GPT"
SOURCES_SEPARATOR = "\n\n Sources: \n"
SOURCES_SEPARATOR = "<hr>Sources: \n"
MODES = ["Query Files", "Search Files", "LLM Chat (no context from files)"]
class Modes(str, Enum):
RAG_MODE = "RAG"
SEARCH_MODE = "Search"
BASIC_CHAT_MODE = "Basic"
SUMMARIZE_MODE = "Summarize"
MODES: list[Modes] = [
Modes.RAG_MODE,
Modes.SEARCH_MODE,
Modes.BASIC_CHAT_MODE,
Modes.SUMMARIZE_MODE,
]
class Source(BaseModel):
@ -71,10 +87,12 @@ class PrivateGptUi:
ingest_service: IngestService,
chat_service: ChatService,
chunks_service: ChunksService,
summarizeService: SummarizeService,
) -> None:
self._ingest_service = ingest_service
self._chat_service = chat_service
self._chunks_service = chunks_service
self._summarize_service = summarizeService
# Cache the UI blocks
self._ui_block = None
@ -82,10 +100,15 @@ class PrivateGptUi:
self._selected_filename = None
# Initialize system prompt based on default mode
self.mode = MODES[0]
self._system_prompt = self._get_default_system_prompt(self.mode)
default_mode_map = {mode.value: mode for mode in Modes}
self._default_mode = default_mode_map.get(
settings().ui.default_mode, Modes.RAG_MODE
)
self._system_prompt = self._get_default_system_prompt(self._default_mode)
def _chat(self, message: str, history: list[list[str]], mode: str, *_: Any) -> Any:
def _chat(
self, message: str, history: list[list[str]], mode: Modes, *_: Any
) -> Any:
def yield_deltas(completion_gen: CompletionGen) -> Iterable[str]:
full_response: str = ""
stream = completion_gen.response
@ -109,25 +132,31 @@ class PrivateGptUi:
+ f"{index}. {source.file} (page {source.page}) \n\n"
)
used_files.add(f"{source.file}-{source.page}")
sources_text += "<hr>\n\n"
full_response += sources_text
yield full_response
def yield_tokens(token_gen: TokenGen) -> Iterable[str]:
full_response: str = ""
for token in token_gen:
full_response += str(token)
yield full_response
def build_history() -> list[ChatMessage]:
history_messages: list[ChatMessage] = list(
itertools.chain(
*[
[
ChatMessage(content=interaction[0], role=MessageRole.USER),
ChatMessage(
# Remove from history content the Sources information
content=interaction[1].split(SOURCES_SEPARATOR)[0],
role=MessageRole.ASSISTANT,
),
]
for interaction in history
]
history_messages: list[ChatMessage] = []
for interaction in history:
history_messages.append(
ChatMessage(content=interaction[0], role=MessageRole.USER)
)
)
if len(interaction) > 1 and interaction[1] is not None:
history_messages.append(
ChatMessage(
# Remove from history content the Sources information
content=interaction[1].split(SOURCES_SEPARATOR)[0],
role=MessageRole.ASSISTANT,
)
)
# max 20 messages to try to avoid context overflow
return history_messages[:20]
@ -144,8 +173,7 @@ class PrivateGptUi:
),
)
match mode:
case "Query Files":
case Modes.RAG_MODE:
# Use only the selected file for the query
context_filter = None
if self._selected_filename is not None:
@ -164,14 +192,14 @@ class PrivateGptUi:
context_filter=context_filter,
)
yield from yield_deltas(query_stream)
case "LLM Chat (no context from files)":
case Modes.BASIC_CHAT_MODE:
llm_stream = self._chat_service.stream_chat(
messages=all_messages,
use_context=False,
)
yield from yield_deltas(llm_stream)
case "Search Files":
case Modes.SEARCH_MODE:
response = self._chunks_service.retrieve_relevant(
text=message, limit=4, prev_next_chunks=0
)
@ -184,37 +212,76 @@ class PrivateGptUi:
f"{source.text}"
for index, source in enumerate(sources, start=1)
)
case Modes.SUMMARIZE_MODE:
# Summarize the given message, optionally using selected files
context_filter = None
if self._selected_filename:
docs_ids = []
for ingested_document in self._ingest_service.list_ingested():
if (
ingested_document.doc_metadata["file_name"]
== self._selected_filename
):
docs_ids.append(ingested_document.doc_id)
context_filter = ContextFilter(docs_ids=docs_ids)
summary_stream = self._summarize_service.stream_summarize(
use_context=True,
context_filter=context_filter,
instructions=message,
)
yield from yield_tokens(summary_stream)
# On initialization and on mode change, this function set the system prompt
# to the default prompt based on the mode (and user settings).
@staticmethod
def _get_default_system_prompt(mode: str) -> str:
def _get_default_system_prompt(mode: Modes) -> str:
p = ""
match mode:
# For query chat mode, obtain default system prompt from settings
case "Query Files":
case Modes.RAG_MODE:
p = settings().ui.default_query_system_prompt
# For chat mode, obtain default system prompt from settings
case "LLM Chat (no context from files)":
case Modes.BASIC_CHAT_MODE:
p = settings().ui.default_chat_system_prompt
# For summarization mode, obtain default system prompt from settings
case Modes.SUMMARIZE_MODE:
p = settings().ui.default_summarization_system_prompt
# For any other mode, clear the system prompt
case _:
p = ""
return p
@staticmethod
def _get_default_mode_explanation(mode: Modes) -> str:
match mode:
case Modes.RAG_MODE:
return "Get contextualized answers from selected files."
case Modes.SEARCH_MODE:
return "Find relevant chunks of text in selected files."
case Modes.BASIC_CHAT_MODE:
return "Chat with the LLM using its training data. Files are ignored."
case Modes.SUMMARIZE_MODE:
return "Generate a summary of the selected files. Prompt to customize the result."
case _:
return ""
def _set_system_prompt(self, system_prompt_input: str) -> None:
logger.info(f"Setting system prompt to: {system_prompt_input}")
self._system_prompt = system_prompt_input
def _set_current_mode(self, mode: str) -> Any:
def _set_explanatation_mode(self, explanation_mode: str) -> None:
self._explanation_mode = explanation_mode
def _set_current_mode(self, mode: Modes) -> Any:
self.mode = mode
self._set_system_prompt(self._get_default_system_prompt(mode))
# Update placeholder and allow interaction if default system prompt is set
if self._system_prompt:
return gr.update(placeholder=self._system_prompt, interactive=True)
# Update placeholder and disable interaction if no default system prompt is set
else:
return gr.update(placeholder=self._system_prompt, interactive=False)
self._set_explanatation_mode(self._get_default_mode_explanation(mode))
interactive = self._system_prompt is not None
return [
gr.update(placeholder=self._system_prompt, interactive=interactive),
gr.update(value=self._explanation_mode),
]
def _list_ingested_files(self) -> list[list[str]]:
files = set()
@ -314,17 +381,30 @@ class PrivateGptUi:
".contain { display: flex !important; flex-direction: column !important; }"
"#component-0, #component-3, #component-10, #component-8 { height: 100% !important; }"
"#chatbot { flex-grow: 1 !important; overflow: auto !important;}"
"#col { height: calc(100vh - 112px - 16px) !important; }",
"#col { height: calc(100vh - 112px - 16px) !important; }"
"hr { margin-top: 1em; margin-bottom: 1em; border: 0; border-top: 1px solid #FFF; }"
".avatar-image { background-color: antiquewhite; border-radius: 2px; }"
".footer { text-align: center; margin-top: 20px; font-size: 14px; display: flex; align-items: center; justify-content: center; }"
".footer-zylon-link { display:flex; margin-left: 5px; text-decoration: auto; color: var(--body-text-color); }"
".footer-zylon-link:hover { color: #C7BAFF; }"
".footer-zylon-ico { height: 20px; margin-left: 5px; background-color: antiquewhite; border-radius: 2px; }",
) as blocks:
with gr.Row():
gr.HTML(f"<div class='logo'/><img src={logo_svg} alt=PrivateGPT></div")
with gr.Row(equal_height=False):
with gr.Column(scale=3):
default_mode = self._default_mode
mode = gr.Radio(
MODES,
[mode.value for mode in MODES],
label="Mode",
value="Query Files",
value=default_mode,
)
explanation_mode = gr.Textbox(
placeholder=self._get_default_mode_explanation(default_mode),
show_label=False,
max_lines=3,
interactive=False,
)
upload_button = gr.components.UploadButton(
"Upload File(s)",
@ -408,9 +488,11 @@ class PrivateGptUi:
interactive=True,
render=False,
)
# When mode changes, set default system prompt
# When mode changes, set default system prompt and other related state
mode.change(
self._set_current_mode, inputs=mode, outputs=system_prompt_input
self._set_current_mode,
inputs=mode,
outputs=[system_prompt_input, explanation_mode],
)
# On blur, set system prompt to use in queries
system_prompt_input.blur(
@ -441,6 +523,7 @@ class PrivateGptUi:
"llamacpp": config_settings.llamacpp.llm_hf_model_file,
"openai": config_settings.openai.model,
"openailike": config_settings.openai.model,
"azopenai": config_settings.azopenai.llm_model,
"sagemaker": config_settings.sagemaker.llm_endpoint_name,
"mock": llm_mode,
"ollama": config_settings.ollama.llm_model,
@ -477,6 +560,14 @@ class PrivateGptUi:
),
additional_inputs=[mode, upload_button, system_prompt_input],
)
with gr.Row():
avatar_byte = AVATAR_BOT.read_bytes()
f_base64 = f"data:image/png;base64,{base64.b64encode(avatar_byte).decode('utf-8')}"
gr.HTML(
f"<div class='footer'><a class='footer-zylon-link' href='https://zylon.ai/'>Maintained by Zylon <img class='footer-zylon-ico' src='{f_base64}' alt=Zylon></a></div>"
)
return blocks
def get_ui_blocks(self) -> gr.Blocks:
@ -488,7 +579,7 @@ class PrivateGptUi:
blocks = self.get_ui_blocks()
blocks.queue()
logger.info("Mounting the gradio UI, at path=%s", path)
gr.mount_gradio_app(app, blocks, path=path)
gr.mount_gradio_app(app, blocks, path=path, favicon_path=AVATAR_BOT)
if __name__ == "__main__":


@ -0,0 +1,95 @@
import logging
from collections import deque
from collections.abc import Iterator, Mapping
from typing import Any
from httpx import ConnectError
from tqdm import tqdm # type: ignore
from private_gpt.utils.retry import retry
try:
from ollama import Client, ResponseError # type: ignore
except ImportError as e:
raise ImportError(
"Ollama dependencies not found, install with `poetry install --extras llms-ollama or embeddings-ollama`"
) from e
logger = logging.getLogger(__name__)
_MAX_RETRIES = 5
_JITTER = (3.0, 10.0)
@retry(
is_async=False,
exceptions=(ConnectError, ResponseError),
tries=_MAX_RETRIES,
jitter=_JITTER,
logger=logger,
)
def check_connection(client: Client) -> bool:
try:
client.list()
return True
except (ConnectError, ResponseError) as e:
raise e
except Exception as e:
logger.error(f"Failed to connect to Ollama: {type(e).__name__}: {e!s}")
return False
def process_streaming(generator: Iterator[Mapping[str, Any]]) -> None:
progress_bars = {}
queue = deque() # type: ignore
def create_progress_bar(dgt: str, total: int) -> Any:
return tqdm(
total=total, desc=f"Pulling model {dgt[7:17]}...", unit="B", unit_scale=True
)
current_digest = None
for chunk in generator:
digest = chunk.get("digest")
completed_size = chunk.get("completed", 0)
total_size = chunk.get("total")
if digest and total_size is not None:
if digest not in progress_bars and completed_size > 0:
progress_bars[digest] = create_progress_bar(digest, total=total_size)
if current_digest is None:
current_digest = digest
else:
queue.append(digest)
if digest in progress_bars:
progress_bar = progress_bars[digest]
progress = completed_size - progress_bar.n
if completed_size > 0 and total_size >= progress != progress_bar.n:
if digest == current_digest:
progress_bar.update(progress)
if progress_bar.n >= total_size:
progress_bar.close()
current_digest = queue.popleft() if queue else None
else:
# Store progress for later update
progress_bars[digest].total = total_size
progress_bars[digest].n = completed_size
# Close any remaining progress bars at the end
for progress_bar in progress_bars.values():
progress_bar.close()
def pull_model(client: Client, model_name: str, raise_error: bool = True) -> None:
try:
installed_models = [model["name"] for model in client.list().get("models", {})]
if model_name not in installed_models:
logger.info(f"Pulling model {model_name}. Please wait...")
process_streaming(client.pull(model_name, stream=True))
logger.info(f"Model {model_name} pulled successfully")
except Exception as e:
logger.error(f"Failed to pull model {model_name}: {e!s}")
if raise_error:
raise e


@ -0,0 +1,31 @@
import logging
from collections.abc import Callable
from typing import Any
from retry_async import retry as retry_untyped # type: ignore
retry_logger = logging.getLogger(__name__)
def retry(
exceptions: Any = Exception,
*,
is_async: bool = False,
tries: int = -1,
delay: float = 0,
max_delay: float | None = None,
backoff: float = 1,
jitter: float | tuple[float, float] = 0,
logger: logging.Logger = retry_logger,
) -> Callable[..., Any]:
wrapped = retry_untyped(
exceptions=exceptions,
is_async=is_async,
tries=tries,
delay=delay,
max_delay=max_delay,
backoff=backoff,
jitter=jitter,
logger=logger,
)
return wrapped # type: ignore


@ -1,80 +1,81 @@
[tool.poetry]
name = "private-gpt"
version = "0.5.0"
version = "0.6.2"
description = "Private GPT"
authors = ["Zylon <hi@zylon.ai>"]
[tool.poetry.dependencies]
python = ">=3.11,<3.12"
# PrivateGPT
fastapi = { extras = ["all"], version = "^0.111.0" }
python-multipart = "^0.0.9"
injector = "^0.21.0"
pyyaml = "^6.0.1"
fastapi = { extras = ["all"], version = "^0.115.0" }
python-multipart = "^0.0.10"
injector = "^0.22.0"
pyyaml = "^6.0.2"
watchdog = "^4.0.1"
transformers = "^4.42.3"
transformers = "^4.44.2"
docx2txt = "^0.8"
cryptography = "^3.1"
# LlamaIndex core libs
llama-index-core = "^0.10.52"
llama-index-readers-file = "^0.1.27"
llama-index-core = ">=0.11.2,<0.12.0"
llama-index-readers-file = "*"
# Optional LlamaIndex integration libs
llama-index-llms-llama-cpp = {version = "^0.1.4", optional = true}
llama-index-llms-openai = {version = "^0.1.25", optional = true}
llama-index-llms-openai-like = {version ="^0.1.3", optional = true}
llama-index-llms-ollama = {version ="^0.1.5", optional = true}
llama-index-llms-azure-openai = {version ="^0.1.8", optional = true}
llama-index-llms-gemini = {version ="^0.1.11", optional = true}
llama-index-embeddings-ollama = {version ="^0.1.2", optional = true}
llama-index-embeddings-huggingface = {version ="^0.2.2", optional = true}
llama-index-embeddings-openai = {version ="^0.1.10", optional = true}
llama-index-embeddings-azure-openai = {version ="^0.1.10", optional = true}
llama-index-embeddings-gemini = {version ="^0.1.8", optional = true}
llama-index-vector-stores-qdrant = {version ="^0.2.10", optional = true}
llama-index-vector-stores-milvus = {version ="^0.1.20", optional = true}
llama-index-vector-stores-chroma = {version ="^0.1.10", optional = true}
llama-index-vector-stores-postgres = {version ="^0.1.11", optional = true}
llama-index-vector-stores-clickhouse = {version ="^0.1.3", optional = true}
llama-index-storage-docstore-postgres = {version ="^0.1.3", optional = true}
llama-index-storage-index-store-postgres = {version ="^0.1.4", optional = true}
llama-index-llms-llama-cpp = {version = "*", optional = true}
llama-index-llms-openai = {version ="*", optional = true}
llama-index-llms-openai-like = {version ="*", optional = true}
llama-index-llms-ollama = {version ="*", optional = true}
llama-index-llms-azure-openai = {version ="*", optional = true}
llama-index-llms-gemini = {version ="*", optional = true}
llama-index-embeddings-ollama = {version ="*", optional = true}
llama-index-embeddings-huggingface = {version ="*", optional = true}
llama-index-embeddings-openai = {version ="*", optional = true}
llama-index-embeddings-azure-openai = {version ="*", optional = true}
llama-index-embeddings-gemini = {version ="*", optional = true}
llama-index-embeddings-mistralai = {version ="*", optional = true}
llama-index-vector-stores-qdrant = {version ="*", optional = true}
llama-index-vector-stores-milvus = {version ="*", optional = true}
llama-index-vector-stores-chroma = {version ="*", optional = true}
llama-index-vector-stores-postgres = {version ="*", optional = true}
llama-index-vector-stores-clickhouse = {version ="*", optional = true}
llama-index-storage-docstore-postgres = {version ="*", optional = true}
llama-index-storage-index-store-postgres = {version ="*", optional = true}
# Postgres
psycopg2-binary = {version ="^2.9.9", optional = true}
asyncpg = {version="^0.29.0", optional = true}
# ClickHouse
clickhouse-connect = {version = "^0.7.15", optional = true}
clickhouse-connect = {version = "^0.7.19", optional = true}
# Optional Sagemaker dependency
boto3 = {version ="^1.34.139", optional = true}
# Optional Qdrant client
qdrant-client = {version ="^1.9.0", optional = true}
boto3 = {version ="^1.35.26", optional = true}
# Optional Reranker dependencies
torch = {version ="^2.3.1", optional = true}
sentence-transformers = {version ="^3.0.1", optional = true}
torch = {version ="^2.4.1", optional = true}
sentence-transformers = {version ="^3.1.1", optional = true}
# Optional UI
gradio = {version ="^4.37.2", optional = true}
gradio = {version ="^4.44.0", optional = true}
ffmpy = {version ="^0.4.0", optional = true}
# Optional Google Gemini dependency
google-generativeai = {version ="^0.5.4", optional = true}
# Optional HF Transformers
einops = {version = "^0.8.0", optional = true}
retry-async = "^0.1.4"
[tool.poetry.extras]
ui = ["gradio"]
ui = ["gradio", "ffmpy"]
llms-llama-cpp = ["llama-index-llms-llama-cpp"]
llms-openai = ["llama-index-llms-openai"]
llms-openai-like = ["llama-index-llms-openai-like"]
llms-ollama = ["llama-index-llms-ollama"]
llms-sagemaker = ["boto3"]
llms-azopenai = ["llama-index-llms-azure-openai"]
llms-gemini = ["llama-index-llms-gemini", "google-generativeai"]
llms-gemini = ["llama-index-llms-gemini"]
embeddings-ollama = ["llama-index-embeddings-ollama"]
embeddings-huggingface = ["llama-index-embeddings-huggingface"]
embeddings-huggingface = ["llama-index-embeddings-huggingface", "einops"]
embeddings-openai = ["llama-index-embeddings-openai"]
embeddings-sagemaker = ["boto3"]
embeddings-azopenai = ["llama-index-embeddings-azure-openai"]
embeddings-gemini = ["llama-index-embeddings-gemini"]
embeddings-mistral = ["llama-index-embeddings-mistralai"]
vector-stores-qdrant = ["llama-index-vector-stores-qdrant"]
vector-stores-clickhouse = ["llama-index-vector-stores-clickhouse", "clickhouse_connect"]
vector-stores-chroma = ["llama-index-vector-stores-chroma"]
@ -84,14 +85,14 @@ storage-nodestore-postgres = ["llama-index-storage-docstore-postgres","llama-ind
rerank-sentence-transformers = ["torch", "sentence-transformers"]
[tool.poetry.group.dev.dependencies]
black = "^22"
mypy = "^1.2"
pre-commit = "^2"
pytest = "^7"
pytest-cov = "^3"
black = "^24"
mypy = "^1.11"
pre-commit = "^3"
pytest = "^8"
pytest-cov = "^5"
ruff = "^0"
pytest-asyncio = "^0.21.1"
types-pyyaml = "^6.0.12.12"
pytest-asyncio = "^0.24.0"
types-pyyaml = "^6.0.12.20240917"
[build-system]
requires = ["poetry-core>=1.0.0"]
@ -119,7 +120,7 @@ target-version = ['py311']
target-version = 'py311'
# See all rules at https://beta.ruff.rs/docs/rules/
select = [
lint.select = [
"E", # pycodestyle
"W", # pycodestyle
"F", # Pyflakes
@ -136,7 +137,7 @@ select = [
"RUF", # Ruff-specific rules
]
ignore = [
lint.ignore = [
"E501", # "Line too long"
# -> line length already regulated by black
"PT011", # "pytest.raises() should specify expected exception"
@ -154,24 +155,24 @@ ignore = [
# -> "Missing docstring in public function too restrictive"
]
[tool.ruff.pydocstyle]
[tool.ruff.lint.pydocstyle]
# Automatically disable rules that are incompatible with Google docstring convention
convention = "google"
[tool.ruff.pycodestyle]
[tool.ruff.lint.pycodestyle]
max-doc-length = 88
[tool.ruff.flake8-tidy-imports]
[tool.ruff.lint.flake8-tidy-imports]
ban-relative-imports = "all"
[tool.ruff.flake8-type-checking]
[tool.ruff.lint.flake8-type-checking]
strict = true
runtime-evaluated-base-classes = ["pydantic.BaseModel"]
# Pydantic needs to be able to evaluate types at runtime
# see https://pypi.org/project/flake8-type-checking/ for flake8-type-checking documentation
# see https://beta.ruff.rs/docs/settings/#flake8-type-checking-runtime-evaluated-base-classes for ruff documentation
[tool.ruff.per-file-ignores]
[tool.ruff.lint.per-file-ignores]
# Allow missing docstrings for tests
"tests/**/*.py" = ["D1"]


@ -7,12 +7,13 @@ from pathlib import Path
from private_gpt.di import global_injector
from private_gpt.server.ingest.ingest_service import IngestService
from private_gpt.server.ingest.ingest_watcher import IngestWatcher
from private_gpt.settings.settings import Settings
logger = logging.getLogger(__name__)
class LocalIngestWorker:
def __init__(self, ingest_service: IngestService) -> None:
def __init__(self, ingest_service: IngestService, setting: Settings) -> None:
self.ingest_service = ingest_service
self.total_documents = 0
@ -20,6 +21,24 @@ class LocalIngestWorker:
self._files_under_root_folder: list[Path] = []
self.is_local_ingestion_enabled = setting.data.local_ingestion.enabled
self.allowed_local_folders = setting.data.local_ingestion.allow_ingest_from
def _validate_folder(self, folder_path: Path) -> None:
if not self.is_local_ingestion_enabled:
raise ValueError(
"Local ingestion is disabled."
"You can enable it in settings `ingestion.enabled`"
)
# Allow all folders if wildcard is present
if "*" in self.allowed_local_folders:
return
for allowed_folder in self.allowed_local_folders:
if not folder_path.is_relative_to(allowed_folder):
raise ValueError(f"Folder {folder_path} is not allowed for ingestion")
def _find_all_files_in_folder(self, root_path: Path, ignored: list[str]) -> None:
"""Search all files under the root folder recursively.
@ -28,6 +47,7 @@ class LocalIngestWorker:
for file_path in root_path.iterdir():
if file_path.is_file() and file_path.name not in ignored:
self.total_documents += 1
self._validate_folder(file_path)
self._files_under_root_folder.append(file_path)
elif file_path.is_dir() and file_path.name not in ignored:
self._find_all_files_in_folder(file_path, ignored)
@ -92,13 +112,13 @@ if args.log_file:
logger.addHandler(file_handler)
if __name__ == "__main__":
root_path = Path(args.folder)
if not root_path.exists():
raise ValueError(f"Path {args.folder} does not exist")
ingest_service = global_injector.get(IngestService)
worker = LocalIngestWorker(ingest_service)
settings = global_injector.get(Settings)
worker = LocalIngestWorker(ingest_service, settings)
worker.ingest_folder(root_path, args.ignored)
if args.ignored:


@ -6,21 +6,21 @@ llm:
mode: ${PGPT_MODE:mock}
embedding:
mode: ${PGPT_MODE:sagemaker}
mode: ${PGPT_EMBED_MODE:mock}
llamacpp:
llm_hf_repo_id: ${PGPT_HF_REPO_ID:TheBloke/Mistral-7B-Instruct-v0.1-GGUF}
llm_hf_model_file: ${PGPT_HF_MODEL_FILE:mistral-7b-instruct-v0.1.Q4_K_M.gguf}
llm_hf_repo_id: ${PGPT_HF_REPO_ID:lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF}
llm_hf_model_file: ${PGPT_HF_MODEL_FILE:Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf}
huggingface:
embedding_hf_model_name: ${PGPT_EMBEDDING_HF_MODEL_NAME:BAAI/bge-small-en-v1.5}
embedding_hf_model_name: ${PGPT_EMBEDDING_HF_MODEL_NAME:nomic-ai/nomic-embed-text-v1.5}
sagemaker:
llm_endpoint_name: ${PGPT_SAGEMAKER_LLM_ENDPOINT_NAME:}
embedding_endpoint_name: ${PGPT_SAGEMAKER_EMBEDDING_ENDPOINT_NAME:}
ollama:
llm_model: ${PGPT_OLLAMA_LLM_MODEL:mistral}
llm_model: ${PGPT_OLLAMA_LLM_MODEL:llama3.1}
embedding_model: ${PGPT_OLLAMA_EMBEDDING_MODEL:nomic-embed-text}
api_base: ${PGPT_OLLAMA_API_BASE:http://ollama:11434}
embedding_api_base: ${PGPT_OLLAMA_EMBEDDING_API_BASE:http://ollama:11434}
@ -30,6 +30,7 @@ ollama:
repeat_last_n: ${PGPT_OLLAMA_REPEAT_LAST_N:64}
repeat_penalty: ${PGPT_OLLAMA_REPEAT_PENALTY:1.2}
request_timeout: ${PGPT_OLLAMA_REQUEST_TIMEOUT:600.0}
autopull_models: ${PGPT_OLLAMA_AUTOPULL_MODELS:true}
ui:
enabled: true
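The ${PGPT_MODE:mock} style placeholders read an environment variable and fall back to the value after the colon. A hedged sketch of that substitution (the project's actual YAML loader may implement it differently):

import os
import re

_PLACEHOLDER = re.compile(r"\$\{(\w+):([^}]*)\}")


def expand(raw: str) -> str:
    # replace ${VAR:default} with the environment value, or the default if unset
    return _PLACEHOLDER.sub(lambda m: os.environ.get(m.group(1), m.group(2)), raw)


print(expand("mode: ${PGPT_MODE:mock}"))
print(expand("api_base: ${PGPT_OLLAMA_API_BASE:http://ollama:11434}"))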

View File

@ -7,18 +7,18 @@ llm:
# Should be matching the selected model
max_new_tokens: 512
context_window: 3900
tokenizer: mistralai/Mistral-7B-Instruct-v0.2
prompt_style: "mistral"
tokenizer: meta-llama/Meta-Llama-3.1-8B-Instruct
prompt_style: "llama3"
llamacpp:
llm_hf_repo_id: TheBloke/Mistral-7B-Instruct-v0.2-GGUF
llm_hf_model_file: mistral-7b-instruct-v0.2.Q4_K_M.gguf
llm_hf_repo_id: lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF
llm_hf_model_file: Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
embedding:
mode: huggingface
huggingface:
embedding_hf_model_name: BAAI/bge-small-en-v1.5
embedding_hf_model_name: nomic-ai/nomic-embed-text-v1.5
vectorstore:
database: qdrant
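The llm_hf_repo_id / llm_hf_model_file pair identifies a single GGUF file on the Hugging Face Hub. A hedged sketch of resolving that pair to a local file; the project's own setup script may handle the download differently:

from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF",
    filename="Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf",
)
print(model_path)  # local cache path of the downloaded GGUF file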

View File

@ -14,7 +14,7 @@ embedding:
embed_dim: 768
ollama:
llm_model: mistral
llm_model: llama3.1
embedding_model: nomic-embed-text
api_base: http://localhost:11434
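embed_dim: 768 has to match the vector size the embedding model actually returns. A hedged check against a locally running Ollama, assuming the /api/embeddings endpoint and that nomic-embed-text has already been pulled:

import requests

resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "nomic-embed-text", "prompt": "hello world"},
    timeout=60,
)
resp.raise_for_status()
print(len(resp.json()["embedding"]))  # expected: 768 for nomic-embed-text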

View File

@ -11,7 +11,7 @@ embedding:
mode: ollama
ollama:
llm_model: mistral
llm_model: llama3.1
embedding_model: nomic-embed-text
api_base: http://localhost:11434
embedding_api_base: http://localhost:11434 # change if your embedding model runs on another ollama

View File

@ -4,7 +4,7 @@ server:
llm:
mode: openailike
max_new_tokens: 512
tokenizer: mistralai/Mistral-7B-Instruct-v0.2
tokenizer: meta-llama/Meta-Llama-3.1-8B-Instruct
temperature: 0.1
embedding:
@ -12,7 +12,7 @@ embedding:
ingest_mode: simple
huggingface:
embedding_hf_model_name: BAAI/bge-small-en-v1.5
embedding_hf_model_name: nomic-ai/nomic-embed-text-v1.5
openai:
api_base: http://localhost:8000/v1
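mode: openailike points the LLM at any OpenAI-compatible server (vLLM in this profile). A hedged sketch of talking to the same api_base directly; the served model name below is an assumption:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
completion = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # assumed served model name
    messages=[{"role": "user", "content": "Say hello"}],
    max_tokens=32,
)
print(completion.choices[0].message.content)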

View File

@ -17,31 +17,43 @@ server:
secret: "Basic c2VjcmV0OmtleQ=="
data:
local_ingestion:
enabled: ${LOCAL_INGESTION_ENABLED:false}
allow_ingest_from: ["*"]
local_data_folder: local_data/private_gpt
ui:
enabled: true
path: /
# "RAG", "Search", "Basic", or "Summarize"
default_mode: "RAG"
default_chat_system_prompt: >
You are a helpful, respectful and honest assistant.
You are a helpful, respectful and honest assistant.
Always answer as helpfully as possible and follow ALL given instructions.
Do not speculate or make up information.
Do not reference any given instructions or context.
default_query_system_prompt: >
You can only answer questions about the provided context.
If you know the answer but it is not based in the provided context, don't provide
You can only answer questions about the provided context.
If you know the answer but it is not based in the provided context, don't provide
the answer, just state the answer is not in the context provided.
default_summarization_system_prompt: >
Provide a comprehensive summary of the provided context information.
The summary should cover all the key points and main ideas presented in
the original text, while also condensing the information into a concise
and easy-to-understand format. Please ensure that the summary includes
relevant details and examples that support the main ideas, while avoiding
any unnecessary information or repetition.
delete_file_button_enabled: true
delete_all_files_button_enabled: true
llm:
mode: llamacpp
prompt_style: "mistral"
prompt_style: "llama3"
# Should be matching the selected model
max_new_tokens: 512
context_window: 3900
# Select your tokenizer. Llama-index tokenizer is the default.
# tokenizer: mistralai/Mistral-7B-Instruct-v0.2
# tokenizer: meta-llama/Meta-Llama-3.1-8B-Instruct
temperature: 0.1 # The temperature of the model. Increasing the temperature will make the model answer more creatively. A value of 0.1 would be more factual. (Default: 0.1)
rag:
@ -54,6 +66,9 @@ rag:
model: cross-encoder/ms-marco-MiniLM-L-2-v2
top_n: 1
summarize:
use_async: true
clickhouse:
host: localhost
port: 8443
@ -62,8 +77,8 @@ clickhouse:
database: embeddings
llamacpp:
llm_hf_repo_id: TheBloke/Mistral-7B-Instruct-v0.2-GGUF
llm_hf_model_file: mistral-7b-instruct-v0.2.Q4_K_M.gguf
llm_hf_repo_id: lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF
llm_hf_model_file: Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
tfs_z: 1.0 # Tail free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting
top_k: 40 # Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. (Default: 40)
top_p: 1.0 # Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9)
@ -73,11 +88,14 @@ embedding:
# Should be matching the value above in most cases
mode: huggingface
ingest_mode: simple
embed_dim: 384 # 384 is for BAAI/bge-small-en-v1.5
embed_dim: 768 # 768 is for nomic-ai/nomic-embed-text-v1.5
huggingface:
embedding_hf_model_name: BAAI/bge-small-en-v1.5
embedding_hf_model_name: nomic-ai/nomic-embed-text-v1.5
access_token: ${HF_TOKEN:}
# Warning: Enabling this option will allow the model to download and execute code from the internet.
# Nomic AI requires this option to be enabled to use the model, be aware if you are using a different model.
trust_remote_code: true
vectorstore:
database: qdrant
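trust_remote_code: true is required because the nomic embedding model ships custom modeling code. A hedged way to confirm the model loads and that its output size matches embed_dim: 768, using sentence-transformers directly rather than the project's embedding component:

from sentence_transformers import SentenceTransformer

# nomic-ai/nomic-embed-text-v1.5 only loads when remote code execution is allowed
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)
print(model.get_sentence_embedding_dimension())  # expected: 768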
@ -111,12 +129,13 @@ openai:
embedding_api_key: ${OPENAI_API_KEY:}
ollama:
llm_model: llama2
llm_model: llama3.1
embedding_model: nomic-embed-text
api_base: http://localhost:11434
embedding_api_base: http://localhost:11434 # change if your embedding model runs on another ollama
keep_alive: 5m
request_timeout: 120.0
autopull_models: true
azopenai:
api_key: ${AZ_OPENAI_API_KEY:}
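autopull_models: true makes the configured Ollama models get fetched when they are missing. A hedged sketch of what that pull amounts to at the Ollama HTTP API level; the project may use the ollama client library instead:

import requests

# /api/pull streams JSON status lines while the model downloads
with requests.post(
    "http://localhost:11434/api/pull",
    json={"name": "llama3.1"},
    stream=True,
    timeout=600,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:
            print(line.decode())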

View File

@ -5,7 +5,7 @@ from private_gpt.launcher import create_app
from tests.fixtures.mock_injector import MockInjector
@pytest.fixture()
@pytest.fixture
def test_client(request: pytest.FixtureRequest, injector: MockInjector) -> TestClient:
if request is not None and hasattr(request, "param"):
injector.bind_settings(request.param or {})

View File

@ -19,6 +19,6 @@ class IngestHelper:
return ingest_result
@pytest.fixture()
@pytest.fixture
def ingest_helper(test_client: TestClient) -> IngestHelper:
return IngestHelper(test_client)

View File

@ -37,6 +37,6 @@ class MockInjector:
return self.test_injector.get(interface)
@pytest.fixture()
@pytest.fixture
def injector() -> MockInjector:
return MockInjector()
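@pytest.fixture and @pytest.fixture() are equivalent when no arguments are passed; the diffs above simply drop the redundant call parentheses. For illustration:

import pytest


@pytest.fixture  # same as @pytest.fixture() with no arguments
def greeting() -> str:
    return "hello"


def test_greeting(greeting: str) -> None:
    assert greeting == "hello"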

View File

@ -0,0 +1,74 @@
import os
import subprocess
from pathlib import Path
import pytest
from fastapi.testclient import TestClient
@pytest.fixture
def file_path() -> str:
return "test.txt"
def create_test_file(file_path: str) -> None:
with open(file_path, "w") as f:
f.write("test")
def clear_log_file(log_file_path: str) -> None:
if Path(log_file_path).exists():
os.remove(log_file_path)
def read_log_file(log_file_path: str) -> str:
with open(log_file_path) as f:
return f.read()
def init_structure(folder: str, file_path: str) -> None:
clear_log_file(file_path)
os.makedirs(folder, exist_ok=True)
create_test_file(f"{folder}/${file_path}")
def test_ingest_one_file_in_allowed_folder(
file_path: str, test_client: TestClient
) -> None:
allowed_folder = "local_data/tests/allowed_folder"
init_structure(allowed_folder, file_path)
test_env = os.environ.copy()
test_env["PGPT_PROFILES"] = "test"
test_env["LOCAL_INGESTION_ENABLED"] = "True"
result = subprocess.run(
["python", "scripts/ingest_folder.py", allowed_folder],
capture_output=True,
text=True,
env=test_env,
)
assert result.returncode == 0, f"Script failed with error: {result.stderr}"
response_after = test_client.get("/v1/ingest/list")
count_ingest_after = len(response_after.json()["data"])
assert count_ingest_after > 0, "No documents were ingested"
def test_ingest_disabled(file_path: str) -> None:
allowed_folder = "local_data/tests/allowed_folder"
init_structure(allowed_folder, file_path)
test_env = os.environ.copy()
test_env["PGPT_PROFILES"] = "test"
test_env["LOCAL_INGESTION_ENABLED"] = "False"
result = subprocess.run(
["python", "scripts/ingest_folder.py", allowed_folder],
capture_output=True,
text=True,
env=test_env,
)
assert result.returncode != 0, f"Script failed with error: {result.stderr}"

View File

@ -0,0 +1,159 @@
from fastapi.testclient import TestClient
from private_gpt.server.recipes.summarize.summarize_router import (
SummarizeBody,
SummarizeResponse,
)
def test_summarize_route_produces_a_stream(test_client: TestClient) -> None:
body = SummarizeBody(
text="Test",
stream=True,
)
response = test_client.post("/v1/summarize", json=body.model_dump())
raw_events = response.text.split("\n\n")
events = [
item.removeprefix("data: ") for item in raw_events if item.startswith("data: ")
]
assert response.status_code == 200
assert "text/event-stream" in response.headers["content-type"]
assert len(events) > 0
assert events[-1] == "[DONE]"
def test_summarize_route_produces_a_single_value(test_client: TestClient) -> None:
body = SummarizeBody(
text="test",
stream=False,
)
response = test_client.post("/v1/summarize", json=body.model_dump())
# No asserts, if it validates it's good
SummarizeResponse.model_validate(response.json())
assert response.status_code == 200
def test_summarize_with_document_context(test_client: TestClient) -> None:
# Ingest a document
ingest_response = test_client.post(
"/v1/ingest/text",
json={
"file_name": "file_name",
"text": "Lorem ipsum dolor sit amet",
},
)
assert ingest_response.status_code == 200
ingested_docs = ingest_response.json()["data"]
assert len(ingested_docs) == 1
body = SummarizeBody(
use_context=True,
context_filter={"docs_ids": [doc["doc_id"] for doc in ingested_docs]},
stream=False,
)
response = test_client.post("/v1/summarize", json=body.model_dump())
completion: SummarizeResponse = SummarizeResponse.model_validate(response.json())
assert response.status_code == 200
# We can check the content of the completion, because mock LLM used in tests
# always echoes the prompt. In the case of summary, the input context is passed.
assert completion.summary.find("Lorem ipsum dolor sit amet") != -1
def test_summarize_with_non_existent_document_context_not_fails(
test_client: TestClient,
) -> None:
body = SummarizeBody(
use_context=True,
context_filter={
"docs_ids": ["non-existent-doc-id"],
},
stream=False,
)
response = test_client.post("/v1/summarize", json=body.model_dump())
completion: SummarizeResponse = SummarizeResponse.model_validate(response.json())
assert response.status_code == 200
# We can check the content of the completion, because mock LLM used in tests
# always echoes the prompt. In the case of summary, the input context is passed.
assert completion.summary.find("Empty Response") != -1
def test_summarize_with_metadata_and_document_context(test_client: TestClient) -> None:
docs = []
# Ingest a first document
document_1_content = "Content of document 1"
ingest_response = test_client.post(
"/v1/ingest/text",
json={
"file_name": "file_name_1",
"text": document_1_content,
},
)
assert ingest_response.status_code == 200
ingested_docs = ingest_response.json()["data"]
assert len(ingested_docs) == 1
docs += ingested_docs
# Ingest a second document
document_2_content = "Text of document 2"
ingest_response = test_client.post(
"/v1/ingest/text",
json={
"file_name": "file_name_2",
"text": document_2_content,
},
)
assert ingest_response.status_code == 200
ingested_docs = ingest_response.json()["data"]
assert len(ingested_docs) == 1
docs += ingested_docs
# Summarize using the ids of both ingested documents
body = SummarizeBody(
use_context=True,
context_filter={"docs_ids": [doc["doc_id"] for doc in docs]},
stream=False,
)
response = test_client.post("/v1/summarize", json=body.model_dump())
completion: SummarizeResponse = SummarizeResponse.model_validate(response.json())
assert response.status_code == 200
# Assert both documents are part of the used sources
# We can check the content of the completion, because mock LLM used in tests
# always echoes the prompt. In the case of summary, the input context is passed.
assert completion.summary.find(document_1_content) != -1
assert completion.summary.find(document_2_content) != -1
def test_summarize_with_prompt(test_client: TestClient) -> None:
ingest_response = test_client.post(
"/v1/ingest/text",
json={
"file_name": "file_name",
"text": "Lorem ipsum dolor sit amet",
},
)
assert ingest_response.status_code == 200
ingested_docs = ingest_response.json()["data"]
assert len(ingested_docs) == 1
body = SummarizeBody(
use_context=True,
context_filter={
"docs_ids": [doc["doc_id"] for doc in ingested_docs],
},
prompt="This is a custom summary prompt, 54321",
stream=False,
)
response = test_client.post("/v1/summarize", json=body.model_dump())
completion: SummarizeResponse = SummarizeResponse.model_validate(response.json())
assert response.status_code == 200
# We can check the content of the completion, because mock LLM used in tests
# always echoes the prompt. In the case of summary, the input context is passed.
assert completion.summary.find("This is a custom summary prompt, 54321") != -1
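Outside the test client, the same summarize recipe can be exercised over HTTP. A hedged example that assumes the API is listening on localhost:8001 and that at least one document has been ingested:

import requests

body = {
    "use_context": True,
    "prompt": "Summarize the context in one sentence.",
    "stream": False,
}
resp = requests.post("http://localhost:8001/v1/summarize", json=body, timeout=600)
resp.raise_for_status()
print(resp.json()["summary"])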

View File

@ -5,6 +5,7 @@ from private_gpt.components.llm.prompt_helper import (
ChatMLPromptStyle,
DefaultPromptStyle,
Llama2PromptStyle,
Llama3PromptStyle,
MistralPromptStyle,
TagPromptStyle,
get_prompt_style,
@ -139,3 +140,57 @@ def test_llama2_prompt_style_with_system_prompt():
)
assert prompt_style.messages_to_prompt(messages) == expected_prompt
def test_llama3_prompt_style_format():
prompt_style = Llama3PromptStyle()
messages = [
ChatMessage(content="You are a helpful assistant", role=MessageRole.SYSTEM),
ChatMessage(content="Hello, how are you doing?", role=MessageRole.USER),
]
expected_prompt = (
"<|start_header_id|>system<|end_header_id|>\n\n"
"You are a helpful assistant<|eot_id|>"
"<|start_header_id|>user<|end_header_id|>\n\n"
"Hello, how are you doing?<|eot_id|>"
"<|start_header_id|>assistant<|end_header_id|>\n\n"
)
assert prompt_style.messages_to_prompt(messages) == expected_prompt
def test_llama3_prompt_style_with_default_system():
prompt_style = Llama3PromptStyle()
messages = [
ChatMessage(content="Hello!", role=MessageRole.USER),
]
expected = (
"<|start_header_id|>system<|end_header_id|>\n\n"
f"{prompt_style.DEFAULT_SYSTEM_PROMPT}<|eot_id|>"
"<|start_header_id|>user<|end_header_id|>\n\nHello!<|eot_id|>"
"<|start_header_id|>assistant<|end_header_id|>\n\n"
)
assert prompt_style._messages_to_prompt(messages) == expected
def test_llama3_prompt_style_with_assistant_response():
prompt_style = Llama3PromptStyle()
messages = [
ChatMessage(content="You are a helpful assistant", role=MessageRole.SYSTEM),
ChatMessage(content="What is the capital of France?", role=MessageRole.USER),
ChatMessage(
content="The capital of France is Paris.", role=MessageRole.ASSISTANT
),
]
expected_prompt = (
"<|start_header_id|>system<|end_header_id|>\n\n"
"You are a helpful assistant<|eot_id|>"
"<|start_header_id|>user<|end_header_id|>\n\n"
"What is the capital of France?<|eot_id|>"
"<|start_header_id|>assistant<|end_header_id|>\n\n"
"The capital of France is Paris.<|eot_id|>"
)
assert prompt_style.messages_to_prompt(messages) == expected_prompt
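The prompt_style: "llama3" setting maps to the Llama3PromptStyle exercised above. A hedged usage sketch, assuming get_prompt_style accepts the same string used in the settings files:

from llama_index.core.llms import ChatMessage, MessageRole

from private_gpt.components.llm.prompt_helper import get_prompt_style

prompt_style = get_prompt_style("llama3")
prompt = prompt_style.messages_to_prompt(
    [
        ChatMessage(content="You are a helpful assistant", role=MessageRole.SYSTEM),
        ChatMessage(content="Hello, how are you doing?", role=MessageRole.USER),
    ]
)
print(prompt)  # header-tagged Llama 3 prompt ending with an open assistant turn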

View File

@ -1 +1 @@
0.5.0
0.6.2