Next version of PrivateGPT (#1077)

* Dockerize private-gpt

* Use port 8001 for local development

* Add setup script

* Add CUDA Dockerfile

* Create README.md

* Make the API use OpenAI response format

* Truncate prompt

* refactor: add models and __pycache__ to .gitignore

* Better naming

* Update readme

* Move models ignore to it's folder

* Add scaffolding

* Apply formatting

* Fix tests

* Working sagemaker custom llm

* Fix linting

* Fix linting

* Enable streaming

* Allow all 3.11 python versions

* Use llama 2 prompt format and fix completion

* Restructure (#3)

Co-authored-by: Pablo Orgaz <pablo@Pablos-MacBook-Pro.local>

* Fix Dockerfile

* Use a specific build stage

* Cleanup

* Add FastAPI skeleton

* Cleanup openai package

* Fix DI and tests

* Split tests and tests with coverage

* Remove old scaffolding

* Add settings logic (#4)

* Add settings logic

* Add settings for sagemaker

---------

Co-authored-by: Pablo Orgaz <pablo@Pablos-MacBook-Pro.local>

* Local LLM (#5)

* Add settings logic

* Add settings for sagemaker

* Add settings-local-example.yaml

* Delete terraform files

* Refactor tests to use fixtures

* Join deltas

* Add local model support

---------

Co-authored-by: Pablo Orgaz <pablo@Pablos-MacBook-Pro.local>

* Update README.md

* Fix tests

* Version bump

* Enable simple llamaindex observability (#6)

* Enable simple llamaindex observability

* Improve code through linting

* Update README.md

* Move to async (#7)

* Migrate implementation to use asyncio

* Formatting

* Cleanup

* Linting

---------

Co-authored-by: Pablo Orgaz <pablo@Pablos-MacBook-Pro.local>

* Query Docs and gradio UI

* Remove unnecessary files

* Git ignore chromadb folder

* Async migration + DI Cleanup

* Fix tests

* Add integration test

* Use fastapi responses

* Retrieval service with partial implementation

* Cleanup

* Run formatter

* Fix types

* Fetch nodes asynchronously

* Install local dependencies in tests

* Install ui dependencies in tests

* Install dependencies for llama-cpp

* Fix sudo

* Attempt to fix cuda issues

* Attempt to fix cuda issues

* Try to reclaim some space from ubuntu machine

* Retrieval with context

* Fix lint and imports

* Fix mypy

* Make retrieval API a POST

* Make Completions body a dataclass

* Fix LLM chat message order

* Add Query Chunks to Gradio UI

* Improve rag query prompt

* Rollback CI Changes

* Move to sync code

* Using Llamaindex abstraction for query retrieval

* Fix types

* Default to CONDENSED chat mode for contextualized chat

* Rename route function

* Add Chat endpoint

* Remove webhooks

* Add IntelliJ run config to gitignore

* .gitignore applied

* Sync chat completion

* Refactor total

* Typo in context_files.py

* Add embeddings component and service

* Remove wrong dataclass from IngestService

* Filter by context file id implementation

* Fix typing

* Implement context_filter and separate from the bool use_context in the API

* Change chunks api to avoid conceptual class of the context concept

* Deprecate completions and fix tests

* Remove remaining dataclasses

* Use embedding component in ingest service

* Fix ingestion to have multipart and local upload

* Fix ingestion API

* Add chunk tests

* Add configurable paths

* Cleaning up

* Add more docs

* IngestResponse includes a list of IngestedDocs

* Use IngestedDoc in the Chunk document reference

* Rename ingest routes to ingest_router.py

* Fix test working directory for intellij

* Set testpaths for pytest

* Remove unused as_chat_engine

* Add .fleet ide to gitignore

* Make LLM and Embedding model configurable

* Fix imports and checks

* Let local_data folder exist empty in the repository

* Don't use certain metadata in LLM

* Remove long lines

* Fix windows installation

* Typos

* Update poetry.lock

* Add TODO for linux

* Script and first version of docs

* No jekill build

* Fix relative url to openapi json

* Change default docs values

* Move chromadb dependency to the general group

* Fix tests to use separate local_data

* Create CNAME

* Update CNAME

* Fix openapi.json relative path

* PrivateGPT logo

* WIP OpenAPI documentation metadata

* Add ingest script (#11)

* Add ingest script

* Fix broken name refactor

* Add ingest docs and Makefile script

* Linting

* Move transformers to main dependency

* Move torch to main dependencies

* Don't load HuggingFaceEmbedding in tests

* Fix lint

---------

Co-authored-by: Pablo Orgaz <pablo@Pablos-MacBook-Pro.local>

* Rename file to camel_case

* Commit settings-local.yaml

* Move documentation to public docs

* Fix docker image for linux

* Installation and Running the Server documentation

* Move back to docs folder, as it is the only supported by github pages

* Delete CNAME

* Create CNAME

* Delete CNAME

* Create CNAME

* Improved API documentation

* Fix lint

* Completions documentation

* Updated openapi scheme

* Ingestion API doc

* Minor doc changes

* Updated openapi scheme

* Chunks API documentation

* Embeddings and Health API, and homogeneous responses

* Revamp README with new skeleton of content

* More docs

* PrivateGPT logo

* Improve UI

* Update ingestion docu

* Update README with new sections

* Use context window in the retriever

* Gradio Documentation

* Add logo to UI

* Include Contributing and Community sections to README

* Update links to resources in the README

* Small README.md updates

* Wrap lines of README.md

* Don't put health under /v1

* Add copy button to Chat

* Architecture documentation

* Updated openapi.json

* Updated openapi.json

* Updated openapi.json

* Change UI label

* Update documentation

* Add releases link to README.md

* Gradio avatar and stop debug

* Readme update

* Clean old files

* Remove unused terraform checks

* Update twitter link.

* Disable minimum coverage

* Clean install message in README.md

---------

Co-authored-by: Pablo Orgaz <pablo@Pablos-MacBook-Pro.local>
Co-authored-by: Iván Martínez <ivanmartit@gmail.com>
Co-authored-by: RubenGuerrero <ruben.guerrero@boopos.com>
Co-authored-by: Daniel Gallego Vico <daniel.gallego@bq.com>
This commit is contained in:
Pablo Orgaz 2023-10-19 16:04:35 +02:00 committed by GitHub
parent 78d1ef44ad
commit 51cc638758
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
98 changed files with 7067 additions and 3397 deletions

9
.dockerignore Normal file
View File

@ -0,0 +1,9 @@
.venv
models
.github
.vscode
.DS_Store
terraform
tests
Dockerfile
Dockerfile.*

View File

@ -1,24 +0,0 @@
---
name: Bug report
about: Create a report to help us improve
title: ''
labels: bug
assignees: ''
---
Note: if you'd like to *ask a question* or *open a discussion*, head over to the [Discussions](https://github.com/imartinez/privateGPT/discussions) section and post it there.
**Describe the bug and how to reproduce it**
A clear and concise description of what the bug is and the steps to reproduce the behavior.
**Expected behavior**
A clear and concise description of what you expected to happen.
**Environment (please complete the following information):**
- OS / hardware: [e.g. macOS 12.6 / M1]
- Python version [e.g. 3.11.3]
- Other relevant information
**Additional context**
Add any other context about the problem here.

View File

@ -1,22 +0,0 @@
---
name: Feature request
about: Suggest an idea for this project
title: ''
labels: enhancement
assignees: ''
---
Note: if you'd like to *ask a question* or *open a discussion*, head over to the [Discussions](https://github.com/imartinez/privateGPT/discussions) section and post it there.
**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
**Describe the solution you'd like**
A clear and concise description of what you want to happen.
**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.
**Additional context**
Add any other context or screenshots about the feature request here.

View File

@ -0,0 +1,30 @@
name: "Install Dependencies"
description: "Action to build the project dependencies from the main versions"
inputs:
python_version:
required: true
type: string
default: "3.11.4"
poetry_version:
required: true
type: string
default: "1.5.1"
runs:
using: composite
steps:
- name: Install Poetry
uses: snok/install-poetry@v1
with:
version: ${{ inputs.poetry_version }}
virtualenvs-create: true
virtualenvs-in-project: false
installer-parallel: true
- uses: actions/setup-python@v4
with:
python-version: ${{ inputs.python_version }}
cache: "poetry"
- name: Install Dependencies
run: poetry install --with ui --no-root
shell: bash

66
.github/workflows/tests.yml vendored Normal file
View File

@ -0,0 +1,66 @@
name: Tests
on:
push:
branches: [main]
pull_request:
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.head_ref || github.ref }}
cancel-in-progress: ${{ github.event_name == 'pull_request' }}
jobs:
setup:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: ./.github/workflows/actions/install_dependencies
checks:
needs: setup
runs-on: ubuntu-latest
name: ${{ matrix.quality-command }}
strategy:
matrix:
quality-command:
- black
- ruff
- mypy
steps:
- uses: actions/checkout@v3
- uses: ./.github/workflows/actions/install_dependencies
- name: run ${{ matrix.quality-command }}
run: make ${{ matrix.quality-command }}
test:
needs: setup
runs-on: ubuntu-latest
name: test
steps:
- uses: actions/checkout@v3
- uses: ./.github/workflows/actions/install_dependencies
- name: run test
run: make test-coverage
# Run even if make test fails for coverage reports
# TODO: select a better xml results displayer
- name: Archive test results coverage results
uses: actions/upload-artifact@v3
if: always()
with:
name: test_results
path: tests-results.xml
- name: Archive code coverage results
uses: actions/upload-artifact@v3
if: always()
with:
name: code-coverage-report
path: htmlcov/
all_checks_passed:
# Used to easily force requirements checks in GitHub
needs:
- checks
- test
runs-on: ubuntu-latest
steps:
- run: echo "All checks passed"

183
.gitignore vendored
View File

@ -1,174 +1,29 @@
# OSX
.DS_STORE
.venv
# Models
models/
settings-me.yaml
# Local Chroma db
.chroma/
db/
persist_directory/chroma.sqlite
.ruff_cache
.pytest_cache
.mypy_cache
# Byte-compiled / optimized / DLL files
# byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# unit tests / coverage reports
/tests-results.xml
/.coverage
/coverage.xml
/htmlcov/
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
/.python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# IDE
.idea/
.vscode/
/.run/
.fleet/
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
# vscode
.vscode/launch.json
# macOS
.DS_Store

View File

@ -1,44 +1,43 @@
---
files: ^(.*\.(py|json|md|sh|yaml|cfg|txt))$
exclude: ^(\.[^/]*cache/.*|.*/_user.py|source_documents/)$
default_install_hook_types:
# Mandatory to install both pre-commit and pre-push hooks (see https://pre-commit.com/#top_level-default_install_hook_types)
# Add new hook types here to ensure automatic installation when running `pre-commit install`
- pre-commit
- pre-push
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
hooks:
#- id: no-commit-to-branch
# args: [--branch, main]
- id: check-yaml
args: [--unsafe]
# - id: debug-statements
- id: end-of-file-fixer
- id: trailing-whitespace
exclude-files: \.md$
- id: check-json
- id: mixed-line-ending
# - id: check-builtin-literals
# - id: check-ast
- id: check-merge-conflict
- id: check-executables-have-shebangs
- id: check-shebang-scripts-are-executable
- id: check-docstring-first
- id: fix-byte-order-marker
- id: check-case-conflict
# - id: check-toml
- repo: https://github.com/adrienverge/yamllint.git
rev: v1.29.0
hooks:
- id: yamllint
args:
- --no-warnings
- -d
- '{extends: relaxed, rules: {line-length: {max: 90}}}'
- repo: https://github.com/codespell-project/codespell
rev: v2.2.2
hooks:
- id: codespell
args:
# - --builtin=clear,rare,informal,usage,code,names,en-GB_to_en-US
- --builtin=clear,rare,informal,usage,code,names
- --ignore-words-list=hass,master
- --skip="./.*"
- --quiet-level=2
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.3.0
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer
- id: check-yaml
- id: check-json
- id: check-added-large-files
- repo: local
hooks:
- id: black
name: Formatting (black)
entry: black
language: system
types: [python]
stages: [commit]
- id: ruff
name: Linter (ruff)
entry: ruff
language: system
types: [python]
stages: [commit]
- id: mypy
name: Type checking (mypy)
entry: make mypy
pass_filenames: false
language: system
types: [python]
stages: [commit]
- id: test
name: Unit tests (pytest)
entry: make test
pass_filenames: false
language: system
types: [python]
stages: [push]

47
Dockerfile Normal file
View File

@ -0,0 +1,47 @@
### IMPORTANT, THIS IMAGE CAN ONLY BE RUN IN LINUX DOCKER
### You will run into a segfault in mac
FROM python:3.11.6-slim-bookworm as base
# Install poetry
RUN pip install pipx
RUN python3 -m pipx ensurepath
RUN pipx install poetry
ENV PATH="/root/.local/bin:$PATH"
# Dependencies to build llama-cpp and wget
RUN apt update && apt install -y \
libopenblas-dev\
ninja-build\
build-essential\
pkg-config\
wget
# https://python-poetry.org/docs/configuration/#virtualenvsin-project
ENV POETRY_VIRTUALENVS_IN_PROJECT=true
FROM base as dependencies
WORKDIR /home/worker/app
COPY pyproject.toml poetry.lock ./
RUN poetry install --with local
RUN poetry install --with ui
RUN CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS"\
poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
FROM base as app
ENV PYTHONUNBUFFERED=1
ENV PORT=8080
ENV PGPT_PROFILES=docker
EXPOSE 8080
# Prepare a non-root user
RUN adduser --system worker
WORKDIR /home/worker/app
# Copy everything, including the virtual environment
COPY --chown=worker --from=dependencies /home/worker/app .
COPY --chown=worker . .
USER worker
ENTRYPOINT .venv/bin/python -m private_gpt

201
LICENSE
View File

@ -1,201 +0,0 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

52
Makefile Normal file
View File

@ -0,0 +1,52 @@
# Any args passed to the make script, use with $(call args, default_value)
args = `arg="$(filter-out $@,$(MAKECMDGOALS))" && echo $${arg:-${1}}`
########################################################################################################################
# Quality checks
########################################################################################################################
test:
PYTHONPATH=. poetry run pytest tests
test-coverage:
PYTHONPATH=. poetry run pytest tests --cov private_gpt --cov-report term --cov-report=html --cov-report xml --junit-xml=tests-results.xml
black:
poetry run black . --check
ruff:
poetry run ruff check private_gpt tests
format:
poetry run black .
poetry run ruff check private_gpt tests --fix
mypy:
poetry run mypy private_gpt
check:
make format
make mypy
########################################################################################################################
# Run
########################################################################################################################
run:
poetry run python -m private_gpt
dev-windows:
(set PGPT_PROFILES=local & poetry run python -m uvicorn private_gpt.main:app --reload --port 8001)
dev:
PYTHONUNBUFFERED=1 PGPT_PROFILES=local poetry run python -m uvicorn private_gpt.main:app --reload --port 8001
########################################################################################################################
# Misc
########################################################################################################################
api-docs:
poetry run python scripts/extract_openapi.py private_gpt.main:app --out docs/openapi.json
ingest:
@poetry run python scripts/ingest_folder.py $(call args)

296
README.md
View File

@ -1,152 +1,158 @@
# privateGPT
Ask questions to your documents without an internet connection, using the power of LLMs. 100% private, no data leaves your execution environment at any point. You can ingest documents and ask questions without an internet connection!
# 🔒 PrivateGPT 📑
> :ear: **Need help applying PrivateGPT to your specific use case?** [Let us know more about it](https://forms.gle/4cSDmH13RZBHV9at7) and we'll try to help! We are refining PrivateGPT through your feedback.
<img width="900" alt="demo" src="https://lh3.googleusercontent.com/drive-viewer/AK7aPaBasLxbp49Hrwnmi_Ctii1oIM18nFJrBO0ERSE3wpkS-syjiQBE32_tUSdqnjn6etUDjUSkdJeFa8acqRb0lZbkZ6CyAw=s1600">
<img width="902" alt="demo" src="https://user-images.githubusercontent.com/721666/236942256-985801c9-25b9-48ef-80be-3acbb4575164.png">
PrivateGPT is a production-ready AI project that allows you to ask questions to your documents using the power
of Large Language Models (LLMs), even in scenarios without Internet connection. 100% private, no data leaves your
execution environment at any point.
Built with [LangChain](https://github.com/hwchase17/langchain), [LlamaIndex](https://www.llamaindex.ai/), [GPT4All](https://github.com/nomic-ai/gpt4all), [LlamaCpp](https://github.com/ggerganov/llama.cpp), [Chroma](https://www.trychroma.com/) and [SentenceTransformers](https://www.sbert.net/).
The project provides an API offering all the primitives required to build private, context-aware AI applications.
It follows and extends [OpenAI API standard](https://openai.com/blog/openai-api),
and supports both normal and streaming responses.
# Environment Setup
In order to set your environment up to run the code here, first install all requirements:
The API is divided into two logical blocks:
**High-level API**, which abstracts all the complexity of a RAG (Retrieval Augmented Generation)
pipeline implementation:
- Ingestion of documents: internally managing document parsing,
splitting, metadata extraction, embedding generation and storage.
- Chat & Completions using context from ingested documents:
abstracting the retrieval of context, the prompt engineering and the response generation.
**Low-level API**, which allows advanced users to implement their own complex pipelines:
- Embeddings generation: based on a piece of text.
- Contextual chunks retrieval: given a query, returns the most relevant chunks of text from the ingested documents.
In addition to this, a working [Gradio UI](https://www.gradio.app/)
client is provided to test the API, together with a set of useful tools such as bulk model
download script, ingestion script, documents folder watch, etc.
> 👂 **Need help applying PrivateGPT to your specific use case?**
> [Let us know more about it](https://forms.gle/4cSDmH13RZBHV9at7)
> and we'll try to help! We are refining PrivateGPT through your feedback.
## 🎞️ Overview
DISCLAIMER: This README is not updated as frequently as the [documentation](https://docs.privategpt.dev/).
Please check it out for the latest updates!
### Motivation behind PrivateGPT
Generative AI is a game changer for our society, but adoption in companies of all size and data-sensitive
domains like healthcare or legal is limited by a clear concern: **privacy**.
Not being able to ensure that your data is fully under your control when using third-party AI tools
is a risk those industries cannot take.
### Primordial version
The first version of PrivateGPT was launched in May 2023 as a novel approach to address the privacy
concern by using LLMs in a complete offline way.
This was done by leveraging existing technologies developed by the thriving Open Source AI community:
[LangChain](https://github.com/hwchase17/langchain), [LlamaIndex](https://www.llamaindex.ai/),
[GPT4All](https://github.com/nomic-ai/gpt4all),
[LlamaCpp](https://github.com/ggerganov/llama.cpp),
[Chroma](https://www.trychroma.com/)
and [SentenceTransformers](https://www.sbert.net/).
That version, which rapidly became a go-to project for privacy-sensitive setups and served as the seed
for thousands of local-focused generative AI projects, was the foundation of what PrivateGPT is becoming nowadays;
thus a simpler and more educational implementation to understand the basic concepts required
to build a fully local -and therefore, private- chatGPT-like tool.
If you want to keep experimenting with it, we have saved it in the
[primordial branch](https://github.com/imartinez/privateGPT/branches) of the project.
> It is strongly recommended to do a clean clone and install of this new version of
PrivateGPT if you come from the previous, primordial version.
### Present and Future of PrivateGPT
PrivateGPT is now evolving towards becoming a gateway to generative AI models and primitives, including
completions, document ingestion, RAG pipelines and other low-level building blocks.
We want to make easier for any developer to build AI applications and experiences, as well as providing
a suitable extensive architecture for the community to keep contributing.
Stay tuned to our [releases](https://github.com/imartinez/privateGPT/releases) to check all the new features and changes included.
## 📄 Documentation
Full documentation on installation, dependencies, configuration, running the server, deployment options,
ingesting local documents, API details and UI features can be found here: https://docs.privategpt.dev/
## 🧩 Architecture
Conceptually, PrivateGPT is an API that wraps a RAG pipeline and exposes its
primitives.
* The API is built using [FastAPI](https://fastapi.tiangolo.com/) and follows
[OpenAI's API scheme](https://platform.openai.com/docs/api-reference).
* The RAG pipeline is based on [LlamaIndex](https://www.llamaindex.ai/).
The design of PrivateGPT allows to easily extend and adapt both the API and the
RAG implementation. Some key architectural decisions are:
* Dependency Injection, decoupling the different componentes and layers.
* Usage of LlamaIndex abstractions such as `LLM`, `BaseEmbedding` or `VectorStore`,
making it immediate to change the actual implementations of those abstractions.
* Simplicity, adding as few layers and new abstractions as possible.
* Ready to use, providing a full implementation of the API and RAG
pipeline.
Main building blocks:
* APIs are defined in `private_gpt:server:<api>`. Each package contains an
`<api>_router.py` (FastAPI layer) and an `<api>_service.py` (the
service implementation). Each *Service* uses LlamaIndex base abstractions instead
of specific implementations,
decoupling the actual implementation from its usage.
* Components are placed in
`private_gpt:components:<component>`. Each *Component* is in charge of providing
actual implementations to the base abstractions used in the Services - for example
`LLMComponent` is in charge of providing an actual implementation of an `LLM`
(for example `LlamaCPP` or `OpenAI`).
## 💡 Contributing
Contributions are welcomed! To ensure code quality we have enabled several format and
typing checks, just run `make check` before committing to make sure your code is ok.
Remember to test your code! You'll find a tests folder with helpers, and you can run
tests using `make test` command.
Interested in contributing to PrivateGPT? We have the following challenges ahead of us in case
you want to give a hand:
### Improvements
- Better RAG pipeline implementation (improvements to both indexing and querying stages)
- Code documentation
- Expose execution parameters such as top_p, temperature, max_tokens... in Completions and Chat Completions
- Expose chunk size in Ingest API
- Implement Update and Delete document in Ingest API
- Add information about tokens consumption in each response
- Add to Completion APIs (chat and completion) the context docs used to answer the question
- In “model” field return the actual LLM or Embeddings model name used
### Features
- Implement concurrency lock to avoid errors when there are several calls to the local LlamaCPP model
- API key-based request control to the API
- CORS support
- Support for Sagemaker
- Support Function calling
- Add md5 to check files already ingested
- Select a document to query in the UI
- Better observability of the RAG pipeline
### Project Infrastructure
- Create a “wipe” shortcut in `make` to remove all contents of local_data folder except .gitignore
- Packaged version as a local desktop app (windows executable, mac app, linux app)
- Dockerize the application for platforms outside linux (Docker Desktop for Mac and Windows)
- Document how to deploy to AWS, GCP and Azure.
##
## 💬 Community
Join the conversation around PrivateGPT on our:
- [Twitter (aka X)](https://twitter.com/PrivateGPT_AI)
- [Discord](https://discord.gg/bK6mRVpErU)
## 📖 Citation
Reference to cite if you use PrivateGPT in a paper:
```shell
pip3 install -r requirements.txt
```
*Alternative requirements installation with poetry*
1. Install [poetry](https://python-poetry.org/docs/#installation)
2. Run this commands
```shell
cd privateGPT
poetry install
poetry shell
@software{PrivateGPT_2023,
authors = {Martinez, I., Gallego, D. Orgaz, P.},
month = {5},
title = {PrivateGPT},
url = {https://github.com/imartinez/privateGPT},
year = {2023}
}
```
Then, download the LLM model and place it in a directory of your choice:
- LLM: default to [ggml-gpt4all-j-v1.3-groovy.bin](https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin). If you prefer a different GPT4All-J compatible model, just download it and reference it in your `.env` file.
Copy the `example.env` template into `.env`
```shell
cp example.env .env
```
and edit the variables appropriately in the `.env` file.
```
MODEL_TYPE: supports LlamaCpp or GPT4All
PERSIST_DIRECTORY: is the folder you want your vectorstore in
MODEL_PATH: Path to your GPT4All or LlamaCpp supported LLM
MODEL_N_CTX: Maximum token limit for the LLM model
MODEL_N_BATCH: Number of tokens in the prompt that are fed into the model at a time. Optimal value differs a lot depending on the model (8 works well for GPT4All, and 1024 is better for LlamaCpp)
EMBEDDINGS_MODEL_NAME: SentenceTransformers embeddings model name (see https://www.sbert.net/docs/pretrained_models.html)
TARGET_SOURCE_CHUNKS: The amount of chunks (sources) that will be used to answer a question
```
Note: because of the way `langchain` loads the `SentenceTransformers` embeddings, the first time you run the script it will require internet connection to download the embeddings model itself.
## Test dataset
This repo uses a [state of the union transcript](https://github.com/imartinez/privateGPT/blob/main/source_documents/state_of_the_union.txt) as an example.
## Instructions for ingesting your own dataset
Put any and all your files into the `source_documents` directory
The supported extensions are:
- `.csv`: CSV,
- `.docx`: Word Document,
- `.doc`: Word Document,
- `.enex`: EverNote,
- `.eml`: Email,
- `.epub`: EPub,
- `.html`: HTML File,
- `.md`: Markdown,
- `.msg`: Outlook Message,
- `.odt`: Open Document Text,
- `.pdf`: Portable Document Format (PDF),
- `.pptx` : PowerPoint Document,
- `.ppt` : PowerPoint Document,
- `.txt`: Text file (UTF-8),
Run the following command to ingest all the data.
```shell
python ingest.py
```
Output should look like this:
```shell
Creating new vectorstore
Loading documents from source_documents
Loading new documents: 100%|██████████████████████| 1/1 [00:01<00:00, 1.73s/it]
Loaded 1 new documents from source_documents
Split into 90 chunks of text (max. 500 tokens each)
Creating embeddings. May take some minutes...
Using embedded DuckDB with persistence: data will be stored in: db
Ingestion complete! You can now run privateGPT.py to query your documents
```
It will create a `db` folder containing the local vectorstore. Will take 20-30 seconds per document, depending on the size of the document.
You can ingest as many documents as you want, and all will be accumulated in the local embeddings database.
If you want to start from an empty database, delete the `db` folder.
Note: during the ingest process no data leaves your local environment. You could ingest without an internet connection, except for the first time you run the ingest script, when the embeddings model is downloaded.
## Ask questions to your documents, locally!
In order to ask a question, run a command like:
```shell
python privateGPT.py
```
And wait for the script to require your input.
```plaintext
> Enter a query:
```
Hit enter. You'll need to wait 20-30 seconds (depending on your machine) while the LLM model consumes the prompt and prepares the answer. Once done, it will print the answer and the 4 sources it used as context from your documents; you can then ask another question without re-running the script, just wait for the prompt again.
Note: you could turn off your internet connection, and the script inference would still work. No data gets out of your local environment.
Type `exit` to finish the script.
### CLI
The script also supports optional command-line arguments to modify its behavior. You can see a full list of these arguments by running the command ```python privateGPT.py --help``` in your terminal.
# How does it work?
Selecting the right local models and the power of `LangChain` you can run the entire pipeline locally, without any data leaving your environment, and with reasonable performance.
- `ingest.py` uses `LangChain` tools to parse the document and create embeddings locally using `HuggingFaceEmbeddings` (`SentenceTransformers`). It then stores the result in a local vector database using `Chroma` vector store.
- `privateGPT.py` uses a local LLM based on `GPT4All-J` or `LlamaCpp` to understand questions and create answers. The context for the answers is extracted from the local vector store using a similarity search to locate the right piece of context from the docs.
- `GPT4All-J` wrapper was introduced in LangChain 0.0.162.
# System Requirements
## Python Version
To use this software, you must have Python 3.10 or later installed. Earlier versions of Python will not compile.
## C++ Compiler
If you encounter an error while building a wheel during the `pip install` process, you may need to install a C++ compiler on your computer.
### For Windows 10/11
To install a C++ compiler on Windows 10/11, follow these steps:
1. Install Visual Studio 2022.
2. Make sure the following components are selected:
* Universal Windows Platform development
* C++ CMake tools for Windows
3. Download the MinGW installer from the [MinGW website](https://sourceforge.net/projects/mingw/).
4. Run the installer and select the `gcc` component.
## Mac Running Intel
When running a Mac with Intel hardware (not M1), you may run into _clang: error: the clang compiler does not support '-march=native'_ during pip install.
If so set your archflags during pip install. eg: _ARCHFLAGS="-arch x86_64" pip3 install -r requirements.txt_
# Disclaimer
This is a test project to validate the feasibility of a fully private solution for question answering using LLMs and Vector embeddings. It is not production ready, and it is not meant to be used in production. The models selection is not optimized for performance, but for privacy; but it is possible to use different models and vectorstores to improve performance.

View File

@ -1,16 +0,0 @@
import os
from dotenv import load_dotenv
from chromadb.config import Settings
load_dotenv()
# Define the folder for storing database
PERSIST_DIRECTORY = os.environ.get('PERSIST_DIRECTORY')
if PERSIST_DIRECTORY is None:
raise Exception("Please set the PERSIST_DIRECTORY environment variable")
# Define the Chroma settings
CHROMA_SETTINGS = Settings(
persist_directory=PERSIST_DIRECTORY,
anonymized_telemetry=False
)

0
docs/.nojekyll Normal file
View File

1
docs/CNAME Normal file
View File

@ -0,0 +1 @@
docs.privategpt.dev

389
docs/description.md Normal file
View File

@ -0,0 +1,389 @@
## Introduction
PrivateGPT provides an **API** containing all the building blocks required to build
**private, context-aware AI applications**. The API follows and extends OpenAI API standard, and supports
both normal and streaming responses.
The API is divided in two logical blocks:
- High-level API, abstracting all the complexity of a RAG (Retrieval Augmented Generation) pipeline implementation:
- Ingestion of documents: internally managing document parsing, splitting, metadata extraction,
embedding generation and storage.
- Chat & Completions using context from ingested documents: abstracting the retrieval of context, the prompt
engineering and the response generation.
- Low-level API, allowing advanced users to implement their own complex pipelines:
- Embeddings generation: based on a piece of text.
- Contextual chunks retrieval: given a query, returns the most relevant chunks of text from the ingested
documents.
> A working **Gradio UI client** is provided to test the API, together with a set of
> useful tools such as bulk model download script, ingestion script, documents folder
> watch, etc.
## Quick Local Installation steps
The steps in `Installation and Settings` section are better explained and cover more
setup scenarios. But if you are looking for a quick setup guide, here it is:
```
# Clone the repo
git clone https://github.com/imartinez/privateGPT
cd privateGPT
# Install Python 3.11
pyenv install 3.11
pyenv local 3.11
# Install dependencies
poetry install --with ui,local
# Download Embedding and LLM models
poetry run python scripts/setup
# (Optional) For Mac with Metal GPU, enable it. Check Installation and Settings section
to know how to enable GPU on other platforms
CMAKE_ARGS="-DLLAMA_METAL=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
# Run the local server
PGPT_PROFILES=local make run
# Note: on Mac with Metal you should see a ggml_metal_add_buffer log, stating GPU is
being used
# Navigate to the UI and try it out!
http://localhost:8001/
```
## Installation and Settings
### Base requirements to run PrivateGPT
* Git clone PrivateGPT repository, and navigate to it:
```
git clone https://github.com/imartinez/privateGPT
cd privateGPT
```
* Install Python 3.11. Ideally through a python version manager like `pyenv`.
Python 3.12
should work too. Earlier python versions are not supported.
* osx/linux: [pyenv](https://github.com/pyenv/pyenv)
* windows: [pyenv-win](https://github.com/pyenv-win/pyenv-win)
```
pyenv install 3.11
pyenv local 3.11
```
* Install [Poetry](https://python-poetry.org/docs/#installing-with-the-official-installer) for dependency management:
* Install `make` for scripts:
* osx: (Using homebrew): `brew install make`
* windows: (Using chocolatey) `choco install make`
### Install dependencies
Install the dependencies:
```bash
poetry install --with ui
```
Verify everything is working by running `make run` (or `poetry run python -m private_gpt`) and navigate to
http://localhost:8001. You should see a [Gradio UI](https://gradio.app/) **configured with a mock LLM** that will
echo back the input. Later we'll see how to configure a real LLM.
### Settings
> Note: the default settings of PrivateGPT work out-of-the-box for a 100% local setup. Skip this section if you just
> want to test PrivateGPT locally, and come back later to learn about more configuration options.
PrivateGPT is configured through *profiles* that are defined using yaml files, and selected through env variables.
The full list of properties configurable can be found in `settings.yaml`
#### env var `PGPT_SETTINGS_FOLDER`
The location of the settings folder. Defaults to the root of the project.
Should contain the default `settings.yaml` and any other `settings-{profile}.yaml`.
#### env var `PGPT_PROFILES`
By default, the profile definition in `settings.yaml` is loaded.
Using this env var you can load additional profiles; format is a comma separated list of profile names.
This will merge `settings-{profile}.yaml` on top of the base settings file.
For example:
`PGPT_PROFILES=local,cuda` will load `settings-local.yaml`
and `settings-cuda.yaml`, their contents will be merged with
later profiles properties overriding values of earlier ones like `settings.yaml`.
During testing, the `test` profile will be active along with the default, therefore `settings-test.yaml`
file is required.
#### Environment variables expansion
Configuration files can contain environment variables,
they will be expanded at runtime.
Expansion must follow the pattern `${VARIABLE_NAME:default_value}`.
For example, the following configuration will use the value of the `PORT`
environment variable or `8001` if it's not set.
Missing variables with no default will produce an error.
```yaml
server:
port: ${PORT:8001}
```
### Local LLM requirements
Install extra dependencies for local execution:
```bash
poetry install --with local
```
For PrivateGPT to run fully locally GPU acceleration is required
(CPU execution is possible, but very slow), however,
typical Macbook laptops or window desktops with mid-range GPUs lack VRAM to run
even the smallest LLMs. For that reason
**local execution is only supported for models compatible with [llama.cpp](https://github.com/ggerganov/llama.cpp)**
These two models are known to work well:
* https://huggingface.co/TheBloke/Llama-2-7B-chat-GGUF
* https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF (recommended)
To ease the installation process, use the `setup` script that will download both
the embedding and the LLM model and place them in the correct location (under `models` folder):
```bash
poetry run python scripts/setup
```
If you are ok with CPU execution, you can skip the rest of this section.
As stated before, llama.cpp is required and in
particular [llama-cpp-python](https://github.com/abetlen/llama-cpp-python)
is used.
> It's highly encouraged that you fully read llama-cpp and llama-cpp-python documentation relevant to your platform.
> Running into installation issues is very likely, and you'll need to troubleshoot them yourself.
#### OSX GPU support
You will need to build [llama.cpp](https://github.com/ggerganov/llama.cpp) with
metal support. To do that run:
```bash
CMAKE_ARGS="-DLLAMA_METAL=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
```
#### Windows GPU support
Windows GPU support is done through CUDA or similar open source technologies.
Follow the instructions on the original [llama.cpp](https://github.com/ggerganov/llama.cpp) repo to install the required
dependencies.
Some tips to get it working with an NVIDIA card and CUDA (Tested on Windows 10 with CUDA 11.5 RTX 3070):
* Install latest VS2022 (and build tools) https://visualstudio.microsoft.com/vs/community/
* Install CUDA toolkit https://developer.nvidia.com/cuda-downloads
* [Optional] Install CMake to troubleshoot building issues by compiling llama.cpp directly https://cmake.org/download/
If you have all required dependencies properly configured running the
following powershell command should succeed.
```powershell
$env:CMAKE_ARGS='-DLLAMA_CUBLAS=on'; poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
```
If your installation was correct, you should see a message similar to the following next
time you start the server `BLAS = 1`.
```
llama_new_context_with_model: total VRAM used: 4857.93 MB (model: 4095.05 MB, context: 762.87 MB)
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
```
Note that llama.cpp offloads matrix calculations to the GPU but the performance is
still hit heavily due to latency between CPU and GPU communication. You might need to tweak
batch sizes and other parameters to get the best performance for your particular system.
#### Linux GPU support
🚧 Under construction 🚧
#### Known issues and Troubleshooting
Execution of LLMs locally still has a lot of sharp edges, specially when running on non Linux platforms.
You might encounter several issues:
* Performance: RAM or VRAM usage is very high, your computer might experience slowdowns or even crashes.
* GPU Virtualization on Windows and OSX: Simply not possible with docker desktop, you have to run the server directly on
the host.
* Building errors: Some of PrivateGPT dependencies need to build native code, and they might fail on some platforms.
Most likely you are missing some dev tools in your machine (updated C++ compiler, CUDA is not on PATH, etc.).
If you encounter any of these issues, please open an issue and we'll try to help.
#### Troubleshooting: C++ Compiler
If you encounter an error while building a wheel during the `pip install` process, you may need to install a C++ compiler on your computer.
**For Windows 10/11**
To install a C++ compiler on Windows 10/11, follow these steps:
1. Install Visual Studio 2022.
2. Make sure the following components are selected:
* Universal Windows Platform development
* C++ CMake tools for Windows
3. Download the MinGW installer from the [MinGW website](https://sourceforge.net/projects/mingw/).
4. Run the installer and select the `gcc` component.
#### Troubleshooting: Mac Running Intel
When running a Mac with Intel hardware (not M1), you may run into _clang: error: the clang compiler does not support '-march=native'_ during pip install.
If so set your archflags during pip install. eg: _ARCHFLAGS="-arch x86_64" pip3 install -r requirements.txt_
## Running the Server
After following the installation steps you should be ready to go. Here are some common run setups:
### Running 100% locally
Make sure you have followed the *Local LLM requirements* section before moving on.
This command will start PrivateGPT using the `settings.yaml` (default profile) together with the `settings-local.yaml`
configuration files. By default, it will enable both the API and the Gradio UI. Run:
```
PGPT_PROFILES=local make run
```
or
```
PGPT_PROFILES=local poetry run python -m private_gpt
```
When the server is started it will print a log *Application startup complete*.
Navigate to http://localhost:8001 to use the Gradio UI or to http://localhost:8001/docs (API section) to try the API
using Swagger UI.
### Local server using OpenAI as LLM
If you cannot run a local model (because you don't have a GPU, for example) or for testing purposes, you may
decide to run PrivateGPT using OpenAI as the LLM.
In order to do so, create a profile `settings-openai.yaml` with the following contents:
```yaml
llm:
mode: openai
openai:
api_key: <your_openai_api_key> # You could skip this configuration and use the OPENAI_API_KEY env var instead
```
And run PrivateGPT loading that profile you just created:
```PGPT_PROFILES=openai make run```
or
```PGPT_PROFILES=openai poetry run python -m private_gpt```
> Note this will still use the local Embeddings model, as it is ok to use it on a CPU.
> We'll support using OpenAI embeddings in a future release.
When the server is started it will print a log *Application startup complete*.
Navigate to http://localhost:8001 to use the Gradio UI or to http://localhost:8001/docs (API section) to try the API.
You'll notice the speed and quality of response is higher, given you are using OpenAI's servers for the heavy
computations.
### Use AWS's Sagemaker
🚧 Under construction 🚧
## Gradio UI user manual
Gradio UI is a ready to use way of testing most of PrivateGPT API functionalities.
![Gradio PrivateGPT](https://lh3.googleusercontent.com/drive-viewer/AK7aPaBTlIX9j5nsQ87XvcRgf3vhv6UG6pgy4j4IH5mYIo6dHcfJ5IUMiVHoqyQwjTnjRITxYTQ3TcF3pfPXyyWB3HS8hKMWDA=s1600)
### Execution Modes
It has 3 modes of execution (you can select in the top-left):
* Query Documents: uses the context from the
ingested documents to answer the questions posted in the chat. It also takes
into account previous chat messages as context.
* Makes use of `/chat/completions` API with `use_context=true` and no
`context_filter`.
* LLM Chat: simple, non-contextual chat with the LLM. The ingested documents won't
be taken into account, only the previous messages.
* Makes use of `/chat/completions` API with `use_context=false`.
* Context Chunks: returns the JSON representation of the 2 most related text
chunks, together with their metadata, source document and previous and next
chunks.
* Makes use of `/chunks` API with no `context_filter`, `limit=2` and
`prev_next_chunks=1`.
### Document Ingestion
Ingest documents by using the `Upload a File` button. You can check the progress of
the ingestion in the console logs of the server.
The list of ingested files is shown below the button.
If you want to delete the ingested documents, refer to *Reset Local documents
database* section in the documentation.
### Chat
Normal chat interface, self-explanatory ;)
You can check the actual prompt being passed to the LLM by looking at the logs of
the server. We'll add better observability in future releases.
## Deployment options
🚧 We are working on Dockerized deployment guidelines 🚧
## Observability
Basic logs are enabled using LlamaIndex
basic logging (for example ingestion progress or LLM prompts and answers).
🚧 We are working on improved Observability. 🚧
## Ingesting & Managing Documents
🚧 Document Update and Delete are still WIP. 🚧
The ingestion of documents can be done in different ways:
* Using the `/ingest` API
* Using the Gradio UI
* Using the Bulk Local Ingestion functionality (check next section)
### Bulk Local Ingestion
When you are running PrivateGPT in a fully local setup, you can ingest a complete folder for convenience (containing
pdf, text files, etc.)
and optionally watch changes on it with the command:
```bash
make ingest /path/to/folder -- --watch
```
After ingestion is complete, you should be able to chat with your documents
by navigating to http://localhost:8001 and using the option `Query documents`,
or using the completions / chat API.
### Reset Local documents database
When running in a local setup, you can remove all ingested documents by simply
deleting all contents of `local_data` folder (except .gitignore).
## API
As explained in the introduction, the API contains high level APIs (ingestion and chat/completions) and low level APIs
(embeddings and chunk retrieval). In this section the different specific API calls are explained.

22
docs/index.html Normal file
View File

@ -0,0 +1,22 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>PrivateGPT Docs</title>
<!-- needed for adaptive design -->
<meta name="viewport" content="width=device-width, initial-scale=1">
<link href="https://fonts.googleapis.com/css?family=Montserrat:300,400,700|Roboto:300,400,700" rel="stylesheet">
<link rel="shortcut icon" href="https://fastapi.tiangolo.com/img/favicon.png">
<!-- ReDoc doesn't change outer page styles -->
<style>
body {
margin: 0;
padding: 0;
}
</style>
</head>
<body>
<noscript> ReDoc requires Javascript to function. Please enable it to browse the documentation. </noscript>
<redoc spec-url="/openapi.json"></redoc>
<script src="https://cdn.jsdelivr.net/npm/redoc@next/bundles/redoc.standalone.js"></script>
</body>

BIN
docs/logo.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 2.6 KiB

959
docs/openapi.json Normal file

File diff suppressed because one or more lines are too long

View File

@ -1,7 +0,0 @@
PERSIST_DIRECTORY=db
MODEL_TYPE=GPT4All
MODEL_PATH=models/ggml-gpt4all-j-v1.3-groovy.bin
EMBEDDINGS_MODEL_NAME=all-MiniLM-L6-v2
MODEL_N_CTX=1000
MODEL_N_BATCH=8
TARGET_SOURCE_CHUNKS=4

185
ingest.py
View File

@ -1,185 +0,0 @@
#!/usr/bin/env python3
import os
import glob
from typing import List
from dotenv import load_dotenv
from multiprocessing import Pool
from tqdm import tqdm
from langchain.document_loaders import (
CSVLoader,
EverNoteLoader,
PyMuPDFLoader,
TextLoader,
UnstructuredEmailLoader,
UnstructuredEPubLoader,
UnstructuredHTMLLoader,
UnstructuredMarkdownLoader,
UnstructuredODTLoader,
UnstructuredPowerPointLoader,
UnstructuredWordDocumentLoader,
)
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.docstore.document import Document
if not load_dotenv():
print("Could not load .env file or it is empty. Please check if it exists and is readable.")
exit(1)
from constants import CHROMA_SETTINGS
import chromadb
from chromadb.api.segment import API
# Load environment variables
persist_directory = os.environ.get('PERSIST_DIRECTORY')
source_directory = os.environ.get('SOURCE_DIRECTORY', 'source_documents')
embeddings_model_name = os.environ.get('EMBEDDINGS_MODEL_NAME')
chunk_size = 500
chunk_overlap = 50
# Custom document loaders
class MyElmLoader(UnstructuredEmailLoader):
"""Wrapper to fallback to text/plain when default does not work"""
def load(self) -> List[Document]:
"""Wrapper adding fallback for elm without html"""
try:
try:
doc = UnstructuredEmailLoader.load(self)
except ValueError as e:
if 'text/html content not found in email' in str(e):
# Try plain text
self.unstructured_kwargs["content_source"]="text/plain"
doc = UnstructuredEmailLoader.load(self)
else:
raise
except Exception as e:
# Add file_path to exception message
raise type(e)(f"{self.file_path}: {e}") from e
return doc
# Map file extensions to document loaders and their arguments
LOADER_MAPPING = {
".csv": (CSVLoader, {}),
# ".docx": (Docx2txtLoader, {}),
".doc": (UnstructuredWordDocumentLoader, {}),
".docx": (UnstructuredWordDocumentLoader, {}),
".enex": (EverNoteLoader, {}),
".eml": (MyElmLoader, {}),
".epub": (UnstructuredEPubLoader, {}),
".html": (UnstructuredHTMLLoader, {}),
".md": (UnstructuredMarkdownLoader, {}),
".odt": (UnstructuredODTLoader, {}),
".pdf": (PyMuPDFLoader, {}),
".ppt": (UnstructuredPowerPointLoader, {}),
".pptx": (UnstructuredPowerPointLoader, {}),
".txt": (TextLoader, {"encoding": "utf8"}),
# Add more mappings for other file extensions and loaders as needed
}
def load_single_document(file_path: str) -> List[Document]:
ext = "." + file_path.rsplit(".", 1)[-1].lower()
if ext in LOADER_MAPPING:
loader_class, loader_args = LOADER_MAPPING[ext]
loader = loader_class(file_path, **loader_args)
return loader.load()
raise ValueError(f"Unsupported file extension '{ext}'")
def load_documents(source_dir: str, ignored_files: List[str] = []) -> List[Document]:
"""
Loads all documents from the source documents directory, ignoring specified files
"""
all_files = []
for ext in LOADER_MAPPING:
all_files.extend(
glob.glob(os.path.join(source_dir, f"**/*{ext.lower()}"), recursive=True)
)
all_files.extend(
glob.glob(os.path.join(source_dir, f"**/*{ext.upper()}"), recursive=True)
)
filtered_files = [file_path for file_path in all_files if file_path not in ignored_files]
with Pool(processes=os.cpu_count()) as pool:
results = []
with tqdm(total=len(filtered_files), desc='Loading new documents', ncols=80) as pbar:
for i, docs in enumerate(pool.imap_unordered(load_single_document, filtered_files)):
results.extend(docs)
pbar.update()
return results
def process_documents(ignored_files: List[str] = []) -> List[Document]:
"""
Load documents and split in chunks
"""
print(f"Loading documents from {source_directory}")
documents = load_documents(source_directory, ignored_files)
if not documents:
print("No new documents to load")
exit(0)
print(f"Loaded {len(documents)} new documents from {source_directory}")
text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
documents = text_splitter.split_documents(documents)
print(f"Split into {len(documents)} chunks of text (max. {chunk_size} tokens each)")
return documents
def batch_chromadb_insertions(chroma_client: API, documents: List[Document]) -> List[Document]:
"""
Split the total documents to be inserted into batches of documents that the local chroma client can process
"""
# Get max batch size.
max_batch_size = chroma_client.max_batch_size
for i in range(0, len(documents), max_batch_size):
yield documents[i:i + max_batch_size]
def does_vectorstore_exist(persist_directory: str, embeddings: HuggingFaceEmbeddings) -> bool:
"""
Checks if vectorstore exists
"""
db = Chroma(persist_directory=persist_directory, embedding_function=embeddings)
if not db.get()['documents']:
return False
return True
def main():
# Create embeddings
embeddings = HuggingFaceEmbeddings(model_name=embeddings_model_name)
# Chroma client
chroma_client = chromadb.PersistentClient(settings=CHROMA_SETTINGS , path=persist_directory)
if does_vectorstore_exist(persist_directory, embeddings):
# Update and store locally vectorstore
print(f"Appending to existing vectorstore at {persist_directory}")
db = Chroma(persist_directory=persist_directory, embedding_function=embeddings, client_settings=CHROMA_SETTINGS, client=chroma_client)
collection = db.get()
documents = process_documents([metadata['source'] for metadata in collection['metadatas']])
print(f"Creating embeddings. May take some minutes...")
for batched_chromadb_insertion in batch_chromadb_insertions(chroma_client, documents):
db.add_documents(batched_chromadb_insertion)
else:
# Create and store locally vectorstore
print("Creating new vectorstore")
documents = process_documents()
print(f"Creating embeddings. May take some minutes...")
# Create the db with the first batch of documents to insert
batched_chromadb_insertions = batch_chromadb_insertions(chroma_client, documents)
first_insertion = next(batched_chromadb_insertions)
db = Chroma.from_documents(first_insertion, embeddings, persist_directory=persist_directory, client_settings=CHROMA_SETTINGS, client=chroma_client)
# Add the rest of batches of documents
for batched_chromadb_insertion in batched_chromadb_insertions:
db.add_documents(batched_chromadb_insertion)
print(f"Ingestion complete! You can now run privateGPT.py to query your documents")
if __name__ == "__main__":
main()

2
local_data/.gitignore vendored Normal file
View File

@ -0,0 +1,2 @@
*
!.gitignore

2
models/.gitignore vendored Normal file
View File

@ -0,0 +1,2 @@
*
!.gitignore

4729
poetry.lock generated

File diff suppressed because it is too large Load Diff

View File

@ -1,87 +0,0 @@
#!/usr/bin/env python3
from dotenv import load_dotenv
from langchain.chains import RetrievalQA
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.vectorstores import Chroma
from langchain.llms import GPT4All, LlamaCpp
import chromadb
import os
import argparse
import time
if not load_dotenv():
print("Could not load .env file or it is empty. Please check if it exists and is readable.")
exit(1)
embeddings_model_name = os.environ.get("EMBEDDINGS_MODEL_NAME")
persist_directory = os.environ.get('PERSIST_DIRECTORY')
model_type = os.environ.get('MODEL_TYPE')
model_path = os.environ.get('MODEL_PATH')
model_n_ctx = os.environ.get('MODEL_N_CTX')
model_n_batch = int(os.environ.get('MODEL_N_BATCH',8))
target_source_chunks = int(os.environ.get('TARGET_SOURCE_CHUNKS',4))
from constants import CHROMA_SETTINGS
def main():
# Parse the command line arguments
args = parse_arguments()
embeddings = HuggingFaceEmbeddings(model_name=embeddings_model_name)
chroma_client = chromadb.PersistentClient(settings=CHROMA_SETTINGS , path=persist_directory)
db = Chroma(persist_directory=persist_directory, embedding_function=embeddings, client_settings=CHROMA_SETTINGS, client=chroma_client)
retriever = db.as_retriever(search_kwargs={"k": target_source_chunks})
# activate/deactivate the streaming StdOut callback for LLMs
callbacks = [] if args.mute_stream else [StreamingStdOutCallbackHandler()]
# Prepare the LLM
match model_type:
case "LlamaCpp":
llm = LlamaCpp(model_path=model_path, max_tokens=model_n_ctx, n_batch=model_n_batch, callbacks=callbacks, verbose=False)
case "GPT4All":
llm = GPT4All(model=model_path, max_tokens=model_n_ctx, backend='gptj', n_batch=model_n_batch, callbacks=callbacks, verbose=False)
case _default:
# raise exception if model_type is not supported
raise Exception(f"Model type {model_type} is not supported. Please choose one of the following: LlamaCpp, GPT4All")
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever, return_source_documents= not args.hide_source)
# Interactive questions and answers
while True:
query = input("\nEnter a query: ")
if query == "exit":
break
if query.strip() == "":
continue
# Get the answer from the chain
start = time.time()
res = qa(query)
answer, docs = res['result'], [] if args.hide_source else res['source_documents']
end = time.time()
# Print the result
print("\n\n> Question:")
print(query)
print(f"\n> Answer (took {round(end - start, 2)} s.):")
print(answer)
# Print the relevant sources used for the answer
for document in docs:
print("\n> " + document.metadata["source"] + ":")
print(document.page_content)
def parse_arguments():
parser = argparse.ArgumentParser(description='privateGPT: Ask questions to your documents without an internet connection, '
'using the power of LLMs.')
parser.add_argument("--hide-source", "-S", action='store_true',
help='Use this flag to disable printing of source documents used for answers.')
parser.add_argument("--mute-stream", "-M",
action='store_true',
help='Use this flag to disable the streaming StdOut callback for LLMs.')
return parser.parse_args()
if __name__ == "__main__":
main()

1
private_gpt/__init__.py Normal file
View File

@ -0,0 +1 @@
"""private-gpt."""

8
private_gpt/__main__.py Normal file
View File

@ -0,0 +1,8 @@
# start a fastapi server with uvicorn
import uvicorn
from private_gpt.main import app
from private_gpt.settings.settings import settings
uvicorn.run(app, host="0.0.0.0", port=settings.server.port)

View File

View File

@ -0,0 +1,26 @@
from injector import inject, singleton
from llama_index import MockEmbedding
from llama_index.embeddings.base import BaseEmbedding
from private_gpt.paths import models_cache_path
from private_gpt.settings.settings import settings
@singleton
class EmbeddingComponent:
embedding_model: BaseEmbedding
@inject
def __init__(self) -> None:
match settings.llm.mode:
case "mock":
# Not a random number, is the dimensionality used by
# the default embedding model
self.embedding_model = MockEmbedding(384)
case _:
from llama_index.embeddings import HuggingFaceEmbedding
self.embedding_model = HuggingFaceEmbedding(
model_name=settings.local.embedding_hf_model_name,
cache_folder=str(models_cache_path),
)

View File

@ -0,0 +1 @@
"""LLM implementations."""

View File

@ -0,0 +1,248 @@
# mypy: ignore-errors
from __future__ import annotations
import io
import json
from typing import TYPE_CHECKING, Any
import boto3 # type: ignore
from llama_index.bridge.pydantic import Field
from llama_index.llms import (
CompletionResponse,
CustomLLM,
LLMMetadata,
)
from llama_index.llms.base import llm_completion_callback
from llama_index.llms.llama_utils import (
completion_to_prompt as generic_completion_to_prompt,
)
from llama_index.llms.llama_utils import (
messages_to_prompt as generic_messages_to_prompt,
)
if TYPE_CHECKING:
from collections.abc import Callable
from llama_index.callbacks import CallbackManager
from llama_index.llms import (
CompletionResponseGen,
)
class LineIterator:
r"""A helper class for parsing the byte stream input from TGI container.
The output of the model will be in the following format:
```
b'data:{"token": {"text": " a"}}\n\n'
b'data:{"token": {"text": " challenging"}}\n\n'
b'data:{"token": {"text": " problem"
b'}}'
...
```
While usually each PayloadPart event from the event stream will contain a byte array
with a full json, this is not guaranteed and some of the json objects may be split
across PayloadPart events. For example:
```
{'PayloadPart': {'Bytes': b'{"outputs": '}}
{'PayloadPart': {'Bytes': b'[" problem"]}\n'}}
```
This class accounts for this by concatenating bytes written via the 'write' function
and then exposing a method which will return lines (ending with a '\n' character)
within the buffer via the 'scan_lines' function. It maintains the position of the
last read position to ensure that previous bytes are not exposed again. It will
also save any pending lines that doe not end with a '\n' to make sure truncations
are concatinated
"""
def __init__(self, stream: Any) -> None:
"""Line iterator initializer."""
self.byte_iterator = iter(stream)
self.buffer = io.BytesIO()
self.read_pos = 0
def __iter__(self) -> Any:
"""Self iterator."""
return self
def __next__(self) -> Any:
"""Next element from iterator."""
while True:
self.buffer.seek(self.read_pos)
line = self.buffer.readline()
if line and line[-1] == ord("\n"):
self.read_pos += len(line)
return line[:-1]
try:
chunk = next(self.byte_iterator)
except StopIteration:
if self.read_pos < self.buffer.getbuffer().nbytes:
continue
raise
if "PayloadPart" not in chunk:
print("Unknown event type:" + chunk)
continue
self.buffer.seek(0, io.SEEK_END)
self.buffer.write(chunk["PayloadPart"]["Bytes"])
class SagemakerLLM(CustomLLM):
"""Sagemaker Inference Endpoint models.
To use, you must supply the endpoint name from your deployed
Sagemaker model & the region where it is deployed.
To authenticate, the AWS client uses the following methods to
automatically load credentials:
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html
If a specific credential profile should be used, you must pass
the name of the profile from the ~/.aws/credentials file that is to be used.
Make sure the credentials / roles used have the required policies to
access the Sagemaker endpoint.
See: https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies.html
"""
endpoint_name: str = Field(description="")
temperature: float = Field(description="The temperature to use for sampling.")
max_new_tokens: int = Field(description="The maximum number of tokens to generate.")
context_window: int = Field(
description="The maximum number of context tokens for the model."
)
messages_to_prompt: Callable[..., str] = Field(
description="The function to convert messages to a prompt.", exclude=True
)
completion_to_prompt: Callable[..., str] = Field(
description="The function to convert a completion to a prompt.", exclude=True
)
generate_kwargs: dict[str, Any] = Field(
default_factory=dict, description="Kwargs used for generation."
)
model_kwargs: dict[str, Any] = Field(
default_factory=dict, description="Kwargs used for model initialization."
)
verbose: bool = Field(description="Whether to print verbose output.")
_boto_client: Any = boto3.client(
"sagemaker-runtime",
) # TODO make it an optional field
def __init__(
self,
endpoint_name: str | None = "",
temperature: float = 0.1,
max_new_tokens: int = 512, # to review defaults
context_window: int = 2048, # to review defaults
messages_to_prompt: Any = None,
completion_to_prompt: Any = None,
callback_manager: CallbackManager | None = None,
generate_kwargs: dict[str, Any] | None = None,
model_kwargs: dict[str, Any] | None = None,
verbose: bool = True,
) -> None:
"""SagemakerLLM initializer."""
model_kwargs = model_kwargs or {}
model_kwargs.update({"n_ctx": context_window, "verbose": verbose})
messages_to_prompt = messages_to_prompt or generic_messages_to_prompt
completion_to_prompt = completion_to_prompt or generic_completion_to_prompt
generate_kwargs = generate_kwargs or {}
generate_kwargs.update(
{"temperature": temperature, "max_tokens": max_new_tokens}
)
super().__init__(
endpoint_name=endpoint_name,
temperature=temperature,
context_window=context_window,
max_new_tokens=max_new_tokens,
messages_to_prompt=messages_to_prompt,
completion_to_prompt=completion_to_prompt,
callback_manager=callback_manager,
generate_kwargs=generate_kwargs,
model_kwargs=model_kwargs,
verbose=verbose,
)
@property
def inference_params(self):
# TODO expose the rest of params
return {
"do_sample": True,
"top_p": 0.7,
"temperature": self.temperature,
"top_k": 50,
"max_new_tokens": self.max_new_tokens,
}
@property
def metadata(self) -> LLMMetadata:
"""Get LLM metadata."""
return LLMMetadata(
context_window=self.context_window,
num_output=self.max_new_tokens,
model_name="Sagemaker LLama 2",
)
@llm_completion_callback()
def complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:
self.generate_kwargs.update({"stream": False})
is_formatted = kwargs.pop("formatted", False)
if not is_formatted:
prompt = self.completion_to_prompt(prompt)
request_params = {
"inputs": prompt,
"stream": False,
"parameters": self.inference_params,
}
resp = self._boto_client.invoke_endpoint(
EndpointName=self.endpoint_name,
Body=json.dumps(request_params),
ContentType="application/json",
)
response_body = resp["Body"]
response_str = response_body.read().decode("utf-8")
response_dict = eval(response_str)
return CompletionResponse(
text=response_dict[0]["generated_text"][len(prompt) :], raw=resp
)
@llm_completion_callback()
def stream_complete(self, prompt: str, **kwargs: Any) -> CompletionResponseGen:
def get_stream():
text = ""
request_params = {
"inputs": prompt,
"stream": True,
"parameters": self.inference_params,
}
resp = self._boto_client.invoke_endpoint_with_response_stream(
EndpointName=self.endpoint_name,
Body=json.dumps(request_params),
ContentType="application/json",
)
event_stream = resp["Body"]
start_json = b"{"
stop_token = "<|endoftext|>"
for line in LineIterator(event_stream):
if line != b"" and start_json in line:
data = json.loads(line[line.find(start_json) :].decode("utf-8"))
if data["token"]["text"] != stop_token:
delta = data["token"]["text"]
text += delta
yield CompletionResponse(delta=delta, text=text, raw=data)
return get_stream()

View File

@ -0,0 +1,47 @@
from injector import inject, singleton
from llama_index.llms import MockLLM
from llama_index.llms.base import LLM
from llama_index.llms.llama_utils import completion_to_prompt, messages_to_prompt
from private_gpt.paths import models_path
from private_gpt.settings.settings import settings
@singleton
class LLMComponent:
llm: LLM
@inject
def __init__(self) -> None:
match settings.llm.mode:
case "local":
from llama_index.llms import LlamaCPP
self.llm = LlamaCPP(
model_path=str(models_path / settings.local.llm_hf_model_file),
temperature=0.1,
# llama2 has a context window of 4096 tokens,
# but we set it lower to allow for some wiggle room
context_window=3900,
generate_kwargs={},
# All to GPU
model_kwargs={"n_gpu_layers": -1},
# transform inputs into Llama2 format
messages_to_prompt=messages_to_prompt,
completion_to_prompt=completion_to_prompt,
verbose=True,
)
case "sagemaker":
from private_gpt.components.llm.custom.sagemaker import SagemakerLLM
self.llm = SagemakerLLM(
endpoint_name=settings.sagemaker.endpoint_name,
)
case "openai":
from llama_index.llms import OpenAI
openai_settings = settings.openai.api_key
self.llm = OpenAI(api_key=openai_settings)
case "mock":
self.llm = MockLLM()

View File

@ -0,0 +1,28 @@
from injector import inject, singleton
from llama_index.storage.docstore import BaseDocumentStore, SimpleDocumentStore
from llama_index.storage.index_store import SimpleIndexStore
from llama_index.storage.index_store.types import BaseIndexStore
from private_gpt.paths import local_data_path
@singleton
class NodeStoreComponent:
index_store: BaseIndexStore
doc_store: BaseDocumentStore
@inject
def __init__(self) -> None:
try:
self.index_store = SimpleIndexStore.from_persist_dir(
persist_dir=str(local_data_path)
)
except FileNotFoundError:
self.index_store = SimpleIndexStore()
try:
self.doc_store = SimpleDocumentStore.from_persist_dir(
persist_dir=str(local_data_path)
)
except FileNotFoundError:
self.doc_store = SimpleDocumentStore()

View File

@ -0,0 +1,61 @@
import typing
import chromadb
from injector import inject, singleton
from llama_index import VectorStoreIndex
from llama_index.indices.vector_store import VectorIndexRetriever
from llama_index.vector_stores import ChromaVectorStore
from llama_index.vector_stores.types import VectorStore
from private_gpt.open_ai.extensions.context_filter import ContextFilter
from private_gpt.paths import local_data_path
@typing.no_type_check
def _chromadb_doc_id_metadata_filter(
context_filter: ContextFilter | None,
) -> dict | None:
if context_filter is None or context_filter.docs_ids is None:
return {} # No filter
elif len(context_filter.docs_ids) < 1:
return {"doc_id": "-"} # Effectively filtering out all docs
else:
doc_filter_items = []
if len(context_filter.docs_ids) > 1:
doc_filter = {"$or": doc_filter_items}
for doc_id in context_filter.docs_ids:
doc_filter_items.append({"doc_id": doc_id})
else:
doc_filter = {"doc_id": context_filter.docs_ids[0]}
return doc_filter
@singleton
class VectorStoreComponent:
vector_store: VectorStore
@inject
def __init__(self) -> None:
db = chromadb.PersistentClient(
path=str((local_data_path / "chroma_db").absolute())
)
chroma_collection = db.get_or_create_collection(
"make_this_parameterizable_per_api_call"
) # TODO
self.vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
@staticmethod
def get_retriever(
index: VectorStoreIndex,
context_filter: ContextFilter | None = None,
similarity_top_k: int = 2,
) -> VectorIndexRetriever:
# TODO this 'where' is specific to chromadb. Implement other vector stores
return VectorIndexRetriever(
index=index,
similarity_top_k=similarity_top_k,
vector_store_kwargs={
"where": _chromadb_doc_id_metadata_filter(context_filter)
},
)

3
private_gpt/constants.py Normal file
View File

@ -0,0 +1,3 @@
from pathlib import Path
PROJECT_ROOT_PATH: Path = Path(__file__).parents[1]

9
private_gpt/di.py Normal file
View File

@ -0,0 +1,9 @@
from injector import Injector
def create_application_injector() -> Injector:
injector = Injector(auto_bind=True)
return injector
root_injector: Injector = create_application_injector()

125
private_gpt/main.py Normal file
View File

@ -0,0 +1,125 @@
"""FastAPI app creation, logger configuration and main API routes."""
import sys
from typing import Any
import llama_index
from fastapi import FastAPI
from fastapi.openapi.utils import get_openapi
from loguru import logger
from private_gpt.paths import docs_path
from private_gpt.server.chat.chat_router import chat_router
from private_gpt.server.chunks.chunks_router import chunks_router
from private_gpt.server.completions.completions_router import completions_router
from private_gpt.server.embeddings.embeddings_router import embeddings_router
from private_gpt.server.health.health_router import health_router
from private_gpt.server.ingest.ingest_router import ingest_router
from private_gpt.settings.settings import settings
# Remove pre-configured logging handler
logger.remove(0)
# Create a new logging handler same as the pre-configured one but with the extra
# attribute `request_id`
logger.add(
sys.stdout,
level="INFO",
format=(
"<green>{time:YYYY-MM-DD HH:mm:ss.SSS}</green> | "
"<level>{level: <8}</level> | "
"<cyan>{name}</cyan>:<cyan>{function}</cyan>:<cyan>{line}</cyan> | "
"ID: {extra[request_id]} - <level>{message}</level>"
),
)
# Add LlamaIndex simple observability
llama_index.set_global_handler("simple")
# Start the API
with open(docs_path / "description.md") as description_file:
description = description_file.read()
tags_metadata = [
{
"name": "Ingestion",
"description": "High-level APIs covering document ingestion -internally "
"managing document parsing, splitting,"
"metadata extraction, embedding generation and storage- and ingested "
"documents CRUD."
"Each ingested document is identified by an ID that can be used to filter the "
"context"
"used in *Contextual Completions* and *Context Chunks* APIs.",
},
{
"name": "Contextual Completions",
"description": "High-level APIs covering contextual Chat and Completions. They "
"follow OpenAI's format, extending it to "
"allow using the context coming from ingested documents to create the "
"response. Internally"
"manage context retrieval, prompt engineering and the response generation.",
},
{
"name": "Context Chunks",
"description": "Low-level API that given a query return relevant chunks of "
"text coming from the ingested"
"documents.",
},
{
"name": "Embeddings",
"description": "Low-level API to obtain the vector representation of a given "
"text, using an Embeddings model."
"Follows OpenAI's embeddings API format.",
},
{
"name": "Health",
"description": "Simple health API to make sure the server is up and running.",
},
]
app = FastAPI()
def custom_openapi() -> dict[str, Any]:
if app.openapi_schema:
return app.openapi_schema
openapi_schema = get_openapi(
title="PrivateGPT",
description=description,
version="0.1.0",
summary="PrivateGPT is a production-ready AI project that allows you to "
"ask questions to your documents using the power of Large Language "
"Models (LLMs), even in scenarios without Internet connection. "
"100% private, no data leaves your execution environment at any point.",
contact={
"url": "https://github.com/imartinez/privateGPT",
},
license_info={
"name": "Apache 2.0",
"url": "https://www.apache.org/licenses/LICENSE-2.0.html",
},
routes=app.routes,
tags=tags_metadata,
)
openapi_schema["info"]["x-logo"] = {
"url": "https://lh3.googleusercontent.com/drive-viewer"
"/AK7aPaD_iNlMoTquOBsw4boh4tIYxyEuhz6EtEs8nzq3yNkNAK00xGj"
"E1KUCmPJSk3TYOjcs6tReG6w_cLu1S7L_gPgT9z52iw=s2560"
}
app.openapi_schema = openapi_schema
return app.openapi_schema
app.openapi = custom_openapi # type: ignore[method-assign]
app.include_router(completions_router)
app.include_router(chat_router)
app.include_router(chunks_router)
app.include_router(ingest_router)
app.include_router(embeddings_router)
app.include_router(health_router)
if settings.ui.enabled:
from private_gpt.ui.ui import mount_in_app
mount_in_app(app)

View File

@ -0,0 +1 @@
"""OpenAI compatibility utilities."""

View File

@ -0,0 +1 @@
"""OpenAI API extensions."""

View File

@ -0,0 +1,7 @@
from pydantic import BaseModel, Field
class ContextFilter(BaseModel):
docs_ids: list[str] | None = Field(
examples=[["c202d5e6-7b69-4869-81cc-dd574ee8ee11"]]
)

View File

@ -0,0 +1,103 @@
import time
import uuid
from collections.abc import Iterator
from llama_index.llms import ChatResponse, CompletionResponse
from pydantic import BaseModel, Field
class OpenAIDelta(BaseModel):
"""A piece of completion that needs to be concatenated to get the full message."""
content: str | None
class OpenAIMessage(BaseModel):
"""Inference result, with the source of the message.
Role could be the assistant or system
(providing a default response, not AI generated).
"""
role: str = Field(default="user", enum=["assistant", "system", "user"])
content: str | None
class OpenAIChoice(BaseModel):
"""Response from AI.
Either the delta or the message will be present, but never both.
"""
finish_reason: str | None = Field(examples=["stop"])
delta: OpenAIDelta | None = None
message: OpenAIMessage | None = None
index: int = 0
class OpenAICompletion(BaseModel):
"""Clone of OpenAI Completion model.
For more information see: https://platform.openai.com/docs/api-reference/chat/object
"""
id: str
object: str = Field("completion", enum=["completion", "completion.chunk"])
created: int = Field(..., examples=[1623340000])
model: str = Field(enum=["private-gpt"])
choices: list[OpenAIChoice]
@classmethod
def from_text(
cls, text: str | None, finish_reason: str | None = None
) -> "OpenAICompletion":
return OpenAICompletion(
id=str(uuid.uuid4()),
object="completion",
created=int(time.time()),
model="private-gpt",
choices=[
OpenAIChoice(
message=OpenAIMessage(role="assistant", content=text),
finish_reason=finish_reason,
)
],
)
@classmethod
def json_from_delta(
cls, *, text: str | None, finish_reason: str | None = None
) -> str:
chunk = OpenAICompletion(
id=str(uuid.uuid4()),
object="completion.chunk",
created=int(time.time()),
model="private-gpt",
choices=[
OpenAIChoice(
delta=OpenAIDelta(content=text),
finish_reason=finish_reason,
)
],
)
return chunk.model_dump_json()
def to_openai_response(response: str | ChatResponse) -> OpenAICompletion:
if isinstance(response, ChatResponse):
return OpenAICompletion.from_text(response.delta, finish_reason="stop")
else:
return OpenAICompletion.from_text(response, finish_reason="stop")
def to_openai_sse_stream(
response_generator: Iterator[str | CompletionResponse | ChatResponse],
) -> Iterator[str]:
for response in response_generator:
if isinstance(response, CompletionResponse | ChatResponse):
yield f"data: {OpenAICompletion.json_from_delta(text=response.delta)}\n\n"
else:
yield f"data: {OpenAICompletion.json_from_delta(text=response)}\n\n"
yield f"data: {OpenAICompletion.json_from_delta(text=None, finish_reason='stop')}\n\n"
yield "data: [DONE]\n\n"

16
private_gpt/paths.py Normal file
View File

@ -0,0 +1,16 @@
from pathlib import Path
from private_gpt.constants import PROJECT_ROOT_PATH
from private_gpt.settings.settings import settings
def _absolute_or_from_project_root(path: str) -> Path:
if path.startswith("/"):
return Path(path)
return PROJECT_ROOT_PATH / path
models_path: Path = PROJECT_ROOT_PATH / "models"
models_cache_path: Path = models_path / "cache"
docs_path: Path = PROJECT_ROOT_PATH / "docs"
local_data_path: Path = _absolute_or_from_project_root(settings.data.local_data_folder)

View File

@ -0,0 +1 @@
"""private-gpt server."""

View File

View File

@ -0,0 +1,82 @@
from fastapi import APIRouter
from llama_index.llms import ChatMessage, MessageRole
from pydantic import BaseModel
from starlette.responses import StreamingResponse
from private_gpt.di import root_injector
from private_gpt.open_ai.extensions.context_filter import ContextFilter
from private_gpt.open_ai.openai_models import (
OpenAICompletion,
OpenAIMessage,
to_openai_response,
to_openai_sse_stream,
)
from private_gpt.server.chat.chat_service import ChatService
chat_router = APIRouter(prefix="/v1")
class ChatBody(BaseModel):
messages: list[OpenAIMessage]
use_context: bool = False
context_filter: ContextFilter | None = None
stream: bool = False
model_config = {
"json_schema_extra": {
"examples": [
{
"messages": [
{
"role": "user",
"content": "How do you fry an egg?",
}
],
"stream": False,
"use_context": True,
"context_filter": {
"docs_ids": ["c202d5e6-7b69-4869-81cc-dd574ee8ee11"]
},
}
]
}
}
@chat_router.post(
"/chat/completions",
response_model=None,
responses={200: {"model": OpenAICompletion}},
tags=["Contextual Completions"],
)
def chat_completion(body: ChatBody) -> OpenAICompletion | StreamingResponse:
"""Given a list of messages comprising a conversation, return a response.
If `use_context` is set to `true`, the model will use context coming
from the ingested documents to create the response. The documents being used can
be filtered using the `context_filter` and passing the document IDs to be used.
Ingested documents IDs can be found using `/ingest/list` endpoint. If you want
all ingested documents to be used, remove `context_filter` altogether.
When using `'stream': true`, the API will return data chunks following [OpenAI's
streaming model](https://platform.openai.com/docs/api-reference/chat/streaming):
```
{"id":"12345","object":"completion.chunk","created":1694268190,
"model":"private-gpt","choices":[{"index":0,"delta":{"content":"Hello"},
"finish_reason":null}]}
```
"""
service = root_injector.get(ChatService)
all_messages = [
ChatMessage(content=m.content, role=MessageRole(m.role)) for m in body.messages
]
if body.stream:
stream = service.stream_chat(
all_messages, body.use_context, body.context_filter
)
return StreamingResponse(
to_openai_sse_stream(stream), media_type="text/event-stream"
)
else:
response = service.chat(all_messages, body.use_context, body.context_filter)
return to_openai_response(response)

View File

@ -0,0 +1,116 @@
from collections.abc import Sequence
from typing import TYPE_CHECKING, Any
from injector import inject, singleton
from llama_index import ServiceContext, StorageContext, VectorStoreIndex
from llama_index.chat_engine import ContextChatEngine
from llama_index.indices.postprocessor import MetadataReplacementPostProcessor
from llama_index.llm_predictor.utils import stream_chat_response_to_tokens
from llama_index.llms import ChatMessage
from llama_index.types import TokenGen
from private_gpt.components.embedding.embedding_component import EmbeddingComponent
from private_gpt.components.llm.llm_component import LLMComponent
from private_gpt.components.node_store.node_store_component import NodeStoreComponent
from private_gpt.components.vector_store.vector_store_component import (
VectorStoreComponent,
)
from private_gpt.open_ai.extensions.context_filter import ContextFilter
if TYPE_CHECKING:
from llama_index.chat_engine.types import (
AgentChatResponse,
StreamingAgentChatResponse,
)
@singleton
class ChatService:
@inject
def __init__(
self,
llm_component: LLMComponent,
vector_store_component: VectorStoreComponent,
embedding_component: EmbeddingComponent,
node_store_component: NodeStoreComponent,
) -> None:
self.llm_service = llm_component
self.vector_store_component = vector_store_component
self.storage_context = StorageContext.from_defaults(
vector_store=vector_store_component.vector_store,
docstore=node_store_component.doc_store,
index_store=node_store_component.index_store,
)
self.service_context = ServiceContext.from_defaults(
llm=llm_component.llm, embed_model=embedding_component.embedding_model
)
self.index = VectorStoreIndex.from_vector_store(
vector_store_component.vector_store,
storage_context=self.storage_context,
service_context=self.service_context,
show_progress=True,
)
def _chat_with_contex(
self,
message: str,
context_filter: ContextFilter | None = None,
chat_history: Sequence[ChatMessage] | None = None,
streaming: bool = False,
) -> Any:
vector_index_retriever = self.vector_store_component.get_retriever(
index=self.index, context_filter=context_filter
)
chat_engine = ContextChatEngine.from_defaults(
retriever=vector_index_retriever,
service_context=self.service_context,
node_postprocessors=[
MetadataReplacementPostProcessor(target_metadata_key="window"),
],
)
if streaming:
result = chat_engine.stream_chat(message, chat_history)
else:
result = chat_engine.chat(message, chat_history)
return result
def stream_chat(
self,
messages: list[ChatMessage],
use_context: bool = False,
context_filter: ContextFilter | None = None,
) -> TokenGen:
if use_context:
last_message = messages[-1].content
response: StreamingAgentChatResponse = self._chat_with_contex(
message=last_message if last_message is not None else "",
chat_history=messages[:-1],
context_filter=context_filter,
streaming=True,
)
response_gen = response.response_gen
else:
stream = self.llm_service.llm.stream_chat(messages)
response_gen = stream_chat_response_to_tokens(stream)
return response_gen
def chat(
self,
messages: list[ChatMessage],
use_context: bool = False,
context_filter: ContextFilter | None = None,
) -> str:
if use_context:
last_message = messages[-1].content
wrapped_response: AgentChatResponse = self._chat_with_contex(
message=last_message if last_message is not None else "",
chat_history=messages[:-1],
context_filter=context_filter,
streaming=False,
)
response = wrapped_response.response
else:
chat_response = self.llm_service.llm.chat(messages)
response_content = chat_response.message.content
response = response_content if response_content is not None else ""
return response

View File

View File

@ -0,0 +1,53 @@
from fastapi import APIRouter
from pydantic import BaseModel, Field
from private_gpt.di import root_injector
from private_gpt.open_ai.extensions.context_filter import ContextFilter
from private_gpt.server.chunks.chunks_service import Chunk, ChunksService
chunks_router = APIRouter(prefix="/v1")
class ChunksBody(BaseModel):
text: str = Field(examples=["Q3 2023 sales"])
context_filter: ContextFilter | None = None
limit: int = 10
prev_next_chunks: int = Field(default=0, examples=[2])
class ChunksResponse(BaseModel):
object: str = Field(enum=["list"])
model: str = Field(enum=["private-gpt"])
data: list[Chunk]
@chunks_router.post("/chunks", tags=["Context Chunks"])
def chunks_retrieval(body: ChunksBody) -> ChunksResponse:
"""Given a `text`, returns the most relevant chunks from the ingested documents.
The returned information can be used to generate prompts that can be
passed to `/completions` or `/chat/completions` APIs. Note: it is usually a very
fast API, because only the Embeddings model is involved, not the LLM. The
returned information contains the relevant chunk `text` together with the source
`document` it is coming from. It also contains a score that can be used to
compare different results.
The max number of chunks to be returned is set using the `limit` param.
Previous and next chunks (pieces of text that appear right before or after in the
document) can be fetched by using the `prev_next_chunks` field.
The documents being used can be filtered using the `context_filter` and passing
the document IDs to be used. Ingested documents IDs can be found using
`/ingest/list` endpoint. If you want all ingested documents to be used,
remove `context_filter` altogether.
"""
service = root_injector.get(ChunksService)
results = service.retrieve_relevant(
body.text, body.context_filter, body.limit, body.prev_next_chunks
)
return ChunksResponse(
object="list",
model="private-gpt",
data=results,
)

View File

@ -0,0 +1,119 @@
from typing import TYPE_CHECKING
from injector import inject, singleton
from llama_index import ServiceContext, StorageContext, VectorStoreIndex
from llama_index.schema import NodeWithScore
from pydantic import BaseModel, Field
from private_gpt.components.embedding.embedding_component import EmbeddingComponent
from private_gpt.components.llm.llm_component import LLMComponent
from private_gpt.components.node_store.node_store_component import NodeStoreComponent
from private_gpt.components.vector_store.vector_store_component import (
VectorStoreComponent,
)
from private_gpt.open_ai.extensions.context_filter import ContextFilter
from private_gpt.server.ingest.ingest_service import IngestedDoc
if TYPE_CHECKING:
from llama_index.schema import RelatedNodeInfo
class Chunk(BaseModel):
object: str = Field(enum=["context.chunk"])
score: float = Field(examples=[0.023])
document: IngestedDoc
text: str = Field(examples=["Outbound sales increased 20%, driven by new leads."])
previous_texts: list[str] | None = Field(
examples=[["SALES REPORT 2023", "Inbound didn't show major changes."]]
)
next_texts: list[str] | None = Field(
examples=[
[
"New leads came from Google Ads campaign.",
"The campaign was run by the Marketing Department",
]
]
)
@singleton
class ChunksService:
@inject
def __init__(
self,
llm_component: LLMComponent,
vector_store_component: VectorStoreComponent,
embedding_component: EmbeddingComponent,
node_store_component: NodeStoreComponent,
) -> None:
self.vector_store_component = vector_store_component
self.storage_context = StorageContext.from_defaults(
vector_store=vector_store_component.vector_store,
docstore=node_store_component.doc_store,
index_store=node_store_component.index_store,
)
self.query_service_context = ServiceContext.from_defaults(
llm=llm_component.llm, embed_model=embedding_component.embedding_model
)
def _get_sibling_nodes_text(
self, node_with_score: NodeWithScore, related_number: int, forward: bool = True
) -> list[str]:
explored_nodes_texts = []
current_node = node_with_score.node
for _ in range(related_number):
explored_node_info: RelatedNodeInfo | None = (
current_node.next_node if forward else current_node.prev_node
)
if explored_node_info is None:
break
explored_node = self.storage_context.docstore.get_node(
explored_node_info.node_id
)
explored_nodes_texts.append(explored_node.get_content())
current_node = explored_node
return explored_nodes_texts
def retrieve_relevant(
self,
text: str,
context_filter: ContextFilter | None = None,
limit: int = 10,
prev_next_chunks: int = 0,
) -> list[Chunk]:
index = VectorStoreIndex.from_vector_store(
self.vector_store_component.vector_store,
storage_context=self.storage_context,
service_context=self.query_service_context,
show_progress=True,
)
vector_index_retriever = self.vector_store_component.get_retriever(
index=index, context_filter=context_filter, similarity_top_k=limit
)
nodes = vector_index_retriever.retrieve(text)
nodes.sort(key=lambda n: n.score or 0.0, reverse=True)
retrieved_nodes = []
for node in nodes:
doc_id = node.node.ref_doc_id if node.node.ref_doc_id is not None else "-"
retrieved_nodes.append(
Chunk(
object="context.chunk",
score=node.score or 0.0,
document=IngestedDoc(
object="ingest.document",
doc_id=doc_id,
doc_metadata=node.metadata,
),
text=node.get_content(),
previous_texts=self._get_sibling_nodes_text(
node, prev_next_chunks, False
),
next_texts=self._get_sibling_nodes_text(node, prev_next_chunks),
)
)
return retrieved_nodes

View File

@ -0,0 +1 @@
"""Deprecated Openai compatibility endpoint."""

View File

@ -0,0 +1,66 @@
from fastapi import APIRouter
from pydantic import BaseModel
from starlette.responses import StreamingResponse
from private_gpt.open_ai.extensions.context_filter import ContextFilter
from private_gpt.open_ai.openai_models import (
OpenAICompletion,
OpenAIMessage,
)
from private_gpt.server.chat.chat_router import ChatBody, chat_completion
completions_router = APIRouter(prefix="/v1")
class CompletionsBody(BaseModel):
prompt: str
use_context: bool = False
context_filter: ContextFilter | None = None
stream: bool = False
model_config = {
"json_schema_extra": {
"examples": [
{
"prompt": "How do you fry an egg?",
"stream": False,
"use_context": False,
}
]
}
}
@completions_router.post(
"/completions",
response_model=None,
summary="Completion",
responses={200: {"model": OpenAICompletion}},
tags=["Contextual Completions"],
)
def prompt_completion(body: CompletionsBody) -> OpenAICompletion | StreamingResponse:
"""We recommend most users use our Chat completions API.
Given a prompt, the model will return one predicted completion. If `use_context`
is set to `true`, the model will use context coming from the ingested documents
to create the response. The documents being used can be filtered using the
`context_filter` and passing the document IDs to be used. Ingested documents IDs
can be found using `/ingest/list` endpoint. If you want all ingested documents to
be used, remove `context_filter` altogether.
When using `'stream': true`, the API will return data chunks following [OpenAI's
streaming model](https://platform.openai.com/docs/api-reference/chat/streaming):
```
{"id":"12345","object":"completion.chunk","created":1694268190,
"model":"private-gpt","choices":[{"index":0,"delta":{"content":"Hello"},
"finish_reason":null}]}
```
"""
message = OpenAIMessage(content=body.prompt, role="user")
chat_body = ChatBody(
messages=[message],
use_context=body.use_context,
stream=body.stream,
context_filter=body.context_filter,
)
return chat_completion(chat_body)

View File

@ -0,0 +1,33 @@
from fastapi import APIRouter
from pydantic import BaseModel, Field
from private_gpt.di import root_injector
from private_gpt.server.embeddings.embeddings_service import (
Embedding,
EmbeddingsService,
)
embeddings_router = APIRouter(prefix="/v1")
class EmbeddingsBody(BaseModel):
input: str | list[str]
class EmbeddingsResponse(BaseModel):
object: str = Field(enum=["list"])
model: str = Field(enum=["private-gpt"])
data: list[Embedding]
@embeddings_router.post("/embeddings", tags=["Embeddings"])
def embeddings_generation(body: EmbeddingsBody) -> EmbeddingsResponse:
"""Get a vector representation of a given input.
That vector representation can be easily consumed
by machine learning models and algorithms.
"""
service = root_injector.get(EmbeddingsService)
input_texts = body.input if isinstance(body.input, list) else [body.input]
embeddings = service.texts_embeddings(input_texts)
return EmbeddingsResponse(object="list", model="private-gpt", data=embeddings)

View File

@ -0,0 +1,28 @@
from injector import inject, singleton
from pydantic import BaseModel, Field
from private_gpt.components.embedding.embedding_component import EmbeddingComponent
class Embedding(BaseModel):
index: int
object: str = Field(enum=["embedding"])
embedding: list[float] = Field(examples=[[0.0023064255, -0.009327292]])
@singleton
class EmbeddingsService:
@inject
def __init__(self, embedding_component: EmbeddingComponent) -> None:
self.embedding_model = embedding_component.embedding_model
def texts_embeddings(self, texts: list[str]) -> list[Embedding]:
texts_embeddings = self.embedding_model.get_text_embedding_batch(texts)
return [
Embedding(
index=texts_embeddings.index(embedding),
object="embedding",
embedding=embedding,
)
for embedding in texts_embeddings
]

View File

View File

@ -0,0 +1,14 @@
from fastapi import APIRouter
from pydantic import BaseModel, Field
health_router = APIRouter()
class HealthResponse(BaseModel):
status: str = Field(enum=["ok"])
@health_router.get("/health", tags=["Health"])
def health() -> HealthResponse:
"""Return ok if the system is up."""
return HealthResponse(status="ok")

View File

View File

@ -0,0 +1,49 @@
from fastapi import APIRouter, HTTPException, UploadFile
from pydantic import BaseModel, Field
from private_gpt.di import root_injector
from private_gpt.server.ingest.ingest_service import IngestedDoc, IngestService
ingest_router = APIRouter(prefix="/v1")
class IngestResponse(BaseModel):
object: str = Field(enum=["list"])
model: str = Field(enum=["private-gpt"])
data: list[IngestedDoc]
@ingest_router.post("/ingest", tags=["Ingestion"])
def ingest(file: UploadFile) -> IngestResponse:
"""Ingests and processes a file, storing its chunks to be used as context.
The context obtained from files is later used in
`/chat/completions`, `/completions`, and `/chunks` APIs.
Most common document
formats are supported, but you may be prompted to install an extra dependency to
manage a specific file type.
A file can generate different Documents (for example a PDF generates one Document
per page). All Documents IDs are returned in the response, together with the
extracted Metadata (which is later used to improve context retrieval). Those IDs
can be used to filter the context used to create responses in
`/chat/completions`, `/completions`, and `/chunks` APIs.
"""
service = root_injector.get(IngestService)
if file.filename is None:
raise HTTPException(400, "No file name provided")
ingested_documents = service.ingest(file.filename, file.file.read())
return IngestResponse(object="list", model="private-gpt", data=ingested_documents)
@ingest_router.get("/ingest/list", tags=["Ingestion"])
def list_ingested() -> IngestResponse:
"""Lists already ingested Documents including their Document ID and metadata.
Those IDs can be used to filter the context used to create responses
in `/chat/completions`, `/completions`, and `/chunks` APIs.
"""
service = root_injector.get(IngestService)
ingested_documents = service.list_ingested()
return IngestResponse(object="list", model="private-gpt", data=ingested_documents)

View File

@ -0,0 +1,159 @@
import tempfile
from pathlib import Path
from typing import TYPE_CHECKING, Any, AnyStr
from injector import inject, singleton
from llama_index import (
Document,
ServiceContext,
StorageContext,
StringIterableReader,
VectorStoreIndex,
)
from llama_index.node_parser import SentenceWindowNodeParser
from llama_index.readers.file.base import DEFAULT_FILE_READER_CLS
from pydantic import BaseModel, Field
from private_gpt.components.embedding.embedding_component import EmbeddingComponent
from private_gpt.components.llm.llm_component import LLMComponent
from private_gpt.components.node_store.node_store_component import NodeStoreComponent
from private_gpt.components.vector_store.vector_store_component import (
VectorStoreComponent,
)
from private_gpt.paths import local_data_path
if TYPE_CHECKING:
from llama_index.readers.base import BaseReader
class IngestedDoc(BaseModel):
object: str = Field(enum=["ingest.document"])
doc_id: str = Field(examples=["c202d5e6-7b69-4869-81cc-dd574ee8ee11"])
doc_metadata: dict[str, Any] | None = Field(
examples=[
{
"page_label": "2",
"file_name": "Sales Report Q3 2023.pdf",
}
]
)
@staticmethod
def curate_metadata(metadata: dict[str, Any]) -> dict[str, Any]:
"""Remove unwanted metadata keys."""
metadata.pop("doc_id", None)
metadata.pop("window", None)
metadata.pop("original_text", None)
return metadata
@singleton
class IngestService:
@inject
def __init__(
self,
llm_component: LLMComponent,
vector_store_component: VectorStoreComponent,
embedding_component: EmbeddingComponent,
node_store_component: NodeStoreComponent,
) -> None:
self.llm_service = llm_component
self.storage_context = StorageContext.from_defaults(
vector_store=vector_store_component.vector_store,
docstore=node_store_component.doc_store,
index_store=node_store_component.index_store,
)
self.ingest_service_context = ServiceContext.from_defaults(
llm=self.llm_service.llm,
embed_model=embedding_component.embedding_model,
node_parser=SentenceWindowNodeParser.from_defaults(),
)
def ingest(self, file_name: str, file_data: AnyStr | Path) -> list[IngestedDoc]:
extension = Path(file_name).suffix
reader_cls = DEFAULT_FILE_READER_CLS.get(extension)
documents: list[Document]
if reader_cls is None:
# Read as a plain text
string_reader = StringIterableReader()
if isinstance(file_data, Path):
text = file_data.read_text()
documents = string_reader.load_data([text])
elif isinstance(file_data, bytes):
documents = string_reader.load_data([file_data.decode("utf-8")])
elif isinstance(file_data, str):
documents = string_reader.load_data([file_data])
else:
raise ValueError(f"Unsupported data type {type(file_data)}")
else:
reader: BaseReader = reader_cls()
if isinstance(file_data, Path):
# Already a path, nothing to do
documents = reader.load_data(file_data)
else:
# llama-index mainly supports reading from files, so
# we have to create a tmp file to read for it to work
with tempfile.NamedTemporaryFile() as tmp:
path_to_tmp = Path(tmp.name)
if isinstance(file_data, bytes):
path_to_tmp.write_bytes(file_data)
else:
path_to_tmp.write_text(str(file_data))
documents = reader.load_data(path_to_tmp)
for document in documents:
document.metadata["file_name"] = file_name
return self._save_docs(documents)
def _save_docs(self, documents: list[Document]) -> list[IngestedDoc]:
for document in documents:
document.metadata["doc_id"] = document.doc_id
# We don't want the Embeddings search to receive this metadata
document.excluded_embed_metadata_keys = ["doc_id"]
# We don't want the LLM to receive these metadata in the context
document.excluded_llm_metadata_keys = ["file_name", "doc_id", "page_label"]
# create vectorStore index
VectorStoreIndex.from_documents(
documents,
storage_context=self.storage_context,
service_context=self.ingest_service_context,
store_nodes_override=True, # Force store nodes in index and document stores
show_progress=True,
)
# persist the index and nodes
self.storage_context.persist(persist_dir=local_data_path)
return [
IngestedDoc(
object="ingest.document",
doc_id=document.doc_id,
doc_metadata=IngestedDoc.curate_metadata(document.metadata),
)
for document in documents
]
def list_ingested(self) -> list[IngestedDoc]:
ingested_docs = []
try:
docstore = self.storage_context.docstore
ingested_docs_ids: set[str] = set()
for node in docstore.docs.values():
if node.ref_doc_id is not None:
ingested_docs_ids.add(node.ref_doc_id)
for doc_id in ingested_docs_ids:
ref_doc_info = docstore.get_ref_doc_info(ref_doc_id=doc_id)
doc_metadata = None
if ref_doc_info is not None and ref_doc_info.metadata is not None:
doc_metadata = IngestedDoc.curate_metadata(ref_doc_info.metadata)
ingested_docs.append(
IngestedDoc(
object="ingest.document",
doc_id=doc_id,
doc_metadata=doc_metadata,
)
)
return ingested_docs
except ValueError:
pass
return ingested_docs

View File

@ -0,0 +1,46 @@
from collections.abc import Callable
from pathlib import Path
from typing import Any
from watchdog.events import (
DirCreatedEvent,
DirModifiedEvent,
FileCreatedEvent,
FileModifiedEvent,
FileSystemEventHandler,
)
from watchdog.observers import Observer
class IngestWatcher:
def __init__(
self, watch_path: Path, on_file_changed: Callable[[Path], None]
) -> None:
self.watch_path = watch_path
self.on_file_changed = on_file_changed
class Handler(FileSystemEventHandler):
def on_modified(self, event: DirModifiedEvent | FileModifiedEvent) -> None:
if isinstance(event, FileModifiedEvent):
on_file_changed(Path(event.src_path))
def on_created(self, event: DirCreatedEvent | FileCreatedEvent) -> None:
if isinstance(event, FileCreatedEvent):
on_file_changed(Path(event.src_path))
event_handler = Handler()
observer: Any = Observer()
self._observer = observer
self._observer.schedule(event_handler, str(watch_path), recursive=True)
def start(self) -> None:
self._observer.start()
while self._observer.is_alive():
try:
self._observer.join(1)
except KeyboardInterrupt:
break
def stop(self) -> None:
self._observer.stop()
self._observer.join()

View File

@ -0,0 +1 @@
"""Settings."""

View File

@ -0,0 +1,53 @@
from pydantic import BaseModel, Field
from private_gpt.settings.settings_loader import load_active_profiles
class ServerSettings(BaseModel):
env_name: str = Field(
description="Name of the environment (prod, staging, local...)"
)
port: int = Field("Port of PrivateGPT FastAPI server, defaults to 8001")
class DataSettings(BaseModel):
local_data_folder: str = Field(
description="Path to local storage."
"It will be treated as an absolute path if it starts with /"
)
class LLMSettings(BaseModel):
mode: str = Field(enum=["local", "open_ai", "sagemaker", "mock"])
class LocalSettings(BaseModel):
llm_hf_repo_id: str
llm_hf_model_file: str
embedding_hf_model_name: str
class SagemakerSettings(BaseModel):
endpoint_name: str
class OpenAISettings(BaseModel):
api_key: str
class UISettings(BaseModel):
enabled: bool
path: str
class Settings(BaseModel):
server: ServerSettings
data: DataSettings
ui: UISettings
llm: LLMSettings
local: LocalSettings
sagemaker: SagemakerSettings
openai: OpenAISettings
settings = Settings(**load_active_profiles())

View File

@ -0,0 +1,47 @@
import functools
import os
import sys
from pathlib import Path
from typing import Any
from pydantic.v1.utils import deep_update, unique_list
from private_gpt.constants import PROJECT_ROOT_PATH
from private_gpt.settings.yaml import load_yaml_with_envvars
_settings_folder = os.environ.get("PGPT_SETTINGS_FOLDER", PROJECT_ROOT_PATH)
# if running in unittest, use the test profile
_test_profile = ["test"] if "unittest" in sys.modules else []
active_profiles: list[str] = unique_list(
["default"]
+ [
item.strip()
for item in os.environ.get("PGPT_PROFILES", "").split(",")
if item.strip()
]
+ _test_profile
)
def load_profile(profile: str) -> dict[str, Any]:
if profile == "default":
profile_file_name = "settings.yaml"
else:
profile_file_name = f"settings-{profile}.yaml"
path = Path(_settings_folder) / profile_file_name
with Path(path).open("r") as f:
config = load_yaml_with_envvars(f)
if not isinstance(config, dict):
raise TypeError(f"Config file has no top-level mapping: {path}")
return config
def load_active_profiles() -> dict[str, Any]:
"""Load active profiles and merge them."""
print(f"Starting application with profiles: {active_profiles}")
loaded_profiles = [load_profile(profile) for profile in active_profiles]
merged: dict[str, Any] = functools.reduce(deep_update, loaded_profiles, {})
return merged

View File

@ -0,0 +1,41 @@
import os
import re
import typing
from typing import Any, TextIO
from yaml import SafeLoader
_env_replace_matcher = re.compile(r"\$\{(\w|_)+(:(\w|_)*)?}")
@typing.no_type_check # pyaml does not have good hints, everything is Any
def load_yaml_with_envvars(
stream: TextIO, environ: dict[str, Any] = os.environ
) -> dict[str, Any]:
"""Load yaml file with environment variable expansion.
The pattern ${VAR} or ${VAR:default} will be replaced with
the value of the environment variable.
"""
loader = SafeLoader(stream)
def load_env_var(_, node) -> str:
"""Extract the matched value, expand env variable, and replace the match."""
value = str(node.value).removeprefix("${").removesuffix("}")
split = value.split(":")
env_var = split[0]
value = environ.get(env_var)
default = None if len(split) == 1 else split[1]
if value is None and default is None:
raise ValueError(
f"Environment variable {env_var} is not set and not default was provided"
)
return value or default
loader.add_implicit_resolver("env_var_replacer", _env_replace_matcher, None)
loader.add_constructor("env_var_replacer", load_env_var)
try:
return loader.get_single_data()
finally:
loader.dispose()

View File

@ -0,0 +1 @@
"""Gradio based UI."""

1
private_gpt/ui/images.py Normal file
View File

@ -0,0 +1 @@
logo_svg = ""

166
private_gpt/ui/ui.py Normal file
View File

@ -0,0 +1,166 @@
import itertools
import json
from collections.abc import Iterable
from pathlib import Path
from typing import Any, TextIO
import gradio as gr # type: ignore
from fastapi import FastAPI
from gradio.themes.utils.colors import slate # type: ignore
from llama_index.llms import ChatMessage, ChatResponse, MessageRole
from private_gpt.di import root_injector
from private_gpt.server.chat.chat_service import ChatService
from private_gpt.server.chunks.chunks_service import ChunksService
from private_gpt.server.ingest.ingest_service import IngestService
from private_gpt.settings.settings import settings
from private_gpt.ui.images import logo_svg
ingest_service = root_injector.get(IngestService)
chat_service = root_injector.get(ChatService)
chunks_service = root_injector.get(ChunksService)
def _chat(message: str, history: list[list[str]], mode: str, *_: Any) -> Any:
def yield_deltas(stream: Iterable[ChatResponse | str]) -> Iterable[str]:
full_response: str = ""
for delta in stream:
if isinstance(delta, str):
full_response += str(delta)
elif isinstance(delta, ChatResponse):
full_response += delta.delta or ""
yield full_response
def build_history() -> list[ChatMessage]:
history_messages: list[ChatMessage] = list(
itertools.chain(
*[
[
ChatMessage(content=interaction[0], role=MessageRole.USER),
ChatMessage(content=interaction[1], role=MessageRole.ASSISTANT),
]
for interaction in history
]
)
)
# max 20 messages to try to avoid context overflow
return history_messages[:20]
new_message = ChatMessage(content=message, role=MessageRole.USER)
all_messages = [*build_history(), new_message]
match mode:
case "Query Documents":
query_stream = chat_service.stream_chat(
messages=all_messages,
use_context=True,
)
yield from yield_deltas(query_stream)
case "LLM Chat":
llm_stream = chat_service.stream_chat(
messages=all_messages,
use_context=False,
)
yield from yield_deltas(llm_stream)
case "Context Chunks":
response = chunks_service.retrieve_relevant(
text=message,
limit=2,
prev_next_chunks=1,
).__iter__()
yield "```" + json.dumps(
[node.__dict__ for node in response],
default=lambda o: o.__dict__,
indent=2,
)
def _list_ingested_files() -> list[str]:
files = set()
for ingested_document in ingest_service.list_ingested():
if ingested_document.doc_metadata is not None:
files.add(
ingested_document.doc_metadata.get("file_name") or "[FILE NAME MISSING]"
)
return list(files)
# Global state
_uploaded_file_list = [[row] for row in _list_ingested_files()]
def _upload_file(file: TextIO) -> list[list[str]]:
path = Path(file.name)
ingest_service.ingest(file_name=path.name, file_data=path)
_uploaded_file_list.append([path.name])
return _uploaded_file_list
with gr.Blocks(
theme=gr.themes.Soft(primary_hue=slate),
css=".logo { "
"display:flex;"
"background-color: #C7BAFF;"
"height: 80px;"
"border-radius: 8px;"
"align-content: center;"
"justify-content: center;"
"align-items: center;"
"}"
".logo img { height: 25% }",
) as blocks:
with gr.Blocks(), gr.Row():
gr.HTML(f"<div class='logo'/><img src={logo_svg} alt=PrivateGPT></div")
with gr.Row():
with gr.Column(scale=3, variant="compact"):
mode = gr.Radio(
["Query Documents", "LLM Chat", "Context Chunks"],
label="Mode",
value="Query Documents",
)
upload_button = gr.components.UploadButton(
"Upload a File",
type="file",
file_count="single",
size="sm",
)
ingested_dataset = gr.List(
_uploaded_file_list,
headers=["File name"],
label="Ingested Files",
interactive=False,
render=False, # Rendered under the button
)
upload_button.upload(
_upload_file, inputs=upload_button, outputs=ingested_dataset
)
ingested_dataset.render()
with gr.Column(scale=7):
chatbot = gr.ChatInterface(
_chat,
chatbot=gr.Chatbot(
label="Chat",
show_copy_button=True,
render=False,
avatar_images=(
None,
"https://lh3.googleusercontent.com/drive-viewer/AK7aPa"
"AicXck0k68nsscyfKrb18o9ak3BSaWM_Qzm338cKoQlw72Bp0UKN84"
"IFZjXjZApY01mtnUXDeL4qzwhkALoe_53AhwCg=s2560",
),
),
additional_inputs=[mode, upload_button],
)
def mount_in_app(app: FastAPI) -> None:
blocks.queue()
gr.mount_gradio_app(app, blocks, path=settings.ui.path)
if __name__ == "__main__":
blocks.queue()
blocks.launch(debug=False, show_api=False)

View File

@ -0,0 +1 @@
"""general utils."""

View File

@ -0,0 +1,5 @@
from typing import TypeVar
T = TypeVar("T")
K = TypeVar("K")
V = TypeVar("V")

View File

@ -1,29 +1,141 @@
[tool.poetry]
name = "privategpt"
name = "private-gpt"
version = "0.1.0"
description = ""
authors = ["Ivan Martinez <ivanmartit@gmail.com>"]
license = "Apache Version 2.0"
readme = "README.md"
description = "Private GPT"
authors = ["Zylon <hi@zylon.ai>"]
[tool.poetry.dependencies]
python = "^3.10"
langchain = "0.0.274"
gpt4all = "1.0.8"
chromadb = "0.4.12"
llama-cpp-python = "0.1.81"
urllib3 = "2.0.4"
PyMuPDF = "1.23.1"
python-dotenv = "^1.0.0"
unstructured = "0.10.8"
extract-msg = "0.45.0"
tabulate = "^0.9.0"
pandoc = "^2.3"
pypandoc = "^1.11"
tqdm = "4.66.1"
sentence-transformers = "2.2.2"
python = ">=3.11,<3.13"
fastapi = { extras = ["all"], version = "^0.103.1" }
loguru = "^0.7.2"
boto3 = "^1.28.56"
injector = "^0.21.0"
pyyaml = "^6.0.1"
python-multipart = "^0.0.6"
pypdf = "^3.16.2"
llama-index = "v0.8.35"
chromadb = "^0.4.13"
watchdog = "^3.0.0"
transformers = "^4.34.0"
torch = "^2.1.0"
[tool.poetry.group.dev.dependencies]
black = "^22"
mypy = "^1.2"
pre-commit = "^2"
pytest = "^7"
pytest-cov = "^3"
ruff = "^0"
pytest-asyncio = "^0.21.1"
types-pyyaml = "^6.0.12.12"
# Dependencies for gradio UI
[tool.poetry.group.ui]
optional = true
[tool.poetry.group.ui.dependencies]
gradio = "^3.45.2"
[tool.poetry.group.local]
optional = true
[tool.poetry.group.local.dependencies]
sentence-transformers = "^2.2.2"
numpy = "1.26.0"
llama-cpp-python = "^0.2.11"
[build-system]
requires = ["poetry-core"]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
# Packages configs
## coverage
[tool.coverage.run]
branch = true
[tool.coverage.report]
skip_empty = true
precision = 2
## black
[tool.black]
target-version = ['py311']
## ruff
# Recommended ruff config for now, to be updated as we go along.
[tool.ruff]
target-version = 'py311'
# See all rules at https://beta.ruff.rs/docs/rules/
select = [
"E", # pycodestyle
"W", # pycodestyle
"F", # Pyflakes
"B", # flake8-bugbear
"C4", # flake8-comprehensions
"D", # pydocstyle
"I", # isort
"SIM", # flake8-simplify
"TCH", # flake8-type-checking
"TID", # flake8-tidy-imports
"Q", # flake8-quotes
"UP", # pyupgrade
"PT", # flake8-pytest-style
"RUF", # Ruff-specific rules
]
ignore = [
"E501", # "Line too long"
# -> line length already regulated by black
"PT011", # "pytest.raises() should specify expected exception"
# -> would imply to update tests every time you update exception message
"SIM102", # "Use a single `if` statement instead of nested `if` statements"
# -> too restrictive,
"D100",
"D101",
"D102",
"D103",
"D104",
"D105",
"D106",
"D107"
# -> "Missing docstring in public function too restrictive"
]
[tool.ruff.pydocstyle]
# Automatically disable rules that are incompatible with Google docstring convention
convention = "google"
[tool.ruff.pycodestyle]
max-doc-length = 88
[tool.ruff.flake8-tidy-imports]
ban-relative-imports = "all"
[tool.ruff.flake8-type-checking]
strict = true
runtime-evaluated-base-classes = ["pydantic.BaseModel"]
# Pydantic needs to be able to evaluate types at runtime
# see https://pypi.org/project/flake8-type-checking/ for flake8-type-checking documentation
# see https://beta.ruff.rs/docs/settings/#flake8-type-checking-runtime-evaluated-base-classes for ruff documentation
[tool.ruff.per-file-ignores]
# Allow missing docstrings for tests
"tests/**/*.py" = ["D1"]
## mypy
[tool.mypy]
python_version = "3.11"
strict = true
check_untyped_defs = false
explicit_package_bases = true
exclude = ["tests"]
[tool.pytest.ini_options]
asyncio_mode = "auto"
testpaths = ["tests"]
addopts = [
"--import-mode=importlib",
]

View File

@ -1,14 +0,0 @@
langchain==0.0.274
gpt4all==1.0.8
chromadb==0.4.12
llama-cpp-python==0.1.81
urllib3==2.0.4
PyMuPDF==1.23.1
python-dotenv==1.0.0
unstructured==0.10.8
extract-msg==0.45.0
tabulate==0.9.0
pandoc==2.3
pypandoc==1.11
tqdm==4.66.1
sentence_transformers==2.2.2

1
scripts/__init__.py Normal file
View File

@ -0,0 +1 @@
"""PrivateGPT scripts."""

View File

@ -0,0 +1,33 @@
import argparse
import json
import sys
import yaml
from uvicorn.importer import import_from_string
parser = argparse.ArgumentParser(prog="extract_openapi.py")
parser.add_argument("app", help='App import string. Eg. "main:app"', default="main:app")
parser.add_argument("--app-dir", help="Directory containing the app", default=None)
parser.add_argument(
"--out", help="Output file ending in .json or .yaml", default="openapi.yaml"
)
if __name__ == "__main__":
args = parser.parse_args()
if args.app_dir is not None:
print(f"adding {args.app_dir} to sys.path")
sys.path.insert(0, args.app_dir)
print(f"importing app from {args.app}")
app = import_from_string(args.app)
openapi = app.openapi()
version = openapi.get("openapi", "unknown version")
print(f"writing openapi spec v{version}")
with open(args.out, "w") as f:
if args.out.endswith(".json"):
json.dump(openapi, f, indent=2)
else:
yaml.dump(openapi, f, sort_keys=False)
print(f"spec written to {args.out}")

45
scripts/ingest_folder.py Normal file
View File

@ -0,0 +1,45 @@
import argparse
import sys
from pathlib import Path
from private_gpt.di import root_injector
from private_gpt.server.ingest.ingest_service import IngestService
from private_gpt.server.ingest.ingest_watcher import IngestWatcher
ingest_service = root_injector.get(IngestService)
parser = argparse.ArgumentParser(prog="ingest_folder.py")
parser.add_argument("folder", help="Folder to ingest")
parser.add_argument(
"--watch",
help="Watch for changes",
action=argparse.BooleanOptionalAction,
default=False,
)
args = parser.parse_args()
def _recursive_ingest_folder(folder_path: Path) -> None:
for file_path in folder_path.iterdir():
if file_path.is_file():
_do_ingest(file_path)
elif file_path.is_dir():
_recursive_ingest_folder(file_path)
def _do_ingest(changed_path: Path) -> None:
if changed_path.exists():
print(f"\nIngesting {changed_path}")
ingest_service.ingest(changed_path.name, changed_path)
path = Path(args.folder)
if not path.exists():
raise ValueError(f"Path {args.folder} does not exist")
_recursive_ingest_folder(path)
if args.watch:
print(f"Watching {args.folder} for changes, press Ctrl+C to stop...")
watcher = IngestWatcher(args.folder, _do_ingest)
watcher.start()

30
scripts/setup Executable file
View File

@ -0,0 +1,30 @@
#!/usr/bin/env python3
import os
from huggingface_hub import hf_hub_download, snapshot_download
from private_gpt.paths import models_path, models_cache_path
from private_gpt.settings.settings import settings
os.makedirs(models_path, exist_ok=True)
embedding_path = models_path / "embedding"
print(f"Downloading embedding {settings.local.embedding_hf_model_name}")
snapshot_download(
repo_id=settings.local.embedding_hf_model_name,
cache_dir=models_cache_path,
local_dir=embedding_path,
)
print("Embedding model downloaded!")
print("Downloading models for local execution...")
# Download LLM and create a symlink to the model file
hf_hub_download(
repo_id=settings.local.llm_hf_repo_id,
filename=settings.local.llm_hf_model_file,
cache_dir=models_cache_path,
local_dir=models_path,
)
print("LLM model downloaded!")
print("Setup done")

15
settings-docker.yaml Normal file
View File

@ -0,0 +1,15 @@
server:
env_name: ${APP_ENV:prod}
port: ${PORT:8080}
llm:
mode: local
local:
llm_hf_repo_id: TheBloke/Mistral-7B-Instruct-v0.1-GGUF
llm_hf_model_file: mistral-7b-instruct-v0.1.Q4_K_M.gguf
embedding_hf_model_name: BAAI/bge-small-en-v1.5
ui:
enabled: true
path: /

5
settings-local.yaml Normal file
View File

@ -0,0 +1,5 @@
server:
env_name: ${APP_ENV:local}
llm:
mode: local

8
settings-test.yaml Normal file
View File

@ -0,0 +1,8 @@
server:
env_name: test
data:
local_data_folder: local_data/tests
llm:
mode: mock

24
settings.yaml Normal file
View File

@ -0,0 +1,24 @@
server:
env_name: ${APP_ENV:prod}
port: ${PORT:8001}
data:
local_data_folder: local_data/private_gpt
ui:
enabled: true
path: /
llm:
mode: mock
local:
llm_hf_repo_id: TheBloke/Mistral-7B-Instruct-v0.1-GGUF
llm_hf_model_file: mistral-7b-instruct-v0.1.Q4_K_M.gguf
embedding_hf_model_name: BAAI/bge-small-en-v1.5
sagemaker:
endpoint_name: huggingface-pytorch-tgi-inference-2023-09-25-19-53-32-140
openai:
api_key: ${OPENAI_API_KEY:}

View File

@ -1,723 +0,0 @@
Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.
Last year COVID-19 kept us apart. This year we are finally together again.
Tonight, we meet as Democrats Republicans and Independents. But most importantly as Americans.
With a duty to one another to the American people to the Constitution.
And with an unwavering resolve that freedom will always triumph over tyranny.
Six days ago, Russias Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated.
He thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined.
He met the Ukrainian people.
From President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.
Groups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland.
In this struggle as President Zelenskyy said in his speech to the European Parliament “Light will win over darkness.” The Ukrainian Ambassador to the United States is here tonight.
Let each of us here tonight in this Chamber send an unmistakable signal to Ukraine and to the world.
Please rise if you are able and show that, Yes, we the United States of America stand with the Ukrainian people.
Throughout our history weve learned this lesson when dictators do not pay a price for their aggression they cause more chaos.
They keep moving.
And the costs and the threats to America and the world keep rising.
Thats why the NATO Alliance was created to secure peace and stability in Europe after World War 2.
The United States is a member along with 29 other nations.
It matters. American diplomacy matters. American resolve matters.
Putins latest attack on Ukraine was premeditated and unprovoked.
He rejected repeated efforts at diplomacy.
He thought the West and NATO wouldnt respond. And he thought he could divide us at home. Putin was wrong. We were ready. Here is what we did.
We prepared extensively and carefully.
We spent months building a coalition of other freedom-loving nations from Europe and the Americas to Asia and Africa to confront Putin.
I spent countless hours unifying our European allies. We shared with the world in advance what we knew Putin was planning and precisely how he would try to falsely justify his aggression.
We countered Russias lies with truth.
And now that he has acted the free world is holding him accountable.
Along with twenty-seven members of the European Union including France, Germany, Italy, as well as countries like the United Kingdom, Canada, Japan, Korea, Australia, New Zealand, and many others, even Switzerland.
We are inflicting pain on Russia and supporting the people of Ukraine. Putin is now isolated from the world more than ever.
Together with our allies we are right now enforcing powerful economic sanctions.
We are cutting off Russias largest banks from the international financial system.
Preventing Russias central bank from defending the Russian Ruble making Putins $630 Billion “war fund” worthless.
We are choking off Russias access to technology that will sap its economic strength and weaken its military for years to come.
Tonight I say to the Russian oligarchs and corrupt leaders who have bilked billions of dollars off this violent regime no more.
The U.S. Department of Justice is assembling a dedicated task force to go after the crimes of Russian oligarchs.
We are joining with our European allies to find and seize your yachts your luxury apartments your private jets. We are coming for your ill-begotten gains.
And tonight I am announcing that we will join our allies in closing off American air space to all Russian flights further isolating Russia and adding an additional squeeze on their economy. The Ruble has lost 30% of its value.
The Russian stock market has lost 40% of its value and trading remains suspended. Russias economy is reeling and Putin alone is to blame.
Together with our allies we are providing support to the Ukrainians in their fight for freedom. Military assistance. Economic assistance. Humanitarian assistance.
We are giving more than $1 Billion in direct assistance to Ukraine.
And we will continue to aid the Ukrainian people as they defend their country and to help ease their suffering.
Let me be clear, our forces are not engaged and will not engage in conflict with Russian forces in Ukraine.
Our forces are not going to Europe to fight in Ukraine, but to defend our NATO Allies in the event that Putin decides to keep moving west.
For that purpose weve mobilized American ground forces, air squadrons, and ship deployments to protect NATO countries including Poland, Romania, Latvia, Lithuania, and Estonia.
As I have made crystal clear the United States and our Allies will defend every inch of territory of NATO countries with the full force of our collective power.
And we remain clear-eyed. The Ukrainians are fighting back with pure courage. But the next few days weeks, months, will be hard on them.
Putin has unleashed violence and chaos. But while he may make gains on the battlefield he will pay a continuing high price over the long run.
And a proud Ukrainian people, who have known 30 years of independence, have repeatedly shown that they will not tolerate anyone who tries to take their country backwards.
To all Americans, I will be honest with you, as Ive always promised. A Russian dictator, invading a foreign country, has costs around the world.
And Im taking robust action to make sure the pain of our sanctions is targeted at Russias economy. And I will use every tool at our disposal to protect American businesses and consumers.
Tonight, I can announce that the United States has worked with 30 other countries to release 60 Million barrels of oil from reserves around the world.
America will lead that effort, releasing 30 Million barrels from our own Strategic Petroleum Reserve. And we stand ready to do more if necessary, unified with our allies.
These steps will help blunt gas prices here at home. And I know the news about whats happening can seem alarming.
But I want you to know that we are going to be okay.
When the history of this era is written Putins war on Ukraine will have left Russia weaker and the rest of the world stronger.
While it shouldnt have taken something so terrible for people around the world to see whats at stake now everyone sees it clearly.
We see the unity among leaders of nations and a more unified Europe a more unified West. And we see unity among the people who are gathering in cities in large crowds around the world even in Russia to demonstrate their support for Ukraine.
In the battle between democracy and autocracy, democracies are rising to the moment, and the world is clearly choosing the side of peace and security.
This is a real test. Its going to take time. So let us continue to draw inspiration from the iron will of the Ukrainian people.
To our fellow Ukrainian Americans who forge a deep bond that connects our two nations we stand with you.
Putin may circle Kyiv with tanks, but he will never gain the hearts and souls of the Ukrainian people.
He will never extinguish their love of freedom. He will never weaken the resolve of the free world.
We meet tonight in an America that has lived through two of the hardest years this nation has ever faced.
The pandemic has been punishing.
And so many families are living paycheck to paycheck, struggling to keep up with the rising cost of food, gas, housing, and so much more.
I understand.
I remember when my Dad had to leave our home in Scranton, Pennsylvania to find work. I grew up in a family where if the price of food went up, you felt it.
Thats why one of the first things I did as President was fight to pass the American Rescue Plan.
Because people were hurting. We needed to act, and we did.
Few pieces of legislation have done more in a critical moment in our history to lift us out of crisis.
It fueled our efforts to vaccinate the nation and combat COVID-19. It delivered immediate economic relief for tens of millions of Americans.
Helped put food on their table, keep a roof over their heads, and cut the cost of health insurance.
And as my Dad used to say, it gave people a little breathing room.
And unlike the $2 Trillion tax cut passed in the previous administration that benefitted the top 1% of Americans, the American Rescue Plan helped working people—and left no one behind.
And it worked. It created jobs. Lots of jobs.
In fact—our economy created over 6.5 Million new jobs just last year, more jobs created in one year
than ever before in the history of America.
Our economy grew at a rate of 5.7% last year, the strongest growth in nearly 40 years, the first step in bringing fundamental change to an economy that hasnt worked for the working people of this nation for too long.
For the past 40 years we were told that if we gave tax breaks to those at the very top, the benefits would trickle down to everyone else.
But that trickle-down theory led to weaker economic growth, lower wages, bigger deficits, and the widest gap between those at the top and everyone else in nearly a century.
Vice President Harris and I ran for office with a new economic vision for America.
Invest in America. Educate Americans. Grow the workforce. Build the economy from the bottom up
and the middle out, not from the top down.
Because we know that when the middle class grows, the poor have a ladder up and the wealthy do very well.
America used to have the best roads, bridges, and airports on Earth.
Now our infrastructure is ranked 13th in the world.
We wont be able to compete for the jobs of the 21st Century if we dont fix that.
Thats why it was so important to pass the Bipartisan Infrastructure Law—the most sweeping investment to rebuild America in history.
This was a bipartisan effort, and I want to thank the members of both parties who worked to make it happen.
Were done talking about infrastructure weeks.
Were going to have an infrastructure decade.
It is going to transform America and put us on a path to win the economic competition of the 21st Century that we face with the rest of the world—particularly with China.
As Ive told Xi Jinping, it is never a good bet to bet against the American people.
Well create good jobs for millions of Americans, modernizing roads, airports, ports, and waterways all across America.
And well do it all to withstand the devastating effects of the climate crisis and promote environmental justice.
Well build a national network of 500,000 electric vehicle charging stations, begin to replace poisonous lead pipes—so every child—and every American—has clean water to drink at home and at school, provide affordable high-speed internet for every American—urban, suburban, rural, and tribal communities.
4,000 projects have already been announced.
And tonight, Im announcing that this year we will start fixing over 65,000 miles of highway and 1,500 bridges in disrepair.
When we use taxpayer dollars to rebuild America we are going to Buy American: buy American products to support American jobs.
The federal government spends about $600 Billion a year to keep the country safe and secure.
Theres been a law on the books for almost a century
to make sure taxpayers dollars support American jobs and businesses.
Every Administration says theyll do it, but we are actually doing it.
We will buy American to make sure everything from the deck of an aircraft carrier to the steel on highway guardrails are made in America.
But to compete for the best jobs of the future, we also need to level the playing field with China and other competitors.
Thats why it is so important to pass the Bipartisan Innovation Act sitting in Congress that will make record investments in emerging technologies and American manufacturing.
Let me give you one example of why its so important to pass it.
If you travel 20 miles east of Columbus, Ohio, youll find 1,000 empty acres of land.
It wont look like much, but if you stop and look closely, youll see a “Field of dreams,” the ground on which Americas future will be built.
This is where Intel, the American company that helped build Silicon Valley, is going to build its $20 billion semiconductor “mega site”.
Up to eight state-of-the-art factories in one place. 10,000 new good-paying jobs.
Some of the most sophisticated manufacturing in the world to make computer chips the size of a fingertip that power the world and our everyday lives.
Smartphones. The Internet. Technology we have yet to invent.
But thats just the beginning.
Intels CEO, Pat Gelsinger, who is here tonight, told me they are ready to increase their investment from
$20 billion to $100 billion.
That would be one of the biggest investments in manufacturing in American history.
And all theyre waiting for is for you to pass this bill.
So lets not wait any longer. Send it to my desk. Ill sign it.
And we will really take off.
And Intel is not alone.
Theres something happening in America.
Just look around and youll see an amazing story.
The rebirth of the pride that comes from stamping products “Made In America.” The revitalization of American manufacturing.
Companies are choosing to build new factories here, when just a few years ago, they would have built them overseas.
Thats what is happening. Ford is investing $11 billion to build electric vehicles, creating 11,000 jobs across the country.
GM is making the largest investment in its history—$7 billion to build electric vehicles, creating 4,000 jobs in Michigan.
All told, we created 369,000 new manufacturing jobs in America just last year.
Powered by people Ive met like JoJo Burgess, from generations of union steelworkers from Pittsburgh, whos here with us tonight.
As Ohio Senator Sherrod Brown says, “Its time to bury the label “Rust Belt.”
Its time.
But with all the bright spots in our economy, record job growth and higher wages, too many families are struggling to keep up with the bills.
Inflation is robbing them of the gains they might otherwise feel.
I get it. Thats why my top priority is getting prices under control.
Look, our economy roared back faster than most predicted, but the pandemic meant that businesses had a hard time hiring enough workers to keep up production in their factories.
The pandemic also disrupted global supply chains.
When factories close, it takes longer to make goods and get them from the warehouse to the store, and prices go up.
Look at cars.
Last year, there werent enough semiconductors to make all the cars that people wanted to buy.
And guess what, prices of automobiles went up.
So—we have a choice.
One way to fight inflation is to drive down wages and make Americans poorer.
I have a better plan to fight inflation.
Lower your costs, not your wages.
Make more cars and semiconductors in America.
More infrastructure and innovation in America.
More goods moving faster and cheaper in America.
More jobs where you can earn a good living in America.
And instead of relying on foreign supply chains, lets make it in America.
Economists call it “increasing the productive capacity of our economy.”
I call it building a better America.
My plan to fight inflation will lower your costs and lower the deficit.
17 Nobel laureates in economics say my plan will ease long-term inflationary pressures. Top business leaders and most Americans support my plan. And heres the plan:
First cut the cost of prescription drugs. Just look at insulin. One in ten Americans has diabetes. In Virginia, I met a 13-year-old boy named Joshua Davis.
He and his Dad both have Type 1 diabetes, which means they need insulin every day. Insulin costs about $10 a vial to make.
But drug companies charge families like Joshua and his Dad up to 30 times more. I spoke with Joshuas mom.
Imagine what its like to look at your child who needs insulin and have no idea how youre going to pay for it.
What it does to your dignity, your ability to look your child in the eye, to be the parent you expect to be.
Joshua is here with us tonight. Yesterday was his birthday. Happy birthday, buddy.
For Joshua, and for the 200,000 other young people with Type 1 diabetes, lets cap the cost of insulin at $35 a month so everyone can afford it.
Drug companies will still do very well. And while were at it let Medicare negotiate lower prices for prescription drugs, like the VA already does.
Look, the American Rescue Plan is helping millions of families on Affordable Care Act plans save $2,400 a year on their health care premiums. Lets close the coverage gap and make those savings permanent.
Second cut energy costs for families an average of $500 a year by combatting climate change.
Lets provide investments and tax credits to weatherize your homes and businesses to be energy efficient and you get a tax credit; double Americas clean energy production in solar, wind, and so much more; lower the price of electric vehicles, saving you another $80 a month because youll never have to pay at the gas pump again.
Third cut the cost of child care. Many families pay up to $14,000 a year for child care per child.
Middle-class and working families shouldnt have to pay more than 7% of their income for care of young children.
My plan will cut the cost in half for most families and help parents, including millions of women, who left the workforce during the pandemic because they couldnt afford child care, to be able to get back to work.
My plan doesnt stop there. It also includes home and long-term care. More affordable housing. And Pre-K for every 3- and 4-year-old.
All of these will lower costs.
And under my plan, nobody earning less than $400,000 a year will pay an additional penny in new taxes. Nobody.
The one thing all Americans agree on is that the tax system is not fair. We have to fix it.
Im not looking to punish anyone. But lets make sure corporations and the wealthiest Americans start paying their fair share.
Just last year, 55 Fortune 500 corporations earned $40 billion in profits and paid zero dollars in federal income tax.
Thats simply not fair. Thats why Ive proposed a 15% minimum tax rate for corporations.
We got more than 130 countries to agree on a global minimum tax rate so companies cant get out of paying their taxes at home by shipping jobs and factories overseas.
Thats why Ive proposed closing loopholes so the very wealthy dont pay a lower tax rate than a teacher or a firefighter.
So thats my plan. It will grow the economy and lower costs for families.
So what are we waiting for? Lets get this done. And while youre at it, confirm my nominees to the Federal Reserve, which plays a critical role in fighting inflation.
My plan will not only lower costs to give families a fair shot, it will lower the deficit.
The previous Administration not only ballooned the deficit with tax cuts for the very wealthy and corporations, it undermined the watchdogs whose job was to keep pandemic relief funds from being wasted.
But in my administration, the watchdogs have been welcomed back.
Were going after the criminals who stole billions in relief money meant for small businesses and millions of Americans.
And tonight, Im announcing that the Justice Department will name a chief prosecutor for pandemic fraud.
By the end of this year, the deficit will be down to less than half what it was before I took office.
The only president ever to cut the deficit by more than one trillion dollars in a single year.
Lowering your costs also means demanding more competition.
Im a capitalist, but capitalism without competition isnt capitalism.
Its exploitation—and it drives up prices.
When corporations dont have to compete, their profits go up, your prices go up, and small businesses and family farmers and ranchers go under.
We see it happening with ocean carriers moving goods in and out of America.
During the pandemic, these foreign-owned companies raised prices by as much as 1,000% and made record profits.
Tonight, Im announcing a crackdown on these companies overcharging American businesses and consumers.
And as Wall Street firms take over more nursing homes, quality in those homes has gone down and costs have gone up.
That ends on my watch.
Medicare is going to set higher standards for nursing homes and make sure your loved ones get the care they deserve and expect.
Well also cut costs and keep the economy going strong by giving workers a fair shot, provide more training and apprenticeships, hire them based on their skills not degrees.
Lets pass the Paycheck Fairness Act and paid leave.
Raise the minimum wage to $15 an hour and extend the Child Tax Credit, so no one has to raise a family in poverty.
Lets increase Pell Grants and increase our historic support of HBCUs, and invest in what Jill—our First Lady who teaches full-time—calls Americas best-kept secret: community colleges.
And lets pass the PRO Act when a majority of workers want to form a union—they shouldnt be stopped.
When we invest in our workers, when we build the economy from the bottom up and the middle out together, we can do something we havent done in a long time: build a better America.
For more than two years, COVID-19 has impacted every decision in our lives and the life of the nation.
And I know youre tired, frustrated, and exhausted.
But I also know this.
Because of the progress weve made, because of your resilience and the tools we have, tonight I can say
we are moving forward safely, back to more normal routines.
Weve reached a new moment in the fight against COVID-19, with severe cases down to a level not seen since last July.
Just a few days ago, the Centers for Disease Control and Prevention—the CDC—issued new mask guidelines.
Under these new guidelines, most Americans in most of the country can now be mask free.
And based on the projections, more of the country will reach that point across the next couple of weeks.
Thanks to the progress we have made this past year, COVID-19 need no longer control our lives.
I know some are talking about “living with COVID-19”. Tonight I say that we will never just accept living with COVID-19.
We will continue to combat the virus as we do other diseases. And because this is a virus that mutates and spreads, we will stay on guard.
Here are four common sense steps as we move forward safely.
First, stay protected with vaccines and treatments. We know how incredibly effective vaccines are. If youre vaccinated and boosted you have the highest degree of protection.
We will never give up on vaccinating more Americans. Now, I know parents with kids under 5 are eager to see a vaccine authorized for their children.
The scientists are working hard to get that done and well be ready with plenty of vaccines when they do.
Were also ready with anti-viral treatments. If you get COVID-19, the Pfizer pill reduces your chances of ending up in the hospital by 90%.
Weve ordered more of these pills than anyone in the world. And Pfizer is working overtime to get us 1 Million pills this month and more than double that next month.
And were launching the “Test to Treat” initiative so people can get tested at a pharmacy, and if theyre positive, receive antiviral pills on the spot at no cost.
If youre immunocompromised or have some other vulnerability, we have treatments and free high-quality masks.
Were leaving no one behind or ignoring anyones needs as we move forward.
And on testing, we have made hundreds of millions of tests available for you to order for free.
Even if you already ordered free tests tonight, I am announcing that you can order more from covidtests.gov starting next week.
Second we must prepare for new variants. Over the past year, weve gotten much better at detecting new variants.
If necessary, well be able to deploy new vaccines within 100 days instead of many more months or years.
And, if Congress provides the funds we need, well have new stockpiles of tests, masks, and pills ready if needed.
I cannot promise a new variant wont come. But I can promise you well do everything within our power to be ready if it does.
Third we can end the shutdown of schools and businesses. We have the tools we need.
Its time for Americans to get back to work and fill our great downtowns again. People working from home can feel safe to begin to return to the office.
Were doing that here in the federal government. The vast majority of federal workers will once again work in person.
Our schools are open. Lets keep it that way. Our kids need to be in school.
And with 75% of adult Americans fully vaccinated and hospitalizations down by 77%, most Americans can remove their masks, return to work, stay in the classroom, and move forward safely.
We achieved this because we provided free vaccines, treatments, tests, and masks.
Of course, continuing this costs money.
I will soon send Congress a request.
The vast majority of Americans have used these tools and may want to again, so I expect Congress to pass it quickly.
Fourth, we will continue vaccinating the world.
Weve sent 475 Million vaccine doses to 112 countries, more than any other nation.
And we wont stop.
We have lost so much to COVID-19. Time with one another. And worst of all, so much loss of life.
Lets use this moment to reset. Lets stop looking at COVID-19 as a partisan dividing line and see it for what it is: A God-awful disease.
Lets stop seeing each other as enemies, and start seeing each other for who we really are: Fellow Americans.
We cant change how divided weve been. But we can change how we move forward—on COVID-19 and other issues we must face together.
I recently visited the New York City Police Department days after the funerals of Officer Wilbert Mora and his partner, Officer Jason Rivera.
They were responding to a 9-1-1 call when a man shot and killed them with a stolen gun.
Officer Mora was 27 years old.
Officer Rivera was 22.
Both Dominican Americans whod grown up on the same streets they later chose to patrol as police officers.
I spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves.
Ive worked on these issues a long time.
I know what works: Investing in crime preventionand community police officers wholl walk the beat, wholl know the neighborhood, and who can restore trust and safety.
So lets not abandon our streets. Or choose between safety and equal justice.
Lets come together to protect our communities, restore trust, and hold law enforcement accountable.
Thats why the Justice Department required body cameras, banned chokeholds, and restricted no-knock warrants for its officers.
Thats why the American Rescue Plan provided $350 Billion that cities, states, and counties can use to hire more police and invest in proven strategies like community violence interruption—trusted messengers breaking the cycle of violence and trauma and giving young people hope.
We should all agree: The answer is not to Defund the police. The answer is to FUND the police with the resources and training they need to protect our communities.
I ask Democrats and Republicans alike: Pass my budget and keep our neighborhoods safe.
And I will keep doing everything in my power to crack down on gun trafficking and ghost guns you can buy online and make at home—they have no serial numbers and cant be traced.
And I ask Congress to pass proven measures to reduce gun violence. Pass universal background checks. Why should anyone on a terrorist list be able to purchase a weapon?
Ban assault weapons and high-capacity magazines.
Repeal the liability shield that makes gun manufacturers the only industry in America that cant be sued.
These laws dont infringe on the Second Amendment. They save lives.
The most fundamental right in America is the right to vote and to have it counted. And its under assault.
In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections.
We cannot let this happen.
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections.
Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.
A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.
And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system.
We can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling.
Weve set up joint patrols with Mexico and Guatemala to catch more human traffickers.
Were putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster.
Were securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.
We can do all this while keeping lit the torch of liberty that has led generations of immigrants to this land—my forefathers and so many of yours.
Provide a pathway to citizenship for Dreamers, those on temporary status, farm workers, and essential workers.
Revise our laws so businesses have the workers they need and families dont wait decades to reunite.
Its not only the right thing to do—its the economically smart thing to do.
Thats why immigration reform is supported by everyone from labor unions to religious leaders to the U.S. Chamber of Commerce.
Lets get it done once and for all.
Advancing liberty and justice also requires protecting the rights of women.
The constitutional right affirmed in Roe v. Wade—standing precedent for half a century—is under attack as never before.
If we want to go forward—not backward—we must protect access to health care. Preserve a womans right to choose. And lets continue to advance maternal health care in America.
And for our LGBTQ+ Americans, lets finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families is wrong.
As I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential.
While it often appears that we never agree, that isnt true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice.
And soon, well strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together and do big things.
So tonight Im offering a Unity Agenda for the Nation. Four big things we can do together.
First, beat the opioid epidemic.
There is so much we can do. Increase funding for prevention, treatment, harm reduction, and recovery.
Get rid of outdated rules that stop doctors from prescribing treatments. And stop the flow of illicit drugs by working with state and local law enforcement to go after traffickers.
If youre suffering from addiction, know you are not alone. I believe in recovery, and I celebrate the 23 million Americans in recovery.
Second, lets take on mental health. Especially among our children, whose lives and education have been turned upside down.
The American Rescue Plan gave schools money to hire teachers and help students make up for lost learning.
I urge every parent to make sure your school does just that. And we can all play a part—sign up to be a tutor or a mentor.
Children were also struggling before the pandemic. Bullying, violence, trauma, and the harms of social media.
As Frances Haugen, who is here with us tonight, has shown, we must hold social media platforms accountable for the national experiment theyre conducting on our children for profit.
Its time to strengthen privacy protections, ban targeted advertising to children, demand tech companies stop collecting personal data on our children.
And lets get all Americans the mental health services they need. More people they can turn to for help, and full parity between physical and mental health care.
Third, support our veterans.
Veterans are the best of us.
Ive always believed that we have a sacred obligation to equip all those we send to war and care for them and their families when they come home.
My administration is providing assistance with job training and housing, and now helping lower-income veterans get VA care debt-free.
Our troops in Iraq and Afghanistan faced many dangers.
One was stationed at bases and breathing in toxic smoke from “burn pits” that incinerated wastes of war—medical and hazard material, jet fuel, and more.
When they came home, many of the worlds fittest and best trained warriors were never the same.
Headaches. Numbness. Dizziness.
A cancer that would put them in a flag-draped coffin.
I know.
One of those soldiers was my son Major Beau Biden.
We dont know for sure if a burn pit was the cause of his brain cancer, or the diseases of so many of our troops.
But Im committed to finding out everything we can.
Committed to military families like Danielle Robinson from Ohio.
The widow of Sergeant First Class Heath Robinson.
He was born a soldier. Army National Guard. Combat medic in Kosovo and Iraq.
Stationed near Baghdad, just yards from burn pits the size of football fields.
Heaths widow Danielle is here with us tonight. They loved going to Ohio State football games. He loved building Legos with their daughter.
But cancer from prolonged exposure to burn pits ravaged Heaths lungs and body.
Danielle says Heath was a fighter to the very end.
He didnt know how to stop fighting, and neither did she.
Through her pain she found purpose to demand we do better.
Tonight, Danielle—we are.
The VA is pioneering new ways of linking toxic exposures to diseases, already helping more veterans get benefits.
And tonight, Im announcing were expanding eligibility to veterans suffering from nine respiratory cancers.
Im also calling on Congress: pass a law to make sure veterans devastated by toxic exposures in Iraq and Afghanistan finally get the benefits and comprehensive health care they deserve.
And fourth, lets end cancer as we know it.
This is personal to me and Jill, to Kamala, and to so many of you.
Cancer is the #2 cause of death in Americasecond only to heart disease.
Last month, I announced our plan to supercharge
the Cancer Moonshot that President Obama asked me to lead six years ago.
Our goal is to cut the cancer death rate by at least 50% over the next 25 years, turn more cancers from death sentences into treatable diseases.
More support for patients and families.
To get there, I call on Congress to fund ARPA-H, the Advanced Research Projects Agency for Health.
Its based on DARPA—the Defense Department project that led to the Internet, GPS, and so much more.
ARPA-H will have a singular purpose—to drive breakthroughs in cancer, Alzheimers, diabetes, and more.
A unity agenda for the nation.
We can do this.
My fellow Americans—tonight , we have gathered in a sacred space—the citadel of our democracy.
In this Capitol, generation after generation, Americans have debated great questions amid great strife, and have done great things.
We have fought for freedom, expanded liberty, defeated totalitarianism and terror.
And built the strongest, freest, and most prosperous nation the world has ever known.
Now is the hour.
Our moment of responsibility.
Our test of resolve and conscience, of history itself.
It is in this moment that our character is formed. Our purpose is found. Our future is forged.
Well I know this nation.
We will meet the test.
To protect freedom and liberty, to expand fairness and opportunity.
We will save democracy.
As hard as these times have been, I am more optimistic about America today than I have been my whole life.
Because I see the future that is within our grasp.
Because I know there is simply nothing beyond our capacity.
We are the only nation on Earth that has always turned every crisis we have faced into an opportunity.
The only nation that can be defined by a single word: possibilities.
So on this night, in our 245th year as a nation, I have come to report on the State of the Union.
And my report is this: the State of the Union is strong—because you, the American people, are strong.
We are stronger today than we were a year ago.
And we will be stronger a year from now than we are today.
Now is our moment to meet and overcome the challenges of our time.
And we will, as one people.
One America.
The United States of America.
May God bless you all. May God protect our troops.

1
tests/__init__.py Normal file
View File

@ -0,0 +1 @@
"""Tests."""

14
tests/conftest.py Normal file
View File

@ -0,0 +1,14 @@
import os
import pathlib
from glob import glob
root_path = pathlib.Path(__file__).parents[1]
# This is to prevent a bug in intellij that uses the wrong working directory
os.chdir(root_path)
def _as_module(fixture_path: str) -> str:
return fixture_path.replace("/", ".").replace("\\", ".").replace(".py", "")
pytest_plugins = [_as_module(fixture) for fixture in glob("tests/fixtures/[!_]*.py")]

1
tests/fixtures/__init__.py vendored Normal file
View File

@ -0,0 +1 @@
"""Global fixtures."""

View File

@ -0,0 +1,9 @@
import pytest
from fastapi.testclient import TestClient
from private_gpt.main import app
@pytest.fixture()
def test_client() -> TestClient:
return TestClient(app)

24
tests/fixtures/ingest_helper.py vendored Normal file
View File

@ -0,0 +1,24 @@
from pathlib import Path
import pytest
from fastapi.testclient import TestClient
from private_gpt.server.ingest.ingest_router import IngestResponse
class IngestHelper:
def __init__(self, test_client: TestClient):
self.test_client = test_client
def ingest_file(self, path: Path) -> IngestResponse:
files = {"file": (path.name, path.open("rb"))}
response = self.test_client.post("/v1/ingest", files=files)
assert response.status_code == 200
ingest_result = IngestResponse.model_validate(response.json())
return ingest_result
@pytest.fixture()
def ingest_helper(test_client: TestClient) -> IngestHelper:
return IngestHelper(test_client)

33
tests/fixtures/mock_injector.py vendored Normal file
View File

@ -0,0 +1,33 @@
from collections.abc import Callable
from unittest.mock import MagicMock
import pytest
from injector import Provider, ScopeDecorator, singleton
from private_gpt.di import create_application_injector
from private_gpt.utils.typing import T
class MockInjector:
def __init__(self) -> None:
self.test_injector = create_application_injector()
def bind_mock(
self,
interface: type[T],
mock: (T | (Callable[..., T] | Provider[T])) | None = None,
*,
scope: ScopeDecorator = singleton,
) -> T:
if mock is None:
mock = MagicMock()
self.test_injector.binder.bind(interface, to=mock, scope=scope)
return mock # type: ignore
def get(self, interface: type[T]) -> T:
return self.test_injector.get(interface)
@pytest.fixture()
def injector() -> MockInjector:
return MockInjector()

View File

@ -0,0 +1,35 @@
from fastapi.testclient import TestClient
from private_gpt.open_ai.openai_models import OpenAICompletion, OpenAIMessage
from private_gpt.server.chat.chat_router import ChatBody
def test_chat_route_produces_a_stream(test_client: TestClient) -> None:
body = ChatBody(
messages=[OpenAIMessage(content="test", role="user")],
use_context=False,
stream=True,
)
response = test_client.post("/v1/chat/completions", json=body.model_dump())
raw_events = response.text.split("\n\n")
events = [
item.removeprefix("data: ") for item in raw_events if item.startswith("data: ")
]
assert response.status_code == 200
assert "text/event-stream" in response.headers["content-type"]
assert len(events) > 0
assert events[-1] == "[DONE]"
def test_chat_route_produces_a_single_value(test_client: TestClient) -> None:
body = ChatBody(
messages=[OpenAIMessage(content="test", role="user")],
use_context=False,
stream=False,
)
response = test_client.post("/v1/chat/completions", json=body.model_dump())
# No asserts, if it validates it's good
OpenAICompletion.model_validate(response.json())
assert response.status_code == 200

View File

@ -0,0 +1,7 @@
e88c1005-637d-4cb4-ae79-9b8eb58cab97
b483dd15-78c4-4d67-b546-21a0d690bf43
a8080238-b294-4598-ac9c-7abf4c8e0552
14208dac-c600-4a18-872b-5e45354cfff2

View File

@ -0,0 +1,18 @@
from pathlib import Path
from fastapi.testclient import TestClient
from private_gpt.server.chunks.chunks_router import ChunksBody, ChunksResponse
from tests.fixtures.ingest_helper import IngestHelper
def test_chunks_retrieval(test_client: TestClient, ingest_helper: IngestHelper) -> None:
# Make sure there is at least some chunk to query in the database
path = Path(__file__).parents[0] / "chunk_test.txt"
ingest_helper.ingest_file(path)
body = ChunksBody(text="b483dd15-78c4-4d67-b546-21a0d690bf43")
response = test_client.post("/v1/chunks", json=body.model_dump())
assert response.status_code == 200
chunk_response = ChunksResponse.model_validate(response.json())
assert len(chunk_response.data) > 0

View File

@ -0,0 +1,16 @@
from fastapi.testclient import TestClient
from private_gpt.server.embeddings.embeddings_router import (
EmbeddingsBody,
EmbeddingsResponse,
)
def test_embeddings_generation(test_client: TestClient) -> None:
body = EmbeddingsBody(input="Embed me")
response = test_client.post("/v1/embeddings", json=body.model_dump())
assert response.status_code == 200
embedding_response = EmbeddingsResponse.model_validate(response.json())
assert len(embedding_response.data) > 0
assert len(embedding_response.data[0].embedding) > 0

Binary file not shown.

View File

@ -0,0 +1,19 @@
Once upon a time, in a magical forest called Enchantia, lived a young and cheerful deer named Zumi. Zumi was no ordinary deer; she was bright-eyed, intelligent, and had a heart full of curiosity. One sunny morning, as the forest came alive with the sweet melodies of chirping birds and rustling leaves, Zumi eagerly pranced through the woods on her way to school.
Enchantia Forest School was a unique place, where all the woodland creatures gathered to learn and grow together. The school was nestled in a clearing surrounded by tall, ancient trees. Zumi loved the feeling of anticipation as she approached the school, her hooves barely touching the ground in excitement.
As she arrived at the school, her dear friend and classmate, Oliver the wise old owl, greeted her with a friendly hoot. "Good morning, Zumi! Are you ready for another day of adventure and learning?"
Zumi's eyes sparkled with enthusiasm as she nodded, "Absolutely, Oliver! I can't wait to see what we'll discover today."
In their classroom, Teacher Willow, a gentle and nurturing willow tree, welcomed the students. The classroom was adorned with vibrant leaves and twinkling fireflies, creating a magical and cozy atmosphere. Today's lesson was about the history of the forest and the importance of living harmoniously with nature.
The students listened attentively as Teacher Willow recounted stories of ancient times when the forest thrived in unity and peace. Zumi was particularly enthralled by the tales of forest guardians and how they protected the magical balance of Enchantia.
After the lesson, it was time for recess. Zumi joined her friends in a lively game of tag, where they darted and danced playfully among the trees. Zumi's speed and agility made her an excellent tagger, and laughter filled the air as they played.
Later, they gathered for an art class, where they expressed themselves through painting and sculpting with clay. Zumi chose to paint a mural of the forest, portraying the beauty and magic they were surrounded by every day.
As the day came to an end, the students sat in a circle to share stories and reflections. Zumi shared her excitement for the day and how she learned to appreciate the interconnectedness of all creatures in the forest.
As the sun set, casting a golden glow across the forest, Zumi made her way back home, her heart brimming with happiness and newfound knowledge. Each day at Enchantia Forest School was an adventure, and Zumi couldn't wait to learn more and grow with her friends, for the magic of learning was as boundless as the forest itself. And so, under the canopy of stars and the watchful eyes of the forest, Zumi drifted into dreams filled with wonder and anticipation for the adventures that awaited her on the morrow.

View File

@ -0,0 +1,15 @@
from pathlib import Path
from tests.fixtures.ingest_helper import IngestHelper
def test_ingest_accepts_txt_files(ingest_helper: IngestHelper) -> None:
path = Path(__file__).parents[0] / "test.txt"
ingest_result = ingest_helper.ingest_file(path)
assert len(ingest_result.data) == 1
def test_ingest_accepts_pdf_files(ingest_helper: IngestHelper) -> None:
path = Path(__file__).parents[0] / "test.pdf"
ingest_result = ingest_helper.ingest_file(path)
assert len(ingest_result.data) == 1

View File

@ -0,0 +1,5 @@
from private_gpt.settings.settings import settings
def test_settings_are_loaded_and_merged() -> None:
assert settings.server.env_name == "test"

View File

@ -0,0 +1,33 @@
import io
import os
import pytest
from private_gpt.settings.yaml import load_yaml_with_envvars
def test_environment_variables_are_loaded() -> None:
sample_yaml = """
replaced: ${TEST_REPLACE_ME}
"""
env = {"TEST_REPLACE_ME": "replaced"}
loaded = load_yaml_with_envvars(io.StringIO(sample_yaml), env)
os.environ.copy()
assert loaded["replaced"] == "replaced"
def test_environment_defaults_variables_are_loaded() -> None:
sample_yaml = """
replaced: ${TEST_REPLACE_ME:default}
"""
loaded = load_yaml_with_envvars(io.StringIO(sample_yaml), {})
assert loaded["replaced"] == "default"
def test_environment_without_defaults_fails() -> None:
sample_yaml = """
replaced: ${TEST_REPLACE_ME}
"""
with pytest.raises(ValueError) as error:
load_yaml_with_envvars(io.StringIO(sample_yaml), {})
assert error is not None