118 Commits

Author SHA1 Message Date
Javier Martinez
4be94b558d chore: clean github action image before to generate 2024-08-02 17:58:05 +02:00
Javier Martinez
c8aeb09c57 Revert "chore: re-enable pre-commit"
This reverts commit a45c07d2ad.
2024-08-02 09:07:39 +02:00
Javier Martinez
29cf41a23f chore: remove major tag 2024-08-02 09:07:32 +02:00
Javier Martinez
f457101b87 Merge remote-tracking branch 'origin/main' into j_main 2024-08-02 09:00:19 +02:00
Javier Martinez
75e02b3ed6 feat: add custom tags 2024-08-02 08:59:49 +02:00
Javier Martinez
cf61bf780f feat(llm): add progress bar when ollama is pulling models (#2031)
* fix: add ollama progress bar when pulling models

* feat: add ollama queue

* fix: mypy
2024-08-01 19:14:26 +02:00
Javier Martinez
50b3027a24 docs: update docs and capture (#2029)
* docs: update Readme

* style: refactor image

* docs: change important to tip
2024-08-01 10:01:22 +02:00
Javier Martinez
54659588b5 fix: nomic embeddings (#2030)
* fix: allow to configure trust_remote_code

based on: https://github.com/zylon-ai/private-gpt/issues/1893#issuecomment-2118629391

* fix: nomic hf embeddings
2024-08-01 09:43:30 +02:00
Javier Martinez
8119842ae6 feat(recipe): add our first recipe Summarize (#2028)
* feat: add summary recipe

* test: add summary tests

* docs: move all recipes docs

* docs: add recipes and summarize doc

* docs: update openapi reference

* refactor: split method in two method (summary)

* feat: add initial summarize ui

* feat: add mode explanation

* fix: mypy

* feat: allow to configure async property in summarize

* refactor: move modes to enum and update mode explanations

* docs: fix url

* docs: remove list-llm pages

* docs: remove double header

* fix: summary description
2024-07-31 16:53:27 +02:00
Javier Martinez
40638a18a5 fix: unify embedding models (#2027)
* feat: unify embedding model to nomic

* docs: add embedding dimensions mismatch

* docs: fix fern
2024-07-31 14:35:46 +02:00
Javier Martinez
9027d695c1 feat: make llama3.1 as default (#2022)
* feat: change ollama default model to llama3.1

* chore: bump versions

* feat: Change default model in local mode to llama3.1

* chore: make sure last poetry version is used

* fix: mypy

* fix: do not add BOS (with last llamacpp-python version)
2024-07-31 14:35:36 +02:00
Javier Martinez
e54a8fe043 fix: prevent to ingest local files (by default) (#2010)
* feat: prevent to local ingestion (by default) and add white-list

* docs: add local ingestion warning

* docs: add missing comment

* fix: update exception error

* fix: black
2024-07-31 14:33:46 +02:00
Javier Martinez
1020cd5328 fix: light mode (#2025) 2024-07-31 12:59:31 +02:00
Quentin McGaw
65c5a1708b chore(docker): dockerfiles improvements and fixes (#1792)
* `UID` and `GID` build arguments for `worker` user

* `POETRY_EXTRAS` build argument with default values

* Copy `Makefile` for `make ingest` command

* Do NOT copy markdown files
I doubt anyone reads a markdown file within a Docker image

* Fix PYTHONPATH value

* Set home directory to `/home/worker` when creating user

* Combine `ENV` instructions together

* Define environment variables with their defaults
- For documentation purposes
- Reflect defaults set in settings-docker.yml

* `PGPT_EMBEDDING_MODE` to define embedding mode

* Remove ineffective `python3 -m pipx ensurepath`

* Use `&&` instead of `;` to chain commands to detect failure better

* Add `--no-root` flag to poetry install commands

* Set PGPT_PROFILES to docker

* chore: remove envs

* chore: update to use ollama in docker-compose

* chore: don't copy makefile

* chore: don't copy fern

* fix: tiktoken cache

* fix: docker compose port

* fix: ffmpy dependency (#2020)

* fix: ffmpy dependency

* fix: block ffmpy to commit sha

* feat(llm): autopull ollama models (#2019)

* chore: update ollama (llm)

* feat: allow to autopull ollama models

* fix: mypy

* chore: install always ollama client

* refactor: check connection and pull ollama method to utils

* docs: update ollama config with autopulling info

...

* chore: autopull ollama models

* chore: add GID/UID comment

...

---------

Co-authored-by: Javier Martinez <javiermartinezalvarez98@gmail.com>
2024-07-30 17:59:38 +02:00
Robert Hirsch
d080969407 added llama3 prompt (#1962)
* added llama3 prompt

* more fixes to pass tests; changed type VectorStore -> BasePydanticVectorStore, see https://github.com/run-llama/llama_index/blob/main/CHANGELOG.md#2024-05-14

* fix: new llama3 prompt

---------

Co-authored-by: Javier Martinez <javiermartinezalvarez98@gmail.com>
2024-07-29 17:28:00 +02:00
Javier Martinez
d4375d078f fix(ui): gradio bug fixes (#2021)
* fix: when two user messages were sent

* fix: add source divider

* fix: add favicon

* fix: add zylon link

* refactor: update label
2024-07-29 16:48:16 +02:00
Javier Martinez
20bad17c98 feat(llm): autopull ollama models (#2019)
* chore: update ollama (llm)

* feat: allow to autopull ollama models

* fix: mypy

* chore: install always ollama client

* refactor: check connection and pull ollama method to utils

* docs: update ollama config with autopulling info
2024-07-29 13:25:42 +02:00
Javier Martinez
dabf556dae fix: ffmpy dependency (#2020)
* fix: ffmpy dependency

* fix: block ffmpy to commit sha
2024-07-29 11:56:57 +02:00
Iván Martínez
05a986231c Add proper param to demo urls (#2007) 2024-07-22 14:44:03 +02:00
Javier Martinez
b62669784b docs: update welcome page (#2004) 2024-07-18 14:42:39 +02:00
Javier Martinez
2c78bb2958 docs: add PR and issue templates (#2002)
* chore: add pull request template

* chore: add issue templates

* chore: require more information in bugs
2024-07-18 12:56:10 +02:00
Iván Martínez
90d211c5cd Update README.md (#2003)
* Update README.md

Remove the outdated contact form and point to Zylon website for those looking for a ready-to-use enterprise solution built on top of PrivateGPT

* Update README.md

Update text to address the comments

* Update README.md

Improve text
2024-07-18 12:11:24 +02:00
Javier Martinez
c66ef93873 Merge branch 'docs/collaboration-model' into j_main 2024-07-18 11:50:48 +02:00
Javier Martinez
ae4592c5aa Merge branch 'chore/colaboration-model' into j_main 2024-07-18 11:49:59 +02:00
Javier Martinez
92b39a4e9d chore: require more information in bugs 2024-07-18 11:43:23 +02:00
Jackson
43cc31f740 feat(vectordb): Milvus vector db Integration (#1996)
* integrate Milvus into Private GPT

* adjust milvus settings

* update doc info and reformat

* adjust milvus initialization

* adjust import error

* minor update

* adjust format

* adjust the db storing path

* update doc
2024-07-18 10:55:45 +02:00
Javier Martinez
4523a30c8f feat(docs): update documentation and fix preview-docs (#2000)
* docs: add missing configurations

* docs: change HF embeddings by ollama

* docs: add disclaimer about Gradio UI

* docs: improve readability in concepts

* docs: reorder `Fully Local Setups`

* docs: improve setup instructions

* docs: prevent have duplicate documentation and use table to show different options

* docs: rename privateGpt to PrivateGPT

* docs: update ui image

* docs: remove useless header

* docs: convert to alerts ingestion disclaimers

* docs: add UI alternatives

* docs: reference UI alternatives in disclaimers

* docs: fix table

* chore: update doc preview version

* chore: add permissions

* chore: remove useless line

* docs: fixes

...
2024-07-18 10:06:51 +02:00
Javier Martinez
24a9b119a2 chore: more docker tags 2024-07-17 17:10:12 +02:00
Javier Martinez
51ea8407c3 chore: remove old docker action 2024-07-17 17:10:02 +02:00
Javier Martinez
b11524ceba chore: add docker image to docker hub 2024-07-17 16:24:18 +02:00
Javier Martinez
a45c07d2ad chore: re-enable pre-commit 2024-07-17 14:13:52 +02:00
Javier Martinez
bdd0bb7425 chore: add issue templates 2024-07-17 13:38:45 +02:00
Javier Martinez
22748bff9a chore: add pull request template 2024-07-17 13:38:37 +02:00
Javier Martinez
01b7ccd064 fix(config): make tokenizer optional and include a troubleshooting doc (#1998)
* docs: add troubleshooting

* fix: pass HF token to setup script and prevent to download tokenizer when it is empty

* fix: improve log and disable specific tokenizer by default

* chore: change HF_TOKEN environment to be aligned with default config

* fix: mypy
2024-07-17 10:06:27 +02:00
Javier Martinez
15f73dbc48 docs: update repo links, citations (#1990)
* docs: update project links

...

* docs: update citation
2024-07-09 10:03:57 +02:00
fern
187bc9320e (feat): add github button (#1989)
Co-authored-by: chdeskur <chdeskur@gmail.com>
2024-07-09 08:48:47 +02:00
Marco Braga
dde02245bc fix(docs): Fix concepts.mdx referencing to installation page (#1779)
* Fix/update concepts.mdx referencing to installation page

The link for `/installation` is broken in the "Main Concepts" page.

The correct path would be `./installation` or  maybe `/installation/getting-started/installation`

* fix: docs

---------

Co-authored-by: Javier Martinez <javiermartinezalvarez98@gmail.com>
2024-07-08 16:19:50 +02:00
Mart
067a5f144c feat(docs): Fix setup docu (#1926)
* Update settings.mdx

* docs: add cmd

---------

Co-authored-by: Javier Martinez <javiermartinezalvarez98@gmail.com>
2024-07-08 16:19:16 +02:00
Proger666
2612928839 feat(vectorstore): Add clickhouse support as vector store (#1883)
* Added ClickHouse vector store support

* port fix

* updated lock file

* fix: mypy

* fix: mypy

---------

Co-authored-by: Valery Denisov <valerydenisov@double.cloud>
Co-authored-by: Javier Martinez <javiermartinezalvarez98@gmail.com>
2024-07-08 16:18:22 +02:00
uw4
fc13368bc7 feat(llm): Support for Google Gemini LLMs and Embeddings (#1965)
* Support for Google Gemini LLMs and Embeddings

Initial support for Gemini, enables usage of Google LLMs and embedding models (see settings-gemini.yaml)

Install via
poetry install --extras "llms-gemini embeddings-gemini"

Notes:
* had to bump llama-index-core to later version that supports Gemini
* poetry --no-update did not work: Gemini/llama_index seem to require more (transient) updates to make it work...

* fix: crash when gemini is not selected

* docs: add gemini llm

---------

Co-authored-by: Javier Martinez <javiermartinezalvarez98@gmail.com>
2024-07-08 11:47:36 +02:00
Shengsheng Huang
19a7c065ef feat(docs): update doc for ipex-llm (#1968) 2024-07-08 09:42:44 +02:00
Javier Martinez
b687dc8524 feat: bump dependencies (#1987) 2024-07-05 16:31:13 +02:00
Pablo Orgaz
c7212ac7cc fix(LLM): mistral ignoring assistant messages (#1954)
* fix: mistral ignoring assistant messages

* fix: typing

* fix: fix tests
2024-05-30 15:41:16 +02:00
Yevhenii Semendiak
3b3e96ad6c Allow parameterizing OpenAI embeddings component (api_base, key, model) (#1920)
* Allow parameterizing OpenAI embeddings component (api_base, key, model)

* Update settings

* Update description
2024-05-17 09:52:50 +02:00
jcbonnet-fwd
45df99feb7 Add timeout parameter for better support of openailike LLM tools on local computer (like LM Studio). (#1858)
feat(llm): Improve settings of the OpenAILike LLM
2024-05-10 16:44:08 +02:00
Fran García
966af4771d fix(settings): enable cors by default so it will work when using ts sdk (spa) (#1925) 2024-05-10 14:13:46 +02:00
Fran García
d13029a046 feat(docs): add privategpt-ts sdk (#1924) 2024-05-10 14:13:15 +02:00
Patrick Peng
9d0d614706 fix: Replacing unsafe eval() with json.loads() (#1890) 2024-04-30 09:58:19 +02:00
icsy7867
e21bf20c10 feat: prompt_style applied to all LLMs + extra LLM params. (#1835)
* Updated prompt_style to be moved to the main LLM setting since all LLMs from llama_index can utilize this.  I also included temperature, context window size, max_tokens, max_new_tokens into the openailike to help ensure the settings are consistent from the other implementations.

* Removed prompt_style from llamacpp entirely

* Fixed settings-local.yaml to include prompt_style in the LLM settings instead of llamacpp.
2024-04-30 09:53:10 +02:00
Daniel Gallego Vico
c1802e7cf0 fix(docs): Update installation.mdx (#1866)
Update repo url
2024-04-19 17:10:58 +02:00
Marco Repetto
2a432bf9c5 fix: make embedding_api_base match api_base when on docker (#1859) 2024-04-19 15:42:19 +02:00
dividebysandwich
947e737f30 fix: "no such group" error in Dockerfile, added docx2txt and cryptography deps (#1841)
* Fixed "no such group" error in Dockerfile, added docx2txt to poetry so docx parsing works out of the box for docker containers

* added cryptography dependency for pdf parsing
2024-04-19 15:40:00 +02:00
imartinez
49ef729abc Allow passing HF access token to download tokenizer. Fallback to default tokenizer. 2024-04-19 15:38:25 +02:00
Pablo Orgaz
347be643f7 fix(llm): special tokens and leading space (#1831) 2024-04-04 14:37:29 +02:00
imartinez
08c4ab175e Fix version in poetry 2024-04-03 10:59:35 +02:00
imartinez
f469b4619d Add required Ollama setting 2024-04-02 18:27:57 +02:00
github-actions[bot]
94ef38cbba chore(main): release 0.5.0 (#1708)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-04-02 17:45:15 +02:00
Иван
8a836e4651 feat(docs): Add guide Llama-CPP Linux AMD GPU support (#1782) 2024-04-02 16:55:05 +02:00
Ingrid Stevens
f0b174c097 feat(ui): Add Model Information to ChatInterface label 2024-04-02 16:52:27 +02:00
igeni
bac818add5 feat(code): improve concat of strings in ui (#1785) 2024-04-02 16:42:40 +02:00
Brett England
ea153fb92f feat(scripts): Wipe qdrant and obtain db Stats command (#1783) 2024-04-02 16:41:42 +02:00
Robin Boone
b3b0140e24 feat(llm): Ollama LLM-Embeddings decouple + longer keep_alive settings (#1800) 2024-04-02 16:23:10 +02:00
machatschek
83adc12a8e feat(RAG): Introduce SentenceTransformer Reranker (#1810) 2024-04-02 10:29:51 +02:00
Marco Repetto
f83abff8bc feat(docker): set default Docker to use Ollama (#1812) 2024-04-01 13:08:48 +02:00
icsy7867
087cb0b7b7 feat(rag): expose similarity_top_k and similarity_score to settings (#1771)
* Added RAG settings to settings.py, vector_store and chat_service to add similarity_top_k and similarity_score

* Updated settings in vector and chat service per Ivans request

* Updated code for mypy
2024-03-20 22:25:26 +01:00
Marco Repetto
774e256052 fix: Fixed docker-compose (#1758)
* Fixed docker-compose

* Update docker-compose.yaml
2024-03-20 21:36:45 +01:00
Iván Martínez
6f6c785dac feat(llm): Ollama timeout setting (#1773)
* added request_timeout to ollama, default set to 30.0 in settings.yaml and settings-ollama.yaml

* Update settings-ollama.yaml

* Update settings.yaml

* updated settings.py and tidied up settings-ollama-yaml

* feat(UI): Faster startup and document listing (#1763)

* fix(ingest): update script label (#1770)

huggingface -> Hugging Face

* Fix lint errors

---------

Co-authored-by: Stephen Gresham <steve@gresham.id.au>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
2024-03-20 21:33:46 +01:00
Brett England
c2d694852b feat: wipe per storage type (#1772) 2024-03-20 21:31:44 +01:00
Ikko Eltociear Ashimine
7d2de5c96f fix(ingest): update script label (#1770)
huggingface -> Hugging Face
2024-03-20 20:23:08 +01:00
Iván Martínez
348df781b5 feat(UI): Faster startup and document listing (#1763) 2024-03-20 19:11:44 +01:00
Iván Martínez
572518143a feat(docs): Feature/upgrade docs (#1741)
* Upgrade fern version

* Add info about SDKs
2024-03-19 21:26:53 +01:00
Brett England
134fc54d7d feat(ingest): Created a faster ingestion mode - pipeline (#1750)
* Unify pgvector and postgres connection settings

* Remove local changes

* Update file pgvector->postgres

* postgresql should be postgres

* Adding pipeline ingestion mode

* disable hugging face parallelism.  Continue on file to doc transform failure

* Semaphore to limit docq async workers. ETA reporting
2024-03-19 21:24:46 +01:00
Otto L
1efac6a3fe feat(llm - embed): Add support for Azure OpenAI (#1698)
* Add support for Azure OpenAI

* fix: wrong default api_version

Should be dashes instead of underscores.
see: https://learn.microsoft.com/en-us/azure/ai-services/openai/reference

* fix: code styling

applied "make check" changes

* refactor: extend documentation

* mention azopenai as available option and extras
* add recommended section
* include settings-azopenai.yaml configuration file

* fix: documentation
2024-03-15 16:49:50 +01:00
Brett England
258d02d87c fix(docs): Minor documentation amendment (#1739)
* Unify pgvector and postgres connection settings

* Remove local changes

* Update file pgvector->postgres

* postgresql should be postgres
2024-03-15 16:36:32 +01:00
Brett England
63de7e4930 feat: unify settings for vector and nodestore connections to PostgreSQL (#1730)
* Unify pgvector and postgres connection settings

* Remove local changes

* Update file pgvector->postgres
2024-03-15 09:55:17 +01:00
Brett England
68b3a34b03 feat(nodestore): add Postgres for the doc and index store (#1706)
* Adding Postgres for the doc and index store

* Adding documentation.  Rename postgres database local->simple.  Postgres storage dependencies

* Update documentation for postgres storage

* Renaming feature to nodestore

* update docstore -> nodestore in doc

* missed some docstore changes in doc

* Updated poetry.lock

* Formatting updates to pass ruff/black checks

* Correction to unreachable code!

* Format adjustment to pass black test

* Adjust extra inclusion name for vector pg

* extra dep change for pg vector

* storage-postgres -> storage-nodestore-postgres

* Hash change on poetry lock
2024-03-14 17:12:33 +01:00
Iván Martínez
d17c34e81a fix(settings): set default tokenizer to avoid running make setup fail (#1709) 2024-03-13 09:53:40 +01:00
Andrew Jiang
84ad16af80 feat(docs): upgrade fern (#1596) 2024-03-11 23:02:56 +01:00
Arun Yadav
821bca32e9 feat(local): tiktoken cache within repo for offline (#1467) 2024-03-11 22:55:13 +01:00
icsy7867
02dc83e8e9 feat(llm): adds several settings for llamacpp and ollama (#1703) 2024-03-11 22:51:05 +01:00
Hoffelhas
410bf7a71f feat(ui): maintain score order when curating sources (#1643)
* Update ui.py

Changed 'curated_sources' from a list, in order to maintain score order when returning the curated sources.

* Maintain score order after curating sources
2024-03-11 22:27:30 +01:00
icsy7867
290b9fb084 feat(ui): add sources check to not repeat identical sources (#1705) 2024-03-11 22:24:18 +01:00
github-actions[bot]
1b03b369c0 chore(main): release 0.4.0 (#1628)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-03-06 17:53:35 +01:00
Iván Martínez
45f05711eb feat: Upgrade to LlamaIndex to 0.10 (#1663)
* Extract optional dependencies

* Separate local mode into llms-llama-cpp and embeddings-huggingface for clarity

* Support Ollama embeddings

* Upgrade to llamaindex 0.10.14. Remove legacy use of ServiceContext in ContextChatEngine

* Fix vector retriever filters
2024-03-06 17:51:30 +01:00
Daniel Gallego Vico
12f3a39e8a Update x handle to zylon private gpt (#1644) 2024-02-23 15:51:35 +01:00
TQ
cd40e3982b feat(Vector): support pgvector (#1624) 2024-02-20 15:29:26 +01:00
github-actions[bot]
066ea5bf28 chore(main): release 0.3.0 (#1413)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-02-16 17:42:39 +01:00
Iván Martínez
aa13afde07 feat(UI): Select file to Query or Delete + Delete ALL (#1612)
---------

Co-authored-by: Robin Boone <rboone@sofics.com>
2024-02-16 17:36:09 +01:00
icsy7867
24fb80ca38 fix(UI): Updated ui.py. Frees up the CPU to not be bottlenecked.
Updated ui.py to include a small sleep timer while building the stream deltas. This recursive function fires off so quickly that it eats up too much of the CPU. This small sleep frees up the CPU so it is not bottlenecked. The value can go lower/shorter, but 0.02 or 0.025 seems to work well. (#1589)

Co-authored-by: root <root@wesgitlabdemo.icl.gtri.org>
2024-02-16 12:52:14 +01:00
Ygal Blum
6bbec79583 feat(llm): Add support for Ollama LLM (#1526) 2024-02-09 15:50:50 +01:00
Nick Smirnov
b178b51451 feat(bulk-ingest): Add --ignored Flag to Exclude Specific Files and Directories During Ingestion (#1432) 2024-02-07 19:59:32 +01:00
Iván Martínez
24fae660e6 feat: Add stream information to generate SDKs (#1569) 2024-02-02 16:14:22 +01:00
Pablo Orgaz
3e67e21d38 Add embedding mode config (#1541) 2024-01-25 10:55:32 +01:00
Naveen Kannan
869233f0e4 fix: Adding an LLM param to fix broken generator from llamacpp (#1519) 2024-01-17 18:10:45 +01:00
CognitiveTech
e326126d0d feat: add mistral + chatml prompts (#1426) 2024-01-16 22:51:14 +01:00
Robert Gay
6191bcdbd6 fix: minor bug in chat stream output - python error being serialized (#1449) 2024-01-16 16:41:20 +01:00
Iván Martínez
d3acd85fe3 fix(tests): load the test settings only when running tests
Previous implementation causes false positives with the last version of LlamaIndex
2024-01-09 12:03:16 +01:00
Guido Schulz
0a89d76cc5 fix(docs): Update quickstart doc and set version in pyproject.toml to 0.2.0 2023-12-26 13:09:31 +01:00
Matthew Hill
2d27a9f956 feat(llm): Add openailike llm mode (#1447)
This mode behaves the same as the openai mode, except that it allows setting custom models not
supported by OpenAI. It can be used with any tool that serves models from an OpenAI compatible API.

Implements #1424
2023-12-26 10:26:08 +01:00
imartinez
fee9f08ef3 Move back to 3900 for the context window to avoid melting local machines 2023-12-22 18:21:43 +01:00
Iván Martínez
fde2b942bc fix(deploy): fix local and external dockerfiles 2023-12-22 14:16:46 +01:00
Iván Martínez
4c69c458ab Improve ingest logs (#1438) 2023-12-21 17:13:46 +01:00
Iván Martínez
4780540870 feat(settings): Configurable context_window and tokenizer (#1437) 2023-12-21 14:49:35 +01:00
Iván Martínez
6eeb95ec7f feat(API): Ingest plain text (#1417)
* Add ingest/text route to ingest plain text

* Add new ingest text test and adapt ingest/file ones

* Include new API in docs

* Remove duplicated logic
2023-12-18 21:47:05 +01:00
Pablo Orgaz
059f35840a fix(docker): docker broken copy (#1419) 2023-12-18 16:55:18 +01:00
Iván Martínez
8ec7cf49f4 feat(settings): Update default model to TheBloke/Mistral-7B-Instruct-v0.2-GGUF (#1415)
* Update LlamaCPP dependency

* Default to TheBloke/Mistral-7B-Instruct-v0.2-GGUF

* Fix API docs
2023-12-17 16:11:08 +01:00
Rohit Das
c71ae7cee9 feat(ui): make chat area stretch to fill the screen (#1397) 2023-12-17 12:02:13 +01:00
cognitivetech
2564f8d2bb fix(settings): correct yaml multiline string (#1403) 2023-12-16 19:02:46 +01:00
Eliott Bouhana
4e496e970a docs: remove misleading comment about pgpt working with python 3.12 (#1394)
I was misled into believing I could install using python 3.12 whereas the pyproject.toml explicitly states otherwise. This PR only removes this comment to make sure other people are not also trapped 😄
2023-12-15 21:35:02 +01:00
Federico Grandi
3582764801 ci: fix preview docs checkout ref (#1393) 2023-12-12 20:33:34 +01:00
Federico Grandi
1d28ae2915 docs: fix minor capitalization typo (#1392) 2023-12-12 20:31:38 +01:00
github-actions[bot]
e8ac51bba4 chore(main): release 0.2.0 (#1387)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2023-12-10 20:08:12 +01:00
3ly-13
145f3ec9f4 feat(ui): Allows User to Set System Prompt via "Additional Options" in Chat Interface (#1353) 2023-12-10 19:45:14 +01:00
3ly-13
a072a40a7c Allow setting OpenAI model in settings (#1386)
feat(settings): Allow setting openai model to be used. Default to GPT 3.5
2023-12-09 20:13:00 +01:00
Louis Melchior
a3ed14c58f feat(llm): drop default_system_prompt (#1385)
As discussed on Discord, the decision has been made to remove the system prompts by default, to better segregate the API and the UI usages.

A concurrent PR (#1353) is enabling the dynamic setting of a system prompt in the UI.

Therefore, if UI users want to use a custom system prompt, they can specify one directly in the UI.
If the API users want to use a custom prompt, they can pass it directly into their messages that they are passing to the API.

In the highlight of the two use case above, it becomes clear that default system_prompt does not need to exist.
2023-12-08 23:13:51 +01:00
Iván Martínez
f235c50be9 Delete old docs (#1384) 2023-12-08 22:39:23 +01:00
EEmlan
9302620eac Adding german speaking model to documentation (#1374) 2023-12-08 11:26:25 +01:00
Max Zangs
9cf972563e Add setup option to Makefile (#1368) 2023-12-08 10:34:12 +01:00
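
Commit 9d0d614706 in the list above replaces an unsafe `eval()` with `json.loads()`. The sketch below illustrates why that swap matters, assuming the input is a JSON-encoded string. `parse_payload` and the sample strings are hypothetical and are not taken from the PrivateGPT code.

```python
import json


def parse_payload(raw: str) -> dict:
    """Parse untrusted text as JSON data instead of evaluating it as code.

    Hypothetical helper, for illustration only.
    """
    value = json.loads(raw)  # raises json.JSONDecodeError on non-JSON input
    if not isinstance(value, dict):
        raise ValueError("expected a JSON object")
    return value


if __name__ == "__main__":
    print(parse_payload('{"role": "user", "content": "hi"}'))

    # eval() would execute this string as Python; json.loads() rejects it
    # because it is not valid JSON.
    malicious = "__import__('os').system('echo pwned')"
    try:
        parse_payload(malicious)
    except json.JSONDecodeError as exc:
        print(f"rejected: {exc.msg}")
```

`json.JSONDecodeError` is a subclass of `ValueError`, so a caller can handle both failure modes with a single `except ValueError`.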
108 changed files with 9178 additions and 6260 deletions

105
.github/ISSUE_TEMPLATE/bug.yml vendored Normal file

@@ -0,0 +1,105 @@
name: Bug Report
description: Report a bug or issue with the project.
title: "[BUG] "
labels: ["bug"]
body:
  - type: markdown
    attributes:
      value: |
        **Please describe the bug you encountered.**
  - type: checkboxes
    id: pre-check
    attributes:
      label: Pre-check
      description: Please confirm that you have searched for duplicate issues before creating this one.
      options:
        - label: I have searched the existing issues and none cover this bug.
          required: true
  - type: textarea
    id: description
    attributes:
      label: Description
      description: Provide a detailed description of the bug.
      placeholder: "Detailed description of the bug"
    validations:
      required: true
  - type: textarea
    id: steps
    attributes:
      label: Steps to Reproduce
      description: Provide the steps to reproduce the bug.
      placeholder: "1. Step one\n2. Step two\n3. Step three"
    validations:
      required: true
  - type: input
    id: expected
    attributes:
      label: Expected Behavior
      description: Describe what you expected to happen.
      placeholder: "Expected behavior"
    validations:
      required: true
  - type: input
    id: actual
    attributes:
      label: Actual Behavior
      description: Describe what actually happened.
      placeholder: "Actual behavior"
    validations:
      required: true
  - type: input
    id: environment
    attributes:
      label: Environment
      description: Provide details about your environment (e.g., OS, GPU, profile, etc.).
      placeholder: "Environment details"
    validations:
      required: true
  - type: input
    id: additional
    attributes:
      label: Additional Information
      description: Provide any additional information that may be relevant (e.g., logs, screenshots).
      placeholder: "Any additional information that may be relevant"
  - type: input
    id: version
    attributes:
      label: Version
      description: Provide the version of the project where you encountered the bug.
      placeholder: "Version number"
  - type: markdown
    attributes:
      value: |
        **Please ensure the following setup checklist has been reviewed before submitting the bug report.**
  - type: checkboxes
    id: general-setup-checklist
    attributes:
      label: Setup Checklist
      description: Verify the following general aspects of your setup.
      options:
        - label: Confirm that you have followed the installation instructions in the projects documentation.
        - label: Check that you are using the latest version of the project.
        - label: Verify disk space availability for model storage and data processing.
        - label: Ensure that you have the necessary permissions to run the project.
  - type: checkboxes
    id: nvidia-setup-checklist
    attributes:
      label: NVIDIA GPU Setup Checklist
      description: Verify the following aspects of your NVIDIA GPU setup.
      options:
        - label: Check that the all CUDA dependencies are installed and are compatible with your GPU (refer to [CUDA's documentation](https://docs.nvidia.com/deploy/cuda-compatibility/#frequently-asked-questions))
        - label: Ensure an NVIDIA GPU is installed and recognized by the system (run `nvidia-smi` to verify).
        - label: Ensure proper permissions are set for accessing GPU resources.
        - label: Docker users - Verify that the NVIDIA Container Toolkit is configured correctly (e.g. run `sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi`)

8
.github/ISSUE_TEMPLATE/config.yml vendored Normal file
View File

@@ -0,0 +1,8 @@
blank_issues_enabled: false
contact_links:
  - name: Documentation
    url: https://docs.privategpt.dev
    about: Please refer to our documentation for more details and guidance.
  - name: Discord
    url: https://discord.gg/bK6mRVpErU
    about: Join our Discord community to ask questions and get help.

19
.github/ISSUE_TEMPLATE/docs.yml vendored Normal file

@@ -0,0 +1,19 @@
name: Documentation
description: Suggest a change or addition to the documentation.
title: "[DOCS] "
labels: ["documentation"]
body:
  - type: markdown
    attributes:
      value: |
        **Please describe the documentation change or addition you would like to suggest.**
  - type: textarea
    id: description
    attributes:
      label: Description
      description: Provide a detailed description of the documentation change.
      placeholder: "Detailed description of the documentation change"
    validations:
      required: true

37
.github/ISSUE_TEMPLATE/feature.yml vendored Normal file

@@ -0,0 +1,37 @@
name: Enhancement
description: Suggest an enhancement or improvement to the project.
title: "[FEATURE] "
labels: ["enhancement"]
body:
  - type: markdown
    attributes:
      value: |
        **Please describe the enhancement or improvement you would like to suggest.**
  - type: textarea
    id: feature_description
    attributes:
      label: Feature Description
      description: Provide a detailed description of the enhancement.
      placeholder: "Detailed description of the enhancement"
    validations:
      required: true
  - type: textarea
    id: reason
    attributes:
      label: Reason
      description: Explain the reason for this enhancement.
      placeholder: "Reason for the enhancement"
    validations:
      required: true
  - type: textarea
    id: value
    attributes:
      label: Value of Feature
      description: Describe the value or benefits this feature will bring.
      placeholder: "Value or benefits of the feature"
    validations:
      required: true

19
.github/ISSUE_TEMPLATE/question.yml vendored Normal file

@@ -0,0 +1,19 @@
name: Question
description: Ask a question about the project.
title: "[QUESTION] "
labels: ["question"]
body:
  - type: markdown
    attributes:
      value: |
        **Please describe your question in detail.**
  - type: textarea
    id: question
    attributes:
      label: Question
      description: Provide a detailed description of your question.
      placeholder: "Detailed description of the question"
    validations:
      required: true

37
.github/pull_request_template.md vendored Normal file

@@ -0,0 +1,37 @@
# Description
Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.
## Type of Change
Please delete options that are not relevant.
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] This change requires a documentation update
## How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration
- [ ] Added new unit/integration tests
- [ ] I stared at the code and made sure it makes sense
**Test Configuration**:
* Firmware version:
* Hardware:
* Toolchain:
* SDK:
## Checklist:
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged and published in downstream modules
- [ ] I ran `make check; make test` to ensure mypy and tests pass


@@ -8,7 +8,7 @@ inputs:
   poetry_version:
     required: true
     type: string
-    default: "1.5.1"
+    default: "1.8.3"
 runs:
   using: composite
@@ -25,6 +25,6 @@ runs:
         python-version: ${{ inputs.python_version }}
         cache: "poetry"
     - name: Install Dependencies
-      run: poetry install --with ui --no-root
+      run: poetry install --extras "ui vector-stores-qdrant" --no-root
       shell: bash

View File

@@ -1,45 +0,0 @@
name: docker
on:
release:
types: [ published ]
workflow_dispatch:
env:
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}
jobs:
build-and-push-image:
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Log in to the Container registry
uses: docker/login-action@v3
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Extract metadata (tags, labels) for Docker
id: meta
uses: docker/metadata-action@v5
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
tags: |
type=ref,event=branch
type=ref,event=pr
type=semver,pattern={{version}}
type=semver,pattern={{major}}.{{minor}}
type=sha
- name: Build and push Docker image
uses: docker/build-push-action@v5
with:
context: .
file: Dockerfile.external
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}

83
.github/workflows/generate-release.yml vendored Normal file
View File

@@ -0,0 +1,83 @@
name: generate-release
on:
release:
types: [ published ]
workflow_dispatch:
env:
REGISTRY: docker.io
IMAGE_NAME: ${{ github.repository }}
platforms: linux/amd64,linux/arm64
DEFAULT_TYPE: "external"
jobs:
build-and-push-image:
runs-on: ubuntu-latest
strategy:
matrix:
type: [ local, external ]
permissions:
contents: read
packages: write
outputs:
version: ${{ steps.version.outputs.version }}
steps:
- name: Free Disk Space (Ubuntu)
uses: jlumbroso/free-disk-space@main
with:
tool-cache: false
android: true
dotnet: true
haskell: true
large-packages: true
docker-images: false
swap-storage: true
- name: Checkout repository
uses: actions/checkout@v4
- name: Set up QEMU
uses: docker/setup-qemu-action@v3
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Log in to Docker Hub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_PASSWORD }}
- name: Extract metadata (tags, labels) for Docker
id: meta
uses: docker/metadata-action@v5
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
tags: |
type=semver,pattern={{version}},enable=${{ matrix.type == env.DEFAULT_TYPE }}
type=semver,pattern={{version}}-${{ matrix.type }}
type=semver,pattern={{major}}.{{minor}},enable=${{ matrix.type == env.DEFAULT_TYPE }}
type=semver,pattern={{major}}.{{minor}}-${{ matrix.type }}
type=raw,value=latest,enable=${{ matrix.type == env.DEFAULT_TYPE }}
type=sha
flavor: |
latest=false
- name: Build and push Docker image
uses: docker/build-push-action@v6
with:
context: .
file: Dockerfile.${{ matrix.type }}
platforms: ${{ env.platforms }}
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
- name: Version output
id: version
run: echo "version=${{ steps.meta.outputs.version }}" >> "$GITHUB_OUTPUT"

View File

@@ -11,11 +11,17 @@ jobs:
preview-docs:
runs-on: ubuntu-latest
permissions:
contents: read
pull-requests: write
steps:
- name: Checkout repository
uses: actions/checkout@v4
with:
ref: refs/pull/${{ github.event.pull_request.number }}/merge
- name: Setup Node.js
- name: Setup Node.js
uses: actions/setup-node@v4
with:
node-version: "18"
@@ -35,14 +41,14 @@ jobs:
# Set the output for the step
echo "::set-output name=preview_url::$preview_url"
- name: Comment PR with URL using github-actions bot
uses: actions/github-script@v4
uses: actions/github-script@v7
if: ${{ steps.generate_docs.outputs.preview_url }}
with:
script: |
const preview_url = '${{ steps.generate_docs.outputs.preview_url }}';
const issue_number = context.issue.number;
github.issues.createComment({
...context.repo,
issue_number: issue_number,
github.rest.issues.createComment({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo,
body: `Published docs preview URL: ${preview_url}`
})

2
.gitignore vendored
View File

@@ -1,4 +1,6 @@
.venv
.env
venv
settings-me.yaml

View File

@@ -1,5 +1,84 @@
# Changelog
## [0.5.0](https://github.com/zylon-ai/private-gpt/compare/v0.4.0...v0.5.0) (2024-04-02)
### Features
* **code:** improve concat of strings in ui ([#1785](https://github.com/zylon-ai/private-gpt/issues/1785)) ([bac818a](https://github.com/zylon-ai/private-gpt/commit/bac818add51b104cda925b8f1f7b51448e935ca1))
* **docker:** set default Docker to use Ollama ([#1812](https://github.com/zylon-ai/private-gpt/issues/1812)) ([f83abff](https://github.com/zylon-ai/private-gpt/commit/f83abff8bc955a6952c92cc7bcb8985fcec93afa))
* **docs:** Add guide Llama-CPP Linux AMD GPU support ([#1782](https://github.com/zylon-ai/private-gpt/issues/1782)) ([8a836e4](https://github.com/zylon-ai/private-gpt/commit/8a836e4651543f099c59e2bf497ab8c55a7cd2e5))
* **docs:** Feature/upgrade docs ([#1741](https://github.com/zylon-ai/private-gpt/issues/1741)) ([5725181](https://github.com/zylon-ai/private-gpt/commit/572518143ac46532382db70bed6f73b5082302c1))
* **docs:** upgrade fern ([#1596](https://github.com/zylon-ai/private-gpt/issues/1596)) ([84ad16a](https://github.com/zylon-ai/private-gpt/commit/84ad16af80191597a953248ce66e963180e8ddec))
* **ingest:** Created a faster ingestion mode - pipeline ([#1750](https://github.com/zylon-ai/private-gpt/issues/1750)) ([134fc54](https://github.com/zylon-ai/private-gpt/commit/134fc54d7d636be91680dc531f5cbe2c5892ac56))
* **llm - embed:** Add support for Azure OpenAI ([#1698](https://github.com/zylon-ai/private-gpt/issues/1698)) ([1efac6a](https://github.com/zylon-ai/private-gpt/commit/1efac6a3fe19e4d62325e2c2915cd84ea277f04f))
* **llm:** adds several settings for llamacpp and ollama ([#1703](https://github.com/zylon-ai/private-gpt/issues/1703)) ([02dc83e](https://github.com/zylon-ai/private-gpt/commit/02dc83e8e9f7ada181ff813f25051bbdff7b7c6b))
* **llm:** Ollama LLM-Embeddings decouple + longer keep_alive settings ([#1800](https://github.com/zylon-ai/private-gpt/issues/1800)) ([b3b0140](https://github.com/zylon-ai/private-gpt/commit/b3b0140e244e7a313bfaf4ef10eb0f7e4192710e))
* **llm:** Ollama timeout setting ([#1773](https://github.com/zylon-ai/private-gpt/issues/1773)) ([6f6c785](https://github.com/zylon-ai/private-gpt/commit/6f6c785dac2bbad37d0b67fda215784298514d39))
* **local:** tiktoken cache within repo for offline ([#1467](https://github.com/zylon-ai/private-gpt/issues/1467)) ([821bca3](https://github.com/zylon-ai/private-gpt/commit/821bca32e9ee7c909fd6488445ff6a04463bf91b))
* **nodestore:** add Postgres for the doc and index store ([#1706](https://github.com/zylon-ai/private-gpt/issues/1706)) ([68b3a34](https://github.com/zylon-ai/private-gpt/commit/68b3a34b032a08ca073a687d2058f926032495b3))
* **rag:** expose similarity_top_k and similarity_score to settings ([#1771](https://github.com/zylon-ai/private-gpt/issues/1771)) ([087cb0b](https://github.com/zylon-ai/private-gpt/commit/087cb0b7b74c3eb80f4f60b47b3a021c81272ae1))
* **RAG:** Introduce SentenceTransformer Reranker ([#1810](https://github.com/zylon-ai/private-gpt/issues/1810)) ([83adc12](https://github.com/zylon-ai/private-gpt/commit/83adc12a8ef0fa0c13a0dec084fa596445fc9075))
* **scripts:** Wipe qdrant and obtain db Stats command ([#1783](https://github.com/zylon-ai/private-gpt/issues/1783)) ([ea153fb](https://github.com/zylon-ai/private-gpt/commit/ea153fb92f1f61f64c0d04fff0048d4d00b6f8d0))
* **ui:** Add Model Information to ChatInterface label ([f0b174c](https://github.com/zylon-ai/private-gpt/commit/f0b174c097c2d5e52deae8ef88de30a0d9013a38))
* **ui:** add sources check to not repeat identical sources ([#1705](https://github.com/zylon-ai/private-gpt/issues/1705)) ([290b9fb](https://github.com/zylon-ai/private-gpt/commit/290b9fb084632216300e89bdadbfeb0380724b12))
* **UI:** Faster startup and document listing ([#1763](https://github.com/zylon-ai/private-gpt/issues/1763)) ([348df78](https://github.com/zylon-ai/private-gpt/commit/348df781b51606b2f9810bcd46f850e54192fd16))
* **ui:** maintain score order when curating sources ([#1643](https://github.com/zylon-ai/private-gpt/issues/1643)) ([410bf7a](https://github.com/zylon-ai/private-gpt/commit/410bf7a71f17e77c4aec723ab80c233b53765964))
* unify settings for vector and nodestore connections to PostgreSQL ([#1730](https://github.com/zylon-ai/private-gpt/issues/1730)) ([63de7e4](https://github.com/zylon-ai/private-gpt/commit/63de7e4930ac90dd87620225112a22ffcbbb31ee))
* wipe per storage type ([#1772](https://github.com/zylon-ai/private-gpt/issues/1772)) ([c2d6948](https://github.com/zylon-ai/private-gpt/commit/c2d694852b4696834962a42fde047b728722ad74))
### Bug Fixes
* **docs:** Minor documentation amendment ([#1739](https://github.com/zylon-ai/private-gpt/issues/1739)) ([258d02d](https://github.com/zylon-ai/private-gpt/commit/258d02d87c5cb81d6c3a6f06aa69339b670dffa9))
* Fixed docker-compose ([#1758](https://github.com/zylon-ai/private-gpt/issues/1758)) ([774e256](https://github.com/zylon-ai/private-gpt/commit/774e2560520dc31146561d09a2eb464c68593871))
* **ingest:** update script label ([#1770](https://github.com/zylon-ai/private-gpt/issues/1770)) ([7d2de5c](https://github.com/zylon-ai/private-gpt/commit/7d2de5c96fd42e339b26269b3155791311ef1d08))
* **settings:** set default tokenizer to avoid running make setup fail ([#1709](https://github.com/zylon-ai/private-gpt/issues/1709)) ([d17c34e](https://github.com/zylon-ai/private-gpt/commit/d17c34e81a84518086b93605b15032e2482377f7))
## [0.4.0](https://github.com/imartinez/privateGPT/compare/v0.3.0...v0.4.0) (2024-03-06)
### Features
* Upgrade to LlamaIndex to 0.10 ([#1663](https://github.com/imartinez/privateGPT/issues/1663)) ([45f0571](https://github.com/imartinez/privateGPT/commit/45f05711eb71ffccdedb26f37e680ced55795d44))
* **Vector:** support pgvector ([#1624](https://github.com/imartinez/privateGPT/issues/1624)) ([cd40e39](https://github.com/imartinez/privateGPT/commit/cd40e3982b780b548b9eea6438c759f1c22743a8))
## [0.3.0](https://github.com/imartinez/privateGPT/compare/v0.2.0...v0.3.0) (2024-02-16)
### Features
* add mistral + chatml prompts ([#1426](https://github.com/imartinez/privateGPT/issues/1426)) ([e326126](https://github.com/imartinez/privateGPT/commit/e326126d0d4cd7e46a79f080c442c86f6dd4d24b))
* Add stream information to generate SDKs ([#1569](https://github.com/imartinez/privateGPT/issues/1569)) ([24fae66](https://github.com/imartinez/privateGPT/commit/24fae660e6913aac6b52745fb2c2fe128ba2eb79))
* **API:** Ingest plain text ([#1417](https://github.com/imartinez/privateGPT/issues/1417)) ([6eeb95e](https://github.com/imartinez/privateGPT/commit/6eeb95ec7f17a618aaa47f5034ee5bccae02b667))
* **bulk-ingest:** Add --ignored Flag to Exclude Specific Files and Directories During Ingestion ([#1432](https://github.com/imartinez/privateGPT/issues/1432)) ([b178b51](https://github.com/imartinez/privateGPT/commit/b178b514519550e355baf0f4f3f6beb73dca7df2))
* **llm:** Add openailike llm mode ([#1447](https://github.com/imartinez/privateGPT/issues/1447)) ([2d27a9f](https://github.com/imartinez/privateGPT/commit/2d27a9f956d672cb1fe715cf0acdd35c37f378a5)), closes [#1424](https://github.com/imartinez/privateGPT/issues/1424)
* **llm:** Add support for Ollama LLM ([#1526](https://github.com/imartinez/privateGPT/issues/1526)) ([6bbec79](https://github.com/imartinez/privateGPT/commit/6bbec79583b7f28d9bea4b39c099ebef149db843))
* **settings:** Configurable context_window and tokenizer ([#1437](https://github.com/imartinez/privateGPT/issues/1437)) ([4780540](https://github.com/imartinez/privateGPT/commit/47805408703c23f0fd5cab52338142c1886b450b))
* **settings:** Update default model to TheBloke/Mistral-7B-Instruct-v0.2-GGUF ([#1415](https://github.com/imartinez/privateGPT/issues/1415)) ([8ec7cf4](https://github.com/imartinez/privateGPT/commit/8ec7cf49f40701a4f2156c48eb2fad9fe6220629))
* **ui:** make chat area stretch to fill the screen ([#1397](https://github.com/imartinez/privateGPT/issues/1397)) ([c71ae7c](https://github.com/imartinez/privateGPT/commit/c71ae7cee92463bbc5ea9c434eab9f99166e1363))
* **UI:** Select file to Query or Delete + Delete ALL ([#1612](https://github.com/imartinez/privateGPT/issues/1612)) ([aa13afd](https://github.com/imartinez/privateGPT/commit/aa13afde07122f2ddda3942f630e5cadc7e4e1ee))
### Bug Fixes
* Adding an LLM param to fix broken generator from llamacpp ([#1519](https://github.com/imartinez/privateGPT/issues/1519)) ([869233f](https://github.com/imartinez/privateGPT/commit/869233f0e4f03dc23e5fae43cf7cb55350afdee9))
* **deploy:** fix local and external dockerfiles ([fde2b94](https://github.com/imartinez/privateGPT/commit/fde2b942bc03688701ed563be6d7d597c75e4e4e))
* **docker:** docker broken copy ([#1419](https://github.com/imartinez/privateGPT/issues/1419)) ([059f358](https://github.com/imartinez/privateGPT/commit/059f35840adbc3fb93d847d6decf6da32d08670c))
* **docs:** Update quickstart doc and set version in pyproject.toml to 0.2.0 ([0a89d76](https://github.com/imartinez/privateGPT/commit/0a89d76cc5ed4371ffe8068858f23dfbb5e8cc37))
* minor bug in chat stream output - python error being serialized ([#1449](https://github.com/imartinez/privateGPT/issues/1449)) ([6191bcd](https://github.com/imartinez/privateGPT/commit/6191bcdbd6e92b6f4d5995967dc196c9348c5954))
* **settings:** correct yaml multiline string ([#1403](https://github.com/imartinez/privateGPT/issues/1403)) ([2564f8d](https://github.com/imartinez/privateGPT/commit/2564f8d2bb8c4332a6a0ab6d722a2ac15006b85f))
* **tests:** load the test settings only when running tests ([d3acd85](https://github.com/imartinez/privateGPT/commit/d3acd85fe34030f8cfd7daf50b30c534087bdf2b))
* **UI:** Updated ui.py. Frees up the CPU to not be bottlenecked. ([24fb80c](https://github.com/imartinez/privateGPT/commit/24fb80ca38f21910fe4fd81505d14960e9ed4faa))
## [0.2.0](https://github.com/imartinez/privateGPT/compare/v0.1.0...v0.2.0) (2023-12-10)
### Features
* **llm:** drop default_system_prompt ([#1385](https://github.com/imartinez/privateGPT/issues/1385)) ([a3ed14c](https://github.com/imartinez/privateGPT/commit/a3ed14c58f77351dbd5f8f2d7868d1642a44f017))
* **ui:** Allows User to Set System Prompt via "Additional Options" in Chat Interface ([#1353](https://github.com/imartinez/privateGPT/issues/1353)) ([145f3ec](https://github.com/imartinez/privateGPT/commit/145f3ec9f41c4def5abf4065a06fb0786e2d992a))
## [0.1.0](https://github.com/imartinez/privateGPT/compare/v0.0.2...v0.1.0) (2023-11-30)

View File

@@ -8,18 +8,9 @@ message: >-
metadata from this file.
type: software
authors:
- given-names: Iván
family-names: Martínez Toro
email: ivanmartit@gmail.com
orcid: 'https://orcid.org/0009-0004-5065-2311'
- family-names: Gallego Vico
given-names: Daniel
email: danielgallegovico@gmail.com
orcid: 'https://orcid.org/0009-0006-8582-4384'
- given-names: Pablo
family-names: Orgaz
email: pabloogc+gh@gmail.com
orcid: 'https://orcid.org/0009-0008-0080-1437'
repository-code: 'https://github.com/imartinez/privateGPT'
- name: Zylon by PrivateGPT
address: hello@zylon.ai
website: 'https://www.zylon.ai/'
repository-code: 'https://github.com/zylon-ai/private-gpt'
license: Apache-2.0
date-released: '2023-05-02'

View File

@@ -3,8 +3,9 @@ FROM python:3.11.6-slim-bookworm as base
# Install poetry
RUN pip install pipx
RUN python3 -m pipx ensurepath
RUN pipx install poetry
RUN pipx install poetry==1.8.3
ENV PATH="/root/.local/bin:$PATH"
ENV PATH=".venv/bin/:$PATH"
# https://python-poetry.org/docs/configuration/#virtualenvsin-project
ENV POETRY_VIRTUALENVS_IN_PROJECT=true
@@ -13,24 +14,38 @@ FROM base as dependencies
WORKDIR /home/worker/app
COPY pyproject.toml poetry.lock ./
RUN poetry install --with ui
ARG POETRY_EXTRAS="ui vector-stores-qdrant llms-ollama embeddings-ollama"
RUN poetry install --no-root --extras "${POETRY_EXTRAS}"
FROM base as app
ENV PYTHONUNBUFFERED=1
ENV PORT=8080
ENV APP_ENV=prod
ENV PYTHONPATH="$PYTHONPATH:/home/worker/app/private_gpt/"
EXPOSE 8080
# Prepare a non-root user
RUN adduser --system worker
# More info about how to configure UIDs and GIDs in Docker:
# https://github.com/systemd/systemd/blob/main/docs/UIDS-GIDS.md
# Define the User ID (UID) for the non-root user
# UID 100 is chosen to avoid conflicts with existing system users
ARG UID=100
# Define the Group ID (GID) for the non-root user
# GID 65534 is often used for the 'nogroup' or 'nobody' group
ARG GID=65534
RUN adduser --system --gid ${GID} --uid ${UID} --home /home/worker worker
WORKDIR /home/worker/app
RUN mkdir local_data; chown worker local_data
RUN mkdir models; chown worker models
RUN chown worker /home/worker/app
RUN mkdir local_data && chown worker local_data
RUN mkdir models && chown worker models
COPY --chown=worker --from=dependencies /home/worker/app/.venv/ .venv
COPY --chown=worker private_gpt/ private_gpt
COPY --chown=worker docs/ docs
COPY --chown=worker *.yaml *.md ./
COPY --chown=worker *.yaml .
COPY --chown=worker scripts/ scripts
USER worker
ENTRYPOINT .venv/bin/python -m private_gpt
ENTRYPOINT python -m private_gpt

View File

@@ -5,8 +5,9 @@ FROM python:3.11.6-slim-bookworm as base
# Install poetry
RUN pip install pipx
RUN python3 -m pipx ensurepath
RUN pipx install poetry
RUN pipx install poetry==1.8.3
ENV PATH="/root/.local/bin:$PATH"
ENV PATH=".venv/bin/:$PATH"
# Dependencies to build llama-cpp
RUN apt update && apt install -y \
@@ -23,25 +24,39 @@ FROM base as dependencies
WORKDIR /home/worker/app
COPY pyproject.toml poetry.lock ./
RUN poetry install --with local
RUN poetry install --with ui
ARG POETRY_EXTRAS="ui embeddings-huggingface llms-llama-cpp vector-stores-qdrant"
RUN poetry install --no-root --extras "${POETRY_EXTRAS}"
FROM base as app
ENV PYTHONUNBUFFERED=1
ENV PORT=8080
ENV APP_ENV=prod
ENV PYTHONPATH="$PYTHONPATH:/home/worker/app/private_gpt/"
EXPOSE 8080
# Prepare a non-root user
RUN adduser --system worker
# More info about how to configure UIDs and GIDs in Docker:
# https://github.com/systemd/systemd/blob/main/docs/UIDS-GIDS.md
# Define the User ID (UID) for the non-root user
# UID 100 is chosen to avoid conflicts with existing system users
ARG UID=100
# Define the Group ID (GID) for the non-root user
# GID 65534 is often used for the 'nogroup' or 'nobody' group
ARG GID=65534
RUN adduser --system --gid ${GID} --uid ${UID} --home /home/worker worker
WORKDIR /home/worker/app
RUN mkdir local_data; chown worker local_data
RUN mkdir models; chown worker models
RUN chown worker /home/worker/app
RUN mkdir local_data && chown worker local_data
RUN mkdir models && chown worker models
COPY --chown=worker --from=dependencies /home/worker/app/.venv/ .venv
COPY --chown=worker private_gpt/ private_gpt
COPY --chown=worker docs/ docs
COPY --chown=worker *.yaml *.md ./
COPY --chown=worker *.yaml ./
COPY --chown=worker scripts/ scripts
USER worker
ENTRYPOINT .venv/bin/python -m private_gpt
ENTRYPOINT python -m private_gpt

View File

@@ -51,5 +51,28 @@ api-docs:
ingest:
@poetry run python scripts/ingest_folder.py $(call args)
stats:
poetry run python scripts/utils.py stats
wipe:
poetry run python scripts/utils.py wipe
poetry run python scripts/utils.py wipe
setup:
poetry run python scripts/setup
list:
@echo "Available commands:"
@echo " test : Run tests using pytest"
@echo " test-coverage : Run tests with coverage report"
@echo " black : Check code format with black"
@echo " ruff : Check code with ruff"
@echo " format : Format code with black and ruff"
@echo " mypy : Run mypy for type checking"
@echo " check : Run format and mypy commands"
@echo " run : Run the application"
@echo " dev-windows : Run the application in development mode on Windows"
@echo " dev : Run the application in development mode"
@echo " api-docs : Generate API documentation"
@echo " ingest : Ingest data using specified script"
@echo " wipe : Wipe data using specified script"
@echo " setup : Setup the application"

View File

@@ -1,15 +1,9 @@
# 🔒 PrivateGPT 📑
[![Tests](https://github.com/imartinez/privateGPT/actions/workflows/tests.yml/badge.svg)](https://github.com/imartinez/privateGPT/actions/workflows/tests.yml?query=branch%3Amain)
[![Tests](https://github.com/zylon-ai/private-gpt/actions/workflows/tests.yml/badge.svg)](https://github.com/zylon-ai/private-gpt/actions/workflows/tests.yml?query=branch%3Amain)
[![Website](https://img.shields.io/website?up_message=check%20it&down_message=down&url=https%3A%2F%2Fdocs.privategpt.dev%2F&label=Documentation)](https://docs.privategpt.dev/)
[![Discord](https://img.shields.io/discord/1164200432894234644?logo=discord&label=PrivateGPT)](https://discord.gg/bK6mRVpErU)
[![X (formerly Twitter) Follow](https://img.shields.io/twitter/follow/PrivateGPT_AI)](https://twitter.com/PrivateGPT_AI)
> Install & usage docs: https://docs.privategpt.dev/
>
> Join the community: [Twitter](https://twitter.com/PrivateGPT_AI) & [Discord](https://discord.gg/bK6mRVpErU)
[![X (formerly Twitter) Follow](https://img.shields.io/twitter/follow/ZylonPrivateGPT)](https://twitter.com/ZylonPrivateGPT)
![Gradio UI](/fern/docs/assets/ui.png?raw=true)
@@ -17,6 +11,12 @@ PrivateGPT is a production-ready AI project that allows you to ask questions abo
of Large Language Models (LLMs), even in scenarios without an Internet connection. 100% private, no data leaves your
execution environment at any point.
>[!TIP]
> If you are looking for an **enterprise-ready, fully private AI workspace**
> check out [Zylon's website](https://zylon.ai) or [request a demo](https://cal.com/zylon/demo?source=pgpt-readme).
> Crafted by the team behind PrivateGPT, Zylon is a best-in-class AI collaborative
> workspace that can be easily deployed on-premise (data center, bare metal...) or in your private cloud (AWS, GCP, Azure...).
The project provides an API offering all the primitives required to build private, context-aware AI applications.
It follows and extends the [OpenAI API standard](https://openai.com/blog/openai-api),
and supports both normal and streaming responses.
@@ -38,13 +38,10 @@ In addition to this, a working [Gradio UI](https://www.gradio.app/)
client is provided to test the API, together with a set of useful tools such as bulk model
download script, ingestion script, documents folder watch, etc.
> 👂 **Need help applying PrivateGPT to your specific use case?**
> [Let us know more about it](https://forms.gle/4cSDmH13RZBHV9at7)
> and we'll try to help! We are refining PrivateGPT through your feedback.
## 🎞️ Overview
DISCLAIMER: This README is not updated as frequently as the [documentation](https://docs.privategpt.dev/).
Please check it out for the latest updates!
>[!WARNING]
> This README is not updated as frequently as the [documentation](https://docs.privategpt.dev/).
> Please check it out for the latest updates!
### Motivation behind PrivateGPT
Generative AI is a game changer for our society, but adoption in companies of all sizes and data-sensitive
@@ -62,7 +59,7 @@ thus a simpler and more educational implementation to understand the basic conce
to build a fully local -and therefore, private- chatGPT-like tool.
If you want to keep experimenting with it, we have saved it in the
[primordial branch](https://github.com/imartinez/privateGPT/tree/primordial) of the project.
[primordial branch](https://github.com/zylon-ai/private-gpt/tree/primordial) of the project.
> It is strongly recommended to do a clean clone and install of this new version of
PrivateGPT if you come from the previous, primordial version.
@@ -73,7 +70,7 @@ completions, document ingestion, RAG pipelines and other low-level building bloc
We want to make it easier for any developer to build AI applications and experiences, as well as provide
a suitable extensive architecture for the community to keep contributing.
Stay tuned to our [releases](https://github.com/imartinez/privateGPT/releases) to check out all the new features and changes included.
Stay tuned to our [releases](https://github.com/zylon-ai/private-gpt/releases) to check out all the new features and changes included.
## 📄 Documentation
Full documentation on installation, dependencies, configuration, running the server, deployment options,
@@ -117,7 +114,7 @@ Don't know what to contribute? Here is the public
[Project Board](https://github.com/users/imartinez/projects/3) with several ideas.
Head over to Discord
#contributors channel and ask for write permissions on that Github project.
#contributors channel and ask for write permissions on that GitHub project.
## 💬 Community
Join the conversation around PrivateGPT on our:
@@ -132,19 +129,19 @@ Here are a couple of examples:
#### BibTeX
```bibtex
@software{Martinez_Toro_PrivateGPT_2023,
author = {Martínez Toro, Iván and Gallego Vico, Daniel and Orgaz, Pablo},
@software{Zylon_PrivateGPT_2023,
author = {Zylon by PrivateGPT},
license = {Apache-2.0},
month = may,
title = {{PrivateGPT}},
url = {https://github.com/imartinez/privateGPT},
url = {https://github.com/zylon-ai/private-gpt},
year = {2023}
}
```
#### APA
```
Martínez Toro, I., Gallego Vico, D., & Orgaz, P. (2023). PrivateGPT [Computer software]. https://github.com/imartinez/privateGPT
Zylon by PrivateGPT (2023). PrivateGPT [Computer software]. https://github.com/zylon-ai/private-gpt
```
## 🤗 Partners & Supporters
@@ -158,4 +155,4 @@ This project has been strongly influenced and supported by other amazing project
[GPT4All](https://github.com/nomic-ai/gpt4all),
[LlamaCpp](https://github.com/ggerganov/llama.cpp),
[Chroma](https://www.trychroma.com/)
and [SentenceTransformers](https://www.sbert.net/).
and [SentenceTransformers](https://www.sbert.net/).

View File

@@ -1,14 +1,19 @@
services:
private-gpt:
build:
dockerfile: Dockerfile.local
dockerfile: Dockerfile.external
volumes:
- ./local_data/:/home/worker/app/local_data
- ./models/:/home/worker/app/models
ports:
- 8001:8080
- 8001:8001
environment:
PORT: 8080
PORT: 8001
PGPT_PROFILES: docker
PGPT_MODE: local
PGPT_MODE: ollama
PGPT_EMBED_MODE: ollama
ollama:
image: ollama/ollama:latest
ports:
- 11434:11434
volumes:
- ./models:/root/.ollama

View File

@@ -1,474 +0,0 @@
## Introduction
PrivateGPT provides an **API** containing all the building blocks required to build
**private, context-aware AI applications**. The API follows and extends the OpenAI API standard, and supports
both normal and streaming responses.
The API is divided into two logical blocks:
- High-level API, abstracting all the complexity of a RAG (Retrieval Augmented Generation) pipeline implementation:
- Ingestion of documents: internally managing document parsing, splitting, metadata extraction,
embedding generation and storage.
- Chat & Completions using context from ingested documents: abstracting the retrieval of context, the prompt
engineering and the response generation.
- Low-level API, allowing advanced users to implement their own complex pipelines:
- Embeddings generation: based on a piece of text.
- Contextual chunks retrieval: given a query, returns the most relevant chunks of text from the ingested
documents.
> A working **Gradio UI client** is provided to test the API, together with a set of
> useful tools such as bulk model download script, ingestion script, documents folder
> watch, etc.
## Quick Local Installation steps
The steps in the `Installation and Settings` section are explained in more detail and cover more
setup scenarios. If you are looking for a quick setup guide, here it is:
```
# Clone the repo
git clone https://github.com/imartinez/privateGPT
cd privateGPT
# Install Python 3.11
pyenv install 3.11
pyenv local 3.11
# Install dependencies
poetry install --with ui,local
# Download Embedding and LLM models
poetry run python scripts/setup
# (Optional) For Mac with Metal GPU, enable it. Check the Installation and Settings section
# to know how to enable GPU on other platforms
CMAKE_ARGS="-DLLAMA_METAL=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
# Run the local server
PGPT_PROFILES=local make run
# Note: on Mac with Metal you should see a ggml_metal_add_buffer log, stating that the GPU is
# being used
# Navigate to the UI and try it out!
# http://localhost:8001/
```
## Installation and Settings
### Base requirements to run PrivateGPT
* Clone the PrivateGPT repository and navigate to it:
```
git clone https://github.com/imartinez/privateGPT
cd privateGPT
```
* Install Python 3.11, ideally through a Python version manager like `pyenv`.
Python 3.12
should work too. Earlier Python versions are not supported.
* osx/linux: [pyenv](https://github.com/pyenv/pyenv)
* windows: [pyenv-win](https://github.com/pyenv-win/pyenv-win)
```
pyenv install 3.11
pyenv local 3.11
```
* Install [Poetry](https://python-poetry.org/docs/#installing-with-the-official-installer) for dependency management.
* Have a valid C++ compiler like gcc. See [Troubleshooting: C++ Compiler](#troubleshooting-c-compiler) for more details.
* Install `make` for scripts:
* osx: (Using homebrew): `brew install make`
* windows: (Using chocolatey) `choco install make`
### Install dependencies
Install the dependencies:
```bash
poetry install --with ui
```
Verify everything is working by running `make run` (or `poetry run python -m private_gpt`) and navigating to
http://localhost:8001. You should see a [Gradio UI](https://gradio.app/) **configured with a mock LLM** that will
echo back the input. Later we'll see how to configure a real LLM.
### Settings
> Note: the default settings of PrivateGPT work out-of-the-box for a 100% local setup. Skip this section if you just
> want to test PrivateGPT locally, and come back later to learn about more configuration options.
PrivateGPT is configured through *profiles* that are defined using YAML files and selected through environment variables.
The full list of configurable properties can be found in `settings.yaml`.
#### env var `PGPT_SETTINGS_FOLDER`
The location of the settings folder. Defaults to the root of the project.
Should contain the default `settings.yaml` and any other `settings-{profile}.yaml`.
#### env var `PGPT_PROFILES`
By default, the profile definition in `settings.yaml` is loaded.
Using this env var you can load additional profiles; the format is a comma-separated list of profile names.
This will merge `settings-{profile}.yaml` on top of the base settings file.
For example:
`PGPT_PROFILES=local,cuda` will load `settings-local.yaml`
and `settings-cuda.yaml`. Their contents will be merged, with properties from
later profiles overriding values from earlier ones, including `settings.yaml`.
During testing, the `test` profile will be active along with the default, so the `settings-test.yaml`
file is required.
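The merge order described above can be sketched in Python. This is only an illustration, not PrivateGPT's actual loader: the function names `active_settings_files` and `merge_settings` are hypothetical, and the recursive merging of nested keys is an assumption about how "merged on top" behaves.
```python
from typing import Any


def active_settings_files(profiles_env: str) -> list[str]:
    """Return the settings files to load, in merge order, for a PGPT_PROFILES value."""
    profiles = [p.strip() for p in profiles_env.split(",") if p.strip()]
    # The base file always comes first; each profile is layered on top of it.
    return ["settings.yaml"] + [f"settings-{p}.yaml" for p in profiles]


def merge_settings(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Merge `override` on top of `base` and return a new dict.

    Assumption: nested mappings are merged key by key rather than replaced.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_settings(merged[key], value)
        else:
            merged[key] = value
    return merged
```
With `PGPT_PROFILES=local,cuda`, the first helper yields `settings.yaml`, `settings-local.yaml`, `settings-cuda.yaml`, and folding their parsed contents through `merge_settings` in that order lets later profiles win.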
#### Environment variables expansion
Configuration files can contain environment variables,
which are expanded at runtime.
Expansion must follow the pattern `${VARIABLE_NAME:default_value}`.
For example, the following configuration will use the value of the `PORT`
environment variable or `8001` if it's not set.
Missing variables with no default will produce an error.
```yaml
server:
port: ${PORT:8001}
```
### Local LLM requirements
Install extra dependencies for local execution:
```bash
poetry install --with local
```
For PrivateGPT to run fully locally, GPU acceleration is effectively required
(CPU execution is possible, but very slow). However,
typical MacBook laptops or Windows desktops with mid-range GPUs lack the VRAM to run
even the smallest LLMs. For that reason,
**local execution is only supported for models compatible with [llama.cpp](https://github.com/ggerganov/llama.cpp)**.
These two models are known to work well:
* https://huggingface.co/TheBloke/Llama-2-7B-chat-GGUF
* https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF (recommended)
To ease the installation process, use the `setup` script that will download both
the embedding and the LLM model and place them in the correct location (under `models` folder):
```bash
poetry run python scripts/setup
```
If you are ok with CPU execution, you can skip the rest of this section.
As stated before, llama.cpp is required and in
particular [llama-cpp-python](https://github.com/abetlen/llama-cpp-python)
is used.
> It's highly encouraged that you fully read llama-cpp and llama-cpp-python documentation relevant to your platform.
> Running into installation issues is very likely, and you'll need to troubleshoot them yourself.
#### Customizing low level parameters
Currently, not all the parameters of llama-cpp and llama-cpp-python are available in PrivateGPT's `settings.yaml` file. If you need to customize parameters such as the number of layers loaded into the GPU, you can change them in `private_gpt/components/llm/llm_component.py`. If you are getting an out-of-memory error, you might also try a smaller model or stick to the recommended models instead of tuning the parameters yourself.
#### OSX GPU support
You will need to build [llama.cpp](https://github.com/ggerganov/llama.cpp) with
Metal support. To do that, run:
```bash
CMAKE_ARGS="-DLLAMA_METAL=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
```
#### Windows NVIDIA GPU support
Windows GPU support is done through CUDA.
Follow the instructions on the original [llama.cpp](https://github.com/ggerganov/llama.cpp) repo to install the required
dependencies.
Some tips to get it working with an NVIDIA card and CUDA (Tested on Windows 10 with CUDA 11.5 RTX 3070):
* Install latest VS2022 (and build tools) https://visualstudio.microsoft.com/vs/community/
* Install CUDA toolkit https://developer.nvidia.com/cuda-downloads
* Verify your installation is correct by running `nvcc --version` and `nvidia-smi`; ensure your CUDA version is up to
date and your GPU is detected.
* [Optional] Install CMake to troubleshoot building issues by compiling llama.cpp directly https://cmake.org/download/
If you have all required dependencies properly configured, running the
following PowerShell command should succeed:
```powershell
$env:CMAKE_ARGS='-DLLAMA_CUBLAS=on'; poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
```
If your installation was correct, the next time you start the server you should see `BLAS = 1`
in a message similar to the following:
```
llama_new_context_with_model: total VRAM used: 4857.93 MB (model: 4095.05 MB, context: 762.87 MB)
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
```
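If you capture the server output, the `BLAS` flag can also be checked programmatically. The sketch below only parses a line in the `KEY = value | KEY = value` shape shown above; the function name is illustrative and the log format may differ between llama.cpp versions.

```python
def parse_system_info(line: str) -> dict[str, int]:
    """Parse a 'KEY = 0 | KEY = 1 | ...' line into a dict of integer flags."""
    flags: dict[str, int] = {}
    for part in line.split("|"):
        if "=" not in part:
            continue  # skip the empty fragment after the trailing '|'
        name, _, value = part.partition("=")
        flags[name.strip()] = int(value.strip())
    return flags


# The sample line from the log excerpt above.
sample = (
    "AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | "
    "FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | "
    "BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | "
)
flags = parse_system_info(sample)
gpu_build_detected = flags.get("BLAS") == 1
```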
Note that llama.cpp offloads matrix calculations to the GPU but the performance is
still hit heavily due to latency between CPU and GPU communication. You might need to tweak
batch sizes and other parameters to get the best performance for your particular system.
#### Linux NVIDIA GPU support and Windows-WSL
Linux GPU support is done through CUDA.
Follow the instructions on the original [llama.cpp](https://github.com/ggerganov/llama.cpp) repo to install the required
external dependencies.
Some tips:
* Make sure you have an up-to-date C++ compiler
* Install CUDA toolkit https://developer.nvidia.com/cuda-downloads
* Verify your installation is correct by running `nvcc --version` and `nvidia-smi`; ensure your CUDA version is up to
date and your GPU is detected.
After that running the following command in the repository will install llama.cpp with GPU support:
```bash
CMAKE_ARGS='-DLLAMA_CUBLAS=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
```
If your installation was correct, the next time you start the server you should see `BLAS = 1`
in a message similar to the following:
```
llama_new_context_with_model: total VRAM used: 4857.93 MB (model: 4095.05 MB, context: 762.87 MB)
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
```
#### Vectorstores
PrivateGPT supports [Chroma](https://www.trychroma.com/) and [Qdrant](https://qdrant.tech/) as vectorstore providers, Chroma being the default.
To enable Qdrant, set the `vectorstore.database` property in the `settings.yaml` file to `qdrant` and install the `qdrant` extra.
```bash
poetry install --extras qdrant
```
By default Qdrant tries to connect to an instance at `http://localhost:3000`.
Qdrant settings can be configured by setting values to the `qdrant` property in the `settings.yaml` file.
The available configuration options are:
| Field | Description |
|--------------|-------------|
| location | If `:memory:` - use in-memory Qdrant instance.<br>If `str` - use it as a `url` parameter.|
| url | Either host or str of 'Optional[scheme], host, Optional[port], Optional[prefix]'.<br> Eg. `http://localhost:6333` |
| port | Port of the REST API interface. Default: `6333` |
| grpc_port | Port of the gRPC interface. Default: `6334` |
| prefer_grpc | If `true` - use gRPC interface whenever possible in custom methods. |
| https | If `true` - use HTTPS(SSL) protocol.|
| api_key | API key for authentication in Qdrant Cloud.|
| prefix | If set, add `prefix` to the REST URL path.<br>Example: `service/v1` will result in `http://localhost:6333/service/v1/{qdrant-endpoint}` for REST API.|
| timeout | Timeout for REST and gRPC API requests.<br>Default: 5.0 seconds for REST and unlimited for gRPC |
| host | Host name of Qdrant service. If url and host are not set, defaults to 'localhost'.|
| path | Persistence path for QdrantLocal. Eg. `local_data/private_gpt/qdrant`|
| force_disable_check_same_thread | Force disable check_same_thread for QdrantLocal sqlite connection.|
#### Known issues and Troubleshooting
Running LLMs locally still has a lot of sharp edges, especially on non-Linux platforms.
You might encounter several issues:
* Performance: RAM or VRAM usage is very high, so your computer might experience slowdowns or even crashes.
* GPU virtualization on Windows and OSX: this is simply not possible with Docker Desktop; you have to run the server directly on
the host.
* Building errors: some of PrivateGPT's dependencies need to build native code, and they might fail on some platforms.
Most likely you are missing some dev tools on your machine (an up-to-date C++ compiler, CUDA not being on PATH, etc.).
If you encounter any of these issues, please open an issue and we'll try to help.
#### Troubleshooting: C++ Compiler
If you encounter an error while building a wheel during the `pip install` process, you may need to install a C++
compiler on your computer.
**For Windows 10/11**
To install a C++ compiler on Windows 10/11, follow these steps:
1. Install Visual Studio 2022.
2. Make sure the following components are selected:
* Universal Windows Platform development
* C++ CMake tools for Windows
3. Download the MinGW installer from the [MinGW website](https://sourceforge.net/projects/mingw/).
4. Run the installer and select the `gcc` component.
**For OSX**
1. Check if you have a C++ compiler installed, for example by running `gcc`. Xcode might have installed one for you.
2. If not, you can install clang or gcc with Homebrew: `brew install gcc`
#### Troubleshooting: Mac Running Intel
When running a Mac with Intel hardware (not M1), you may run into _clang: error: the clang compiler does not support '-march=native'_ during pip install.
If so, set your archflags during pip install, e.g. `ARCHFLAGS="-arch x86_64" pip3 install -r requirements.txt`
## Running the Server
After following the installation steps you should be ready to go. Here are some common run setups:
### Running 100% locally
Make sure you have followed the *Local LLM requirements* section before moving on.
This command will start PrivateGPT using the `settings.yaml` (default profile) together with the `settings-local.yaml`
configuration files. By default, it will enable both the API and the Gradio UI. Run:
```
PGPT_PROFILES=local make run
```
or
```
PGPT_PROFILES=local poetry run python -m private_gpt
```
When the server is started it will print a log *Application startup complete*.
Navigate to http://localhost:8001 to use the Gradio UI or to http://localhost:8001/docs (API section) to try the API
using Swagger UI.
### Local server using OpenAI as LLM
If you cannot run a local model (because you don't have a GPU, for example) or for testing purposes, you may
decide to run PrivateGPT using OpenAI as the LLM.
In order to do so, create a profile `settings-openai.yaml` with the following contents:
```yaml
llm:
  mode: openai
openai:
  api_key: <your_openai_api_key> # You could skip this configuration and use the OPENAI_API_KEY env var instead
```
And run PrivateGPT loading that profile you just created:
```PGPT_PROFILES=openai make run```
or
```PGPT_PROFILES=openai poetry run python -m private_gpt```
> Note this will still use the local Embeddings model, as it is ok to use it on a CPU.
> We'll support using OpenAI embeddings in a future release.
When the server is started it will print a log *Application startup complete*.
Navigate to http://localhost:8001 to use the Gradio UI or to http://localhost:8001/docs (API section) to try the API.
You'll notice that the speed and quality of the responses are higher, given you are using OpenAI's servers for the heavy
computations.
### Use AWS's Sagemaker
🚧 Under construction 🚧
## Gradio UI user manual
The Gradio UI is a ready-to-use way of testing most of PrivateGPT's API functionality.
![Gradio PrivateGPT](https://lh3.googleusercontent.com/drive-viewer/AK7aPaD_Hc-A8A9ooMe-hPgm_eImgsbxAjb__8nFYj8b_WwzvL1Gy90oAnp1DfhPaN6yGiEHCOXs0r77W1bYHtPzlVwbV7fMsA=s1600)
### Execution Modes
It has 3 modes of execution (you can select in the top-left):
* Query Docs: uses the context from the
  ingested documents to answer the questions posted in the chat. It also takes
  into account previous chat messages as context.
    * Makes use of `/chat/completions` API with `use_context=true` and no
      `context_filter`.
* Search in Docs: fast search that returns the 4 most related text
  chunks, together with their source document and page.
    * Makes use of `/chunks` API with no `context_filter`, `limit=4` and
      `prev_next_chunks=0`.
* LLM Chat: simple, non-contextual chat with the LLM. The ingested documents won't
  be taken into account, only the previous messages.
    * Makes use of `/chat/completions` API with `use_context=false`.
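The mapping from UI mode to API call can be sketched as follows. The endpoint paths and the `use_context`, `limit` and `prev_next_chunks` parameters are taken from the list above; the `messages` and `text` body fields, the JSON body layout, and the absence of a version prefix in the path are assumptions, so check the API reference of your running server before relying on them.

```python
import json
from urllib import request

BASE_URL = "http://localhost:8001"  # default address used throughout this document


def build_request(mode: str, text: str) -> tuple[str, dict]:
    """Return the (path, JSON body) that corresponds to a Gradio UI mode."""
    chat = {"messages": [{"role": "user", "content": text}]}  # assumed body shape
    if mode == "Query Docs":
        return "/chat/completions", {**chat, "use_context": True}
    if mode == "Search in Docs":
        # `text` as the query field name is an assumption.
        return "/chunks", {"text": text, "limit": 4, "prev_next_chunks": 0}
    if mode == "LLM Chat":
        return "/chat/completions", {**chat, "use_context": False}
    raise ValueError(f"Unknown mode: {mode}")


def send(mode: str, text: str) -> dict:
    """POST the request to a running server and return the decoded JSON reply."""
    path, body = build_request(mode, text)
    req = request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as response:
        return json.load(response)
```

Only `build_request` is pure; `send` needs a server listening on `BASE_URL`.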
### Document Ingestion
Ingest documents by using the `Upload a File` button. You can check the progress of
the ingestion in the console logs of the server.
The list of ingested files is shown below the button.
If you want to delete the ingested documents, refer to *Reset Local documents
database* section in the documentation.
### Chat
Normal chat interface, self-explanatory ;)
You can check the actual prompt being passed to the LLM by looking at the logs of
the server. We'll add better observability in future releases.
## Deployment options
🚧 We are working on Dockerized deployment guidelines 🚧
## Observability
Basic logs are enabled using LlamaIndex
basic logging (for example ingestion progress or LLM prompts and answers).
🚧 We are working on improved Observability. 🚧
## Ingesting & Managing Documents
🚧 Document Update and Delete are still WIP. 🚧
The ingestion of documents can be done in different ways:
* Using the `/ingest` API
* Using the Gradio UI
* Using the Bulk Local Ingestion functionality (check next section)
### Bulk Local Ingestion
When you are running PrivateGPT in a fully local setup, you can ingest a complete folder for convenience (containing
pdf, text files, etc.) and optionally watch changes on it with the command:
```bash
make ingest /path/to/folder -- --watch
```
To log the processed and failed files to an additional file, use:
```bash
make ingest /path/to/folder -- --watch --log-file /path/to/log/file.log
```
After ingestion is complete, you should be able to chat with your documents
by navigating to http://localhost:8001 and using the `Query Docs` mode,
or by using the completions / chat API.
### Reset Local documents database
When running in a local setup, you can remove all ingested documents by simply
deleting all contents of `local_data` folder (except .gitignore).
To simplify this process, you can use the command:
```bash
make wipe
```
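For reference, the manual procedure described above (delete everything in `local_data` except `.gitignore`) could be scripted roughly as below. This is a sketch of that description, not the actual implementation behind `make wipe`, and the function name is illustrative.

```python
import shutil
from pathlib import Path


def wipe_local_data(folder: Path) -> None:
    """Delete every entry inside `folder` except its .gitignore file."""
    for entry in folder.iterdir():
        if entry.name == ".gitignore":
            continue  # keep the placeholder that tracks the folder in git
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove sub-folders recursively
        else:
            entry.unlink()  # remove regular files and symlinks
```

A call such as `wipe_local_data(Path("local_data"))` from the repository root would then reset the local documents database; stop the server first so that no files are in use.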
## API
As explained in the introduction, the API contains high level APIs (ingestion and chat/completions) and low level APIs
(embeddings and chunk retrieval). In this section the different specific API calls are explained.

View File

@@ -1,22 +0,0 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>PrivateGPT Docs</title>
<!-- needed for adaptive design -->
<meta name="viewport" content="width=device-width, initial-scale=1">
<link href="https://fonts.googleapis.com/css?family=Montserrat:300,400,700|Roboto:300,400,700" rel="stylesheet">
<link rel="shortcut icon" href="https://fastapi.tiangolo.com/img/favicon.png">
<!-- ReDoc doesn't change outer page styles -->
<style>
body {
margin: 0;
padding: 0;
}
</style>
</head>
<body>
<noscript> ReDoc requires Javascript to function. Please enable it to browse the documentation. </noscript>
<redoc spec-url="/openapi.json"></redoc>
<script src="https://cdn.jsdelivr.net/npm/redoc@next/bundles/redoc.standalone.js"></script>
</body>

View File

@@ -1,4 +1,4 @@
# Documentation of privateGPT
# Documentation of PrivateGPT
The documentation of this project is being rendered thanks to [fern](https://github.com/fern-api/fern).

View File

@@ -30,18 +30,20 @@ navigation:
layout:
- section: Welcome
contents:
- page: Welcome
- page: Introduction
path: ./docs/pages/overview/welcome.mdx
- page: Quickstart
path: ./docs/pages/overview/quickstart.mdx
# How to install privateGPT, with FAQ and troubleshooting
# How to install PrivateGPT, with FAQ and troubleshooting
- tab: installation
layout:
- section: Getting started
contents:
- page: Main Concepts
path: ./docs/pages/installation/concepts.mdx
- page: Installation
path: ./docs/pages/installation/installation.mdx
# Manual of privateGPT: how to use it and configure it
- page: Troubleshooting
path: ./docs/pages/installation/troubleshooting.mdx
# Manual of PrivateGPT: how to use it and configure it
- tab: manual
layout:
- section: General configuration
@@ -58,23 +60,31 @@ navigation:
contents:
- page: Vector Stores
path: ./docs/pages/manual/vectordb.mdx
- page: Node Stores
path: ./docs/pages/manual/nodestore.mdx
- section: Advanced Setup
contents:
- page: LLM Backends
path: ./docs/pages/manual/llms.mdx
- page: Reranking
path: ./docs/pages/manual/reranker.mdx
- section: User Interface
contents:
- page: User interface (Gradio) Manual
path: ./docs/pages/manual/ui.mdx
# Small code snippet or example of usage to help users
- page: Gradio Manual
path: ./docs/pages/ui/gradio.mdx
- page: Alternatives
path: ./docs/pages/ui/alternatives.mdx
- tab: recipes
layout:
- section: Choice of LLM
- section: Getting started
contents:
# TODO: add recipes
- page: List of LLMs
path: ./docs/pages/recipes/list-llm.mdx
# More advanced usage of privateGPT, by API
- page: Quickstart
path: ./docs/pages/recipes/quickstart.mdx
- section: General use cases
contents:
- page: Summarize
path: ./docs/pages/recipes/summarize.mdx
# More advanced usage of PrivateGPT, by API
- tab: api-reference
layout:
- section: Overview
@@ -88,12 +98,11 @@ navigation:
# Definition of the navbar, will be displayed in the top right corner.
# `type:primary` is always displayed at the most right side of the navbar
navbar-links:
- type: secondary
text: Github
url: "https://github.com/imartinez/privateGPT"
- type: secondary
text: Contact us
url: "mailto:hello@zylon.ai"
- type: github
value: "https://github.com/zylon-ai/private-gpt"
- type: primary
text: Join the Discord
url: https://discord.com/invite/bK6mRVpErU

View File

@@ -1 +1,14 @@
# API Reference
The API is divided in two logical blocks:
1. High-level API, abstracting all the complexity of a RAG (Retrieval Augmented Generation) pipeline implementation:
- Ingestion of documents: internally managing document parsing, splitting, metadata extraction,
embedding generation and storage.
- Chat & Completions using context from ingested documents: abstracting the retrieval of context, the prompt
engineering and the response generation.
2. Low-level API, allowing advanced users to implement their own complex pipelines:
- Embeddings generation: based on a piece of text.
- Contextual chunks retrieval: given a query, returns the most relevant chunks of text from the ingested
documents.

View File

@@ -8,14 +8,14 @@ The clients are kept up to date automatically, so we encourage you to use the la
<Cards>
<Card
title="Node.js/TypeScript"
title="TypeScript"
icon="fa-brands fa-node"
href="https://github.com/imartinez/privateGPT-typescript"
href="https://github.com/zylon-ai/privategpt-ts"
/>
<Card
title="Python"
icon="fa-brands fa-python"
href="https://github.com/imartinez/privateGPT-python"
href="https://github.com/zylon-ai/pgpt-python"
/>
<br />
</Cards>
@@ -24,14 +24,14 @@ The clients are kept up to date automatically, so we encourage you to use the la
<Cards>
<Card
title="Java"
title="Java - WIP"
icon="fa-brands fa-java"
href="https://github.com/imartinez/privateGPT-java"
href="https://github.com/zylon-ai/private-gpt-java"
/>
<Card
title="Go"
title="Go - WIP"
icon="fa-brands fa-golang"
href="https://github.com/imartinez/privateGPT-go"
href="https://github.com/zylon-ai/private-gpt-go"
/>
</Cards>

View File

@@ -0,0 +1,67 @@
PrivateGPT is a service that wraps a set of AI RAG primitives in a comprehensive set of APIs providing a private, secure, customizable and easy to use GenAI development framework.
It uses FastAPI and LlamaIndex as its core frameworks. Those can be customized by changing the codebase itself.
It supports a variety of LLM providers, embeddings providers, and vector stores, both local and remote. Those can be easily changed without changing the codebase.
# Different Setups support
## Setup configurations available
You get to decide the setup for these 3 main components:
- **LLM**: the large language model provider used for inference. It can be local, or remote, or even OpenAI.
- **Embeddings**: the embeddings provider used to encode the input, the documents and the users' queries. Same as the LLM, it can be local, or remote, or even OpenAI.
- **Vector store**: the store used to index and retrieve the documents.
There is an extra component that can be enabled or disabled: the UI. It is a Gradio UI that allows you to interact with the API in a more user-friendly way.
<Callout intent = "warning">
A working **Gradio UI client** is provided to test the API, together with a set of useful tools such as bulk
model download script, ingestion script, documents folder watch, etc. Please refer to the [UI alternatives](/manual/user-interface/alternatives) page for more UI alternatives.
</Callout>
### Setups and Dependencies
Your setup will be the combination of the different options available. You'll find recommended setups in the [installation](./installation) section.
PrivateGPT uses poetry to manage its dependencies. You can install the dependencies for the different setups by running `poetry install --extras "<extra1> <extra2>..."`.
Extras are the different options available for each component. For example, to install the dependencies for a local setup with UI, Qdrant as vector database, Ollama as LLM and Ollama embeddings, you would run:
```bash
poetry install --extras "ui vector-stores-qdrant llms-ollama embeddings-ollama"
```
Refer to the [installation](./installation) section for more details.
### Setups and Configuration
PrivateGPT uses yaml to define its configuration in files named `settings-<profile>.yaml`.
Different configuration files can be created in the root directory of the project.
PrivateGPT will load the configuration at startup from the profile specified in the `PGPT_PROFILES` environment variable.
For example, running:
```bash
PGPT_PROFILES=ollama make run
```
will load the configuration from `settings.yaml` and `settings-ollama.yaml`.
- `settings.yaml` is always loaded and contains the default configuration.
- `settings-ollama.yaml` is loaded if the `ollama` profile is specified in the `PGPT_PROFILES` environment variable. It can override configuration from the default `settings.yaml`
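As an illustration of the naming rule, the snippet below derives the list of settings files from a `PGPT_PROFILES` value. It is a sketch under the assumption that profiles are comma-separated and loaded in the order given; the helper name is hypothetical and not part of PrivateGPT.

```python
def settings_files(profiles_env: str) -> list[str]:
    """List the settings files loaded for a given PGPT_PROFILES value."""
    profiles = [name.strip() for name in profiles_env.split(",") if name.strip()]
    # settings.yaml always comes first; each profile adds settings-<profile>.yaml.
    return ["settings.yaml"] + [f"settings-{name}.yaml" for name in profiles]
```

For example, `settings_files("ollama")` gives `["settings.yaml", "settings-ollama.yaml"]`, and an empty value yields only the default file.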
## About Fully Local Setups
In order to run PrivateGPT in a fully local setup, you will need to run the LLM, Embeddings and Vector Store locally.
### LLM
For local LLM there are two options:
* (Recommended) You can use the 'ollama' option in PrivateGPT, which will connect to your local Ollama instance. Ollama greatly simplifies the installation of local LLMs.
* You can use the 'llms-llama-cpp' option in PrivateGPT, which will use LlamaCPP. It works great on Mac with Metal most of the time (it leverages the Metal GPU), but it can be tricky on certain Linux and Windows distributions, depending on the GPU. In the installation document you'll find guides and troubleshooting.
In order for LlamaCPP powered LLM to work (the second option), you need to download the LLM model to the `models` folder. You can do so by running the `setup` script:
```bash
poetry run python scripts/setup
```
### Embeddings
For local Embeddings there are two options:
* (Recommended) You can use the 'ollama' option in PrivateGPT, which will connect to your local Ollama instance. Ollama greatly simplifies the installation of local models.
* You can use the 'embeddings-huggingface' option in PrivateGPT, which will use HuggingFace.
In order for HuggingFace Embeddings to work (the second option), you need to download the embeddings model to the `models` folder. You can do so by running the `setup` script:
```bash
poetry run python scripts/setup
```
### Vector stores
The vector stores supported (Qdrant, Milvus, ChromaDB and Postgres) run locally by default.

View File

@@ -1,113 +1,275 @@
## Installation and Settings
It is important that you review the [Main Concepts](../concepts) section to understand the different components of PrivateGPT and how they interact with each other.
### Base requirements to run PrivateGPT
* Git clone PrivateGPT repository, and navigate to it:
## Base requirements to run PrivateGPT
### 1. Clone the PrivateGPT Repository
Clone the repository and navigate to it:
```bash
git clone https://github.com/imartinez/privateGPT
cd privateGPT
git clone https://github.com/zylon-ai/private-gpt
cd private-gpt
```
* Install Python `3.11` (*if you do not have it already*). Ideally through a python version manager like `pyenv`.
Python 3.12 should work too. Earlier python versions are not supported.
* osx/linux: [pyenv](https://github.com/pyenv/pyenv)
* windows: [pyenv-win](https://github.com/pyenv-win/pyenv-win)
### 2. Install Python 3.11
If you do not have Python 3.11 installed, install it using a Python version manager like `pyenv`. Earlier Python versions are not supported.
#### macOS/Linux
Install and set Python 3.11 using [pyenv](https://github.com/pyenv/pyenv):
```bash
pyenv install 3.11
pyenv local 3.11
```
#### Windows
Install and set Python 3.11 using [pyenv-win](https://github.com/pyenv-win/pyenv-win):
```bash
pyenv install 3.11
pyenv local 3.11
```
* Install [Poetry](https://python-poetry.org/docs/#installing-with-the-official-installer) for dependency management:
### 3. Install `Poetry`
Install [Poetry](https://python-poetry.org/docs/#installing-with-the-official-installer) for dependency management:
Follow the instructions on the official Poetry website to install it.
* Have a valid C++ compiler like gcc. See [Troubleshooting: C++ Compiler](#troubleshooting-c-compiler) for more details.
* Install `make` for scripts:
* osx: (Using homebrew): `brew install make`
* windows: (Using chocolatey) `choco install make`
### Install dependencies
Install the dependencies:
```bash
poetry install --with ui
```
Verify everything is working by running `make run` (or `poetry run python -m private_gpt`) and navigate to
http://localhost:8001. You should see a [Gradio UI](https://gradio.app/) **configured with a mock LLM** that will
echo back the input. Below we'll see how to configure a real LLM.
### Settings
<Callout intent="info">
The default settings of PrivateGPT should work out-of-the-box for a 100% local setup. **However**, as is, it runs exclusively on your CPU.
Skip this section if you just want to test PrivateGPT locally, and come back later to learn about more configuration options (and have better performances).
<Callout intent="warning">
A bug exists in Poetry versions 1.7.0 and earlier. We strongly recommend upgrading to a tested version.
To upgrade Poetry to latest tested version, run `poetry self update 1.8.3` after installing it.
</Callout>
<br />
### Local LLM requirements
Install extra dependencies for local execution:
### 4. Optional: Install `make`
To run various scripts, you need to install `make`. Follow the instructions for your operating system:
#### macOS
(Using Homebrew):
```bash
poetry install --with local
brew install make
```
#### Windows
(Using Chocolatey):
```bash
choco install make
```
For PrivateGPT to run fully locally GPU acceleration is required
(CPU execution is possible, but very slow), however,
typical Macbook laptops or window desktops with mid-range GPUs lack VRAM to run
even the smallest LLMs. For that reason
**local execution is only supported for models compatible with [llama.cpp](https://github.com/ggerganov/llama.cpp)**
## Install and Run Your Desired Setup
These two models are known to work well:
PrivateGPT allows customization of the setup, from fully local to cloud-based, by deciding the modules to use. To install only the required dependencies, PrivateGPT offers different `extras` that can be combined during the installation process:
* https://huggingface.co/TheBloke/Llama-2-7B-chat-GGUF
* https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF (recommended)
```bash
poetry install --extras "<extra1> <extra2>..."
```
Where `<extra>` can be any of the following options described below.
To ease the installation process, use the `setup` script that will download both
the embedding and the LLM model and place them in the correct location (under `models` folder):
### Available Modules
You need to choose one option per category (LLM, Embeddings, Vector Stores, UI). Below are the tables listing the available options for each category.
#### LLM
| **Option** | **Description** | **Extra** |
|--------------|------------------------------------------------------------------------|---------------------|
| **ollama** | Adds support for Ollama LLM, requires Ollama running locally | llms-ollama |
| llama-cpp | Adds support for local LLM using LlamaCPP | llms-llama-cpp |
| sagemaker | Adds support for Amazon Sagemaker LLM, requires Sagemaker endpoints | llms-sagemaker |
| openai | Adds support for OpenAI LLM, requires OpenAI API key | llms-openai |
| openailike | Adds support for 3rd party LLM providers compatible with OpenAI's API | llms-openai-like |
| azopenai | Adds support for Azure OpenAI LLM, requires Azure endpoints | llms-azopenai |
| gemini | Adds support for Gemini LLM, requires Gemini API key | llms-gemini |
#### Embeddings
| **Option** | **Description** | **Extra** |
|------------------|--------------------------------------------------------------------------------|-------------------------|
| **ollama** | Adds support for Ollama Embeddings, requires Ollama running locally | embeddings-ollama |
| huggingface | Adds support for local Embeddings using HuggingFace | embeddings-huggingface |
| openai | Adds support for OpenAI Embeddings, requires OpenAI API key | embeddings-openai |
| sagemaker | Adds support for Amazon Sagemaker Embeddings, requires Sagemaker endpoints | embeddings-sagemaker |
| azopenai | Adds support for Azure OpenAI Embeddings, requires Azure endpoints | embeddings-azopenai |
| gemini | Adds support for Gemini Embeddings, requires Gemini API key | embeddings-gemini |
#### Vector Stores
| **Option** | **Description** | **Extra** |
|------------------|-----------------------------------------|-------------------------|
| **qdrant** | Adds support for Qdrant vector store | vector-stores-qdrant |
| milvus | Adds support for Milvus vector store | vector-stores-milvus |
| chroma | Adds support for Chroma DB vector store | vector-stores-chroma |
| postgres | Adds support for Postgres vector store | vector-stores-postgres |
| clickhouse | Adds support for Clickhouse vector store| vector-stores-clickhouse|
#### UI
| **Option** | **Description** | **Extra** |
|--------------|------------------------------------------|-----------|
| Gradio | Adds support for UI using Gradio | ui |
<Callout intent = "warning">
A working **Gradio UI client** is provided to test the API, together with a set of useful tools such as bulk
model download script, ingestion script, documents folder watch, etc. Please refer to the [UI alternatives](/manual/user-interface/alternatives) page for more UI alternatives.
</Callout>
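Since one extra is picked per category, the install command can be assembled mechanically. The helper below is a hypothetical sketch that validates the chosen extras against the names listed in the tables above and builds the `poetry install --extras` command string; it does not run anything.

```python
# Extra names copied from the tables above.
LLM_EXTRAS = {
    "llms-ollama", "llms-llama-cpp", "llms-sagemaker", "llms-openai",
    "llms-openai-like", "llms-azopenai", "llms-gemini",
}
EMBEDDINGS_EXTRAS = {
    "embeddings-ollama", "embeddings-huggingface", "embeddings-openai",
    "embeddings-sagemaker", "embeddings-azopenai", "embeddings-gemini",
}
VECTOR_STORE_EXTRAS = {
    "vector-stores-qdrant", "vector-stores-milvus", "vector-stores-chroma",
    "vector-stores-postgres", "vector-stores-clickhouse",
}


def poetry_install_command(llm: str, embeddings: str, vector_store: str, ui: bool = True) -> str:
    """Build the install command for one extra per category."""
    for extra, allowed in (
        (llm, LLM_EXTRAS),
        (embeddings, EMBEDDINGS_EXTRAS),
        (vector_store, VECTOR_STORE_EXTRAS),
    ):
        if extra not in allowed:
            raise ValueError(f"Unknown extra: {extra}")
    extras = (["ui"] if ui else []) + [llm, embeddings, vector_store]
    return f'poetry install --extras "{" ".join(extras)}"'
```

Calling it with `llms-ollama`, `embeddings-ollama` and `vector-stores-qdrant` reproduces the command used for the Ollama-powered setup.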
## Recommended Setups
These are just some examples of recommended setups. You can mix and match the different options to fit your needs.
You'll find more information in the Manual section of the documentation.
> **Important for Windows**: In the examples below, and whenever you run PrivateGPT with `make run`, the `PGPT_PROFILES` env var is set inline following Unix command line syntax (works on macOS and Linux).
If you are using Windows, you'll need to set the env var in a different way, for example:
```powershell
# Powershell
$env:PGPT_PROFILES="ollama"
make run
```
or
```cmd
# CMD
set PGPT_PROFILES=ollama
make run
```
Refer to the [troubleshooting](./troubleshooting) section for specific issues you might encounter.
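A shell-independent alternative is to set `PGPT_PROFILES` from a small Python launcher, which behaves the same on Windows, macOS and Linux. This is a sketch, not a script shipped with the project; the function names are illustrative, and it assumes `poetry` is on your `PATH` and that you run it from the repository root.

```python
import os
import subprocess


def env_with_profiles(profiles: str) -> dict[str, str]:
    """Return a copy of the current environment with PGPT_PROFILES set."""
    env = dict(os.environ)  # copy, so the parent process environment is untouched
    env["PGPT_PROFILES"] = profiles
    return env


def run_private_gpt(profiles: str) -> int:
    """Start PrivateGPT with the given profiles and return its exit code."""
    completed = subprocess.run(
        ["poetry", "run", "python", "-m", "private_gpt"],
        env=env_with_profiles(profiles),
    )
    return completed.returncode
```

Calling `run_private_gpt("ollama")` is then equivalent to the inline `PGPT_PROFILES=ollama` form used in the Unix examples.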
### Local, Ollama-powered setup - RECOMMENDED
**The easiest way to run PrivateGPT fully locally** is to depend on Ollama for the LLM. Ollama provides a local LLM and Embeddings that are very easy to install and use, abstracting the complexity of GPU support. It's the recommended setup for local development.
Go to [ollama.ai](https://ollama.ai/) and follow the instructions to install Ollama on your machine.
After the installation, make sure the Ollama desktop app is closed.
Now, start Ollama service (it will start a local inference server, serving both the LLM and the Embeddings):
```bash
ollama serve
```
Install the models to be used. The default `settings-ollama.yaml` is configured to use the llama3.1 8b LLM (~4GB) and nomic-embed-text Embeddings (~275MB).
By default, PGPT will automatically pull models as needed. This behavior can be changed by modifying the `ollama.autopull_models` property.
In any case, if you want to manually pull models, run the following commands:
```bash
ollama pull llama3.1
ollama pull nomic-embed-text
```
Once done, on a different terminal, you can install PrivateGPT with the following command:
```bash
poetry install --extras "ui llms-ollama embeddings-ollama vector-stores-qdrant"
```
Once installed, you can run PrivateGPT. Make sure you have a working Ollama running locally before running the following command.
```bash
PGPT_PROFILES=ollama make run
```
PrivateGPT will use the already existing `settings-ollama.yaml` settings file, which is already configured to use Ollama LLM and Embeddings, and Qdrant. Review it and adapt it to your needs (different models, different Ollama port, etc.)
The UI will be available at http://localhost:8001
### Private, Sagemaker-powered setup
If you need more performance, you can run a version of PrivateGPT that relies on powerful AWS Sagemaker machines to serve the LLM and Embeddings.
You need to have access to sagemaker inference endpoints for the LLM and / or the embeddings, and have AWS credentials properly configured.
Edit the `settings-sagemaker.yaml` file to include the correct Sagemaker endpoints.
Then, install PrivateGPT with the following command:
```bash
poetry install --extras "ui llms-sagemaker embeddings-sagemaker vector-stores-qdrant"
```
Once installed, you can run PrivateGPT. Make sure the Sagemaker endpoints configured in `settings-sagemaker.yaml` are reachable and your AWS credentials are set up before running the following command.
```bash
PGPT_PROFILES=sagemaker make run
```
PrivateGPT will use the already existing `settings-sagemaker.yaml` settings file, which is already configured to use Sagemaker LLM and Embeddings endpoints, and Qdrant.
The UI will be available at http://localhost:8001
### Non-Private, OpenAI-powered test setup
If you want to test PrivateGPT with OpenAI's LLM and Embeddings (keeping in mind that your data will be sent to OpenAI!), follow the steps below.
You need an OpenAI API key to run this setup.
Edit the `settings-openai.yaml` file to include the correct API key. Never commit it! It's a secret! As an alternative to editing `settings-openai.yaml`, you can just set the `OPENAI_API_KEY` environment variable.
Then, install PrivateGPT with the following command:
```bash
poetry install --extras "ui llms-openai embeddings-openai vector-stores-qdrant"
```
Once installed, you can run PrivateGPT.
```bash
PGPT_PROFILES=openai make run
```
PrivateGPT will use the already existing `settings-openai.yaml` settings file, which is already configured to use OpenAI LLM and Embeddings endpoints, and Qdrant.
The UI will be available at http://localhost:8001
### Non-Private, Azure OpenAI-powered test setup
If you want to test PrivateGPT with Azure OpenAI's LLM and Embeddings (keeping in mind that your data will be sent to Azure OpenAI!), follow the steps below.
You need to have access to Azure OpenAI inference endpoints for the LLM and / or the embeddings, and have Azure OpenAI credentials properly configured.
Edit the `settings-azopenai.yaml` file to include the correct Azure OpenAI endpoints.
Then, install PrivateGPT with the following command:
```bash
poetry install --extras "ui llms-azopenai embeddings-azopenai vector-stores-qdrant"
```
Once installed, you can run PrivateGPT.
```bash
PGPT_PROFILES=azopenai make run
```
PrivateGPT will use the already existing `settings-azopenai.yaml` settings file, which is already configured to use Azure OpenAI LLM and Embeddings endpoints, and Qdrant.
The UI will be available at http://localhost:8001
### Local, Llama-CPP powered setup
If you want to run PrivateGPT fully locally without relying on Ollama, you can run the following command:
```bash
poetry install --extras "ui llms-llama-cpp embeddings-huggingface vector-stores-qdrant"
```
In order for local LLM and embeddings to work, you need to download the models to the `models` folder. You can do so by running the `setup` script:
```bash
poetry run python scripts/setup
```
If you are ok with CPU execution, you can skip the rest of this section.
Once installed, you can run PrivateGPT with the following command:
```bash
PGPT_PROFILES=local make run
```
PrivateGPT will load the already existing `settings-local.yaml` file, which is already configured to use LlamaCPP LLM, HuggingFace embeddings and Qdrant.
The UI will be available at http://localhost:8001
#### Llama-CPP support
For PrivateGPT to run fully locally without Ollama, Llama.cpp is required and in
particular [llama-cpp-python](https://github.com/abetlen/llama-cpp-python)
is used.
You'll need to have a valid C++ compiler like gcc installed. See [Troubleshooting: C++ Compiler](#troubleshooting-c-compiler) for more details.
> It's highly encouraged that you fully read llama-cpp and llama-cpp-python documentation relevant to your platform.
> Running into installation issues is very likely, and you'll need to troubleshoot them yourself.
#### Customizing low level parameters
Currently, not all the parameters of `llama.cpp` and `llama-cpp-python` are exposed in PrivateGPT's `settings.yaml` file.
If you need to customize parameters such as the number of layers loaded into the GPU, you can change
them in `private_gpt/components/llm/llm_component.py`.
##### Available LLM config options
The `llm` section of the settings allows for the following configurations:
- `mode`: how to run your llm
- `max_new_tokens`: this lets you configure the number of new tokens the LLM will generate and add to the context window (by default Llama.cpp uses `256`)
Example:
```yaml
llm:
  mode: local
  max_new_tokens: 256
```
If you are getting an out of memory error, you might also try a smaller model or stick to the proposed
recommended models, instead of custom tuning the parameters.
##### Llama-CPP OSX GPU support
You will need to build [llama.cpp](https://github.com/ggerganov/llama.cpp) with metal support.
@@ -127,7 +289,7 @@ More information is available in the documentation of the libraries themselves:
* [llama-cpp-python's documentation](https://llama-cpp-python.readthedocs.io/en/latest/#installation-with-hardware-acceleration)
* [llama.cpp](https://github.com/ggerganov/llama.cpp#build)
##### Llama-CPP Windows NVIDIA GPU support
Windows GPU support is done through CUDA.
Follow the instructions on the original [llama.cpp](https://github.com/ggerganov/llama.cpp) repo to install the required
@@ -160,7 +322,7 @@ Note that llama.cpp offloads matrix calculations to the GPU but the performance
still hit heavily due to latency between CPU and GPU communication. You might need to tweak
batch sizes and other parameters to get the best performance for your particular system.
##### Llama-CPP Linux NVIDIA GPU support and Windows-WSL
Linux GPU support is done through CUDA.
Follow the instructions on the original [llama.cpp](https://github.com/ggerganov/llama.cpp) repo to install the required
@@ -188,7 +350,41 @@ llama_new_context_with_model: total VRAM used: 4857.93 MB (model: 4095.05 MB, co
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
```
##### Llama-CPP Linux AMD GPU support
Linux GPU support is done through ROCm.
Some tips:
* Install ROCm from [quick-start install guide](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/tutorial/quick-start.html)
* [Install PyTorch for ROCm](https://rocm.docs.amd.com/projects/radeon/en/latest/docs/install/install-pytorch.html)
```bash
wget https://repo.radeon.com/rocm/manylinux/rocm-rel-6.0/torch-2.1.1%2Brocm6.0-cp311-cp311-linux_x86_64.whl
poetry run pip install --force-reinstall --no-cache-dir torch-2.1.1+rocm6.0-cp311-cp311-linux_x86_64.whl
```
* Install bitsandbytes for ROCm
```bash
PYTORCH_ROCM_ARCH=gfx900,gfx906,gfx908,gfx90a,gfx1030,gfx1100,gfx1101,gfx940,gfx941,gfx942
BITSANDBYTES_VERSION=62353b0200b8557026c176e74ac48b84b953a854
git clone https://github.com/arlo-phoenix/bitsandbytes-rocm-5.6
cd bitsandbytes-rocm-5.6
git checkout ${BITSANDBYTES_VERSION}
make hip ROCM_TARGET=${PYTORCH_ROCM_ARCH} ROCM_HOME=/opt/rocm/
pip install . --extra-index-url https://download.pytorch.org/whl/nightly
```
After that running the following command in the repository will install llama.cpp with GPU support:
```bash
LLAMA_CPP_PYTHON_VERSION=0.2.56
# Quote the list: unquoted semicolons would be parsed by the shell as command separators
DAMDGPU_TARGETS="gfx900;gfx906;gfx908;gfx90a;gfx1030;gfx1100;gfx1101;gfx940;gfx941;gfx942"
CMAKE_ARGS="-DLLAMA_HIPBLAS=ON -DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ -DAMDGPU_TARGETS=${DAMDGPU_TARGETS}" poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python==${LLAMA_CPP_PYTHON_VERSION}
```
If your installation was correct, the next time you start the server you should see `BLAS = 1` in a message similar to the following:
```
AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
```
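If you prefer to check the flag programmatically rather than by eye, a log line of that shape can be parsed with a few lines of Python. This is only a sketch: the function names are hypothetical, and it assumes the line keeps the `NAME = value |` layout shown above.

```python
def parse_system_info(line: str) -> dict[str, int]:
    """Parse a system-info line such as 'AVX = 1 | BLAS = 1 |' into a dict."""
    flags: dict[str, int] = {}
    for part in line.split("|"):
        if "=" not in part:
            continue  # skip empty segments, e.g. after the trailing "|"
        name, value = part.split("=", 1)
        flags[name.strip()] = int(value.strip())
    return flags


def blas_enabled(line: str) -> bool:
    """Return True when the line reports BLAS = 1."""
    return parse_system_info(line).get("BLAS") == 1
```

Calling `blas_enabled(...)` on the startup line returns `True` only when the `BLAS = 1` flag is present.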
##### Llama-CPP Known issues and Troubleshooting
Execution of LLMs locally still has a lot of sharp edges, especially when running on non-Linux platforms.
You might encounter several issues:
@@ -205,7 +401,7 @@ If, during your installation, something does not go as planned, retry in *verbos
For example, when installing packages with `pip install`, you can add the option `-vvv` to show the details of the installation.
##### Llama-CPP Troubleshooting: C++ Compiler
If you encounter an error while building a wheel during the `pip install` process, you may need to install a C++
compiler on your computer.
@@ -227,9 +423,9 @@ To install a C++ compiler on Windows 10/11, follow these steps:
Store and search for Xcode and install it. **Or** you can install the command line tools by running `xcode-select --install`.
2. If not, you can install clang or gcc with homebrew `brew install gcc`
##### Llama-CPP Troubleshooting: Mac Running Intel
When running a Mac with Intel hardware (not M1), you may run into _clang: error: the clang compiler does not support '
-march=native'_ during pip install.
If so, set your `ARCHFLAGS` during pip install, e.g. _ARCHFLAGS="-arch x86_64" pip3 install -r requirements.txt_

@@ -0,0 +1,49 @@
# Downloading Gated and Private Models
Many models are gated or private, requiring special access to use them. Follow these steps to gain access and set up your environment for using these models.
## Accessing Gated Models
1. **Request Access:**
Follow the instructions provided [here](https://huggingface.co/docs/hub/en/models-gated) to request access to the gated model.
2. **Generate a Token:**
Once you have access, generate a token by following the instructions [here](https://huggingface.co/docs/hub/en/security-tokens).
3. **Set the Token:**
Add the generated token to your `settings.yaml` file:
```yaml
huggingface:
  access_token: <your-token>
```
Alternatively, set the `HF_TOKEN` environment variable:
```bash
export HF_TOKEN=<your-token>
```
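The two options can be pictured as a simple lookup. The sketch below is illustrative only: the function name is hypothetical, and the precedence (settings value first, then `HF_TOKEN`) is an assumption rather than PrivateGPT's documented behaviour.

```python
import os
from collections.abc import Mapping
from typing import Optional


def resolve_hf_token(
    settings: Mapping[str, object],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the Hugging Face token from settings, else from HF_TOKEN."""
    section = settings.get("huggingface")
    token = section.get("access_token") if isinstance(section, Mapping) else None
    # Assumed precedence: an explicit settings value wins over the env var.
    return token or env.get("HF_TOKEN") or None
```

If neither source provides a token the function returns `None`, which is enough for public models but not for gated ones.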
# Tokenizer Setup
PrivateGPT uses the `AutoTokenizer` class from the Hugging Face `transformers` library to tokenize input text accurately. It connects to HuggingFace's API to download the appropriate tokenizer for the specified model.
## Configuring the Tokenizer
1. **Specify the Model:**
In your `settings.yaml` file, specify the model you want to use:
```yaml
llm:
  tokenizer: meta-llama/Meta-Llama-3.1-8B-Instruct
```
2. **Set Access Token for Gated Models:**
If you are using a gated model, ensure the `access_token` is set as mentioned in the previous section.
This configuration ensures that PrivateGPT can download and use the correct tokenizer for the model you are working with.
# Embedding dimensions mismatch
If you encounter an error message like `Embedding dimensions mismatch`, it is likely because the output dimension of the embedding model
does not match the vector dimension currently configured or stored. To resolve this issue, ensure that the embedding model and the existing vectors use the same dimension.
By default, PrivateGPT uses `nomic-embed-text` embeddings, which have a vector dimension of 768.
If you are using a different embedding model, ensure that the vector dimensions match the model's output.
<Callout intent = "warning">
In versions prior to 0.6.0, the default embedding model was `BAAI/bge-small-en-v1.5` in the `huggingface` setup.
If you plan to reuse previously generated embeddings, you need to update the `settings.yaml` file to use the matching embedding model:
```yaml
huggingface:
  embedding_hf_model_name: BAAI/bge-small-en-v1.5
embedding:
  embed_dim: 384
```
</Callout>
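A quick way to catch this early is to compare the length of one embedding against the configured `embedding.embed_dim` before ingesting anything. The helper below is a hypothetical sketch and not part of PrivateGPT; it only shows the comparison itself.

```python
from typing import Sequence


def check_embed_dim(vector: Sequence[float], embed_dim: int) -> None:
    """Raise ValueError if the vector length differs from embed_dim."""
    if len(vector) != embed_dim:
        raise ValueError(
            f"Embedding dimensions mismatch: model produced {len(vector)}, "
            f"but embed_dim is {embed_dim}"
        )
```

For instance, a 384-dimensional vector from `BAAI/bge-small-en-v1.5` checked against `embed_dim: 768` raises the error, while a 768-dimensional `nomic-embed-text` vector passes.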

@@ -8,6 +8,14 @@ The ingestion of documents can be done in different ways:
## Bulk Local Ingestion
You will need to activate `data.local_ingestion.enabled` in your setting file to use this feature. Additionally,
it is probably a good idea to set `data.local_ingestion.allow_ingest_from` to specify which folders are allowed to be ingested.
<Callout intent = "warning">
Be careful when enabling this feature in a production environment: it is a security risk, because it allows users to
ingest any local file that the server process has permission to read.
</Callout>
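The idea behind restricting ingestion to specific folders can be sketched as a path containment check. This is an illustration under assumptions, not PrivateGPT's actual implementation: the function name is hypothetical, and the real semantics of `data.local_ingestion.allow_ingest_from` may differ.

```python
from pathlib import Path


def is_ingest_allowed(candidate: str, allowed_roots: list[str]) -> bool:
    """Return True if candidate resolves to a location inside an allowed folder."""
    # resolve() normalizes ".." segments, so traversal tricks are caught.
    path = Path(candidate).resolve()
    for root in allowed_roots:
        root_path = Path(root).resolve()
        # Allowed if the path is the root itself or lies beneath it.
        if path == root_path or root_path in path.parents:
            return True
    return False
```

Resolving paths before comparing them is the important design choice here: a plain string prefix check would accept `/data/docs/../secrets/key.txt`.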
When you are running PrivateGPT in a fully local setup, you can ingest a complete folder for convenience (containing
pdf, text files, etc.)
and optionally watch changes on it with the command:
@@ -62,6 +70,7 @@ The following ingestion mode exist:
* `simple`: historic behavior, ingest one document at a time, sequentially
* `batch`: read, parse, and embed multiple documents using batches (batch read, and then batch parse, and then batch embed)
* `parallel`: read, parse, and embed multiple documents in parallel. This is the fastest ingestion mode for local setup.
* `pipeline`: Alternative to parallel.
To change the ingestion mode, you can use the `embedding.ingest_mode` configuration value. The default value is `simple`.
To configure the number of workers used for parallel or batched ingestion, you can use
@@ -92,7 +101,7 @@ time PGPT_PROFILES=mock python ./scripts/ingest_folder.py ~/my-dir/to-ingest/
## Supported file formats
PrivateGPT by default supports all the file formats that contain clear text (for example, `.txt` files, `.html`, etc.).
However, these text-based file formats are only considered as text files, and are not pre-processed in any other way.
It also supports the following file formats:
@@ -114,11 +123,15 @@ It also supports the following file formats:
* `.ipynb`
* `.json`
<Callout intent = "info">
While `PrivateGPT` supports these file formats, it **might** require additional
dependencies to be installed in your python's virtual environment.
For example, if you try to ingest `.epub` files, `PrivateGPT` might fail to do it, and will instead display an
explanatory error asking you to download the necessary dependencies to install this file format.
</Callout>
<Callout intent = "info">
**Other file formats might work**, but they will be considered as plain text
files (in other words, they will be ingested as `.txt` files).
</Callout>

@@ -25,6 +25,30 @@ When the server is started it will print a log *Application startup complete*.
Navigate to http://localhost:8001 to use the Gradio UI or to http://localhost:8001/docs (API section) to try the API
using Swagger UI.
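Beyond Swagger UI, the API can also be called from a script. The sketch below builds a chat request with the standard library only. It assumes the chat endpoint is served at `/v1/chat/completions` and accepts an OpenAI-style `messages` list plus a `use_context` flag; check the `/docs` page of your running server for the exact schema. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: PrivateGPT is running locally on the default port.
BASE_URL = "http://localhost:8001"


def build_chat_request(question: str, use_context: bool = False) -> urllib.request.Request:
    """Build a POST request for the (assumed) /v1/chat/completions endpoint."""
    body = {
        "messages": [{"role": "user", "content": question}],
        "use_context": use_context,  # True: answer from ingested documents
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, use_context: bool = False) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_chat_request(question, use_context)
    with urllib.request.urlopen(request, timeout=120) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.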
#### Customizing low level parameters
Currently, not all the parameters of `llama.cpp` and `llama-cpp-python` are exposed in PrivateGPT's `settings.yaml` file.
If you need to customize parameters such as the number of layers loaded into the GPU, you can change
them in `private_gpt/components/llm/llm_component.py`.
##### Available LLM config options
The `llm` section of the settings allows for the following configurations:
- `mode`: how to run your llm
- `max_new_tokens`: this lets you configure the number of new tokens the LLM will generate and add to the context window (by default Llama.cpp uses `256`)
Example:
```yaml
llm:
  mode: local
  max_new_tokens: 256
```
If you are getting an out of memory error, you might also try a smaller model or stick to the proposed
recommended models, instead of custom tuning the parameters.
### Using OpenAI
If you cannot run a local model (because you don't have a GPU, for example) or for testing purposes, you may
@@ -37,7 +61,10 @@ llm:
  mode: openai
openai:
  api_base: <openai-api-base-url> # Defaults to https://api.openai.com/v1
  api_key: <your_openai_api_key> # You could skip this configuration and use the OPENAI_API_KEY env var instead
  model: <openai_model_to_use> # Optional model to use. Default is "gpt-3.5-turbo"
  # Note: Open AI Models are listed here: https://platform.openai.com/docs/models
```
And run PrivateGPT loading that profile you just created:
@@ -53,6 +80,61 @@ Navigate to http://localhost:8001 to use the Gradio UI or to http://localhost:80
You'll notice the speed and quality of response is higher, given you are using OpenAI's servers for the heavy
computations.
### Using OpenAI compatible API
Many tools, including [LocalAI](https://localai.io/) and [vLLM](https://docs.vllm.ai/en/latest/),
support serving local models with an OpenAI compatible API. Even when overriding the `api_base`,
using the `openai` mode doesn't allow you to use custom models. Instead, you should use the `openailike` mode:
```yaml
llm:
  mode: openailike
```
This mode uses the same settings as the `openai` mode.
As an example, you can follow the [vLLM quickstart guide](https://docs.vllm.ai/en/latest/getting_started/quickstart.html#openai-compatible-server)
to run an OpenAI compatible server. Then, you can run PrivateGPT using the `settings-vllm.yaml` profile:
`PGPT_PROFILES=vllm make run`
### Using Azure OpenAI
If you cannot run a local model (because you don't have a GPU, for example) or for testing purposes, you may
decide to run PrivateGPT using Azure OpenAI as the LLM and Embeddings model.
In order to do so, create a profile `settings-azopenai.yaml` with the following contents:
```yaml
llm:
  mode: azopenai
embedding:
  mode: azopenai
azopenai:
  api_key: <your_azopenai_api_key> # You could skip this configuration and use the AZ_OPENAI_API_KEY env var instead
  azure_endpoint: <your_azopenai_endpoint> # You could skip this configuration and use the AZ_OPENAI_ENDPOINT env var instead
  api_version: <api_version> # The API version to use. Default is "2023_05_15"
  embedding_deployment_name: <your_embedding_deployment_name> # You could skip this configuration and use the AZ_OPENAI_EMBEDDING_DEPLOYMENT_NAME env var instead
  embedding_model: <openai_embeddings_to_use> # Optional model to use. Default is "text-embedding-ada-002"
  llm_deployment_name: <your_model_deployment_name> # You could skip this configuration and use the AZ_OPENAI_LLM_DEPLOYMENT_NAME env var instead
  llm_model: <openai_model_to_use> # Optional model to use. Default is "gpt-35-turbo"
```
And run PrivateGPT loading that profile you just created:
`PGPT_PROFILES=azopenai make run`
or
`PGPT_PROFILES=azopenai poetry run python -m private_gpt`
When the server is started it will print a log *Application startup complete*.
Navigate to http://localhost:8001 to use the Gradio UI or to http://localhost:8001/docs (API section) to try the API.
You'll notice the speed and quality of response is higher, given you are using Azure OpenAI's servers for the heavy
computations.
### Using AWS Sagemaker
For a fully private & performant setup, you can choose to have both your LLM and Embeddings model deployed using Sagemaker.
@@ -80,4 +162,73 @@ or
`PGPT_PROFILES=sagemaker poetry run python -m private_gpt`
When the server is started it will print a log *Application startup complete*.
Navigate to http://localhost:8001 to use the Gradio UI or to http://localhost:8001/docs (API section) to try the API.
### Using Ollama
Another option for a fully private setup is using [Ollama](https://ollama.ai/).
Note: how to deploy Ollama and pull models onto it is out of the scope of this documentation.
To use Ollama, create a profile `settings-ollama.yaml` with the following contents:
```yaml
llm:
  mode: ollama
ollama:
  model: <ollama_model_to_use> # Required Model to use.
  # Note: Ollama Models are listed here: https://ollama.ai/library
  # Be sure to pull the model to your Ollama server
  api_base: <ollama-api-base-url> # Defaults to http://localhost:11434
```
And run PrivateGPT loading that profile you just created:
`PGPT_PROFILES=ollama make run`
or
`PGPT_PROFILES=ollama poetry run python -m private_gpt`
When the server is started it will print a log *Application startup complete*.
Navigate to http://localhost:8001 to use the Gradio UI or to http://localhost:8001/docs (API section) to try the API.
### Using IPEX-LLM
For a fully private setup on Intel GPUs (such as a local PC with an iGPU, or discrete GPUs like Arc, Flex, and Max), you can use [IPEX-LLM](https://github.com/intel-analytics/ipex-llm).
To deploy Ollama and pull models using IPEX-LLM, please refer to [this guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/ollama_quickstart.html). Then, follow the same steps outlined in the [Using Ollama](#using-ollama) section to create a `settings-ollama.yaml` profile and run the PrivateGPT server.
### Using Gemini
If you cannot run a local model (because you don't have a GPU, for example) or for testing purposes, you may
decide to run PrivateGPT using Gemini as the LLM and Embeddings model. In addition, you will benefit from
multimodal inputs, such as text and images, in a very large contextual window.
In order to do so, create a profile `settings-gemini.yaml` with the following contents:
```yaml
llm:
  mode: gemini
embedding:
  mode: gemini
gemini:
  api_key: <your_gemini_api_key> # You could skip this configuration and use the GEMINI_API_KEY env var instead
  model: <gemini_model_to_use> # Optional model to use. Default is "models/gemini-pro"
  embedding_model: <gemini_embeddings_to_use> # Optional model to use. Default is "models/embedding-001"
```
And run PrivateGPT loading that profile you just created:
`PGPT_PROFILES=gemini make run`
or
`PGPT_PROFILES=gemini poetry run python -m private_gpt`
When the server is started it will print a log *Application startup complete*.
Navigate to http://localhost:8001 to use the Gradio UI or to http://localhost:8001/docs (API section) to try the API.

@@ -0,0 +1,66 @@
## NodeStores
PrivateGPT supports **Simple** and [Postgres](https://www.postgresql.org/) providers. Simple being the default.
In order to select one or the other, set the `nodestore.database` property in the `settings.yaml` file to `simple` or `postgres`.
```yaml
nodestore:
  database: simple
```
### Simple Document Store
Setting up simple document store: Persist data with in-memory and disk storage.
Enabling the simple document store is an excellent choice for small projects or proofs of concept where you need to persist data while maintaining minimal setup complexity. To get started, set the `nodestore.database` property in your `settings.yaml` file as follows:
```yaml
nodestore:
  database: simple
```
The beauty of the simple document store is its flexibility and ease of implementation. It provides a solid foundation for managing and retrieving data without the need for complex setup or configuration. The combination of in-memory processing and disk persistence ensures that you can efficiently handle small to medium-sized datasets while maintaining data consistency across runs.
### Postgres Document Store
To enable Postgres, set the `nodestore.database` property in the `settings.yaml` file to `postgres` and install the `storage-nodestore-postgres` extra. Note: vector embeddings storage in Postgres is configured separately.
```bash
poetry install --extras storage-nodestore-postgres
```
The available configuration options are:
| Field | Description |
|---------------|-----------------------------------------------------------|
| **host** | The server hosting the Postgres database. Default is `localhost` |
| **port** | The port on which the Postgres database is accessible. Default is `5432` |
| **database** | The specific database to connect to. Default is `postgres` |
| **user** | The username for database access. Default is `postgres` |
| **password** | The password for database access. (Required) |
| **schema_name** | The database schema to use. Default is `private_gpt` |
For example:
```yaml
nodestore:
  database: postgres
postgres:
  host: localhost
  port: 5432
  database: postgres
  user: postgres
  password: <PASSWORD>
  schema_name: private_gpt
```
Given the above configuration, two PostgreSQL tables will be created upon successful connection: one for storing metadata related to the index and another for the document data itself.
```
postgres=# \dt private_gpt.*
                 List of relations
   Schema    |      Name       | Type  |    Owner
-------------+-----------------+-------+--------------
 private_gpt | data_docstore   | table | postgres
 private_gpt | data_indexstore | table | postgres

postgres=#
```
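To test the same connection details from a script before starting PrivateGPT, the `postgres` settings can be turned into a standard PostgreSQL connection URI. The sketch below is hypothetical helper code, not PrivateGPT's own: it applies the defaults listed in the options table and percent-encodes the credentials.

```python
from urllib.parse import quote

# Defaults taken from the configuration options table; password has no default.
POSTGRES_DEFAULTS = {
    "host": "localhost",
    "port": 5432,
    "database": "postgres",
    "user": "postgres",
    "schema_name": "private_gpt",
}


def postgres_url(config: dict) -> str:
    """Build a postgresql:// URI from a `postgres` settings section."""
    merged = {**POSTGRES_DEFAULTS, **config}
    if not merged.get("password"):
        raise ValueError("postgres.password is required")
    # Percent-encode credentials so characters like "@" do not break the URI.
    user = quote(str(merged["user"]), safe="")
    password = quote(str(merged["password"]), safe="")
    return (
        f"postgresql://{user}:{password}"
        f"@{merged['host']}:{merged['port']}/{merged['database']}"
    )
```

The resulting URI can be passed to any PostgreSQL client. The `schema_name` is not part of the URI and has to be selected separately by the client.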

@@ -0,0 +1,36 @@
## Enhancing Response Quality with Reranking
PrivateGPT offers a reranking feature aimed at optimizing response generation by filtering out irrelevant documents, potentially leading to faster response times and enhanced relevance of answers generated by the LLM.
### Enabling Reranking
Document reranking can significantly improve the efficiency and quality of the responses by pre-selecting the most relevant documents before generating an answer. To leverage this feature, ensure that it is enabled in the RAG settings and consider adjusting the parameters to best fit your use case.
#### Additional Requirements
Before enabling reranking, you must install additional dependencies:
```bash
poetry install --extras rerank-sentence-transformers
```
This command installs dependencies for the cross-encoder reranker from sentence-transformers, which is currently the only supported method by PrivateGPT for document reranking.
#### Configuration
To enable and configure reranking, adjust the `rag` section within the `settings.yaml` file. Here are the key settings to consider:
- `similarity_top_k`: Determines the number of documents to initially retrieve and consider for reranking. This value should be larger than `top_n`.
- `rerank`:
  - `enabled`: Set to `true` to activate the reranking feature.
  - `top_n`: Specifies the number of documents to use in the final answer generation process, chosen from the top-ranked documents provided by `similarity_top_k`.
Example configuration snippet:
```yaml
rag:
  similarity_top_k: 10 # Number of documents to retrieve and consider for reranking
  rerank:
    enabled: true
    top_n: 3 # Number of top-ranked documents to use for generating the answer
```
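How `similarity_top_k` and `top_n` interact can be illustrated with a small two-stage sketch. This is not PrivateGPT's implementation: the function names are hypothetical, and the toy word-overlap scorer merely stands in for both the vector similarity search and the cross-encoder reranker.

```python
from typing import Callable


def retrieve_and_rerank(
    query: str,
    documents: list[str],
    similarity: Callable[[str, str], float],
    rerank_score: Callable[[str, str], float],
    similarity_top_k: int = 10,
    top_n: int = 3,
) -> list[str]:
    """Keep the similarity_top_k best matches, then the top_n after reranking."""
    if top_n > similarity_top_k:
        raise ValueError("top_n should not exceed similarity_top_k")
    # Stage 1: cheap retrieval narrows the corpus to similarity_top_k candidates.
    candidates = sorted(
        documents, key=lambda doc: similarity(query, doc), reverse=True
    )[:similarity_top_k]
    # Stage 2: the more expensive reranker reorders candidates; keep top_n.
    return sorted(
        candidates, key=lambda doc: rerank_score(query, doc), reverse=True
    )[:top_n]


def word_overlap(query: str, document: str) -> float:
    """Toy scorer: number of shared lowercase words (stand-in for a real model)."""
    return float(len(set(query.lower().split()) & set(document.lower().split())))
```

Only the `top_n` documents that survive the second stage would be handed to the LLM, which is why a larger `similarity_top_k` gives the reranker more candidates to choose from.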

@@ -3,8 +3,8 @@
The configuration of your private GPT server is done thanks to `settings` files (more precisely `settings.yaml`).
These text files are written using the [YAML](https://en.wikipedia.org/wiki/YAML) syntax.
While PrivateGPT is distributing safe and universal configuration files, you might want to quickly customize your
PrivateGPT, and this can be done using the `settings` files.
This project is defining the concept of **profiles** (or configuration profiles).
This mechanism, using your environment variables, is giving you the ability to easily switch between
@@ -30,15 +30,20 @@ For example, on **linux and macOS**, this gives:
export PGPT_PROFILES=my_profile_name_here
```
Windows Command Prompt (cmd) has a different syntax:
```shell
set PGPT_PROFILES=my_profile_name_here
```
Windows Powershell has a different syntax:
```shell
$env:PGPT_PROFILES="my_profile_name_here"
```
If the above is not working, you might want to try other ways to set an environment variable in your Windows terminal.
---
Once you've set this environment variable to the desired profile, you can simply launch your PrivateGPT,
and it will run using your profile on top of the default configuration.
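Conceptually, running a profile "on top of the default configuration" is a recursive merge in which profile values override defaults. The sketch below illustrates that idea on plain dictionaries. It is an assumption about the general mechanism, not PrivateGPT's actual loader, and `merge_settings` is a hypothetical name.

```python
from copy import deepcopy


def merge_settings(base: dict, override: dict) -> dict:
    """Return base with override applied on top, merging nested sections."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections: merge them key by key.
            merged[key] = merge_settings(merged[key], value)
        else:
            # Scalars, lists, and new keys simply replace the default.
            merged[key] = deepcopy(value)
    return merged
```

With defaults of `{"llm": {"mode": "local", "max_new_tokens": 256}}` and a profile of `{"llm": {"mode": "ollama"}}`, only `llm.mode` changes and `max_new_tokens` keeps its default.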
## Reference

@@ -1,39 +0,0 @@
## Gradio UI user manual
Gradio UI is a ready to use way of testing most of PrivateGPT API functionalities.
![Gradio PrivateGPT](https://lh3.googleusercontent.com/drive-viewer/AK7aPaD_Hc-A8A9ooMe-hPgm_eImgsbxAjb__8nFYj8b_WwzvL1Gy90oAnp1DfhPaN6yGiEHCOXs0r77W1bYHtPzlVwbV7fMsA=s1600)
### Execution Modes
It has 3 modes of execution (you can select in the top-left):
* Query Docs: uses the context from the
ingested documents to answer the questions posted in the chat. It also takes
into account previous chat messages as context.
* Makes use of `/chat/completions` API with `use_context=true` and no
`context_filter`.
* Search in Docs: fast search that returns the 4 most related text
chunks, together with their source document and page.
* Makes use of `/chunks` API with no `context_filter`, `limit=4` and
`prev_next_chunks=0`.
* LLM Chat: simple, non-contextual chat with the LLM. The ingested documents won't
be taken into account, only the previous messages.
* Makes use of `/chat/completions` API with `use_context=false`.
### Document Ingestion
Ingest documents by using the `Upload a File` button. You can check the progress of
the ingestion in the console logs of the server.
The list of ingested files is shown below the button.
If you want to delete the ingested documents, refer to *Reset Local documents
database* section in the documentation.
### Chat
Normal chat interface, self-explanatory ;)
You can check the actual prompt being passed to the LLM by looking at the logs of
the server. We'll add better observability in future releases.

@@ -1,7 +1,7 @@
## Vectorstores
PrivateGPT supports [Qdrant](https://qdrant.tech/), [Milvus](https://milvus.io/), [Chroma](https://www.trychroma.com/), [PGVector](https://github.com/pgvector/pgvector) and [ClickHouse](https://github.com/ClickHouse/ClickHouse) as vectorstore providers. Qdrant being the default.
To select one of them, set the `vectorstore.database` property in the `settings.yaml` file to `qdrant`, `milvus`, `chroma`, `postgres` or `clickhouse`.
```yaml
vectorstore:
@@ -39,6 +39,24 @@ qdrant:
path: local_data/private_gpt/qdrant
```
### Milvus configuration
To enable Milvus, set the `vectorstore.database` property in the `settings.yaml` file to `milvus` and install the `vector-stores-milvus` extra.
```bash
poetry install --extras vector-stores-milvus
```
The available configuration options are:
| Field | Description |
|--------------|-------------|
| uri | Defaults to the local file `local_data/private_gpt/milvus/milvus_local.db`. You can also point it at a more performant Milvus server running on Docker or Kubernetes (e.g. `http://localhost:19530`). To use Zilliz Cloud, set the uri and token to the Endpoint and API key from Zilliz Cloud. |
| token | Used together with a Milvus server on Docker or Kubernetes, or the Zilliz Cloud API key. |
| collection_name | The name of the collection. Default is `milvus_db`. |
| overwrite | Whether to overwrite existing data in the collection. Default is `True`. |
To obtain a local setup (disk-based database) without running a Milvus server, set the `uri` value in `settings.yaml` to a local file such as `local_data/private_gpt/milvus/milvus_local.db`.
### Chroma configuration
To enable Chroma, set the `vectorstore.database` property in the `settings.yaml` file to `chroma` and install the `chroma` extra.
@@ -47,4 +65,123 @@ To enable Chroma, set the `vectorstore.database` property in the `settings.yaml`
poetry install --extras chroma
```
By default `chroma` will use a disk-based database stored in `local_data_path / "chroma_db"` (where `local_data_path` is defined in `settings.yaml`).
### PGVector
To use the PGVector store, you need a [PostgreSQL](https://www.postgresql.org/) database with the PGVector extension installed.
To enable PGVector, set the `vectorstore.database` property in the `settings.yaml` file to `postgres` and install the `vector-stores-postgres` extra.
```bash
poetry install --extras vector-stores-postgres
```
PGVector settings can be configured by setting values to the `postgres` property in the `settings.yaml` file.
The available configuration options are:
| Field | Description |
|---------------|-----------------------------------------------------------|
| **host** | The server hosting the Postgres database. Default is `localhost` |
| **port** | The port on which the Postgres database is accessible. Default is `5432` |
| **database** | The specific database to connect to. Default is `postgres` |
| **user** | The username for database access. Default is `postgres` |
| **password** | The password for database access. (Required) |
| **schema_name** | The database schema to use. Default is `private_gpt` |
For example:
```yaml
vectorstore:
database: postgres
postgres:
host: localhost
port: 5432
database: postgres
user: postgres
password: <PASSWORD>
schema_name: private_gpt
```
The following table will be created in the database:
```
postgres=# \d private_gpt.data_embeddings
Table "private_gpt.data_embeddings"
Column | Type | Collation | Nullable | Default
-----------+-------------------+-----------+----------+---------------------------------------------------------
id | bigint | | not null | nextval('private_gpt.data_embeddings_id_seq'::regclass)
text | character varying | | not null |
metadata_ | json | | |
node_id | character varying | | |
embedding | vector(768) | | |
Indexes:
"data_embeddings_pkey" PRIMARY KEY, btree (id)
postgres=#
```
The dimension of the embedding column is set from the `embedding.embed_dim` value. If the embedding model changes, this table may need to be dropped and recreated to avoid a dimension mismatch.
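As a rough illustration of the mismatch described above, the following sketch compares the dimension reported for the `embedding` column (for example `vector(768)` in the `\d` output) with the configured `embedding.embed_dim`. The helper names `vector_column_dim` and `needs_recreate` are hypothetical and are not part of PrivateGPT; how you obtain the column type string from your database is left out.

```python
import re
from typing import Optional


def vector_column_dim(column_type: str) -> Optional[int]:
    """Extract N from a pgvector column type such as 'vector(768)'.

    Returns None when the string is not a sized vector type.
    """
    match = re.fullmatch(r"\s*vector\((\d+)\)\s*", column_type)
    return int(match.group(1)) if match else None


def needs_recreate(column_type: str, embed_dim: int) -> bool:
    """Return True when the existing column size differs from embed_dim."""
    existing = vector_column_dim(column_type)
    return existing is not None and existing != embed_dim
```

If `needs_recreate` returns `True` for your column type and the new `embed_dim`, the table likely has to be dropped and recreated before ingesting again.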
### ClickHouse
To utilize ClickHouse as the vector store, a [ClickHouse](https://github.com/ClickHouse/ClickHouse) database must be employed.
To enable ClickHouse, set the `vectorstore.database` property in the `settings.yaml` file to `clickhouse` and install the `vector-stores-clickhouse` extra.
```bash
poetry install --extras vector-stores-clickhouse
```
ClickHouse settings can be configured by setting values to the `clickhouse` property in the `settings.yaml` file.
The available configuration options are:
| Field | Description |
|----------------------|----------------------------------------------------------------|
| **host** | The server hosting the ClickHouse database. Default is `localhost` |
| **port** | The port on which the ClickHouse database is accessible. Default is `8123` |
| **username** | The username for database access. Default is `default` |
| **password** | The password for database access. (Optional) |
| **database** | The specific database to connect to. Default is `__default__` |
| **secure** | Use https/TLS for secure connection to the server. Default is `false` |
| **interface** | The protocol used for the connection, either 'http' or 'https'. (Optional) |
| **settings** | Specific ClickHouse server settings to be used with the session. (Optional) |
| **connect_timeout** | Timeout in seconds for establishing a connection. (Optional) |
| **send_receive_timeout** | Read timeout in seconds for http connection. (Optional) |
| **verify** | Verify the server certificate in secure/https mode. (Optional) |
| **ca_cert** | Path to Certificate Authority root certificate (.pem format). (Optional) |
| **client_cert** | Path to TLS Client certificate (.pem format). (Optional) |
| **client_cert_key** | Path to the private key for the TLS Client certificate. (Optional) |
| **http_proxy** | HTTP proxy address. (Optional) |
| **https_proxy** | HTTPS proxy address. (Optional) |
| **server_host_name** | Server host name to be checked against the TLS certificate. (Optional) |
For example:
```yaml
vectorstore:
database: clickhouse
clickhouse:
host: localhost
port: 8443
username: admin
password: <PASSWORD>
database: embeddings
secure: false
```
The following table will be created in the database:
```
clickhouse-client
:) \d embeddings.llama_index
Table "llama_index"
№ | name | type | default_type | default_expression | comment | codec_expression | ttl_expression
----|-----------|----------------------------------------------|--------------|--------------------|---------|------------------|---------------
1 | id | String | | | | |
2 | doc_id | String | | | | |
3 | text | String | | | | |
4 | vector | Array(Float32) | | | | |
5 | node_info | Tuple(start Nullable(UInt64), end Nullable(UInt64)) | | | | |
6 | metadata | String | | | | |
clickhouse-client
```
The dimensions of the embeddings columns will be set based on the `embedding.embed_dim` value. If the embedding model changes, this table may need to be dropped and recreated to avoid a dimension mismatch.


@@ -1,21 +0,0 @@
## Local Installation steps
The steps in [Installation](/installation) section are better explained and cover more
setup scenarios (macOS, Windows, Linux).
But if you like one-liners, have python3.11 installed, and you are running a UNIX (macOS or Linux)
system, you can get up and running on CPU in a few lines:
```bash
git clone https://github.com/imartinez/privateGPT && cd privateGPT && \
python3.11 -m venv .venv && source .venv/bin/activate && \
pip install --upgrade pip poetry && poetry install --with ui,local && ./scripts/setup
# Launch the privateGPT API server **and** the gradio UI
python3.11 -m private_gpt
# In another terminal, open your PrivateGPT in a new browser window!
open http://127.0.0.1:8001/
```
Is the above not working, or is it too slow and **you want to run it on GPU(s)**?
Please check the more detailed [installation guide](/installation).


@@ -1,20 +1,27 @@
## Introduction 👋
PrivateGPT provides an **API** containing all the building blocks required to
build **private, context-aware AI applications**.
<Callout intent = "tip">
If you are looking for an **enterprise-ready, fully private AI workspace**
check out [Zylon's website](https://zylon.ai) or [request a demo](https://cal.com/zylon/demo?source=pgpt-docs).
Crafted by the team behind PrivateGPT, Zylon is a best-in-class AI collaborative
workspace that can be easily deployed on-premise (data center, bare metal...) or in your private cloud (AWS, GCP, Azure...).
</Callout>
The API follows and extends OpenAI API standard, and supports both normal and streaming responses.
That means that, if you can use OpenAI API in one of your tools, you can use your own PrivateGPT API instead,
with no code changes, **and for free** if you are running privateGPT in `local` mode.
Looking for the installation quickstart? [Quickstart installation guide for Linux and macOS](/overview/welcome/quickstart).
Do you want to install it on Windows? Or do you want to take full advantage of your hardware for better performance?
The installation guide will help you in the [Installation section](/installation).
with no code changes, **and for free** if you are running PrivateGPT in a `local` setup.
Get started by understanding the [Main Concepts and Installation](/installation) and then dive into the [API Reference](/api-reference).
## Frequently Visited Resources
<Cards>
<Card
title="Main Concepts"
icon="fa-solid fa-lines-leaning"
href="/installation"
/>
<Card
title="API Reference"
icon="fa-solid fa-code"
@@ -32,22 +39,4 @@ The installation guide will help you in the [Installation section](/installation
/>
</Cards>
## API Organization
The API is divided into two logical blocks:
1. High-level API, abstracting all the complexity of a RAG (Retrieval Augmented Generation) pipeline implementation:
- Ingestion of documents: internally managing document parsing, splitting, metadata extraction,
embedding generation and storage.
- Chat & Completions using context from ingested documents: abstracting the retrieval of context, the prompt
engineering and the response generation.
2. Low-level API, allowing advanced users to implement their own complex pipelines:
- Embeddings generation: based on a piece of text.
- Contextual chunks retrieval: given a query, returns the most relevant chunks of text from the ingested
documents.
<Callout intent = "info">
A working **Gradio UI client** is provided to test the API, together with a set of useful tools such as bulk
model download script, ingestion script, documents folder watch, etc.
</Callout>
<br />


@@ -1,95 +0,0 @@
# List of working LLM
**Do you have any working combination of LLM and embeddings?**
Please open a PR to add it to the list, and join our Discord to tell us about it!
## Prompt style
LLMs might have been trained with different prompt styles.
The prompt style is the way the prompt is written, and how the system message is injected in the prompt.
For example, `llama2` looks like this:
```text
<s>[INST] <<SYS>>
{{ system_prompt }}
<</SYS>>
{{ user_message }} [/INST]
```
While `default` (the `llama_index` default) looks like this:
```text
system: {{ system_prompt }}
user: {{ user_message }}
assistant: {{ assistant_message }}
```
And the "`tag`" style looks like this:
```text
<|system|>: {{ system_prompt }}
<|user|>: {{ user_message }}
<|assistant|>: {{ assistant_message }}
```
Some LLMs do not understand a given prompt style and will not work with it (returning nothing).
In that case, try changing the prompt style to `default` (or `tag`) in the settings; this
changes the way the messages are formatted before being passed to the LLM.
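To make the difference between the three styles concrete, here is a minimal sketch that renders a system prompt and a user message using the templates shown above. The function `format_prompt` is hypothetical and only mirrors the layouts printed on this page; it is not the formatter PrivateGPT or `llama_index` actually uses, and the exact whitespace expected by a given model may differ.

```python
def format_prompt(style: str, system_prompt: str, user_message: str) -> str:
    """Render a single-turn prompt in one of the styles described above."""
    if style == "llama2":
        # System message is wrapped in <<SYS>> tags inside the [INST] block.
        return (
            f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n"
            f"{user_message} [/INST]"
        )
    if style == "default":
        # Plain "role: content" lines, ending with an open assistant turn.
        return f"system: {system_prompt}\nuser: {user_message}\nassistant: "
    if style == "tag":
        # Same layout as "default", with roles wrapped in <|...|> tags.
        return (
            f"<|system|>: {system_prompt}\n<|user|>: {user_message}\n"
            f"<|assistant|>: "
        )
    raise ValueError(f"Unknown prompt style: {style!r}")
```

Printing the output for each style with the same inputs is a quick way to see what the model receives when you switch `prompt_style`.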
## Example of configuration
You might want to change the prompt depending on the language and model you are using.
### English, with instructions
`settings-en.yaml`:
```yml
local:
llm_hf_repo_id: TheBloke/Mistral-7B-Instruct-v0.1-GGUF
llm_hf_model_file: mistral-7b-instruct-v0.1.Q4_K_M.gguf
embedding_hf_model_name: BAAI/bge-small-en-v1.5
prompt_style: "llama2"
```
### French, with instructions
`settings-fr.yaml`:
```yml
local:
llm_hf_repo_id: TheBloke/Vigogne-2-7B-Instruct-GGUF
llm_hf_model_file: vigogne-2-7b-instruct.Q4_K_M.gguf
embedding_hf_model_name: dangvantuan/sentence-camembert-base
prompt_style: "default"
# prompt_style: "tag" # also works
# The default system prompt is injected only when the `prompt_style` != default, and there is no system message in the discussion
# default_system_prompt: Vous êtes un assistant IA qui répond à la question posée à la fin en utilisant le contexte suivant. Si vous ne connaissez pas la réponse, dites simplement que vous ne savez pas, n'essayez pas d'inventer une réponse. Veuillez répondre exclusivement en français.
```
You might want to change the prompt as the one above might not directly answer your question.
You can read online about how to write a good prompt, but in a nutshell, make it (extremely) directive.
You can troubleshoot your prompt by writing multiline requests in the UI as you
interact with the model, for example:
```text
Tu es un programmeur senior qui programme en python et utilise le framework fastapi. Ecrit moi un serveur qui retourne "hello world".
```
Another example:
```text
Context: None
Situation: tu es au milieu d'un champ.
Tache: va a la rivière, en bas du champ.
Décrit comment aller a la rivière.
```
### Optimised Models
GodziLLa2-70B LLM (English, rank 2 on HuggingFace OpenLLM Leaderboard), bge large Embedding Model (rank 1 on HuggingFace MTEB Leaderboard)
`settings-optimised.yaml`:
```yml
local:
llm_hf_repo_id: TheBloke/GodziLLa2-70B-GGUF
llm_hf_model_file: godzilla2-70b.Q4_K_M.gguf
embedding_hf_model_name: BAAI/bge-large-en
prompt_style: "llama2"
```


@@ -0,0 +1,23 @@
# Recipes
Recipes are predefined use cases that help users solve very specific tasks using PrivateGPT.
They provide a streamlined approach to achieve common goals with the platform, offering both a starting point and inspiration for further exploration.
The main goal of Recipes is to empower the community to create and share solutions, expanding the capabilities of PrivateGPT.
## How to Create a New Recipe
1. **Identify the Task**: Define a specific task or problem that the Recipe will address.
2. **Develop the Solution**: Create a clear and concise guide, including any necessary code snippets or configurations.
3. **Submit a PR**: Fork the PrivateGPT repository, add your Recipe to the appropriate section, and submit a PR for review.
We encourage you to be creative and think outside the box! Your contributions help shape the future of PrivateGPT.
## Available Recipes
<Cards>
<Card
title="Summarize"
icon="fa-solid fa-file-alt"
href="/recipes/general-use-cases/summarize"
/>
</Cards>


@@ -0,0 +1,20 @@
The Summarize Recipe provides a method to extract concise summaries from ingested documents or texts using PrivateGPT.
This tool is particularly useful for quickly understanding large volumes of information by distilling key points and main ideas.
## Use Case
The primary use case for the `Summarize` tool is to automate the summarization of lengthy documents,
making it easier for users to grasp the essential information without reading through entire texts.
This can be applied in various scenarios, such as summarizing research papers, news articles, or business reports.
## Key Features
1. **Ingestion-compatible**: The user provides the text to be summarized. The text can be directly inputted or retrieved from ingested documents within the system.
2. **Customization**: The summary generation can be influenced by providing specific `instructions` or a `prompt`. These inputs guide the model on how to frame the summary, allowing for customization according to user needs.
3. **Streaming Support**: The tool supports streaming, allowing for real-time summary generation, which can be particularly useful for handling large texts or providing immediate feedback.
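The features above can be sketched as a small client. This example assumes a PrivateGPT server listening on `http://127.0.0.1:8001` and uses the `/v1/summarize` endpoint with the `text`, `instructions`, `use_context` and `stream` fields and the `summary` response field from the OpenAPI reference; the helper names `build_summarize_body` and `summarize` are hypothetical, and only the non-streaming case is shown.

```python
import json
import urllib.request
from typing import Optional

# Assumption: adjust to wherever your PrivateGPT server is running.
BASE_URL = "http://127.0.0.1:8001"


def build_summarize_body(
    text: str,
    instructions: Optional[str] = None,
    use_context: bool = False,
    stream: bool = False,
) -> dict:
    """Build the JSON body for a summarize request."""
    body = {"text": text, "use_context": use_context, "stream": stream}
    if instructions is not None:
        # Optional guidance on how the summary should be framed.
        body["instructions"] = instructions
    return body


def summarize(text: str, instructions: Optional[str] = None) -> str:
    """POST the text to /v1/summarize and return the summary string."""
    request = urllib.request.Request(
        f"{BASE_URL}/v1/summarize",
        data=json.dumps(build_summarize_body(text, instructions)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["summary"]
```

With a server running, `summarize("<long text>", instructions="Use three bullet points.")` would return the generated summary as a string.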
## Contributing
If you have ideas for improving the Summarize recipe or want to add new features, feel free to contribute!
You can submit your enhancements via a pull request on our [GitHub repository](https://github.com/zylon-ai/private-gpt).


@@ -0,0 +1,21 @@
This page aims to present different user interface (UI) alternatives for integrating and using PrivateGPT. These alternatives range from demo applications to fully customizable UI setups that can be adapted to your specific needs.
**Do you have any working demo project using PrivateGPT?**
Please open a PR to add it to the list, and join our Discord to tell us about it!
<Callout intent = "note">
WIP: This page provides an overview of one of the UI alternatives available for PrivateGPT. More alternatives will be added to this page as they become available.
</Callout>
## [PrivateGPT SDK Demo App](https://github.com/frgarciames/privategpt-react)
The PrivateGPT SDK demo app is a robust starting point for developers looking to integrate and customize PrivateGPT in their applications. Leveraging modern technologies like Tailwind, shadcn/ui, and Biomejs, it provides a smooth development experience and a highly customizable user interface. Refer to the [repository](https://github.com/frgarciames/privategpt-react) for more details and to get started.
**Tech Stack:**
- **Tailwind:** A utility-first CSS framework for rapid UI development.
- **shadcn/ui:** A set of high-quality, customizable UI components.
- **PrivateGPT Web SDK:** The core SDK for interacting with PrivateGPT.
- **Biomejs formatter/linter:** A tool for maintaining code quality and consistency.


@@ -0,0 +1,71 @@
## Gradio UI user manual
Gradio UI is a ready-to-use way of testing most of the PrivateGPT API functionality.
![Gradio PrivateGPT](https://github.com/zylon-ai/private-gpt/raw/main/fern/docs/assets/ui.png?raw=true)
<Callout intent = "warning">
A working **Gradio UI client** is provided to test the API, together with a set of useful tools such as bulk
model download script, ingestion script, documents folder watch, etc. Please refer to the [UI alternatives](/manual/user-interface/alternatives) page for more UI alternatives.
</Callout>
### Execution Modes
The UI has 3 modes of execution, which you can select in the top-left:
* Query Docs: uses the context from the
ingested documents to answer the questions posted in the chat. It also takes
into account previous chat messages as context.
* Makes use of `/chat/completions` API with `use_context=true` and no
`context_filter`.
* Search in Docs: fast search that returns the 4 most related text
chunks, together with their source document and page.
* Makes use of `/chunks` API with no `context_filter`, `limit=4` and
`prev_next_chunks=0`.
* LLM Chat: simple, non-contextual chat with the LLM. The ingested documents won't
be taken into account, only the previous messages.
* Makes use of `/chat/completions` API with `use_context=false`.
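The mapping between UI modes and API calls described above can be sketched as follows. This is an approximation for illustration only: the function `request_for_mode` is hypothetical, the `/v1` path prefix is taken from the OpenAPI reference, and the `/v1/chunks` path and its `text` field are assumptions that should be checked against the API reference.

```python
def request_for_mode(mode: str, user_text: str) -> tuple[str, dict]:
    """Return the (path, JSON body) roughly equivalent to a UI mode."""
    if mode == "Query Docs":
        # Contextual chat: ingested documents are used, no context_filter.
        return "/v1/chat/completions", {
            "messages": [{"role": "user", "content": user_text}],
            "use_context": True,
        }
    if mode == "Search in Docs":
        # Retrieval only: the 4 most related chunks, no neighbouring chunks.
        return "/v1/chunks", {
            "text": user_text,  # assumed field name for the query
            "limit": 4,
            "prev_next_chunks": 0,
        }
    if mode == "LLM Chat":
        # Plain chat: ingested documents are ignored.
        return "/v1/chat/completions", {
            "messages": [{"role": "user", "content": user_text}],
            "use_context": False,
        }
    raise ValueError(f"Unknown mode: {mode!r}")
```

In the real UI, previous chat messages are also included in `messages` for the two chat modes; the sketch sends only the latest user message.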
### Document Ingestion
Ingest documents by using the `Upload a File` button. You can check the progress of
the ingestion in the console logs of the server.
The list of ingested files is shown below the button.
If you want to delete the ingested documents, refer to the *Reset Local documents
database* section in the documentation.
### Chat
Normal chat interface, self-explanatory ;)
#### System Prompt
You can view and change the system prompt being passed to the LLM by clicking "Additional Inputs"
in the chat interface. The system prompt is also logged on the server.
By default, the `Query Docs` mode uses the setting value `ui.default_query_system_prompt`.
The `LLM Chat` mode attempts to use the optional settings value `ui.default_chat_system_prompt`.
If no system prompt is entered, the UI will display the default system prompt being used
for the active mode.
##### System Prompt Examples:
The system prompt can give your chatbot a specialized role and produce results tailored to the prompt
you have given the model. Examples of system prompts can be found
[here](https://www.w3schools.com/gen_ai/chatgpt-3-5/chatgpt-3-5_roles.php).
Some interesting examples to try include:
* You are -X-. You have all the knowledge and personality of -X-. Answer as if you were -X- using
their manner of speaking and vocabulary.
* Example: You are Shakespeare. You have all the knowledge and personality of Shakespeare.
Answer as if you were Shakespeare using their manner of speaking and vocabulary.
* You are an expert (at) -role-. Answer all questions using your expertise on -specific domain topic-.
* Example: You are an expert software engineer. Answer all questions using your expertise on Python.
* You are a -role- bot, respond with -response criteria needed-. If no -response criteria- is needed,
respond with -alternate response-.
* Example: You are a grammar checking bot, respond with any grammatical corrections needed. If no corrections
are needed, respond with "verified".
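When calling the API directly rather than through the UI, the same templates can be sent as an initial `role: system` message, as described in the chat completions reference. The sketch below fills in the "expert" template and builds the message list; `expert_prompt` and `with_system_prompt` are hypothetical helper names used only for illustration.

```python
def expert_prompt(role: str, topic: str) -> str:
    """Fill in the 'You are an expert -role-' template from the list above."""
    return (
        f"You are an expert {role}. "
        f"Answer all questions using your expertise on {topic}."
    )


def with_system_prompt(system_prompt: str, user_message: str) -> list[dict]:
    """Build a message list with the system prompt as the first message."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]
```

The resulting list can be used as the `messages` field of a chat completions request body.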


@@ -1,4 +1,4 @@
{
"organization": "privategpt",
"version": "0.15.3"
"version": "0.31.17"
}


@@ -1,20 +1,8 @@
{
"openapi": "3.1.0",
"info": {
"title": "PrivateGPT",
"summary": "PrivateGPT is a production-ready AI project that allows you to ask questions to your documents using the power of Large Language Models (LLMs), even in scenarios without Internet connection. 100% private, no data leaves your execution environment at any point.",
"description": "",
"contact": {
"url": "https://github.com/imartinez/privateGPT"
},
"license": {
"name": "Apache 2.0",
"url": "https://www.apache.org/licenses/LICENSE-2.0.html"
},
"version": "0.1.0",
"x-logo": {
"url": "https://lh3.googleusercontent.com/drive-viewer/AK7aPaD_iNlMoTquOBsw4boh4tIYxyEuhz6EtEs8nzq3yNkNAK00xGjE1KUCmPJSk3TYOjcs6tReG6w_cLu1S7L_gPgT9z52iw=s2560"
}
"title": "FastAPI",
"version": "0.1.0"
},
"paths": {
"/v1/completions": {
@@ -56,6 +44,15 @@
}
}
}
},
"x-fern-streaming": {
"stream-condition": "stream",
"response": {
"$ref": "#/components/schemas/OpenAICompletion"
},
"response-stream": {
"$ref": "#/components/schemas/OpenAICompletion"
}
}
}
},
@@ -65,7 +62,7 @@
"Contextual Completions"
],
"summary": "Chat Completion",
"description": "Given a list of messages comprising a conversation, return a response.\n\nOptionally include a `system_prompt` to influence the way the LLM answers.\n\nIf `use_context` is set to `true`, the model will use context coming\nfrom the ingested documents to create the response. The documents being used can\nbe filtered using the `context_filter` and passing the document IDs to be used.\nIngested documents IDs can be found using `/ingest/list` endpoint. If you want\nall ingested documents to be used, remove `context_filter` altogether.\n\nWhen using `'include_sources': true`, the API will return the source Chunks used\nto create the response, which come from the context provided.\n\nWhen using `'stream': true`, the API will return data chunks following [OpenAI's\nstreaming model](https://platform.openai.com/docs/api-reference/chat/streaming):\n```\n{\"id\":\"12345\",\"object\":\"completion.chunk\",\"created\":1694268190,\n\"model\":\"private-gpt\",\"choices\":[{\"index\":0,\"delta\":{\"content\":\"Hello\"},\n\"finish_reason\":null}]}\n```",
"description": "Given a list of messages comprising a conversation, return a response.\n\nOptionally include an initial `role: system` message to influence the way\nthe LLM answers.\n\nIf `use_context` is set to `true`, the model will use context coming\nfrom the ingested documents to create the response. The documents being used can\nbe filtered using the `context_filter` and passing the document IDs to be used.\nIngested documents IDs can be found using `/ingest/list` endpoint. If you want\nall ingested documents to be used, remove `context_filter` altogether.\n\nWhen using `'include_sources': true`, the API will return the source Chunks used\nto create the response, which come from the context provided.\n\nWhen using `'stream': true`, the API will return data chunks following [OpenAI's\nstreaming model](https://platform.openai.com/docs/api-reference/chat/streaming):\n```\n{\"id\":\"12345\",\"object\":\"completion.chunk\",\"created\":1694268190,\n\"model\":\"private-gpt\",\"choices\":[{\"index\":0,\"delta\":{\"content\":\"Hello\"},\n\"finish_reason\":null}]}\n```",
"operationId": "chat_completion_v1_chat_completions_post",
"requestBody": {
"content": {
@@ -98,6 +95,15 @@
}
}
}
},
"x-fern-streaming": {
"stream-condition": "stream",
"response": {
"$ref": "#/components/schemas/OpenAICompletion"
},
"response-stream": {
"$ref": "#/components/schemas/OpenAICompletion"
}
}
}
},
@@ -149,7 +155,7 @@
"Ingestion"
],
"summary": "Ingest",
"description": "Ingests and processes a file, storing its chunks to be used as context.\n\nThe context obtained from files is later used in\n`/chat/completions`, `/completions`, and `/chunks` APIs.\n\nMost common document\nformats are supported, but you may be prompted to install an extra dependency to\nmanage a specific file type.\n\nA file can generate different Documents (for example a PDF generates one Document\nper page). All Documents IDs are returned in the response, together with the\nextracted Metadata (which is later used to improve context retrieval). Those IDs\ncan be used to filter the context used to create responses in\n`/chat/completions`, `/completions`, and `/chunks` APIs.",
"description": "Ingests and processes a file.\n\nDeprecated. Use ingest/file instead.",
"operationId": "ingest_v1_ingest_post",
"requestBody": {
"content": {
@@ -161,6 +167,91 @@
},
"required": true
},
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/IngestResponse"
}
}
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
},
"deprecated": true
}
},
"/v1/ingest/file": {
"post": {
"tags": [
"Ingestion"
],
"summary": "Ingest File",
"description": "Ingests and processes a file, storing its chunks to be used as context.\n\nThe context obtained from files is later used in\n`/chat/completions`, `/completions`, and `/chunks` APIs.\n\nMost common document\nformats are supported, but you may be prompted to install an extra dependency to\nmanage a specific file type.\n\nA file can generate different Documents (for example a PDF generates one Document\nper page). All Documents IDs are returned in the response, together with the\nextracted Metadata (which is later used to improve context retrieval). Those IDs\ncan be used to filter the context used to create responses in\n`/chat/completions`, `/completions`, and `/chunks` APIs.",
"operationId": "ingest_file_v1_ingest_file_post",
"requestBody": {
"content": {
"multipart/form-data": {
"schema": {
"$ref": "#/components/schemas/Body_ingest_file_v1_ingest_file_post"
}
}
},
"required": true
},
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/IngestResponse"
}
}
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
}
}
},
"/v1/ingest/text": {
"post": {
"tags": [
"Ingestion"
],
"summary": "Ingest Text",
"description": "Ingests and processes a text, storing its chunks to be used as context.\n\nThe context obtained from files is later used in\n`/chat/completions`, `/completions`, and `/chunks` APIs.\n\nA Document will be generated with the given text. The Document\nID is returned in the response, together with the\nextracted Metadata (which is later used to improve context retrieval). That ID\ncan be used to filter the context used to create responses in\n`/chat/completions`, `/completions`, and `/chunks` APIs.",
"operationId": "ingest_text_v1_ingest_text_post",
"requestBody": {
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/IngestTextBody"
}
}
},
"required": true
},
"responses": {
"200": {
"description": "Successful Response",
@@ -248,6 +339,48 @@
}
}
},
"/v1/summarize": {
"post": {
"tags": [
"Recipes"
],
"summary": "Summarize",
"description": "Given a text, the model will return a summary.\n\nOptionally include `instructions` to influence the way the summary is generated.\n\nIf `use_context`\nis set to `true`, the model will also use the content coming from the ingested\ndocuments in the summary. The documents being used can\nbe filtered by their metadata using the `context_filter`.\nIngested documents metadata can be found using `/ingest/list` endpoint.\nIf you want all ingested documents to be used, remove `context_filter` altogether.\n\nIf `prompt` is set, it will be used as the prompt for the summarization,\notherwise the default prompt will be used.\n\nWhen using `'stream': true`, the API will return data chunks following [OpenAI's\nstreaming model](https://platform.openai.com/docs/api-reference/chat/streaming):\n```\n{\"id\":\"12345\",\"object\":\"completion.chunk\",\"created\":1694268190,\n\"model\":\"private-gpt\",\"choices\":[{\"index\":0,\"delta\":{\"content\":\"Hello\"},\n\"finish_reason\":null}]}\n```",
"operationId": "summarize_v1_summarize_post",
"requestBody": {
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/SummarizeBody"
}
}
},
"required": true
},
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/SummarizeResponse"
}
}
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
}
}
},
"/v1/embeddings": {
"post": {
"tags": [
@@ -315,6 +448,20 @@
},
"components": {
"schemas": {
"Body_ingest_file_v1_ingest_file_post": {
"properties": {
"file": {
"type": "string",
"format": "binary",
"title": "File"
}
},
"type": "object",
"required": [
"file"
],
"title": "Body_ingest_file_v1_ingest_file_post"
},
"Body_ingest_v1_ingest_post": {
"properties": {
"file": {
@@ -338,17 +485,6 @@
"type": "array",
"title": "Messages"
},
"system_prompt": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"title": "System Prompt"
},
"use_context": {
"type": "boolean",
"title": "Use Context",
@@ -389,13 +525,16 @@
},
"include_sources": true,
"messages": [
{
"content": "You are a rapper. Always answer with a rap.",
"role": "system"
},
{
"content": "How do you fry an egg?",
"role": "user"
}
],
"stream": false,
"system_prompt": "You are a rapper. Always answer with a rap.",
"use_context": true
}
]
@@ -403,6 +542,10 @@
"Chunk": {
"properties": {
"object": {
"type": "string",
"enum": [
"context.chunk"
],
"const": "context.chunk",
"title": "Object"
},
@@ -515,10 +658,18 @@
"ChunksResponse": {
"properties": {
"object": {
"type": "string",
"enum": [
"list"
],
"const": "list",
"title": "Object"
},
"model": {
"type": "string",
"enum": [
"private-gpt"
],
"const": "private-gpt",
"title": "Model"
},
@@ -591,6 +742,7 @@
"include_sources": false,
"prompt": "How do you fry an egg?",
"stream": false,
"system_prompt": "You are a rapper. Always answer with a rap.",
"use_context": false
}
]
@@ -630,6 +782,10 @@
"title": "Index"
},
"object": {
"type": "string",
"enum": [
"embedding"
],
"const": "embedding",
"title": "Object"
},
@@ -681,10 +837,18 @@
"EmbeddingsResponse": {
"properties": {
"object": {
"type": "string",
"enum": [
"list"
],
"const": "list",
"title": "Object"
},
"model": {
"type": "string",
"enum": [
"private-gpt"
],
"const": "private-gpt",
"title": "Model"
},
@@ -720,6 +884,10 @@
"HealthResponse": {
"properties": {
"status": {
"type": "string",
"enum": [
"ok"
],
"const": "ok",
"title": "Status",
"default": "ok"
@@ -731,10 +899,18 @@
"IngestResponse": {
"properties": {
"object": {
"type": "string",
"enum": [
"list"
],
"const": "list",
"title": "Object"
},
"model": {
"type": "string",
"enum": [
"private-gpt"
],
"const": "private-gpt",
"title": "Model"
},
@@ -754,9 +930,37 @@
],
"title": "IngestResponse"
},
"IngestTextBody": {
"properties": {
"file_name": {
"type": "string",
"title": "File Name",
"examples": [
"Avatar: The Last Airbender"
]
},
"text": {
"type": "string",
"title": "Text",
"examples": [
"Avatar is set in an Asian and Arctic-inspired world in which some people can telekinetically manipulate one of the four elements\u2014water, earth, fire or air\u2014through practices known as 'bending', inspired by Chinese martial arts."
]
}
},
"type": "object",
"required": [
"file_name",
"text"
],
"title": "IngestTextBody"
},
"IngestedDoc": {
"properties": {
"object": {
"type": "string",
"enum": [
"ingest.document"
],
"const": "ingest.document",
"title": "Object"
},
@@ -879,6 +1083,10 @@
]
},
"model": {
"type": "string",
"enum": [
"private-gpt"
],
"const": "private-gpt",
"title": "Model"
},
@@ -952,6 +1160,78 @@
"title": "OpenAIMessage",
"description": "Inference result, with the source of the message.\n\nRole could be the assistant or system\n(providing a default response, not AI generated)."
},
"SummarizeBody": {
"properties": {
"text": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"title": "Text"
},
"use_context": {
"type": "boolean",
"title": "Use Context",
"default": false
},
"context_filter": {
"anyOf": [
{
"$ref": "#/components/schemas/ContextFilter"
},
{
"type": "null"
}
]
},
"prompt": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"title": "Prompt"
},
"instructions": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"title": "Instructions"
},
"stream": {
"type": "boolean",
"title": "Stream",
"default": false
}
},
"type": "object",
"title": "SummarizeBody"
},
"SummarizeResponse": {
"properties": {
"summary": {
"type": "string",
"title": "Summary"
}
},
"type": "object",
"required": [
"summary"
],
"title": "SummarizeResponse"
},
"ValidationError": {
"properties": {
"loc": {
@@ -986,27 +1266,5 @@
"title": "ValidationError"
}
}
},
"tags": [
{
"name": "Ingestion",
"description": "High-level APIs covering document ingestion -internally managing document parsing, splitting, metadata extraction, embedding generation and storage- and ingested documents CRUD. Each ingested document is identified by an ID that can be used to filter the context used in *Contextual Completions* and *Context Chunks* APIs."
},
{
"name": "Contextual Completions",
"description": "High-level APIs covering contextual Chat and Completions. They follow OpenAI's format, extending it to allow using the context coming from ingested documents to create the response. Internally manage context retrieval, prompt engineering and the response generation."
},
{
"name": "Context Chunks",
"description": "Low-level API that given a query return relevant chunks of text coming from the ingested documents."
},
{
"name": "Embeddings",
"description": "Low-level API to obtain the vector representation of a given text, using an Embeddings model. Follows OpenAI's embeddings API format."
},
{
"name": "Health",
"description": "Simple health API to make sure the server is up and running."
}
]
}
}

7056
poetry.lock generated

File diff suppressed because it is too large

View File

@@ -1,4 +1,5 @@
"""private-gpt."""
import logging
import os
@@ -21,3 +22,6 @@ os.environ["GRADIO_ANALYTICS_ENABLED"] = "False"
# Disable chromaDB telemetry
# It is already disabled, see PR#1144
# os.environ["ANONYMIZED_TELEMETRY"] = "False"
# adding tiktoken cache path within repo to be able to run in offline environment.
os.environ["TIKTOKEN_CACHE_DIR"] = "tiktoken_cache"

View File

@@ -3,7 +3,7 @@ import json
from typing import Any
import boto3
from llama_index.embeddings.base import BaseEmbedding
from llama_index.core.base.embeddings.base import BaseEmbedding
from pydantic import Field, PrivateAttr

View File

@@ -1,8 +1,7 @@
import logging
from injector import inject, singleton
from llama_index import MockEmbedding
from llama_index.embeddings.base import BaseEmbedding
from llama_index.core.embeddings import BaseEmbedding, MockEmbedding
from private_gpt.paths import models_cache_path
from private_gpt.settings.settings import Settings
@@ -19,27 +18,132 @@ class EmbeddingComponent:
embedding_mode = settings.embedding.mode
logger.info("Initializing the embedding model in mode=%s", embedding_mode)
match embedding_mode:
case "local":
from llama_index.embeddings import HuggingFaceEmbedding
case "huggingface":
try:
from llama_index.embeddings.huggingface import ( # type: ignore
HuggingFaceEmbedding,
)
except ImportError as e:
raise ImportError(
"Local dependencies not found, install with `poetry install --extras embeddings-huggingface`"
) from e
self.embedding_model = HuggingFaceEmbedding(
model_name=settings.local.embedding_hf_model_name,
model_name=settings.huggingface.embedding_hf_model_name,
cache_folder=str(models_cache_path),
trust_remote_code=settings.huggingface.trust_remote_code,
)
case "sagemaker":
from private_gpt.components.embedding.custom.sagemaker import (
SagemakerEmbedding,
)
try:
from private_gpt.components.embedding.custom.sagemaker import (
SagemakerEmbedding,
)
except ImportError as e:
raise ImportError(
"Sagemaker dependencies not found, install with `poetry install --extras embeddings-sagemaker`"
) from e
self.embedding_model = SagemakerEmbedding(
endpoint_name=settings.sagemaker.embedding_endpoint_name,
)
case "openai":
from llama_index import OpenAIEmbedding
try:
from llama_index.embeddings.openai import ( # type: ignore
OpenAIEmbedding,
)
except ImportError as e:
raise ImportError(
"OpenAI dependencies not found, install with `poetry install --extras embeddings-openai`"
) from e
openai_settings = settings.openai.api_key
self.embedding_model = OpenAIEmbedding(api_key=openai_settings)
api_base = (
settings.openai.embedding_api_base or settings.openai.api_base
)
api_key = settings.openai.embedding_api_key or settings.openai.api_key
model = settings.openai.embedding_model
self.embedding_model = OpenAIEmbedding(
api_base=api_base,
api_key=api_key,
model=model,
)
case "ollama":
try:
from llama_index.embeddings.ollama import ( # type: ignore
OllamaEmbedding,
)
from ollama import Client # type: ignore
except ImportError as e:
raise ImportError(
"Local dependencies not found, install with `poetry install --extras embeddings-ollama`"
) from e
ollama_settings = settings.ollama
# Resolve the embedding model name. If no tag is provided, `latest` is used
model_name = (
ollama_settings.embedding_model + ":latest"
if ":" not in ollama_settings.embedding_model
else ollama_settings.embedding_model
)
self.embedding_model = OllamaEmbedding(
model_name=model_name,
base_url=ollama_settings.embedding_api_base,
)
if ollama_settings.autopull_models:
from private_gpt.utils.ollama import (
check_connection,
pull_model,
)
# TODO: Reuse llama-index client when llama-index is updated
client = Client(
host=ollama_settings.embedding_api_base,
timeout=ollama_settings.request_timeout,
)
if not check_connection(client):
raise ValueError(
f"Failed to connect to Ollama, "
f"check if Ollama server is running on {ollama_settings.embedding_api_base}"
)
pull_model(client, model_name)
case "azopenai":
try:
from llama_index.embeddings.azure_openai import ( # type: ignore
AzureOpenAIEmbedding,
)
except ImportError as e:
raise ImportError(
"Azure OpenAI dependencies not found, install with `poetry install --extras embeddings-azopenai`"
) from e
azopenai_settings = settings.azopenai
self.embedding_model = AzureOpenAIEmbedding(
model=azopenai_settings.embedding_model,
deployment_name=azopenai_settings.embedding_deployment_name,
api_key=azopenai_settings.api_key,
azure_endpoint=azopenai_settings.azure_endpoint,
api_version=azopenai_settings.api_version,
)
case "gemini":
try:
from llama_index.embeddings.gemini import ( # type: ignore
GeminiEmbedding,
)
except ImportError as e:
raise ImportError(
"Gemini dependencies not found, install with `poetry install --extras embeddings-gemini`"
) from e
self.embedding_model = GeminiEmbedding(
api_key=settings.gemini.api_key,
model_name=settings.gemini.embedding_model,
)
case "mock":
# Not a random number: it is the dimensionality used by
# the default embedding model
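
Note on the `ollama` branch above: the model name gets a `:latest` suffix when the configured name carries no tag. A minimal sketch of that rule as a standalone function follows. The name `with_default_tag` is hypothetical (the diff inlines the expression), and the model names in the usage lines are only illustrative.

```python
def with_default_tag(model: str, default_tag: str = "latest") -> str:
    """Return `model` unchanged if it already has a tag, else append one."""
    # A tag is whatever follows the colon, e.g. "llama3.1:8b".
    return model if ":" in model else f"{model}:{default_tag}"


print(with_default_tag("nomic-embed-text"))
print(with_default_tag("llama3.1:8b"))
```

Normalizing the name once means the same string can be passed both to the embedding class and to the pull helper.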

View File

@@ -6,22 +6,21 @@ import multiprocessing.pool
import os
import threading
from pathlib import Path
from queue import Queue
from typing import Any
from llama_index import (
Document,
ServiceContext,
StorageContext,
VectorStoreIndex,
load_index_from_storage,
)
from llama_index.data_structs import IndexDict
from llama_index.indices.base import BaseIndex
from llama_index.ingestion import run_transformations
from llama_index.core.data_structs import IndexDict
from llama_index.core.embeddings.utils import EmbedType
from llama_index.core.indices import VectorStoreIndex, load_index_from_storage
from llama_index.core.indices.base import BaseIndex
from llama_index.core.ingestion import run_transformations
from llama_index.core.schema import BaseNode, Document, TransformComponent
from llama_index.core.storage import StorageContext
from private_gpt.components.ingest.ingest_helper import IngestionHelper
from private_gpt.paths import local_data_path
from private_gpt.settings.settings import Settings
from private_gpt.utils.eta import eta
logger = logging.getLogger(__name__)
@@ -30,13 +29,15 @@ class BaseIngestComponent(abc.ABC):
def __init__(
self,
storage_context: StorageContext,
service_context: ServiceContext,
embed_model: EmbedType,
transformations: list[TransformComponent],
*args: Any,
**kwargs: Any,
) -> None:
logger.debug("Initializing base ingest component type=%s", type(self).__name__)
self.storage_context = storage_context
self.service_context = service_context
self.embed_model = embed_model
self.transformations = transformations
@abc.abstractmethod
def ingest(self, file_name: str, file_data: Path) -> list[Document]:
@@ -55,11 +56,12 @@ class BaseIngestComponentWithIndex(BaseIngestComponent, abc.ABC):
def __init__(
self,
storage_context: StorageContext,
service_context: ServiceContext,
embed_model: EmbedType,
transformations: list[TransformComponent],
*args: Any,
**kwargs: Any,
) -> None:
super().__init__(storage_context, service_context, *args, **kwargs)
super().__init__(storage_context, embed_model, transformations, *args, **kwargs)
self.show_progress = True
self._index_thread_lock = (
@@ -73,9 +75,10 @@ class BaseIngestComponentWithIndex(BaseIngestComponent, abc.ABC):
# Load the index with store_nodes_override=True to be able to delete them
index = load_index_from_storage(
storage_context=self.storage_context,
service_context=self.service_context,
store_nodes_override=True, # Force store nodes in index and document stores
show_progress=self.show_progress,
embed_model=self.embed_model,
transformations=self.transformations,
)
except ValueError:
# There is no index in the storage context, so create a new one
@@ -83,9 +86,10 @@ class BaseIngestComponentWithIndex(BaseIngestComponent, abc.ABC):
index = VectorStoreIndex.from_documents(
[],
storage_context=self.storage_context,
service_context=self.service_context,
store_nodes_override=True, # Force store nodes in index and document stores
show_progress=self.show_progress,
embed_model=self.embed_model,
transformations=self.transformations,
)
index.storage_context.persist(persist_dir=local_data_path)
return index
@@ -106,11 +110,12 @@ class SimpleIngestComponent(BaseIngestComponentWithIndex):
def __init__(
self,
storage_context: StorageContext,
service_context: ServiceContext,
embed_model: EmbedType,
transformations: list[TransformComponent],
*args: Any,
**kwargs: Any,
) -> None:
super().__init__(storage_context, service_context, *args, **kwargs)
super().__init__(storage_context, embed_model, transformations, *args, **kwargs)
def ingest(self, file_name: str, file_data: Path) -> list[Document]:
logger.info("Ingesting file_name=%s", file_name)
@@ -151,16 +156,17 @@ class BatchIngestComponent(BaseIngestComponentWithIndex):
def __init__(
self,
storage_context: StorageContext,
service_context: ServiceContext,
embed_model: EmbedType,
transformations: list[TransformComponent],
count_workers: int,
*args: Any,
**kwargs: Any,
) -> None:
super().__init__(storage_context, service_context, *args, **kwargs)
super().__init__(storage_context, embed_model, transformations, *args, **kwargs)
# To make efficient use of the CPU and GPU, the embeddings
# must be in the transformations
assert (
len(self.service_context.transformations) >= 2
len(self.transformations) >= 2
), "Embeddings must be in the transformations"
assert count_workers > 0, "count_workers must be > 0"
self.count_workers = count_workers
@@ -197,7 +203,7 @@ class BatchIngestComponent(BaseIngestComponentWithIndex):
logger.debug("Transforming count=%s documents into nodes", len(documents))
nodes = run_transformations(
documents, # type: ignore[arg-type]
self.service_context.transformations,
self.transformations,
show_progress=self.show_progress,
)
# Locking the index to avoid concurrent writes
@@ -225,16 +231,17 @@ class ParallelizedIngestComponent(BaseIngestComponentWithIndex):
def __init__(
self,
storage_context: StorageContext,
service_context: ServiceContext,
embed_model: EmbedType,
transformations: list[TransformComponent],
count_workers: int,
*args: Any,
**kwargs: Any,
) -> None:
super().__init__(storage_context, service_context, *args, **kwargs)
super().__init__(storage_context, embed_model, transformations, *args, **kwargs)
# To make an efficient use of the CPU and GPU, the embeddings
# must be in the transformations (to be computed in batches)
assert (
len(self.service_context.transformations) >= 2
len(self.transformations) >= 2
), "Embeddings must be in the transformations"
assert count_workers > 0, "count_workers must be > 0"
self.count_workers = count_workers
@@ -278,7 +285,7 @@ class ParallelizedIngestComponent(BaseIngestComponentWithIndex):
logger.debug("Transforming count=%s documents into nodes", len(documents))
nodes = run_transformations(
documents, # type: ignore[arg-type]
self.service_context.transformations,
self.transformations,
show_progress=self.show_progress,
)
# Locking the index to avoid concurrent writes
@@ -309,20 +316,202 @@ class ParallelizedIngestComponent(BaseIngestComponentWithIndex):
self._file_to_documents_work_pool.terminate()
class PipelineIngestComponent(BaseIngestComponentWithIndex):
"""Pipeline ingestion - keeping the embedding worker pool as busy as possible.
This class implements a threaded ingestion pipeline, which comprises two threads
and two queues. The primary thread is responsible for reading and parsing files
into documents. These documents are then placed into a queue, which is
distributed to a pool of worker threads for embedding computation. After
embedding, the documents are transferred to another queue where they are
accumulated until a threshold is reached. Upon reaching this threshold, the
accumulated documents are flushed to the document store, index, and vector
store.
Exception handling ensures robustness against erroneous files. However, in the
pipelined design, one error can lead to the discarding of multiple files. Any
discarded files will be reported.
"""
NODE_FLUSH_COUNT = 5000 # Save the index every # nodes.
def __init__(
self,
storage_context: StorageContext,
embed_model: EmbedType,
transformations: list[TransformComponent],
count_workers: int,
*args: Any,
**kwargs: Any,
) -> None:
super().__init__(storage_context, embed_model, transformations, *args, **kwargs)
assert (
len(self.transformations) >= 2
), "Embeddings must be in the transformations"
assert count_workers > 0, "count_workers must be > 0"
self.count_workers = count_workers
# We are doing our own multiprocessing.
# To avoid colliding with HuggingFace's tokenizer parallelism, we disable it.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
# doc_q stores parsed files as Document chunks.
# Using a shallow queue causes the filesystem parser to block
# when it reaches capacity. This ensures it doesn't outpace the
# computationally intensive embeddings phase, avoiding unnecessary
# memory consumption. The semaphore is used to bound the async worker
# embedding computations to cause the doc Q to fill and block.
self.doc_semaphore = multiprocessing.Semaphore(
self.count_workers
) # limit the doc queue to # items.
self.doc_q: Queue[tuple[str, str | None, list[Document] | None]] = Queue(20)
# node_q stores documents parsed into nodes (embeddings).
# Larger queue size so we don't block the embedding workers during a slow
# index update.
self.node_q: Queue[
tuple[str, str | None, list[Document] | None, list[BaseNode] | None]
] = Queue(40)
threading.Thread(target=self._doc_to_node, daemon=True).start()
threading.Thread(target=self._write_nodes, daemon=True).start()
def _doc_to_node(self) -> None:
# Parse documents into nodes
with multiprocessing.pool.ThreadPool(processes=self.count_workers) as pool:
while True:
try:
cmd, file_name, documents = self.doc_q.get(
block=True
) # Documents for a file
if cmd == "process":
# Push CPU/GPU embedding work to the worker pool
# Acquire semaphore to control access to worker pool
self.doc_semaphore.acquire()
pool.apply_async(
self._doc_to_node_worker, (file_name, documents)
)
elif cmd == "quit":
break
finally:
if cmd != "process":
self.doc_q.task_done() # unblock Q joins
def _doc_to_node_worker(self, file_name: str, documents: list[Document]) -> None:
# CPU/GPU intensive work, run on a worker thread of the ThreadPool
try:
nodes = run_transformations(
documents, # type: ignore[arg-type]
self.transformations,
show_progress=self.show_progress,
)
self.node_q.put(("process", file_name, documents, nodes))
finally:
self.doc_semaphore.release()
self.doc_q.task_done() # unblock Q joins
def _save_docs(
self, files: list[str], documents: list[Document], nodes: list[BaseNode]
) -> None:
try:
logger.info(
f"Saving {len(files)} files ({len(documents)} documents / {len(nodes)} nodes)"
)
self._index.insert_nodes(nodes)
for document in documents:
self._index.docstore.set_document_hash(
document.get_doc_id(), document.hash
)
self._save_index()
except Exception:
# Tell the user so they can investigate these files
logger.exception(f"Processing files {files}")
finally:
# Clearing work, even on exception, maintains a clean state.
nodes.clear()
documents.clear()
files.clear()
def _write_nodes(self) -> None:
# Save nodes to index. I/O intensive.
node_stack: list[BaseNode] = []
doc_stack: list[Document] = []
file_stack: list[str] = []
while True:
try:
cmd, file_name, documents, nodes = self.node_q.get(block=True)
if cmd in ("flush", "quit"):
if file_stack:
self._save_docs(file_stack, doc_stack, node_stack)
if cmd == "quit":
break
elif cmd == "process":
node_stack.extend(nodes) # type: ignore[arg-type]
doc_stack.extend(documents) # type: ignore[arg-type]
file_stack.append(file_name) # type: ignore[arg-type]
# Constant saving is heavy on I/O - accumulate to a threshold
if len(node_stack) >= self.NODE_FLUSH_COUNT:
self._save_docs(file_stack, doc_stack, node_stack)
finally:
self.node_q.task_done()
def _flush(self) -> None:
self.doc_q.put(("flush", None, None))
self.doc_q.join()
self.node_q.put(("flush", None, None, None))
self.node_q.join()
def ingest(self, file_name: str, file_data: Path) -> list[Document]:
documents = IngestionHelper.transform_file_into_documents(file_name, file_data)
self.doc_q.put(("process", file_name, documents))
self._flush()
return documents
def bulk_ingest(self, files: list[tuple[str, Path]]) -> list[Document]:
docs = []
for file_name, file_data in eta(files):
try:
documents = IngestionHelper.transform_file_into_documents(
file_name, file_data
)
self.doc_q.put(("process", file_name, documents))
docs.extend(documents)
except Exception:
logger.exception(f"Skipping {file_data.name}")
self._flush()
return docs
def get_ingestion_component(
storage_context: StorageContext,
service_context: ServiceContext,
embed_model: EmbedType,
transformations: list[TransformComponent],
settings: Settings,
) -> BaseIngestComponent:
"""Get the ingestion component for the given configuration."""
ingest_mode = settings.embedding.ingest_mode
if ingest_mode == "batch":
return BatchIngestComponent(
storage_context, service_context, settings.embedding.count_workers
storage_context=storage_context,
embed_model=embed_model,
transformations=transformations,
count_workers=settings.embedding.count_workers,
)
elif ingest_mode == "parallel":
return ParallelizedIngestComponent(
storage_context, service_context, settings.embedding.count_workers
storage_context=storage_context,
embed_model=embed_model,
transformations=transformations,
count_workers=settings.embedding.count_workers,
)
elif ingest_mode == "pipeline":
return PipelineIngestComponent(
storage_context=storage_context,
embed_model=embed_model,
transformations=transformations,
count_workers=settings.embedding.count_workers,
)
else:
return SimpleIngestComponent(storage_context, service_context)
return SimpleIngestComponent(
storage_context=storage_context,
embed_model=embed_model,
transformations=transformations,
)
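
Note on `PipelineIngestComponent` above: a minimal sketch of the two-queue flow its docstring describes, using only the standard library. It is a simplification under stated assumptions, not the component itself. The worker pool and semaphore are left out, "embedding" is replaced by a string transformation, `store` stands in for the index and vector store, and `FLUSH_COUNT` is a tiny stand-in for `NODE_FLUSH_COUNT`. All names are hypothetical.

```python
import threading
from queue import Queue

FLUSH_COUNT = 3  # demo threshold; the diff uses NODE_FLUSH_COUNT = 5000

doc_q: Queue = Queue(20)   # parsed files waiting to become nodes
node_q: Queue = Queue(40)  # nodes waiting to be written
store: list[str] = []      # stand-in for the index / vector store


def doc_to_node() -> None:
    """Consume documents and emit nodes (here: two fake nodes per file)."""
    while True:
        cmd, payload = doc_q.get()
        try:
            if cmd == "process":
                node_q.put(("process", [f"{payload}#{i}" for i in range(2)]))
            elif cmd == "quit":
                break
        finally:
            doc_q.task_done()  # lets doc_q.join() return


def write_nodes() -> None:
    """Accumulate nodes and write them once a threshold or a flush arrives."""
    pending: list[str] = []
    while True:
        cmd, nodes = node_q.get()
        try:
            if cmd == "process":
                pending.extend(nodes)
            if cmd in ("flush", "quit") or len(pending) >= FLUSH_COUNT:
                store.extend(pending)
                pending.clear()
            if cmd == "quit":
                break
        finally:
            node_q.task_done()  # lets node_q.join() return


def ingest(files: list[str]) -> None:
    for name in files:
        doc_q.put(("process", name))
    doc_q.put(("flush", None))
    doc_q.join()   # every document has been turned into nodes
    node_q.put(("flush", None))
    node_q.join()  # every node has been written


threading.Thread(target=doc_to_node, daemon=True).start()
threading.Thread(target=write_nodes, daemon=True).start()
ingest(["a.txt", "b.txt"])
print(store)
```

Joining the document queue before flushing the node queue is what guarantees that no node is still in flight when the final write happens.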

View File

@@ -1,14 +1,58 @@
import logging
from pathlib import Path
from llama_index import Document
from llama_index.readers import JSONReader, StringIterableReader
from llama_index.readers.file.base import DEFAULT_FILE_READER_CLS
from llama_index.core.readers import StringIterableReader
from llama_index.core.readers.base import BaseReader
from llama_index.core.readers.json import JSONReader
from llama_index.core.schema import Document
logger = logging.getLogger(__name__)
# Inspired by the `llama_index.core.readers.file.base` module
def _try_loading_included_file_formats() -> dict[str, type[BaseReader]]:
try:
from llama_index.readers.file.docs import ( # type: ignore
DocxReader,
HWPReader,
PDFReader,
)
from llama_index.readers.file.epub import EpubReader # type: ignore
from llama_index.readers.file.image import ImageReader # type: ignore
from llama_index.readers.file.ipynb import IPYNBReader # type: ignore
from llama_index.readers.file.markdown import MarkdownReader # type: ignore
from llama_index.readers.file.mbox import MboxReader # type: ignore
from llama_index.readers.file.slides import PptxReader # type: ignore
from llama_index.readers.file.tabular import PandasCSVReader # type: ignore
from llama_index.readers.file.video_audio import ( # type: ignore
VideoAudioReader,
)
except ImportError as e:
raise ImportError("`llama-index-readers-file` package not found") from e
default_file_reader_cls: dict[str, type[BaseReader]] = {
".hwp": HWPReader,
".pdf": PDFReader,
".docx": DocxReader,
".pptx": PptxReader,
".ppt": PptxReader,
".pptm": PptxReader,
".jpg": ImageReader,
".png": ImageReader,
".jpeg": ImageReader,
".mp3": VideoAudioReader,
".mp4": VideoAudioReader,
".csv": PandasCSVReader,
".epub": EpubReader,
".md": MarkdownReader,
".mbox": MboxReader,
".ipynb": IPYNBReader,
}
return default_file_reader_cls
# Patching the default file reader to support other file types
FILE_READER_CLS = DEFAULT_FILE_READER_CLS.copy()
FILE_READER_CLS = _try_loading_included_file_formats()
FILE_READER_CLS.update(
{
".json": JSONReader,

View File

@@ -7,26 +7,20 @@ import logging
from typing import TYPE_CHECKING, Any
import boto3 # type: ignore
from llama_index.bridge.pydantic import Field
from llama_index.llms import (
from llama_index.core.base.llms.generic_utils import (
completion_response_to_chat_response,
stream_completion_response_to_chat_response,
)
from llama_index.core.bridge.pydantic import Field
from llama_index.core.llms import (
CompletionResponse,
CustomLLM,
LLMMetadata,
)
from llama_index.llms.base import (
from llama_index.core.llms.callbacks import (
llm_chat_callback,
llm_completion_callback,
)
from llama_index.llms.generic_utils import (
completion_response_to_chat_response,
stream_completion_response_to_chat_response,
)
from llama_index.llms.llama_utils import (
completion_to_prompt as generic_completion_to_prompt,
)
from llama_index.llms.llama_utils import (
messages_to_prompt as generic_messages_to_prompt,
)
if TYPE_CHECKING:
from collections.abc import Sequence
@@ -161,8 +155,8 @@ class SagemakerLLM(CustomLLM):
model_kwargs = model_kwargs or {}
model_kwargs.update({"n_ctx": context_window, "verbose": verbose})
messages_to_prompt = messages_to_prompt or generic_messages_to_prompt
completion_to_prompt = completion_to_prompt or generic_completion_to_prompt
messages_to_prompt = messages_to_prompt or {}
completion_to_prompt = completion_to_prompt or {}
generate_kwargs = generate_kwargs or {}
generate_kwargs.update(
@@ -224,7 +218,7 @@ class SagemakerLLM(CustomLLM):
response_body = resp["Body"]
response_str = response_body.read().decode("utf-8")
response_dict = eval(response_str)
response_dict = json.loads(response_str)
return CompletionResponse(
text=response_dict[0]["generated_text"][len(prompt) :], raw=resp
@@ -249,12 +243,19 @@ class SagemakerLLM(CustomLLM):
event_stream = resp["Body"]
start_json = b"{"
stop_token = "<|endoftext|>"
first_token = True
for line in LineIterator(event_stream):
if line != b"" and start_json in line:
data = json.loads(line[line.find(start_json) :].decode("utf-8"))
if data["token"]["text"] != stop_token:
special = data["token"]["special"]
stop = data["token"]["text"] == stop_token
if not special and not stop:
delta = data["token"]["text"]
# trim the leading space for the first token if present
if first_token:
delta = delta.lstrip()
first_token = False
text += delta
yield CompletionResponse(delta=delta, text=text, raw=data)
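
Note on the streaming change above: a self-contained sketch of the token filter, so the rule can be tried without a SageMaker endpoint. The function name `iter_deltas` and the sample lines are hypothetical. The `data:` prefix on each line is an assumption about the event stream, not something the diff shows; the sketch only relies on each payload starting at the first `{`.

```python
import json
from collections.abc import Iterable, Iterator

STOP_TOKEN = "<|endoftext|>"


def iter_deltas(lines: Iterable[bytes]) -> Iterator[tuple[str, str]]:
    """Yield (delta, accumulated_text) for every regular token in the stream."""
    text = ""
    first_token = True
    for line in lines:
        start = line.find(b"{")
        if start == -1:
            continue  # keep-alive or empty line without a JSON payload
        token = json.loads(line[start:].decode("utf-8"))["token"]
        # Skip special tokens and the stop token, as in the diff above.
        if token["special"] or token["text"] == STOP_TOKEN:
            continue
        delta = token["text"]
        if first_token:
            delta = delta.lstrip()  # drop the leading space of the first token
            first_token = False
        text += delta
        yield delta, text


sample = [
    b'data:{"token": {"text": " Hello", "special": false}}',
    b"",
    b'data:{"token": {"text": " world", "special": false}}',
    b'data:{"token": {"text": "<|endoftext|>", "special": true}}',
]
for delta, text in iter_deltas(sample):
    print(repr(delta), repr(text))
```

The same hunk replaces `eval` with `json.loads` for the non-streaming response, which avoids executing whatever text the endpoint returns.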

View File

@@ -1,10 +1,15 @@
import logging
from collections.abc import Callable
from typing import Any
from injector import inject, singleton
from llama_index.llms import MockLLM
from llama_index.llms.base import LLM
from llama_index.core.llms import LLM, MockLLM
from llama_index.core.settings import Settings as LlamaIndexSettings
from llama_index.core.utils import set_global_tokenizer
from transformers import AutoTokenizer # type: ignore
from private_gpt.paths import models_path
from private_gpt.components.llm.prompt_helper import get_prompt_style
from private_gpt.paths import models_cache_path, models_path
from private_gpt.settings.settings import Settings
logger = logging.getLogger(__name__)
@@ -17,48 +22,205 @@ class LLMComponent:
@inject
def __init__(self, settings: Settings) -> None:
llm_mode = settings.llm.mode
if settings.llm.tokenizer and settings.llm.mode != "mock":
# Try to download the tokenizer. If it fails, the LLM will still work
# using the default one, which is less accurate.
try:
set_global_tokenizer(
AutoTokenizer.from_pretrained(
pretrained_model_name_or_path=settings.llm.tokenizer,
cache_dir=str(models_cache_path),
token=settings.huggingface.access_token,
)
)
except Exception as e:
logger.warning(
f"Failed to download tokenizer {settings.llm.tokenizer}: {e!s}. "
f"Please follow the instructions in the documentation to download it if needed: "
f"https://docs.privategpt.dev/installation/getting-started/troubleshooting#tokenizer-setup. "
f"Falling back to default tokenizer."
)
logger.info("Initializing the LLM in mode=%s", llm_mode)
match settings.llm.mode:
case "local":
from llama_index.llms import LlamaCPP
from private_gpt.components.llm.prompt.prompt_helper import (
get_prompt_style,
)
prompt_style = get_prompt_style(
prompt_style=settings.local.prompt_style,
template_name=settings.local.template_name,
default_system_prompt=settings.local.default_system_prompt,
)
case "llamacpp":
try:
from llama_index.llms.llama_cpp import LlamaCPP # type: ignore
except ImportError as e:
raise ImportError(
"Local dependencies not found, install with `poetry install --extras llms-llama-cpp`"
) from e
prompt_style = get_prompt_style(settings.llm.prompt_style)
settings_kwargs = {
"tfs_z": settings.llamacpp.tfs_z, # ollama and llama-cpp
"top_k": settings.llamacpp.top_k, # ollama and llama-cpp
"top_p": settings.llamacpp.top_p, # ollama and llama-cpp
"repeat_penalty": settings.llamacpp.repeat_penalty, # ollama llama-cpp
"n_gpu_layers": -1,
"offload_kqv": True,
}
self.llm = LlamaCPP(
model_path=str(models_path / settings.local.llm_hf_model_file),
temperature=0.1,
model_path=str(models_path / settings.llamacpp.llm_hf_model_file),
temperature=settings.llm.temperature,
max_new_tokens=settings.llm.max_new_tokens,
# llama2 has a context window of 4096 tokens,
# but we set it lower to allow for some wiggle room
context_window=3900,
context_window=settings.llm.context_window,
generate_kwargs={},
callback_manager=LlamaIndexSettings.callback_manager,
# All to GPU
model_kwargs={"n_gpu_layers": -1},
model_kwargs=settings_kwargs,
# transform inputs into Llama2 format
messages_to_prompt=prompt_style.messages_to_prompt,
completion_to_prompt=prompt_style.completion_to_prompt,
verbose=True,
)
# prompt_style.improve_prompt_format(llm=cast(LlamaCPP, self.llm))
case "sagemaker":
from private_gpt.components.llm.custom.sagemaker import SagemakerLLM
try:
from private_gpt.components.llm.custom.sagemaker import SagemakerLLM
except ImportError as e:
raise ImportError(
"Sagemaker dependencies not found, install with `poetry install --extras llms-sagemaker`"
) from e
self.llm = SagemakerLLM(
endpoint_name=settings.sagemaker.llm_endpoint_name,
max_new_tokens=settings.llm.max_new_tokens,
context_window=settings.llm.context_window,
)
case "openai":
from llama_index.llms import OpenAI
try:
from llama_index.llms.openai import OpenAI # type: ignore
except ImportError as e:
raise ImportError(
"OpenAI dependencies not found, install with `poetry install --extras llms-openai`"
) from e
openai_settings = settings.openai.api_key
self.llm = OpenAI(api_key=openai_settings)
openai_settings = settings.openai
self.llm = OpenAI(
api_base=openai_settings.api_base,
api_key=openai_settings.api_key,
model=openai_settings.model,
)
case "openailike":
try:
from llama_index.llms.openai_like import OpenAILike # type: ignore
except ImportError as e:
raise ImportError(
"OpenAILike dependencies not found, install with `poetry install --extras llms-openai-like`"
) from e
prompt_style = get_prompt_style(settings.llm.prompt_style)
openai_settings = settings.openai
self.llm = OpenAILike(
api_base=openai_settings.api_base,
api_key=openai_settings.api_key,
model=openai_settings.model,
is_chat_model=True,
max_tokens=settings.llm.max_new_tokens,
api_version="",
temperature=settings.llm.temperature,
context_window=settings.llm.context_window,
max_new_tokens=settings.llm.max_new_tokens,
messages_to_prompt=prompt_style.messages_to_prompt,
completion_to_prompt=prompt_style.completion_to_prompt,
tokenizer=settings.llm.tokenizer,
timeout=openai_settings.request_timeout,
reuse_client=False,
)
case "ollama":
try:
from llama_index.llms.ollama import Ollama # type: ignore
except ImportError as e:
raise ImportError(
"Ollama dependencies not found, install with `poetry install --extras llms-ollama`"
) from e
ollama_settings = settings.ollama
settings_kwargs = {
"tfs_z": ollama_settings.tfs_z, # ollama and llama-cpp
"num_predict": ollama_settings.num_predict, # ollama only
"top_k": ollama_settings.top_k, # ollama and llama-cpp
"top_p": ollama_settings.top_p, # ollama and llama-cpp
"repeat_last_n": ollama_settings.repeat_last_n, # ollama
"repeat_penalty": ollama_settings.repeat_penalty, # ollama llama-cpp
}
# Resolve the LLM model name. If no tag is provided, `latest` is used
model_name = (
ollama_settings.llm_model + ":latest"
if ":" not in ollama_settings.llm_model
else ollama_settings.llm_model
)
llm = Ollama(
model=model_name,
base_url=ollama_settings.api_base,
temperature=settings.llm.temperature,
context_window=settings.llm.context_window,
additional_kwargs=settings_kwargs,
request_timeout=ollama_settings.request_timeout,
)
if ollama_settings.autopull_models:
from private_gpt.utils.ollama import check_connection, pull_model
if not check_connection(llm.client):
raise ValueError(
f"Failed to connect to Ollama, "
f"check if Ollama server is running on {ollama_settings.api_base}"
)
pull_model(llm.client, model_name)
if (
ollama_settings.keep_alive
!= ollama_settings.model_fields["keep_alive"].default
):
# Modify Ollama methods to use the "keep_alive" field.
def add_keep_alive(func: Callable[..., Any]) -> Callable[..., Any]:
def wrapper(*args: Any, **kwargs: Any) -> Any:
kwargs["keep_alive"] = ollama_settings.keep_alive
return func(*args, **kwargs)
return wrapper
Ollama.chat = add_keep_alive(Ollama.chat)
Ollama.stream_chat = add_keep_alive(Ollama.stream_chat)
Ollama.complete = add_keep_alive(Ollama.complete)
Ollama.stream_complete = add_keep_alive(Ollama.stream_complete)
self.llm = llm
case "azopenai":
try:
from llama_index.llms.azure_openai import ( # type: ignore
AzureOpenAI,
)
except ImportError as e:
raise ImportError(
"Azure OpenAI dependencies not found, install with `poetry install --extras llms-azopenai`"
) from e
azopenai_settings = settings.azopenai
self.llm = AzureOpenAI(
model=azopenai_settings.llm_model,
deployment_name=azopenai_settings.llm_deployment_name,
api_key=azopenai_settings.api_key,
azure_endpoint=azopenai_settings.azure_endpoint,
api_version=azopenai_settings.api_version,
)
case "gemini":
try:
from llama_index.llms.gemini import ( # type: ignore
Gemini,
)
except ImportError as e:
raise ImportError(
"Google Gemini dependencies not found, install with `poetry install --extras llms-gemini`"
) from e
gemini_settings = settings.gemini
self.llm = Gemini(
model_name=gemini_settings.model, api_key=gemini_settings.api_key
)
case "mock":
self.llm = MockLLM()
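
Note on the `keep_alive` patch in the `ollama` branch above: a minimal sketch of the same wrapping technique on a stand-in class. `FakeClient` is hypothetical and is not the llama-index `Ollama` class. The wrapper here also takes the value as a parameter and uses `functools.wraps`, both of which are choices of this sketch rather than of the diff.

```python
from collections.abc import Callable
from functools import wraps
from typing import Any


class FakeClient:
    """Hypothetical stand-in for an LLM client."""

    def chat(self, message: str, **kwargs: Any) -> dict[str, Any]:
        return {"message": message, **kwargs}


def add_keep_alive(func: Callable[..., Any], keep_alive: str) -> Callable[..., Any]:
    """Wrap `func` so every call carries a `keep_alive` keyword argument."""

    @wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        kwargs["keep_alive"] = keep_alive  # overrides any caller-supplied value
        return func(*args, **kwargs)

    return wrapper


# Patching the class attribute affects every instance, existing or future.
FakeClient.chat = add_keep_alive(FakeClient.chat, "5m")
result = FakeClient().chat("hi")
print(result)
```

Because the patch is applied to the class rather than to one instance, it is process-wide; that matches the diff, which assigns to `Ollama.chat` and its siblings directly.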

View File

@@ -1,446 +0,0 @@
# Ignoring the mypy check in this file, given that this file is imported only if
# running in local mode (and therefore the llama-cpp-python library is installed).
# type: ignore
"""Helper to get your llama_index messages correctly serialized into a prompt.
This set of classes and functions is used to format a series of
llama_index ChatMessage into a prompt (a unique string) that will be passed
as is to the LLM. The LLM will then use this prompt to generate a completion.
There are **MANY** formats for prompts; usually, each model has its own format.
Models posted on HuggingFace usually have a description of the format they use.
The original models, that are shipped through `transformers`, have their
format defined in the file `tokenizer_config.json` in the model's directory.
The prompt format are usually defined as a Jinja template (with some custom
Jinja token definitions). These prompt templates are usable using
the `transformers.AutoTokenizer`, as described in
https://huggingface.co/docs/transformers/main/chat_templating
Examples of `tokenizer_config.json` files:
https://huggingface.co/bofenghuang/vigogne-2-7b-chat/blob/main/tokenizer_config.json
https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1/blob/main/tokenizer_config.json
https://huggingface.co/HuggingFaceH4/zephyr-7b-beta/blob/main/tokenizer_config.json
The format of the prompt is important: if the wrong one is used, it
will lead to "hallucinations" and other completions that are not relevant.
"""
import abc
import logging
from collections.abc import Sequence
from pathlib import Path
from typing import Any
from jinja2 import FileSystemLoader
from jinja2.exceptions import TemplateError
from jinja2.sandbox import ImmutableSandboxedEnvironment
from llama_cpp import llama_chat_format, llama_types
from llama_index.llms import ChatMessage, MessageRole
from llama_index.llms.llama_utils import (
DEFAULT_SYSTEM_PROMPT,
completion_to_prompt,
messages_to_prompt,
)
from private_gpt.constants import PROJECT_ROOT_PATH
logger = logging.getLogger(__name__)
THIS_DIRECTORY_RELATIVE = Path(__file__).parent.relative_to(PROJECT_ROOT_PATH)
_LLAMA_CPP_PYTHON_CHAT_FORMAT: dict[str, llama_chat_format.ChatFormatter] = {
"llama-2": llama_chat_format.format_llama2,
"alpaca": llama_chat_format.format_alpaca,
"vicuna": llama_chat_format.format,
"oasst_llama": llama_chat_format.format_oasst_llama,
"baichuan-2": llama_chat_format.format_baichuan2,
"baichuan": llama_chat_format.format_baichuan,
"openbuddy": llama_chat_format.format_openbuddy,
"redpajama-incite": llama_chat_format.format_redpajama_incite,
"snoozy": llama_chat_format.format_snoozy,
"phind": llama_chat_format.format_phind,
"intel": llama_chat_format.format_intel,
"open-orca": llama_chat_format.format_open_orca,
"mistrallite": llama_chat_format.format_mistrallite,
"zephyr": llama_chat_format.format_zephyr,
"chatml": llama_chat_format.format_chatml,
"openchat": llama_chat_format.format_openchat,
}
# FIXME partial support
def llama_index_to_llama_cpp_messages(
messages: Sequence[ChatMessage],
) -> list[llama_types.ChatCompletionRequestMessage]:
"""Convert messages from llama_index to llama_cpp format.
Convert a list of llama_index ChatMessage to a
list of llama_cpp ChatCompletionRequestMessage.
"""
llama_cpp_messages: list[llama_types.ChatCompletionRequestMessage] = []
l_msg: llama_types.ChatCompletionRequestMessage
for msg in messages:
if msg.role == MessageRole.SYSTEM:
l_msg = llama_types.ChatCompletionRequestSystemMessage(
content=msg.content, role=msg.role.value
)
elif msg.role == MessageRole.USER:
# FIXME partial support
l_msg = llama_types.ChatCompletionRequestUserMessage(
content=msg.content, role=msg.role.value
)
elif msg.role == MessageRole.ASSISTANT:
# FIXME partial support
l_msg = llama_types.ChatCompletionRequestAssistantMessage(
content=msg.content, role=msg.role.value
)
elif msg.role == MessageRole.TOOL:
# FIXME partial support
l_msg = llama_types.ChatCompletionRequestToolMessage(
content=msg.content, role=msg.role.value, tool_call_id=""
)
elif msg.role == MessageRole.FUNCTION:
# FIXME partial support
l_msg = llama_types.ChatCompletionRequestFunctionMessage(
content=msg.content, role=msg.role.value, name=""
)
else:
raise ValueError(f"Unknown role='{msg.role}'")
llama_cpp_messages.append(l_msg)
return llama_cpp_messages
def _get_llama_cpp_chat_format(name: str) -> llama_chat_format.ChatFormatter:
logger.debug("Getting llama_cpp_python prompt_format='%s'", name)
try:
return _LLAMA_CPP_PYTHON_CHAT_FORMAT[name]
except KeyError as err:
raise ValueError(f"Unknown llama_cpp_python prompt style '{name}'") from err
class AbstractPromptStyle(abc.ABC):
"""Abstract class for prompt styles.
This class is used to format a series of messages into a prompt that can be
understood by the models. A series of messages represents the interaction(s)
between a user and an assistant. This series of messages can be considered as a
session between a user X and an assistant Y. This session holds, through the
messages, the state of the conversation. This session, to be understood by the
model, needs to be formatted into a prompt (i.e. a string that the models
can understand). Prompts can be formatted in different ways,
depending on the model.
The implementations of this class represent the different ways to format a
series of messages into a prompt.
"""
@abc.abstractmethod
def __init__(self, *args: Any, **kwargs: Any) -> None:
logger.debug("Initializing prompt_style=%s", self.__class__.__name__)
self.bos_token = "<s>"
self.eos_token = "</s>"
self.nl_token = "\n"
@abc.abstractmethod
def _messages_to_prompt(self, messages: Sequence[ChatMessage]) -> str:
pass
@abc.abstractmethod
def _completion_to_prompt(self, completion: str) -> str:
pass
def messages_to_prompt(self, messages: Sequence[ChatMessage]) -> str:
logger.debug("Formatting messages='%s' to prompt", messages)
prompt = self._messages_to_prompt(messages)
logger.debug("Got for messages='%s' the prompt='%s'", messages, prompt)
return prompt
def completion_to_prompt(self, completion: str) -> str:
logger.debug("Formatting completion='%s' to prompt", completion)
prompt = self._completion_to_prompt(completion)
logger.debug("Got for completion='%s' the prompt='%s'", completion, prompt)
return prompt
# def improve_prompt_format(self, llm: LlamaCPP) -> None:
# """Improve the prompt format of the given LLM.
#
# Use the given metadata in the LLM to improve the prompt format.
# """
# # FIXME: we are getting IDs (1,2,13) from llama.cpp, and not actual strings
# llama_cpp_llm = cast(Llama, llm._model)
# self.bos_token = llama_cpp_llm.token_bos()
# self.eos_token = llama_cpp_llm.token_eos()
# self.nl_token = llama_cpp_llm.token_nl()
# print([self.bos_token, self.eos_token, self.nl_token])
# # (1,2,13) are the IDs of the tokens
class AbstractPromptStyleWithSystemPrompt(AbstractPromptStyle, abc.ABC):
_DEFAULT_SYSTEM_PROMPT = DEFAULT_SYSTEM_PROMPT
def __init__(
self, default_system_prompt: str | None, *args: Any, **kwargs: Any
) -> None:
super().__init__(*args, **kwargs)
logger.debug("Got default_system_prompt='%s'", default_system_prompt)
self.default_system_prompt = default_system_prompt
def _add_missing_system_prompt(
self, messages: Sequence[ChatMessage]
) -> Sequence[ChatMessage]:
if messages[0].role != MessageRole.SYSTEM:
logger.debug(
"Adding system_prompt='%s' to the given messages as there are none given in the session",
self.default_system_prompt,
)
messages = [
ChatMessage(
content=self.default_system_prompt, role=MessageRole.SYSTEM
),
*messages,
]
return messages
class DefaultPromptStyle(AbstractPromptStyle):
"""Default prompt style that uses the defaults from llama_utils.
It basically passes None to the LLM, indicating it should use
the default functions.
"""
def __init__(self, *args: Any, **kwargs: Any) -> None:
super().__init__(*args, **kwargs)
# Hacky way to override the functions
# Override the functions to be None, and pass None to the LLM.
self.messages_to_prompt = None # type: ignore[method-assign, assignment]
self.completion_to_prompt = None # type: ignore[method-assign, assignment]
def _messages_to_prompt(self, messages: Sequence[ChatMessage]) -> str:
"""Dummy implementation."""
return ""
def _completion_to_prompt(self, completion: str) -> str:
"""Dummy implementation."""
return ""
class LlamaIndexPromptStyle(AbstractPromptStyleWithSystemPrompt):
"""Simple prompt style that just uses the default llama_utils functions.
It transforms the sequence of messages into a prompt that should look like:
```text
<s> [INST] <<SYS>> your system prompt here. <</SYS>>
user message here [/INST] assistant (model) response here </s>
```
"""
def __init__(
self, default_system_prompt: str | None = None, *args: Any, **kwargs: Any
) -> None:
# If no system prompt is given, the default one of the implementation is used.
# default_system_prompt can be None here
kwargs["default_system_prompt"] = default_system_prompt
super().__init__(*args, **kwargs)
def _messages_to_prompt(self, messages: Sequence[ChatMessage]) -> str:
return messages_to_prompt(messages, self.default_system_prompt)
def _completion_to_prompt(self, completion: str) -> str:
return completion_to_prompt(completion, self.default_system_prompt)
class VigognePromptStyle(AbstractPromptStyleWithSystemPrompt):
"""Tag prompt style (used by Vigogne) that uses the prompt style `<|ROLE|>`.
It transforms the sequence of messages into a prompt that should look like:
```text
<|system|>: your system prompt here.
<|user|>: user message here
(possibly with context and question)
<|assistant|>: assistant (model) response here.
```
FIXME: should we add surrounding `<s>` and `</s>` tags, like in llama2?
"""
def __init__(
self,
default_system_prompt: str | None = None,
add_generation_prompt: bool = True,
*args: Any,
**kwargs: Any,
) -> None:
# We have to define a default system prompt here as the LLM will not
# use the default llama_utils functions.
default_system_prompt = default_system_prompt or self._DEFAULT_SYSTEM_PROMPT
kwargs["default_system_prompt"] = default_system_prompt
super().__init__(*args, **kwargs)
self.add_generation_prompt = add_generation_prompt
def _messages_to_prompt(self, messages: Sequence[ChatMessage]) -> str:
messages = self._add_missing_system_prompt(messages)
return self._format_messages_to_prompt(messages)
def _completion_to_prompt(self, completion: str) -> str:
messages = [ChatMessage(content=completion, role=MessageRole.USER)]
return self._format_messages_to_prompt(messages)
def _format_messages_to_prompt(self, messages: Sequence[ChatMessage]) -> str:
# TODO add BOS and EOS TOKEN !!!!! (c.f. jinja template)
"""Format message to prompt with `<|ROLE|>: MSG` style."""
assert messages[0].role == MessageRole.SYSTEM
prompt = ""
# TODO enclose the interaction between self.token_bos and self.token_eos
for message in messages:
role = message.role
content = message.content or ""
message_from_user = f"<|{role.lower()}|>: {content.strip()}"
message_from_user += self.nl_token
prompt += message_from_user
if self.add_generation_prompt:
# we are missing the last <|assistant|> tag that will trigger a completion
prompt += "<|assistant|>: "
return prompt
class LlamaCppPromptStyle(AbstractPromptStyleWithSystemPrompt):
def __init__(
self,
prompt_style: str,
default_system_prompt: str | None = None,
*args: Any,
**kwargs: Any,
) -> None:
"""Wrapper for llama_cpp_python defined prompt format.
:param prompt_style:
:param default_system_prompt: Used if no system prompt is given in the messages.
"""
assert prompt_style.startswith("llama_cpp.")
default_system_prompt = default_system_prompt or self._DEFAULT_SYSTEM_PROMPT
kwargs["default_system_prompt"] = default_system_prompt
super().__init__(*args, **kwargs)
self.prompt_style = prompt_style[len("llama_cpp.") :]
if self.prompt_style is None:
return
self._llama_cpp_formatter = _get_llama_cpp_chat_format(self.prompt_style)
def _messages_to_prompt(self, messages: Sequence[ChatMessage]) -> str:
messages = self._add_missing_system_prompt(messages)
return self._llama_cpp_formatter(
messages=llama_index_to_llama_cpp_messages(messages)
).prompt
def _completion_to_prompt(self, completion: str) -> str:
messages = self._add_missing_system_prompt(
[ChatMessage(content=completion, role=MessageRole.USER)]
)
return self._llama_cpp_formatter(
messages=llama_index_to_llama_cpp_messages(messages)
).prompt
class TemplatePromptStyle(AbstractPromptStyleWithSystemPrompt):
def __init__(
self,
template_name: str,
template_dir: str | None = None,
add_generation_prompt: bool = True,
default_system_prompt: str | None = None,
*args: Any,
**kwargs: Any,
) -> None:
"""Prompt format using a Jinja template.
:param template_name: the filename of the template to use, must be in
the `./template/` directory.
:param template_dir: the directory where the template is located.
Defaults to `./template/`.
:param default_system_prompt: Used if no system prompt is
given in the messages.
"""
default_system_prompt = default_system_prompt or DEFAULT_SYSTEM_PROMPT
kwargs["default_system_prompt"] = default_system_prompt
super().__init__(*args, **kwargs)
self._add_generation_prompt = add_generation_prompt
def raise_exception(message: str) -> None:
raise TemplateError(message)
if template_dir is None:
self.template_dir = THIS_DIRECTORY_RELATIVE / "template"
else:
self.template_dir = Path(template_dir)
self._jinja_fs_loader = FileSystemLoader(searchpath=self.template_dir)
self._jinja_env = ImmutableSandboxedEnvironment(
loader=self._jinja_fs_loader, trim_blocks=True, lstrip_blocks=True
)
self._jinja_env.globals["raise_exception"] = raise_exception
self.template = self._jinja_env.get_template(template_name)
@property
def _extra_kwargs_render(self) -> dict[str, Any]:
return {
"eos_token": self.eos_token,
"bos_token": self.bos_token,
"nl_token": self.nl_token,
}
@staticmethod
def _j_raise_exception(x: str) -> None:
"""Helper method to let Jinja template raise exceptions."""
raise RuntimeError(x)
def _messages_to_prompt(self, messages: Sequence[ChatMessage]) -> str:
messages = self._add_missing_system_prompt(messages)
msgs = [{"role": msg.role.value, "content": msg.content} for msg in messages]
return self.template.render(
messages=msgs,
add_generation_prompt=self._add_generation_prompt,
**self._extra_kwargs_render,
)
def _completion_to_prompt(self, completion: str) -> str:
messages = self._add_missing_system_prompt(
[
ChatMessage(content=completion, role=MessageRole.USER),
]
)
return self._messages_to_prompt(messages)
# TODO Maybe implement an auto-prompt style?
# Pass all the arguments at once
def get_prompt_style(
prompt_style: str | None,
**kwargs: Any,
) -> AbstractPromptStyle:
"""Get the prompt style to use from the given string.
:param prompt_style: The prompt style to use.
:return: The prompt style to use.
"""
if prompt_style is None:
return DefaultPromptStyle(**kwargs)
if prompt_style.startswith("llama_cpp."):
return LlamaCppPromptStyle(prompt_style, **kwargs)
elif prompt_style == "llama2":
return LlamaIndexPromptStyle(**kwargs)
elif prompt_style == "vigogne":
return VigognePromptStyle(**kwargs)
elif prompt_style == "template":
return TemplatePromptStyle(**kwargs)
raise ValueError(f"Unknown prompt_style='{prompt_style}'")


@@ -1,2 +0,0 @@
{# This template is coming from: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1/blob/main/tokenizer_config.json #}
{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token + ' ' }}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}


@@ -1,2 +0,0 @@
{# This template is coming from: https://huggingface.co/bofenghuang/vigogne-2-7b-chat/blob/main/tokenizer_config.json #}
{{ bos_token }}{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% elif true == true %}{% set loop_messages = messages %}{% set system_message = 'Vous êtes Vigogne, un assistant IA créé par Zaion Lab. Vous suivez extrêmement bien les instructions. Aidez autant que vous le pouvez.' %}{% else %}{% set loop_messages = messages %}{% set system_message = false %}{% endif %}{% if system_message != false %}{{ '<|system|>: ' + system_message + '\n' }}{% endif %}{% for message in loop_messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '<|user|>: ' + message['content'].strip() + '\n' }}{% elif message['role'] == 'assistant' %}{{ '<|assistant|>: ' + message['content'].strip() + eos_token + '\n' }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|assistant|>:' }}{% endif %}


@@ -1,2 +0,0 @@
{# This template is coming from: https://huggingface.co/HuggingFaceH4/zephyr-7b-beta/blob/main/tokenizer_config.json #}
{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'system' %}\n{{ '<|system|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n' + message['content'] + eos_token }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}


@@ -0,0 +1,308 @@
import abc
import logging
from collections.abc import Sequence
from typing import Any, Literal
from llama_index.core.llms import ChatMessage, MessageRole
logger = logging.getLogger(__name__)
class AbstractPromptStyle(abc.ABC):
"""Abstract class for prompt styles.
This class is used to format a series of messages into a prompt that can be
understood by the models. A series of messages represents the interaction(s)
between a user and an assistant. This series of messages can be considered as a
session between a user X and an assistant Y. This session holds, through the
messages, the state of the conversation. This session, to be understood by the
model, needs to be formatted into a prompt (i.e. a string that the models
can understand). Prompts can be formatted in different ways,
depending on the model.
The implementations of this class represent the different ways to format a
series of messages into a prompt.
"""
def __init__(self, *args: Any, **kwargs: Any) -> None:
logger.debug("Initializing prompt_style=%s", self.__class__.__name__)
@abc.abstractmethod
def _messages_to_prompt(self, messages: Sequence[ChatMessage]) -> str:
pass
@abc.abstractmethod
def _completion_to_prompt(self, completion: str) -> str:
pass
def messages_to_prompt(self, messages: Sequence[ChatMessage]) -> str:
prompt = self._messages_to_prompt(messages)
logger.debug("Got for messages='%s' the prompt='%s'", messages, prompt)
return prompt
def completion_to_prompt(self, completion: str) -> str:
prompt = self._completion_to_prompt(completion)
logger.debug("Got for completion='%s' the prompt='%s'", completion, prompt)
return prompt
class DefaultPromptStyle(AbstractPromptStyle):
"""Default prompt style that uses the defaults from llama_utils.
It basically passes None to the LLM, indicating it should use
the default functions.
"""
def __init__(self, *args: Any, **kwargs: Any) -> None:
super().__init__(*args, **kwargs)
# Hacky way to override the functions
# Override the functions to be None, and pass None to the LLM.
self.messages_to_prompt = None # type: ignore[method-assign, assignment]
self.completion_to_prompt = None # type: ignore[method-assign, assignment]
def _messages_to_prompt(self, messages: Sequence[ChatMessage]) -> str:
return ""
def _completion_to_prompt(self, completion: str) -> str:
return ""
class Llama2PromptStyle(AbstractPromptStyle):
"""Simple prompt style that uses llama 2 prompt style.
Inspired by llama_index/legacy/llms/llama_utils.py
It transforms the sequence of messages into a prompt that should look like:
```text
<s> [INST] <<SYS>> your system prompt here. <</SYS>>
user message here [/INST] assistant (model) response here </s>
```
"""
BOS, EOS = "<s>", "</s>"
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
DEFAULT_SYSTEM_PROMPT = """\
You are a helpful, respectful and honest assistant. \
Always answer as helpfully as possible and follow ALL given instructions. \
Do not speculate or make up information. \
Do not reference any given instructions or context. \
"""
def _messages_to_prompt(self, messages: Sequence[ChatMessage]) -> str:
string_messages: list[str] = []
if messages[0].role == MessageRole.SYSTEM:
# pull out the system message (if it exists in messages)
system_message_str = messages[0].content or ""
messages = messages[1:]
else:
system_message_str = self.DEFAULT_SYSTEM_PROMPT
system_message_str = f"{self.B_SYS} {system_message_str.strip()} {self.E_SYS}"
for i in range(0, len(messages), 2):
# first message should always be a user
user_message = messages[i]
assert user_message.role == MessageRole.USER
if i == 0:
# make sure system prompt is included at the start
str_message = f"{self.BOS} {self.B_INST} {system_message_str} "
else:
# end previous user-assistant interaction
string_messages[-1] += f" {self.EOS}"
# no need to include system prompt
str_message = f"{self.BOS} {self.B_INST} "
# include user message content
str_message += f"{user_message.content} {self.E_INST}"
if len(messages) > (i + 1):
# if assistant message exists, add to str_message
assistant_message = messages[i + 1]
assert assistant_message.role == MessageRole.ASSISTANT
str_message += f" {assistant_message.content}"
string_messages.append(str_message)
return "".join(string_messages)
def _completion_to_prompt(self, completion: str) -> str:
system_prompt_str = self.DEFAULT_SYSTEM_PROMPT
return (
f"{self.BOS} {self.B_INST} {self.B_SYS} {system_prompt_str.strip()} {self.E_SYS} "
f"{completion.strip()} {self.E_INST}"
)
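
To make the Llama 2 layout above easier to follow, here is a minimal standalone sketch of the same turn-pairing logic. It is an illustration only: it uses plain `(role, content)` tuples instead of llama_index `ChatMessage` objects, and the function name `llama2_prompt` and the short default system prompt are assumptions, not part of the diff.

```python
# Hypothetical standalone sketch; mirrors the pairing logic of Llama2PromptStyle
# but works on (role, content) tuples so it runs without llama_index.
BOS, EOS = "<s>", "</s>"
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"


def llama2_prompt(messages, default_system="You are a helpful assistant."):
    # Pull out a leading system message, otherwise fall back to the default.
    if messages and messages[0][0] == "system":
        system, turns = messages[0][1], messages[1:]
    else:
        system, turns = default_system, messages
    system_block = f"{B_SYS} {system.strip()} {E_SYS}"

    parts = []
    # Walk the conversation two messages at a time: user, then optional assistant.
    for i in range(0, len(turns), 2):
        role, content = turns[i]
        if role != "user":
            raise ValueError("each pair must start with a user message")
        if i == 0:
            # Only the first instruction block carries the system prompt.
            part = f"{BOS} {B_INST} {system_block} "
        else:
            # Close the previous user/assistant exchange before opening a new one.
            parts[-1] += f" {EOS}"
            part = f"{BOS} {B_INST} "
        part += f"{content} {E_INST}"
        if i + 1 < len(turns):
            next_role, next_content = turns[i + 1]
            if next_role != "assistant":
                raise ValueError("a user message must be followed by an assistant")
            part += f" {next_content}"
        parts.append(part)
    return "".join(parts)


example = llama2_prompt(
    [("system", "Be brief."), ("user", "Hi"), ("assistant", "Hello"), ("user", "Bye")]
)
```

The final user turn is left open after `[/INST]`, which is where the model is expected to continue.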
class Llama3PromptStyle(AbstractPromptStyle):
r"""Template for Meta's Llama 3.1.
The format follows this structure:
<|begin_of_text|>
<|start_header_id|>system<|end_header_id|>
[System message content]<|eot_id|>
<|start_header_id|>user<|end_header_id|>
[User message content]<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
[Assistant message content]<|eot_id|>
...
(Repeat for each message, including possible 'ipython' role)
"""
BOS, EOS = "<|begin_of_text|>", "<|end_of_text|>"
B_INST, E_INST = "<|start_header_id|>", "<|end_header_id|>"
EOT = "<|eot_id|>"
B_SYS, E_SYS = "<|start_header_id|>system<|end_header_id|>", "<|eot_id|>"
ASSISTANT_INST = "<|start_header_id|>assistant<|end_header_id|>"
DEFAULT_SYSTEM_PROMPT = """\
You are a helpful, respectful and honest assistant. \
Always answer as helpfully as possible and follow ALL given instructions. \
Do not speculate or make up information. \
Do not reference any given instructions or context. \
"""
def _messages_to_prompt(self, messages: Sequence[ChatMessage]) -> str:
prompt = ""
has_system_message = False
for i, message in enumerate(messages):
if not message or message.content is None:
continue
if message.role == MessageRole.SYSTEM:
prompt += f"{self.B_SYS}\n\n{message.content.strip()}{self.E_SYS}"
has_system_message = True
else:
role_header = f"{self.B_INST}{message.role.value}{self.E_INST}"
prompt += f"{role_header}\n\n{message.content.strip()}{self.EOT}"
# Add assistant header if the last message is not from the assistant
if i == len(messages) - 1 and message.role != MessageRole.ASSISTANT:
prompt += f"{self.ASSISTANT_INST}\n\n"
# Add default system prompt if no system message was provided
if not has_system_message:
prompt = (
f"{self.B_SYS}\n\n{self.DEFAULT_SYSTEM_PROMPT}{self.E_SYS}" + prompt
)
# TODO: Implement tool handling logic
return prompt
def _completion_to_prompt(self, completion: str) -> str:
return (
f"{self.B_SYS}\n\n{self.DEFAULT_SYSTEM_PROMPT}{self.E_SYS}"
f"{self.B_INST}user{self.E_INST}\n\n{completion.strip()}{self.EOT}"
f"{self.ASSISTANT_INST}\n\n"
)
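
The header-based Llama 3 layout described in the docstring can be sketched without llama_index as follows. This is a hedged illustration, not the class itself: messages are `(role, content)` tuples, `llama3_prompt` is a hypothetical name, and the system branch is folded into the generic header branch because both produce the same text in the class above.

```python
# Hypothetical standalone sketch of the Llama 3 header layout.
B_HDR, E_HDR = "<|start_header_id|>", "<|end_header_id|>"
EOT = "<|eot_id|>"


def llama3_prompt(messages, default_system="You are a helpful assistant."):
    prompt = ""
    has_system = False
    for index, (role, content) in enumerate(messages):
        if content is None:
            continue  # messages without content are skipped
        if role == "system":
            has_system = True
        # Every message is wrapped as: header, blank line, content, end-of-turn.
        prompt += f"{B_HDR}{role}{E_HDR}\n\n{content.strip()}{EOT}"
        # Open an assistant header when the conversation ends on a non-assistant turn.
        if index == len(messages) - 1 and role != "assistant":
            prompt += f"{B_HDR}assistant{E_HDR}\n\n"
    if not has_system:
        # Prepend a default system block when none was supplied.
        prompt = f"{B_HDR}system{E_HDR}\n\n{default_system}{EOT}" + prompt
    return prompt


example = llama3_prompt([("user", "Hi")])
```

As in the class above, this sketch does not emit `<|begin_of_text|>`; whether the runtime adds it is outside what the diff shows.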
class TagPromptStyle(AbstractPromptStyle):
"""Tag prompt style (used by Vigogne) that uses the prompt style `<|ROLE|>`.
It transforms the sequence of messages into a prompt that should look like:
```text
<|system|>: your system prompt here.
<|user|>: user message here
(possibly with context and question)
<|assistant|>: assistant (model) response here.
```
FIXME: should we add surrounding `<s>` and `</s>` tags, like in llama2?
"""
def _messages_to_prompt(self, messages: Sequence[ChatMessage]) -> str:
"""Format message to prompt with `<|ROLE|>: MSG` style."""
prompt = ""
for message in messages:
role = message.role
content = message.content or ""
message_from_user = f"<|{role.lower()}|>: {content.strip()}"
message_from_user += "\n"
prompt += message_from_user
# we are missing the last <|assistant|> tag that will trigger a completion
prompt += "<|assistant|>: "
return prompt
def _completion_to_prompt(self, completion: str) -> str:
return self._messages_to_prompt(
[ChatMessage(content=completion, role=MessageRole.USER)]
)
class MistralPromptStyle(AbstractPromptStyle):
def _messages_to_prompt(self, messages: Sequence[ChatMessage]) -> str:
inst_buffer = []
text = ""
for message in messages:
if message.role == MessageRole.SYSTEM or message.role == MessageRole.USER:
inst_buffer.append(str(message.content).strip())
elif message.role == MessageRole.ASSISTANT:
text += "<s>[INST] " + "\n".join(inst_buffer) + " [/INST]"
text += " " + str(message.content).strip() + "</s>"
inst_buffer.clear()
else:
raise ValueError(f"Unknown message role {message.role}")
if len(inst_buffer) > 0:
text += "<s>[INST] " + "\n".join(inst_buffer) + " [/INST]"
return text
def _completion_to_prompt(self, completion: str) -> str:
return self._messages_to_prompt(
[ChatMessage(content=completion, role=MessageRole.USER)]
)
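
The buffering behaviour of the Mistral style (system and user messages accumulate until an assistant reply flushes them into one `[INST]` block) may be easier to see in isolation. The sketch below assumes plain `(role, content)` tuples and a hypothetical function name `mistral_prompt`; it is not the class from the diff.

```python
# Hypothetical standalone sketch of the Mistral instruction buffering.
def mistral_prompt(messages):
    pending = []  # system/user texts waiting to be emitted in one [INST] block
    text = ""
    for role, content in messages:
        if role in ("system", "user"):
            pending.append(content.strip())
        elif role == "assistant":
            # An assistant reply closes the pending instruction block.
            text += "<s>[INST] " + "\n".join(pending) + " [/INST]"
            text += " " + content.strip() + "</s>"
            pending.clear()
        else:
            raise ValueError(f"Unknown message role {role}")
    if pending:
        # Trailing instructions stay open so the model can answer them.
        text += "<s>[INST] " + "\n".join(pending) + " [/INST]"
    return text


example = mistral_prompt(
    [("system", "Be brief."), ("user", "Hi"), ("assistant", "Hello"), ("user", "Bye")]
)
```

Note that the system prompt is not given its own tag here; it is simply joined with the first user message by a newline.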
class ChatMLPromptStyle(AbstractPromptStyle):
def _messages_to_prompt(self, messages: Sequence[ChatMessage]) -> str:
prompt = "<|im_start|>system\n"
for message in messages:
role = message.role
content = message.content or ""
if role.lower() == "system":
message_from_user = f"{content.strip()}"
prompt += message_from_user
elif role.lower() == "user":
prompt += "<|im_end|>\n<|im_start|>user\n"
message_from_user = f"{content.strip()}<|im_end|>\n"
prompt += message_from_user
prompt += "<|im_start|>assistant\n"
return prompt
def _completion_to_prompt(self, completion: str) -> str:
return self._messages_to_prompt(
[ChatMessage(content=completion, role=MessageRole.USER)]
)
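
For reference, the ChatML layout produced by the class above can be reproduced with a small standalone sketch. It assumes `(role, content)` tuples and a hypothetical function name `chatml_prompt`. As written in the diff, only system and user messages are emitted; assistant messages in the history have no branch and are skipped, and the sketch keeps that behaviour.

```python
# Hypothetical standalone sketch of the ChatML layout.
def chatml_prompt(messages):
    # The prompt always opens with a system block, even if it ends up empty.
    prompt = "<|im_start|>system\n"
    for role, content in messages:
        text = (content or "").strip()
        if role == "system":
            prompt += text
        elif role == "user":
            # Close the previous block, then emit the user block.
            prompt += "<|im_end|>\n<|im_start|>user\n"
            prompt += f"{text}<|im_end|>\n"
        # Other roles (including assistant) are ignored, as in the class above.
    # Leave an open assistant block for the model to complete.
    prompt += "<|im_start|>assistant\n"
    return prompt


example = chatml_prompt([("system", "Be brief."), ("user", "Hi")])
```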
def get_prompt_style(
prompt_style: Literal["default", "llama2", "llama3", "tag", "mistral", "chatml"]
| None
) -> AbstractPromptStyle:
"""Get the prompt style to use from the given string.
:param prompt_style: The prompt style to use.
:return: The prompt style to use.
"""
if prompt_style is None or prompt_style == "default":
return DefaultPromptStyle()
elif prompt_style == "llama2":
return Llama2PromptStyle()
elif prompt_style == "llama3":
return Llama3PromptStyle()
elif prompt_style == "tag":
return TagPromptStyle()
elif prompt_style == "mistral":
return MistralPromptStyle()
elif prompt_style == "chatml":
return ChatMLPromptStyle()
raise ValueError(f"Unknown prompt_style='{prompt_style}'")


@@ -1,11 +1,12 @@
import logging
from injector import inject, singleton
from llama_index.storage.docstore import BaseDocumentStore, SimpleDocumentStore
from llama_index.storage.index_store import SimpleIndexStore
from llama_index.storage.index_store.types import BaseIndexStore
from llama_index.core.storage.docstore import BaseDocumentStore, SimpleDocumentStore
from llama_index.core.storage.index_store import SimpleIndexStore
from llama_index.core.storage.index_store.types import BaseIndexStore
from private_gpt.paths import local_data_path
from private_gpt.settings.settings import Settings
logger = logging.getLogger(__name__)
@@ -16,19 +17,51 @@ class NodeStoreComponent:
doc_store: BaseDocumentStore
@inject
def __init__(self) -> None:
try:
self.index_store = SimpleIndexStore.from_persist_dir(
persist_dir=str(local_data_path)
)
except FileNotFoundError:
logger.debug("Local index store not found, creating a new one")
self.index_store = SimpleIndexStore()
def __init__(self, settings: Settings) -> None:
match settings.nodestore.database:
case "simple":
try:
self.index_store = SimpleIndexStore.from_persist_dir(
persist_dir=str(local_data_path)
)
except FileNotFoundError:
logger.debug("Local index store not found, creating a new one")
self.index_store = SimpleIndexStore()
try:
self.doc_store = SimpleDocumentStore.from_persist_dir(
persist_dir=str(local_data_path)
)
except FileNotFoundError:
logger.debug("Local document store not found, creating a new one")
self.doc_store = SimpleDocumentStore()
try:
self.doc_store = SimpleDocumentStore.from_persist_dir(
persist_dir=str(local_data_path)
)
except FileNotFoundError:
logger.debug("Local document store not found, creating a new one")
self.doc_store = SimpleDocumentStore()
case "postgres":
try:
from llama_index.core.storage.docstore.postgres_docstore import (
PostgresDocumentStore,
)
from llama_index.core.storage.index_store.postgres_index_store import (
PostgresIndexStore,
)
except ImportError:
raise ImportError(
"Postgres dependencies not found, install with `poetry install --extras storage-nodestore-postgres`"
) from None
if settings.postgres is None:
raise ValueError("Postgres index/doc store settings not found.")
self.index_store = PostgresIndexStore.from_params(
**settings.postgres.model_dump(exclude_none=True)
)
self.doc_store = PostgresDocumentStore.from_params(
**settings.postgres.model_dump(exclude_none=True)
)
case _:
# Should be unreachable
# The settings validator should have caught this
raise ValueError(
f"Database {settings.nodestore.database} not supported"
)


@@ -1,12 +1,28 @@
from collections.abc import Generator
from typing import Any
from llama_index.schema import BaseNode, MetadataMode
from llama_index.vector_stores import ChromaVectorStore
from llama_index.vector_stores.chroma import chunk_list
from llama_index.vector_stores.utils import node_to_metadata_dict
from llama_index.core.schema import BaseNode, MetadataMode
from llama_index.core.vector_stores.utils import node_to_metadata_dict
from llama_index.vector_stores.chroma import ChromaVectorStore # type: ignore
class BatchedChromaVectorStore(ChromaVectorStore):
def chunk_list(
lst: list[BaseNode], max_chunk_size: int
) -> Generator[list[BaseNode], None, None]:
"""Yield successive max_chunk_size-sized chunks from lst.
Args:
lst (List[BaseNode]): list of nodes with embeddings
max_chunk_size (int): max chunk size
Yields:
Generator[List[BaseNode], None, None]: list of nodes with embeddings
"""
for i in range(0, len(lst), max_chunk_size):
yield lst[i : i + max_chunk_size]
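
A short usage sketch of the batching helper, assuming plain integers in place of `BaseNode` objects so that it runs without llama_index. The function is restated here only to keep the example self-contained.

```python
from collections.abc import Iterator


def chunk_list(lst: list, max_chunk_size: int) -> Iterator[list]:
    """Yield successive slices of at most max_chunk_size items."""
    for i in range(0, len(lst), max_chunk_size):
        yield lst[i : i + max_chunk_size]


# Seven items with a batch limit of three give two full batches and one remainder.
batches = list(chunk_list(list(range(7)), 3))
# batches == [[0, 1, 2], [3, 4, 5], [6]]
```

Each batch can then be passed to the vector store separately, which is how the batched store stays under the maximum batch size.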
class BatchedChromaVectorStore(ChromaVectorStore): # type: ignore
"""Chroma vector store, batching additions to avoid reaching the max batch limit.
In this vector store, embeddings are stored within a ChromaDB collection.


@@ -2,11 +2,14 @@ import logging
import typing
from injector import inject, singleton
from llama_index import VectorStoreIndex
from llama_index.indices.vector_store import VectorIndexRetriever
from llama_index.vector_stores.types import VectorStore
from llama_index.core.indices.vector_store import VectorIndexRetriever, VectorStoreIndex
from llama_index.core.vector_stores.types import (
BasePydanticVectorStore,
FilterCondition,
MetadataFilter,
MetadataFilters,
)
from private_gpt.components.vector_store.batched_chroma import BatchedChromaVectorStore
from private_gpt.open_ai.extensions.context_filter import ContextFilter
from private_gpt.paths import local_data_path
from private_gpt.settings.settings import Settings
@@ -14,43 +17,64 @@ from private_gpt.settings.settings import Settings
logger = logging.getLogger(__name__)
@typing.no_type_check
def _chromadb_doc_id_metadata_filter(
def _doc_id_metadata_filter(
context_filter: ContextFilter | None,
) -> dict | None:
if context_filter is None or context_filter.docs_ids is None:
return {} # No filter
elif len(context_filter.docs_ids) < 1:
return {"doc_id": "-"} # Effectively filtering out all docs
else:
doc_filter_items = []
if len(context_filter.docs_ids) > 1:
doc_filter = {"$or": doc_filter_items}
for doc_id in context_filter.docs_ids:
doc_filter_items.append({"doc_id": doc_id})
else:
doc_filter = {"doc_id": context_filter.docs_ids[0]}
return doc_filter
) -> MetadataFilters:
filters = MetadataFilters(filters=[], condition=FilterCondition.OR)
if context_filter is not None and context_filter.docs_ids is not None:
for doc_id in context_filter.docs_ids:
filters.filters.append(MetadataFilter(key="doc_id", value=doc_id))
return filters
@singleton
class VectorStoreComponent:
vector_store: VectorStore
settings: Settings
vector_store: BasePydanticVectorStore
@inject
def __init__(self, settings: Settings) -> None:
self.settings = settings
match settings.vectorstore.database:
case "postgres":
try:
from llama_index.vector_stores.postgres import ( # type: ignore
PGVectorStore,
)
except ImportError as e:
raise ImportError(
"Postgres dependencies not found, install with `poetry install --extras vector-stores-postgres`"
) from e
if settings.postgres is None:
raise ValueError(
"Postgres settings not found. Please provide settings."
)
self.vector_store = typing.cast(
BasePydanticVectorStore,
PGVectorStore.from_params(
**settings.postgres.model_dump(exclude_none=True),
table_name="embeddings",
embed_dim=settings.embedding.embed_dim,
),
)
case "chroma":
try:
import chromadb # type: ignore
from chromadb.config import ( # type: ignore
Settings as ChromaSettings,
)
from private_gpt.components.vector_store.batched_chroma import (
BatchedChromaVectorStore,
)
except ImportError as e:
raise ImportError(
"'chromadb' is not installed."
"To use PrivateGPT with Chroma, install the 'chroma' extra."
"`poetry install --extras chroma`"
"ChromaDB dependencies not found, install with `poetry install --extras vector-stores-chroma`"
) from e
chroma_settings = ChromaSettings(anonymized_telemetry=False)
@@ -63,15 +87,22 @@ class VectorStoreComponent:
) # TODO
self.vector_store = typing.cast(
VectorStore,
BasePydanticVectorStore,
BatchedChromaVectorStore(
chroma_client=chroma_client, chroma_collection=chroma_collection
),
)
case "qdrant":
from llama_index.vector_stores.qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
try:
from llama_index.vector_stores.qdrant import ( # type: ignore
QdrantVectorStore,
)
from qdrant_client import QdrantClient # type: ignore
except ImportError as e:
raise ImportError(
"Qdrant dependencies not found, install with `poetry install --extras vector-stores-qdrant`"
) from e
if settings.qdrant is None:
logger.info(
@@ -84,12 +115,78 @@ class VectorStoreComponent:
**settings.qdrant.model_dump(exclude_none=True)
)
self.vector_store = typing.cast(
VectorStore,
BasePydanticVectorStore,
QdrantVectorStore(
client=client,
collection_name="make_this_parameterizable_per_api_call",
), # TODO
)
case "milvus":
try:
from llama_index.vector_stores.milvus import ( # type: ignore
MilvusVectorStore,
)
except ImportError as e:
raise ImportError(
"Milvus dependencies not found, install with `poetry install --extras vector-stores-milvus`"
) from e
if settings.milvus is None:
logger.info(
"Milvus config not found. Using default settings.\n"
"Trying to connect to Milvus at local_data/private_gpt/milvus/milvus_local.db "
"with collection 'make_this_parameterizable_per_api_call'."
)
self.vector_store = typing.cast(
BasePydanticVectorStore,
MilvusVectorStore(
dim=settings.embedding.embed_dim,
collection_name="make_this_parameterizable_per_api_call",
overwrite=True,
),
)
else:
self.vector_store = typing.cast(
BasePydanticVectorStore,
MilvusVectorStore(
dim=settings.embedding.embed_dim,
uri=settings.milvus.uri,
token=settings.milvus.token,
collection_name=settings.milvus.collection_name,
overwrite=settings.milvus.overwrite,
),
)
case "clickhouse":
try:
from clickhouse_connect import ( # type: ignore
get_client,
)
from llama_index.vector_stores.clickhouse import ( # type: ignore
ClickHouseVectorStore,
)
except ImportError as e:
raise ImportError(
"ClickHouse dependencies not found, install with `poetry install --extras vector-stores-clickhouse`"
) from e
if settings.clickhouse is None:
raise ValueError(
"ClickHouse settings not found. Please provide settings."
)
clickhouse_client = get_client(
host=settings.clickhouse.host,
port=settings.clickhouse.port,
username=settings.clickhouse.username,
password=settings.clickhouse.password,
)
self.vector_store = ClickHouseVectorStore(
clickhouse_client=clickhouse_client
)
case _:
# Should be unreachable
# The settings validator should have caught this
@@ -97,20 +194,22 @@ class VectorStoreComponent:
f"Vectorstore database {settings.vectorstore.database} not supported"
)
@staticmethod
def get_retriever(
self,
index: VectorStoreIndex,
context_filter: ContextFilter | None = None,
similarity_top_k: int = 2,
) -> VectorIndexRetriever:
# This way we support qdrant (using doc_ids) and chroma (using where clause)
# This way we support qdrant (using doc_ids) and the rest (using filters)
return VectorIndexRetriever(
index=index,
similarity_top_k=similarity_top_k,
doc_ids=context_filter.docs_ids if context_filter else None,
vector_store_kwargs={
"where": _chromadb_doc_id_metadata_filter(context_filter)
},
filters=(
_doc_id_metadata_filter(context_filter)
if self.settings.vectorstore.database != "qdrant"
else None
),
)
def close(self) -> None:
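
The change above replaces the Chroma-specific `where` dictionary with generic metadata filters that are OR-ed together, and skips them for qdrant, which keeps using `doc_ids`. A minimal sketch of that logic follows. The dataclasses are hypothetical stand-ins for the llama_index `MetadataFilter`, `MetadataFilters` and `FilterCondition` types, so the example runs without the library; `doc_id_filters` and `retriever_filters` are illustrative names, not part of the codebase.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class FilterCondition(str, Enum):
    # Stand-in for llama_index's FilterCondition; only OR is modeled here.
    OR = "or"


@dataclass
class MetadataFilter:
    # Stand-in: matches nodes whose metadata[key] equals value.
    key: str
    value: str


@dataclass
class MetadataFilters:
    # Stand-in: a list of filters combined with a single condition.
    filters: list[MetadataFilter] = field(default_factory=list)
    condition: FilterCondition = FilterCondition.OR


def doc_id_filters(docs_ids: list[str] | None) -> MetadataFilters:
    """Build one doc_id filter per requested document, OR-ed together."""
    filters = MetadataFilters()
    for doc_id in docs_ids or []:
        filters.filters.append(MetadataFilter(key="doc_id", value=doc_id))
    return filters


def retriever_filters(
    database: str, docs_ids: list[str] | None
) -> MetadataFilters | None:
    """Mirror the branch in get_retriever: qdrant relies on doc_ids instead."""
    return None if database == "qdrant" else doc_id_filters(docs_ids)
```

Note that, unlike the removed Chroma helper, an empty `docs_ids` list here yields an empty filter list rather than a filter that matches nothing.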

@@ -1,19 +1,21 @@
"""FastAPI app creation, logger configuration and main API routes."""
import logging
from typing import Any
from fastapi import Depends, FastAPI, Request
from fastapi.middleware.cors import CORSMiddleware
from fastapi.openapi.utils import get_openapi
from injector import Injector
from llama_index.core.callbacks import CallbackManager
from llama_index.core.callbacks.global_handlers import create_global_handler
from llama_index.core.settings import Settings as LlamaIndexSettings
from private_gpt.paths import docs_path
from private_gpt.server.chat.chat_router import chat_router
from private_gpt.server.chunks.chunks_router import chunks_router
from private_gpt.server.completions.completions_router import completions_router
from private_gpt.server.embeddings.embeddings_router import embeddings_router
from private_gpt.server.health.health_router import health_router
from private_gpt.server.ingest.ingest_router import ingest_router
from private_gpt.server.recipes.summarize.summarize_router import summarize_router
from private_gpt.settings.settings import Settings
logger = logging.getLogger(__name__)
@@ -22,107 +24,46 @@ logger = logging.getLogger(__name__)
def create_app(root_injector: Injector) -> FastAPI:
# Start the API
with open(docs_path / "description.md") as description_file:
description = description_file.read()
async def bind_injector_to_request(request: Request) -> None:
request.state.injector = root_injector
tags_metadata = [
{
"name": "Ingestion",
"description": "High-level APIs covering document ingestion -internally "
"managing document parsing, splitting,"
"metadata extraction, embedding generation and storage- and ingested "
"documents CRUD."
"Each ingested document is identified by an ID that can be used to filter the "
"context"
"used in *Contextual Completions* and *Context Chunks* APIs.",
},
{
"name": "Contextual Completions",
"description": "High-level APIs covering contextual Chat and Completions. They "
"follow OpenAI's format, extending it to "
"allow using the context coming from ingested documents to create the "
"response. Internally"
"manage context retrieval, prompt engineering and the response generation.",
},
{
"name": "Context Chunks",
"description": "Low-level API that given a query return relevant chunks of "
"text coming from the ingested"
"documents.",
},
{
"name": "Embeddings",
"description": "Low-level API to obtain the vector representation of a given "
"text, using an Embeddings model."
"Follows OpenAI's embeddings API format.",
},
{
"name": "Health",
"description": "Simple health API to make sure the server is up and running.",
},
]
app = FastAPI(dependencies=[Depends(bind_injector_to_request)])
async def bind_injector_to_request(request: Request) -> None:
request.state.injector = root_injector
app.include_router(completions_router)
app.include_router(chat_router)
app.include_router(chunks_router)
app.include_router(ingest_router)
app.include_router(summarize_router)
app.include_router(embeddings_router)
app.include_router(health_router)
app = FastAPI(dependencies=[Depends(bind_injector_to_request)])
# Add LlamaIndex simple observability
global_handler = create_global_handler("simple")
if global_handler:
LlamaIndexSettings.callback_manager = CallbackManager([global_handler])
def custom_openapi() -> dict[str, Any]:
if app.openapi_schema:
return app.openapi_schema
openapi_schema = get_openapi(
title="PrivateGPT",
description=description,
version="0.1.0",
summary="PrivateGPT is a production-ready AI project that allows you to "
"ask questions to your documents using the power of Large Language "
"Models (LLMs), even in scenarios without Internet connection. "
"100% private, no data leaves your execution environment at any point.",
contact={
"url": "https://github.com/imartinez/privateGPT",
},
license_info={
"name": "Apache 2.0",
"url": "https://www.apache.org/licenses/LICENSE-2.0.html",
},
routes=app.routes,
tags=tags_metadata,
)
openapi_schema["info"]["x-logo"] = {
"url": "https://lh3.googleusercontent.com/drive-viewer"
"/AK7aPaD_iNlMoTquOBsw4boh4tIYxyEuhz6EtEs8nzq3yNkNAK00xGj"
"E1KUCmPJSk3TYOjcs6tReG6w_cLu1S7L_gPgT9z52iw=s2560"
}
settings = root_injector.get(Settings)
if settings.server.cors.enabled:
logger.debug("Setting up CORS middleware")
app.add_middleware(
CORSMiddleware,
allow_credentials=settings.server.cors.allow_credentials,
allow_origins=settings.server.cors.allow_origins,
allow_origin_regex=settings.server.cors.allow_origin_regex,
allow_methods=settings.server.cors.allow_methods,
allow_headers=settings.server.cors.allow_headers,
)
app.openapi_schema = openapi_schema
return app.openapi_schema
app.openapi = custom_openapi # type: ignore[method-assign]
app.include_router(completions_router)
app.include_router(chat_router)
app.include_router(chunks_router)
app.include_router(ingest_router)
app.include_router(embeddings_router)
app.include_router(health_router)
settings = root_injector.get(Settings)
if settings.server.cors.enabled:
logger.debug("Setting up CORS middleware")
app.add_middleware(
CORSMiddleware,
allow_credentials=settings.server.cors.allow_credentials,
allow_origins=settings.server.cors.allow_origins,
allow_origin_regex=settings.server.cors.allow_origin_regex,
allow_methods=settings.server.cors.allow_methods,
allow_headers=settings.server.cors.allow_headers,
)
if settings.ui.enabled:
logger.debug("Importing the UI module")
if settings.ui.enabled:
logger.debug("Importing the UI module")
try:
from private_gpt.ui.ui import PrivateGptUi
except ImportError as e:
raise ImportError(
"UI dependencies not found, install with `poetry install --extras ui`"
) from e
ui = root_injector.get(PrivateGptUi)
ui.mount_in_app(app, settings.ui.path)
ui = root_injector.get(PrivateGptUi)
ui.mount_in_app(app, settings.ui.path)
return app
return app

@@ -1,11 +1,6 @@
"""FastAPI app creation, logger configuration and main API routes."""
import llama_index
from private_gpt.di import global_injector
from private_gpt.launcher import create_app
# Add LlamaIndex simple observability
llama_index.set_global_handler("simple")
app = create_app(global_injector)

@@ -3,7 +3,7 @@ import uuid
from collections.abc import Iterator
from typing import Literal
from llama_index.llms import ChatResponse, CompletionResponse
from llama_index.core.llms import ChatResponse, CompletionResponse
from pydantic import BaseModel, Field
from private_gpt.server.chunks.chunks_service import Chunk
@@ -118,5 +118,5 @@ def to_openai_sse_stream(
yield f"data: {OpenAICompletion.json_from_delta(text=response.delta)}\n\n"
else:
yield f"data: {OpenAICompletion.json_from_delta(text=response, sources=sources)}\n\n"
yield f"data: {OpenAICompletion.json_from_delta(text=None, finish_reason='stop')}\n\n"
yield f"data: {OpenAICompletion.json_from_delta(text='', finish_reason='stop')}\n\n"
yield "data: [DONE]\n\n"
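
The streaming helper touched above frames each delta as a Server-Sent Event, sends a final chunk with `finish_reason='stop'` (now with an empty string instead of `None`), and ends with a `[DONE]` sentinel. A minimal sketch of that framing, using only the standard library, is shown below. The payload shape is a simplified assumption; the real `OpenAICompletion.json_from_delta` output carries more fields, and `sse_frames` is a hypothetical name.

```python
import json
from collections.abc import Iterable, Iterator


def sse_frames(tokens: Iterable[str]) -> Iterator[str]:
    """Wrap text deltas as SSE frames, ending with a stop chunk and [DONE]."""
    for token in tokens:
        payload = {
            "choices": [{"delta": {"content": token}, "finish_reason": None}]
        }
        yield f"data: {json.dumps(payload)}\n\n"
    # The closing chunk carries an empty string rather than None.
    stop = {"choices": [{"delta": {"content": ""}, "finish_reason": "stop"}]}
    yield f"data: {json.dumps(stop)}\n\n"
    yield "data: [DONE]\n\n"
```

Each frame is terminated by a blank line, which is what lets an SSE client split the stream into events.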

@@ -1,5 +1,5 @@
from fastapi import APIRouter, Depends, Request
from llama_index.llms import ChatMessage, MessageRole
from llama_index.core.llms import ChatMessage, MessageRole
from pydantic import BaseModel
from starlette.responses import StreamingResponse
@@ -54,6 +54,13 @@ class ChatBody(BaseModel):
response_model=None,
responses={200: {"model": OpenAICompletion}},
tags=["Contextual Completions"],
openapi_extra={
"x-fern-streaming": {
"stream-condition": "stream",
"response": {"$ref": "#/components/schemas/OpenAICompletion"},
"response-stream": {"$ref": "#/components/schemas/OpenAICompletion"},
}
},
)
def chat_completion(
request: Request, body: ChatBody

@@ -1,14 +1,19 @@
from dataclasses import dataclass
from injector import inject, singleton
from llama_index import ServiceContext, StorageContext, VectorStoreIndex
from llama_index.chat_engine import ContextChatEngine, SimpleChatEngine
from llama_index.chat_engine.types import (
from llama_index.core.chat_engine import ContextChatEngine, SimpleChatEngine
from llama_index.core.chat_engine.types import (
BaseChatEngine,
)
from llama_index.indices.postprocessor import MetadataReplacementPostProcessor
from llama_index.llms import ChatMessage, MessageRole
from llama_index.types import TokenGen
from llama_index.core.indices import VectorStoreIndex
from llama_index.core.indices.postprocessor import MetadataReplacementPostProcessor
from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.core.postprocessor import (
SentenceTransformerRerank,
SimilarityPostprocessor,
)
from llama_index.core.storage import StorageContext
from llama_index.core.types import TokenGen
from pydantic import BaseModel
from private_gpt.components.embedding.embedding_component import EmbeddingComponent
@@ -19,6 +24,7 @@ from private_gpt.components.vector_store.vector_store_component import (
)
from private_gpt.open_ai.extensions.context_filter import ContextFilter
from private_gpt.server.chunks.chunks_service import Chunk
from private_gpt.settings.settings import Settings
class Completion(BaseModel):
@@ -67,28 +73,31 @@ class ChatEngineInput:
@singleton
class ChatService:
settings: Settings
@inject
def __init__(
self,
settings: Settings,
llm_component: LLMComponent,
vector_store_component: VectorStoreComponent,
embedding_component: EmbeddingComponent,
node_store_component: NodeStoreComponent,
) -> None:
self.llm_service = llm_component
self.settings = settings
self.llm_component = llm_component
self.embedding_component = embedding_component
self.vector_store_component = vector_store_component
self.storage_context = StorageContext.from_defaults(
vector_store=vector_store_component.vector_store,
docstore=node_store_component.doc_store,
index_store=node_store_component.index_store,
)
self.service_context = ServiceContext.from_defaults(
llm=llm_component.llm, embed_model=embedding_component.embedding_model
)
self.index = VectorStoreIndex.from_vector_store(
vector_store_component.vector_store,
storage_context=self.storage_context,
service_context=self.service_context,
llm=llm_component.llm,
embed_model=embedding_component.embedding_model,
show_progress=True,
)
@@ -98,22 +107,36 @@ class ChatService:
use_context: bool = False,
context_filter: ContextFilter | None = None,
) -> BaseChatEngine:
settings = self.settings
if use_context:
vector_index_retriever = self.vector_store_component.get_retriever(
index=self.index, context_filter=context_filter
index=self.index,
context_filter=context_filter,
similarity_top_k=self.settings.rag.similarity_top_k,
)
node_postprocessors = [
MetadataReplacementPostProcessor(target_metadata_key="window"),
SimilarityPostprocessor(
similarity_cutoff=settings.rag.similarity_value
),
]
if settings.rag.rerank.enabled:
rerank_postprocessor = SentenceTransformerRerank(
model=settings.rag.rerank.model, top_n=settings.rag.rerank.top_n
)
node_postprocessors.append(rerank_postprocessor)
return ContextChatEngine.from_defaults(
system_prompt=system_prompt,
retriever=vector_index_retriever,
service_context=self.service_context,
node_postprocessors=[
MetadataReplacementPostProcessor(target_metadata_key="window"),
],
llm=self.llm_component.llm, # Takes no effect at the moment
node_postprocessors=node_postprocessors,
)
else:
return SimpleChatEngine.from_defaults(
system_prompt=system_prompt,
service_context=self.service_context,
llm=self.llm_component.llm,
)
def stream_chat(

@@ -1,8 +1,9 @@
from typing import TYPE_CHECKING, Literal
from injector import inject, singleton
from llama_index import ServiceContext, StorageContext, VectorStoreIndex
from llama_index.schema import NodeWithScore
from llama_index.core.indices import VectorStoreIndex
from llama_index.core.schema import NodeWithScore
from llama_index.core.storage import StorageContext
from pydantic import BaseModel, Field
from private_gpt.components.embedding.embedding_component import EmbeddingComponent
@@ -15,7 +16,7 @@ from private_gpt.open_ai.extensions.context_filter import ContextFilter
from private_gpt.server.ingest.model import IngestedDoc
if TYPE_CHECKING:
from llama_index.schema import RelatedNodeInfo
from llama_index.core.schema import RelatedNodeInfo
class Chunk(BaseModel):
@@ -63,14 +64,13 @@ class ChunksService:
node_store_component: NodeStoreComponent,
) -> None:
self.vector_store_component = vector_store_component
self.llm_component = llm_component
self.embedding_component = embedding_component
self.storage_context = StorageContext.from_defaults(
vector_store=vector_store_component.vector_store,
docstore=node_store_component.doc_store,
index_store=node_store_component.index_store,
)
self.query_service_context = ServiceContext.from_defaults(
llm=llm_component.llm, embed_model=embedding_component.embedding_model
)
def _get_sibling_nodes_text(
self, node_with_score: NodeWithScore, related_number: int, forward: bool = True
@@ -103,7 +103,8 @@ class ChunksService:
index = VectorStoreIndex.from_vector_store(
self.vector_store_component.vector_store,
storage_context=self.storage_context,
service_context=self.query_service_context,
llm=self.llm_component.llm,
embed_model=self.embedding_component.embedding_model,
show_progress=True,
)
vector_index_retriever = self.vector_store_component.get_retriever(

@@ -42,6 +42,13 @@ class CompletionsBody(BaseModel):
summary="Completion",
responses={200: {"model": OpenAICompletion}},
tags=["Contextual Completions"],
openapi_extra={
"x-fern-streaming": {
"stream-condition": "stream",
"response": {"$ref": "#/components/schemas/OpenAICompletion"},
"response-stream": {"$ref": "#/components/schemas/OpenAICompletion"},
}
},
)
def prompt_completion(
request: Request, body: CompletionsBody

@@ -1,7 +1,7 @@
from typing import Literal
from fastapi import APIRouter, Depends, HTTPException, Request, UploadFile
from pydantic import BaseModel
from pydantic import BaseModel, Field
from private_gpt.server.ingest.ingest_service import IngestService
from private_gpt.server.ingest.model import IngestedDoc
@@ -10,14 +10,35 @@ from private_gpt.server.utils.auth import authenticated
ingest_router = APIRouter(prefix="/v1", dependencies=[Depends(authenticated)])
class IngestTextBody(BaseModel):
file_name: str = Field(examples=["Avatar: The Last Airbender"])
text: str = Field(
examples=[
"Avatar is set in an Asian and Arctic-inspired world in which some "
"people can telekinetically manipulate one of the four elements—water, "
"earth, fire or air—through practices known as 'bending', inspired by "
"Chinese martial arts."
]
)
class IngestResponse(BaseModel):
object: Literal["list"]
model: Literal["private-gpt"]
data: list[IngestedDoc]
@ingest_router.post("/ingest", tags=["Ingestion"])
@ingest_router.post("/ingest", tags=["Ingestion"], deprecated=True)
def ingest(request: Request, file: UploadFile) -> IngestResponse:
"""Ingests and processes a file.
Deprecated. Use ingest/file instead.
"""
return ingest_file(request, file)
@ingest_router.post("/ingest/file", tags=["Ingestion"])
def ingest_file(request: Request, file: UploadFile) -> IngestResponse:
"""Ingests and processes a file, storing its chunks to be used as context.
The context obtained from files is later used in
@@ -40,6 +61,26 @@ def ingest(request: Request, file: UploadFile) -> IngestResponse:
return IngestResponse(object="list", model="private-gpt", data=ingested_documents)
@ingest_router.post("/ingest/text", tags=["Ingestion"])
def ingest_text(request: Request, body: IngestTextBody) -> IngestResponse:
"""Ingests and processes a text, storing its chunks to be used as context.
The context obtained from files is later used in
`/chat/completions`, `/completions`, and `/chunks` APIs.
A Document will be generated with the given text. The Document
ID is returned in the response, together with the
extracted Metadata (which is later used to improve context retrieval). That ID
can be used to filter the context used to create responses in
`/chat/completions`, `/completions`, and `/chunks` APIs.
"""
service = request.state.injector.get(IngestService)
if len(body.file_name) == 0:
raise HTTPException(400, "No file name provided")
ingested_documents = service.ingest_text(body.file_name, body.text)
return IngestResponse(object="list", model="private-gpt", data=ingested_documents)
@ingest_router.get("/ingest/list", tags=["Ingestion"])
def list_ingested(request: Request) -> IngestResponse:
"""Lists already ingested Documents including their Document ID and metadata.

@@ -1,14 +1,11 @@
import logging
import tempfile
from pathlib import Path
from typing import BinaryIO
from typing import TYPE_CHECKING, AnyStr, BinaryIO
from injector import inject, singleton
from llama_index import (
ServiceContext,
StorageContext,
)
from llama_index.node_parser import SentenceWindowNodeParser
from llama_index.core.node_parser import SentenceWindowNodeParser
from llama_index.core.storage import StorageContext
from private_gpt.components.embedding.embedding_component import EmbeddingComponent
from private_gpt.components.ingest.ingest_component import get_ingestion_component
@@ -20,6 +17,9 @@ from private_gpt.components.vector_store.vector_store_component import (
from private_gpt.server.ingest.model import IngestedDoc
from private_gpt.settings.settings import settings
if TYPE_CHECKING:
from llama_index.core.storage.docstore.types import RefDocInfo
logger = logging.getLogger(__name__)
@@ -40,29 +40,15 @@ class IngestService:
index_store=node_store_component.index_store,
)
node_parser = SentenceWindowNodeParser.from_defaults()
self.ingest_service_context = ServiceContext.from_defaults(
llm=self.llm_service.llm,
embed_model=embedding_component.embedding_model,
node_parser=node_parser,
# Embeddings done early in the pipeline of node transformations, right
# after the node parsing
transformations=[node_parser, embedding_component.embedding_model],
)
self.ingest_component = get_ingestion_component(
self.storage_context, self.ingest_service_context, settings=settings()
self.storage_context,
embed_model=embedding_component.embedding_model,
transformations=[node_parser, embedding_component.embedding_model],
settings=settings(),
)
def ingest(self, file_name: str, file_data: Path) -> list[IngestedDoc]:
logger.info("Ingesting file_name=%s", file_name)
documents = self.ingest_component.ingest(file_name, file_data)
return [IngestedDoc.from_document(document) for document in documents]
def ingest_bin_data(
self, file_name: str, raw_file_data: BinaryIO
) -> list[IngestedDoc]:
logger.debug("Ingesting binary data with file_name=%s", file_name)
file_data = raw_file_data.read()
def _ingest_data(self, file_name: str, file_data: AnyStr) -> list[IngestedDoc]:
logger.debug("Got file data of size=%s to ingest", len(file_data))
# llama-index mainly supports reading from files, so
# we have to create a tmp file to read for it to work
@@ -74,28 +60,44 @@ class IngestService:
path_to_tmp.write_bytes(file_data)
else:
path_to_tmp.write_text(str(file_data))
return self.ingest(file_name, path_to_tmp)
return self.ingest_file(file_name, path_to_tmp)
finally:
tmp.close()
path_to_tmp.unlink()
def ingest_file(self, file_name: str, file_data: Path) -> list[IngestedDoc]:
logger.info("Ingesting file_name=%s", file_name)
documents = self.ingest_component.ingest(file_name, file_data)
logger.info("Finished ingestion file_name=%s", file_name)
return [IngestedDoc.from_document(document) for document in documents]
def ingest_text(self, file_name: str, text: str) -> list[IngestedDoc]:
logger.debug("Ingesting text data with file_name=%s", file_name)
return self._ingest_data(file_name, text)
def ingest_bin_data(
self, file_name: str, raw_file_data: BinaryIO
) -> list[IngestedDoc]:
logger.debug("Ingesting binary data with file_name=%s", file_name)
file_data = raw_file_data.read()
return self._ingest_data(file_name, file_data)
def bulk_ingest(self, files: list[tuple[str, Path]]) -> list[IngestedDoc]:
logger.info("Ingesting file_names=%s", [f[0] for f in files])
documents = self.ingest_component.bulk_ingest(files)
logger.info("Finished ingestion file_name=%s", [f[0] for f in files])
return [IngestedDoc.from_document(document) for document in documents]
def list_ingested(self) -> list[IngestedDoc]:
ingested_docs = []
ingested_docs: list[IngestedDoc] = []
try:
docstore = self.storage_context.docstore
ingested_docs_ids: set[str] = set()
ref_docs: dict[str, RefDocInfo] | None = docstore.get_all_ref_doc_info()
for node in docstore.docs.values():
if node.ref_doc_id is not None:
ingested_docs_ids.add(node.ref_doc_id)
if not ref_docs:
return ingested_docs
for doc_id in ingested_docs_ids:
ref_doc_info = docstore.get_ref_doc_info(ref_doc_id=doc_id)
for doc_id, ref_doc_info in ref_docs.items():
doc_metadata = None
if ref_doc_info is not None and ref_doc_info.metadata is not None:
doc_metadata = IngestedDoc.curate_metadata(ref_doc_info.metadata)
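
The `_ingest_data` helper introduced in this file accepts either `str` or `bytes`, writes it to a temporary file because the ingestion pipeline reads from paths, and always removes the file afterwards. A minimal sketch of that pattern follows. `with_temp_file` is a hypothetical name, reading the file back stands in for the real ingestion call, and the handle is closed up front here (an assumption made for portability, not necessarily what the service does).

```python
import tempfile
from pathlib import Path
from typing import AnyStr


def with_temp_file(file_data: AnyStr) -> str:
    """Write str or bytes to a temp file, read it back, and always clean up."""
    tmp = tempfile.NamedTemporaryFile(delete=False)
    tmp.close()  # Close the handle first so the path can be reopened anywhere.
    path_to_tmp = Path(tmp.name)
    try:
        if isinstance(file_data, bytes):
            path_to_tmp.write_bytes(file_data)
        else:
            path_to_tmp.write_text(str(file_data))
        # Stand-in for the ingestion step, which consumes the file by path.
        return path_to_tmp.read_text()
    finally:
        # Runs on success and on error, so no temp files are left behind.
        path_to_tmp.unlink()
```

Routing both `ingest_text` and `ingest_bin_data` through one helper like this keeps the cleanup logic in a single place.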

@@ -3,10 +3,9 @@ from pathlib import Path
from typing import Any
from watchdog.events import (
DirCreatedEvent,
DirModifiedEvent,
FileCreatedEvent,
FileModifiedEvent,
FileSystemEvent,
FileSystemEventHandler,
)
from watchdog.observers import Observer
@@ -20,11 +19,11 @@ class IngestWatcher:
self.on_file_changed = on_file_changed
class Handler(FileSystemEventHandler):
def on_modified(self, event: DirModifiedEvent | FileModifiedEvent) -> None:
def on_modified(self, event: FileSystemEvent) -> None:
if isinstance(event, FileModifiedEvent):
on_file_changed(Path(event.src_path))
def on_created(self, event: DirCreatedEvent | FileCreatedEvent) -> None:
def on_created(self, event: FileSystemEvent) -> None:
if isinstance(event, FileCreatedEvent):
on_file_changed(Path(event.src_path))

@@ -1,6 +1,6 @@
from typing import Any, Literal
from llama_index import Document
from llama_index.core.schema import Document
from pydantic import BaseModel, Field

@@ -0,0 +1,86 @@
from fastapi import APIRouter, Depends, Request
from pydantic import BaseModel
from starlette.responses import StreamingResponse
from private_gpt.open_ai.extensions.context_filter import ContextFilter
from private_gpt.open_ai.openai_models import (
to_openai_sse_stream,
)
from private_gpt.server.recipes.summarize.summarize_service import SummarizeService
from private_gpt.server.utils.auth import authenticated
summarize_router = APIRouter(prefix="/v1", dependencies=[Depends(authenticated)])
class SummarizeBody(BaseModel):
text: str | None = None
use_context: bool = False
context_filter: ContextFilter | None = None
prompt: str | None = None
instructions: str | None = None
stream: bool = False
class SummarizeResponse(BaseModel):
summary: str
@summarize_router.post(
"/summarize",
response_model=None,
summary="Summarize",
responses={200: {"model": SummarizeResponse}},
tags=["Recipes"],
)
def summarize(
request: Request, body: SummarizeBody
) -> SummarizeResponse | StreamingResponse:
"""Given a text, the model will return a summary.
Optionally include `instructions` to influence the way the summary is generated.
If `use_context`
is set to `true`, the model will also use the content coming from the ingested
documents in the summary. The documents being used can
be filtered by their metadata using the `context_filter`.
Ingested documents metadata can be found using `/ingest/list` endpoint.
If you want all ingested documents to be used, remove `context_filter` altogether.
If `prompt` is set, it will be used as the prompt for the summarization,
otherwise the default prompt will be used.
When using `'stream': true`, the API will return data chunks following [OpenAI's
streaming model](https://platform.openai.com/docs/api-reference/chat/streaming):
```
{"id":"12345","object":"completion.chunk","created":1694268190,
"model":"private-gpt","choices":[{"index":0,"delta":{"content":"Hello"},
"finish_reason":null}]}
```
"""
service: SummarizeService = request.state.injector.get(SummarizeService)
if body.stream:
completion_gen = service.stream_summarize(
text=body.text,
instructions=body.instructions,
use_context=body.use_context,
context_filter=body.context_filter,
prompt=body.prompt,
)
return StreamingResponse(
to_openai_sse_stream(
response_generator=completion_gen,
),
media_type="text/event-stream",
)
else:
completion = service.summarize(
text=body.text,
instructions=body.instructions,
use_context=body.use_context,
context_filter=body.context_filter,
prompt=body.prompt,
)
return SummarizeResponse(
summary=completion,
)
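
For reference, a client call to the new `/v1/summarize` route could be built as sketched below. This only constructs the request and does not send it. The base URL and the `build_summarize_request` helper are assumptions for illustration; the body fields come from `SummarizeBody` above.

```python
from __future__ import annotations

import json
from urllib.request import Request


def build_summarize_request(
    base_url: str,
    text: str,
    instructions: str | None = None,
    stream: bool = False,
) -> Request:
    """Build (but do not send) a POST request for the summarize endpoint."""
    body: dict[str, object] = {
        "text": text,
        "use_context": False,
        "stream": stream,
    }
    if instructions:
        # Optional hint that influences how the summary is written.
        body["instructions"] = instructions
    return Request(
        url=f"{base_url}/v1/summarize",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would perform the call against a running server; if authentication is enabled, an `Authorization` header would also be needed.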

@@ -0,0 +1,172 @@
from itertools import chain
from injector import inject, singleton
from llama_index.core import (
Document,
StorageContext,
SummaryIndex,
)
from llama_index.core.base.response.schema import Response, StreamingResponse
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.response_synthesizers import ResponseMode
from llama_index.core.storage.docstore.types import RefDocInfo
from llama_index.core.types import TokenGen
from private_gpt.components.embedding.embedding_component import EmbeddingComponent
from private_gpt.components.llm.llm_component import LLMComponent
from private_gpt.components.node_store.node_store_component import NodeStoreComponent
from private_gpt.components.vector_store.vector_store_component import (
VectorStoreComponent,
)
from private_gpt.open_ai.extensions.context_filter import ContextFilter
from private_gpt.settings.settings import Settings
DEFAULT_SUMMARIZE_PROMPT = (
"Provide a comprehensive summary of the provided context information. "
"The summary should cover all the key points and main ideas presented in "
"the original text, while also condensing the information into a concise "
"and easy-to-understand format. Please ensure that the summary includes "
"relevant details and examples that support the main ideas, while avoiding "
"any unnecessary information or repetition."
)
@singleton
class SummarizeService:
@inject
def __init__(
self,
settings: Settings,
llm_component: LLMComponent,
node_store_component: NodeStoreComponent,
vector_store_component: VectorStoreComponent,
embedding_component: EmbeddingComponent,
) -> None:
self.settings = settings
self.llm_component = llm_component
self.node_store_component = node_store_component
self.vector_store_component = vector_store_component
self.embedding_component = embedding_component
self.storage_context = StorageContext.from_defaults(
vector_store=vector_store_component.vector_store,
docstore=node_store_component.doc_store,
index_store=node_store_component.index_store,
)
@staticmethod
def _filter_ref_docs(
ref_docs: dict[str, RefDocInfo], context_filter: ContextFilter | None
) -> list[RefDocInfo]:
if context_filter is None or not context_filter.docs_ids:
return list(ref_docs.values())
return [
ref_doc
for doc_id, ref_doc in ref_docs.items()
if doc_id in context_filter.docs_ids
]
def _summarize(
self,
use_context: bool = False,
stream: bool = False,
text: str | None = None,
instructions: str | None = None,
context_filter: ContextFilter | None = None,
prompt: str | None = None,
) -> str | TokenGen:
nodes_to_summarize = []
# Add text to summarize
if text:
text_documents = [Document(text=text)]
nodes_to_summarize += (
SentenceSplitter.from_defaults().get_nodes_from_documents(
text_documents
)
)
# Add context documents to summarize
if use_context:
# 1. Recover all ref docs
ref_docs: dict[
str, RefDocInfo
] | None = self.storage_context.docstore.get_all_ref_doc_info()
if ref_docs is None:
raise ValueError("No documents have been ingested yet.")
# 2. Filter documents based on context_filter (if provided)
filtered_ref_docs = self._filter_ref_docs(ref_docs, context_filter)
# 3. Get all nodes from the filtered documents
filtered_node_ids = chain.from_iterable(
[ref_doc.node_ids for ref_doc in filtered_ref_docs]
)
filtered_nodes = self.storage_context.docstore.get_nodes(
node_ids=list(filtered_node_ids),
)
nodes_to_summarize += filtered_nodes
# Create a SummaryIndex to summarize the nodes
summary_index = SummaryIndex(
nodes=nodes_to_summarize,
storage_context=StorageContext.from_defaults(), # In memory SummaryIndex
show_progress=True,
)
# Make a tree summarization query
# above the set of all candidate nodes
query_engine = summary_index.as_query_engine(
llm=self.llm_component.llm,
response_mode=ResponseMode.TREE_SUMMARIZE,
streaming=stream,
use_async=self.settings.summarize.use_async,
)
prompt = prompt or DEFAULT_SUMMARIZE_PROMPT
summarize_query = prompt + "\n" + (instructions or "")
response = query_engine.query(summarize_query)
if isinstance(response, Response):
return response.response or ""
elif isinstance(response, StreamingResponse):
return response.response_gen
else:
raise TypeError(f"The result is not of a supported type: {type(response)}")
def summarize(
self,
use_context: bool = False,
text: str | None = None,
instructions: str | None = None,
context_filter: ContextFilter | None = None,
prompt: str | None = None,
) -> str:
return self._summarize(
use_context=use_context,
stream=False,
text=text,
instructions=instructions,
context_filter=context_filter,
prompt=prompt,
) # type: ignore
def stream_summarize(
self,
use_context: bool = False,
text: str | None = None,
instructions: str | None = None,
context_filter: ContextFilter | None = None,
prompt: str | None = None,
) -> TokenGen:
return self._summarize(
use_context=use_context,
stream=True,
text=text,
instructions=instructions,
context_filter=context_filter,
prompt=prompt,
) # type: ignore
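
A minimal sketch of steps 2 and 3 of `_summarize` above (filter the ref docs, then flatten their node ids), written with plain stand-ins so it runs without llama_index. `FakeRefDoc` and `FakeFilter` are hypothetical substitutes for `RefDocInfo` and `ContextFilter`; only the `node_ids` and `docs_ids` attributes are modelled, and the helper names are assumptions for illustration.

```python
from dataclasses import dataclass, field
from itertools import chain


@dataclass
class FakeRefDoc:
    """Hypothetical stand-in for RefDocInfo: only node_ids is modelled."""

    node_ids: list = field(default_factory=list)


@dataclass
class FakeFilter:
    """Hypothetical stand-in for ContextFilter: only docs_ids is modelled."""

    docs_ids: list = field(default_factory=list)


def filter_ref_docs(ref_docs, context_filter):
    # No filter, or an empty docs_ids list, means "use every document".
    if context_filter is None or not context_filter.docs_ids:
        return list(ref_docs.values())
    return [
        ref_doc
        for doc_id, ref_doc in ref_docs.items()
        if doc_id in context_filter.docs_ids
    ]


def collect_node_ids(ref_docs, context_filter=None):
    # Flatten the per-document node id lists into one list, keeping order.
    selected = filter_ref_docs(ref_docs, context_filter)
    return list(chain.from_iterable(doc.node_ids for doc in selected))


ref_docs = {
    "doc-a": FakeRefDoc(node_ids=["a1", "a2"]),
    "doc-b": FakeRefDoc(node_ids=["b1"]),
}
all_ids = collect_node_ids(ref_docs)
only_b = collect_node_ids(ref_docs, FakeFilter(docs_ids=["doc-b"]))
```

Note that an empty `docs_ids` behaves like no filter at all, which mirrors the `not context_filter.docs_ids` check in `_filter_ref_docs`.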


@@ -12,6 +12,7 @@ Authorization can be done by following fastapi's guides:
* https://fastapi.tiangolo.com/tutorial/security/
* https://fastapi.tiangolo.com/tutorial/dependencies/dependencies-in-path-operation-decorators/
"""
# mypy: ignore-errors
# Disabled mypy error: All conditional function variants must have identical signatures
# We are changing the implementation of the authenticated method, based on


@@ -1,4 +1,4 @@
from typing import Literal
from typing import Any, Literal
from pydantic import BaseModel, Field
@@ -59,6 +59,27 @@ class AuthSettings(BaseModel):
)
class IngestionSettings(BaseModel):
"""Ingestion configuration.
This configuration is used to control the ingestion of data into the system
using non-server methods. This is useful for local development and testing;
or to ingest in bulk from a folder.
Please note that this configuration is not secure and should be used in
a controlled environment only (setting right permissions, etc.).
"""
enabled: bool = Field(
description="Flag indicating if local ingestion is enabled or not.",
default=False,
)
allow_ingest_from: list[str] = Field(
description="A list of folders that should be permitted to make ingest requests.",
default=[],
)
class ServerSettings(BaseModel):
env_name: str = Field(
description="Name of the environment (prod, staging, local...)"
@@ -74,6 +95,10 @@ class ServerSettings(BaseModel):
class DataSettings(BaseModel):
local_ingestion: IngestionSettings = Field(
description="Ingestion configuration",
default_factory=lambda: IngestionSettings(allow_ingest_from=["*"]),
)
local_data_folder: str = Field(
description="Path to local storage."
"It will be treated as an absolute path if it starts with /"
@@ -81,80 +106,107 @@ class DataSettings(BaseModel):
class LLMSettings(BaseModel):
mode: Literal["local", "openai", "sagemaker", "mock"]
mode: Literal[
"llamacpp",
"openai",
"openailike",
"azopenai",
"sagemaker",
"mock",
"ollama",
"gemini",
]
max_new_tokens: int = Field(
256,
description="The maximum number of tokens that the LLM is authorized to generate in one completion.",
)
class VectorstoreSettings(BaseModel):
database: Literal["chroma", "qdrant"]
class LocalSettings(BaseModel):
llm_hf_repo_id: str
llm_hf_model_file: str
embedding_hf_model_name: str = Field(
description="Name of the HuggingFace model to use for embeddings"
context_window: int = Field(
3900,
description="The maximum number of context tokens for the model.",
)
tokenizer: str = Field(
None,
description="The model id of a predefined tokenizer hosted inside a model repo on "
"huggingface.co. Valid model ids can be located at the root-level, like "
"`bert-base-uncased`, or namespaced under a user or organization name, "
"like `HuggingFaceH4/zephyr-7b-beta`. If not set, will load a tokenizer matching "
"gpt-3.5-turbo LLM.",
)
temperature: float = Field(
0.1,
description="The temperature of the model. Increasing the temperature will make the model answer more creatively. A value of 0.1 would be more factual.",
)
prompt_style: Literal[
"llama_cpp.llama-2",
"llama_cpp.alpaca",
"llama_cpp.vicuna",
"llama_cpp.oasst_llama",
"llama_cpp.baichuan-2",
"llama_cpp.baichuan",
"llama_cpp.openbuddy",
"llama_cpp.redpajama-incite",
"llama_cpp.snoozy",
"llama_cpp.phind",
"llama_cpp.intel",
"llama_cpp.open-orca",
"llama_cpp.mistrallite",
"llama_cpp.zephyr",
"llama_cpp.chatml",
"llama_cpp.openchat",
"default", "llama2", "llama3", "tag", "mistral", "chatml"
] = Field(
"llama2",
"vigogne",
"template",
] | None = Field(
None,
description=(
"The prompt style to use for the chat engine. "
"If None is given - use the default prompt style from the llama_index. It should look like `role: message`.\n"
"If `default` - use the default prompt style from the llama_index. It should look like `role: message`.\n"
"If `llama2` - use the llama2 prompt style from the llama_index. Based on `<s>`, `[INST]` and `<<SYS>>`.\n"
"If `llama_cpp.<name>` - use the `<name>` prompt style, implemented by `llama-cpp-python`. \n"
"If `llama3` - use the llama3 prompt style from the llama_index.\n"
"If `tag` - use the `tag` prompt style. It should look like `<|role|>: message`. \n"
"If `mistral` - use the `mistral` prompt style. It should look like <s>[INST] {System Prompt} [/INST]</s>[INST] { UserInstructions } [/INST]\n"
"`llama2` is the historic behaviour. `default` might work better with your custom models."
),
)
default_system_prompt: str | None = Field(
None,
description=(
"The default system prompt to use for the chat engine. "
"If none is given - use the default system prompt (from the llama_index). "
"Please note that the default prompt might not be the same for all prompt styles. "
"Also note that this is only used if the first message is not a system message. "
),
class VectorstoreSettings(BaseModel):
database: Literal["chroma", "qdrant", "postgres", "clickhouse", "milvus"]
class NodeStoreSettings(BaseModel):
database: Literal["simple", "postgres"]
class LlamaCPPSettings(BaseModel):
llm_hf_repo_id: str
llm_hf_model_file: str
tfs_z: float = Field(
1.0,
description="Tail free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting.",
)
top_k: int = Field(
40,
description="Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. (Default: 40)",
)
top_p: float = Field(
0.9,
description="Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9)",
)
repeat_penalty: float = Field(
1.1,
description="Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1)",
)
template_name: str | None = Field(
class HuggingFaceSettings(BaseModel):
embedding_hf_model_name: str = Field(
description="Name of the HuggingFace model to use for embeddings"
)
access_token: str = Field(
None,
description=(
"The name of the template to use for the chat engine, if the `prompt_style` is `template`."
),
description="Huggingface access token, required to download some models",
)
trust_remote_code: bool = Field(
False,
description="If set to True, the code from the remote model will be trusted and executed.",
)
class EmbeddingSettings(BaseModel):
mode: Literal["local", "openai", "sagemaker", "mock"]
ingest_mode: Literal["simple", "batch", "parallel"] = Field(
mode: Literal[
"huggingface", "openai", "azopenai", "sagemaker", "ollama", "mock", "gemini"
]
ingest_mode: Literal["simple", "batch", "parallel", "pipeline"] = Field(
"simple",
description=(
"The ingest mode to use for the embedding engine:\n"
"If `simple` - ingest files sequentially and one by one. It is the historic behaviour.\n"
"If `batch` - if multiple files, parse all the files in parallel, "
"and send them in batch to the embedding model.\n"
"In `pipeline` - The Embedding engine is kept as busy as possible\n"
"If `parallel` - parse the files in parallel using multiple cores, and embed them in parallel.\n"
"`parallel` is the fastest mode for local setup, as it parallelizes IO RW in the index.\n"
"For modes that leverage parallelization, you can specify the number of "
@@ -167,11 +219,16 @@ class EmbeddingSettings(BaseModel):
"The number of workers to use for file ingestion.\n"
"In `batch` mode, this is the number of workers used to parse the files.\n"
"In `parallel` mode, this is the number of workers used to parse the files and embed them.\n"
"In `pipeline` mode, this is the number of workers that can perform embeddings.\n"
"This is only used if `ingest_mode` is not `simple`.\n"
"Do not go too high with this number, as it might cause memory issues (especially in `parallel` mode).\n"
"Do not set it higher than the number of threads of your CPU."
),
)
embed_dim: int = Field(
384,
description="The dimension of the embeddings stored in the Postgres database",
)
class SagemakerSettings(BaseModel):
@@ -180,12 +237,268 @@ class SagemakerSettings(BaseModel):
class OpenAISettings(BaseModel):
api_base: str = Field(
None,
description="Base URL of OpenAI API. Example: 'https://api.openai.com/v1'.",
)
api_key: str
model: str = Field(
"gpt-3.5-turbo",
description="OpenAI Model to use. Example: 'gpt-4'.",
)
request_timeout: float = Field(
120.0,
description="Time elapsed until openailike server times out the request. Default is 120s. Format is float. ",
)
embedding_api_base: str = Field(
None,
description="Base URL of OpenAI API. Example: 'https://api.openai.com/v1'.",
)
embedding_api_key: str
embedding_model: str = Field(
"text-embedding-ada-002",
description="OpenAI embedding Model to use. Example: 'text-embedding-3-large'.",
)
class GeminiSettings(BaseModel):
api_key: str
model: str = Field(
"models/gemini-pro",
description="Google Model to use. Example: 'models/gemini-pro'.",
)
embedding_model: str = Field(
"models/embedding-001",
description="Google Embedding Model to use. Example: 'models/embedding-001'.",
)
class OllamaSettings(BaseModel):
api_base: str = Field(
"http://localhost:11434",
description="Base URL of Ollama API. Example: 'http://localhost:11434'.",
)
embedding_api_base: str = Field(
"http://localhost:11434",
description="Base URL of Ollama embedding API. Example: 'http://localhost:11434'.",
)
llm_model: str = Field(
None,
description="Model to use. Example: 'llama2-uncensored'.",
)
embedding_model: str = Field(
None,
description="Model to use. Example: 'nomic-embed-text'.",
)
keep_alive: str = Field(
"5m",
description="Time the model will stay loaded in memory after a request. examples: 5m, 5h, '-1' ",
)
tfs_z: float = Field(
1.0,
description="Tail free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting.",
)
num_predict: int = Field(
None,
description="Maximum number of tokens to predict when generating text. (Default: 128, -1 = infinite generation, -2 = fill context)",
)
top_k: int = Field(
40,
description="Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. (Default: 40)",
)
top_p: float = Field(
0.9,
description="Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9)",
)
repeat_last_n: int = Field(
64,
description="Sets how far back for the model to look back to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx)",
)
repeat_penalty: float = Field(
1.1,
description="Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1)",
)
request_timeout: float = Field(
120.0,
description="Time elapsed until ollama times out the request. Default is 120s. Format is float. ",
)
autopull_models: bool = Field(
False,
description="If set to True, Ollama will automatically pull the models from the API base.",
)
class AzureOpenAISettings(BaseModel):
api_key: str
azure_endpoint: str
api_version: str = Field(
"2023-05-15",
description="The API version to use for this operation. This follows the YYYY-MM-DD format.",
)
embedding_deployment_name: str
embedding_model: str = Field(
"text-embedding-ada-002",
description="OpenAI Model to use. Example: 'text-embedding-ada-002'.",
)
llm_deployment_name: str
llm_model: str = Field(
"gpt-35-turbo",
description="OpenAI Model to use. Example: 'gpt-4'.",
)
class UISettings(BaseModel):
enabled: bool
path: str
default_chat_system_prompt: str = Field(
None,
description="The default system prompt to use for the chat mode.",
)
default_query_system_prompt: str = Field(
None, description="The default system prompt to use for the query mode."
)
default_summarization_system_prompt: str = Field(
None,
description="The default system prompt to use for the summarization mode.",
)
delete_file_button_enabled: bool = Field(
True, description="If the button to delete a file is enabled or not."
)
delete_all_files_button_enabled: bool = Field(
False, description="If the button to delete all files is enabled or not."
)
class RerankSettings(BaseModel):
enabled: bool = Field(
False,
description="This value controls whether a reranker should be included in the RAG pipeline.",
)
model: str = Field(
"cross-encoder/ms-marco-MiniLM-L-2-v2",
description="Rerank model to use. Limited to SentenceTransformer cross-encoder models.",
)
top_n: int = Field(
2,
description="This value controls the number of documents returned by the RAG pipeline.",
)
class RagSettings(BaseModel):
similarity_top_k: int = Field(
2,
description="This value controls the number of documents returned by the RAG pipeline or considered for reranking if enabled.",
)
similarity_value: float = Field(
None,
description="If set, any documents retrieved from the RAG must meet a certain match score. Acceptable values are between 0 and 1.",
)
rerank: RerankSettings
class SummarizeSettings(BaseModel):
use_async: bool = Field(
True,
description="If set to True, the summarization will be done asynchronously.",
)
class ClickHouseSettings(BaseModel):
host: str = Field(
"localhost",
description="The server hosting the ClickHouse database",
)
port: int = Field(
8443,
description="The port on which the ClickHouse database is accessible",
)
username: str = Field(
"default",
description="The username to use to connect to the ClickHouse database",
)
password: str = Field(
"",
description="The password to use to connect to the ClickHouse database",
)
database: str = Field(
"__default__",
description="The default database to use for connections",
)
secure: bool | str = Field(
False,
description="Use https/TLS for secure connection to the server",
)
interface: str | None = Field(
None,
description="Must be either 'http' or 'https'. Determines the protocol to use for the connection",
)
settings: dict[str, Any] | None = Field(
None,
description="Specific ClickHouse server settings to be used with the session",
)
connect_timeout: int | None = Field(
None,
description="Timeout in seconds for establishing a connection",
)
send_receive_timeout: int | None = Field(
None,
description="Read timeout in seconds for http connection",
)
verify: bool | None = Field(
None,
description="Verify the server certificate in secure/https mode",
)
ca_cert: str | None = Field(
None,
description="Path to Certificate Authority root certificate (.pem format)",
)
client_cert: str | None = Field(
None,
description="Path to TLS Client certificate (.pem format)",
)
client_cert_key: str | None = Field(
None,
description="Path to the private key for the TLS Client certificate",
)
http_proxy: str | None = Field(
None,
description="HTTP proxy address",
)
https_proxy: str | None = Field(
None,
description="HTTPS proxy address",
)
server_host_name: str | None = Field(
None,
description="Server host name to be checked against the TLS certificate",
)
class PostgresSettings(BaseModel):
host: str = Field(
"localhost",
description="The server hosting the Postgres database",
)
port: int = Field(
5432,
description="The port on which the Postgres database is accessible",
)
user: str = Field(
"postgres",
description="The user to use to connect to the Postgres database",
)
password: str = Field(
"postgres",
description="The password to use to connect to the Postgres database",
)
database: str = Field(
"postgres",
description="The database to use to connect to the Postgres database",
)
schema_name: str = Field(
"public",
description="The name of the schema in the Postgres database to use",
)
class QdrantSettings(BaseModel):
@@ -242,17 +555,48 @@ class QdrantSettings(BaseModel):
)
class MilvusSettings(BaseModel):
uri: str = Field(
"local_data/private_gpt/milvus/milvus_local.db",
description="The URI of the Milvus instance. For example: 'local_data/private_gpt/milvus/milvus_local.db' for Milvus Lite.",
)
token: str = Field(
"",
description=(
"A valid access token to access the specified Milvus instance. "
"This can be used as a recommended alternative to setting user and password separately. "
),
)
collection_name: str = Field(
"make_this_parameterizable_per_api_call",
description="The name of the collection in Milvus. Default is 'make_this_parameterizable_per_api_call'.",
)
overwrite: bool = Field(
True, description="Overwrite the previous collection schema if it exists."
)
class Settings(BaseModel):
server: ServerSettings
data: DataSettings
ui: UISettings
llm: LLMSettings
embedding: EmbeddingSettings
local: LocalSettings
llamacpp: LlamaCPPSettings
huggingface: HuggingFaceSettings
sagemaker: SagemakerSettings
openai: OpenAISettings
gemini: GeminiSettings
ollama: OllamaSettings
azopenai: AzureOpenAISettings
vectorstore: VectorstoreSettings
nodestore: NodeStoreSettings
rag: RagSettings
summarize: SummarizeSettings
qdrant: QdrantSettings | None = None
postgres: PostgresSettings | None = None
clickhouse: ClickHouseSettings | None = None
milvus: MilvusSettings | None = None
"""
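
The `IngestionSettings.allow_ingest_from` field above holds a list of permitted folders, and `DataSettings` defaults it to `["*"]`. The sketch below shows one way such an allow-list could be evaluated. It is an assumption for illustration only: `is_ingest_allowed` is a hypothetical helper, not the project's actual check, and treating `"*"` as "any folder" is inferred from the default rather than confirmed by the diff.

```python
from pathlib import PurePosixPath


def is_ingest_allowed(candidate: str, allow_ingest_from: list) -> bool:
    """Hypothetical allow-list check; not the project's implementation."""
    # Assumption: a "*" entry permits ingestion from any folder.
    if "*" in allow_ingest_from:
        return True
    path = PurePosixPath(candidate)
    # Compare whole path components, so "/database" does not match "/data".
    return any(
        path.is_relative_to(PurePosixPath(folder)) for folder in allow_ingest_from
    )


wildcard_ok = is_ingest_allowed("/data/docs/a.txt", ["*"])
folder_ok = is_ingest_allowed("/data/docs/a.txt", ["/data"])
outside = is_ingest_allowed("/etc/passwd", ["/data"])
```

Pure paths are used so the sketch does not touch the filesystem; a real check would probably resolve symlinks and relative segments first.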


@@ -16,7 +16,7 @@ logger = logging.getLogger(__name__)
_settings_folder = os.environ.get("PGPT_SETTINGS_FOLDER", PROJECT_ROOT_PATH)
# if running in unittest, use the test profile
_test_profile = ["test"] if "unittest" in sys.modules else []
_test_profile = ["test"] if "tests.fixtures" in sys.modules else []
active_profiles: list[str] = unique_list(
["default"]


@@ -1,7 +1,9 @@
"""This file should be imported only and only if you want to run the UI locally."""
import itertools
"""This file should be imported if and only if you want to run the UI locally."""
import base64
import logging
import time
from collections.abc import Iterable
from enum import Enum
from pathlib import Path
from typing import Any
@@ -9,14 +11,17 @@ import gradio as gr # type: ignore
from fastapi import FastAPI
from gradio.themes.utils.colors import slate # type: ignore
from injector import inject, singleton
from llama_index.llms import ChatMessage, MessageRole
from llama_index.core.llms import ChatMessage, ChatResponse, MessageRole
from llama_index.core.types import TokenGen
from pydantic import BaseModel
from private_gpt.constants import PROJECT_ROOT_PATH
from private_gpt.di import global_injector
from private_gpt.open_ai.extensions.context_filter import ContextFilter
from private_gpt.server.chat.chat_service import ChatService, CompletionGen
from private_gpt.server.chunks.chunks_service import Chunk, ChunksService
from private_gpt.server.ingest.ingest_service import IngestService
from private_gpt.server.recipes.summarize.summarize_service import SummarizeService
from private_gpt.settings.settings import settings
from private_gpt.ui.images import logo_svg
@@ -28,7 +33,22 @@ AVATAR_BOT = THIS_DIRECTORY_RELATIVE / "avatar-bot.ico"
UI_TAB_TITLE = "My Private GPT"
SOURCES_SEPARATOR = "\n\n Sources: \n"
SOURCES_SEPARATOR = "<hr>Sources: \n"
class Modes(str, Enum):
RAG_MODE = "RAG"
SEARCH_MODE = "Search"
BASIC_CHAT_MODE = "Basic"
SUMMARIZE_MODE = "Summarize"
MODES: list[Modes] = [
Modes.RAG_MODE,
Modes.SEARCH_MODE,
Modes.BASIC_CHAT_MODE,
Modes.SUMMARIZE_MODE,
]
class Source(BaseModel):
@@ -40,8 +60,8 @@ class Source(BaseModel):
frozen = True
@staticmethod
def curate_sources(sources: list[Chunk]) -> set["Source"]:
curated_sources = set()
def curate_sources(sources: list[Chunk]) -> list["Source"]:
curated_sources = []
for chunk in sources:
doc_metadata = chunk.document.doc_metadata
@@ -50,32 +70,14 @@ class Source(BaseModel):
page_label = doc_metadata.get("page_label", "-") if doc_metadata else "-"
source = Source(file=file_name, page=page_label, text=chunk.text)
curated_sources.add(source)
curated_sources.append(source)
curated_sources = list(
dict.fromkeys(curated_sources).keys()
) # Unique sources only
return curated_sources
def yield_deltas(completion_gen: CompletionGen) -> Iterable[str]:
full_response: str = ""
stream = completion_gen.response
for delta in stream:
# if isinstance(delta, str):
full_response += str(delta)
# elif isinstance(delta, ChatResponse):
# full_response += delta.delta or ""
yield full_response
if completion_gen.sources:
full_response += SOURCES_SEPARATOR
cur_sources = Source.curate_sources(completion_gen.sources)
sources_text = "\n\n\n".join(
f"{index}. {source.file} (page {source.page})"
for index, source in enumerate(cur_sources, start=1)
)
full_response += sources_text
yield full_response
@singleton
class PrivateGptUi:
@inject
@@ -84,64 +86,116 @@ class PrivateGptUi:
ingest_service: IngestService,
chat_service: ChatService,
chunks_service: ChunksService,
summarizeService: SummarizeService,
) -> None:
self._ingest_service = ingest_service
self._chat_service = chat_service
self._chunks_service = chunks_service
self._summarize_service = summarizeService
# Cache the UI blocks
self._ui_block = None
def _chat(self, message: str, history: list[list[str]], mode: str, *_: Any) -> Any:
self._selected_filename = None
# Initialize system prompt based on default mode
self.mode = MODES[0]
self._system_prompt = self._get_default_system_prompt(self.mode)
def _chat(
self, message: str, history: list[list[str]], mode: Modes, *_: Any
) -> Any:
def yield_deltas(completion_gen: CompletionGen) -> Iterable[str]:
full_response: str = ""
stream = completion_gen.response
for delta in stream:
if isinstance(delta, str):
full_response += str(delta)
elif isinstance(delta, ChatResponse):
full_response += delta.delta or ""
yield full_response
time.sleep(0.02)
if completion_gen.sources:
full_response += SOURCES_SEPARATOR
cur_sources = Source.curate_sources(completion_gen.sources)
sources_text = "\n\n\n"
used_files = set()
for index, source in enumerate(cur_sources, start=1):
if f"{source.file}-{source.page}" not in used_files:
sources_text = (
sources_text
+ f"{index}. {source.file} (page {source.page}) \n\n"
)
used_files.add(f"{source.file}-{source.page}")
sources_text += "<hr>\n\n"
full_response += sources_text
yield full_response
def yield_tokens(token_gen: TokenGen) -> Iterable[str]:
full_response: str = ""
for token in token_gen:
full_response += str(token)
yield full_response
def build_history() -> list[ChatMessage]:
history_messages: list[ChatMessage] = list(
itertools.chain(
*[
[
ChatMessage(content=interaction[0], role=MessageRole.USER),
ChatMessage(
# Remove from history content the Sources information
content=interaction[1].split(SOURCES_SEPARATOR)[0],
role=MessageRole.ASSISTANT,
),
]
for interaction in history
]
history_messages: list[ChatMessage] = []
for interaction in history:
history_messages.append(
ChatMessage(content=interaction[0], role=MessageRole.USER)
)
)
if len(interaction) > 1 and interaction[1] is not None:
history_messages.append(
ChatMessage(
# Remove from history content the Sources information
content=interaction[1].split(SOURCES_SEPARATOR)[0],
role=MessageRole.ASSISTANT,
)
)
# max 20 messages to try to avoid context overflow
return history_messages[:20]
new_message = ChatMessage(content=message, role=MessageRole.USER)
all_messages = [*build_history(), new_message]
# If a system prompt is set, add it as a system message
if self._system_prompt:
all_messages.insert(
0,
ChatMessage(
content=self._system_prompt,
role=MessageRole.SYSTEM,
),
)
match mode:
case "Query Docs":
# Add a system message to force the behaviour of the LLM
# to answer only questions about the provided context.
all_messages.insert(
0,
ChatMessage(
content="You can only answer questions about the provided context. If you know the answer "
"but it is not based in the provided context, don't provide the answer, just state "
"the answer is not in the context provided.",
role=MessageRole.SYSTEM,
),
)
case Modes.RAG_MODE:
# Use only the selected file for the query
context_filter = None
if self._selected_filename is not None:
docs_ids = []
for ingested_document in self._ingest_service.list_ingested():
if (
ingested_document.doc_metadata["file_name"]
== self._selected_filename
):
docs_ids.append(ingested_document.doc_id)
context_filter = ContextFilter(docs_ids=docs_ids)
query_stream = self._chat_service.stream_chat(
messages=all_messages,
use_context=True,
context_filter=context_filter,
)
yield from yield_deltas(query_stream)
case "LLM Chat":
case Modes.BASIC_CHAT_MODE:
llm_stream = self._chat_service.stream_chat(
messages=all_messages,
use_context=False,
)
yield from yield_deltas(llm_stream)
case "Search in Docs":
case Modes.SEARCH_MODE:
response = self._chunks_service.retrieve_relevant(
text=message, limit=4, prev_next_chunks=0
)
@@ -154,6 +208,76 @@ class PrivateGptUi:
f"{source.text}"
for index, source in enumerate(sources, start=1)
)
case Modes.SUMMARIZE_MODE:
# Summarize the given message, optionally using selected files
context_filter = None
if self._selected_filename:
docs_ids = []
for ingested_document in self._ingest_service.list_ingested():
if (
ingested_document.doc_metadata["file_name"]
== self._selected_filename
):
docs_ids.append(ingested_document.doc_id)
context_filter = ContextFilter(docs_ids=docs_ids)
summary_stream = self._summarize_service.stream_summarize(
use_context=True,
context_filter=context_filter,
instructions=message,
)
yield from yield_tokens(summary_stream)
# On initialization and on mode change, this function sets the system prompt
# to the default prompt based on the mode (and user settings).
@staticmethod
def _get_default_system_prompt(mode: Modes) -> str:
p = ""
match mode:
# For query chat mode, obtain default system prompt from settings
case Modes.RAG_MODE:
p = settings().ui.default_query_system_prompt
# For chat mode, obtain default system prompt from settings
case Modes.BASIC_CHAT_MODE:
p = settings().ui.default_chat_system_prompt
# For summarization mode, obtain default system prompt from settings
case Modes.SUMMARIZE_MODE:
p = settings().ui.default_summarization_system_prompt
# For any other mode, clear the system prompt
case _:
p = ""
return p
@staticmethod
def _get_default_mode_explanation(mode: Modes) -> str:
match mode:
case Modes.RAG_MODE:
return "Get contextualized answers from selected files."
case Modes.SEARCH_MODE:
return "Find relevant chunks of text in selected files."
case Modes.BASIC_CHAT_MODE:
return "Chat with the LLM using its training data. Files are ignored."
case Modes.SUMMARIZE_MODE:
return "Generate a summary of the selected files. Prompt to customize the result."
case _:
return ""
def _set_system_prompt(self, system_prompt_input: str) -> None:
logger.info(f"Setting system prompt to: {system_prompt_input}")
self._system_prompt = system_prompt_input
def _set_explanatation_mode(self, explanation_mode: str) -> None:
self._explanation_mode = explanation_mode
def _set_current_mode(self, mode: Modes) -> Any:
self.mode = mode
self._set_system_prompt(self._get_default_system_prompt(mode))
self._set_explanatation_mode(self._get_default_mode_explanation(mode))
interactive = self._system_prompt is not None
return [
gr.update(placeholder=self._system_prompt, interactive=interactive),
gr.update(value=self._explanation_mode),
]
def _list_ingested_files(self) -> list[list[str]]:
files = set()
@@ -170,8 +294,71 @@ class PrivateGptUi:
def _upload_file(self, files: list[str]) -> None:
logger.debug("Loading count=%s files", len(files))
paths = [Path(file) for file in files]
# remove all existing Documents with name identical to a new file upload:
file_names = [path.name for path in paths]
doc_ids_to_delete = []
for ingested_document in self._ingest_service.list_ingested():
if (
ingested_document.doc_metadata
and ingested_document.doc_metadata["file_name"] in file_names
):
doc_ids_to_delete.append(ingested_document.doc_id)
if len(doc_ids_to_delete) > 0:
logger.info(
"Uploading file(s) which were already ingested: %s document(s) will be replaced.",
len(doc_ids_to_delete),
)
for doc_id in doc_ids_to_delete:
self._ingest_service.delete(doc_id)
self._ingest_service.bulk_ingest([(str(path.name), path) for path in paths])
def _delete_all_files(self) -> Any:
ingested_files = self._ingest_service.list_ingested()
logger.debug("Deleting count=%s files", len(ingested_files))
for ingested_document in ingested_files:
self._ingest_service.delete(ingested_document.doc_id)
return [
gr.List(self._list_ingested_files()),
gr.components.Button(interactive=False),
gr.components.Button(interactive=False),
gr.components.Textbox("All files"),
]
def _delete_selected_file(self) -> Any:
logger.debug("Deleting selected %s", self._selected_filename)
# Note: keep looping for pdf's (each page became a Document)
for ingested_document in self._ingest_service.list_ingested():
if (
ingested_document.doc_metadata
and ingested_document.doc_metadata["file_name"]
== self._selected_filename
):
self._ingest_service.delete(ingested_document.doc_id)
return [
gr.List(self._list_ingested_files()),
gr.components.Button(interactive=False),
gr.components.Button(interactive=False),
gr.components.Textbox("All files"),
]
def _deselect_selected_file(self) -> Any:
self._selected_filename = None
return [
gr.components.Button(interactive=False),
gr.components.Button(interactive=False),
gr.components.Textbox("All files"),
]
def _selected_a_file(self, select_data: gr.SelectData) -> Any:
self._selected_filename = select_data.value
return [
gr.components.Button(interactive=True),
gr.components.Button(interactive=True),
gr.components.Textbox(self._selected_filename),
]
def _build_ui_blocks(self) -> gr.Blocks:
logger.debug("Creating the UI blocks")
with gr.Blocks(
@@ -186,17 +373,34 @@ class PrivateGptUi:
"justify-content: center;"
"align-items: center;"
"}"
".logo img { height: 25% }",
".logo img { height: 25% }"
".contain { display: flex !important; flex-direction: column !important; }"
"#component-0, #component-3, #component-10, #component-8 { height: 100% !important; }"
"#chatbot { flex-grow: 1 !important; overflow: auto !important;}"
"#col { height: calc(100vh - 112px - 16px) !important; }"
"hr { margin-top: 1em; margin-bottom: 1em; border: 0; border-top: 1px solid #FFF; }"
".avatar-image { background-color: antiquewhite; border-radius: 2px; }"
".footer { text-align: center; margin-top: 20px; font-size: 14px; display: flex; align-items: center; justify-content: center; }"
".footer-zylon-link { display:flex; margin-left: 5px; text-decoration: auto; color: var(--body-text-color); }"
".footer-zylon-link:hover { color: #C7BAFF; }"
".footer-zylon-ico { height: 20px; margin-left: 5px; background-color: antiquewhite; border-radius: 2px; }",
) as blocks:
with gr.Row():
gr.HTML(f"<div class='logo'><img src={logo_svg} alt=PrivateGPT></div>")
with gr.Row():
with gr.Column(scale=3, variant="compact"):
with gr.Row(equal_height=False):
with gr.Column(scale=3):
default_mode = MODES[0]
mode = gr.Radio(
["Query Docs", "Search in Docs", "LLM Chat"],
[mode.value for mode in MODES],
label="Mode",
value="Query Docs",
value=default_mode,
)
explanation_mode = gr.Textbox(
placeholder=self._get_default_mode_explanation(default_mode),
show_label=False,
max_lines=3,
interactive=False,
)
upload_button = gr.components.UploadButton(
"Upload File(s)",
@@ -208,6 +412,7 @@ class PrivateGptUi:
self._list_ingested_files,
headers=["File name"],
label="Ingested Files",
height=235,
interactive=False,
render=False, # Rendered under the button
)
@@ -221,20 +426,143 @@ class PrivateGptUi:
outputs=ingested_dataset,
)
ingested_dataset.render()
with gr.Column(scale=7):
deselect_file_button = gr.components.Button(
"De-select selected file", size="sm", interactive=False
)
selected_text = gr.components.Textbox(
"All files", label="Selected for Query or Deletion", max_lines=1
)
delete_file_button = gr.components.Button(
"🗑️ Delete selected file",
size="sm",
visible=settings().ui.delete_file_button_enabled,
interactive=False,
)
delete_files_button = gr.components.Button(
"⚠️ Delete ALL files",
size="sm",
visible=settings().ui.delete_all_files_button_enabled,
)
deselect_file_button.click(
self._deselect_selected_file,
outputs=[
delete_file_button,
deselect_file_button,
selected_text,
],
)
ingested_dataset.select(
fn=self._selected_a_file,
outputs=[
delete_file_button,
deselect_file_button,
selected_text,
],
)
delete_file_button.click(
self._delete_selected_file,
outputs=[
ingested_dataset,
delete_file_button,
deselect_file_button,
selected_text,
],
)
delete_files_button.click(
self._delete_all_files,
outputs=[
ingested_dataset,
delete_file_button,
deselect_file_button,
selected_text,
],
)
system_prompt_input = gr.Textbox(
placeholder=self._system_prompt,
label="System Prompt",
lines=2,
interactive=True,
render=False,
)
# When the mode changes, set the default system prompt and the mode explanation
mode.change(
self._set_current_mode,
inputs=mode,
outputs=[system_prompt_input, explanation_mode],
)
# On blur, set system prompt to use in queries
system_prompt_input.blur(
self._set_system_prompt,
inputs=system_prompt_input,
)
def get_model_label() -> str | None:
"""Get model label from llm mode setting YAML.
Raises:
ValueError: If the settings are not configured.
Returns:
str | None: The corresponding model label, or None for an unknown 'llm_mode'.
"""
# Get model label from llm mode setting YAML
# Modes: llamacpp, openai, openailike, sagemaker, mock, ollama, gemini
config_settings = settings()
if config_settings is None:
raise ValueError("Settings are not configured.")
# Get llm_mode from settings
llm_mode = config_settings.llm.mode
# Mapping of 'llm_mode' to corresponding model labels
model_mapping = {
"llamacpp": config_settings.llamacpp.llm_hf_model_file,
"openai": config_settings.openai.model,
"openailike": config_settings.openai.model,
"sagemaker": config_settings.sagemaker.llm_endpoint_name,
"mock": llm_mode,
"ollama": config_settings.ollama.llm_model,
"gemini": config_settings.gemini.model,
}
if llm_mode not in model_mapping:
print(f"Invalid 'llm mode': {llm_mode}")
return None
return model_mapping[llm_mode]
with gr.Column(scale=7, elem_id="col"):
# Determine the model label based on the value of PGPT_PROFILES
model_label = get_model_label()
if model_label is not None:
label_text = (
f"LLM: {settings().llm.mode} | Model: {model_label}"
)
else:
label_text = f"LLM: {settings().llm.mode}"
_ = gr.ChatInterface(
self._chat,
chatbot=gr.Chatbot(
label=f"LLM: {settings().llm.mode}",
label=label_text,
show_copy_button=True,
elem_id="chatbot",
render=False,
avatar_images=(
None,
AVATAR_BOT,
),
),
additional_inputs=[mode, upload_button],
additional_inputs=[mode, upload_button, system_prompt_input],
)
with gr.Row():
avatar_byte = AVATAR_BOT.read_bytes()
f_base64 = f"data:image/png;base64,{base64.b64encode(avatar_byte).decode('utf-8')}"
gr.HTML(
f"<div class='footer'><a class='footer-zylon-link' href='https://zylon.ai/'>Maintained by Zylon <img class='footer-zylon-ico' src='{f_base64}' alt=Zylon></a></div>"
)
return blocks
def get_ui_blocks(self) -> gr.Blocks:
@@ -246,7 +574,7 @@ class PrivateGptUi:
blocks = self.get_ui_blocks()
blocks.queue()
logger.info("Mounting the gradio UI, at path=%s", path)
gr.mount_gradio_app(app, blocks, path=path)
gr.mount_gradio_app(app, blocks, path=path, favicon_path=AVATAR_BOT)
if __name__ == "__main__":

private_gpt/utils/eta.py Normal file

@@ -0,0 +1,122 @@
import datetime
import logging
import math
import time
from collections import deque
from typing import Any
logger = logging.getLogger(__name__)
def human_time(*args: Any, **kwargs: Any) -> str:
def timedelta_total_seconds(timedelta: datetime.timedelta) -> float:
return (
timedelta.microseconds
+ 0.0
+ (timedelta.seconds + timedelta.days * 24 * 3600) * 10**6
) / 10**6
secs = float(timedelta_total_seconds(datetime.timedelta(*args, **kwargs)))
# We want (ms) precision below 2 seconds
if secs < 2:
return f"{secs * 1000}ms"
units = [("y", 86400 * 365), ("d", 86400), ("h", 3600), ("m", 60), ("s", 1)]
parts = []
for unit, mul in units:
if secs / mul >= 1 or mul == 1:
if mul > 1:
n = int(math.floor(secs / mul))
secs -= n * mul
else:
# >2s we drop the (ms) component.
n = int(secs)
if n:
parts.append(f"{n}{unit}")
return " ".join(parts)
def eta(iterator: list[Any]) -> Any:
"""Report an ETA after 30s and every 60s thereafter."""
total = len(iterator)
_eta = ETA(total)
_eta.needReport(30)
for processed, data in enumerate(iterator, start=1):
yield data
_eta.update(processed)
if _eta.needReport(60):
logger.info(f"{processed}/{total} - ETA {_eta.human_time()}")
class ETA:
"""Predict how long something will take to complete."""
def __init__(self, total: int):
self.total: int = total # Total expected records.
self.rate: float = 0.0 # per second
self._timing_data: deque[tuple[float, int]] = deque(maxlen=100)
self.secondsLeft: float = 0.0
self.nexttime: float = 0.0
def human_time(self) -> str:
if self._calc():
return f"{human_time(seconds=self.secondsLeft)} @ {int(self.rate * 60)}/min"
return "(computing)"
def update(self, count: int) -> None:
# count should be in the range 1 to self.total
assert count > 0
assert count <= self.total
self._timing_data.append((time.time(), count)) # (X,Y) for pearson
def needReport(self, whenSecs: int) -> bool:
now = time.time()
if now > self.nexttime:
self.nexttime = now + whenSecs
return True
return False
def _calc(self) -> bool:
# Require at least three samples before making a prediction.
if len(self._timing_data) < 3:
return False
# http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient
# Calculate means and standard deviations.
samples = len(self._timing_data)
# column wise sum of the timing tuples to compute their mean.
mean_x, mean_y = (
sum(i) / samples for i in zip(*self._timing_data, strict=False)
)
std_x = math.sqrt(
sum(pow(i[0] - mean_x, 2) for i in self._timing_data) / (samples - 1)
)
std_y = math.sqrt(
sum(pow(i[1] - mean_y, 2) for i in self._timing_data) / (samples - 1)
)
# Calculate coefficient.
sum_xy, sum_sq_v_x, sum_sq_v_y = 0.0, 0.0, 0
for x, y in self._timing_data:
x -= mean_x
y -= mean_y
sum_xy += x * y
sum_sq_v_x += pow(x, 2)
sum_sq_v_y += pow(y, 2)
pearson_r = sum_xy / math.sqrt(sum_sq_v_x * sum_sq_v_y)
# Calculate regression line.
# y = mx + b where m is the slope and b is the y-intercept.
m = self.rate = pearson_r * (std_y / std_x)
y = self.total
b = mean_y - m * mean_x
x = (y - b) / m
# Calculate fitted line (transformed/shifted regression line horizontally).
fitted_b = self._timing_data[-1][1] - (m * self._timing_data[-1][0])
fitted_x = (y - fitted_b) / m
_, count = self._timing_data[-1] # adjust last data point progress count
adjusted_x = ((fitted_x - x) * (count / self.total)) + x
eta_epoch = adjusted_x
self.secondsLeft = max([eta_epoch - time.time(), 0])
return True
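The slope that `ETA._calc` derives as `pearson_r * (std_y / std_x)` is algebraically the ordinary least-squares slope of progress count against time. The following is a minimal sketch of that relationship, not the module's implementation: the function names are hypothetical, and it leaves out the final adjustment that blends in a line through the most recent sample.

```python
import math


def _means(points: list[tuple[float, float]]) -> tuple[float, float]:
    """Column-wise means of (time, count) samples."""
    n = len(points)
    return sum(x for x, _ in points) / n, sum(y for _, y in points) / n


def least_squares_slope(points: list[tuple[float, float]]) -> float:
    """Slope of the least-squares line: covariance over variance of x."""
    mean_x, mean_y = _means(points)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    return cov_xy / var_x


def pearson_slope(points: list[tuple[float, float]]) -> float:
    """The same slope written as r * (std_y / std_x), the form used in _calc."""
    n = len(points)
    mean_x, mean_y = _means(points)
    ss_x = sum((x - mean_x) ** 2 for x, _ in points)
    ss_y = sum((y - mean_y) ** 2 for _, y in points)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    r = cov_xy / math.sqrt(ss_x * ss_y)
    std_x = math.sqrt(ss_x / (n - 1))
    std_y = math.sqrt(ss_y / (n - 1))
    return r * (std_y / std_x)


def seconds_until_done(
    points: list[tuple[float, float]], total: int, now: float
) -> float:
    """Extrapolate the fitted line to count == total and return the time left."""
    m = least_squares_slope(points)
    mean_x, mean_y = _means(points)
    b = mean_y - m * mean_x  # y-intercept of the regression line
    finish_time = (total - b) / m
    return max(finish_time - now, 0.0)
```

Both slope functions agree up to floating-point error, which is why the rate reported by `human_time()` can be read directly as items per second.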


@@ -0,0 +1,80 @@
import logging
from collections import deque
from collections.abc import Iterator, Mapping
from typing import Any
from tqdm import tqdm # type: ignore
try:
from ollama import Client # type: ignore
except ImportError as e:
raise ImportError(
"Ollama dependencies not found, install with `poetry install --extras llms-ollama` or `poetry install --extras embeddings-ollama`"
) from e
logger = logging.getLogger(__name__)
def check_connection(client: Client) -> bool:
try:
client.list()
return True
except Exception as e:
logger.error(f"Failed to connect to Ollama: {e!s}")
return False
def process_streaming(generator: Iterator[Mapping[str, Any]]) -> None:
progress_bars = {}
queue = deque() # type: ignore
def create_progress_bar(dgt: str, total: int) -> Any:
return tqdm(
total=total, desc=f"Pulling model {dgt[7:17]}...", unit="B", unit_scale=True
)
current_digest = None
for chunk in generator:
digest = chunk.get("digest")
completed_size = chunk.get("completed", 0)
total_size = chunk.get("total")
if digest and total_size is not None:
if digest not in progress_bars and completed_size > 0:
progress_bars[digest] = create_progress_bar(digest, total=total_size)
if current_digest is None:
current_digest = digest
else:
queue.append(digest)
if digest in progress_bars:
progress_bar = progress_bars[digest]
progress = completed_size - progress_bar.n
if completed_size > 0 and total_size >= progress != progress_bar.n:
if digest == current_digest:
progress_bar.update(progress)
if progress_bar.n >= total_size:
progress_bar.close()
current_digest = queue.popleft() if queue else None
else:
# Store progress for later update
progress_bars[digest].total = total_size
progress_bars[digest].n = completed_size
# Close any remaining progress bars at the end
for progress_bar in progress_bars.values():
progress_bar.close()
def pull_model(client: Client, model_name: str, raise_error: bool = True) -> None:
try:
installed_models = [model["name"] for model in client.list().get("models", {})]
if model_name not in installed_models:
logger.info(f"Pulling model {model_name}. Please wait...")
process_streaming(client.pull(model_name, stream=True))
logger.info(f"Model {model_name} pulled successfully")
except Exception as e:
logger.error(f"Failed to pull model {model_name}: {e!s}")
if raise_error:
raise e
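The bookkeeping inside `process_streaming` can be separated from the progress-bar rendering. The sketch below is an illustration under assumptions: it only relies on the `digest`, `completed` and `total` keys read above, the function name is hypothetical, and the sample chunks in the usage note are made up rather than captured from a real Ollama pull.

```python
from collections.abc import Iterable, Mapping
from typing import Any


def track_pull_progress(
    chunks: Iterable[Mapping[str, Any]],
) -> dict[str, tuple[int, int]]:
    """Return {digest: (completed_bytes, total_bytes)} after consuming all chunks."""
    progress: dict[str, tuple[int, int]] = {}
    for chunk in chunks:
        digest = chunk.get("digest")
        total = chunk.get("total")
        if not digest or total is None:
            continue  # status-only chunk, nothing to track
        completed = chunk.get("completed", 0)
        previous, _ = progress.get(digest, (0, total))
        # Keep the highest value seen so the counter never moves backwards.
        progress[digest] = (max(previous, completed), total)
    return progress
```

Feeding it a list such as `[{"status": "pulling manifest"}, {"digest": "sha256:aaa", "total": 100, "completed": 40}]` yields one entry per layer digest, which is the state a progress bar per digest would display.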


@@ -1,21 +1,95 @@
[tool.poetry]
name = "private-gpt"
version = "0.1.0"
version = "0.5.0"
description = "Private GPT"
authors = ["Zylon <hi@zylon.ai>"]
[tool.poetry.dependencies]
python = ">=3.11,<3.12"
fastapi = { extras = ["all"], version = "^0.103.1" }
boto3 = "^1.28.56"
# PrivateGPT
fastapi = { extras = ["all"], version = "^0.111.0" }
python-multipart = "^0.0.9"
injector = "^0.21.0"
pyyaml = "^6.0.1"
python-multipart = "^0.0.6"
pypdf = "^3.16.2"
llama-index = { extras = ["local_models"], version = "0.9.10" }
watchdog = "^3.0.0"
qdrant-client = "^1.6.9"
chromadb = {version = "^0.4.13", optional = true}
watchdog = "^4.0.1"
transformers = "^4.42.3"
docx2txt = "^0.8"
cryptography = "^3.1"
# LlamaIndex core libs
llama-index-core = "^0.10.52"
llama-index-readers-file = "^0.1.27"
# Optional LlamaIndex integration libs
llama-index-llms-llama-cpp = {version = "^0.1.4", optional = true}
llama-index-llms-openai = {version = "^0.1.25", optional = true}
llama-index-llms-openai-like = {version ="^0.1.3", optional = true}
llama-index-llms-ollama = {version ="^0.2.2", optional = true}
llama-index-llms-azure-openai = {version ="^0.1.8", optional = true}
llama-index-llms-gemini = {version ="^0.1.11", optional = true}
llama-index-embeddings-ollama = {version ="^0.1.2", optional = true}
llama-index-embeddings-huggingface = {version ="^0.2.2", optional = true}
llama-index-embeddings-openai = {version ="^0.1.10", optional = true}
llama-index-embeddings-azure-openai = {version ="^0.1.10", optional = true}
llama-index-embeddings-gemini = {version ="^0.1.8", optional = true}
llama-index-vector-stores-qdrant = {version ="^0.2.10", optional = true}
llama-index-vector-stores-milvus = {version ="^0.1.20", optional = true}
llama-index-vector-stores-chroma = {version ="^0.1.10", optional = true}
llama-index-vector-stores-postgres = {version ="^0.1.11", optional = true}
llama-index-vector-stores-clickhouse = {version ="^0.1.3", optional = true}
llama-index-storage-docstore-postgres = {version ="^0.1.3", optional = true}
llama-index-storage-index-store-postgres = {version ="^0.1.4", optional = true}
# Postgres
psycopg2-binary = {version ="^2.9.9", optional = true}
asyncpg = {version="^0.29.0", optional = true}
# ClickHouse
clickhouse-connect = {version = "^0.7.15", optional = true}
# Optional Sagemaker dependency
boto3 = {version ="^1.34.139", optional = true}
# Optional Qdrant client
qdrant-client = {version ="^1.9.0", optional = true}
# Optional Reranker dependencies
torch = {version ="^2.3.1", optional = true}
sentence-transformers = {version ="^3.0.1", optional = true}
# Optional UI
gradio = {version ="^4.37.2", optional = true}
# Fix: https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/16289#issuecomment-2255106490
ffmpy = {git = "https://github.com/EuDs63/ffmpy.git", rev = "333a19ee4d21f32537c0508aa1942ef1aa7afe24", optional = true}
# Optional Google Gemini dependency
google-generativeai = {version ="^0.5.4", optional = true}
# Optional Ollama client
ollama = {version ="^0.3.0", optional = true}
# Optional HF Transformers
einops = {version = "^0.8.0", optional = true}
[tool.poetry.extras]
ui = ["gradio", "ffmpy"]
llms-llama-cpp = ["llama-index-llms-llama-cpp"]
llms-openai = ["llama-index-llms-openai"]
llms-openai-like = ["llama-index-llms-openai-like"]
llms-ollama = ["llama-index-llms-ollama", "ollama"]
llms-sagemaker = ["boto3"]
llms-azopenai = ["llama-index-llms-azure-openai"]
llms-gemini = ["llama-index-llms-gemini", "google-generativeai"]
embeddings-ollama = ["llama-index-embeddings-ollama", "ollama"]
embeddings-huggingface = ["llama-index-embeddings-huggingface", "einops"]
embeddings-openai = ["llama-index-embeddings-openai"]
embeddings-sagemaker = ["boto3"]
embeddings-azopenai = ["llama-index-embeddings-azure-openai"]
embeddings-gemini = ["llama-index-embeddings-gemini"]
vector-stores-qdrant = ["llama-index-vector-stores-qdrant"]
vector-stores-clickhouse = ["llama-index-vector-stores-clickhouse", "clickhouse_connect"]
vector-stores-chroma = ["llama-index-vector-stores-chroma"]
vector-stores-postgres = ["llama-index-vector-stores-postgres"]
vector-stores-milvus = ["llama-index-vector-stores-milvus"]
storage-nodestore-postgres = ["llama-index-storage-docstore-postgres","llama-index-storage-index-store-postgres","psycopg2-binary","asyncpg"]
rerank-sentence-transformers = ["torch", "sentence-transformers"]
[tool.poetry.group.dev.dependencies]
black = "^22"
@@ -27,26 +101,6 @@ ruff = "^0"
pytest-asyncio = "^0.21.1"
types-pyyaml = "^6.0.12.12"
# Dependencies for gradio UI
[tool.poetry.group.ui]
optional = true
[tool.poetry.group.ui.dependencies]
gradio = "^4.7.1"
[tool.poetry.group.local]
optional = true
[tool.poetry.group.local.dependencies]
llama-cpp-python = "^0.2.20"
jinja2 = "^3.1.2"
# numpy = "1.26.0"
sentence-transformers = "^2.2.2"
# https://stackoverflow.com/questions/76327419/valueerror-libcublas-so-0-9-not-found-in-the-system-path
torch = ">=2.0.0, !=2.0.1, !=2.1.0"
transformers = "^4.35.2"
[tool.poetry.extras]
chroma = ["chromadb"]
[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
@@ -73,7 +127,7 @@ target-version = ['py311']
target-version = 'py311'
# See all rules at https://beta.ruff.rs/docs/rules/
select = [
lint.select = [
"E", # pycodestyle
"W", # pycodestyle
"F", # Pyflakes
@@ -90,7 +144,7 @@ select = [
"RUF", # Ruff-specific rules
]
ignore = [
lint.ignore = [
"E501", # "Line too long"
# -> line length already regulated by black
"PT011", # "pytest.raises() should specify expected exception"
@@ -108,24 +162,24 @@ ignore = [
# -> "Missing docstring in public function too restrictive"
]
[tool.ruff.pydocstyle]
[tool.ruff.lint.pydocstyle]
# Automatically disable rules that are incompatible with Google docstring convention
convention = "google"
[tool.ruff.pycodestyle]
[tool.ruff.lint.pycodestyle]
max-doc-length = 88
[tool.ruff.flake8-tidy-imports]
[tool.ruff.lint.flake8-tidy-imports]
ban-relative-imports = "all"
[tool.ruff.flake8-type-checking]
[tool.ruff.lint.flake8-type-checking]
strict = true
runtime-evaluated-base-classes = ["pydantic.BaseModel"]
# Pydantic needs to be able to evaluate types at runtime
# see https://pypi.org/project/flake8-type-checking/ for flake8-type-checking documentation
# see https://beta.ruff.rs/docs/settings/#flake8-type-checking-runtime-evaluated-base-classes for ruff documentation
[tool.ruff.per-file-ignores]
[tool.ruff.lint.per-file-ignores]
# Allow missing docstrings for tests
"tests/**/*.py" = ["D1"]
@@ -139,6 +193,9 @@ explicit_package_bases = true
warn_unused_ignores = false
exclude = ["tests"]
[tool.mypy-llama-index]
ignore_missing_imports = true
[tool.pytest.ini_options]
asyncio_mode = "auto"
testpaths = ["tests"]


@@ -1,6 +1,7 @@
import argparse
import json
import sys
import yaml
from uvicorn.importer import import_from_string


@@ -7,33 +7,54 @@ from pathlib import Path
from private_gpt.di import global_injector
from private_gpt.server.ingest.ingest_service import IngestService
from private_gpt.server.ingest.ingest_watcher import IngestWatcher
from private_gpt.settings.settings import Settings
logger = logging.getLogger(__name__)
class LocalIngestWorker:
def __init__(self, ingest_service: IngestService) -> None:
def __init__(self, ingest_service: IngestService, setting: Settings) -> None:
self.ingest_service = ingest_service
self.total_documents = 0
self.current_document_count = 0
self._files_under_root_folder: list[Path] = list()
self._files_under_root_folder: list[Path] = []
def _find_all_files_in_folder(self, root_path: Path) -> None:
self.is_local_ingestion_enabled = setting.data.local_ingestion.enabled
self.allowed_local_folders = setting.data.local_ingestion.allow_ingest_from
def _validate_folder(self, folder_path: Path) -> None:
if not self.is_local_ingestion_enabled:
raise ValueError(
"Local ingestion is disabled. "
"You can enable it in settings `data.local_ingestion.enabled`"
)
# Allow all folders if wildcard is present
if "*" in self.allowed_local_folders:
return
# Accept the path if it sits under at least one allowed folder
if not any(
folder_path.is_relative_to(allowed_folder)
for allowed_folder in self.allowed_local_folders
):
raise ValueError(f"Folder {folder_path} is not allowed for ingestion")
def _find_all_files_in_folder(self, root_path: Path, ignored: list[str]) -> None:
"""Search all files under the root folder recursively.
Count them at the same time
"""
for file_path in root_path.iterdir():
if file_path.is_file():
if file_path.is_file() and file_path.name not in ignored:
self.total_documents += 1
self._validate_folder(file_path)
self._files_under_root_folder.append(file_path)
elif file_path.is_dir():
self._find_all_files_in_folder(file_path)
elif file_path.is_dir() and file_path.name not in ignored:
self._find_all_files_in_folder(file_path, ignored)
def ingest_folder(self, folder_path: Path) -> None:
def ingest_folder(self, folder_path: Path, ignored: list[str]) -> None:
# Count total documents before ingestion
self._find_all_files_in_folder(folder_path)
self._find_all_files_in_folder(folder_path, ignored)
self._ingest_all(self._files_under_root_folder)
def _ingest_all(self, files_to_ingest: list[Path]) -> None:
@@ -48,7 +69,7 @@ class LocalIngestWorker:
try:
if changed_path.exists():
logger.info(f"Started ingesting file={changed_path}")
self.ingest_service.ingest(changed_path.name, changed_path)
self.ingest_service.ingest_file(changed_path.name, changed_path)
logger.info(f"Completed ingesting file={changed_path}")
except Exception:
logger.exception(
@@ -64,12 +85,19 @@ parser.add_argument(
action=argparse.BooleanOptionalAction,
default=False,
)
parser.add_argument(
"--ignored",
nargs="*",
help="List of files/directories to ignore",
default=[],
)
parser.add_argument(
"--log-file",
help="Optional path to a log file. If provided, logs will be written to this file.",
type=str,
default=None,
)
args = parser.parse_args()
# Set up logging to a file if a path is provided
@@ -84,16 +112,24 @@ if args.log_file:
logger.addHandler(file_handler)
if __name__ == "__main__":
root_path = Path(args.folder)
if not root_path.exists():
raise ValueError(f"Path {args.folder} does not exist")
ingest_service = global_injector.get(IngestService)
worker = LocalIngestWorker(ingest_service)
worker.ingest_folder(root_path)
settings = global_injector.get(Settings)
worker = LocalIngestWorker(ingest_service, settings)
worker.ingest_folder(root_path, args.ignored)
if args.ignored:
logger.info(f"Skipping the following files and directories: {args.ignored}")
if args.watch:
logger.info(f"Watching {args.folder} for changes, press Ctrl+C to stop...")
directories_to_watch = [
dir
for dir in root_path.iterdir()
if dir.is_dir() and dir.name not in args.ignored
]
watcher = IngestWatcher(args.folder, worker.ingest_on_watch)
watcher.start()
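The allow-list rule used by `_validate_folder` can be written as a small predicate: a path passes when the `*` wildcard is configured or when it sits under at least one allowed folder. This is a minimal sketch with a hypothetical function name, and it uses `PurePath` so it can be exercised without touching the filesystem.

```python
from pathlib import PurePath, PurePosixPath


def is_folder_allowed(folder: PurePath, allowed: list[str]) -> bool:
    """Return True if `folder` may be ingested under the given allow-list."""
    if "*" in allowed:
        return True  # wildcard: every folder is allowed
    # One matching root is enough; the other entries do not have to match.
    return any(folder.is_relative_to(root) for root in allowed)


if __name__ == "__main__":
    path = PurePosixPath("/data/docs/report.pdf")
    print(is_folder_allowed(path, ["/tmp", "/data"]))
    print(is_folder_allowed(path, ["/tmp"]))
```

Checking with `any` matters when several folders are configured, because a path under one allowed root is by definition not relative to the others.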


@@ -3,37 +3,51 @@ import os
import argparse
from huggingface_hub import hf_hub_download, snapshot_download
from transformers import AutoTokenizer
from private_gpt.paths import models_path, models_cache_path
from private_gpt.settings.settings import settings
resume_download = True
if __name__ == '__main__':
parser = argparse.ArgumentParser(prog='Setup: Download models from huggingface')
parser = argparse.ArgumentParser(prog='Setup: Download models from Hugging Face')
parser.add_argument('--resume', default=True, action=argparse.BooleanOptionalAction, help='Enable/disable resume_download to continue an interrupted download')
args = parser.parse_args()
resume_download = args.resume
os.makedirs(models_path, exist_ok=True)
embedding_path = models_path / "embedding"
print(f"Downloading embedding {settings().local.embedding_hf_model_name}")
# Download Embedding model
embedding_path = models_path / "embedding"
print(f"Downloading embedding {settings().huggingface.embedding_hf_model_name}")
snapshot_download(
repo_id=settings().local.embedding_hf_model_name,
repo_id=settings().huggingface.embedding_hf_model_name,
cache_dir=models_cache_path,
local_dir=embedding_path,
token=settings().huggingface.access_token,
)
print("Embedding model downloaded!")
print("Downloading models for local execution...")
# Download LLM and create a symlink to the model file
print(f"Downloading LLM {settings().llamacpp.llm_hf_model_file}")
hf_hub_download(
repo_id=settings().local.llm_hf_repo_id,
filename=settings().local.llm_hf_model_file,
repo_id=settings().llamacpp.llm_hf_repo_id,
filename=settings().llamacpp.llm_hf_model_file,
cache_dir=models_cache_path,
local_dir=models_path,
resume_download=resume_download,
token=settings().huggingface.access_token,
)
print("LLM model downloaded!")
# Download Tokenizer
if settings().llm.tokenizer:
print(f"Downloading tokenizer {settings().llm.tokenizer}")
AutoTokenizer.from_pretrained(
pretrained_model_name_or_path=settings().llm.tokenizer,
cache_dir=models_cache_path,
token=settings().huggingface.access_token,
)
print("Tokenizer downloaded!")
print("Setup done")


@@ -1,10 +1,22 @@
import argparse
import os
import shutil
from typing import Any, ClassVar
from private_gpt.paths import local_data_path
from private_gpt.settings.settings import settings
def wipe():
path = "local_data"
def wipe_file(file: str) -> None:
if os.path.isfile(file):
os.remove(file)
print(f" - Deleted {file}")
def wipe_tree(path: str) -> None:
if not os.path.exists(path):
print(f"Warning: Path not found {path}")
return
print(f"Wiping {path}...")
all_files = os.listdir(path)
@@ -24,14 +36,149 @@ def wipe():
continue
if __name__ == "__main__":
commands = {
"wipe": wipe,
class Postgres:
tables: ClassVar[dict[str, list[str]]] = {
"nodestore": ["data_docstore", "data_indexstore"],
"vectorstore": ["data_embeddings"],
}
parser = argparse.ArgumentParser()
parser.add_argument(
"mode", help="select a mode to run", choices=list(commands.keys())
def __init__(self) -> None:
try:
import psycopg2
except ModuleNotFoundError:
raise ModuleNotFoundError("Postgres dependencies not found") from None
connection = settings().postgres.model_dump(exclude_none=True)
self.schema = connection.pop("schema_name")
self.conn = psycopg2.connect(**connection)
def wipe(self, storetype: str) -> None:
cur = self.conn.cursor()
try:
for table in self.tables[storetype]:
sql = f"DROP TABLE IF EXISTS {self.schema}.{table}"
cur.execute(sql)
print(f"Table {self.schema}.{table} dropped.")
self.conn.commit()
finally:
cur.close()
def stats(self, store_type: str) -> None:
template = "SELECT '{table}', COUNT(*), pg_size_pretty(pg_total_relation_size('{table}')) FROM {table}"
sql = " UNION ALL ".join(
template.format(table=tbl) for tbl in self.tables[store_type]
)
cur = self.conn.cursor()
try:
print(f"Storage for Postgres {store_type}.")
print("{:<15} | {:>15} | {:>9}".format("Table", "Rows", "Size"))
print("-" * 45) # Print a line separator
cur.execute(sql)
for row in cur.fetchall():
formatted_row_count = f"{row[1]:,}"
print(f"{row[0]:<15} | {formatted_row_count:>15} | {row[2]:>9}")
print()
finally:
cur.close()
def __del__(self):
if hasattr(self, "conn") and self.conn:
self.conn.close()
class Simple:
def wipe(self, store_type: str) -> None:
assert store_type == "nodestore"
from llama_index.core.storage.docstore.types import (
DEFAULT_PERSIST_FNAME as DOCSTORE,
)
from llama_index.core.storage.index_store.types import (
DEFAULT_PERSIST_FNAME as INDEXSTORE,
)
for store in (DOCSTORE, INDEXSTORE):
wipe_file(str((local_data_path / store).absolute()))
class Chroma:
def wipe(self, store_type: str) -> None:
assert store_type == "vectorstore"
wipe_tree(str((local_data_path / "chroma_db").absolute()))
class Qdrant:
COLLECTION = (
"make_this_parameterizable_per_api_call" # ?! see vector_store_component.py
)
def __init__(self) -> None:
try:
from qdrant_client import QdrantClient # type: ignore
except ImportError:
raise ImportError("Qdrant dependencies not found") from None
self.client = QdrantClient(**settings().qdrant.model_dump(exclude_none=True))
def wipe(self, store_type: str) -> None:
assert store_type == "vectorstore"
try:
self.client.delete_collection(self.COLLECTION)
print("Collection dropped successfully.")
except Exception as e:
print("Error dropping collection:", e)
def stats(self, store_type: str) -> None:
print(f"Storage for Qdrant {store_type}.")
try:
collection_data = self.client.get_collection(self.COLLECTION)
if collection_data:
# Collection Info
# https://qdrant.tech/documentation/concepts/collections/
print(f"\tPoints: {collection_data.points_count:,}")
print(f"\tVectors: {collection_data.vectors_count:,}")
print(f"\tIndex Vectors: {collection_data.indexed_vectors_count:,}")
return
except ValueError:
pass
print("\t- Qdrant collection not found or empty")
class Command:
DB_HANDLERS: ClassVar[dict[str, Any]] = {
"simple": Simple, # node store
"chroma": Chroma, # vector store
"postgres": Postgres, # node, index and vector store
"qdrant": Qdrant, # vector store
}
def for_each_store(self, cmd: str):
for store_type in ("nodestore", "vectorstore"):
database = getattr(settings(), store_type).database
handler_class = self.DB_HANDLERS.get(database)
if handler_class is None:
print(f"No handler found for database '{database}'")
continue
handler_instance = handler_class() # Instantiate the class
# If the DB can handle this cmd dispatch it.
if hasattr(handler_instance, cmd) and callable(
func := getattr(handler_instance, cmd)
):
func(store_type)
else:
print(
f"Unable to execute command '{cmd}' on '{store_type}' in database '{database}'"
)
def execute(self, cmd: str) -> None:
if cmd in ("wipe", "stats"):
self.for_each_store(cmd)
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("mode", help="select a mode to run", choices=["wipe", "stats"])
args = parser.parse_args()
commands[args.mode.lower()]()
Command().execute(args.mode.lower())
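The `Command.for_each_store` dispatch boils down to a registry lookup followed by a capability check on the handler. The sketch below isolates that pattern. The handler classes here are hypothetical stand-ins that only return strings, not the real `Simple` or `Qdrant` handlers, and `dispatch` is an illustrative name.

```python
from typing import Any, Optional


class FakeSimple:
    """Stand-in handler that supports only `wipe`."""

    def wipe(self, store_type: str) -> str:
        return f"wiped {store_type}"


class FakeQdrant:
    """Stand-in handler that supports both `wipe` and `stats`."""

    def wipe(self, store_type: str) -> str:
        return f"wiped {store_type}"

    def stats(self, store_type: str) -> str:
        return f"stats for {store_type}"


HANDLERS: dict[str, Any] = {"simple": FakeSimple, "qdrant": FakeQdrant}


def dispatch(database: str, cmd: str, store_type: str) -> Optional[str]:
    """Run `cmd` on the handler for `database`, or return None if unsupported."""
    handler_class = HANDLERS.get(database)
    if handler_class is None:
        return None  # no handler registered for this database
    func = getattr(handler_class(), cmd, None)
    if not callable(func):
        return None  # this backend does not implement the command
    return func(store_type)
```

With this shape, adding a command to one backend only requires a new method on its handler; backends without that method are skipped rather than failing.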

settings-azopenai.yaml Normal file

@@ -0,0 +1,17 @@
server:
env_name: ${APP_ENV:azopenai}
llm:
mode: azopenai
embedding:
mode: azopenai
azopenai:
api_key: ${AZ_OPENAI_API_KEY:}
azure_endpoint: ${AZ_OPENAI_ENDPOINT:}
embedding_deployment_name: ${AZ_OPENAI_EMBEDDING_DEPLOYMENT_NAME:}
llm_deployment_name: ${AZ_OPENAI_LLM_DEPLOYMENT_NAME:}
api_version: "2023-05-15"
embedding_model: text-embedding-ada-002
llm_model: gpt-35-turbo


@@ -5,15 +5,33 @@ server:
llm:
mode: ${PGPT_MODE:mock}
local:
llm_hf_repo_id: ${PGPT_HF_REPO_ID:TheBloke/Mistral-7B-Instruct-v0.1-GGUF}
llm_hf_model_file: ${PGPT_HF_MODEL_FILE:mistral-7b-instruct-v0.1.Q4_K_M.gguf}
embedding_hf_model_name: ${PGPT_EMBEDDING_HF_MODEL_NAME:BAAI/bge-small-en-v1.5}
embedding:
mode: ${PGPT_EMBED_MODE:mock}
llamacpp:
llm_hf_repo_id: ${PGPT_HF_REPO_ID:lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF}
llm_hf_model_file: ${PGPT_HF_MODEL_FILE:Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf}
huggingface:
embedding_hf_model_name: ${PGPT_EMBEDDING_HF_MODEL_NAME:nomic-ai/nomic-embed-text-v1.5}
sagemaker:
llm_endpoint_name: ${PGPT_SAGEMAKER_LLM_ENDPOINT_NAME:}
embedding_endpoint_name: ${PGPT_SAGEMAKER_EMBEDDING_ENDPOINT_NAME:}
ollama:
llm_model: ${PGPT_OLLAMA_LLM_MODEL:llama3.1}
embedding_model: ${PGPT_OLLAMA_EMBEDDING_MODEL:nomic-embed-text}
api_base: ${PGPT_OLLAMA_API_BASE:http://ollama:11434}
embedding_api_base: ${PGPT_OLLAMA_EMBEDDING_API_BASE:http://ollama:11434}
tfs_z: ${PGPT_OLLAMA_TFS_Z:1.0}
top_k: ${PGPT_OLLAMA_TOP_K:40}
top_p: ${PGPT_OLLAMA_TOP_P:0.9}
repeat_last_n: ${PGPT_OLLAMA_REPEAT_LAST_N:64}
repeat_penalty: ${PGPT_OLLAMA_REPEAT_PENALTY:1.2}
request_timeout: ${PGPT_OLLAMA_REQUEST_TIMEOUT:600.0}
autopull_models: ${PGPT_OLLAMA_AUTOPULL_MODELS:true}
ui:
enabled: true
path: /
path: /
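The `${NAME:default}` placeholders in these settings files read as "use the environment variable `NAME`, otherwise fall back to `default`". The following is a minimal sketch of that substitution, assuming exactly those semantics. It is not PrivateGPT's settings loader, and `expand_env` is a hypothetical helper.

```python
import os
import re
from collections.abc import Mapping
from typing import Optional

# ${NAME} or ${NAME:default}; the default may itself contain colons.
_PLACEHOLDER = re.compile(r"\$\{(\w+)(?::([^}]*))?\}")


def expand_env(value: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Replace ${NAME:default} placeholders using `environ` (os.environ if None)."""
    env = os.environ if environ is None else environ

    def replace(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is None:
            raise ValueError(f"Environment variable {name} is not set")
        return default

    return _PLACEHOLDER.sub(replace, value)
```

Under this reading, `${PGPT_MODE:mock}` resolves to `mock` unless `PGPT_MODE` is exported, and an empty default such as `${GOOGLE_API_KEY:}` resolves to an empty string.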

settings-gemini.yaml Normal file

@@ -0,0 +1,10 @@
llm:
mode: gemini
embedding:
mode: gemini
gemini:
api_key: ${GOOGLE_API_KEY:}
model: models/gemini-pro
embedding_model: models/embedding-001


@@ -1,5 +1,27 @@
# poetry install --extras "ui llms-llama-cpp vector-stores-qdrant embeddings-huggingface"
server:
env_name: ${APP_ENV:local}
llm:
mode: local
mode: llamacpp
# Should match the selected model
max_new_tokens: 512
context_window: 3900
tokenizer: meta-llama/Meta-Llama-3.1-8B-Instruct
prompt_style: "llama3"
llamacpp:
llm_hf_repo_id: lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF
llm_hf_model_file: Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
embedding:
mode: huggingface
huggingface:
embedding_hf_model_name: nomic-ai/nomic-embed-text-v1.5
vectorstore:
database: qdrant
qdrant:
path: local_data/private_gpt/qdrant


@@ -4,5 +4,6 @@ server:
# This configuration allows you to use GPU for creating embeddings while avoiding loading LLM into vRAM
llm:
mode: mock
embedding:
mode: local
mode: huggingface

settings-ollama-pg.yaml Normal file

@@ -0,0 +1,34 @@
# Using ollama and postgres for the vector, doc and index store. Ollama is also used for embeddings.
# To use install these extras:
# poetry install --extras "llms-ollama ui vector-stores-postgres embeddings-ollama storage-nodestore-postgres"
server:
env_name: ${APP_ENV:ollama}
llm:
mode: ollama
max_new_tokens: 512
context_window: 3900
embedding:
mode: ollama
embed_dim: 768
ollama:
llm_model: llama3.1
embedding_model: nomic-embed-text
api_base: http://localhost:11434
nodestore:
database: postgres
vectorstore:
database: postgres
postgres:
host: localhost
port: 5432
database: postgres
user: postgres
password: admin
schema_name: private_gpt

settings-ollama.yaml Normal file

@@ -0,0 +1,30 @@
server:
env_name: ${APP_ENV:ollama}
llm:
mode: ollama
max_new_tokens: 512
context_window: 3900
temperature: 0.1 # The temperature of the model. Increasing the temperature makes the model answer more creatively; a value of 0.1 is more factual. (Default: 0.1)
embedding:
mode: ollama
ollama:
llm_model: llama3.1
embedding_model: nomic-embed-text
api_base: http://localhost:11434
embedding_api_base: http://localhost:11434 # change if your embedding model runs on another ollama
keep_alive: 5m
tfs_z: 1.0 # Tail free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting.
top_k: 40 # Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. (Default: 40)
top_p: 0.9 # Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9)
repeat_last_n: 64 # Sets how far back for the model to look back to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx)
repeat_penalty: 1.2 # Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1)
request_timeout: 120.0 # Time elapsed until ollama times out the request. Default is 120s. Format is float.
vectorstore:
database: qdrant
qdrant:
path: local_data/private_gpt/qdrant
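
The comments on `top_k` and `top_p` above describe two filters applied to the next-token distribution. The following is a toy sketch of that idea over a plain dictionary of probabilities. It is not Ollama's implementation; the function name and the order of operations (top-k first, then a nucleus cut over the original probabilities, then renormalisation) are assumptions made for illustration.

```python
from typing import Dict


def top_k_top_p_filter(probs: Dict[str, float], top_k: int = 40, top_p: float = 0.9) -> Dict[str, float]:
    """Keep the top_k most probable tokens, cut to the top_p nucleus, and renormalise."""
    # Rank tokens from most to least probable and keep at most top_k of them.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]

    # Keep tokens until their cumulative probability reaches top_p.
    kept = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalise so the surviving probabilities sum to 1.
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}
```

A lower `top_k` or `top_p` leaves fewer candidate tokens, which is why the comments describe lower values as more conservative.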

settings-openai.yaml (new file, 12 lines)

@@ -0,0 +1,12 @@
server:
env_name: ${APP_ENV:openai}
llm:
mode: openai
embedding:
mode: openai
openai:
api_key: ${OPENAI_API_KEY:}
model: gpt-3.5-turbo


@@ -1,5 +1,5 @@
server:
-env_name: ${APP_ENV:prod}
+env_name: ${APP_ENV:sagemaker}
port: ${PORT:8001}
ui:
@@ -9,6 +9,9 @@ ui:
llm:
mode: sagemaker
embedding:
mode: sagemaker
sagemaker:
-llm_endpoint_name: huggingface-pytorch-tgi-inference-2023-09-25-19-53-32-140
-embedding_endpoint_name: huggingface-pytorch-inference-2023-11-03-07-41-36-479
+llm_endpoint_name: llm
+embedding_endpoint_name: embedding


@@ -14,5 +14,8 @@ qdrant:
llm:
mode: mock
embedding:
mode: mock
ui:
enabled: false

settings-vllm.yaml (new file, 21 lines)

@@ -0,0 +1,21 @@
server:
env_name: ${APP_ENV:vllm}
llm:
mode: openailike
max_new_tokens: 512
tokenizer: meta-llama/Meta-Llama-3.1-8B-Instruct
temperature: 0.1
embedding:
mode: huggingface
ingest_mode: simple
huggingface:
embedding_hf_model_name: nomic-ai/nomic-embed-text-v1.5
openai:
api_base: http://localhost:8000/v1
api_key: EMPTY
model: facebook/opt-125m
request_timeout: 600.0
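
The `openai` block above points the `openailike` mode at a local server that speaks the OpenAI HTTP API. How the application issues its calls is not shown in this diff, so the sketch below only illustrates the shape of a completions request built from those settings, using the standard library. `build_completion_request` is a hypothetical helper, and the choice of the `/completions` route is an assumption.

```python
import json
import urllib.request


def build_completion_request(
    api_base: str,
    api_key: str,
    model: str,
    prompt: str,
    max_tokens: int = 512,
    temperature: float = 0.1,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for an OpenAI-style completions endpoint."""
    url = api_base.rstrip("/") + "/completions"
    body = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

With the values from this file, the request would target `http://localhost:8000/v1/completions`. Sending it with `urllib.request.urlopen(request, timeout=600.0)` would mirror the `request_timeout` setting, assuming a server is listening on that port.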


@@ -5,7 +5,7 @@ server:
env_name: ${APP_ENV:prod}
port: ${PORT:8001}
cors:
-enabled: false
+enabled: true
allow_origins: ["*"]
allow_methods: ["*"]
allow_headers: ["*"]
@@ -17,31 +17,105 @@ server:
secret: "Basic c2VjcmV0OmtleQ=="
data:
local_ingestion:
enabled: ${LOCAL_INGESTION_ENABLED:false}
allow_ingest_from: ["*"]
local_data_folder: local_data/private_gpt
ui:
enabled: true
path: /
default_chat_system_prompt: >
You are a helpful, respectful and honest assistant.
Always answer as helpfully as possible and follow ALL given instructions.
Do not speculate or make up information.
Do not reference any given instructions or context.
default_query_system_prompt: >
You can only answer questions about the provided context.
If you know the answer but it is not based in the provided context, don't provide
the answer, just state the answer is not in the context provided.
default_summarization_system_prompt: >
Provide a comprehensive summary of the provided context information.
The summary should cover all the key points and main ideas presented in
the original text, while also condensing the information into a concise
and easy-to-understand format. Please ensure that the summary includes
relevant details and examples that support the main ideas, while avoiding
any unnecessary information or repetition.
delete_file_button_enabled: true
delete_all_files_button_enabled: true
llm:
-mode: local
+mode: llamacpp
prompt_style: "llama3"
# Should be matching the selected model
max_new_tokens: 512
context_window: 3900
# Select your tokenizer. Llama-index tokenizer is the default.
# tokenizer: meta-llama/Meta-Llama-3.1-8B-Instruct
temperature: 0.1 # The temperature of the model. Increasing the temperature will make the model answer more creatively. A value of 0.1 would be more factual. (Default: 0.1)
rag:
similarity_top_k: 2
# This value controls how many top-ranked documents the RAG pipeline returns for use in the context.
#similarity_value: 0.45
# This value is disabled by default. If you enable this setting, the RAG pipeline will only use documents whose similarity score meets this threshold.
rerank:
enabled: false
model: cross-encoder/ms-marco-MiniLM-L-2-v2
top_n: 1
summarize:
use_async: true
clickhouse:
host: localhost
port: 8443
username: admin
password: clickhouse
database: embeddings
llamacpp:
llm_hf_repo_id: lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF
llm_hf_model_file: Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
tfs_z: 1.0 # Tail free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting
top_k: 40 # Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. (Default: 40)
top_p: 1.0 # Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9)
repeat_penalty: 1.1 # Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1)
embedding:
# Should be matching the value above in most cases
-mode: local
+mode: huggingface
ingest_mode: simple
embed_dim: 768 # 768 is for nomic-ai/nomic-embed-text-v1.5
huggingface:
embedding_hf_model_name: nomic-ai/nomic-embed-text-v1.5
access_token: ${HF_TOKEN:}
# Warning: Enabling this option allows the model to download and execute code from the internet.
# Nomic AI requires this option to be enabled to use its model. Review this setting if you are using a different model.
trust_remote_code: true
vectorstore:
database: qdrant
nodestore:
database: simple
milvus:
uri: local_data/private_gpt/milvus/milvus_local.db
collection_name: milvus_db
overwrite: false
qdrant:
path: local_data/private_gpt/qdrant
-local:
-prompt_style: "llama2"
-llm_hf_repo_id: TheBloke/Mistral-7B-Instruct-v0.1-GGUF
-llm_hf_model_file: mistral-7b-instruct-v0.1.Q4_K_M.gguf
-embedding_hf_model_name: BAAI/bge-small-en-v1.5
postgres:
host: localhost
port: 5432
database: postgres
user: postgres
password: postgres
schema_name: private_gpt
sagemaker:
llm_endpoint_name: huggingface-pytorch-tgi-inference-2023-09-25-19-53-32-140
@@ -49,3 +123,28 @@ sagemaker:
openai:
api_key: ${OPENAI_API_KEY:}
model: gpt-3.5-turbo
embedding_api_key: ${OPENAI_API_KEY:}
ollama:
llm_model: llama3.1
embedding_model: nomic-embed-text
api_base: http://localhost:11434
embedding_api_base: http://localhost:11434 # change if your embedding model runs on another ollama
keep_alive: 5m
request_timeout: 120.0
autopull_models: true
azopenai:
api_key: ${AZ_OPENAI_API_KEY:}
azure_endpoint: ${AZ_OPENAI_ENDPOINT:}
embedding_deployment_name: ${AZ_OPENAI_EMBEDDING_DEPLOYMENT_NAME:}
llm_deployment_name: ${AZ_OPENAI_LLM_DEPLOYMENT_NAME:}
api_version: "2023-05-15"
embedding_model: text-embedding-ada-002
llm_model: gpt-35-turbo
gemini:
api_key: ${GOOGLE_API_KEY:}
model: models/gemini-pro
embedding_model: models/embedding-001
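
The `secret: "Basic c2VjcmV0OmtleQ=="` value in the `server` section of this file has the form of an HTTP Basic `Authorization` header: the word `Basic` followed by the base64 encoding of `user:password`. The following sketch decodes such a value with the standard library; `decode_basic_secret` is a hypothetical helper name used only for illustration.

```python
import base64
from typing import Tuple


def decode_basic_secret(header_value: str) -> Tuple[str, str]:
    """Split a 'Basic <base64>' header value into its (user, password) pair."""
    scheme, _, encoded = header_value.partition(" ")
    if scheme != "Basic" or not encoded:
        raise ValueError("expected a value of the form 'Basic <base64>'")
    # The decoded payload is 'user:password'; only the first colon separates them.
    user, separator, password = base64.b64decode(encoded).decode("utf-8").partition(":")
    if not separator:
        raise ValueError("decoded credentials do not contain ':'")
    return user, password
```

Applied to the value in this file, it returns the pair `secret` and `key`.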

Some files were not shown because too many files have changed in this diff.