Commit Graph

135 Commits

Author SHA1 Message Date
Pablo Orgaz
51cc638758
Next version of PrivateGPT (#1077)
* Dockerize private-gpt

* Use port 8001 for local development

* Add setup script

* Add CUDA Dockerfile

* Create README.md

* Make the API use OpenAI response format

* Truncate prompt

* refactor: add models and __pycache__ to .gitignore

* Better naming

* Update readme

* Move models ignore to its folder

* Add scaffolding

* Apply formatting

* Fix tests

* Working sagemaker custom llm

* Fix linting

* Fix linting

* Enable streaming

* Allow all 3.11 python versions

* Use llama 2 prompt format and fix completion

* Restructure (#3)

Co-authored-by: Pablo Orgaz <pablo@Pablos-MacBook-Pro.local>

* Fix Dockerfile

* Use a specific build stage

* Cleanup

* Add FastAPI skeleton

* Cleanup openai package

* Fix DI and tests

* Split tests and tests with coverage

* Remove old scaffolding

* Add settings logic (#4)

* Add settings logic

* Add settings for sagemaker

---------

Co-authored-by: Pablo Orgaz <pablo@Pablos-MacBook-Pro.local>

* Local LLM (#5)

* Add settings logic

* Add settings for sagemaker

* Add settings-local-example.yaml

* Delete terraform files

* Refactor tests to use fixtures

* Join deltas

* Add local model support

---------

Co-authored-by: Pablo Orgaz <pablo@Pablos-MacBook-Pro.local>

* Update README.md

* Fix tests

* Version bump

* Enable simple llamaindex observability (#6)

* Enable simple llamaindex observability

* Improve code through linting

* Update README.md

* Move to async (#7)

* Migrate implementation to use asyncio

* Formatting

* Cleanup

* Linting

---------

Co-authored-by: Pablo Orgaz <pablo@Pablos-MacBook-Pro.local>

* Query Docs and gradio UI

* Remove unnecessary files

* Git ignore chromadb folder

* Async migration + DI Cleanup

* Fix tests

* Add integration test

* Use fastapi responses

* Retrieval service with partial implementation

* Cleanup

* Run formatter

* Fix types

* Fetch nodes asynchronously

* Install local dependencies in tests

* Install ui dependencies in tests

* Install dependencies for llama-cpp

* Fix sudo

* Attempt to fix cuda issues

* Attempt to fix cuda issues

* Try to reclaim some space from ubuntu machine

* Retrieval with context

* Fix lint and imports

* Fix mypy

* Make retrieval API a POST

* Make Completions body a dataclass

* Fix LLM chat message order

* Add Query Chunks to Gradio UI

* Improve rag query prompt

* Rollback CI Changes

* Move to sync code

* Using Llamaindex abstraction for query retrieval

* Fix types

* Default to CONDENSED chat mode for contextualized chat

* Rename route function

* Add Chat endpoint

* Remove webhooks

* Add IntelliJ run config to gitignore

* .gitignore applied

* Sync chat completion

* Refactor total

* Typo in context_files.py

* Add embeddings component and service

* Remove wrong dataclass from IngestService

* Filter by context file id implementation

* Fix typing

* Implement context_filter and separate from the bool use_context in the API

* Change chunks API to avoid conceptual clash with the context concept

* Deprecate completions and fix tests

* Remove remaining dataclasses

* Use embedding component in ingest service

* Fix ingestion to have multipart and local upload

* Fix ingestion API

* Add chunk tests

* Add configurable paths

* Cleaning up

* Add more docs

* IngestResponse includes a list of IngestedDocs

* Use IngestedDoc in the Chunk document reference

* Rename ingest routes to ingest_router.py

* Fix test working directory for intellij

* Set testpaths for pytest

* Remove unused as_chat_engine

* Add .fleet ide to gitignore

* Make LLM and Embedding model configurable

* Fix imports and checks

* Let local_data folder exist empty in the repository

* Don't use certain metadata in LLM

* Remove long lines

* Fix windows installation

* Typos

* Update poetry.lock

* Add TODO for linux

* Script and first version of docs

* No Jekyll build

* Fix relative url to openapi json

* Change default docs values

* Move chromadb dependency to the general group

* Fix tests to use separate local_data

* Create CNAME

* Update CNAME

* Fix openapi.json relative path

* PrivateGPT logo

* WIP OpenAPI documentation metadata

* Add ingest script (#11)

* Add ingest script

* Fix broken name refactor

* Add ingest docs and Makefile script

* Linting

* Move transformers to main dependency

* Move torch to main dependencies

* Don't load HuggingFaceEmbedding in tests

* Fix lint

---------

Co-authored-by: Pablo Orgaz <pablo@Pablos-MacBook-Pro.local>

* Rename file to camel_case

* Commit settings-local.yaml

* Move documentation to public docs

* Fix docker image for linux

* Installation and Running the Server documentation

* Move back to docs folder, as it is the only supported by github pages

* Delete CNAME

* Create CNAME

* Delete CNAME

* Create CNAME

* Improved API documentation

* Fix lint

* Completions documentation

* Updated openapi scheme

* Ingestion API doc

* Minor doc changes

* Updated openapi scheme

* Chunks API documentation

* Embeddings and Health API, and homogeneous responses

* Revamp README with new skeleton of content

* More docs

* PrivateGPT logo

* Improve UI

* Update ingestion docu

* Update README with new sections

* Use context window in the retriever

* Gradio Documentation

* Add logo to UI

* Include Contributing and Community sections to README

* Update links to resources in the README

* Small README.md updates

* Wrap lines of README.md

* Don't put health under /v1

* Add copy button to Chat

* Architecture documentation

* Updated openapi.json

* Updated openapi.json

* Updated openapi.json

* Change UI label

* Update documentation

* Add releases link to README.md

* Gradio avatar and stop debug

* Readme update

* Clean old files

* Remove unused terraform checks

* Update twitter link.

* Disable minimum coverage

* Clean install message in README.md

---------

Co-authored-by: Pablo Orgaz <pablo@Pablos-MacBook-Pro.local>
Co-authored-by: Iván Martínez <ivanmartit@gmail.com>
Co-authored-by: RubenGuerrero <ruben.guerrero@boopos.com>
Co-authored-by: Daniel Gallego Vico <daniel.gallego@bq.com>
2023-10-19 16:04:35 +02:00
Iván Martínez
78d1ef44ad
Update README.md 2023-09-25 16:03:09 +02:00
Iván Martínez
0b5a6687e3
Merge pull request #999 from imartinez/990-cannot-submit-more-than-166-embeddings-at-once-while-ingesting
Batch embeddings to be processed by chromadb
2023-09-25 11:59:19 +02:00
Iván Martínez
0b8dc5d248 Add chroma db file to gitignore 2023-09-25 11:58:35 +02:00
Iván Martínez
c848fcff22 Merge poetry file from master 2023-09-25 11:52:34 +02:00
Iván Martínez
0db5aebf2f Use chromadb max_batch_size public attribute 2023-09-25 11:42:16 +02:00
Iván Martínez
c7d42fbe9e
Merge pull request #1015 from LUN000/main
fix poetry python packages for ingest.py
2023-09-11 10:34:45 +02:00
LUN000
46ce74317d fix poetry python packages for ingest.py 2023-09-10 13:17:52 +08:00
Iván Martínez
91163a247b Batch embeddings to be processed by chromadb 2023-08-31 16:36:19 +02:00
Iván Martínez
2940f987c0
Merge pull request #822 from VaiTon/fix/env-not-existing
Better error message if .env is empty/does not exist.
2023-08-28 17:41:47 +02:00
Iván Martínez
7b294ed31f Update dependencies. Upgrade chromadb integration. 2023-08-28 17:32:56 +02:00
Iván Martínez
54d3eee657
Merge pull request #960 from TheCubicleJockey/Add_sentence_transformers_2.2.2_to_required
Add_sentence_transformers_2.2.2_to_required
2023-08-21 15:33:01 +02:00
NickHaven
c07364ca58 Add_sentence_transformers_2.2.2_to_required 2023-08-18 14:59:04 -04:00
Iván Martínez
3f5f82db5c
Merge pull request #957 from parampavar/main
Adding support to ingest files with extensions in uppercase
2023-08-18 17:20:32 +02:00
parampavar
6dc494d30f
Merge pull request #1 from parampavar/parampavar-support-ingestion-of-uppercase-fileextensions
Adding support to ingest files with extensions in uppercase
2023-08-16 16:09:14 -07:00
parampavar
8f369dd2b9
Adding support to ingest files with extensions in uppercase
Files in the source_directory were ignored if their extensions were in uppercase (e.g. *.PDF).
This change supports ingestion of files that match either lowercase or uppercase extensions, such as *.pdf or *.PDF.
This can be enhanced at a later stage to support mixed case such as *.Pdf. The assumption is that this scenario probably accounts for less than 5% of cases.
2023-08-16 16:03:56 -07:00
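The behaviour described in commit 8f369dd2b9 can be illustrated with a minimal sketch. This is not the project's actual ingest code, which is not shown in this log; the names `SUPPORTED_EXTENSIONS` and `find_ingestable_files` and the extension list are assumptions made for illustration only. It accepts all-lowercase and all-uppercase extensions and, as the commit message notes, still skips mixed case such as *.Pdf.

```python
from pathlib import Path

# Hypothetical set of supported extensions (assumption, not taken from the repository).
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".csv"}


def find_ingestable_files(source_directory):
    """Return files whose extension is a supported one, written all-lowercase or all-uppercase."""
    matches = []
    for path in sorted(Path(source_directory).rglob("*")):
        if not path.is_file():
            continue
        suffix = path.suffix
        is_lowercase_match = suffix in SUPPORTED_EXTENSIONS
        # Accept ".PDF" but not ".Pdf": the suffix must equal its fully uppercased form.
        is_uppercase_match = suffix == suffix.upper() and suffix.lower() in SUPPORTED_EXTENSIONS
        if is_lowercase_match or is_uppercase_match:
            matches.append(path)
    return matches
```

Comparing `path.suffix.lower()` against the supported set would also cover mixed-case extensions, which the commit message leaves for a later enhancement.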
Iván Martínez
8b9f7589c3
Merge pull request #881 from html-css-js-art/main
FIX : validation error for GPT4All n_ctx
2023-07-24 12:21:16 +02:00
Saurabh
e98e86ee99
Update privateGPT.py 2023-07-21 20:37:37 +05:30
Iván Martínez
86c2dcfe1b Update dependencies 2023-07-20 13:12:01 +02:00
Iván Martínez
f31ee47844
Merge pull request #837 from morsamatias/add-poetry
Add Poetry
2023-07-20 09:32:19 +02:00
Matias Morsa
bccd252594 Support Poetry. Added poetry to the README.md 2023-07-17 10:47:18 -03:00
Matias Morsa
dd11002028 Support Poetry. Added .toml and .lock 2023-07-08 21:10:37 -03:00
VaiTon
28537b6a84 Better error message if .env is empty/does not exist. 2023-07-06 00:16:11 +02:00
Iván Martínez
b1057afdf8
Merge pull request #727 from djm93dev/raise-exception
raise exception instead of print & exit
2023-06-16 19:29:18 +02:00
Daniel McDonald
d540adedcc raise exception 2023-06-16 09:18:49 -04:00
Iván Martínez
76e5b48ea1
Merge pull request #709 from FearTheBadger/691-invalid-model-file
Update gpt4all to 0.3.4
2023-06-14 09:36:15 +02:00
Brock Renninger
c759aafa63 Update gpt4all to 0.3.4
gpt4all==0.3.2 was yanked according to https://pypi.org/project/gpt4all/#history
2023-06-13 14:32:19 -06:00
ivan-ontruck
ad661933cb Support n_batch to improve inference performance 2023-06-11 21:33:35 +02:00
ivan-ontruck
52eb020256 Add inference time to output 2023-06-11 21:16:17 +02:00
ivan-ontruck
d1de57291e Update langchain and gpt4all dependencies 2023-06-11 21:14:21 +02:00
Iván Martínez
5943ad1bf7
Merge pull request #659 from doskoi/patch-2
Skip for empty query
2023-06-11 19:13:58 +02:00
Iván Martínez
b2b5fd4298
Merge pull request #675 from nb-programmer/main
Update README.md instructions of .env file
2023-06-11 19:11:09 +02:00
Iván Martínez
51fa989679
Merge pull request #660 from doskoi/master
Improving performance for PDF loader
2023-06-11 19:10:08 +02:00
imaprogrammer
c4b247d696
Update README.md instructions of .env file
Clarified to create a copy of example.env instead of renaming it to prevent accidentally removing from repo
2023-06-10 10:25:50 +05:30
sj
05c7330643 Enhancement better performance for PDF loader 2023-06-07 23:51:05 +08:00
Jiang Sheng
ddfb95a32e
Skip for empty query 2023-06-07 23:29:55 +08:00
Iván Martínez
9d47d03d18
Merge pull request #560 from ravindraprasad75/fix-csv-issue
fixed the csv file reading issue
2023-06-01 10:25:37 +02:00
Ravi
e9b31f7dd9
Update ingest.py
Co-authored-by: Bailey Matthews <bailey@hey.com>
2023-05-31 22:42:10 +05:30
Ravindra Prasad
db341e2a40 fixed the csv file reading issue 2023-05-31 00:04:56 +05:30
Iván Martínez
60e6bd25eb
Merge pull request #474 from maozdemir/patch-3
fix: Add `TARGET_SOURCE_CHUNKS` to `example.env`
2023-05-25 12:44:03 +02:00
maozdemir
2027ac563b
fix: Add TARGET_SOURCE_CHUNKS to example.env
@imartinez
2023-05-25 13:36:43 +03:00
Iván Martínez
e6d6af4f82
Merge pull request #460 from maozdemir/feat/documents
feat: Get answers using preferred number of chunks
2023-05-25 08:26:20 +02:00
impulsivus
cf709a6b7a
feat: Get answers using preferred number of chunks 2023-05-24 21:16:58 +03:00
Iván Martínez
573c4363c4 Update LangChain to 0.0.177 and GPT4ALL bindings library 2023-05-23 12:42:27 +02:00
Iván Martínez
fb94b9d1d4
Merge pull request #387 from maozdemir/patch-1
typo: Change `pip` to `pip3` in README.md
2023-05-22 23:23:29 +02:00
maozdemir
6065918d0f
typo: Change pip to pip3 2023-05-22 19:04:01 +03:00
Iván Martínez
2f3aab9cfd Formatting fixes 2023-05-20 12:29:36 +02:00
Iván Martínez
e74a11119c Show ingestion logs in readme 2023-05-20 12:15:13 +02:00
Iván Martínez
80b9b1d03e Better logs during ingestion 2023-05-20 12:11:21 +02:00
Iván Martínez
4a0e0d2e70 Use chunk_size variable in logs. Make vectorstore check more flexible 2023-05-20 12:02:40 +02:00