Commit Graph

210 Commits

Author SHA1 Message Date
vilaca
79a3c00313 remove duplicate 2023-05-17 23:45:27 +01:00
Fabio Rossini Sluzala
652401cf29 Add the formats to the README.md 2023-05-17 13:53:46 -03:00
Fabio Rossini Sluzala
66a9f9cde0 Add .doc .ppt (Word and PowerPoint 97/2003 formats) 2023-05-17 12:04:16 -03:00
Iván Martínez
355b4be7c0 Merge pull request #224 from imartinez/feature/sentence-transformers-embeddings
Feature/sentence transformers embeddings
2023-05-17 10:56:34 +02:00
Iván Martínez
83797ec08b Merge pull request #240 from zishon89us/patch-1
pypandoc-binary replacing pandoc-binary
2023-05-17 09:25:14 +02:00
Zeeshan Hassan Memon
dd144bba16 pypandoc-binary replacing pandoc-binary 2023-05-17 11:27:43 +05:00
milescattini
380b119581 Add fix for clang install of non m1 mac 2023-05-17 11:48:35 +10:00
Iván Martínez
90798f1986 Merge branch 'main' into feature/sentence-transformers-embeddings 2023-05-17 01:00:13 +02:00
Iván Martínez
bf3bddfbb6 More loaders, generic method
- Update the README with extra formats
- Add Powerpoint, requested in #138
- Add ePub requested in #138 comment - https://github.com/imartinez/privateGPT/pull/138#issuecomment-1549564535
- Update requirements
2023-05-17 00:55:21 +02:00
Iván Martínez
fdb45741e5 Merge pull request #211 from mdeweerd/extra_loaders
More loaders, generic method
2023-05-17 00:39:37 +02:00
Iván Martínez
23d24c88e9 Update code to use sentence-transformers through huggingfaceembeddings 2023-05-17 00:32:41 +02:00
Iván Martínez
8a5b2f453b Use faster and better embeddings: sentenceTransformers 2023-05-17 00:19:21 +02:00
Iván Martínez
2217b5f0e3 More loaders, generic method
- Update the README with extra formats
- Add Powerpoint, requested in #138
- Add ePub requested in #138 comment - https://github.com/imartinez/privateGPT/pull/138#issuecomment-1549564535
- Update requirements
2023-05-16 23:58:58 +02:00
Iván Martínez
b6f007dbb8 Update issue templates 2023-05-16 20:44:30 +02:00
Iván Martínez
9e94a3cd40 Update issue templates 2023-05-16 20:12:34 +02:00
Iván Martínez
f42d3e0ce2 Merge pull request #168 from andreakiro/fix/requirements
Add python-dotenv to requirements
2023-05-16 19:32:11 +02:00
Andrea Pinto
7ae80e6629 add python-dotenv to requirements 2023-05-15 19:19:10 +02:00
Iván Martínez
5a695e9767 Merge pull request #93 from katojunichi893/main
Update README.md
2023-05-14 10:55:12 +02:00
Iván Martínez
a061270bf0 Merge pull request #105 from koushkv/patch-1
fixed a typo
2023-05-14 10:42:25 +02:00
Iván Martínez
7612193031 Merge pull request #64 from FluffyDietEngine/main
added library for parsing PDFs
2023-05-14 10:39:38 +02:00
katojunichi893
9c3832c156 Update README.md 2023-05-14 17:36:40 +09:00
Koushik
2dac62c5aa fixed a typo 2023-05-14 10:26:13 +05:30
ひかる
24e464f51b Update README.md 2023-05-14 04:18:17 +09:00
Iván Martínez
b76a240714 Merge pull request #74 from andreakiro/fix/load-documents
Ingest unlimited number of documents
2023-05-13 10:36:57 +02:00
Andrea Pinto
d0aa57178a ingest unlimited number of documents 2023-05-12 15:36:20 +02:00
Iván Martínez
271673ffcc Merge pull request #68 from andreakiro/readme/updates
Note on instructions for .env
2023-05-12 11:33:51 +02:00
Iván Martínez
034fde4c3e Merge pull request #67 from andreakiro/fix/persist-dir
Fix persist db directory at ingestion
2023-05-12 11:31:53 +02:00
Andrea Pinto
718b67715c note on instructions for .env 2023-05-12 11:15:51 +02:00
Andrea Pinto
01f55441e7 fix persist db directory at ingestion 2023-05-12 10:37:10 +02:00
Santhosh Solomon
6419d0aa1c added library for parsing PDFs
pdfminer.six==20221105
2023-05-12 09:33:05 +05:30
Iván Martínez
39df61ca07 Merge pull request #58 from sorin/sorin-fix-env
Load .env file
2023-05-12 00:37:05 +02:00
Sorin Neacsu
544ddd9631 load .env 2023-05-11 15:34:17 -07:00
Sorin Neacsu
e947ca1d0f load .env 2023-05-11 15:33:56 -07:00
Iván Martínez
bc7ce4395b Merge pull request #53 from alxspiker/main
.env + LlamaCpp + PDF/CSV + Ingest All
2023-05-11 23:22:27 +02:00
alxspiker
39d00b840d Update README.md 2023-05-11 15:05:07 -06:00
alxspiker
9722ef4356 Update README.md 2023-05-11 15:01:57 -06:00
alxspiker
51f01d850a Update README.md 2023-05-11 14:53:10 -06:00
alxspiker
f60dbb520e Merge branch 'main' into main 2023-05-11 14:34:13 -06:00
alxspiker
52ae6c0866 .env + LlamaCpp + PDF/CSV + Ingest All
.env

Added an env file to make configuration easier

LlamaCpp

Added support for LlamaCpp in .env (MODEL_TYPE=LlamaCpp)

PDF/CSV

Added support for PDF and CSV files.

Ingest All

All files in source_documents will automatically get stored in vector store based on their file type when running ingest, no longer need a path argument.
2023-05-11 14:24:39 -06:00
Iván Martínez
56c1be36ad Merge pull request #44 from R-Y-M-R/Fix/DisableChromaTelemetry
Disable chroma telemetry. Extract constants.
2023-05-11 19:38:43 +02:00
Iván Martínez
9c0321235b Merge pull request #39 from R-Y-M-R/Update/Requirements
Update langchain and llama versions
2023-05-11 19:35:31 +02:00
R-Y-M-R
85528db743 Update langchain to 0.0.166
Tested.

Release: https://github.com/hwchase17/langchain/releases/tag/v0.0.166
2023-05-11 12:37:00 -04:00
R-Y-M-R
f12ea568e5 Use constants.py file 2023-05-11 10:29:07 -04:00
R-Y-M-R
8c6a81a07f Fix: Disable Chroma Telemetry
Opts-out of anonymized telemetry being tracked in Chroma.

See: https://docs.trychroma.com/telemetry
2023-05-11 10:17:18 -04:00
R-Y-M-R
918b384e38 Update langchain and llama versions
Bumped versions in requirements.txt, tested OK.

langchain 0.0.165 release: https://github.com/hwchase17/langchain/releases/tag/v0.0.165

llama 0.1.48 release: https://github.com/abetlen/llama-cpp-python/releases/tag/v0.1.48
2023-05-11 09:50:40 -04:00
Iván Martínez
60225698b6 Merge pull request #35 from R-Y-M-R/Fix/urllib3
Add urllib3 fix to requirements.txt
2023-05-11 14:32:28 +02:00
R-Y-M-R
54d14a6cb6 Resolve #17: Add urllib3 fix to requirements.txt
Applied fix from @abereghici to requirements.txt
2023-05-11 06:26:04 -04:00
Iván Martínez
2841fe45e1 Merge pull request #22 from 0mlml/patch-1
Fix typo in README.md
2023-05-10 14:52:11 +02:00
Max
e3769a060e Fix typo in README.md 2023-05-10 08:17:39 -04:00
Iván Martínez
026b9f895c Use RecursiveCharacterTextSplitter to avoid llama_tokenize: too many tokens error during ingestion 2023-05-09 00:21:02 +02:00