12 Commits

Author SHA1 Message Date
Iván Martínez
bf3bddfbb6 More loaders, generic method
- Update the README with extra formats
- Add PowerPoint, requested in #138
- Add ePub, requested in a comment on #138 - https://github.com/imartinez/privateGPT/pull/138#issuecomment-1549564535
- Update requirements
2023-05-17 00:55:21 +02:00
Iván Martínez
23d24c88e9 Update code to use sentence-transformers through HuggingFaceEmbeddings 2023-05-17 00:32:41 +02:00
Andrea Pinto
d0aa57178a ingest unlimited number of documents 2023-05-12 15:36:20 +02:00
Andrea Pinto
01f55441e7 fix persist db directory at ingestion 2023-05-12 10:37:10 +02:00
Sorin Neacsu
544ddd9631 load .env 2023-05-11 15:34:17 -07:00
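For readers unfamiliar with the pattern, loading a .env file amounts to reading KEY=VALUE lines and exposing them as environment variables. The following is a minimal stdlib-only sketch, not the code from this commit (which may rely on a library such as python-dotenv). The helper names `parse_env` and `apply_env` are hypothetical; `MODEL_TYPE=LlamaCpp` is taken from the log entry below.

```python
import os


def parse_env(text):
    """Parse KEY=VALUE lines from the contents of a .env file.

    Hypothetical helper for illustration. Blank lines, comment lines
    starting with '#', and lines without '=' are skipped. Surrounding
    quotes around a value are removed.
    """
    parsed = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        parsed[key.strip()] = value.strip().strip("'\"")
    return parsed


def apply_env(values, environ=None):
    """Copy parsed values into `environ` without overriding existing keys."""
    environ = os.environ if environ is None else environ
    for key, value in values.items():
        environ.setdefault(key, value)
    return environ


# Example: variables already set in the environment take precedence.
settings = parse_env("# model configuration\nMODEL_TYPE=LlamaCpp\n")
merged = apply_env(settings, {"MODEL_TYPE": "preset"})
```

Using `setdefault` keeps values that were exported in the shell, which is one common convention; a real loader may choose to override them instead.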
alxspiker
f60dbb520e Merge branch 'main' into main 2023-05-11 14:34:13 -06:00
alxspiker
52ae6c0866 .env + LlamaCpp + PDF/CSV + Ingest All
.env

Added an env file to make configuration easier

LlamaCpp

Added support for LlamaCpp in .env (MODEL_TYPE=LlamaCpp)

PDF/CSV

Added support for PDF and CSV files.

Ingest All

All files in source_documents are now stored automatically in the vector store, based on their file type, when running ingest. A path argument is no longer needed.
2023-05-11 14:24:39 -06:00
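The "Ingest All" behaviour described in the entry above can be pictured as a walk over source_documents that picks a loader from each file's extension. This is only a sketch under assumptions: the loader labels, the `.txt` entry, and the function names `pick_loader` and `plan_ingestion` are hypothetical and stand in for whatever loader classes the commit actually uses.

```python
from pathlib import Path

# Hypothetical mapping from file extension to a loader label.
# The real code would map to document loader classes instead.
LOADER_BY_EXTENSION = {
    ".txt": "text",
    ".pdf": "pdf",
    ".csv": "csv",
}


def pick_loader(path):
    """Return the loader label for `path`, or None if the type is unsupported."""
    return LOADER_BY_EXTENSION.get(Path(path).suffix.lower())


def plan_ingestion(source_dir="source_documents"):
    """Pair every supported file under `source_dir` with its loader label."""
    plan = []
    for path in sorted(Path(source_dir).rglob("*")):
        loader = pick_loader(path)
        if path.is_file() and loader is not None:
            plan.append((path, loader))
    return plan
```

Keeping the extension table separate from the directory walk means a new format only needs one extra dictionary entry.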
R-Y-M-R
f12ea568e5 Use constants.py file 2023-05-11 10:29:07 -04:00
R-Y-M-R
8c6a81a07f Fix: Disable Chroma Telemetry
Opts out of Chroma's anonymized telemetry tracking.

See: https://docs.trychroma.com/telemetry
2023-05-11 10:17:18 -04:00
Iván Martínez
026b9f895c Use RecursiveCharacterTextSplitter to avoid the "llama_tokenize: too many tokens" error during ingestion 2023-05-09 00:21:02 +02:00
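The idea behind a recursive character splitter is to split on coarse separators first and fall back to finer ones only for pieces that are still too long, so each chunk stays under a size limit. The sketch below is a simplified stdlib illustration of that idea, not LangChain's implementation: the function name `recursive_split` is hypothetical, sizes are counted in characters rather than tokens, and chunk overlap is left out.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split `text` into chunks of at most `chunk_size` characters.

    The first separator is tried first. Pieces that fit are merged back
    together while the result stays within the limit; a piece that is
    still too long is split again with the next, finer separator.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: cut at fixed character offsets.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    separator, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        # The merged candidate is too long: flush what has been collected.
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


# Example: paragraphs are separated first, then words within a paragraph.
chunks = recursive_split("aaa bbb ccc\n\nddd", 7)
```

Because every chunk is bounded in length, a downstream tokenizer is less likely to receive more input than it can handle, which is the failure mode the commit message refers to.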
Iván Martínez
92244a90b4 Use a different text splitter to improve results. Ingest takes an argument pointing to the doc to ingest. 2023-05-05 17:32:31 +02:00
Iván Martínez
55338b8f6e End-to-end working version 2023-05-02 20:32:28 +02:00