feat(ingest): Created a faster ingestion mode - pipeline (#1750)

* Unify pgvector and postgres connection settings

* Remove local changes

* Update file pgvector->postgres

* postgresql should be postgres

* Adding pipeline ingestion mode

* disable hugging face parallelism.  Continue on file to doc transform failure

* Semaphore to limit docq async workers. ETA reporting
This commit is contained in:
Brett England
2024-03-19 16:24:46 -04:00
committed by GitHub
parent 1efac6a3fe
commit 134fc54d7d
5 changed files with 301 additions and 2 deletions

View File

@@ -62,6 +62,7 @@ The following ingestion mode exist:
* `simple`: historic behavior, ingest one document at a time, sequentially
* `batch`: read, parse, and embed multiple documents using batches (batch read, and then batch parse, and then batch embed)
* `parallel`: read, parse, and embed multiple documents in parallel. This is the fastest ingestion mode for local setup.
* `pipeline`: Alternative to parallel.
To change the ingestion mode, you can use the `embedding.ingest_mode` configuration value. The default value is `simple`.
To configure the number of workers used for parallel or batched ingestion, you can use