Mirror of https://github.com/hwchase17/langchain.git (synced 2025-09-04 20:46:45 +00:00)
community: Fix FastEmbedEmbeddings (#24462)
## Description

This PR:
- Fixes the validation error in `FastEmbedEmbeddings`.
- Adds support for the `batch_size` and `parallel` params.
- Removes support for very old FastEmbed versions.
- Updates the FastEmbed doc with the new params.

Associated Issues:
- Resolves #24039
- Resolves https://github.com/qdrant/fastembed/issues/296
@@ -73,16 +73,25 @@
 "- `max_length: int` (default: 512)\n",
 " > The maximum number of tokens. Unknown behavior for values > 512.\n",
 "\n",
-"- `cache_dir: Optional[str]`\n",
+"- `cache_dir: Optional[str]` (default: None)\n",
 " > The path to the cache directory. Defaults to `local_cache` in the parent directory.\n",
 "\n",
-"- `threads: Optional[int]`\n",
-" > The number of threads a single onnxruntime session can use. Defaults to None.\n",
+"- `threads: Optional[int]` (default: None)\n",
+" > The number of threads a single onnxruntime session can use.\n",
 "\n",
 "- `doc_embed_type: Literal[\"default\", \"passage\"]` (default: \"default\")\n",
 " > \"default\": Uses FastEmbed's default embedding method.\n",
 " \n",
-" > \"passage\": Prefixes the text with \"passage\" before embedding."
+" > \"passage\": Prefixes the text with \"passage\" before embedding.\n",
+"\n",
+"- `batch_size: int` (default: 256)\n",
+" > Batch size for encoding. Higher values will use more memory, but be faster.\n",
+"\n",
+"- `parallel: Optional[int]` (default: None)\n",
+"\n",
+" > If `>1`, data-parallel encoding will be used, recommended for offline encoding of large datasets.\n",
+" > If `0`, use all available cores.\n",
+" > If `None`, don't use data-parallel processing, use default onnxruntime threading instead."
 ]
 },
 {
@@ -317,7 +317,7 @@
 "To search with only dense vectors,\n",
 "\n",
 "- The `retrieval_mode` parameter should be set to `RetrievalMode.DENSE`(default).\n",
-"- A [dense embeddings](https://python.langchain.com/v0.2/docs/integrations/text_embedding/) value should be provided for the `embedding` parameter."
+"- A [dense embeddings](https://python.langchain.com/v0.2/docs/integrations/text_embedding/) value should be provided to the `embedding` parameter."
 ]
 },
 {
@@ -407,7 +407,7 @@
 "To perform a hybrid search using dense and sparse vectors with score fusion,\n",
 "\n",
 "- The `retrieval_mode` parameter should be set to `RetrievalMode.HYBRID`.\n",
-"- A [dense embeddings](https://python.langchain.com/v0.2/docs/integrations/text_embedding/) value should be provided for the `embedding` parameter.\n",
+"- A [dense embeddings](https://python.langchain.com/v0.2/docs/integrations/text_embedding/) value should be provided to the `embedding` parameter.\n",
 "- An implementation of the [`SparseEmbeddings`](https://github.com/langchain-ai/langchain/blob/master/libs/partners/qdrant/langchain_qdrant/sparse_embeddings.py) interface using any sparse embeddings provider has to be provided as value to the `sparse_embedding` parameter.\n",
 "\n",
 "Note that if you've added documents with the `HYBRID` mode, you can switch to any retrieval mode when searching. Since both the dense and sparse vectors are available in the collection."
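
As a rough illustration of the `FastEmbedEmbeddings` parameters documented in the first hunk, the sketch below gathers them into keyword arguments. The helper names `fastembed_kwargs` and `build_embeddings`, and the validation rules, are assumptions made for this example and are not part of LangChain; the defaults mirror the values stated in the notebook text. The import of `FastEmbedEmbeddings` is kept inside a function so the snippet runs even when `langchain-community` or `fastembed` is not installed.

```python
from typing import Any, Dict, Literal, Optional


def fastembed_kwargs(
    max_length: int = 512,
    cache_dir: Optional[str] = None,
    threads: Optional[int] = None,
    doc_embed_type: Literal["default", "passage"] = "default",
    batch_size: int = 256,
    parallel: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: collect the documented parameters into one dict.

    The checks below are assumptions for illustration, not LangChain's own
    validation logic.
    """
    if doc_embed_type not in ("default", "passage"):
        raise ValueError("doc_embed_type must be 'default' or 'passage'")
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    if parallel is not None and parallel < 0:
        raise ValueError("parallel must be None, 0, or a positive integer")
    return {
        "max_length": max_length,
        "cache_dir": cache_dir,
        "threads": threads,
        "doc_embed_type": doc_embed_type,
        "batch_size": batch_size,
        "parallel": parallel,
    }


def build_embeddings(**overrides: Any):
    """Hypothetical helper: construct FastEmbedEmbeddings from the kwargs.

    Requires the langchain-community and fastembed packages; the model may be
    downloaded when it is first used.
    """
    from langchain_community.embeddings.fastembed import FastEmbedEmbeddings

    return FastEmbedEmbeddings(**fastembed_kwargs(**overrides))


# Offline encoding of a large dataset: larger batches, all available cores.
kwargs = fastembed_kwargs(batch_size=512, parallel=0)
print(kwargs)
```

With both packages installed, something like `build_embeddings(batch_size=512, parallel=0).embed_documents(["first document", "second document"])` should return one vector per input text. Leaving `parallel` as `None` keeps the default onnxruntime threading, as the notebook text describes.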