Support named vectors in Qdrant (#6871)

# Description

This PR makes it possible to use named vectors from Qdrant in Langchain.
That was requested multiple times, as people want to reuse externally
created collections in Langchain. It doesn't change anything for the
existing applications. The changes were covered with some integration
tests and included in the docs.

## Example

```python
Qdrant.from_documents(
    docs,
    embeddings,
    location=":memory:",
    collection_name="my_documents",
    vector_name="custom_vector",
)
```

### Issue: #2594 

Tagging @rlancemartin & @eyurtsev. I'd appreciate your review.
This commit is contained in:
Kacper Łukawski
2023-06-30 00:14:22 +02:00
committed by GitHub
parent 9ca1cf003c
commit 140ba682f1
3 changed files with 215 additions and 91 deletions

View File

@@ -626,6 +626,44 @@
"source": [
"## Customizing Qdrant\n",
"\n",
"There are some options to use an existing Qdrant collection within your Langchain application. In such cases you may need to define how to map Qdrant point into the Langchain `Document`.\n",
"\n",
"### Named vectors\n",
"\n",
"Qdrant supports [multiple vectors per point](https://qdrant.tech/documentation/concepts/collections/#collection-with-multiple-vectors) by named vectors. Langchain requires just a single embedding per document and, by default, uses a single vector. However, if you work with a collection created externally or want to have the named vector used, you can configure it by providing its name.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"Qdrant.from_documents(\n",
" docs,\n",
" embeddings,\n",
" location=\":memory:\",\n",
" collection_name=\"my_documents_2\",\n",
" vector_name=\"custom_vector\",\n",
")"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"As a Langchain user, you won't see any difference whether you use named vectors or not. Qdrant integration will handle the conversion under the hood."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"### Metadata\n",
"\n",
"Qdrant stores your vector embeddings along with the optional JSON-like payload. Payloads are optional, but since LangChain assumes the embeddings are generated from the documents, we keep the context data, so you can extract the original texts as well.\n",
"\n",
"By default, your document is going to be stored in the following payload structure:\n",
@@ -639,8 +677,11 @@
"}\n",
"```\n",
"\n",
"You can, however, decide to use different keys for the page content and metadata. That's useful if you already have a collection that you'd like to reuse. You can always change the "
]
"You can, however, decide to use different keys for the page content and metadata. That's useful if you already have a collection that you'd like to reuse."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",