Implement max_marginal_relevance_search in VectorStore of Pinecone (#6056)

This adds an implementation of MMR search to the Pinecone vector store. I also have two
semi-related observations about this vector store class:
- Maybe we should also have a
`similarity_search_by_vector_returning_embeddings`, as in Supabase, but
it's not in the base `VectorStore` class, so I didn't implement it.
- Regarding the base class: it defines
`similarity_search_with_relevance_scores`, but in Pinecone the method is called
`similarity_search_with_score`. Maybe we should consider renaming it to
align with the `VectorStore` base class and its other subclasses (or add that name as an
alias for backward compatibility).

#### Who can review?

Tag maintainers/contributors who might be interested:
 - VectorStores / Retrievers / Memory - @dev2049
Wenchen Li
2023-06-14 01:46:45 +08:00
committed by GitHub
parent 970b2f9d38
commit f9edf76e7c
2 changed files with 124 additions and 4 deletions


@@ -24,7 +24,7 @@
 },
 "outputs": [],
 "source": [
-"!pip install pinecone-client"
+"!pip install pinecone-client openai tiktoken"
 ]
 },
 {
@@ -70,7 +70,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 2,
+"execution_count": null,
 "id": "aac9563e",
 "metadata": {
 "tags": []
@@ -85,7 +85,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 2,
+"execution_count": null,
 "id": "a3c3999a",
 "metadata": {},
 "outputs": [],
@@ -135,13 +135,51 @@
 "print(docs[0].page_content)"
 ]
 },
+{
+"attachments": {},
+"cell_type": "markdown",
+"id": "d46d1452",
+"metadata": {},
+"source": [
+"### Maximal Marginal Relevance Searches\n",
+"\n",
+"In addition to using similarity search in the retriever object, you can also use `mmr` as retriever.\n"
+]
+},
 {
 "cell_type": "code",
 "execution_count": null,
 "id": "a359ed74",
 "metadata": {},
 "outputs": [],
-"source": []
+"source": [
+"retriever = docsearch.as_retriever(search_type=\"mmr\")\n",
+"matched_docs = retriever.get_relevant_documents(query)\n",
+"for i, d in enumerate(matched_docs):\n",
+" print(f\"\\n## Document {i}\\n\")\n",
+" print(d.page_content)"
+]
+},
+{
+"attachments": {},
+"cell_type": "markdown",
+"id": "7c477287",
+"metadata": {},
+"source": [
+"Or use `max_marginal_relevance_search` directly:"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "9ca82740",
+"metadata": {},
+"outputs": [],
+"source": [
+"found_docs = docsearch.max_marginal_relevance_search(query, k=2, fetch_k=10)\n",
+"for i, doc in enumerate(found_docs):\n",
+" print(f\"{i + 1}.\", doc.page_content, \"\\n\")"
+]
 }
 ],
 "metadata": {