community: add support for callable filters in FAISS (#16190)

- **Description:**
Filtering in a FAISS vectorstores is very inflexible and doesn't allow
that many use case. I think supporting callable like this enables a lot:
regular expressions, condition on multiple keys etc. **Note** I had to
manually alter a test. I don't understand if it was falty to begin with
or if there is something funky going on.
- **Issue:** None
- **Dependencies:** None
- **Twitter handle:** None

Signed-off-by: thiswillbeyourgithub <26625900+thiswillbeyourgithub@users.noreply.github.com>
This commit is contained in:
thiswillbeyourgithub
2024-01-30 05:05:56 +01:00
committed by GitHub
parent 1703fe2361
commit 1d082359ee
3 changed files with 116 additions and 37 deletions

View File

@@ -416,7 +416,7 @@
"metadata": {},
"source": [
"## Similarity Search with filtering\n",
"FAISS vectorstore can also support filtering, since the FAISS does not natively support filtering we have to do it manually. This is done by first fetching more results than `k` and then filtering them. You can filter the documents based on metadata. You can also set the `fetch_k` parameter when calling any search method to set how many documents you want to fetch before filtering. Here is a small example:"
"FAISS vectorstore can also support filtering, since the FAISS does not natively support filtering we have to do it manually. This is done by first fetching more results than `k` and then filtering them. This filter is either a callble that takes as input a metadata dict and returns a bool, or a metadata dict where each missing key is ignored and each present k must be in a list of values. You can also set the `fetch_k` parameter when calling any search method to set how many documents you want to fetch before filtering. Here is a small example:"
]
},
{
@@ -480,6 +480,8 @@
],
"source": [
"results_with_scores = db.similarity_search_with_score(\"foo\", filter=dict(page=1))\n",
"# Or with a callable:\n",
"# results_with_scores = db.similarity_search_with_score(\"foo\", filter=lambda d: d[\"page\"] == 1)\n",
"for doc, score in results_with_scores:\n",
" print(f\"Content: {doc.page_content}, Metadata: {doc.metadata}, Score: {score}\")"
]