Fix Document Similarity Check with passed Threshold (#6845)

Convert the raw similarity score obtained in the
similarity_search_with_score_by_vector method to a relevance score before
comparing it to the passed threshold. The passed threshold is a number
between 0 and 1, so it is already on the relevance_score_fn scale.
Until now, the function compared values on two different scoring scales,
which does not work.
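
To illustrate the mismatch, here is a minimal sketch. It is not the library's code: `euclidean_relevance` is a hypothetical conversion that assumes L2 distances between unit-normalized embeddings, and the distances are made-up values.

```python
import math


def euclidean_relevance(distance: float) -> float:
    """Hypothetical conversion of an L2 distance to a 0-to-1 relevance score.

    Assumes unit-normalized embeddings; smaller distances give higher relevance.
    """
    return 1.0 - distance / math.sqrt(2)


raw_distances = [0.2, 0.9, 1.3]  # made-up raw scores; lower means closer
score_threshold = 0.5  # caller's threshold, expressed on the relevance scale

# Mismatched: a raw distance is compared directly against a relevance threshold.
kept_raw = [d for d in raw_distances if d >= score_threshold]

# Consistent: convert to a relevance score first, then compare on one scale.
kept_relevance = [
    d for d in raw_distances if euclidean_relevance(d) >= score_threshold
]

print(kept_raw)
print(kept_relevance)
```

In this sketch the mismatched comparison keeps the two most distant entries, while the consistent comparison keeps only the closest one.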

Dependencies
None

Issue:
Different scores being compared in
similarity_search_with_score_by_vector method in FAISS.

Tag maintainer
@hwchase17




---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Sidchat95 2023-07-13 00:30:47 -05:00 committed by GitHub
parent a08baa97c5
commit c5e50c40c9

@@ -698,6 +698,9 @@ class FAISS(VectorStore):
         **kwargs: Any,
     ) -> List[Tuple[Document, float]]:
         """Return docs and their similarity scores on a scale from 0 to 1."""
+        # Pop score threshold so that only relevancy scores, not raw scores, are
+        # filtered.
+        score_threshold = kwargs.pop("score_threshold", None)
         relevance_score_fn = self._select_relevance_score_fn()
         if relevance_score_fn is None:
             raise ValueError(
@@ -711,4 +714,13 @@ class FAISS(VectorStore):
             fetch_k=fetch_k,
             **kwargs,
         )
-        return [(doc, relevance_score_fn(score)) for doc, score in docs_and_scores]
+        docs_and_rel_scores = [
+            (doc, relevance_score_fn(score)) for doc, score in docs_and_scores
+        ]
+        if score_threshold is not None:
+            docs_and_rel_scores = [
+                (doc, similarity)
+                for doc, similarity in docs_and_rel_scores
+                if similarity >= score_threshold
+            ]
+        return docs_and_rel_scores
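
The pattern in this change (pop the threshold from the keyword arguments, run the raw search without it, convert the scores, then filter) can be sketched in isolation. This is a standalone approximation, not the FAISS class itself: `search_with_relevance`, `fake_raw_search`, and the `1.0 - distance` conversion are hypothetical stand-ins.

```python
from typing import Any, Callable, List, Tuple


def search_with_relevance(
    raw_search: Callable[..., List[Tuple[str, float]]],
    relevance_score_fn: Callable[[float], float],
    **kwargs: Any,
) -> List[Tuple[str, float]]:
    """Filter on relevance scores only, never on raw scores."""
    # Pop the threshold so the raw search never receives it.
    score_threshold = kwargs.pop("score_threshold", None)
    docs_and_scores = raw_search(**kwargs)
    docs_and_rel_scores = [
        (doc, relevance_score_fn(score)) for doc, score in docs_and_scores
    ]
    if score_threshold is not None:
        docs_and_rel_scores = [
            (doc, similarity)
            for doc, similarity in docs_and_rel_scores
            if similarity >= score_threshold
        ]
    return docs_and_rel_scores


def fake_raw_search(**kwargs: Any) -> List[Tuple[str, float]]:
    """Hypothetical raw search returning (document, distance) pairs."""
    # The threshold must have been removed before this call.
    assert "score_threshold" not in kwargs
    return [("a", 0.1), ("b", 0.6), ("c", 0.95)]


results = search_with_relevance(
    fake_raw_search, lambda distance: 1.0 - distance, score_threshold=0.5, k=3
)
print([doc for doc, _ in results])
```

When `score_threshold` is omitted, the sketch returns every converted result unfiltered, matching the `is not None` guard in the diff.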