community[patch]: ElasticsearchStore: add relevance function selector (#16378)

Implement similarity function selector for ElasticsearchStore. The
scores coming back from Elasticsearch are already similarities (not
distances) and they are already normalized (see
[docs](https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html#dense-vector-params)).
Hence we leave the scores untouched and just forward them.
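
A minimal sketch of the "forward untouched" idea, assuming hypothetical helper
names (`identity_relevance_score`, `attach_relevance`) that are for illustration
only and are not taken from this PR:

```python
from typing import Callable, List, Tuple


def identity_relevance_score(score: float) -> float:
    """Return the score unchanged.

    Assumption: the incoming score is already a normalized similarity,
    so no distance-to-similarity conversion is needed.
    """
    return score


def attach_relevance(
    hits: List[Tuple[str, float]],
    relevance_fn: Callable[[float], float] = identity_relevance_score,
) -> List[Tuple[str, float]]:
    """Apply the selected relevance function to each (document, score) pair."""
    return [(doc, relevance_fn(score)) for doc, score in hits]
```

With the identity function selected, `attach_relevance` returns the pairs exactly as they came in.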

This fixes #11539.

However, in hybrid mode (when keyword search and vector search are
combined), Elasticsearch currently returns no scores. This PR raises an
error in that case. A proper solution for hybrid mode still needs more
thought.
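
The guard can be pictured roughly as follows. This is a sketch under stated
assumptions: the function name `require_scores` and the representation of a
missing score as `None` are hypothetical, and the real check in
`ElasticsearchStore` may be structured differently.

```python
from typing import List, Optional, Tuple


def require_scores(
    hits: List[Tuple[str, Optional[float]]],
) -> List[Tuple[str, float]]:
    """Return (document, score) pairs, raising if any score is missing.

    Assumption: a hit without a score is represented here as None.
    """
    checked: List[Tuple[str, float]] = []
    for doc, score in hits:
        if score is None:
            # Mirrors the behavior described above: fail loudly rather
            # than return documents with meaningless scores.
            raise ValueError(
                "No score was returned for a hit; scores are not "
                "available in hybrid mode."
            )
        checked.append((doc, score))
    return checked
```

Raising `ValueError` matches what the test in this commit expects from the score-returning search methods in hybrid mode.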

This PR also corrects a small error in the Elasticsearch integration
test.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Max Jakob
2024-01-22 19:52:20 +01:00
committed by GitHub
parent 54f90fc6bc
commit de209af533
5 changed files with 137 additions and 33 deletions

@@ -0,0 +1,32 @@
"""Test Elasticsearch functionality."""

import pytest

from langchain_community.vectorstores.elasticsearch import (
    ApproxRetrievalStrategy,
    ElasticsearchStore,
)
from tests.integration_tests.vectorstores.fake_embeddings import FakeEmbeddings


@pytest.mark.requires("elasticsearch")
def test_elasticsearch_hybrid_scores_guard() -> None:
    """Ensure an error is raised when searching with scores in hybrid mode,
    because in this case Elasticsearch does not return any score.
    """
    from elasticsearch import Elasticsearch

    query_string = "foo"
    embeddings = FakeEmbeddings()

    store = ElasticsearchStore(
        index_name="dummy_index",
        es_connection=Elasticsearch(hosts=["http://dummy-host:9200"]),
        embedding=embeddings,
        strategy=ApproxRetrievalStrategy(hybrid=True),
    )
    with pytest.raises(ValueError):
        store.similarity_search_with_score(query_string)

    embedded_query = embeddings.embed_query(query_string)
    with pytest.raises(ValueError):
        store.similarity_search_by_vector_with_relevance_scores(embedded_query)