langchain[minor], community[minor]: add CrossEncoderReranker with HuggingFaceCrossEncoder and SagemakerEndpointCrossEncoder (#13687)

- **Description:** Support reranking based on cross encoder models
  available from HuggingFace.
  - Added `CrossEncoder` schema
  - Implemented `HuggingFaceCrossEncoder` and
    `SagemakerEndpointCrossEncoder`
  - Implemented `CrossEncoderReranker` that performs similar functionality
    to `CohereRerank`
  - Added `cross-encoder-reranker.ipynb` to demonstrate how to use it.
    Please let me know if anything else needs to be done to make it visible
    on the table-of-contents navigation bar on the left, or on the card list
    on [retrievers documentation
    page](https://python.langchain.com/docs/integrations/retrievers).
- **Issue:** N/A
- **Dependencies:** None other than the existing ones.
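
The reranking idea described above can be sketched in a few lines. This is only an illustration, not the code added in this commit: `rerank`, `ScoreFn`, and `word_overlap_score` are hypothetical names, and the toy scorer stands in for a real cross encoder model. The one thing taken from this change is the shape of the scoring call, which takes a list of `(query, text)` pairs and returns one score per pair.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical type for anything that scores (query, text) pairs,
# mirroring the shape of the `score` call used in the tests of this commit.
ScoreFn = Callable[[List[Tuple[str, str]]], List[float]]


def rerank(
    query: str, texts: Sequence[str], score: ScoreFn, top_n: int = 3
) -> List[Tuple[str, float]]:
    """Score every (query, text) pair and keep the top_n highest-scoring texts."""
    scores = score([(query, text) for text in texts])
    ranked = sorted(zip(texts, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]


def word_overlap_score(pairs: List[Tuple[str, str]]) -> List[float]:
    """Toy scorer (NOT a real cross encoder): share of query words present in the text."""
    results = []
    for query, text in pairs:
        query_words = set(query.lower().split())
        text_words = set(text.lower().split())
        overlap = len(query_words & text_words)
        results.append(overlap / len(query_words) if query_words else 0.0)
    return results


if __name__ == "__main__":
    candidates = ["I hate you", "I love you", "cats sleep a lot"]
    for text, value in rerank("I love you", candidates, word_overlap_score, top_n=2):
        print(f"{value:.2f}  {text}")
```

In practice the scorer would presumably be a model-backed object such as `HuggingFaceCrossEncoder`; the word-overlap function is used here only so the sketch runs without downloading a model.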

---------

Co-authored-by: Kenny Choe <kchoe@amazon.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Kenneth Choe
2024-03-31 15:51:31 -05:00
committed by GitHub
parent 3f7da03dd8
commit f98d7f7494
11 changed files with 660 additions and 0 deletions


@@ -0,0 +1 @@
"""Test cross encoder integrations."""


@@ -0,0 +1,22 @@
"""Test huggingface cross encoders."""

from langchain_community.cross_encoders import HuggingFaceCrossEncoder


def _assert(encoder: HuggingFaceCrossEncoder) -> None:
    query = "I love you"
    texts = ["I love you", "I like you", "I don't like you", "I hate you"]
    output = encoder.score([(query, text) for text in texts])

    for i in range(len(texts) - 1):
        assert output[i] > output[i + 1]


def test_huggingface_cross_encoder() -> None:
    encoder = HuggingFaceCrossEncoder()
    _assert(encoder)


def test_huggingface_cross_encoder_with_designated_model_name() -> None:
    encoder = HuggingFaceCrossEncoder(model_name="cross-encoder/ms-marco-MiniLM-L-6-v2")
    _assert(encoder)