langchain/libs
shimajiroxyz 3e835a1aa1
langchain: add id_key option to EnsembleRetriever for metadata-based document merging (#22950)
**Description:**
- What I changed
- By specifying `id_key` when initializing `EnsembleRetriever`, you can now control which documents have their scores merged. Documents are matched by the value stored under `id_key` in their metadata instead of by `page_content`. The following example shows how to use the modified `EnsembleRetriever`:
    ```python
    from langchain.retrievers import EnsembleRetriever

    # ret1 and ret2 are existing retrievers. Every Document they return
    # must keep the "id" key in its metadata.
    retriever = EnsembleRetriever(retrievers=[ret1, ret2], id_key="id")
    ```

- Additionally, I added a script to easily test the behavior of the
`invoke` method of the modified `EnsembleRetriever`.

- Why I made this change
- In some cases you may want `EnsembleRetriever` to treat Documents with different `page_content` as the same document when calculating scores. One example is ensembling the search results for the same document written in two different languages.
- The previous `EnsembleRetriever` aggregated scores by `page_content`, which made this use case difficult. With this change, when `id_key` is set, scores are aggregated by the value of that key in each Document's metadata instead.
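To make the difference between the two merge modes concrete, here is a minimal standalone sketch in plain Python. It does not use LangChain: `Doc`, `merge_key`, and `fuse` are hypothetical names, and the scoring assumes weighted Reciprocal Rank Fusion with equal default weights and a smoothing constant `c = 60`. That is our understanding of how `EnsembleRetriever` combines rankings, not a copy of its code. The only thing that changes between the two modes is the key used to decide whether two results are the same document.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Sequence


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def merge_key(doc: Doc, id_key: Optional[str]) -> Any:
    # Without id_key, documents are identified by their text; with id_key,
    # by the metadata value under that key (KeyError if it is missing).
    return doc.page_content if id_key is None else doc.metadata[id_key]


def fuse(
    ranked_lists: Sequence[List[Doc]],
    weights: Optional[Sequence[float]] = None,
    id_key: Optional[str] = None,
    c: int = 60,
) -> List[Doc]:
    """Sketch of weighted Reciprocal Rank Fusion over several ranked lists."""
    if weights is None:
        # Assumption: equal weights when none are given.
        weights = [1.0 / len(ranked_lists)] * len(ranked_lists)
    if len(weights) != len(ranked_lists):
        raise ValueError("need exactly one weight per ranked list")
    scores: Dict[Any, float] = defaultdict(float)
    representative: Dict[Any, Doc] = {}
    for docs, weight in zip(ranked_lists, weights):
        for rank, doc in enumerate(docs, start=1):
            key = merge_key(doc, id_key)
            scores[key] += weight / (rank + c)
            # The first Document seen for a key represents it in the output.
            representative.setdefault(key, doc)
    # Highest fused score first; ties keep first-seen order (stable sort).
    ordered = sorted(scores, key=scores.__getitem__, reverse=True)
    return [representative[k] for k in ordered]


# The same article "a" appears in an English and a French result list.
en = [Doc("Cats sleep a lot.", {"id": "a"}), Doc("Dogs bark.", {"id": "b"})]
fr = [Doc("Les chats dorment beaucoup.", {"id": "a"}), Doc("Les oiseaux volent.", {"id": "c"})]

by_text = fuse([en, fr])            # 4 results: the two "a" texts stay separate
by_id = fuse([en, fr], id_key="id")  # 3 results: "a" is merged and ranks first
print([d.metadata["id"] for d in by_id])
```

With the default key, the English and French versions of document `a` are scored as two separate results. With `id_key="id"`, their reciprocal-rank contributions are summed, so the merged document outranks results that appear in only one list.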

**Twitter handle:** @shimajiroxyz
2024-06-18 03:29:17 +00:00
| Directory | Latest commit | Date |
|---|---|---|
| cli | cli[minor]: remove redefined DEFAULT_GIT_REF (#21471) | 2024-06-14 15:49:15 -07:00 |
| community | docs: Standardize DocumentLoader docstrings (#22932) | 2024-06-18 03:26:36 +00:00 |
| core | Include "no escape" and "inverted section" mustache vars in Prompt.input_variables and Prompt.input_schema (#22981) | 2024-06-17 19:24:13 -07:00 |
| experimental | Improve llm graph transformer docstring (#22939) | 2024-06-15 15:33:26 -04:00 |
| langchain | langchain: add id_key option to EnsembleRetriever for metadata-based document merging (#22950) | 2024-06-18 03:29:17 +00:00 |
| partners | standard-tests[patch]: Update chat model standard tests (#22378) | 2024-06-17 13:37:41 -07:00 |
| standard-tests | standard-tests[patch]: Update chat model standard tests (#22378) | 2024-06-17 13:37:41 -07:00 |
| text-splitters | text-splitters[patch]: Fix HTMLSectionSplitter (#22812) | 2024-06-14 22:40:39 +00:00 |