Files
langchain/libs
Lorenzo 00a7c31ffd Fix: Nested Dicts Handling of Document Metadata (#9880)
## Description
When the `MultiQueryRetriever` is used to get the list of documents
relevant according to a query, inside a vector store, and at least one
of these contain metadata with nested dictionaries, a `TypeError:
unhashable type: 'dict'` exception is thrown.
This is caused by the `unique_union` function which, to guarantee the
uniqueness of the returned documents, tries, unsuccessfully, to hash the
nested dictionaries and use them as a part of key.
```python
unique_documents_dict = {
    (doc.page_content, tuple(sorted(doc.metadata.items()))): doc
    for doc in documents
}
```

## Issue
#9872 (MultiQueryRetriever (get_relevant_documents) raises TypeError:
unhashable type: 'dict' with dic metadata)

## Solution
A possible solution is to dump the metadata dict to a string and use it
as a part of hashed key.
```python
unique_documents_dict = {
    (doc.page_content, json.dumps(doc.metadata, sort_keys=True)): doc
    for doc in documents
}
```

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-09-03 15:27:46 -07:00
..
2023-09-03 15:02:58 -07:00