Mirror of https://github.com/hwchase17/langchain.git (synced 2025-07-17 18:23:59 +00:00)
community: fix duplicate content (#28003)
Thank you for reading my first PR!

**Description:** Deduplicate content in the AzureSearch vectorstore. Currently, by default, the retrieved content is placed in both the `metadata` and the `page_content` of a `Document`. This PR removes the content from `metadata` and leaves it in `page_content`.

**Issue:** Previously, the content was popped from `result` before the metadata was populated. #25828 changed that order, which leads to a response with duplicated content. This was not the intention of that PR and seems undesirable.

Looking forward to seeing my contribution in the next version!

Cheers,
Renzo
parent abaea28417
commit 567dc1e422
@@ -1798,7 +1798,9 @@ def _result_to_document(result: Dict) -> Document:
             fields_metadata = json.loads(result[FIELDS_METADATA])
     else:
         fields_metadata = {
-            key: value for key, value in result.items() if key != FIELDS_CONTENT_VECTOR
+            key: value
+            for key, value in result.items()
+            if key not in [FIELDS_CONTENT_VECTOR, FIELDS_CONTENT]
         }
     # IDs
     if FIELDS_ID in result:
@@ -1806,7 +1808,7 @@ def _result_to_document(result: Dict) -> Document:
     else:
         fields_id = {}
     return Document(
-        page_content=result.pop(FIELDS_CONTENT),
+        page_content=result[FIELDS_CONTENT],
         metadata={
             **fields_id,
             **fields_metadata,