community[fix]: Handle None value in raw_content from Tavily API response (#30021)

## **Description:**

When using the Tavily retriever with include_raw_content=True, the
retriever occasionally fails with a Pydantic ValidationError because
raw_content can be None.

The Document model in langchain_core/documents/base.py requires
page_content to be a non-None value, but the Tavily API sometimes
returns None for raw_content.
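
The old default did not prevent this because `dict.get` only falls back to its default when the key is absent, not when the key is present with a value of None. A minimal sketch of that behaviour follows. The `result` dict is a hypothetical stand-in for one Tavily result, not an actual API response:

```python
# Hypothetical stand-in for one entry of a Tavily response in which
# the "raw_content" key is present but its value is None.
result = {"content": "short snippet", "raw_content": None}

# dict.get only uses the default when the key is missing,
# so a stored None is returned unchanged.
before = result.get("raw_content", "")

# Falling back with `or` replaces None (and any other falsy value) with "".
after = result.get("raw_content") or ""

# A missing key is handled by both forms.
missing = {}.get("raw_content", "")

print(repr(before), repr(after), repr(missing))
```

Only the `or ""` form is guaranteed to produce a string in all three cases.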

This PR fixes the issue by falling back to an empty string whenever
raw_content is missing or None:

```python
page_content=result.get("content", "")
    if not self.include_raw_content
    else (result.get("raw_content") or ""),
```
kawamou 2025-02-28 00:53:53 +09:00 committed by GitHub
parent d0c9b98171
commit 8977ac5ab0


@@ -123,7 +123,7 @@ class TavilySearchAPIRetriever(BaseRetriever):
 Document(
     page_content=result.get("content", "")
     if not self.include_raw_content
-    else result.get("raw_content", ""),
+    else (result.get("raw_content") or ""),
     metadata={
         "title": result.get("title", ""),
         "source": result.get("url", ""),