mirror of https://github.com/hwchase17/langchain.git (synced 2025-06-14 02:48:54 +00:00)
community: Google Vertex AI Search now returns the website title as part of the document metadata (#30688)
Google Vertex AI Search now returns the title of the matched website as part of the document metadata, when available.

- **Description**: Vertex AI Search can be used to index websites and then build chatbots that answer questions from those websites. At present, the document metadata includes an `id` and a `source` (the URL). The URL is enough to create a link, but the ID is not descriptive enough to show to users. Therefore, I propose returning `title` as well when it is available (it will not be available, for example, for `.txt` documents found during website indexing).
- **Issue**: No bug in particular; this is an improvement.
- **Dependencies**: None
- I do not use Twitter.

Format, lint, and test all appear to pass.
parent 636d831d27
commit 5fb261ce27
@@ -167,6 +167,8 @@ class _BaseGoogleVertexAISearchRetriever(BaseModel):
 doc_metadata = document_dict.get("struct_data", {})
 doc_metadata["id"] = document_dict["id"]
 doc_metadata["source"] = derived_struct_data.get("link", "")
+if derived_struct_data.get("title") is not None:
+    doc_metadata["title"] = derived_struct_data.get("title")
 
 if chunk_type not in derived_struct_data:
     continue
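
The metadata logic touched by this commit can be illustrated in isolation. The following is a minimal sketch, not the retriever's actual code: `build_doc_metadata` and `link_label` are hypothetical helper names, the sample dictionaries are invented, and `derived_struct_data` is assumed to be a plain dict with optional `link` and `title` keys, as the hunk suggests.

```python
from typing import Any, Dict


def build_doc_metadata(document_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical standalone version of the metadata logic in this commit."""
    derived_struct_data = document_dict.get("derived_struct_data", {})
    # Copy struct_data so this sketch does not mutate its input.
    doc_metadata = dict(document_dict.get("struct_data", {}))
    doc_metadata["id"] = document_dict["id"]
    doc_metadata["source"] = derived_struct_data.get("link", "")
    # Only set "title" when the search result actually carries one.
    if derived_struct_data.get("title") is not None:
        doc_metadata["title"] = derived_struct_data.get("title")
    return doc_metadata


def link_label(metadata: Dict[str, Any]) -> str:
    """Prefer the title for display; fall back to the URL, then the ID."""
    return metadata.get("title") or metadata["source"] or metadata["id"]


# Invented sample results: an indexed web page and a .txt file without a title.
web_page = {
    "id": "abc123",
    "derived_struct_data": {
        "link": "https://example.com/docs",
        "title": "Docs Home",
    },
}
text_file = {
    "id": "def456",
    "derived_struct_data": {"link": "https://example.com/notes.txt"},
}

web_meta = build_doc_metadata(web_page)
txt_meta = build_doc_metadata(text_file)
print(link_label(web_meta))
print(link_label(txt_meta))
```

Because `title` is omitted rather than set to an empty string when it is missing, downstream code should read it with `.get("title")` and provide its own fallback, as `link_label` does here.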