mirror of https://github.com/hwchase17/langchain.git
synced 2025-05-31 03:59:25 +00:00
community: Google Vertex AI Search now returns the website title as part of the document metadata (#30688)
Google Vertex AI Search now returns the title of the found website as part of the document metadata, if available.
- **Description**: Vertex AI Search can index websites, which makes it possible to build chatbots that answer questions from those sites. At present, the document metadata includes an `id` and a `source` (the URL). The URL is enough to create a link, but the ID is not descriptive enough to show to users. This change therefore also returns `title` when it is available (it is not available, for example, for `.txt` documents found during website indexing).
- **Issue**: None; this is an improvement rather than a bug fix.
- **Dependencies**: None
- I do not use Twitter.

Format, lint, and test all appear to pass.
parent 636d831d27
commit 5fb261ce27
@@ -167,6 +167,8 @@ class _BaseGoogleVertexAISearchRetriever(BaseModel):
 doc_metadata = document_dict.get("struct_data", {})
 doc_metadata["id"] = document_dict["id"]
 doc_metadata["source"] = derived_struct_data.get("link", "")
+if derived_struct_data.get("title") is not None:
+    doc_metadata["title"] = derived_struct_data.get("title")
 
 if chunk_type not in derived_struct_data:
     continue
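The following is a minimal, standalone sketch of the metadata logic in the hunk above, intended only to illustrate the effect of the change. The helper names `build_website_doc_metadata` and `link_label`, and the sample dictionaries, are hypothetical and are not part of LangChain; the sketch assumes a search result has already been converted to a plain dictionary with `id`, `struct_data`, and `derived_struct_data` keys, as the variable names in the diff suggest.

```python
from typing import Any, Dict


def build_website_doc_metadata(document_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper that mirrors the metadata logic shown in the diff."""
    derived_struct_data = document_dict.get("derived_struct_data", {})
    # Copy struct_data so the caller's dictionary is not mutated
    # (a choice made for this sketch, not taken from the commit).
    doc_metadata = dict(document_dict.get("struct_data", {}))
    doc_metadata["id"] = document_dict["id"]
    doc_metadata["source"] = derived_struct_data.get("link", "")
    # The title is only set when the search result actually provides one.
    title = derived_struct_data.get("title")
    if title is not None:
        doc_metadata["title"] = title
    return doc_metadata


def link_label(doc_metadata: Dict[str, Any]) -> str:
    """Hypothetical display helper: prefer the title, fall back to the URL."""
    return doc_metadata.get("title") or doc_metadata["source"]


# Made-up sample results: an HTML page with a title and a .txt file without one.
html_result = {
    "id": "abc123",
    "derived_struct_data": {
        "link": "https://example.com/docs/intro.html",
        "title": "Introduction",
    },
}
txt_result = {
    "id": "def456",
    "derived_struct_data": {"link": "https://example.com/robots.txt"},
}

html_metadata = build_website_doc_metadata(html_result)
txt_metadata = build_website_doc_metadata(txt_result)

print(link_label(html_metadata))
print(link_label(txt_metadata))
```

Because `title` is added only when present, downstream code that displays results should treat it as optional, for example by falling back to `source` as `link_label` does here.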