mirror of
https://github.com/hwchase17/langchain.git
synced 2025-07-16 09:48:04 +00:00
community: Fix ConfluenceLoader load() failure caused by deleted pages (#29232)
## Description

This PR modifies the `is_public_page` function in `ConfluenceLoader` to prevent exceptions caused by deleted pages during the execution of `ConfluenceLoader.process_pages()`.

**Example scenario:**

Consider the following usage of `ConfluenceLoader`:

```python
import os

from langchain_community.document_loaders import ConfluenceLoader

loader = ConfluenceLoader(
    url=os.getenv("BASE_URL"),
    token=os.getenv("TOKEN"),
    max_pages=1000,
    cql='type=page and lastmodified >= "2020-01-01 00:00"',
    include_restricted_content=False,
)

# Before this change, the call below raised:
# HTTPError: Outdated version/old_draft/trashed? Cannot find content Please provide valid ContentId.
documents = loader.load()
```

Previously, if the query result contained a deleted page, `is_public_page` raised an exception when calling `get_all_restrictions_for_content`, which caused `loader.load()` to fail for all pages.

This change adds a pre-check for the page's "current" status, so `get_all_restrictions_for_content` is no longer called for non-current pages. Such pages are skipped without affecting the rest of the loading process.

## Issue

N/A (No specific issue number)

## Dependencies

No new dependencies are introduced with this change.

## Twitter handle

[@zenoengine](https://x.com/zenoengine)
parent 21eb39dff0
commit 05554265b4
@@ -523,11 +523,14 @@ class ConfluenceLoader(BaseLoader):
 
     def is_public_page(self, page: dict) -> bool:
         """Check if a page is publicly accessible."""
+
+        if page["status"] != "current":
+            return False
+
         restrictions = self.confluence.get_all_restrictions_for_content(page["id"])
 
         return (
-            page["status"] == "current"
-            and not restrictions["read"]["restrictions"]["user"]["results"]
+            not restrictions["read"]["restrictions"]["user"]["results"]
             and not restrictions["read"]["restrictions"]["group"]["results"]
         )
 
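The effect of the status pre-check can be illustrated with a minimal, self-contained sketch. It is not part of the commit: `FakeConfluence`, the two standalone functions, the page ids, and the `"trashed"` status value are hypothetical stand-ins chosen for illustration, and the stub raises `RuntimeError` rather than the real `HTTPError`. Only the restriction lookup structure and the `page["status"]` check mirror the code in this change.

```python
class FakeConfluence:
    """Hypothetical stand-in for the Confluence client (not the real API)."""

    def __init__(self):
        self.calls = []  # ids passed to the restrictions lookup

    def get_all_restrictions_for_content(self, content_id):
        self.calls.append(content_id)
        if content_id == "trashed-1":
            # Mimics the failure described in the PR for deleted pages.
            raise RuntimeError("Cannot find content. Please provide valid ContentId.")
        return {
            "read": {
                "restrictions": {"user": {"results": []}, "group": {"results": []}}
            }
        }


def is_public_page_old(confluence, page):
    """Behavior before the change: the lookup happens before the status test."""
    restrictions = confluence.get_all_restrictions_for_content(page["id"])
    return (
        page["status"] == "current"
        and not restrictions["read"]["restrictions"]["user"]["results"]
        and not restrictions["read"]["restrictions"]["group"]["results"]
    )


def is_public_page_new(confluence, page):
    """Behavior after the change: non-current pages return early."""
    if page["status"] != "current":
        return False
    restrictions = confluence.get_all_restrictions_for_content(page["id"])
    return (
        not restrictions["read"]["restrictions"]["user"]["results"]
        and not restrictions["read"]["restrictions"]["group"]["results"]
    )


# Assumed sample data: one live page and one deleted page.
pages = [
    {"id": "p-1", "status": "current"},
    {"id": "trashed-1", "status": "trashed"},
]

client = FakeConfluence()
public_ids = [p["id"] for p in pages if is_public_page_new(client, p)]
print(public_ids)    # ids of pages treated as public
print(client.calls)  # ids for which the restrictions lookup was made
```

With the early return, the stub is never asked about the deleted page, so the remaining pages are still processed. Calling `is_public_page_old` on the same deleted page propagates the stub's exception, which is the failure mode the PR description reports for `loader.load()`.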