fix: Imports for the ConfluenceLoader:process_page (#9432)

### Description

When loading documents with `ConfluenceLoader.load`, passing both `include_comments=True` and `keep_markdown_format=True` raises `NameError: free variable 'BeautifulSoup' referenced before assignment in enclosing scope`.

    loader = ConfluenceLoader(url="URI", token="TOKEN")
    documents = loader.load(
        space_key="SPACE",
        include_comments=True,
        keep_markdown_format=True,
    )

This happens because the previous imports only considered the `keep_markdown_format` parameter: `BeautifulSoup` was imported only when that parameter was false, yet including comments also uses `BeautifulSoup`.

The imports now handle all four combinations of `include_comments` and `keep_markdown_format`.

### Twitter

`@SathinduGA`

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
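The four combinations mentioned above can be sketched as a small truth table. This is an illustrative model only: `packages_before_fix` and `packages_after_fix` are hypothetical helpers, not part of `ConfluenceLoader`, and the sketch assumes that `markdownify` is imported under `if keep_markdown_format:` (that branch is not fully visible in the diff) and that `BeautifulSoup` is used both for comments and for non-markdown page text.

```python
from itertools import product
from typing import Set


def packages_before_fix(include_comments: bool, keep_markdown_format: bool) -> Set[str]:
    """Hypothetical model of the old import logic (if / else)."""
    imported = set()
    if keep_markdown_format:
        imported.add("markdownify")
    else:
        imported.add("beautifulsoup4")
    return imported


def packages_after_fix(include_comments: bool, keep_markdown_format: bool) -> Set[str]:
    """Hypothetical model of the new import logic (two independent ifs)."""
    imported = set()
    if keep_markdown_format:
        imported.add("markdownify")
    if include_comments or not keep_markdown_format:
        imported.add("beautifulsoup4")
    return imported


def needs_bs4(include_comments: bool, keep_markdown_format: bool) -> bool:
    """Assumption: bs4 is used for comments and for non-markdown page text."""
    return include_comments or not keep_markdown_format


# Walk all four combinations and report where bs4 is needed but not imported.
for comments, markdown in product([False, True], repeat=2):
    before = packages_before_fix(comments, markdown)
    after = packages_after_fix(comments, markdown)
    missing_before = needs_bs4(comments, markdown) and "beautifulsoup4" not in before
    print(
        f"include_comments={comments}, keep_markdown_format={markdown}: "
        f"before={sorted(before)}, after={sorted(after)}, "
        f"bs4 missing before fix={missing_before}"
    )
```

Under these assumptions, only the combination with both flags set to `True` leaves `BeautifulSoup` unimported in the old logic, which matches the reported `NameError`.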
2023-08-21 16:44:52 -04:00
commit 652c542b2f
parent 7c0b1b8171
1 changed file with 1 addition and 2 deletions
--- a/libs/langchain/langchain/document_loaders/confluence.py
+++ b/libs/langchain/langchain/document_loaders/confluence.py
@@ -460,7 +460,7 @@ class ConfluenceLoader(BaseLoader):
                    "`markdownify` package not found, please run "
                    "`pip install markdownify`"
                )
-        else:
+        if include_comments or not keep_markdown_format:
            try:
                from bs4 import BeautifulSoup  # type: ignore
            except ImportError:
@@ -468,7 +468,6 @@ class ConfluenceLoader(BaseLoader):
                    "`beautifulsoup4` package not found, please run "
                    "`pip install beautifulsoup4`"
                )
-
        if include_attachments:
            attachment_texts = self.process_attachment(page["id"], ocr_languages)
        else: