fix: Imports for the ConfluenceLoader:process_page (#9432)

### Description
Loading documents with `ConfluenceLoader.load` fails when both
`include_comments=True` and `keep_markdown_format=True` are set. The call
raises `NameError: free variable 'BeautifulSoup' referenced before
assignment in enclosing scope`. For example:
    
    loader = ConfluenceLoader(url="URI", token="TOKEN")
    documents = loader.load(
        space_key="SPACE", 
        include_comments=True, 
        keep_markdown_format=True, 
    )

This happens because the previous imports considered only the
`keep_markdown_format` parameter: `BeautifulSoup` was imported only when
it was `False`, even though processing comments also uses `BeautifulSoup`.
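
The failure mode can be reproduced in isolation. The following is a minimal sketch, not the loader's actual code: `process_page_before_fix` and `parse_comments` are hypothetical names, and the standard-library `HTMLParser` stands in for `bs4.BeautifulSoup` so the snippet runs without extra packages. It assumes the name is read from a nested scope, which is what the "free variable" wording of the error suggests.

```python
from html.parser import HTMLParser  # stdlib stand-in for bs4.BeautifulSoup


def process_page_before_fix(include_comments: bool, keep_markdown_format: bool):
    """Hypothetical reduction of the old import logic (illustration only)."""
    if keep_markdown_format:
        pass  # the real code imports markdownify on this branch
    else:
        # Stands in for `from bs4 import BeautifulSoup`; only bound on this branch.
        BeautifulSoup = HTMLParser

    def parse_comments():
        # BeautifulSoup is a free variable here, looked up in the enclosing scope.
        return BeautifulSoup

    return parse_comments() if include_comments else None


try:
    process_page_before_fix(include_comments=True, keep_markdown_format=True)
except NameError as exc:
    # The exact message differs between Python versions, so it is only printed.
    print(type(exc).__name__, exc)
```

With `keep_markdown_format=False` the name is bound and the same call succeeds, which is why the bug only appeared when both flags were `True`.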

The imports now handle all four combinations of `include_comments` and
`keep_markdown_format`.
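
As a rough illustration of the four combinations, the sketch below lists which packages each one needs. `required_packages` is a hypothetical helper, not part of the loader. Its second condition is copied from the diff in this commit, while the first (`if keep_markdown_format:`) is an assumption inferred from the description, since that line lies outside the hunks shown.

```python
from itertools import product
from typing import Set


def required_packages(include_comments: bool, keep_markdown_format: bool) -> Set[str]:
    """Hypothetical helper mirroring the import conditions after the fix."""
    needed: Set[str] = set()
    if keep_markdown_format:  # assumed condition guarding the markdownify import
        needed.add("markdownify")
    if include_comments or not keep_markdown_format:  # condition added by this commit
        needed.add("beautifulsoup4")
    return needed


# Enumerate all four combinations of the two flags.
for include_comments, keep_markdown_format in product([False, True], repeat=2):
    packages = sorted(required_packages(include_comments, keep_markdown_format))
    print(f"include_comments={include_comments}, "
          f"keep_markdown_format={keep_markdown_format}: {packages}")
```

Only the combination `include_comments=False, keep_markdown_format=True` can skip `beautifulsoup4`; the previously failing combination (both `True`) now requires both packages.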

### Twitter
`@SathinduGA`

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Sathindu 2023-08-21 16:44:52 -04:00 committed by GitHub
parent 7c0b1b8171
commit 652c542b2f

@@ -460,7 +460,7 @@ class ConfluenceLoader(BaseLoader):
                     "`markdownify` package not found, please run "
                     "`pip install markdownify`"
                 )
-        else:
+        if include_comments or not keep_markdown_format:
             try:
                 from bs4 import BeautifulSoup  # type: ignore
             except ImportError:
@@ -468,7 +468,6 @@ class ConfluenceLoader(BaseLoader):
                     "`beautifulsoup4` package not found, please run "
                     "`pip install beautifulsoup4`"
                 )
-
         if include_attachments:
             attachment_texts = self.process_attachment(page["id"], ocr_languages)
         else: