docs: fix typo in PDF loader guide (#27977)

Fixed duplicate "py" in hyperlink to `pypdf` docs
This commit is contained in:
Zapiron 2024-11-08 22:38:32 +08:00 committed by GitHub
parent b509747c7f
commit 138f360b25
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194

View File

@ -48,7 +48,7 @@
"\n",
"## Simple and fast text extraction\n",
"\n",
"If you are looking for a simple string representation of text that is embedded in a PDF, the method below is appropriate. It will return a list of [Document](https://python.langchain.com/api_reference/core/documents/langchain_core.documents.base.Document.html) objects-- one per page-- containing a single string of the page's text in the Document's `page_content` attribute. It will not parse text in images or scanned PDF pages. Under the hood it uses the [pypydf](https://pypdf.readthedocs.io/en/stable/) Python library.\n",
"If you are looking for a simple string representation of text that is embedded in a PDF, the method below is appropriate. It will return a list of [Document](https://python.langchain.com/api_reference/core/documents/langchain_core.documents.base.Document.html) objects-- one per page-- containing a single string of the page's text in the Document's `page_content` attribute. It will not parse text in images or scanned PDF pages. Under the hood it uses the [pypdf](https://pypdf.readthedocs.io/en/stable/) Python library.\n",
"\n",
"LangChain [document loaders](/docs/concepts/document_loaders) implement `lazy_load` and its async variant, `alazy_load`, which return iterators of `Document` objects. We will use these below."
]