From 138f360b250419af78ca3911bc62509a1cc52a69 Mon Sep 17 00:00:00 2001 From: Zapiron <125368863+DangerousPotential@users.noreply.github.com> Date: Fri, 8 Nov 2024 22:38:32 +0800 Subject: [PATCH] docs: fix typo in PDF loader guide (#27977) Fixed duplicate "py" in hyperlink to `pypdf` docs --- docs/docs/how_to/document_loader_pdf.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/docs/how_to/document_loader_pdf.ipynb b/docs/docs/how_to/document_loader_pdf.ipynb index 83ccefdb47c..f13edbc99db 100644 --- a/docs/docs/how_to/document_loader_pdf.ipynb +++ b/docs/docs/how_to/document_loader_pdf.ipynb @@ -48,7 +48,7 @@ "\n", "## Simple and fast text extraction\n", "\n", - "If you are looking for a simple string representation of text that is embedded in a PDF, the method below is appropriate. It will return a list of [Document](https://python.langchain.com/api_reference/core/documents/langchain_core.documents.base.Document.html) objects-- one per page-- containing a single string of the page's text in the Document's `page_content` attribute. It will not parse text in images or scanned PDF pages. Under the hood it uses the [pypydf](https://pypdf.readthedocs.io/en/stable/) Python library.\n", + "If you are looking for a simple string representation of text that is embedded in a PDF, the method below is appropriate. It will return a list of [Document](https://python.langchain.com/api_reference/core/documents/langchain_core.documents.base.Document.html) objects-- one per page-- containing a single string of the page's text in the Document's `page_content` attribute. It will not parse text in images or scanned PDF pages. Under the hood it uses the [pypdf](https://pypdf.readthedocs.io/en/stable/) Python library.\n", "\n", "LangChain [document loaders](/docs/concepts/document_loaders) implement `lazy_load` and its async variant, `alazy_load`, which return iterators of `Document` objects. We will use these below." ]