Mirror of https://github.com/hwchase17/langchain.git (synced 2025-08-24 20:12:11 +00:00)
docs: fix typo in PDF loader guide (#27977)
Fixed duplicate "py" in hyperlink to `pypdf` docs
parent b509747c7f
commit 138f360b25
@@ -48,7 +48,7 @@
 "\n",
 "## Simple and fast text extraction\n",
 "\n",
-"If you are looking for a simple string representation of text that is embedded in a PDF, the method below is appropriate. It will return a list of [Document](https://python.langchain.com/api_reference/core/documents/langchain_core.documents.base.Document.html) objects-- one per page-- containing a single string of the page's text in the Document's `page_content` attribute. It will not parse text in images or scanned PDF pages. Under the hood it uses the [pypydf](https://pypdf.readthedocs.io/en/stable/) Python library.\n",
+"If you are looking for a simple string representation of text that is embedded in a PDF, the method below is appropriate. It will return a list of [Document](https://python.langchain.com/api_reference/core/documents/langchain_core.documents.base.Document.html) objects-- one per page-- containing a single string of the page's text in the Document's `page_content` attribute. It will not parse text in images or scanned PDF pages. Under the hood it uses the [pypdf](https://pypdf.readthedocs.io/en/stable/) Python library.\n",
 "\n",
 "LangChain [document loaders](/docs/concepts/document_loaders) implement `lazy_load` and its async variant, `alazy_load`, which return iterators of `Document` objects. We will use these below."
 ]