docs: [Retrieval > .. > PDF] update package installation instructions for Unstructured and PDFMiner (#20723)

**Description:** Adds the commands to install the packages required before
using _Unstructured_ and _PDFMiner_ from `langchain_community`.
**Documentation Page Being Updated:** [LangChain > Retrieval > Document
loaders > PDF > Using
Unstructured](https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf/#using-unstructured)
**Issue:** #20719 
**Dependencies:** no dependencies
**Twitter handle:** SalikaDave


---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Salika Dave 2024-04-24 18:24:11 -04:00 committed by GitHub
parent a9e2e98708
commit 6353991498

@@ -129,6 +129,11 @@ data = loader.load()
## Using Unstructured
The `unstructured[all-docs]` package currently supports loading of text files, powerpoints, html, pdfs, images, and more.
```bash
pip install unstructured[pdf]
```
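To confirm that the install took effect before importing the loader, the installed distribution can be queried from the standard library. This is a minimal sketch: `installed_version` is a hypothetical helper, not part of LangChain or Unstructured, and it reports only the base package version, not which extras (such as `[pdf]`) were installed.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if absent.

    Hypothetical helper for illustration; not a LangChain API.
    """
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("unstructured")
    if found is None:
        print("unstructured is not installed; run: pip install unstructured[pdf]")
    else:
        print(f"unstructured {found} is installed")
```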
```python
from langchain_community.document_loaders import UnstructuredPDFLoader
@@ -225,6 +230,11 @@ data = loader.load()
## Using PDFMiner
PDFMiner is a tool that can help with extracting information and analyzing data from PDF documents.
```bash
pip install pdfminer.six
```
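Note that the pip distribution is named `pdfminer.six`, while the module it provides is imported as `pdfminer`. The sketch below defers the import until it is needed and raises an error that names the matching install command. `require` and `load_pdf_with_pdfminer` are hypothetical helpers for illustration, not LangChain APIs.

```python
import importlib
from types import ModuleType


def require(module_name: str, pip_name: str) -> ModuleType:
    """Import a module by name, or raise ImportError with an install hint.

    Hypothetical helper; `pip_name` is the distribution to suggest installing.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"`{module_name}` is not installed. "
            f"Install it with: pip install {pip_name}"
        ) from exc


def load_pdf_with_pdfminer(path: str):
    """Load a PDF via PDFMinerLoader, checking the optional dependency first."""
    # The import name (pdfminer) differs from the pip name (pdfminer.six).
    require("pdfminer", "pdfminer.six")
    # Imported inside the function so this module loads without LangChain installed.
    from langchain_community.document_loaders import PDFMinerLoader

    return PDFMinerLoader(path).load()
```

Keeping the optional import inside the function means a missing package surfaces only when the loader is actually used.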
```python
from langchain_community.document_loaders import PDFMinerLoader