From 635399149814596bdfc96fd89389c5cb441e94b6 Mon Sep 17 00:00:00 2001
From: Salika Dave
Date: Wed, 24 Apr 2024 18:24:11 -0400
Subject: [PATCH] docs: [Retrieval > .. > PDF] update package installation instructions for Unstructured and PDFMiner (#20723)

**Description:** Adds the command to install packages required before using _Unstructured_ and _PDFMiner_ from `langchain.community`

**Documentation Page Being Updated:**
[LangChain > Retrieval > Document loaders > PDF > Using Unstructured](https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf/#using-unstructured)

**Issue:** #20719
**Dependencies:** no dependencies
**Twitter handle:** SalikaDave

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
---
 .../modules/data_connection/document_loaders/pdf.mdx | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/docs/docs/modules/data_connection/document_loaders/pdf.mdx b/docs/docs/modules/data_connection/document_loaders/pdf.mdx
index 936aafdd89c..ec264f61f3f 100644
--- a/docs/docs/modules/data_connection/document_loaders/pdf.mdx
+++ b/docs/docs/modules/data_connection/document_loaders/pdf.mdx
@@ -129,6 +129,11 @@ data = loader.load()

 ## Using Unstructured

+The `unstructured[all-docs]` package currently supports loading of text files, powerpoints, html, pdfs, images, and more.
+
+```bash
+pip install unstructured[pdf]
+```
+
 ```python
 from langchain_community.document_loaders import UnstructuredPDFLoader

@@ -225,6 +230,11 @@ data = loader.load()

 ## Using PDFMiner

+PDFMiner is a tool that can help with extracting information and analyzing data from PDF documents.
+
+```bash
+pip install pdfminer.six
+```
+
 ```python
 from langchain_community.document_loaders import PDFMinerLoader
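
Reviewer note: the two install commands added by this patch can be checked before a loader is used. The sketch below is not part of the patch. It uses only the standard library and assumes that `pip install unstructured[pdf]` provides an importable `unstructured` module and that `pip install pdfminer.six` provides `pdfminer`. The names `LOADER_DEPENDENCIES` and `missing_dependencies` are hypothetical helpers introduced for illustration.

```python
from importlib.util import find_spec

# Assumed mapping: loader name -> (top-level module it needs, install command
# from the patch). The module names are assumptions about what each pip
# package installs; adjust them if your environment differs.
LOADER_DEPENDENCIES = {
    "UnstructuredPDFLoader": ("unstructured", "pip install unstructured[pdf]"),
    "PDFMinerLoader": ("pdfminer", "pip install pdfminer.six"),
}


def missing_dependencies(dependencies=LOADER_DEPENDENCIES):
    """Return {loader_name: install_command} for modules that cannot be found."""
    missing = {}
    for loader_name, (module_name, install_command) in dependencies.items():
        # find_spec returns None when a top-level module is not installed.
        if find_spec(module_name) is None:
            missing[loader_name] = install_command
    return missing


if __name__ == "__main__":
    # Print one hint per loader whose backing package is not installed.
    for loader_name, install_command in missing_dependencies().items():
        print(f"{loader_name} needs: {install_command}")
```

Some shells (zsh, for example) treat square brackets as glob patterns, so quoting the requirement as `pip install "unstructured[pdf]"` may be needed there.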