From 9bd956598dd9508e02813322006ad9374fda58b6 Mon Sep 17 00:00:00 2001
From: Michael Li <michaelli65535@gmail.com>
Date: Wed, 28 May 2025 05:56:55 +1000
Subject: [PATCH] docs: fix pdfloaders' descriptions at
 https://python.langchain.com/docs/integrations/document_loaders/ All document
 loaders section (#31371)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .../document_loaders/unstructured_pdfloader.ipynb           | 6 +++---
 .../docs/integrations/document_loaders/zeroxpdfloader.ipynb | 1 -
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/docs/docs/integrations/document_loaders/unstructured_pdfloader.ipynb b/docs/docs/integrations/document_loaders/unstructured_pdfloader.ipynb
index 60cb109496b..d22fb7b25b5 100644
--- a/docs/docs/integrations/document_loaders/unstructured_pdfloader.ipynb
+++ b/docs/docs/integrations/document_loaders/unstructured_pdfloader.ipynb
@@ -6,8 +6,6 @@
       "source": [
         "# UnstructuredPDFLoader\n",
         "\n",
-        "## Overview\n",
-        "\n",
         "[Unstructured](https://unstructured-io.github.io/unstructured/) supports a common interface for working with unstructured or semi-structured file formats, such as Markdown or PDF. LangChain's [UnstructuredPDFLoader](https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.pdf.UnstructuredPDFLoader.html) integrates with Unstructured to parse PDF documents into LangChain [Document](https://python.langchain.com/api_reference/core/documents/langchain_core.documents.base.Document.html) objects.\n",
         "\n",
         "Please see [this page](/docs/integrations/providers/unstructured/) for more information on installing system requirements.\n",
@@ -34,7 +32,9 @@
     {
       "cell_type": "markdown",
       "metadata": {},
-      "source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
+      "source": [
+        "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
+      ]
     },
     {
       "cell_type": "code",
diff --git a/docs/docs/integrations/document_loaders/zeroxpdfloader.ipynb b/docs/docs/integrations/document_loaders/zeroxpdfloader.ipynb
index ffaf82e6897..5be3611829c 100644
--- a/docs/docs/integrations/document_loaders/zeroxpdfloader.ipynb
+++ b/docs/docs/integrations/document_loaders/zeroxpdfloader.ipynb
@@ -6,7 +6,6 @@
    "source": [
     "# ZeroxPDFLoader\n",
     "\n",
-    "## Overview\n",
     "`ZeroxPDFLoader` is a document loader that leverages the [Zerox](https://github.com/getomni-ai/zerox) library. Zerox converts PDF documents into images, processes them using a vision-capable language model, and generates a structured Markdown representation. This loader allows for asynchronous operations and provides page-level document extraction.\n",
     "\n",
     "### Integration details\n",