From 9bd956598dd9508e02813322006ad9374fda58b6 Mon Sep 17 00:00:00 2001
From: Michael Li
Date: Wed, 28 May 2025 05:56:55 +1000
Subject: [PATCH] docs: fix pdfloaders' descriptions at https://python.langchain.com/docs/integrations/document_loaders/ All document loaders section (#31371)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

…cs/integrations/document_loaders/ All document loaders section

Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
  - Where "package" is whichever of langchain, core, etc. is being modified. Use "docs: ..." for purely docs changes, "infra: ..." for CI changes.
  - Example: "core: add foobar LLM"

- [x] **PR message**: ***Delete this entire checklist*** and replace with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
    - **Twitter handle:** if your PR gets announced, and you'd like a mention, we'll gladly shout you out!

- [x] **Add tests and docs**: If you're adding a new integration, please include
  1. a test for the integration, preferably unit tests that do not rely on network access,
  2. an example notebook showing its use. It lives in `docs/docs/integrations` directory.

- [x] **Lint and test**: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.

If no one reviews your PR within a few days, please @-mention one of baskaryan, eyurtsev, ccurme, vbarda, hwchase17.
---
 .../document_loaders/unstructured_pdfloader.ipynb           | 6 +++---
 .../docs/integrations/document_loaders/zeroxpdfloader.ipynb | 1 -
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/docs/docs/integrations/document_loaders/unstructured_pdfloader.ipynb b/docs/docs/integrations/document_loaders/unstructured_pdfloader.ipynb
index 60cb109496b..d22fb7b25b5 100644
--- a/docs/docs/integrations/document_loaders/unstructured_pdfloader.ipynb
+++ b/docs/docs/integrations/document_loaders/unstructured_pdfloader.ipynb
@@ -6,8 +6,6 @@
    "source": [
     "# UnstructuredPDFLoader\n",
     "\n",
-    "## Overview\n",
-    "\n",
     "[Unstructured](https://unstructured-io.github.io/unstructured/) supports a common interface for working with unstructured or semi-structured file formats, such as Markdown or PDF. LangChain's [UnstructuredPDFLoader](https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.pdf.UnstructuredPDFLoader.html) integrates with Unstructured to parse PDF documents into LangChain [Document](https://python.langchain.com/api_reference/core/documents/langchain_core.documents.base.Document.html) objects.\n",
     "\n",
     "Please see [this page](/docs/integrations/providers/unstructured/) for more information on installing system requirements.\n",
@@ -34,7 +32,9 @@
   {
    "cell_type": "markdown",
    "metadata": {},
-   "source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
+   "source": [
+    "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
+   ]
   },
   {
    "cell_type": "code",
diff --git a/docs/docs/integrations/document_loaders/zeroxpdfloader.ipynb b/docs/docs/integrations/document_loaders/zeroxpdfloader.ipynb
index ffaf82e6897..5be3611829c 100644
--- a/docs/docs/integrations/document_loaders/zeroxpdfloader.ipynb
+++ b/docs/docs/integrations/document_loaders/zeroxpdfloader.ipynb
@@ -6,7 +6,6 @@
    "source": [
     "# ZeroxPDFLoader\n",
     "\n",
-    "## Overview\n",
     "`ZeroxPDFLoader` is a document loader that leverages the [Zerox](https://github.com/getomni-ai/zerox) library. Zerox converts PDF documents into images, processes them using a vision-capable language model, and generates a structured Markdown representation. This loader allows for asynchronous operations and provides page-level document extraction.\n",
     "\n",
     "### Integration details\n",