docs: fix pdfloaders' descriptions at https://python.langchain.com/docs/integrations/document_loaders/ All document loaders section (#31371)

…cs/integrations/document_loaders/ All document loaders section

Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, core, etc. is being
modified. Use "docs: ..." for purely docs changes, "infra: ..." for CI
changes.
  - Example: "core: add foobar LLM"


- [x] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, eyurtsev, ccurme, vbarda, hwchase17.
This commit is contained in:
Michael Li 2025-05-28 05:56:55 +10:00 committed by GitHub
parent 0478f544d5
commit 9bd956598d
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
2 changed files with 3 additions and 4 deletions

View File

@ -6,8 +6,6 @@
"source": [
"# UnstructuredPDFLoader\n",
"\n",
"## Overview\n",
"\n",
"[Unstructured](https://unstructured-io.github.io/unstructured/) supports a common interface for working with unstructured or semi-structured file formats, such as Markdown or PDF. LangChain's [UnstructuredPDFLoader](https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.pdf.UnstructuredPDFLoader.html) integrates with Unstructured to parse PDF documents into LangChain [Document](https://python.langchain.com/api_reference/core/documents/langchain_core.documents.base.Document.html) objects.\n",
"\n",
"Please see [this page](/docs/integrations/providers/unstructured/) for more information on installing system requirements.\n",
@ -34,7 +32,9 @@
{
"cell_type": "markdown",
"metadata": {},
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
"source": [
"To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
]
},
{
"cell_type": "code",

View File

@ -6,7 +6,6 @@
"source": [
"# ZeroxPDFLoader\n",
"\n",
"## Overview\n",
"`ZeroxPDFLoader` is a document loader that leverages the [Zerox](https://github.com/getomni-ai/zerox) library. Zerox converts PDF documents into images, processes them using a vision-capable language model, and generates a structured Markdown representation. This loader allows for asynchronous operations and provides page-level document extraction.\n",
"\n",
"### Integration details\n",