How To Guides
====================================

LangChain supports many different document loaders. The how-to guides below walk through working with each of them.

`File Loader <./examples/unstructured_file.html>`_: A walkthrough of how to use Unstructured to load files of arbitrary types (pdfs, txt, html, etc).

`Directory Loader <./examples/directory_loader.html>`_: A walkthrough of how to use Unstructured to load files from a given directory.

`Notion <./examples/notion.html>`_: A walkthrough of how to load data from an arbitrary Notion DB.

`ReadTheDocs <./examples/readthedocs_documentation.html>`_: A walkthrough of how to load data from documentation generated by ReadTheDocs.

`HTML <./examples/html.html>`_: A walkthrough of how to load data from an HTML file.

`PDF <./examples/pdf.html>`_: A walkthrough of how to load data from a PDF file.

`PowerPoint <./examples/powerpoint.html>`_: A walkthrough of how to load data from a PowerPoint file.

`Email <./examples/email.html>`_: A walkthrough of how to load data from an email (``.eml``) file.

`GoogleDrive <./examples/googledrive.html>`_: A walkthrough of how to load data from Google Drive.

`Microsoft Word <./examples/microsoft_word.html>`_: A walkthrough of how to load data from Microsoft Word files.

`Obsidian <./examples/obsidian.html>`_: A walkthrough of how to load data from an Obsidian file dump.

`Roam <./examples/roam.html>`_: A walkthrough of how to load data from a Roam file export.

`EverNote <./examples/evernote.html>`_: A walkthrough of how to load data from an EverNote (``.enex``) file.

`YouTube <./examples/youtube.html>`_: A walkthrough of how to load the transcript from a YouTube video.

`Hacker News <./examples/hn.html>`_: A walkthrough of how to load a Hacker News page.

`GitBook <./examples/gitbook.html>`_: A walkthrough of how to load a GitBook page.

`s3 File <./examples/s3_file.html>`_: A walkthrough of how to load a file from S3.

`s3 Directory <./examples/s3_directory.html>`_: A walkthrough of how to load all files in a directory from S3.

`GCS File <./examples/gcs_file.html>`_: A walkthrough of how to load a file from Google Cloud Storage (GCS).

`GCS Directory <./examples/gcs_directory.html>`_: A walkthrough of how to load all files in a directory from Google Cloud Storage (GCS).

`Web Base <./examples/web_base.html>`_: A walkthrough of how to load all text data from webpages.

`IMSDb <./examples/imsdb.html>`_: A walkthrough of how to load all text data from an IMSDb webpage.

`AZLyrics <./examples/azlyrics.html>`_: A walkthrough of how to load all text data from an AZLyrics webpage.

`College Confidential <./examples/college_confidential.html>`_: A walkthrough of how to load all text data from a College Confidential webpage.

`Gutenberg <./examples/gutenberg.html>`_: A walkthrough of how to load data from a Gutenberg ebook text.

`Airbyte Json <./examples/airbyte_json.html>`_: A walkthrough of how to load data from a local Airbyte JSON file.

`CoNLL-U <./examples/CoNLL-U.html>`_: A walkthrough of how to load data from a CoNLL-U file.

`iFixit <./examples/ifixit.html>`_: A walkthrough of how to search and load data such as guides, technical Q&A's, and device wikis from iFixit.com.

.. toctree::
   :maxdepth: 1
   :glob:
   :hidden:

   examples/*