langchain/docs/modules/indexes/document_loaders/examples
Chetanya Rastogi 50c511d75f
Add new loader to load pdf as html content (#2607)
Adds a new PDF loader using the existing dependency on PDFMiner.

The new loader can help chunk text semantically into sections: the
output HTML can be parsed with `BeautifulSoup` to recover richer
structural information, such as font sizes, page numbers, and PDF
headers/footers, that other PDF loaders may not expose.
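
A minimal sketch of the chunking idea described above, not the loader's actual implementation. It assumes the HTML output wraps each text run in a `<span>` whose `style` attribute carries a `font-size:Npx` declaration; the sample HTML is hand-written for illustration and is not real loader output. To stay dependency-free, it uses Python's standard `html.parser` in place of `BeautifulSoup`. The names `FontSizeParser` and `group_by_font_size` are hypothetical.

```python
import re
from html.parser import HTMLParser

# Assumption: font size appears in the span's inline style as "font-size:Npx".
FONT_SIZE = re.compile(r"font-size:\s*(\d+)px")


class FontSizeParser(HTMLParser):
    """Collect (font_size, text) pairs from <span style="...font-size:Npx"> elements."""

    def __init__(self):
        super().__init__()
        self.size = None      # font size of the span currently open, if any
        self.snippets = []    # (font_size, text) in document order

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            match = FONT_SIZE.search(dict(attrs).get("style") or "")
            self.size = int(match.group(1)) if match else None

    def handle_endtag(self, tag):
        if tag == "span":
            self.size = None

    def handle_data(self, data):
        text = data.strip()
        if text and self.size is not None:
            self.snippets.append((self.size, text))


def group_by_font_size(html):
    """Merge consecutive snippets that share a font size into one section."""
    parser = FontSizeParser()
    parser.feed(html)
    parser.close()
    sections = []
    for size, text in parser.snippets:
        if sections and sections[-1][0] == size:
            sections[-1] = (size, sections[-1][1] + " " + text)
        else:
            sections.append((size, text))
    return sections


# Hand-written sample approximating PDF-to-HTML output (illustrative only).
sample_html = (
    '<div><span style="font-family: Times; font-size:18px">Introduction<br></span></div>'
    '<div><span style="font-family: Times; font-size:10px">First line of body.<br></span></div>'
    '<div><span style="font-family: Times; font-size:10px">Second line of body.<br></span></div>'
)
sections = group_by_font_size(sample_html)
```

A change in font size is treated here as a section boundary, so a large-font heading and the smaller-font body text that follows end up in separate chunks. Real documents would likely need extra handling for nested spans and for headers/footers.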
2023-04-09 17:57:25 -07:00
example_data Harrison/msg files (#2375) 2023-04-04 06:48:34 -07:00
airbyte_json.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
apify_dataset.ipynb Harrison/apify (#2215) 2023-03-30 20:58:14 -07:00
azlyrics.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
azure_blob_storage_container.ipynb Harrison/site map (#2061) 2023-03-27 16:28:08 -07:00
azure_blob_storage_file.ipynb Harrison/site map (#2061) 2023-03-27 16:28:08 -07:00
bigquery.ipynb Harrison/big query (#2100) 2023-03-28 08:17:22 -07:00
blackboard.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
college_confidential.ipynb Harrison/site map (#2061) 2023-03-27 16:28:08 -07:00
CoNLL-U.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
copypaste.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
csv.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
dataframe.ipynb Harrison/document cleanup (#2062) 2023-03-27 16:32:55 -07:00
directory_loader.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
duckdb.ipynb Harrison/duckdb (#2064) 2023-03-27 19:51:34 -07:00
email.ipynb Harrison/msg files (#2375) 2023-04-04 06:48:34 -07:00
epub.ipynb bump version to 128 (#2236) 2023-03-31 11:16:21 -07:00
evernote.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
facebook_chat.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
figma.ipynb [Documents] Updated Figma docs and added example (#2172) 2023-03-29 22:11:45 -07:00
gcs_directory.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
gcs_file.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
gitbook.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
googledrive.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
gutenberg.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
hn.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
html.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
ifixit.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
image.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
imsdb.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
markdown.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
notebook.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
notion.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
notiondb.ipynb feat: Add Notion database document loader (#2056) 2023-03-28 08:07:09 -07:00
obsidian.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
pdf.ipynb Add new loader to load pdf as html content (#2607) 2023-04-09 17:57:25 -07:00
powerpoint.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
readthedocs_documentation.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
roam.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
s3_directory.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
s3_file.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
sitemap.ipynb docs: tiny fix on docs verbiage (#2124) 2023-03-28 22:56:29 -07:00
srt.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
telegram.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
unstructured_file.ipynb feat: document loader for epublications (#2202) 2023-03-30 20:45:31 -07:00
url.ipynb Introduces SeleniumURLLoader for JavaScript-Dependent Web Page Data Retrieval (#2291) 2023-04-02 14:05:00 -07:00
web_base.ipynb Harrison/site map (#2061) 2023-03-27 16:28:08 -07:00
whatsapp_chat.ipynb Harrison/whatsapp loader (#2085) 2023-03-27 23:43:45 -07:00
word_document.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
youtube.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00