mirror of
https://github.com/hwchase17/langchain.git
synced 2025-05-07 16:18:09 +00:00
MHTML is an interesting format because it is used both for email and for archived web pages. Some scraping projects store pages on disk to process them later, and MHTML is well suited to that use case. This is heavily inspired by the BeautifulSoup HTML loader, but it extracts the HTML part from the MHTML file. --------- Co-authored-by: rlm <pexpresss31@gmail.com> |
||
---|---|---|
document_loaders/integrations | ||
document_transformers/text_splitters | ||
retrievers | ||
text_embedding/integrations | ||
vectorstores/integrations |
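
The commit message above describes pulling the HTML part out of an MHTML file before parsing it. A minimal sketch of that idea follows, using only the Python standard library. It is not the loader added by this commit: the names `extract_html_from_mhtml`, `html_to_text`, and `SAMPLE` are hypothetical, and `html.parser` stands in for BeautifulSoup so the example has no third-party dependencies.

```python
import email
from email import policy
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_html_from_mhtml(raw: bytes) -> str:
    """Return the first text/html part of an MHTML (MIME multipart) document.

    Hypothetical helper: MHTML is a MIME message, so the email package
    can parse it and decode the part's transfer encoding and charset.
    """
    message = email.message_from_bytes(raw, policy=policy.default)
    for part in message.walk():
        if part.get_content_type() == "text/html":
            return part.get_content()
    raise ValueError("no text/html part found")


def html_to_text(html: str) -> str:
    """Reduce HTML to newline-separated visible text (stdlib stand-in)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Tiny hand-written MHTML document, assumed for illustration only.
SAMPLE = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/related; boundary="BOUNDARY"\r\n'
    b"\r\n"
    b"--BOUNDARY\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"<html><head><title>Demo</title></head>"
    b"<body><p>Hello =3D world</p></body></html>\r\n"
    b"--BOUNDARY--\r\n"
)

if __name__ == "__main__":
    print(html_to_text(extract_html_from_mhtml(SAMPLE)))
```

Because an archived page usually carries its images and stylesheets as further MIME parts, walking the message and stopping at the first `text/html` part is a simple way to reach the page body. A real loader may need to handle several HTML parts or missing charsets differently.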