## Description
This pull request introduces the `DocumentLoaderAsParser` class, which
acts as an adapter to transform document loaders into parsers within the
LangChain framework. The class enables document loaders that accept a
`file_path` parameter to be utilized as blob parsers. This is
particularly useful for integrating various document loading
capabilities seamlessly into the LangChain ecosystem.
When merged in together with PR
https://github.com/langchain-ai/langchain/pull/27716 It opens options
for `SharePointLoader` / `OneDriveLoader` to process any filetype that
has a document loader.
### Features
- **Flexible Parsing**: The `DocumentLoaderAsParser` class can adapt any
document loader that meets the criteria of accepting a `file_path`
argument, allowing for lazy parsing of documents.
- **Compatibility**: The class has been designed to work with various
document loaders, making it versatile for different use cases.
### Usage Example
To use the `DocumentLoaderAsParser`, you would initialize it with a
suitable document loader class and any required parameters. Here’s an
example of how to do this with the `UnstructuredExcelLoader`:
```python
from langchain_community.document_loaders.blob_loaders import Blob
from langchain_community.document_loaders.parsers.documentloader_adapter import DocumentLoaderAsParser
from langchain_community.document_loaders.excel import UnstructuredExcelLoader
# Initialize the parser adapter with UnstructuredExcelLoader
xlsx_parser = DocumentLoaderAsParser(UnstructuredExcelLoader, mode="paged")
# Use parser, for ex. pass it to MimeTypeBasedParser
MimeTypeBasedParser(
handlers={
"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": xlsx_parser
}
)
```
- **Dependencies:** None
- **Twitter handle:** @martintriska1
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>