langchain/docs/modules/indexes
Matt Robinson a97e4252e3
feat: add UnstructuredExcelLoader for .xlsx and .xls files (#5617)
# Unstructured Excel Loader

Adds an `UnstructuredExcelLoader` class for `.xlsx` and `.xls` files.
Works with `unstructured>=0.6.7`. A plain-text representation of the
Excel file is available under the `page_content` attribute of each
document. In `"elements"` mode, an HTML representation of the Excel
file is available under the `text_as_html` metadata key. Each sheet in
the Excel file is loaded as its own document.

## Testing

```python
from langchain.document_loaders import UnstructuredExcelLoader

# "elements" mode exposes the HTML representation under the
# `text_as_html` metadata key.
loader = UnstructuredExcelLoader(
    "example_data/stanley-cups.xlsx",
    mode="elements",
)
docs = loader.load()
```
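
After loading in `"elements"` mode, the HTML representation can be read back from each document's metadata. The following is a minimal sketch, not part of this PR: `StandInDoc` and `collect_html_tables` are hypothetical names, and the stand-in objects only mimic the `page_content` and `metadata` attributes described above so that the snippet runs without `unstructured` installed.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List


@dataclass
class StandInDoc:
    """Hypothetical stand-in exposing the two attributes described above."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def collect_html_tables(docs: Iterable[Any]) -> List[str]:
    """Return the `text_as_html` value of every document that carries one."""
    return [
        doc.metadata["text_as_html"]
        for doc in docs
        if doc.metadata.get("text_as_html")
    ]


# Stand-in data for illustration only; with the real loader, pass the
# result of `loader.load()` instead.
sample_docs = [
    StandInDoc("A B", {"text_as_html": "<table><tr><td>A</td><td>B</td></tr></table>"}),
    StandInDoc("plain text only"),
]
html_tables = collect_html_tables(sample_docs)
```

Documents without a `text_as_html` key are skipped rather than raising a `KeyError`, since the description above only mentions that key for `"elements"` mode.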

## Who can review?

@hwchase17
@eyurtsev
2023-06-03 12:44:12 -07:00
document_loaders/examples feat: add UnstructuredExcelLoader for .xlsx and .xls files (#5617) 2023-06-03 12:44:12 -07:00
retrievers/examples Qdrant self query (#5567) 2023-06-01 08:40:31 -07:00
text_splitters code splitter docs (#5480) 2023-05-31 07:11:53 -07:00
vectorstores Es knn index search 5346 (#5569) 2023-06-02 08:40:35 -07:00
document_loaders.rst Documentation fixes (linting and broken links) (#5563) 2023-06-01 13:06:17 -07:00
getting_started.ipynb Update getting_started.ipynb (#4850) 2023-05-17 13:19:14 -07:00
retrievers.rst big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
text_splitters.rst code splitter docs (#5480) 2023-05-31 07:11:53 -07:00
vectorstores.rst big docs refactor (#1978) 2023-03-26 19:49:46 -07:00