feat: add loader for open office odt files (#4405)

# ODF File Loader

Adds a data loader for handling Open Office ODT files. Requires
`unstructured>=0.6.3`.
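
As a rough illustration of the version requirement above, the sketch below checks whether the installed `unstructured` package is at least 0.6.3. The helper names `meets_minimum` and `unstructured_is_recent_enough` are hypothetical and not part of this change. The comparison is deliberately simplified (numeric components only), and a real project would more likely rely on `packaging.version`.

```python
import re
from importlib import metadata


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare the first three numeric components of two version strings.

    Simplified on purpose: pre-release and dev suffixes are ignored.
    """
    def parse(version: str) -> tuple:
        return tuple(int(part) for part in re.findall(r"\d+", version)[:3])

    return parse(installed) >= parse(minimum)


def unstructured_is_recent_enough(minimum: str = "0.6.3") -> bool:
    """Return False when `unstructured` is missing or older than `minimum`."""
    try:
        installed = metadata.version("unstructured")
    except metadata.PackageNotFoundError:
        return False
    return meets_minimum(installed, minimum)
```

Calling `unstructured_is_recent_enough()` before constructing the loader is one way to fail early with a clear message instead of an import error deep inside the loader.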

### Testing

The following should work using the `fake.odt` example doc from the
[`unstructured` repo](https://github.com/Unstructured-IO/unstructured).

```python
from langchain.document_loaders import UnstructuredODTLoader

loader = UnstructuredODTLoader(file_path="fake.odt", mode="elements")
loader.load()

loader = UnstructuredODTLoader(file_path="fake.odt", mode="single")
loader.load()
```
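
The two `mode` values in the example differ in how many documents come back. The toy model below illustrates that difference without importing `langchain` or `unstructured`. The function `to_documents` and the sample element texts are hypothetical, and the blank-line separator used for `"single"` mode is an assumption about the loader's behaviour, not something stated in this change.

```python
from typing import List


def to_documents(elements: List[str], mode: str = "single") -> List[str]:
    """Toy stand-in for how an element-based loader groups its output."""
    if mode == "elements":
        # One document per extracted element (title, paragraph, ...).
        return list(elements)
    if mode == "single":
        # All elements merged into one document (separator is assumed).
        return ["\n\n".join(elements)]
    raise ValueError(f"unsupported mode: {mode!r}")


# Hypothetical element texts, for illustration only.
sample = ["A title", "A paragraph of body text"]
per_element = to_documents(sample, mode="elements")
combined = to_documents(sample, mode="single")
```

In this model `per_element` holds two entries and `combined` holds one, which matches the `len(docs) == 1` assertion in the integration test for a loader built without an explicit `mode`.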
Matt Robinson
2023-05-10 04:37:17 -04:00
committed by GitHub
parent 65f85af242
commit 3637d6da6e
7 changed files with 121 additions and 0 deletions


@@ -0,0 +1,12 @@
from pathlib import Path

from langchain.document_loaders import UnstructuredODTLoader


def test_unstructured_odt_loader() -> None:
    """Test unstructured loader."""
    file_path = Path(__file__).parent.parent / "examples/fake.odt"
    loader = UnstructuredODTLoader(str(file_path))
    docs = loader.load()

    assert len(docs) == 1

Binary file not shown.