mirror of
https://github.com/hwchase17/langchain.git
synced 2026-01-05 07:55:18 +00:00
### Summary
Adds a post-processing option (`post_processors`) to the Unstructured loaders,
so users can optionally modify or clean extracted elements.
### Testing
```python
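from langchain.document_loaders import UnstructuredFileLoader
from unstructured.cleaners.core import clean_extra_whitespace

# Each function in post_processors is applied to the extracted elements.
loader = UnstructuredFileLoader(
    "./example_data/layout-parser-paper.pdf",
    mode="elements",  # keep each extracted element as its own document
    post_processors=[clean_extra_whitespace],
)
docs = loader.load()
docs[:5]  # inspect the first five documents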
```
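A custom post-processor could look like the sketch below. It assumes each post-processor is a plain function that takes an element's text and returns the cleaned text, as `clean_extra_whitespace` does. The names `strip_bullets`, `collapse_whitespace`, and `apply_post_processors` are hypothetical, and the helper only imitates how a loader might chain the functions; it is not the loader's actual implementation.

```python
import re
from typing import Callable, List


def strip_bullets(text: str) -> str:
    """Hypothetical cleaner: drop a leading bullet marker (-, * or a bullet dot)."""
    return re.sub(r"^[-*\u2022]\s*", "", text)


def collapse_whitespace(text: str) -> str:
    """Hypothetical cleaner: collapse whitespace runs and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def apply_post_processors(
    text: str, post_processors: List[Callable[[str], str]]
) -> str:
    """Assumed behavior: run each post-processor in order on the text."""
    for post_processor in post_processors:
        text = post_processor(text)
    return text


raw = "\u2022  Layout   parsing \n results "
cleaned = apply_post_processors(raw, [strip_bullets, collapse_whitespace])
print(cleaned)
```

Under the same assumption, functions of this shape could be passed as `post_processors=[strip_bullets, collapse_whitespace]`. Order matters: here the bullet is removed before whitespace is collapsed.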
### Reviewers
- @rlancemartin
- @eyurtsev
- @hwchase17