Matt Robinson
3c489be773
feat: optional post-processing for Unstructured loaders (#7850)
### Summary
Adds an optional post-processing step to the Unstructured loaders: users can
pass `post_processors` to modify or clean the extracted elements.
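
As an illustration of the idea only (not the loader's actual implementation), the sketch below chains cleaning functions over extracted text. It assumes each post-processor is a callable that takes a string and returns a cleaned string, as `clean_extra_whitespace` does. The names `collapse_whitespace`, `strip_bullets`, and `apply_post_processors` are hypothetical and exist only for this example.

```python
from typing import Callable, List


def collapse_whitespace(text: str) -> str:
    """Hypothetical cleaner: collapse runs of whitespace into single spaces."""
    return " ".join(text.split())


def strip_bullets(text: str) -> str:
    """Hypothetical cleaner: drop leading bullet characters."""
    return text.lstrip("•-* ").strip()


def apply_post_processors(
    texts: List[str],
    post_processors: List[Callable[[str], str]],
) -> List[str]:
    """Apply every post-processor, in order, to each piece of text."""
    cleaned = []
    for text in texts:
        for processor in post_processors:
            text = processor(text)
        cleaned.append(text)
    return cleaned


# Made-up sample text standing in for extracted elements.
elements = ["  Layout   Parser:\n a unified toolkit ", "• Deep  learning"]
result = apply_post_processors(elements, [collapse_whitespace, strip_bullets])
print(result)
```

Because the processors run in list order, a whitespace cleaner placed first normalizes the text before later cleaners see it.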
### Testing
```python
from langchain.document_loaders import UnstructuredFileLoader
from unstructured.cleaners.core import clean_extra_whitespace

# Pass cleaning functions via post_processors to clean the extracted elements.
loader = UnstructuredFileLoader(
    "./example_data/layout-parser-paper.pdf",
    mode="elements",
    post_processors=[clean_extra_whitespace],
)
docs = loader.load()
docs[:5]
```
### Reviewers
- @rlancemartin
- @eyurtsev
- @hwchase17
2023-07-17 12:13:05 -07:00