Matt Robinson
3c489be773
feat: optional post-processing for Unstructured loaders (#7850)
### Summary
Adds a post-processing method for Unstructured loaders that allows users
to optionally modify or clean extracted elements.
### Testing
```python
from langchain.document_loaders import UnstructuredFileLoader
from unstructured.cleaners.core import clean_extra_whitespace
loader = UnstructuredFileLoader(
    "./example_data/layout-parser-paper.pdf",
    mode="elements",
    post_processors=[clean_extra_whitespace],
)
docs = loader.load()
docs[:5]
```
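The `post_processors` argument takes a list of callables. As a rough illustration of the idea, the sketch below chains plain `str -> str` functions over extracted text in order. It is not the loader's actual implementation: `collapse_whitespace`, `strip_bullets`, and `apply_post_processors` are hypothetical names introduced here, and the sample strings are invented for the example.

```python
import re
from typing import Callable, List


def collapse_whitespace(text: str) -> str:
    """Hypothetical cleaner: squeeze runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def strip_bullets(text: str) -> str:
    """Hypothetical cleaner: drop a single leading bullet marker."""
    return re.sub(r"^[•*-]\s+", "", text)


def apply_post_processors(
    texts: List[str], post_processors: List[Callable[[str], str]]
) -> List[str]:
    """Apply each post-processor, in list order, to every text (assumed behavior)."""
    processed = []
    for text in texts:
        for post_processor in post_processors:
            text = post_processor(text)
        processed.append(text)
    return processed


# Invented sample strings standing in for extracted element text.
raw = ["  ITEM 1.   BUSINESS ", "• First   point"]
cleaned = apply_post_processors(raw, [collapse_whitespace, strip_bullets])
print(cleaned)
```

Because each function only needs to accept and return a string, a custom cleaner like these could presumably be passed alongside `clean_extra_whitespace` in the `post_processors` list.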
### Reviewers
- @rlancemartin
- @eyurtsev
- @hwchase17
2023-07-17 12:13:05 -07:00