Matt Robinson
3c489be773
feat: optional post-processing for Unstructured loaders (#7850)
### Summary
Adds a post-processing method for Unstructured loaders that allows users
to optionally modify or clean extracted elements.
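
As a rough illustration of the idea (not the loader's actual implementation), the sketch below applies a list of text-cleaning callables to each extracted element in order. The `Element` dataclass, `collapse_whitespace`, and `apply_post_processors` are hypothetical names introduced only for this example; the sketch assumes each post-processor takes a string and returns a string.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Element:
    """Hypothetical stand-in for an extracted element; it only carries text."""

    text: str


def collapse_whitespace(text: str) -> str:
    """Example post-processor: collapse whitespace runs into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def apply_post_processors(
    elements: List[Element],
    post_processors: List[Callable[[str], str]],
) -> List[Element]:
    """Apply each post-processor, in order, to every element's text.

    Assumption: post-processors are plain str -> str callables. An empty
    list leaves the elements unchanged, which keeps the step optional.
    """
    for element in elements:
        for post_processor in post_processors:
            element.text = post_processor(element.text)
    return elements


elements = [
    Element("Layout   Parser :  A unified\ntoolkit"),
    Element("  Abstract "),
]
cleaned = apply_post_processors(elements, [collapse_whitespace, str.lower])
```

Because the processors run in list order, a cleaner that depends on normalized whitespace should come after the one that normalizes it.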
### Testing
```python
from langchain.document_loaders import UnstructuredFileLoader
from unstructured.cleaners.core import clean_extra_whitespace

# Each post-processor is applied to the extracted elements after partitioning.
loader = UnstructuredFileLoader(
    "./example_data/layout-parser-paper.pdf",
    mode="elements",
    post_processors=[clean_extra_whitespace],
)

docs = loader.load()
docs[:5]  # inspect the first five cleaned elements
```
### Reviewers
- @rlancemartin
- @eyurtsev
- @hwchase17
2023-07-17 12:13:05 -07:00