langchain/tests/integration_tests/document_loaders
Matt Robinson 3c489be773 feat: optional post-processing for Unstructured loaders (#7850)
### Summary

Adds a post-processing method for Unstructured loaders that allows users
to optionally modify or clean extracted elements.

### Testing

```python
from langchain.document_loaders import UnstructuredFileLoader
from unstructured.cleaners.core import clean_extra_whitespace

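# Load the PDF as individual elements and clean each one after extraction.
loader = UnstructuredFileLoader(
    "./example_data/layout-parser-paper.pdf",
    mode="elements",  # one Document per extracted element
    post_processors=[clean_extra_whitespace],  # applied to each extracted element
)

docs = loader.load()
docs[:5]  # inspect the first five documents (notebook-style display)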
```
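
To illustrate what a post-processor might look like, here is a minimal sketch. It assumes a post-processor is a plain callable that takes a string and returns a string, as `clean_extra_whitespace` does. The names `collapse_whitespace`, `strip_bracketed_citations`, and `apply_post_processors` are hypothetical and are not part of LangChain or Unstructured. The helper only imitates applying the callables in list order and does not reproduce the loader's internals.

```python
import re
from typing import Callable, List


def collapse_whitespace(text: str) -> str:
    """Hypothetical stand-in for a cleaner such as clean_extra_whitespace."""
    return re.sub(r"\s+", " ", text).strip()


def strip_bracketed_citations(text: str) -> str:
    """Hypothetical cleaner: drop numeric citation markers such as [12]."""
    return re.sub(r"\[\d+\]", "", text)


def apply_post_processors(
    text: str, post_processors: List[Callable[[str], str]]
) -> str:
    """Apply each callable in order (an assumption about ordering)."""
    for post_processor in post_processors:
        text = post_processor(text)
    return text


raw = "Layout  analysis [3]   is\n hard."
cleaned = apply_post_processors(
    raw, [strip_bracketed_citations, collapse_whitespace]
)
print(cleaned)
```

If the assumption holds, a function like `strip_bracketed_citations` could be passed in the `post_processors` list alongside `clean_extra_whitespace`.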


### Reviewers
  - @rlancemartin
  - @eyurtsev
  - @hwchase17
2023-07-17 12:13:05 -07:00