langchain/docs/extras
Matt Robinson 3c489be773
feat: optional post-processing for Unstructured loaders (#7850)
### Summary

Adds a post-processing method for Unstructured loaders that allows users
to optionally modify or clean extracted elements.

### Testing

```python
from langchain.document_loaders import UnstructuredFileLoader
from unstructured.cleaners.core import clean_extra_whitespace

loader = UnstructuredFileLoader(
    "./example_data/layout-parser-paper.pdf",
    mode="elements",
    post_processors=[clean_extra_whitespace],
)

docs = loader.load()
docs[:5]
```
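
The test above passes a cleaner from `unstructured` as a post-processor. As a rough sketch of the idea only, and assuming each post-processor is a plain callable that takes an element's text and returns the cleaned text, the chaining can be illustrated without either library. The names `strip_leading_bullet`, `collapse_whitespace`, and `apply_post_processors` below are hypothetical and are not part of langchain or unstructured:

```python
import re
from typing import Callable, List

# Hypothetical stand-ins for cleaners; not the library's implementation.
def strip_leading_bullet(text: str) -> str:
    """Remove a single leading bullet character and the spaces around it."""
    return re.sub(r"^\s*[•\-*]\s*", "", text)


def collapse_whitespace(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def apply_post_processors(
    texts: List[str], post_processors: List[Callable[[str], str]]
) -> List[str]:
    """Apply each post-processor, in order, to every extracted text."""
    cleaned = []
    for text in texts:
        for post_processor in post_processors:
            text = post_processor(text)
        cleaned.append(text)
    return cleaned


print(
    apply_post_processors(
        ["•  Layout   Parser \n paper "],
        [strip_leading_bullet, collapse_whitespace],
    )
)
# → ['Layout Parser paper']
```

Order matters in this sketch: the processors run left to right, so a cleaner that depends on leading characters should come before one that trims them.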


### Reviewers
  - @rlancemartin
  - @eyurtsev
  - @hwchase17
2023-07-17 12:13:05 -07:00

| Name | Last commit | Date |
| --- | --- | --- |
| _templates | Doc refactor (#6300) | 2023-06-16 11:52:56 -07:00 |
| additional_resources | mv tutorials (#7614) | 2023-07-12 17:33:36 -04:00 |
| ecosystem | Support Redis Sentinel database connections (#5196) | 2023-07-17 07:18:51 -07:00 |
| guides | minor langsmith notebook fixes (#7814) | 2023-07-16 21:27:03 -07:00 |
| modules | feat: optional post-processing for Unstructured loaders (#7850) | 2023-07-17 12:13:05 -07:00 |
| use_cases | Fix ntbk link in docs (#7755) | 2023-07-15 09:11:18 -07:00 |