langchain/layout-parser-paper-password.pdf at cc/bind

mirror of https://github.com/hwchase17/langchain.git synced 2025-08-09 13:00:34 +00:00

Philippe PRADOS 4efc5093c1

community[minor]: Refactoring PyMuPDF parser, loader and add image blob parsers (#29063)

* Adds BlobParsers for images. These implementations take an image
and produce one or more documents per image. This interface can be used
to expose OCR capabilities.
* Updates PyMuPDFParser and Loader to standardize metadata, handle
images, and improve table extraction, among other changes.
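
To make the "one image in, one or more documents out" idea concrete, here is a minimal sketch in plain Python. It is not the code from this commit: `Blob`, `Document`, `ImageBlobParser`, `analyze_image` and `FakeOcrParser` are hypothetical stand-ins defined here for illustration, and the "OCR" step is a placeholder that just decodes bytes as text.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Blob:
    """Hypothetical stand-in: raw bytes plus a MIME type."""
    data: bytes
    mimetype: str = "application/octet-stream"


@dataclass
class Document:
    """Hypothetical stand-in: extracted text plus metadata."""
    page_content: str
    metadata: Dict[str, object] = field(default_factory=dict)


class ImageBlobParser(ABC):
    """Assumed interface: turn one image blob into one or more documents."""

    @abstractmethod
    def analyze_image(self, blob: Blob) -> List[str]:
        """Return the text regions found in the image (for example via OCR)."""

    def lazy_parse(self, blob: Blob) -> Iterator[Document]:
        # Yield one document per text region, tagged with its origin.
        for index, text in enumerate(self.analyze_image(blob)):
            yield Document(text, {"mimetype": blob.mimetype, "region": index})


class FakeOcrParser(ImageBlobParser):
    """Placeholder 'OCR': decodes bytes as UTF-8 and splits on blank lines."""

    def analyze_image(self, blob: Blob) -> List[str]:
        text = blob.data.decode("utf-8", errors="ignore")
        return [part.strip() for part in text.split("\n\n") if part.strip()]


if __name__ == "__main__":
    blob = Blob(b"first region\n\nsecond region", "image/png")
    for doc in FakeOcrParser().lazy_parse(blob):
        print(doc.metadata["region"], doc.page_content)
```

A real OCR backend would replace only `analyze_image`, which is why the sketch keeps the document-building logic in the base class.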

- **Twitter handle:** pprados

This is one part of a larger Pull Request (PR) that is too large to be
submitted all at once.
This specific part focuses on preparing the update of all parsers.

For more details, see [PR
28970](https://github.com/langchain-ai/langchain/pull/28970).

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>

2025-01-20 15:15:43 -05:00

4.5 MiB