This is one part of a larger Pull Request (PR) that is too large to be
submitted all at once. This specific part focuses on updating the
PyPDFium2 parser.
For more details, see
https://github.com/langchain-ai/langchain/pull/28970.
This is one part of a larger Pull Request (PR) that is too large to be
submitted all at once. This specific part focuses on updating the XXX
parser.
For more details, see [PR
28970](https://github.com/langchain-ai/langchain/pull/28970).
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
This is one part of a larger Pull Request (PR) that is too large to be
submitted all at once.
This specific part focuses on updating the PyPDF parser.
For more details, see [PR
28970](https://github.com/langchain-ai/langchain/pull/28970).
* Adds BlobParsers for images. These implementations can take an image
and produce one or more documents per image. This interface can be used
for exposing OCR capabilities.
* Update PyMuPDFParser and Loader to standardize metadata, handle
images, improve table extraction etc.
- **Twitter handle:** pprados
This is one part of a larger Pull Request (PR) that is too large to be
submitted all at once.
This specific part focuses to prepare the update of all parsers.
For more details, see [PR
28970](https://github.com/langchain-ai/langchain/pull/28970).
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
- **Refactoring PDF loaders step 1**: "community: Refactoring PDF
loaders to standardize approaches"
- **Description:** Declare CloudBlobLoader in __init__.py. file_path is
Union[str, PurePath] anywhere
- **Twitter handle:** pprados
This is one part of a larger Pull Request (PR) that is too large to be
submitted all at once.
This specific part focuses to prepare the update of all parsers.
For more details, see [PR
28970](https://github.com/langchain-ai/langchain/pull/28970).
@eyurtsev it's the start of a PR series.