# Google Document AI

>[Document AI](https://cloud.google.com/document-ai/docs/overview) is a `Google Cloud Platform`
> service that transforms unstructured data from documents into structured data, making it easier
> to understand, analyze, and consume.

## Installation and Setup

You need to set up a [`GCS` bucket and create your own OCR processor](https://cloud.google.com/document-ai/docs/create-processor).
The `GCS_OUTPUT_PATH` should be a path to a folder on GCS (starting with `gs://`),
and the processor name should look like `projects/PROJECT_NUMBER/locations/LOCATION/processors/PROCESSOR_ID`.
You can get the processor name either programmatically or by copying it from the `Prediction endpoint` section of the `Processor details`
tab in the Google Cloud Console.
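
The two formats described above can be checked before calling the service. The following is a minimal sketch using only the Python standard library; the helper names `is_gcs_path` and `split_processor_name` are hypothetical, and the regular expression is an assumption derived from the `projects/PROJECT_NUMBER/locations/LOCATION/processors/PROCESSOR_ID` pattern, not an official validation rule.

```python
import re

# Assumed pattern, derived from the format quoted in the text:
# projects/PROJECT_NUMBER/locations/LOCATION/processors/PROCESSOR_ID
_PROCESSOR_NAME_RE = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/processors/(?P<processor>[^/]+)$"
)


def is_gcs_path(path: str) -> bool:
    """Return True if the path starts with gs:// and has something after it."""
    return path.startswith("gs://") and len(path) > len("gs://")


def split_processor_name(name: str) -> dict:
    """Split a processor name into its project, location, and processor parts.

    Raises ValueError if the name does not match the assumed pattern.
    """
    match = _PROCESSOR_NAME_RE.match(name)
    if match is None:
        raise ValueError(f"Unexpected processor name format: {name!r}")
    return match.groupdict()
```

A check like this only catches formatting mistakes, such as a missing `gs://` prefix; it does not confirm that the bucket or the processor actually exists.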

```bash
pip install google-cloud-documentai
pip install google-cloud-documentai-toolbox
```

## Document Transformer

See a [usage example](/docs/integrations/document_transformers/docai).

```python
from langchain.document_loaders.blob_loaders import Blob
from langchain.document_loaders.parsers import DocAIParser
```
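
The two imported classes are typically used together, as in the minimal sketch below. It assumes that `DocAIParser` accepts `location`, `processor_name`, and `gcs_output_path` keyword arguments and that `lazy_parse` takes a `Blob` pointing to a file on GCS; check the linked usage example for the version you have installed. `parse_with_docai` is a hypothetical helper name, and running it requires Google Cloud credentials and an existing processor.

```python
def parse_with_docai(gcs_file_uri, processor_name, gcs_output_path, location="us"):
    """Parse one file stored on GCS with Document AI and return the documents.

    All arguments are supplied by the caller; nothing here is a real resource.
    """
    # Imports are local so that defining this helper does not require
    # the optional dependencies to be installed.
    from langchain.document_loaders.blob_loaders import Blob
    from langchain.document_loaders.parsers import DocAIParser

    # Assumption: these keyword arguments match the installed LangChain version.
    parser = DocAIParser(
        location=location,
        processor_name=processor_name,
        gcs_output_path=gcs_output_path,
    )

    # The blob only references the file; Document AI reads it from GCS.
    blob = Blob(path=gcs_file_uri)

    # lazy_parse yields documents one by one; collect them into a list.
    return list(parser.lazy_parse(blob))
```

A call would pass a `gs://` URI for the input file, a processor name in the `projects/PROJECT_NUMBER/locations/LOCATION/processors/PROCESSOR_ID` form, and the `GCS_OUTPUT_PATH` folder described in the setup section.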