langchain/docs/extras/integrations/providers/google_document_ai.mdx
Leonid Ganeline cb84f612c9
docs: document_transformers consistency (#10467)
- Updated `document_transformers` examples: titles, descriptions, links
- Added `integrations/providers` for missed document_transformers
2023-09-30 16:36:23 -07:00

29 lines
1.0 KiB
Plaintext

# Google Document AI
>[Document AI](https://cloud.google.com/document-ai/docs/overview) is a `Google Cloud Platform`
> service to transform unstructured data from documents into structured data, making it easier
> to understand, analyze, and consume.
## Installation and Setup
You need to set up a [`GCS` bucket and create your own OCR processor](https://cloud.google.com/document-ai/docs/create-processor)
The `GCS_OUTPUT_PATH` should be a path to a folder on GCS (starting with `gs://`)
and a processor name should look like `projects/PROJECT_NUMBER/locations/LOCATION/processors/PROCESSOR_ID`.
You can get it either programmatically or copy from the `Prediction endpoint` section of the `Processor details`
tab in the Google Cloud Console.
```bash
pip install google-cloud-documentai
pip install google-cloud-documentai-toolbox
```
## Document Transformer
See a [usage example](/docs/integrations/document_transformers/docai).
```python
from langchain.document_loaders.blob_loaders import Blob
from langchain.document_loaders.parsers import DocAIParser
```