langchain/docs/extras/integrations/platforms/google.mdx
Holt Skinner 9f73fec057
fix: Update Google Cloud Enterprise Search to Vertex AI Search (#10513)
- Description: Google Cloud Enterprise Search was renamed to Vertex AI
Search
-
https://cloud.google.com/blog/products/ai-machine-learning/vertex-ai-search-and-conversation-is-now-generally-available
- This PR updates the documentation and Retriever class to use the new
terminology.
- Changed retriever class from `GoogleCloudEnterpriseSearchRetriever` to
`GoogleVertexAISearchRetriever`
- Updated documentation to specify that `extractive_segments` requires
the new [Enterprise
edition](https://cloud.google.com/generative-ai-app-builder/docs/about-advanced-features#enterprise-features)
to be enabled.
  - Fixed spelling errors in documentation.
- Change parameter for Retriever from `search_engine_id` to
`data_store_id`
- When this retriever was originally implemented, there was no
distinction between a data store and search engine, but now these have
been split.
- Fixed an issue blocking some users where the api_endpoint can't be set
2023-10-05 10:47:47 -07:00

# Google
All functionality related to [Google Cloud Platform](https://cloud.google.com/).
## LLMs
### Vertex AI
Access PaLM LLMs like `text-bison` and `code-bison` via Google Cloud.
```python
from langchain.llms import VertexAI
```
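A minimal usage sketch, assuming Google Cloud credentials are already configured and the `google-cloud-aiplatform` package is installed. The `pick_model` helper and its task labels are hypothetical, and the sampling values are illustrative rather than recommended settings.

```python
# Model names are taken from the text above; the task labels are hypothetical.
PALM_MODELS = {"text": "text-bison", "code": "code-bison"}


def pick_model(task: str) -> str:
    """Map a hypothetical task label to a PaLM model name."""
    if task not in PALM_MODELS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(PALM_MODELS)}")
    return PALM_MODELS[task]


def generate(prompt: str, task: str = "text") -> str:
    """Send a prompt to the chosen model and return the completion text."""
    # Imported lazily so pick_model works without LangChain installed.
    from langchain.llms import VertexAI

    llm = VertexAI(
        model_name=pick_model(task),
        temperature=0.2,        # illustrative value
        max_output_tokens=256,  # illustrative value
    )
    return llm.invoke(prompt)
```

Calling `generate("Write a haiku about clouds")` would then issue a request against `text-bison`, while `task="code"` switches to `code-bison`.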
### Model Garden
Access PaLM and hundreds of OSS models via Vertex AI Model Garden.
```python
from langchain.llms import VertexAIModelGarden
```
## Chat models
### Vertex AI
Access PaLM chat models like `chat-bison` and `codechat-bison` via Google Cloud.
```python
from langchain.chat_models import ChatVertexAI
```
## Document Loader
### Google BigQuery
> [Google BigQuery](https://cloud.google.com/bigquery) is a serverless and cost-effective enterprise data warehouse that works across clouds and scales with your data.
`BigQuery` is a part of the `Google Cloud Platform`.
First, we need to install the `google-cloud-bigquery` Python package.
```bash
pip install google-cloud-bigquery
```
See a [usage example](/docs/integrations/document_loaders/google_bigquery).
```python
from langchain.document_loaders import BigQueryLoader
```
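As a rough sketch of how the loader might be used: the query runs against BigQuery and each result row becomes one document. The `build_query` and `load_rows` helpers and the table name in the usage note are hypothetical, and the sketch assumes default Google Cloud credentials are available.

```python
from typing import List, Sequence


def build_query(table: str, columns: Sequence[str], limit: int = 100) -> str:
    """Build a simple SELECT statement for a fully qualified table name."""
    return f"SELECT {', '.join(columns)} FROM `{table}` LIMIT {limit}"


def load_rows(table: str, content_columns: List[str], metadata_columns: List[str]):
    """Run the query and return one Document per result row."""
    # Imported lazily so build_query works without LangChain installed.
    from langchain.document_loaders import BigQueryLoader

    loader = BigQueryLoader(
        build_query(table, content_columns + metadata_columns),
        page_content_columns=content_columns,  # columns placed in page_content
        metadata_columns=metadata_columns,     # columns placed in metadata
    )
    return loader.load()
```

For example, `load_rows("my-project.my_dataset.articles", ["body"], ["title"])` would put the `body` column in each document's content and the `title` column in its metadata.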
### Google Cloud Storage
>[Google Cloud Storage](https://en.wikipedia.org/wiki/Google_Cloud_Storage) is a managed service for storing unstructured data.
First, we need to install the `google-cloud-storage` Python package.
```bash
pip install google-cloud-storage
```
There are two loaders for `Google Cloud Storage`: a `Directory` loader and a `File` loader.
See a [usage example](/docs/integrations/document_loaders/google_cloud_storage_directory).
```python
from langchain.document_loaders import GCSDirectoryLoader
```
See a [usage example](/docs/integrations/document_loaders/google_cloud_storage_file).
```python
from langchain.document_loaders import GCSFileLoader
```
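One way to choose between the two loaders is sketched below. The `split_gcs_uri` and `load_from_gcs` helpers are hypothetical, and treating a trailing slash (or an empty path) as "directory" is a convention assumed for this example, not something the loaders require.

```python
from typing import Tuple


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/path' into (bucket, path)."""
    if not uri.startswith("gs://"):
        raise ValueError(f"not a GCS URI: {uri!r}")
    bucket, _, path = uri[len("gs://"):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in {uri!r}")
    return bucket, path


def load_from_gcs(project_name: str, uri: str):
    """Load a single object, or every object under a prefix."""
    # Imported lazily so split_gcs_uri works without LangChain installed.
    from langchain.document_loaders import GCSDirectoryLoader, GCSFileLoader

    bucket, path = split_gcs_uri(uri)
    if path == "" or path.endswith("/"):
        # Assumed convention: empty path or trailing slash means a directory.
        loader = GCSDirectoryLoader(project_name=project_name, bucket=bucket, prefix=path)
    else:
        loader = GCSFileLoader(project_name=project_name, bucket=bucket, blob=path)
    return loader.load()
```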
### Google Drive
>[Google Drive](https://en.wikipedia.org/wiki/Google_Drive) is a file storage and synchronization service developed by Google.
Currently, only `Google Docs` are supported.
First, we need to install several Python packages.
```bash
pip install google-api-python-client google-auth-httplib2 google-auth-oauthlib
```
See a [usage example and authorizing instructions](/docs/integrations/document_loaders/google_drive.html).
```python
from langchain.document_loaders import GoogleDriveLoader
```
## Vector Store
### Google Vertex AI MatchingEngine
> [Google Vertex AI Matching Engine](https://cloud.google.com/vertex-ai/docs/matching-engine/overview) provides
> the industry's leading high-scale, low-latency vector database. These vector databases are commonly
> referred to as vector similarity-matching or approximate nearest neighbor (ANN) services.
We need to install several Python packages.
```bash
pip install tensorflow google-cloud-aiplatform tensorflow-hub tensorflow-text
```
See a [usage example](/docs/integrations/vectorstores/matchingengine).
```python
from langchain.vectorstores import MatchingEngine
```
### Google ScaNN
>[Google ScaNN](https://github.com/google-research/google-research/tree/master/scann)
> (Scalable Nearest Neighbors) is a Python package.
>
>`ScaNN` is a method for efficient vector similarity search at scale.
>`ScaNN` includes search space pruning and quantization for Maximum Inner
> Product Search and also supports other distance functions such as
> Euclidean distance. The implementation is optimized for x86 processors
> with AVX2 support. See its [Google Research GitHub](https://github.com/google-research/google-research/tree/master/scann)
> for more details.
We need to install the `scann` Python package.
```bash
pip install scann
```
See a [usage example](/docs/integrations/vectorstores/scann).
```python
from langchain.vectorstores import ScaNN
```
## Retrievers
### Vertex AI Search
> [Google Cloud Vertex AI Search](https://cloud.google.com/generative-ai-app-builder/docs/introduction)
> allows developers to quickly build generative AI powered search engines for customers and employees.
First, you need to install the `google-cloud-discoveryengine` Python package.
```bash
pip install google-cloud-discoveryengine
```
See a [usage example](/docs/integrations/retrievers/google_vertex_ai_search).
```python
from langchain.retrievers import GoogleVertexAISearchRetriever
```
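A hedged sketch of constructing the retriever with the `data_store_id` parameter. The `retriever_kwargs` and `search` helpers are hypothetical, and the `"global"` location and document limit are illustrative values assumed for this example.

```python
from typing import Any, Dict


def retriever_kwargs(
    project_id: str,
    data_store_id: str,
    location_id: str = "global",  # illustrative default
    max_documents: int = 3,       # illustrative default
) -> Dict[str, Any]:
    """Validate and collect the keyword arguments for the retriever."""
    if not project_id or not data_store_id:
        raise ValueError("project_id and data_store_id are both required")
    return {
        "project_id": project_id,
        "data_store_id": data_store_id,
        "location_id": location_id,
        "max_documents": max_documents,
    }


def search(query: str, project_id: str, data_store_id: str):
    """Return the documents Vertex AI Search considers relevant to the query."""
    # Imported lazily so retriever_kwargs works without LangChain installed.
    from langchain.retrievers import GoogleVertexAISearchRetriever

    retriever = GoogleVertexAISearchRetriever(**retriever_kwargs(project_id, data_store_id))
    return retriever.get_relevant_documents(query)
```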
## Tools
### Google Search
- Install requirements with `pip install google-api-python-client`
- Set up a Custom Search Engine, following [these instructions](https://stackoverflow.com/questions/37083058/programmatically-searching-google-in-python-using-custom-search)
- Get an API Key and Custom Search Engine ID from the previous step, and set them as environment variables `GOOGLE_API_KEY` and `GOOGLE_CSE_ID` respectively
There exists a `GoogleSearchAPIWrapper` utility which wraps this API. To import this utility:
```python
from langchain.utilities import GoogleSearchAPIWrapper
```
For a more detailed walkthrough of this wrapper, see [this notebook](/docs/integrations/tools/google_search.html).
We can also load this wrapper as a Tool (to use with an Agent):
```python
from langchain.agents import load_tools
tools = load_tools(["google-search"])
```
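Because the wrapper relies on the `GOOGLE_API_KEY` and `GOOGLE_CSE_ID` environment variables, it can help to check for them before running a search. The sketch below is one way to do that; the `missing_env` and `google_search` helpers are hypothetical.

```python
import os
from typing import List, Mapping, Optional

REQUIRED_ENV = ("GOOGLE_API_KEY", "GOOGLE_CSE_ID")


def missing_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV if not env.get(name)]


def google_search(query: str) -> str:
    """Run a query through the wrapper after checking the environment."""
    missing = missing_env()
    if missing:
        raise RuntimeError(f"set these environment variables first: {', '.join(missing)}")
    # Imported lazily so missing_env works without LangChain installed.
    from langchain.utilities import GoogleSearchAPIWrapper

    return GoogleSearchAPIWrapper().run(query)
```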
## Document Transformer
### Google Document AI
>[Document AI](https://cloud.google.com/document-ai/docs/overview) is a `Google Cloud Platform`
> service to transform unstructured data from documents into structured data, making it easier
> to understand, analyze, and consume.
We need to set up a [`GCS` bucket and create our own OCR processor](https://cloud.google.com/document-ai/docs/create-processor).
The `GCS_OUTPUT_PATH` should be a path to a folder on GCS (starting with `gs://`),
and a processor name should look like `projects/PROJECT_NUMBER/locations/LOCATION/processors/PROCESSOR_ID`.
We can get the processor name either programmatically or by copying it from the `Prediction endpoint` section of the `Processor details`
tab in the Google Cloud Console.
```bash
pip install google-cloud-documentai
pip install google-cloud-documentai-toolbox
```
See a [usage example](/docs/integrations/document_transformers/docai).
```python
from langchain.document_loaders.blob_loaders import Blob
from langchain.document_loaders.parsers import DocAIParser
```
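The processor name and output path formats described above can be assembled and checked before the parser is created. This is a minimal sketch: the helper functions are hypothetical, the `"us"` location is an assumed example value, and the input file is expected to already be on GCS.

```python
def processor_name(project_number: str, location: str, processor_id: str) -> str:
    """Assemble a processor name in the format Document AI expects."""
    return f"projects/{project_number}/locations/{location}/processors/{processor_id}"


def check_gcs_output_path(path: str) -> str:
    """Return the path unchanged if it starts with gs://, else raise."""
    if not path.startswith("gs://"):
        raise ValueError(f"GCS_OUTPUT_PATH must start with gs://, got {path!r}")
    return path


def parse_document(gcs_input_uri: str, gcs_output_path: str, name: str, location: str = "us"):
    """Parse one document stored on GCS and return the resulting Documents."""
    # Imported lazily so the helpers above work without LangChain installed.
    from langchain.document_loaders.blob_loaders import Blob
    from langchain.document_loaders.parsers import DocAIParser

    parser = DocAIParser(
        location=location,  # assumed example region
        processor_name=name,
        gcs_output_path=check_gcs_output_path(gcs_output_path),
    )
    return list(parser.lazy_parse(Blob(path=gcs_input_uri)))
```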