mirror of
https://github.com/hwchase17/langchain.git
synced 2025-06-07 15:36:30 +00:00
336 lines
11 KiB
Plaintext
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Google Firestore in Datastore Mode\n",
    "\n",
    "> [Firestore in Datastore Mode](https://cloud.google.com/datastore) is a NoSQL document database built for automatic scaling, high performance and ease of application development. Extend your database application to build AI-powered experiences leveraging Datastore's LangChain integrations.\n",
    "\n",
    "This notebook goes over how to use [Firestore in Datastore Mode](https://cloud.google.com/datastore) to [save, load and delete LangChain documents](/docs/how_to#document-loaders) with `DatastoreLoader` and `DatastoreSaver`.\n",
    "\n",
    "Learn more about the package on [GitHub](https://github.com/googleapis/langchain-google-datastore-python/).\n",
    "\n",
    "[Open In Colab](https://colab.research.google.com/github/googleapis/langchain-google-datastore-python/blob/main/docs/document_loader.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Before You Begin\n",
    "\n",
    "To run this notebook, you will need to do the following:\n",
    "\n",
    "* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
    "* [Enable the Datastore API](https://console.cloud.google.com/flows/enableapi?apiid=datastore.googleapis.com)\n",
    "* [Create a Firestore in Datastore Mode database](https://cloud.google.com/datastore/docs/manage-databases)\n",
    "\n",
    "After confirming access to the database in the runtime environment of this notebook, fill in the following values and run the cell before running the example scripts."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 🦜🔗 Library Installation\n",
    "\n",
    "The integration lives in its own `langchain-google-datastore` package, so we need to install it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "%pip install --upgrade --quiet langchain-google-datastore"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Colab only**: Uncomment the following cell to restart the kernel, or use the restart button instead. For Vertex AI Workbench you can restart the terminal using the button on top."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# # Automatically restart kernel after installs so that your environment can access the new packages\n",
    "# import IPython\n",
    "\n",
    "# app = IPython.Application.instance()\n",
    "# app.kernel.do_shutdown(True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### ☁ Set Your Google Cloud Project\n",
    "Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n",
    "\n",
    "If you don't know your project ID, try the following:\n",
    "\n",
    "* Run `gcloud config list`.\n",
    "* Run `gcloud projects list`.\n",
    "* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n",
    "\n",
    "PROJECT_ID = \"my-project-id\"  # @param {type:\"string\"}\n",
    "\n",
    "# Set the project id\n",
    "!gcloud config set project {PROJECT_ID}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 🔐 Authentication\n",
    "\n",
    "Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n",
    "\n",
    "- If you are using Colab to run this notebook, use the cell below and continue.\n",
    "- If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from google.colab import auth\n",
    "\n",
    "auth.authenticate_user()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Basic Usage"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Save documents\n",
    "\n",
    "Save LangChain documents with `DatastoreSaver.upsert_documents(<documents>)`. By default, it tries to extract the entity key from the `key` entry in the Document metadata."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_core.documents import Document\n",
    "from langchain_google_datastore import DatastoreSaver\n",
    "\n",
    "saver = DatastoreSaver()\n",
    "\n",
    "data = [Document(page_content=\"Hello, World!\")]\n",
    "saver.upsert_documents(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Save documents without key\n",
    "\n",
    "If a `kind` is specified, the documents will be stored with an auto-generated ID."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "saver = DatastoreSaver(\"MyKind\")\n",
    "\n",
    "saver.upsert_documents(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load documents via Kind\n",
    "\n",
    "Load LangChain documents with `DatastoreLoader.load()` or `DatastoreLoader.lazy_load()`. `lazy_load` returns a generator that queries the database only during iteration. To initialize the `DatastoreLoader` class you need to provide:\n",
    "1. `source` - The source to load the documents from. It can be an instance of Query or the name of the Datastore kind to read from."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_google_datastore import DatastoreLoader\n",
    "\n",
    "loader = DatastoreLoader(\"MyKind\")\n",
    "data = loader.load()"
   ]
  },
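  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a minimal sketch of the lazy variant, assuming the `MyKind` kind used in the earlier cells exists in your database, `lazy_load()` can be consumed one document at a time. Per the description above, the database is queried only while the loop runs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_google_datastore import DatastoreLoader\n",
    "\n",
    "loader = DatastoreLoader(\"MyKind\")\n",
    "\n",
    "# Iterate lazily instead of materializing the full list with load().\n",
    "for doc in loader.lazy_load():\n",
    "    print(doc.page_content)"
   ]
  },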
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load documents via query\n",
    "\n",
    "Besides loading documents from a kind, we can also load documents from a query. For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from google.cloud import datastore\n",
    "\n",
    "client = datastore.Client(database=\"non-default-db\", namespace=\"custom_namespace\")\n",
    "query_load = client.query(kind=\"MyKind\")\n",
    "query_load.add_filter(\"region\", \"=\", \"west_coast\")\n",
    "\n",
    "loader_document = DatastoreLoader(query_load)\n",
    "\n",
    "data = loader_document.load()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Delete documents\n",
    "\n",
    "Delete a list of LangChain documents from Datastore with `DatastoreSaver.delete_documents(<documents>)`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "saver = DatastoreSaver()\n",
    "\n",
    "saver.delete_documents(data)\n",
    "\n",
    "keys_to_delete = [\n",
    "    [\"Kind1\", \"identifier\"],\n",
    "    [\"Kind2\", 123],\n",
    "    [\"Kind3\", \"identifier\", \"NestedKind\", 456],\n",
    "]\n",
    "# The Documents will be ignored and only the document ids will be used.\n",
    "saver.delete_documents(data, keys_to_delete)"
   ]
  },
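  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The key paths in `keys_to_delete` alternate between a kind and an identifier, so a four-element path names a nested entity. The sketch below only illustrates that reading in plain Python. `split_key_path` is a hypothetical helper and is not part of `langchain-google-datastore`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def split_key_path(path):\n",
    "    # Hypothetical helper: group a flat key path into (kind, identifier) pairs.\n",
    "    if len(path) % 2 != 0:\n",
    "        raise ValueError(\"a complete key path needs an identifier for every kind\")\n",
    "    return [(path[i], path[i + 1]) for i in range(0, len(path), 2)]\n",
    "\n",
    "\n",
    "# The nested path from the cell above splits into a parent pair and a child pair.\n",
    "print(split_key_path([\"Kind3\", \"identifier\", \"NestedKind\", 456]))"
   ]
  },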
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Advanced Usage"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load documents with customized document page content & metadata\n",
    "\n",
    "The `page_content_properties` and `metadata_properties` arguments specify which entity properties are written into the LangChain Document's `page_content` and `metadata`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "loader = DatastoreLoader(\n",
    "    source=\"MyKind\",\n",
    "    page_content_properties=[\"data_field\"],\n",
    "    metadata_properties=[\"metadata_field\"],\n",
    ")\n",
    "\n",
    "data = loader.load()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Customize Page Content Format\n",
    "\n",
    "When `page_content` is built from a single field, it contains only that field's value. Otherwise, `page_content` is formatted as JSON."
   ]
  },
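  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The rule above can be sketched in plain Python. This only illustrates the described behavior and is not the loader's actual implementation. `format_page_content` and the sample entity are hypothetical."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "\n",
    "\n",
    "def format_page_content(entity, fields):\n",
    "    # Hypothetical helper: keep only the requested fields.\n",
    "    selected = {name: entity[name] for name in fields}\n",
    "    # A single field yields its bare value; several fields yield a JSON string.\n",
    "    if len(selected) == 1:\n",
    "        return str(next(iter(selected.values())))\n",
    "    return json.dumps(selected)\n",
    "\n",
    "\n",
    "entity = {\"title\": \"Hello\", \"body\": \"World\"}\n",
    "print(format_page_content(entity, [\"title\"]))\n",
    "print(format_page_content(entity, [\"title\", \"body\"]))"
   ]
  },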
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Customize Connection & Authentication"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from google.auth import compute_engine\n",
    "from google.cloud.datastore import Client\n",
    "\n",
    "# Pass explicit credentials and a non-default database to the Datastore client.\n",
    "client = Client(database=\"non-default-db\", credentials=compute_engine.Credentials())\n",
    "loader = DatastoreLoader(\n",
    "    source=\"foo\",\n",
    "    client=client,\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}