mirror of
https://github.com/hwchase17/langchain.git
synced 2025-05-17 13:01:48 +00:00
399 lines
11 KiB
Plaintext
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Google Firestore (Native Mode)\n",
    "\n",
    "> [Firestore](https://cloud.google.com/firestore) is a serverless document-oriented database that scales to meet any demand. Extend your database application to build AI-powered experiences leveraging Firestore's LangChain integrations.\n",
    "\n",
    "This notebook goes over how to use [Firestore](https://cloud.google.com/firestore) to [save, load and delete LangChain documents](/docs/how_to#document-loaders) with `FirestoreLoader` and `FirestoreSaver`.\n",
    "\n",
    "Learn more about the package on [GitHub](https://github.com/googleapis/langchain-google-firestore-python/).\n",
    "\n",
    "[Open in Colab](https://colab.research.google.com/github/googleapis/langchain-google-firestore-python/blob/main/docs/document_loader.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Before You Begin\n",
    "\n",
    "To run this notebook, you will need to do the following:\n",
    "\n",
    "* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
    "* [Enable the Firestore API](https://console.cloud.google.com/flows/enableapi?apiid=firestore.googleapis.com)\n",
    "* [Create a Firestore database](https://cloud.google.com/firestore/docs/manage-databases)\n",
    "\n",
    "After confirming access to the database in the runtime environment of this notebook, fill in the following values and run the cell before running the example scripts."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# @markdown Please specify a source for demo purposes.\n",
    "SOURCE = \"test\" # @param {type:\"Query\"|\"CollectionGroup\"|\"DocumentReference\"|\"string\"}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 🦜🔗 Library Installation\n",
    "\n",
    "The integration lives in its own `langchain-google-firestore` package, so we need to install it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "%pip install --upgrade --quiet langchain-google-firestore"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Colab only**: Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# # Automatically restart kernel after installs so that your environment can access the new packages\n",
    "# import IPython\n",
    "\n",
    "# app = IPython.Application.instance()\n",
    "# app.kernel.do_shutdown(True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### ☁ Set Your Google Cloud Project\n",
    "Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n",
    "\n",
    "If you don't know your project ID, try the following:\n",
    "\n",
    "* Run `gcloud config list`.\n",
    "* Run `gcloud projects list`.\n",
    "* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n",
    "\n",
    "PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n",
    "\n",
    "# Set the project id\n",
    "!gcloud config set project {PROJECT_ID}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 🔐 Authentication\n",
    "\n",
    "Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n",
    "\n",
    "- If you are using Colab to run this notebook, use the cell below and continue.\n",
    "- If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from google.colab import auth\n",
    "\n",
    "auth.authenticate_user()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Basic Usage"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Save documents\n",
    "\n",
    "`FirestoreSaver` can store Documents in Firestore. By default it will try to extract the Document reference from the metadata.\n",
    "\n",
    "Save LangChain documents with `FirestoreSaver.upsert_documents(<documents>)`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_core.documents import Document\n",
    "from langchain_google_firestore import FirestoreSaver\n",
    "\n",
    "saver = FirestoreSaver()\n",
    "\n",
    "data = [Document(page_content=\"Hello, World!\")]\n",
    "\n",
    "saver.upsert_documents(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Save documents without reference\n",
    "\n",
    "If a collection is specified, the documents will be stored with an auto-generated ID."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "saver = FirestoreSaver(\"Collection\")\n",
    "\n",
    "saver.upsert_documents(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Save documents with other references"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "doc_ids = [\"AnotherCollection/doc_id\", \"foo/bar\"]\n",
    "saver = FirestoreSaver()\n",
    "\n",
    "saver.upsert_documents(documents=data, document_ids=doc_ids)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load from Collection or SubCollection"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Load LangChain documents with `FirestoreLoader.load()` or `FirestoreLoader.lazy_load()`. `lazy_load` returns a generator that only queries the database during iteration. To initialize the `FirestoreLoader` class you need to provide:\n",
    "\n",
    "1. `source` - An instance of a Query, CollectionGroup, DocumentReference or the single `/`-delimited path to a Firestore collection."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_google_firestore import FirestoreLoader\n",
    "\n",
    "loader_collection = FirestoreLoader(\"Collection\")\n",
    "loader_subcollection = FirestoreLoader(\"Collection/doc/SubCollection\")\n",
    "\n",
    "\n",
    "data_collection = loader_collection.load()\n",
    "data_subcollection = loader_subcollection.load()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load a single Document"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from google.cloud import firestore\n",
    "\n",
    "client = firestore.Client()\n",
    "doc_ref = client.collection(\"foo\").document(\"bar\")\n",
    "\n",
    "loader_document = FirestoreLoader(doc_ref)\n",
    "\n",
    "data = loader_document.load()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load from CollectionGroup or Query"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from google.cloud.firestore import CollectionGroup, FieldFilter, Query\n",
    "\n",
    "col_ref = client.collection(\"col_group\")\n",
    "collection_group = CollectionGroup(col_ref)\n",
    "\n",
    "loader_group = FirestoreLoader(collection_group)\n",
    "\n",
    "col_ref = client.collection(\"collection\")\n",
    "query = col_ref.where(filter=FieldFilter(\"region\", \"==\", \"west_coast\"))\n",
    "\n",
    "loader_query = FirestoreLoader(query)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Delete documents\n",
    "\n",
    "Delete a list of LangChain documents from a Firestore collection with `FirestoreSaver.delete_documents(<documents>)`.\n",
    "\n",
    "If document IDs are provided, the Documents will be ignored."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "saver = FirestoreSaver()\n",
    "\n",
    "saver.delete_documents(data)\n",
    "\n",
    "# The Documents will be ignored and only the document ids will be used.\n",
    "saver.delete_documents(data, doc_ids)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Advanced Usage"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load documents with customized page content & metadata\n",
    "\n",
    "The `page_content_fields` and `metadata_fields` arguments specify which Firestore Document fields are written into the LangChain Document's `page_content` and `metadata`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "loader = FirestoreLoader(\n",
    "    source=\"foo/bar/subcol\",\n",
    "    page_content_fields=[\"data_field\"],\n",
    "    metadata_fields=[\"metadata_field\"],\n",
    ")\n",
    "\n",
    "data = loader.load()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Customize Page Content Format\n",
    "\n",
    "When the `page_content` contains only one field, its value is used as-is. Otherwise the `page_content` is formatted as JSON."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Customize Connection & Authentication"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from google.auth import compute_engine\n",
    "from google.cloud.firestore import Client\n",
    "\n",
    "client = Client(database=\"non-default-db\", credentials=compute_engine.Credentials())\n",
    "loader = FirestoreLoader(\n",
    "    source=\"foo\",\n",
    "    client=client,\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}