embeddings update

2025-08-09 21:08:59 +00:00 · 2025-04-25 15:23:57 +02:00 · 2025-04-25 15:23:57 +02:00 · b593cfd453
commit b593cfd453
parent d1043d85fa
2 changed files with 176 additions and 73 deletions
--- a/docs/docs/integrations/text_embedding/google_generative_ai.ipynb
+++ b/docs/docs/integrations/text_embedding/google_generative_ai.ipynb
@@ -1,13 +1,76 @@
 {
 "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "8543d632",
+   "metadata": {},
+   "source": [
+    "---\n",
+    "sidebar_label: Google Gemini\n",
+    "keywords: [google gemini embeddings]\n",
+    "---"
+   ]
+  },
  {
   "cell_type": "markdown",
   "id": "afab8b36-10bb-4795-bc98-75ab2d2081bb",
   "metadata": {},
   "source": [
-    "# Google Generative AI Embeddings\n",
+    "# Google Generative AI Embeddings (AI Studio & Gemini API)\n",
    "\n",
-    "Connect to Google's generative AI embeddings service using the `GoogleGenerativeAIEmbeddings` class, found in the [langchain-google-genai](https://pypi.org/project/langchain-google-genai/) package."
+    "Connect to Google's generative AI embeddings service using the `GoogleGenerativeAIEmbeddings` class, found in the [langchain-google-genai](https://pypi.org/project/langchain-google-genai/) package.\n",
+    "\n",
+    "This guide will help you get started with Google's Generative AI embedding models (like Gemini) using LangChain. For detailed documentation on `GoogleGenerativeAIEmbeddings` features and configuration options, please refer to the [API reference](https://python.langchain.com/v0.2/api_reference/google_genai/embeddings/langchain_google_genai.embeddings.GoogleGenerativeAIEmbeddings.html).\n",
+    "\n",
+    "## Overview\n",
+    "### Integration details\n",
+    "\n",
+    "import { ItemTable } from \"@theme/FeatureTables\";\n",
+    "\n",
+    "<ItemTable category=\"text_embedding\" item=\"Google Gemini\" />\n",
+    "\n",
+    "## Setup\n",
+    "\n",
+    "To access Google Generative AI embedding models you'll need to create a Google Cloud project, enable the Generative Language API, get an API key, and install the `langchain-google-genai` integration package.\n",
+    "\n",
+    "### Credentials\n",
+    "\n",
+    "To use Google Generative AI models, you must have an API key. You can create one in Google AI Studio. See the [Google documentation](https://ai.google.dev/gemini-api/docs/api-key) for instructions.\n",
+    "\n",
+    "Once you have a key, set it as the `GOOGLE_API_KEY` environment variable:\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "47652620",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import getpass\n",
+    "import os\n",
+    "\n",
+    "if not os.getenv(\"GOOGLE_API_KEY\"):\n",
+    "    os.environ[\"GOOGLE_API_KEY\"] = getpass.getpass(\"Enter your Google API key: \")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "67283790",
+   "metadata": {},
+   "source": [
+    "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "eccf1968",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# os.environ[\"LANGSMITH_TRACING\"] = \"true\"\n",
+    "# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")"
   ]
  },
  {
@@ -28,28 +91,6 @@
    "%pip install --upgrade --quiet  langchain-google-genai"
   ]
  },
-  {
-   "cell_type": "markdown",
-   "id": "25f3f88e-164e-400d-b371-9fa488baba19",
-   "metadata": {},
-   "source": [
-    "## Credentials"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "ec89153f-8999-4aab-a21b-0bfba1cc3893",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import getpass\n",
-    "import os\n",
-    "\n",
-    "if \"GOOGLE_API_KEY\" not in os.environ:\n",
-    "    os.environ[\"GOOGLE_API_KEY\"] = getpass.getpass(\"Provide your Google API key here\")"
-   ]
-  },
  {
   "cell_type": "markdown",
   "id": "f2437b22-e364-418a-8c13-490a026cb7b5",
@@ -60,17 +101,21 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": 20,
   "id": "eedc551e-a1f3-4fd8-8d65-4e0784c4441b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
-       "[0.05636945, 0.0048285457, -0.0762591, -0.023642512, 0.05329321]"
+       "[-0.024917153641581535,\n",
+       " 0.012005362659692764,\n",
+       " -0.003886754624545574,\n",
+       " -0.05774897709488869,\n",
+       " 0.0020742062479257584]"
      ]
     },
-     "execution_count": 6,
+     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
@@ -78,7 +123,7 @@
   "source": [
    "from langchain_google_genai import GoogleGenerativeAIEmbeddings\n",
    "\n",
-    "embeddings = GoogleGenerativeAIEmbeddings(model=\"models/text-embedding-004\")\n",
+    "embeddings = GoogleGenerativeAIEmbeddings(model=\"models/gemini-embedding-exp-03-07\")\n",
    "vector = embeddings.embed_query(\"hello, world!\")\n",
    "vector[:5]"
   ]
@@ -95,17 +140,17 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 7,
+   "execution_count": 5,
   "id": "6ec53aba-404f-4778-acd9-5d6664e79ed2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
-       "(3, 768)"
+       "(3, 3072)"
      ]
     },
-     "execution_count": 7,
+     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
@@ -121,6 +166,56 @@
    "len(vectors), len(vectors[0])"
   ]
  },
+  {
+   "cell_type": "markdown",
+   "id": "c362bfbf",
+   "metadata": {},
+   "source": [
+    "## Indexing and Retrieval\n",
+    "\n",
+    "Embedding models are often used in retrieval-augmented generation (RAG) flows, both for indexing data and for retrieving it later. For more detailed instructions, please see our [RAG tutorials](/docs/tutorials/).\n",
+    "\n",
+    "Below, see how to index and retrieve data using the `embeddings` object we initialized above. In this example, we will index and retrieve a sample document in the `InMemoryVectorStore`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 21,
+   "id": "606a7f65",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "'LangChain is the framework for building context-aware reasoning applications'"
+      ]
+     },
+     "execution_count": 21,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# Create a vector store with a sample text\n",
+    "from langchain_core.vectorstores import InMemoryVectorStore\n",
+    "\n",
+    "text = \"LangChain is the framework for building context-aware reasoning applications\"\n",
+    "\n",
+    "vectorstore = InMemoryVectorStore.from_texts(\n",
+    "    [text],\n",
+    "    embedding=embeddings,\n",
+    ")\n",
+    "\n",
+    "# Use the vectorstore as a retriever\n",
+    "retriever = vectorstore.as_retriever()\n",
+    "\n",
+    "# Retrieve the most similar text\n",
+    "retrieved_documents = retriever.invoke(\"What is LangChain?\")\n",
+    "\n",
+    "# Show the retrieved document's content\n",
+    "retrieved_documents[0].page_content"
+   ]
+  },
  {
   "cell_type": "markdown",
   "id": "1482486f-5617-498a-8a44-1974d3212dda",
@@ -129,70 +224,72 @@
    "## Task type\n",
    "`GoogleGenerativeAIEmbeddings` optionally support a `task_type`, which currently must be one of:\n",
    "\n",
-    "- task_type_unspecified\n",
-    "- retrieval_query\n",
-    "- retrieval_document\n",
-    "- semantic_similarity\n",
-    "- classification\n",
-    "- clustering\n",
+    "- `SEMANTIC_SIMILARITY`: Used to generate embeddings that are optimized to assess text similarity.\n",
+    "- `CLASSIFICATION`: Used to generate embeddings that are optimized to classify texts according to preset labels.\n",
+    "- `CLUSTERING`: Used to generate embeddings that are optimized to cluster texts based on their similarities.\n",
+    "- `RETRIEVAL_DOCUMENT`, `RETRIEVAL_QUERY`, `QUESTION_ANSWERING`, and `FACT_VERIFICATION`: Used to generate embeddings that are optimized for document search or information retrieval.\n",
+    "- `CODE_RETRIEVAL_QUERY`: Used to retrieve a code block based on a natural language query, such as \"sort an array\" or \"reverse a linked list\". Embeddings of the code blocks are computed using `RETRIEVAL_DOCUMENT`.\n",
    "\n",
-    "By default, we use `retrieval_document` in the `embed_documents` method and `retrieval_query` in the `embed_query` method. If you provide a task type, we will use that for all methods."
+    "By default, we use `RETRIEVAL_DOCUMENT` in the `embed_documents` method and `RETRIEVAL_QUERY` in the `embed_query` method. If you provide a task type, we will use that for all methods."
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": 15,
-   "id": "a223bb25-2b1b-418e-a570-2f543083132e",
+   "execution_count": null,
+   "id": "b7acc5c2",
   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Note: you may need to restart the kernel to use updated packages.\n"
-     ]
-    }
-   ],
+   "outputs": [],
   "source": [
    "%pip install --upgrade --quiet  matplotlib scikit-learn"
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": 33,
+   "execution_count": 19,
   "id": "f1f077db-8eb4-49f7-8866-471a8528dcdb",
   "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Document 1\n",
+      "Cosine similarity with query: 0.7892893360164779\n",
+      "---\n",
+      "Document 2\n",
+      "Cosine similarity with query: 0.5438283285204146\n",
+      "---\n"
+     ]
+    }
+   ],
   "source": [
+    "from langchain_google_genai import GoogleGenerativeAIEmbeddings\n",
+    "from sklearn.metrics.pairwise import cosine_similarity\n",
+    "\n",
    "query_embeddings = GoogleGenerativeAIEmbeddings(\n",
-    "    model=\"models/embedding-001\", task_type=\"retrieval_query\"\n",
+    "    model=\"models/gemini-embedding-exp-03-07\", task_type=\"RETRIEVAL_QUERY\"\n",
    ")\n",
    "doc_embeddings = GoogleGenerativeAIEmbeddings(\n",
-    "    model=\"models/embedding-001\", task_type=\"retrieval_document\"\n",
-    ")"
+    "    model=\"models/gemini-embedding-exp-03-07\", task_type=\"RETRIEVAL_DOCUMENT\"\n",
+    ")\n",
+    "\n",
+    "q_embed = query_embeddings.embed_query(\"What is the capital of France?\")\n",
+    "d_embed = doc_embeddings.embed_documents([\"The capital of France is Paris.\", \"Philipp likes to eat pizza.\"])\n",
+    "\n",
+    "for i, d in enumerate(d_embed):\n",
+    "    print(f\"Document {i+1}\")\n",
+    "    print(f\"Cosine similarity with query: {cosine_similarity([q_embed], [d])[0][0]}\")\n",
+    "    print(\"---\")\n"
   ]
  },
  {
   "cell_type": "markdown",
-   "id": "79bd4a5e-75ba-413c-befa-86167c938caf",
+   "id": "f45ea7b1",
   "metadata": {},
   "source": [
-    "All of these will be embedded with the 'retrieval_query' task set\n",
-    "```python\n",
-    "query_vecs = [query_embeddings.embed_query(q) for q in [query, query_2, answer_1]]\n",
-    "```\n",
-    "All of these will be embedded with the 'retrieval_document' task set\n",
-    "```python\n",
-    "doc_vecs = [doc_embeddings.embed_query(q) for q in [query, query_2, answer_1]]\n",
-    "```"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "9e1fae5e-0f84-4812-89f5-7d4d71affbc1",
-   "metadata": {},
-   "source": [
-    "In retrieval, relative distance matters. In the image above, you can see the difference in similarity scores between the \"relevant doc\" and \"simil stronger delta between the similar query and relevant doc on the latter case."
+    "## API Reference\n",
+    "\n",
+    "For detailed documentation on `GoogleGenerativeAIEmbeddings` features and configuration options, please refer to the [API reference](https://python.langchain.com/api_reference/google_genai/embeddings/langchain_google_genai.embeddings.GoogleGenerativeAIEmbeddings.html).\n"
   ]
  },
  {
@@ -211,7 +308,7 @@
 ],
 "metadata": {
  "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
+   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
@@ -225,7 +322,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.9.1"
+   "version": "3.9.6"
  }
 },
 "nbformat": 4,
--- a/docs/src/theme/FeatureTables.js
+++ b/docs/src/theme/FeatureTables.js
@@ -366,6 +366,12 @@ const FEATURE_TABLES = {
                package: "langchain-openai",
                apiLink: "https://python.langchain.com/api_reference/openai/chat_models/langchain_openai.chat_models.base.ChatOpenAI.html"
            },
+            {
+                name: "Google Gemini",
+                link: "google-generative-ai",
+                package: "langchain-google-genai",
+                apiLink: "https://python.langchain.com/api_reference/google_genai/embeddings/langchain_google_genai.embeddings.GoogleGenerativeAIEmbeddings.html"
+            },
            {
                name: "Together",
                link: "together",