Bagatur/dingo (#9079)

Co-authored-by: gary <1625721671@qq.com>
2026-01-29 21:30:18 +00:00 · 2023-08-11 10:54:45 -07:00
parent 926c64da60
commit 8cb2594562
4 changed files with 614 additions and 0 deletions
--- a/docs/extras/integrations/providers/dingo.mdx
+++ b/docs/extras/integrations/providers/dingo.mdx
@@ -0,0 +1,19 @@
+# Dingo
+
+This page covers how to use the Dingo ecosystem within LangChain.
+It is broken into two parts: installation and setup, and then references to specific Dingo wrappers.
+
+## Installation and Setup
+- Install the Python SDK with `pip install dingodb`
+
+## VectorStore
+
+A wrapper around Dingo indexes lets you use them as a vectorstore,
+whether for semantic search or example selection.
+
+To import this vectorstore:
+```python
+from langchain.vectorstores import Dingo
+```
+
+For a more detailed walkthrough of the Dingo wrapper, see [this notebook](/docs/integrations/vectorstores/dingo.html).
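A minimal sketch of how the wrapper imported above might be used for semantic search. It assumes a running DingoDB instance, an index that already exists, a `DingoDB` client, and an embeddings object, all created elsewhere. The helper name `search_dingo` and its parameters are hypothetical; the `client=` and `index_name=` keyword arguments mirror the `Dingo.from_documents` call in the notebook added by this commit.

```python
from typing import Any, List


def search_dingo(
    texts: List[str],
    query: str,
    embeddings: Any,
    client: Any,
    index_name: str = "langchain-demo",
    k: int = 4,
) -> List[str]:
    """Index `texts` in Dingo and return the text of the top-k matches.

    Hypothetical helper: `embeddings` is any LangChain embeddings object and
    `client` is a connected DingoDB client, both supplied by the caller.
    """
    # Imported lazily so this module loads even where langchain is absent.
    from langchain.vectorstores import Dingo

    # Embed the texts and write them to the (assumed existing) index.
    store = Dingo.from_texts(texts, embeddings, client=client, index_name=index_name)

    # similarity_search returns Document objects; keep only their text.
    return [doc.page_content for doc in store.similarity_search(query, k=k)]
```

Nothing here talks to the database until `search_dingo` is called, so the connection details stay with the caller.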
--- a/docs/extras/integrations/vectorstores/dingo.ipynb
+++ b/docs/extras/integrations/vectorstores/dingo.ipynb
@@ -0,0 +1,244 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "683953b3",
+   "metadata": {},
+   "source": [
+    "# Dingo\n",
+    "\n",
+    ">[Dingo](https://dingodb.readthedocs.io/en/latest/) is a distributed multi-mode vector database, which combines the characteristics of data lakes and vector databases, and can store data of any type and size (Key-Value, PDF, audio, video, etc.). It has real-time low-latency processing capabilities to achieve rapid insight and response, and can efficiently conduct instant analysis and process multi-modal data.\n",
+    "\n",
+    "This notebook shows how to use functionality related to the DingoDB vector database.\n",
+    "\n",
+    "To run, you should have a [DingoDB instance up and running](https://github.com/dingodb/dingo-deploy/blob/main/README.md)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a62cff8a-bcf7-4e33-bbbc-76999c2e3e20",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "!pip install dingodb"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7a0f9e02-8eb0-4aef-b11f-8861360472ee",
+   "metadata": {},
+   "source": [
+    "We use `OpenAIEmbeddings`, so an OpenAI API key is required."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "8b6ed9cd-81b9-46e5-9c20-5aafca2844d0",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "OpenAI API Key:········\n"
+     ]
+    }
+   ],
+   "source": [
+    "import os\n",
+    "import getpass\n",
+    "\n",
+    "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "aac9563e",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "from langchain.embeddings.openai import OpenAIEmbeddings\n",
+    "from langchain.text_splitter import CharacterTextSplitter\n",
+    "from langchain.vectorstores import Dingo\n",
+    "from langchain.document_loaders import TextLoader"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "a3c3999a",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "from langchain.document_loaders import TextLoader\n",
+    "\n",
+    "loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
+    "documents = loader.load()\n",
+    "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
+    "docs = text_splitter.split_documents(documents)\n",
+    "\n",
+    "embeddings = OpenAIEmbeddings()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "dcf88bdf",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "from dingodb import DingoDB\n",
+    "\n",
+    "index_name = \"langchain-demo\"\n",
+    "\n",
+    "dingo_client = DingoDB(user=\"\", password=\"\", host=[\"127.0.0.1:13000\"])\n",
+    "# Create the index only if it does not already exist.\n",
+    "if index_name not in dingo_client.get_index():\n",
+    "    # The OpenAI embedding model `text-embedding-ada-002` uses 1536 dimensions.\n",
+    "    dingo_client.create_index(\n",
+    "        index_name=index_name,\n",
+    "        dimension=1536,\n",
+    "        metric_type=\"cosine\",\n",
+    "        auto_id=False,\n",
+    "    )\n",
+    "\n",
+    "# Embed the documents and store them in the index.\n",
+    "docsearch = Dingo.from_documents(docs, embeddings, client=dingo_client, index_name=index_name)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c3aae49e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.embeddings.openai import OpenAIEmbeddings\n",
+    "from langchain.text_splitter import CharacterTextSplitter\n",
+    "from langchain.vectorstores import Dingo\n",
+    "from langchain.document_loaders import TextLoader"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "a8c513ab",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "query = \"What did the president say about Ketanji Brown Jackson\"\n",
+    "docs = docsearch.similarity_search(query)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "fc516993",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(docs[0].page_content)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1eca81e4",
+   "metadata": {},
+   "source": [
+    "### Adding More Text to an Existing Index\n",
+    "\n",
+    "More text can be embedded and upserted into an existing Dingo index using the `add_texts` function."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e40d558b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
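+    "# Reuses `dingo_client` and `index_name` from above; assumes the wrapper takes (embedding, text_key, client=..., index_name=...).\n",
+    "vectorstore = Dingo(embeddings, \"text\", client=dingo_client, index_name=index_name)\n",
+    "vectorstore.add_texts([\"More text!\"])"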
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bcb858a8",
+   "metadata": {},
+   "source": [
+    "### Maximal Marginal Relevance Searches\n",
+    "\n",
+    "In addition to similarity search, the retriever object also supports `mmr` as its search type."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "649083ab",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "retriever = docsearch.as_retriever(search_type=\"mmr\")\n",
+    "matched_docs = retriever.get_relevant_documents(query)\n",
+    "for i, d in enumerate(matched_docs):\n",
+    "    print(f\"\\n## Document {i}\\n\")\n",
+    "    print(d.page_content)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7d3831ad",
+   "metadata": {},
+   "source": [
+    "Or use `max_marginal_relevance_search` directly:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "732f58b1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "found_docs = docsearch.max_marginal_relevance_search(query, k=2, fetch_k=10)\n",
+    "for i, doc in enumerate(found_docs):\n",
+    "    print(f\"{i + 1}.\", doc.page_content, \"\\n\")"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.11"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
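The last two code cells of the notebook rely on maximal marginal relevance (MMR). As a rough illustration of what that search type does, here is a minimal pure-Python sketch of greedy MMR selection. It is independent of Dingo and LangChain and is not the wrapper's implementation; `mmr_select`, `cosine`, and the toy vectors are hypothetical, and `lambda_mult` is assumed to weigh relevance against diversity.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    k: int = 2,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Greedily pick up to k document indices, trading relevance for diversity."""
    selected: List[int] = []
    candidates = list(range(len(doc_vecs)))

    def score(i: int) -> float:
        # Relevance to the query, penalized by similarity to what is already picked.
        relevance = cosine(query_vec, doc_vecs[i])
        redundancy = max(
            (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
        )
        return lambda_mult * relevance - (1.0 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data: documents 0 and 1 are near-duplicates, document 2 is different.
query_vec = [1.0, 0.6]
doc_vecs = [[1.0, 0.0], [1.0, 0.05], [0.0, 1.0]]

print(mmr_select(query_vec, doc_vecs, k=2, lambda_mult=0.5))  # -> [1, 2]
print(mmr_select(query_vec, doc_vecs, k=2, lambda_mult=1.0))  # -> [1, 0]
```

With `lambda_mult=1.0` the selection reduces to plain similarity ranking and returns both near-duplicates. At `0.5` the second pick is the dissimilar document, which is the behavior the `mmr` search type is meant to provide.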