partners: langchain-oceanbase Integration (#28782)

Hi, langchain team! I'm a maintainer of
[OceanBase](https://github.com/oceanbase/oceanbase).

Following the integration guidance, I created a Python library named
[langchain-oceanbase](https://github.com/oceanbase/langchain-oceanbase)
to integrate the `Oceanbase Vector Store` with `Langchain`.

This PR adds the required docs. I would appreciate your feedback.
Thank you!

---------

Signed-off-by: shanhaikang.shk <shanhaikang.shk@oceanbase.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
GITHUBear 2024-12-18 06:51:49 -08:00 committed by GitHub
parent 986b752fc8
commit 33b1fb95b8
3 changed files with 417 additions and 0 deletions

@ -0,0 +1,31 @@
# OceanBase
[OceanBase Database](https://github.com/oceanbase/oceanbase) is a distributed relational database
developed entirely by Ant Group. It is built on a common server cluster and,
based on the Paxos protocol and its distributed structure, provides high availability and linear scalability.
OceanBase also supports vector storage. With SQL, users can:
- Create a table containing vector type fields;
- Create a vector index table based on the HNSW algorithm;
- Perform vector approximate nearest neighbor queries;
- ...
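
To make the vector query bullet above concrete, the sketch below shows what a nearest neighbor query computes under the L2 metric. It is an exact, brute-force search in plain Python, not OceanBase's implementation: an HNSW index approximates this result without scanning every row. The function names and sample vectors are illustrative assumptions.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_nearest(query, rows, k=1):
    """Return the k rows closest to `query` as (row_id, distance) pairs."""
    scored = [(row_id, l2_distance(query, vec)) for row_id, vec in rows.items()]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Hypothetical sample data: row id -> stored vector.
rows = {"a": [0.0, 0.0], "b": [3.0, 4.0], "c": [1.0, 1.0]}
print(exact_nearest([0.0, 0.0], rows, k=2))
```

An approximate index trades a small amount of recall for much lower query cost, so its results may occasionally differ from this exact ranking.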
## Installation
```bash
pip install -U langchain-oceanbase
```
We recommend using Docker to deploy OceanBase:
```shell
docker run --name=ob433 -e MODE=slim -p 2881:2881 -d oceanbase/oceanbase-ce:4.3.3.0-100000132024100711
```
[More methods to deploy OceanBase cluster](https://github.com/oceanbase/oceanbase-doc/blob/V4.3.1/en-US/400.deploy/500.deploy-oceanbase-database-community-edition/100.deployment-overview.md)
## Usage
For a more detailed walkthrough of the OceanBase wrapper, see [this notebook](https://github.com/oceanbase/langchain-oceanbase/blob/main/docs/vectorstores.ipynb).
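
As a minimal sketch of that walkthrough, the helper below builds an `OceanbaseVectorStore` with the same arguments the notebook uses. The `OB_*` environment variable names and both function names are hypothetical conveniences, not part of the package. The default values assume the Docker deployment above, and `DashScopeEmbeddings` (which needs `dashscope` and `langchain-community` installed) is just one possible embedding model.

```python
import os


def default_connection_args() -> dict:
    """Connection settings for a local Docker deployment (assumed defaults).

    The OB_* environment variables are hypothetical overrides for illustration.
    """
    return {
        "host": os.environ.get("OB_HOST", "127.0.0.1"),
        "port": os.environ.get("OB_PORT", "2881"),
        "user": os.environ.get("OB_USER", "root@test"),
        "password": os.environ.get("OB_PASSWORD", ""),
        "db_name": os.environ.get("OB_DB_NAME", "test"),
    }


def build_vector_store(table_name: str = "langchain_vector"):
    """Create the vector store; requires a reachable OceanBase server."""
    # Imported lazily so this module loads even without the optional packages.
    from langchain_community.embeddings import DashScopeEmbeddings
    from langchain_oceanbase.vectorstores import OceanbaseVectorStore

    embeddings = DashScopeEmbeddings(
        model="text-embedding-v1",
        dashscope_api_key=os.environ.get("DASHSCOPE_API_KEY", ""),
    )
    return OceanbaseVectorStore(
        embedding_function=embeddings,
        table_name=table_name,
        connection_args=default_connection_args(),
        vidx_metric_type="l2",
        drop_old=True,  # drops any existing table with the same name
    )
```

Calling `build_vector_store()` returns a store on which `add_documents`, `similarity_search`, and `as_retriever` can be used as in the notebook. Note that `drop_old=True` discards existing data in the table, so it is best suited to experiments.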

@ -0,0 +1,383 @@
{
"cells": [
{
"cell_type": "raw",
"id": "1957f5cb",
"metadata": {},
"source": [
"---\n",
"sidebar_label: Oceanbase\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "ef1f0986",
"metadata": {},
"source": [
"# OceanbaseVectorStore\n",
"\n",
"This notebook covers how to get started with the Oceanbase vector store."
]
},
{
"cell_type": "markdown",
"id": "36fdc060",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"To access Oceanbase vector stores you'll need to deploy a standalone OceanBase server:"
]
},
{
"cell_type": "raw",
"id": "a7b92118",
"metadata": {},
"source": [
"docker run --name=ob433 -e MODE=mini -e OB_SERVER_IP=127.0.0.1 -p 2881:2881 -d quay.io/oceanbase/oceanbase-ce:4.3.3.1-101000012024102216"
]
},
{
"cell_type": "markdown",
"id": "5b687bdc",
"metadata": {},
"source": [
"And install the `langchain-oceanbase` integration package."
]
},
{
"cell_type": "raw",
"id": "64e28aa6",
"metadata": {
"vscode": {
"languageId": "raw"
}
},
"source": [
"%pip install -qU \"langchain-oceanbase\""
]
},
{
"cell_type": "markdown",
"id": "ea850342",
"metadata": {},
"source": [
"Check the connection to OceanBase and set the memory usage ratio for vector data:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "066bbc79",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<sqlalchemy.engine.cursor.CursorResult at 0x12696f2a0>"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from pyobvector import ObVecClient\n",
"\n",
"tmp_client = ObVecClient()\n",
"tmp_client.perform_raw_text_sql(\"ALTER SYSTEM ob_vector_memory_limit_percentage = 30\")"
]
},
{
"cell_type": "markdown",
"id": "93df377e",
"metadata": {},
"source": [
"## Initialization\n",
"\n",
"Configure the API key for the embedding model. Here we use `DashScopeEmbeddings` as an example. When deploying OceanBase with a Docker image as described above, set the `host`, `port`, `user`, `password`, and `db_name` as shown in the script below. For other deployment methods, set these parameters to match your environment."
]
},
{
"cell_type": "raw",
"id": "ff29e3b7",
"metadata": {},
"source": [
"%pip install dashscope"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "dc37144c-208d-4ab3-9f3a-0407a69fe052",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import os\n",
"\n",
"from langchain_community.embeddings import DashScopeEmbeddings\n",
"from langchain_oceanbase.vectorstores import OceanbaseVectorStore\n",
"\n",
"DASHSCOPE_API = os.environ.get(\"DASHSCOPE_API_KEY\", \"\")\n",
"connection_args = {\n",
" \"host\": \"127.0.0.1\",\n",
" \"port\": \"2881\",\n",
" \"user\": \"root@test\",\n",
" \"password\": \"\",\n",
" \"db_name\": \"test\",\n",
"}\n",
"\n",
"embeddings = DashScopeEmbeddings(\n",
" model=\"text-embedding-v1\", dashscope_api_key=DASHSCOPE_API\n",
")\n",
"\n",
"vector_store = OceanbaseVectorStore(\n",
" embedding_function=embeddings,\n",
" table_name=\"langchain_vector\",\n",
" connection_args=connection_args,\n",
" vidx_metric_type=\"l2\",\n",
" drop_old=True,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "ac6071d4",
"metadata": {},
"source": [
"## Manage vector store\n",
"\n",
"### Add items to vector store\n",
"\n",
"Add documents to the vector store with `add_documents`, assigning each an explicit ID:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "17f5efc0",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['1', '2', '3']"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_core.documents import Document\n",
"\n",
"document_1 = Document(page_content=\"foo\", metadata={\"source\": \"https://foo.com\"})\n",
"document_2 = Document(page_content=\"bar\", metadata={\"source\": \"https://bar.com\"})\n",
"document_3 = Document(page_content=\"baz\", metadata={\"source\": \"https://baz.com\"})\n",
"\n",
"documents = [document_1, document_2, document_3]\n",
"\n",
"vector_store.add_documents(documents=documents, ids=[\"1\", \"2\", \"3\"])"
]
},
{
"cell_type": "markdown",
"id": "c738c3e0",
"metadata": {},
"source": [
"### Update items in vector store"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "f0aa8b71",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['1']"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"updated_document = Document(\n",
" page_content=\"qux\", metadata={\"source\": \"https://another-example.com\"}\n",
")\n",
"\n",
"vector_store.add_documents(documents=[updated_document], ids=[\"1\"])"
]
},
{
"cell_type": "markdown",
"id": "dcf1b905",
"metadata": {},
"source": [
"### Delete items from vector store"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "ef61e188",
"metadata": {},
"outputs": [],
"source": [
"vector_store.delete(ids=[\"3\"])"
]
},
{
"cell_type": "markdown",
"id": "c3620501",
"metadata": {},
"source": [
"## Query vector store\n",
"\n",
"Once your vector store has been created and the relevant documents have been added you will most likely wish to query it during the running of your chain or agent. \n",
"\n",
"### Query directly\n",
"\n",
"Performing a simple similarity search can be done as follows:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "aa0a16fa",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"* bar [{'source': 'https://bar.com'}]\n"
]
}
],
"source": [
"results = vector_store.similarity_search(\n",
" query=\"thud\", k=1, filter={\"source\": \"https://another-example.com\"}\n",
")\n",
"for doc in results:\n",
" print(f\"* {doc.page_content} [{doc.metadata}]\")"
]
},
{
"cell_type": "markdown",
"id": "3ed9d733",
"metadata": {},
"source": [
"If you want to execute a similarity search and receive the corresponding scores you can run:"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "5efd2eaa",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"* [SIM=133.452299] bar [{'source': 'https://bar.com'}]\n"
]
}
],
"source": [
"results = vector_store.similarity_search_with_score(\n",
" query=\"thud\", k=1, filter={\"source\": \"https://example.com\"}\n",
")\n",
"for doc, score in results:\n",
" print(f\"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]\")"
]
},
{
"cell_type": "markdown",
"id": "0c235cdc",
"metadata": {},
"source": [
"### Query by turning into retriever\n",
"\n",
"You can also transform the vector store into a retriever for easier usage in your chains. "
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "f3460093",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(metadata={'source': 'https://bar.com'}, page_content='bar')]"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"retriever = vector_store.as_retriever(search_kwargs={\"k\": 1})\n",
"retriever.invoke(\"thud\")"
]
},
{
"cell_type": "markdown",
"id": "901c75dc",
"metadata": {},
"source": [
"## Usage for retrieval-augmented generation\n",
"\n",
"For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n",
"\n",
"- [Tutorials](/docs/tutorials/)\n",
"- [How-to: Question and answer with RAG](https://python.langchain.com/docs/how_to/#qa-with-rag)\n",
"- [Retrieval conceptual docs](https://python.langchain.com/docs/concepts/#retrieval)"
]
},
{
"cell_type": "markdown",
"id": "8a27244f",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all OceanbaseVectorStore features and configurations head to the API reference: https://python.langchain.com/docs/integrations/vectorstores/oceanbase"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -158,3 +158,6 @@ packages:
- name: langchain-linkup
repo: LinkupPlatform/langchain-linkup
path: .
- name: langchain-oceanbase
repo: oceanbase/langchain-oceanbase
path: .