mirror of
https://github.com/hwchase17/langchain.git
synced 2025-09-06 21:43:44 +00:00
community[minor]: Oraclevs integration (#21123)
Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads and allows you to query data based on semantics rather than keywords. One of its biggest benefits is that semantic search on unstructured data can be combined with relational search on business data in a single system. This is not only powerful but also significantly more effective, because you don't need to add a specialized vector database, which eliminates the pain of data fragmentation between multiple systems.

This pull request adds the following functionality:
- Oracle AI Vector Search: Vector Store
- Oracle AI Vector Search: Document Loader
- Oracle AI Vector Search: Document Splitter
- Oracle AI Vector Search: Summary
- Oracle AI Vector Search: Oracle Embeddings

We have added unit tests and have our own local unit test suite, which verifies that the code is correct. We have added a guide for each component and one end-to-end guide that shows how the whole pipeline runs. We have made sure that `make format` and `make lint` run clean.

---------
Co-authored-by: skmishraoracle <shailendra.mishra@oracle.com>
Co-authored-by: hroyofc <harichandan.roy@oracle.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
This commit is contained in:
236
docs/docs/integrations/document_loaders/oracleai.ipynb
Normal file
@@ -0,0 +1,236 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Oracle AI Vector Search: Document Processing\n",
|
||||
"Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads that allows you to query data based on semantics, rather than keywords. One of the biggest benefit of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in one single system. This is not only powerful but also significantly more effective because you don't need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems.\n",
|
||||
"\n",
|
||||
"The guide demonstrates how to use Document Processing Capabilities within Oracle AI Vector Search to load and chunk documents using OracleDocLoader and OracleTextSplitter respectively."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Prerequisites\n",
|
||||
"\n",
|
||||
"Please install Oracle Python Client driver to use Langchain with Oracle AI Vector Search. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# pip install oracledb"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Connect to Oracle Database\n",
|
||||
"The following sample code will show how to connect to Oracle Database. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import sys\n",
|
||||
"\n",
|
||||
"import oracledb\n",
|
||||
"\n",
|
||||
"# please update with your username, password, hostname and service_name\n",
|
||||
"username = \"<username>\"\n",
|
||||
"password = \"<password>\"\n",
|
||||
"dsn = \"<hostname>/<service_name>\"\n",
|
||||
"\n",
|
||||
"try:\n",
|
||||
" conn = oracledb.connect(user=username, password=password, dsn=dsn)\n",
|
||||
" print(\"Connection successful!\")\n",
|
||||
"except Exception as e:\n",
|
||||
" print(\"Connection failed!\")\n",
|
||||
" sys.exit(1)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now let's create a table and insert some sample docs to test."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"try:\n",
|
||||
" cursor = conn.cursor()\n",
|
||||
"\n",
|
||||
" drop_table_sql = \"\"\"drop table if exists demo_tab\"\"\"\n",
|
||||
" cursor.execute(drop_table_sql)\n",
|
||||
"\n",
|
||||
" create_table_sql = \"\"\"create table demo_tab (id number, data clob)\"\"\"\n",
|
||||
" cursor.execute(create_table_sql)\n",
|
||||
"\n",
|
||||
" insert_row_sql = \"\"\"insert into demo_tab values (:1, :2)\"\"\"\n",
|
||||
" rows_to_insert = [\n",
|
||||
" (\n",
|
||||
" 1,\n",
|
||||
" \"If the answer to any preceding questions is yes, then the database stops the search and allocates space from the specified tablespace; otherwise, space is allocated from the database default shared temporary tablespace.\",\n",
|
||||
" ),\n",
|
||||
" (\n",
|
||||
" 2,\n",
|
||||
" \"A tablespace can be online (accessible) or offline (not accessible) whenever the database is open.\\nA tablespace is usually online so that its data is available to users. The SYSTEM tablespace and temporary tablespaces cannot be taken offline.\",\n",
|
||||
" ),\n",
|
||||
" (\n",
|
||||
" 3,\n",
|
||||
" \"The database stores LOBs differently from other data types. Creating a LOB column implicitly creates a LOB segment and a LOB index. The tablespace containing the LOB segment and LOB index, which are always stored together, may be different from the tablespace containing the table.\\nSometimes the database can store small amounts of LOB data in the table itself rather than in a separate LOB segment.\",\n",
|
||||
" ),\n",
|
||||
" ]\n",
|
||||
" cursor.executemany(insert_row_sql, rows_to_insert)\n",
|
||||
"\n",
|
||||
" conn.commit()\n",
|
||||
"\n",
|
||||
" print(\"Table created and populated.\")\n",
|
||||
" cursor.close()\n",
|
||||
"except Exception as e:\n",
|
||||
" print(\"Table creation failed.\")\n",
|
||||
" cursor.close()\n",
|
||||
" conn.close()\n",
|
||||
" sys.exit(1)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Load Documents\n",
|
||||
"The users can load the documents from Oracle Database or a file system or both. They just need to set the loader parameters accordingly. Please refer to the Oracle AI Vector Search Guide book for complete information about these parameters.\n",
|
||||
"\n",
|
||||
"The main benefit of using OracleDocLoader is that it can handle 150+ different file formats. You don't need to use different types of loader for different file formats. Here is the list of the formats that we support: [Oracle Text Supported Document Formats](https://docs.oracle.com/en/database/oracle/oracle-database/23/ccref/oracle-text-supported-document-formats.html)\n",
|
||||
"\n",
|
||||
"The following sample code will show how to do that:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.document_loaders.oracleai import OracleDocLoader\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
"\n",
|
||||
"\"\"\"\n",
|
||||
"# loading a local file\n",
|
||||
"loader_params = {}\n",
|
||||
"loader_params[\"file\"] = \"<file>\"\n",
|
||||
"\n",
|
||||
"# loading from a local directory\n",
|
||||
"loader_params = {}\n",
|
||||
"loader_params[\"dir\"] = \"<directory>\"\n",
|
||||
"\"\"\"\n",
|
||||
"\n",
|
||||
"# loading from Oracle Database table\n",
|
||||
"loader_params = {\n",
|
||||
" \"owner\": \"<owner>\",\n",
|
||||
" \"tablename\": \"demo_tab\",\n",
|
||||
" \"colname\": \"data\",\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"\"\"\" load the docs \"\"\"\n",
|
||||
"loader = OracleDocLoader(conn=conn, params=loader_params)\n",
|
||||
"docs = loader.load()\n",
|
||||
"\n",
|
||||
"\"\"\" verify \"\"\"\n",
|
||||
"print(f\"Number of docs loaded: {len(docs)}\")\n",
|
||||
"# print(f\"Document-0: {docs[0].page_content}\") # content"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Split Documents\n",
|
||||
"The documents can be in different sizes: small, medium, large, or very large. The users like to split/chunk their documents into smaller pieces to generate embeddings. There are lots of different splitting customizations the users can do. Please refer to the Oracle AI Vector Search Guide book for complete information about these parameters.\n",
|
||||
"\n",
|
||||
"The following sample code will show how to do that:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.document_loaders.oracleai import OracleTextSplitter\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
"\n",
|
||||
"\"\"\"\n",
|
||||
"# Some examples\n",
|
||||
"# split by chars, max 500 chars\n",
|
||||
"splitter_params = {\"split\": \"chars\", \"max\": 500, \"normalize\": \"all\"}\n",
|
||||
"\n",
|
||||
"# split by words, max 100 words\n",
|
||||
"splitter_params = {\"split\": \"words\", \"max\": 100, \"normalize\": \"all\"}\n",
|
||||
"\n",
|
||||
"# split by sentence, max 20 sentences\n",
|
||||
"splitter_params = {\"split\": \"sentence\", \"max\": 20, \"normalize\": \"all\"}\n",
|
||||
"\"\"\"\n",
|
||||
"\n",
|
||||
"# split by default parameters\n",
|
||||
"splitter_params = {\"normalize\": \"all\"}\n",
|
||||
"\n",
|
||||
"# get the splitter instance\n",
|
||||
"splitter = OracleTextSplitter(conn=conn, params=splitter_params)\n",
|
||||
"\n",
|
||||
"list_chunks = []\n",
|
||||
"for doc in docs:\n",
|
||||
" chunks = splitter.split_text(doc.page_content)\n",
|
||||
" list_chunks.extend(chunks)\n",
|
||||
"\n",
|
||||
"\"\"\" verify \"\"\"\n",
|
||||
"print(f\"Number of Chunks: {len(list_chunks)}\")\n",
|
||||
"# print(f\"Chunk-0: {list_chunks[0]}\") # content"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### End to End Demo\n",
|
||||
"Please refer to our complete demo guide [Oracle AI Vector Search End-to-End Demo Guide](https://github.com/langchain-ai/langchain/tree/master/cookbook/oracleai_demo.ipynb) to build an end to end RAG pipeline with the help of Oracle AI Vector Search.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
65
docs/docs/integrations/providers/oracleai.mdx
Normal file
@@ -0,0 +1,65 @@
|
||||
# Oracle AI Vector Search
|
||||
Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads and allows you to query data based on semantics rather than keywords. One of its biggest benefits is that semantic search on unstructured data can be combined with relational search on business data in a single system.
|
||||
This is not only powerful but also significantly more effective, because you don't need to add a specialized vector database, which eliminates the pain of data fragmentation between multiple systems.
|
||||
|
||||
In addition, because Oracle has been building database technologies for so long, your vectors can benefit from all of Oracle Database's most powerful features, like the following:
|
||||
|
||||
* Partitioning Support
|
||||
* Real Application Clusters scalability
|
||||
* Exadata smart scans
|
||||
* Shard processing across geographically distributed databases
|
||||
* Transactions
|
||||
* Parallel SQL
|
||||
* Disaster recovery
|
||||
* Security
|
||||
* Oracle Machine Learning
|
||||
* Oracle Graph Database
|
||||
* Oracle Spatial and Graph
|
||||
* Oracle Blockchain
|
||||
* JSON
|
||||
|
||||
|
||||
## Document Loaders
|
||||
|
||||
Please check the [usage example](/docs/integrations/document_loaders/oracleai).
|
||||
|
||||
```python
|
||||
from langchain_community.document_loaders.oracleai import OracleDocLoader
|
||||
```
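As a rough sketch of how the loader might be configured, the helper below builds the `params` dictionary for the three sources shown in the usage example: a single file, a directory, or a database table. The names `build_loader_params` and `load_documents` are hypothetical and exist only for illustration. The keys `file`, `dir`, `owner`, `tablename`, and `colname` are taken from the usage example, and restricting each call to one source is a simplification of this sketch. The `langchain_community` import is deferred so the parameter logic can be tried without a database connection.

```python
from typing import Any, Dict, List, Optional


def build_loader_params(
    file: Optional[str] = None,
    directory: Optional[str] = None,
    owner: Optional[str] = None,
    tablename: Optional[str] = None,
    colname: Optional[str] = None,
) -> Dict[str, Optional[str]]:
    """Hypothetical helper: build loader params for exactly one source."""
    table_args = (owner, tablename, colname)
    wants_table = any(arg is not None for arg in table_args)
    chosen = [file is not None, directory is not None, wants_table]
    if sum(chosen) != 1:
        raise ValueError("Specify exactly one source: file, directory, or table.")
    if file is not None:
        return {"file": file}
    if directory is not None:
        return {"dir": directory}
    if not all(table_args):
        raise ValueError("A table source needs owner, tablename, and colname.")
    return {"owner": owner, "tablename": tablename, "colname": colname}


def load_documents(conn: Any, params: Dict[str, Optional[str]]) -> List[Any]:
    """Load documents; `conn` is assumed to be an open oracledb connection."""
    # Imported lazily so build_loader_params works without langchain_community.
    from langchain_community.document_loaders.oracleai import OracleDocLoader

    return OracleDocLoader(conn=conn, params=params).load()


# Same table source as in the usage example; "<owner>" is a placeholder.
table_params = build_loader_params(owner="<owner>", tablename="demo_tab", colname="data")
print(table_params)
```

With a live connection, `load_documents(conn, table_params)` would return the loaded documents.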
|
||||
|
||||
## Text Splitter
|
||||
|
||||
Please check the [usage example](/docs/integrations/document_loaders/oracleai).
|
||||
|
||||
```python
|
||||
from langchain_community.document_loaders.oracleai import OracleTextSplitter
|
||||
```
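The chunking loop from the usage example can be sketched as a small function. `chunk_documents`, `Doc`, and `FakeSplitter` are hypothetical names: the function assumes only an object with a `split_text(text)` method (the way `OracleTextSplitter` is used in the usage example) and documents with a `page_content` attribute, so a stand-in splitter keeps the sketch runnable without a database.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List


@dataclass
class Doc:
    """Stand-in for a LangChain Document; only page_content is used here."""

    page_content: str


class FakeSplitter:
    """Stand-in splitter (assumption): cuts text into fixed-size character chunks."""

    def __init__(self, max_chars: int) -> None:
        self.max_chars = max_chars

    def split_text(self, text: str) -> List[str]:
        step = self.max_chars
        return [text[i : i + step] for i in range(0, len(text), step)]


def chunk_documents(splitter: Any, docs: Iterable[Any]) -> List[str]:
    """Split every document and collect all chunks in one flat list."""
    chunks: List[str] = []
    for doc in docs:
        chunks.extend(splitter.split_text(doc.page_content))
    return chunks


docs = [Doc("abcdefghij"), Doc("klm")]
chunks = chunk_documents(FakeSplitter(max_chars=4), docs)
print(f"Number of chunks: {len(chunks)}")
```

With a live connection, `OracleTextSplitter(conn=conn, params={"split": "chars", "max": 500, "normalize": "all"})` from the usage example would take the place of `FakeSplitter`.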
|
||||
|
||||
## Embeddings
|
||||
|
||||
Please check the [usage example](/docs/integrations/text_embedding/oracleai).
|
||||
|
||||
```python
|
||||
from langchain_community.embeddings.oracleai import OracleEmbeddings
|
||||
```
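The usage example distinguishes the `database` provider, which needs an ONNX model loaded into the database, from third-party providers, which need a credential. A minimal sketch of checking the parameters before creating the embedder follows. The helper names `missing_embedder_keys` and `embed_query` are hypothetical, and the list of required keys is inferred from the examples in the usage guide, not from an authoritative schema.

```python
from typing import Any, Dict, List


def missing_embedder_keys(params: Dict[str, Any]) -> List[str]:
    """Return required keys absent from params (requirements are assumed)."""
    required = ["provider", "model"]
    # Third-party providers in the usage example also carry a credential and URL.
    if params.get("provider") != "database":
        required += ["credential_name", "url"]
    return [key for key in required if key not in params]


def embed_query(conn: Any, params: Dict[str, Any], text: str) -> List[float]:
    """Validate params, then embed text; `conn` is an open oracledb connection."""
    missing = missing_embedder_keys(params)
    if missing:
        raise ValueError(f"Missing embedder parameters: {missing}")
    # Imported lazily so the validation can run without langchain_community.
    from langchain_community.embeddings.oracleai import OracleEmbeddings

    return OracleEmbeddings(conn=conn, params=params).embed_query(text)


# "demo_model" is the ONNX model name used in the usage example.
database_params = {"provider": "database", "model": "demo_model"}
print(missing_embedder_keys(database_params))
```

Failing early on incomplete parameters keeps configuration mistakes on the client side instead of surfacing them as database errors.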
|
||||
|
||||
## Summary
|
||||
|
||||
Please check the [usage example](/docs/integrations/tools/oracleai).
|
||||
|
||||
```python
|
||||
from langchain_community.utilities.oracleai import OracleSummary
|
||||
```
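One way to switch between summary providers is to keep the parameter sets in a lookup table, sketched below. The parameter values are copied from the usage example, including its placeholder credential name. `SUMMARY_PARAMS`, `summary_params_for`, and `summarize` are hypothetical names, and the proxy is passed only when one is supplied, following the note in the usage example about removing it when it is not required.

```python
import copy
from typing import Any, Dict, Optional

# Parameter sets copied from the usage example.
SUMMARY_PARAMS: Dict[str, Dict[str, Any]] = {
    "database": {
        "provider": "database",
        "glevel": "S",
        "numParagraphs": 1,
        "language": "english",
    },
    "huggingface": {
        "provider": "huggingface",
        "credential_name": "HF_CRED",
        "url": "https://api-inference.huggingface.co/models/",
        "model": "facebook/bart-large-cnn",
        "wait_for_model": "true",
    },
}


def summary_params_for(provider: str) -> Dict[str, Any]:
    """Return a copy of the parameter set so callers can modify it safely."""
    if provider not in SUMMARY_PARAMS:
        raise ValueError(f"No parameter set defined for provider {provider!r}")
    return copy.deepcopy(SUMMARY_PARAMS[provider])


def summarize(
    conn: Any, text: str, provider: str = "database", proxy: Optional[str] = None
) -> str:
    """Summarize text; `conn` is assumed to be an open oracledb connection."""
    # Imported lazily so the lookup above works without langchain_community.
    from langchain_community.utilities.oracleai import OracleSummary

    kwargs: Dict[str, Any] = {"conn": conn, "params": summary_params_for(provider)}
    if proxy:
        kwargs["proxy"] = proxy
    return OracleSummary(**kwargs).get_summary(text)


print(summary_params_for("database"))
```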
|
||||
|
||||
## Vector Store
|
||||
|
||||
Please check the [usage example](/docs/integrations/vectorstores/oracle).
|
||||
|
||||
```python
|
||||
from langchain_community.vectorstores.oraclevs import OracleVS
|
||||
```
|
||||
|
||||
## End to End Demo
|
||||
|
||||
Please check the [Oracle AI Vector Search End-to-End Demo Guide](https://github.com/langchain-ai/langchain/blob/master/cookbook/oracleai_demo.ipynb).
|
||||
|
262
docs/docs/integrations/text_embedding/oracleai.ipynb
Normal file
@@ -0,0 +1,262 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Oracle AI Vector Search: Generate Embeddings\n",
|
||||
"Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads that allows you to query data based on semantics, rather than keywords. One of the biggest benefit of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in one single system. This is not only powerful but also significantly more effective because you don't need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems.\n",
|
||||
"\n",
|
||||
"The guide demonstrates how to use Embedding Capabilities within Oracle AI Vector Search to generate embeddings for your documents using OracleEmbeddings."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Prerequisites\n",
|
||||
"\n",
|
||||
"Please install Oracle Python Client driver to use Langchain with Oracle AI Vector Search. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# pip install oracledb"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Connect to Oracle Database\n",
|
||||
"The following sample code will show how to connect to Oracle Database. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import sys\n",
|
||||
"\n",
|
||||
"import oracledb\n",
|
||||
"\n",
|
||||
"# please update with your username, password, hostname and service_name\n",
|
||||
"username = \"<username>\"\n",
|
||||
"password = \"<password>\"\n",
|
||||
"dsn = \"<hostname>/<service_name>\"\n",
|
||||
"\n",
|
||||
"try:\n",
|
||||
" conn = oracledb.connect(user=username, password=password, dsn=dsn)\n",
|
||||
" print(\"Connection successful!\")\n",
|
||||
"except Exception as e:\n",
|
||||
" print(\"Connection failed!\")\n",
|
||||
" sys.exit(1)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"For embedding, we have a few provider options that the users can choose from such as database, 3rd party providers like ocigenai, huggingface, openai, etc. If the users choose to use 3rd party provider, they need to create a credential with corresponding authentication information. On the other hand, if the users choose to use 'database' as provider, they need to load an onnx model to Oracle Database for embeddings."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Load ONNX Model\n",
|
||||
"\n",
|
||||
"To generate embeddings, Oracle provides a few provider options for users to choose from. The users can choose 'database' provider or some 3rd party providers like OCIGENAI, HuggingFace, etc.\n",
|
||||
"\n",
|
||||
"***Note*** If the users choose database option, they need to load an ONNX model to Oracle Database. The users do not need to load an ONNX model to Oracle Database if they choose to use 3rd party provider to generate embeddings.\n",
|
||||
"\n",
|
||||
"One of the core benefits of using an ONNX model is that the users do not need to transfer their data to 3rd party to generate embeddings. And also, since it does not involve any network or REST API calls, it may provide better performance.\n",
|
||||
"\n",
|
||||
"Here is the sample code to load an ONNX model to Oracle Database:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.embeddings.oracleai import OracleEmbeddings\n",
|
||||
"\n",
|
||||
"# please update with your related information\n",
|
||||
"# make sure that you have onnx file in the system\n",
|
||||
"onnx_dir = \"DEMO_DIR\"\n",
|
||||
"onnx_file = \"tinybert.onnx\"\n",
|
||||
"model_name = \"demo_model\"\n",
|
||||
"\n",
|
||||
"try:\n",
|
||||
" OracleEmbeddings.load_onnx_model(conn, onnx_dir, onnx_file, model_name)\n",
|
||||
" print(\"ONNX model loaded.\")\n",
|
||||
"except Exception as e:\n",
|
||||
" print(\"ONNX model loading failed!\")\n",
|
||||
" sys.exit(1)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create Credential\n",
|
||||
"\n",
|
||||
"On the other hand, if the users choose to use 3rd party provider to generate embeddings, they need to create credential to access 3rd party provider's end points.\n",
|
||||
"\n",
|
||||
"***Note:*** The users do not need to create any credential if they choose to use 'database' provider to generate embeddings. Should the users choose to 3rd party provider, they need to create credential for the 3rd party provider they want to use. \n",
|
||||
"\n",
|
||||
"Here is a sample example:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"try:\n",
|
||||
" cursor = conn.cursor()\n",
|
||||
" cursor.execute(\n",
|
||||
" \"\"\"\n",
|
||||
" declare\n",
|
||||
" jo json_object_t;\n",
|
||||
" begin\n",
|
||||
" -- HuggingFace\n",
|
||||
" dbms_vector_chain.drop_credential(credential_name => 'HF_CRED');\n",
|
||||
" jo := json_object_t();\n",
|
||||
" jo.put('access_token', '<access_token>');\n",
|
||||
" dbms_vector_chain.create_credential(\n",
|
||||
" credential_name => 'HF_CRED',\n",
|
||||
" params => json(jo.to_string));\n",
|
||||
"\n",
|
||||
" -- OCIGENAI\n",
|
||||
" dbms_vector_chain.drop_credential(credential_name => 'OCI_CRED');\n",
|
||||
" jo := json_object_t();\n",
|
||||
" jo.put('user_ocid','<user_ocid>');\n",
|
||||
" jo.put('tenancy_ocid','<tenancy_ocid>');\n",
|
||||
" jo.put('compartment_ocid','<compartment_ocid>');\n",
|
||||
" jo.put('private_key','<private_key>');\n",
|
||||
" jo.put('fingerprint','<fingerprint>');\n",
|
||||
" dbms_vector_chain.create_credential(\n",
|
||||
" credential_name => 'OCI_CRED',\n",
|
||||
" params => json(jo.to_string));\n",
|
||||
" end;\n",
|
||||
" \"\"\"\n",
|
||||
" )\n",
|
||||
" cursor.close()\n",
|
||||
" print(\"Credentials created.\")\n",
|
||||
"except Exception as ex:\n",
|
||||
" cursor.close()\n",
|
||||
" raise"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Generate Embeddings\n",
|
||||
"Oracle AI Vector Search provides a number of ways to generate embeddings. The users can load an ONNX embedding model to Oracle Database and use it to generate embeddings or use some 3rd party API's end points to generate embeddings. Please refer to the Oracle AI Vector Search Guide book for complete information about these parameters."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"***Note:*** The users may need to set proxy if they want to use some 3rd party embedding generation providers other than 'database' provider (aka using ONNX model)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# proxy to be used when we instantiate summary and embedder object\n",
|
||||
"proxy = \"<proxy>\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The following sample code will show how to generate embeddings:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.embeddings.oracleai import OracleEmbeddings\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
"\n",
|
||||
"\"\"\"\n",
|
||||
"# using ocigenai\n",
|
||||
"embedder_params = {\n",
|
||||
" \"provider\": \"ocigenai\",\n",
|
||||
" \"credential_name\": \"OCI_CRED\",\n",
|
||||
" \"url\": \"https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/20231130/actions/embedText\",\n",
|
||||
" \"model\": \"cohere.embed-english-light-v3.0\",\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"# using huggingface\n",
|
||||
"embedder_params = {\n",
|
||||
" \"provider\": \"huggingface\", \n",
|
||||
" \"credential_name\": \"HF_CRED\", \n",
|
||||
" \"url\": \"https://api-inference.huggingface.co/pipeline/feature-extraction/\", \n",
|
||||
" \"model\": \"sentence-transformers/all-MiniLM-L6-v2\", \n",
|
||||
" \"wait_for_model\": \"true\"\n",
|
||||
"}\n",
|
||||
"\"\"\"\n",
|
||||
"\n",
|
||||
"# using ONNX model loaded to Oracle Database\n",
|
||||
"embedder_params = {\"provider\": \"database\", \"model\": \"demo_model\"}\n",
|
||||
"\n",
|
||||
"# Remove proxy if not required\n",
|
||||
"embedder = OracleEmbeddings(conn=conn, params=embedder_params, proxy=proxy)\n",
|
||||
"embed = embedder.embed_query(\"Hello World!\")\n",
|
||||
"\n",
|
||||
"\"\"\" verify \"\"\"\n",
|
||||
"print(f\"Embedding generated by OracleEmbeddings: {embed}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### End to End Demo\n",
|
||||
"Please refer to our complete demo guide [Oracle AI Vector Search End-to-End Demo Guide](https://github.com/langchain-ai/langchain/tree/master/cookbook/oracleai_demo.ipynb) to build an end to end RAG pipeline with the help of Oracle AI Vector Search.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
174
docs/docs/integrations/tools/oracleai.ipynb
Normal file
@@ -0,0 +1,174 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Oracle AI Vector Search: Generate Summary\n",
|
||||
"Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads that allows you to query data based on semantics, rather than keywords. One of the biggest benefit of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in one single system. This is not only powerful but also significantly more effective because you don't need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems.\n",
|
||||
"\n",
|
||||
"The guide demonstrates how to use Summary Capabilities within Oracle AI Vector Search to generate summary for your documents using OracleSummary."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Prerequisites\n",
|
||||
"\n",
|
||||
"Please install Oracle Python Client driver to use Langchain with Oracle AI Vector Search. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# pip install oracledb"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Connect to Oracle Database\n",
|
||||
"The following sample code will show how to connect to Oracle Database. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import sys\n",
|
||||
"\n",
|
||||
"import oracledb\n",
|
||||
"\n",
|
||||
"# please update with your username, password, hostname and service_name\n",
|
||||
"username = \"<username>\"\n",
|
||||
"password = \"<password>\"\n",
|
||||
"dsn = \"<hostname>/<service_name>\"\n",
|
||||
"\n",
|
||||
"try:\n",
|
||||
" conn = oracledb.connect(user=username, password=password, dsn=dsn)\n",
|
||||
" print(\"Connection successful!\")\n",
|
||||
"except Exception as e:\n",
|
||||
" print(\"Connection failed!\")\n",
|
||||
" sys.exit(1)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Generate Summary\n",
|
||||
"The Oracle AI Vector Search Langchain library provides APIs to generate summaries of documents. There are a few summary generation provider options including Database, OCIGENAI, HuggingFace and so on. The users can choose their preferred provider to generate a summary. They just need to set the summary parameters accordingly. Please refer to the Oracle AI Vector Search Guide book for complete information about these parameters."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"***Note:*** The users may need to set proxy if they want to use some 3rd party summary generation providers other than Oracle's in-house and default provider: 'database'. If you don't have proxy, please remove the proxy parameter when you instantiate the OracleSummary."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# proxy to be used when we instantiate summary and embedder object\n",
|
||||
"proxy = \"<proxy>\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The following sample code will show how to generate summary:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.utilities.oracleai import OracleSummary\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
"\n",
|
||||
"\"\"\"\n",
|
||||
"# using 'ocigenai' provider\n",
|
||||
"summary_params = {\n",
|
||||
" \"provider\": \"ocigenai\",\n",
|
||||
" \"credential_name\": \"OCI_CRED\",\n",
|
||||
" \"url\": \"https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/20231130/actions/summarizeText\",\n",
|
||||
" \"model\": \"cohere.command\",\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"# using 'huggingface' provider\n",
|
||||
"summary_params = {\n",
|
||||
" \"provider\": \"huggingface\",\n",
|
||||
" \"credential_name\": \"HF_CRED\",\n",
|
||||
" \"url\": \"https://api-inference.huggingface.co/models/\",\n",
|
||||
" \"model\": \"facebook/bart-large-cnn\",\n",
|
||||
" \"wait_for_model\": \"true\"\n",
|
||||
"}\n",
|
||||
"\"\"\"\n",
|
||||
"\n",
|
||||
"# using 'database' provider\n",
|
||||
"summary_params = {\n",
|
||||
" \"provider\": \"database\",\n",
|
||||
" \"glevel\": \"S\",\n",
|
||||
" \"numParagraphs\": 1,\n",
|
||||
" \"language\": \"english\",\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"# get the summary instance\n",
|
||||
"# Remove proxy if not required\n",
|
||||
"summ = OracleSummary(conn=conn, params=summary_params, proxy=proxy)\n",
|
||||
"summary = summ.get_summary(\n",
|
||||
" \"In the heart of the forest, \"\n",
|
||||
" + \"a lone fox ventured out at dusk, seeking a lost treasure. \"\n",
|
||||
" + \"With each step, memories flooded back, guiding its path. \"\n",
|
||||
" + \"As the moon rose high, illuminating the night, the fox unearthed \"\n",
|
||||
" + \"not gold, but a forgotten friendship, worth more than any riches.\"\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"print(f\"Summary generated by OracleSummary: {summary}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### End to End Demo\n",
|
||||
"Please refer to our complete demo guide [Oracle AI Vector Search End-to-End Demo Guide](https://github.com/langchain-ai/langchain/tree/master/cookbook/oracleai_demo.ipynb) to build an end to end RAG pipeline with the help of Oracle AI Vector Search.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
469
docs/docs/integrations/vectorstores/oracle.ipynb
Normal file
@@ -0,0 +1,469 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "dd33e9d5-9dba-4aac-9f7f-4cf9e6686593",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Oracle AI Vector Search: Vector Store\n",
|
||||
"\n",
|
||||
"Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads and allows you to query data based on semantics rather than keywords.\n",
|
||||
"One of the biggest benefits of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in a single system.\n",
|
||||
"This is not only powerful but also significantly more effective because you don't need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems.\n",
|
||||
"\n",
|
||||
"In addition, because Oracle has been building database technologies for so long, your vectors can benefit from all of Oracle Database's most powerful features, like the following:\n",
|
||||
"\n",
|
||||
" * Partitioning Support\n",
|
||||
" * Real Application Clusters scalability\n",
|
||||
" * Exadata smart scans\n",
|
||||
" * Shard processing across geographically distributed databases\n",
|
||||
" * Transactions\n",
|
||||
" * Parallel SQL\n",
|
||||
" * Disaster recovery\n",
|
||||
" * Security\n",
|
||||
" * Oracle Machine Learning\n",
|
||||
" * Oracle Graph Database\n",
|
||||
" * Oracle Spatial and Graph\n",
|
||||
" * Oracle Blockchain\n",
|
||||
" * JSON"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "7bd80054-c803-47e1-a259-c40ed073c37d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Prerequisites for using LangChain with Oracle AI Vector Search\n",
|
||||
"\n",
|
||||
"Please install the Oracle Python client driver (`oracledb`) to use LangChain with Oracle AI Vector Search."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "2bbb989d-c6fb-4ab9-bafd-a95fd48538d0",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# pip install oracledb"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0fceaa5a-95da-4ebd-8b8d-5e73bb653172",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Connect to Oracle AI Vector Search"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "4421e4b7-2c7e-4bcd-82b3-9576595edd0f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import oracledb\n",
|
||||
"\n",
|
||||
"username = \"username\"\n",
|
||||
"password = \"password\"\n",
|
||||
"dsn = \"ipaddress:port/orclpdb1\"\n",
|
||||
"\n",
|
||||
"try:\n",
|
||||
" connection = oracledb.connect(user=username, password=password, dsn=dsn)\n",
|
||||
" print(\"Connection successful!\")\n",
|
||||
"except Exception as e:\n",
|
||||
" print(f\"Connection failed: {e}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b11cf362-01b0-485d-8527-31b0fbb5028e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Import the required dependencies to play with Oracle AI Vector Search"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "43ea59e3-2910-45a6-b195-5f06094bb7c9",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.embeddings import HuggingFaceEmbeddings\n",
|
||||
"from langchain_community.vectorstores import oraclevs\n",
|
||||
"from langchain_community.vectorstores.oraclevs import OracleVS\n",
|
||||
"from langchain_community.vectorstores.utils import DistanceStrategy\n",
|
||||
"from langchain_core.documents import Document"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0aac10dc-a9cc-4fdb-901c-1b7a4bbbe5a7",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Load Documents"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "70ac6982-b13a-4e8c-9c47-57c6d136ac60",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Define a list of documents (these sample entries are four passages from the Oracle Database Concepts manual)\n",
|
||||
"\n",
|
||||
"documents_json_list = [\n",
|
||||
" {\n",
|
||||
" \"id\": \"cncpt_15.5.3.2.2_P4\",\n",
|
||||
" \"text\": \"If the answer to any preceding questions is yes, then the database stops the search and allocates space from the specified tablespace; otherwise, space is allocated from the database default shared temporary tablespace.\",\n",
|
||||
" \"link\": \"https://docs.oracle.com/en/database/oracle/oracle-database/23/cncpt/logical-storage-structures.html#GUID-5387D7B2-C0CA-4C1E-811B-C7EB9B636442\",\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"id\": \"cncpt_15.5.5_P1\",\n",
|
||||
" \"text\": \"A tablespace can be online (accessible) or offline (not accessible) whenever the database is open.\\nA tablespace is usually online so that its data is available to users. The SYSTEM tablespace and temporary tablespaces cannot be taken offline.\",\n",
|
||||
" \"link\": \"https://docs.oracle.com/en/database/oracle/oracle-database/23/cncpt/logical-storage-structures.html#GUID-D02B2220-E6F5-40D9-AFB5-BC69BCEF6CD4\",\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"id\": \"cncpt_22.3.4.3.1_P2\",\n",
|
||||
" \"text\": \"The database stores LOBs differently from other data types. Creating a LOB column implicitly creates a LOB segment and a LOB index. The tablespace containing the LOB segment and LOB index, which are always stored together, may be different from the tablespace containing the table.\\nSometimes the database can store small amounts of LOB data in the table itself rather than in a separate LOB segment.\",\n",
|
||||
" \"link\": \"https://docs.oracle.com/en/database/oracle/oracle-database/23/cncpt/concepts-for-database-developers.html#GUID-3C50EAB8-FC39-4BB3-B680-4EACCE49E866\",\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"id\": \"cncpt_22.3.4.3.1_P3\",\n",
|
||||
" \"text\": \"The LOB segment stores data in pieces called chunks. A chunk is a logically contiguous set of data blocks and is the smallest unit of allocation for a LOB. A row in the table stores a pointer called a LOB locator, which points to the LOB index. When the table is queried, the database uses the LOB index to quickly locate the LOB chunks.\",\n",
|
||||
" \"link\": \"https://docs.oracle.com/en/database/oracle/oracle-database/23/cncpt/concepts-for-database-developers.html#GUID-3C50EAB8-FC39-4BB3-B680-4EACCE49E866\",\n",
|
||||
" },\n",
|
||||
"]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "eaa942d6-5954-4898-8c32-3627b923a3a5",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Create LangChain Documents\n",
|
||||
"\n",
|
||||
"documents_langchain = []\n",
|
||||
"\n",
|
||||
"for doc in documents_json_list:\n",
|
||||
" metadata = {\"id\": doc[\"id\"], \"link\": doc[\"link\"]}\n",
|
||||
" doc_langchain = Document(page_content=doc[\"text\"], metadata=metadata)\n",
|
||||
" documents_langchain.append(doc_langchain)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6823f5e6-997c-4f15-927b-bd44c61f105f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create multiple vector stores with different distance strategies using AI Vector Search\n",
|
||||
"\n",
|
||||
"First, we create three vector stores, each with a different distance function. Because no indices have been created on them yet, this step only creates the tables. Later, we will create HNSW indices on these vector stores.\n",
|
||||
"\n",
|
||||
"If you connect to the Oracle Database manually, you will see three tables: \n",
|
||||
"Documents_DOT, Documents_COSINE and Documents_EUCLIDEAN. \n",
|
||||
"\n",
|
||||
"We will then create three additional tables, Documents_DOT_IVF, Documents_COSINE_IVF and Documents_EUCLIDEAN_IVF, which will be used\n",
|
||||
"to create IVF indices on the tables instead of HNSW indices."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "ed1b253e-5f5c-4a81-983c-74645213a170",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Ingest documents into Oracle Vector Store using different distance strategies\n",
|
||||
"\n",
|
||||
"model = HuggingFaceEmbeddings(model_name=\"sentence-transformers/all-mpnet-base-v2\")\n",
|
||||
"\n",
|
||||
"vector_store_dot = OracleVS.from_documents(\n",
|
||||
" documents_langchain,\n",
|
||||
" model,\n",
|
||||
" client=connection,\n",
|
||||
" table_name=\"Documents_DOT\",\n",
|
||||
" distance_strategy=DistanceStrategy.DOT_PRODUCT,\n",
|
||||
")\n",
|
||||
"vector_store_max = OracleVS.from_documents(\n",
|
||||
" documents_langchain,\n",
|
||||
" model,\n",
|
||||
" client=connection,\n",
|
||||
" table_name=\"Documents_COSINE\",\n",
|
||||
" distance_strategy=DistanceStrategy.COSINE,\n",
|
||||
")\n",
|
||||
"vector_store_euclidean = OracleVS.from_documents(\n",
|
||||
" documents_langchain,\n",
|
||||
" model,\n",
|
||||
" client=connection,\n",
|
||||
" table_name=\"Documents_EUCLIDEAN\",\n",
|
||||
" distance_strategy=DistanceStrategy.EUCLIDEAN_DISTANCE,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Ingest the same documents into a second set of tables, which will later get IVF indices\n",
|
||||
"vector_store_dot_ivf = OracleVS.from_documents(\n",
|
||||
" documents_langchain,\n",
|
||||
" model,\n",
|
||||
" client=connection,\n",
|
||||
" table_name=\"Documents_DOT_IVF\",\n",
|
||||
" distance_strategy=DistanceStrategy.DOT_PRODUCT,\n",
|
||||
")\n",
|
||||
"vector_store_max_ivf = OracleVS.from_documents(\n",
|
||||
" documents_langchain,\n",
|
||||
" model,\n",
|
||||
" client=connection,\n",
|
||||
" table_name=\"Documents_COSINE_IVF\",\n",
|
||||
" distance_strategy=DistanceStrategy.COSINE,\n",
|
||||
")\n",
|
||||
"vector_store_euclidean_ivf = OracleVS.from_documents(\n",
|
||||
" documents_langchain,\n",
|
||||
" model,\n",
|
||||
" client=connection,\n",
|
||||
" table_name=\"Documents_EUCLIDEAN_IVF\",\n",
|
||||
" distance_strategy=DistanceStrategy.EUCLIDEAN_DISTANCE,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "77c29505-8688-4b87-9a99-e648fbb2d425",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Demonstrating add, delete operations for texts, and basic similarity search\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "306563ae-577b-4bc7-8a92-3dd6a59310f5",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def manage_texts(vector_stores):\n",
|
||||
" \"\"\"\n",
|
||||
" Adds texts to each vector store, demonstrates error handling for duplicate additions,\n",
|
||||
" and performs deletion of texts. Showcases a similarity search for each vector store.\n",
|
||||
"\n",
|
||||
" Args:\n",
|
||||
" - vector_stores (list): A list of OracleVS instances.\n",
|
||||
" \"\"\"\n",
|
||||
" texts = [\"Rohan\", \"Shailendra\"]\n",
|
||||
" metadata = [\n",
|
||||
" {\"id\": \"100\", \"link\": \"Document Example Test 1\"},\n",
|
||||
" {\"id\": \"101\", \"link\": \"Document Example Test 2\"},\n",
|
||||
" ]\n",
|
||||
"\n",
|
||||
" for i, vs in enumerate(vector_stores, start=1):\n",
|
||||
" # Adding texts\n",
|
||||
" try:\n",
|
||||
" vs.add_texts(texts, metadata)\n",
|
||||
" print(f\"\\n\\n\\nAdd texts complete for vector store {i}\\n\\n\\n\")\n",
|
||||
" except Exception as ex:\n",
|
||||
" print(f\"\\n\\n\\nExpected error on duplicate add for vector store {i}: {ex}\\n\\n\\n\")\n",
|
||||
"\n",
|
||||
" # Deleting texts using the value of 'id'\n",
|
||||
" vs.delete([metadata[0][\"id\"]])\n",
|
||||
" print(f\"\\n\\n\\nDelete texts complete for vector store {i}\\n\\n\\n\")\n",
|
||||
"\n",
|
||||
" # Similarity search\n",
|
||||
" results = vs.similarity_search(\"How are LOBS stored in Oracle Database\", 2)\n",
|
||||
" print(f\"\\n\\n\\nSimilarity search results for vector store {i}: {results}\\n\\n\\n\")\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"vector_store_list = [\n",
|
||||
" vector_store_dot,\n",
|
||||
" vector_store_max,\n",
|
||||
" vector_store_euclidean,\n",
|
||||
" vector_store_dot_ivf,\n",
|
||||
" vector_store_max_ivf,\n",
|
||||
" vector_store_euclidean_ivf,\n",
|
||||
"]\n",
|
||||
"manage_texts(vector_store_list)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0980cb33-69cf-4547-842a-afdc4d6fa7d3",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Demonstrating index creation with specific parameters for each distance strategy\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "46298a27-e309-456e-b2b8-771d9cb3be29",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def create_search_indices(connection):\n",
|
||||
" \"\"\"\n",
|
||||
" Creates search indices for the vector stores, each with specific parameters tailored to their distance strategy.\n",
|
||||
" \"\"\"\n",
|
||||
" # Index for DOT_PRODUCT strategy\n",
|
||||
" # Notice we are creating a HNSW index with default parameters\n",
|
||||
" # This will default to creating a HNSW index with 8 Parallel Workers and use the Default Accuracy used by Oracle AI Vector Search\n",
|
||||
" oraclevs.create_index(\n",
|
||||
" connection,\n",
|
||||
" vector_store_dot,\n",
|
||||
" params={\"idx_name\": \"hnsw_idx1\", \"idx_type\": \"HNSW\"},\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
" # Index for COSINE strategy with specific parameters\n",
|
||||
" # Notice we are creating a HNSW index with parallel 16 and Target Accuracy Specification as 97 percent\n",
|
||||
" oraclevs.create_index(\n",
|
||||
" connection,\n",
|
||||
" vector_store_max,\n",
|
||||
" params={\n",
|
||||
" \"idx_name\": \"hnsw_idx2\",\n",
|
||||
" \"idx_type\": \"HNSW\",\n",
|
||||
" \"accuracy\": 97,\n",
|
||||
" \"parallel\": 16,\n",
|
||||
" },\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
" # Index for EUCLIDEAN_DISTANCE strategy with specific parameters\n",
|
||||
" # Notice we are creating a HNSW index by specifying Power User Parameters which are neighbors = 64 and efConstruction = 100\n",
|
||||
" oraclevs.create_index(\n",
|
||||
" connection,\n",
|
||||
" vector_store_euclidean,\n",
|
||||
" params={\n",
|
||||
" \"idx_name\": \"hnsw_idx3\",\n",
|
||||
" \"idx_type\": \"HNSW\",\n",
|
||||
" \"neighbors\": 64,\n",
|
||||
" \"efConstruction\": 100,\n",
|
||||
" },\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
" # Index for DOT_PRODUCT strategy (IVF)\n",
|
||||
" # Notice we are creating an IVF index with default parameters\n",
|
||||
" # This will default to creating an IVF index with 8 Parallel Workers and use the Default Accuracy used by Oracle AI Vector Search\n",
|
||||
" oraclevs.create_index(\n",
|
||||
" connection,\n",
|
||||
" vector_store_dot_ivf,\n",
|
||||
" params={\n",
|
||||
" \"idx_name\": \"ivf_idx1\",\n",
|
||||
" \"idx_type\": \"IVF\",\n",
|
||||
" },\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
" # Index for COSINE strategy with specific parameters\n",
|
||||
" # Notice we are creating an IVF index with parallel 32 and Target Accuracy Specification as 90 percent\n",
|
||||
" oraclevs.create_index(\n",
|
||||
" connection,\n",
|
||||
" vector_store_max_ivf,\n",
|
||||
" params={\n",
|
||||
" \"idx_name\": \"ivf_idx2\",\n",
|
||||
" \"idx_type\": \"IVF\",\n",
|
||||
" \"accuracy\": 90,\n",
|
||||
" \"parallel\": 32,\n",
|
||||
" },\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
" # Index for EUCLIDEAN_DISTANCE strategy with specific parameters\n",
|
||||
" # Notice we are creating an IVF index by specifying Power User Parameters which is neighbor_part = 64\n",
|
||||
" oraclevs.create_index(\n",
|
||||
" connection,\n",
|
||||
" vector_store_euclidean_ivf,\n",
|
||||
" params={\"idx_name\": \"ivf_idx3\", \"idx_type\": \"IVF\", \"neighbor_part\": 64},\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
" print(\"Index creation complete.\")\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"create_search_indices(connection)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "7223d048-5c0b-4e91-a91b-a7daa9f86758",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Advanced searches on all six vector stores\n",
"\n",
"Each of the three search types below is run with and without a filter. The filter selects only the document with id 101 and excludes everything else."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "37ca2e7d-9803-4260-95e7-62776d4fb820",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Conduct advanced searches after creating the indices\n",
|
||||
"def conduct_advanced_searches(vector_stores):\n",
|
||||
" query = \"How are LOBS stored in Oracle Database\"\n",
|
||||
" # Constructing a filter for direct comparison against document metadata\n",
|
||||
" # This filter aims to include documents whose metadata 'id' is exactly '101'\n",
|
||||
" filter_criteria = {\"id\": [\"101\"]} # Direct comparison filter\n",
|
||||
"\n",
|
||||
" for i, vs in enumerate(vector_stores, start=1):\n",
|
||||
" print(f\"\\n--- Vector Store {i} Advanced Searches ---\")\n",
|
||||
" # Similarity search without a filter\n",
|
||||
" print(\"\\nSimilarity search results without filter:\")\n",
|
||||
" print(vs.similarity_search(query, 2))\n",
|
||||
"\n",
|
||||
" # Similarity search with a filter\n",
|
||||
" print(\"\\nSimilarity search results with filter:\")\n",
|
||||
" print(vs.similarity_search(query, 2, filter=filter_criteria))\n",
|
||||
"\n",
|
||||
" # Similarity search with relevance score\n",
|
||||
" print(\"\\nSimilarity search with relevance score:\")\n",
|
||||
" print(vs.similarity_search_with_score(query, 2))\n",
|
||||
"\n",
|
||||
" # Similarity search with relevance score with filter\n",
|
||||
" print(\"\\nSimilarity search with relevance score with filter:\")\n",
|
||||
" print(vs.similarity_search_with_score(query, 2, filter=filter_criteria))\n",
|
||||
"\n",
|
||||
" # Max marginal relevance search\n",
|
||||
" print(\"\\nMax marginal relevance search results:\")\n",
|
||||
" print(vs.max_marginal_relevance_search(query, 2, fetch_k=20, lambda_mult=0.5))\n",
|
||||
"\n",
|
||||
" # Max marginal relevance search with filter\n",
|
||||
" print(\"\\nMax marginal relevance search results with filter:\")\n",
|
||||
" print(\n",
|
||||
" vs.max_marginal_relevance_search(\n",
|
||||
" query, 2, fetch_k=20, lambda_mult=0.5, filter=filter_criteria\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"conduct_advanced_searches(vector_store_list)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0da8c7e2-0db0-4363-b31b-a7a5e3f83717",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### End to End Demo\n",
|
||||
"Please refer to our complete demo guide, [Oracle AI Vector Search End-to-End Demo Guide](https://github.com/langchain-ai/langchain/tree/master/cookbook/oracleai_demo.ipynb), to build an end-to-end RAG pipeline with the help of Oracle AI Vector Search.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
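The vector store notebook above creates tables with three distance strategies (`DOT_PRODUCT`, `COSINE`, `EUCLIDEAN_DISTANCE`). As a rough illustration of why the choice can change which document ranks first, here is a minimal sketch in plain Python. It does not call Oracle or `OracleVS`; the vectors, the candidate names, and the helper functions are hypothetical and exist only for this example.

```python
import math


def dot_product(a, b):
    """Dot product: larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity: smaller means more similar; ignores vector length."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return 1.0 - dot_product(a, b) / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance: smaller means more similar; sensitive to length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical 2-dimensional "embeddings" (real embeddings have hundreds of dimensions).
query = [1.0, 0.0]
candidates = {
    "same_direction_long": [3.0, 0.0],  # same direction as the query, but 3x longer
    "close_but_rotated": [1.0, 0.5],    # similar length, slightly different direction
}


def rank_by_similarity(metric):
    """Rank candidate names from most to least similar for a similarity metric."""
    return sorted(candidates, key=lambda n: metric(query, candidates[n]), reverse=True)


def rank_by_distance(metric):
    """Rank candidate names from nearest to farthest for a distance metric."""
    return sorted(candidates, key=lambda n: metric(query, candidates[n]))


print("dot product:", rank_by_similarity(dot_product))
print("cosine:     ", rank_by_distance(cosine_distance))
print("euclidean:  ", rank_by_distance(euclidean_distance))
```

In this toy data, dot product and cosine both prefer the vector pointing in the same direction as the query, while Euclidean distance prefers the vector that is physically closer. With normalized embeddings the three metrics tend to agree on ordering, so the difference matters mainly when vector lengths vary.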
|
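The advanced-search cell in the notebook calls `max_marginal_relevance_search` with `fetch_k` and `lambda_mult`. The sketch below shows the textbook maximal marginal relevance (MMR) selection loop in plain Python, to give an intuition for what `lambda_mult` trades off. It is an assumption-laden illustration, not the `OracleVS` implementation: the function names and the toy vectors are hypothetical, and the candidate list stands in for the `fetch_k` documents fetched before re-ranking.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mmr_select(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Return indices of k documents chosen by textbook MMR.

    lambda_mult=1.0 ranks purely by relevance to the query;
    lower values penalize candidates similar to already-selected ones.
    """
    selected = []
    remaining = list(range(len(doc_vecs)))
    while remaining and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in remaining:
            relevance = cosine_similarity(query_vec, doc_vecs[i])
            # Redundancy: highest similarity to anything already picked (0 if none yet).
            redundancy = max(
                (cosine_similarity(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1.0 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


# Hypothetical candidate pool: documents 0 and 1 are near-duplicates, document 2 differs.
query = [1.0, 0.0]
docs = [[1.0, 0.1], [1.0, 0.12], [1.0, -0.5]]

print("relevance only (lambda_mult=1.0):", mmr_select(query, docs, k=2, lambda_mult=1.0))
print("balanced       (lambda_mult=0.5):", mmr_select(query, docs, k=2, lambda_mult=0.5))
```

With `lambda_mult=1.0` the two near-duplicate documents are returned together; with `lambda_mult=0.5` the second slot goes to the less redundant document instead.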