docs: updates for OracleDB (#21745)

Thank you for contributing to LangChain!

Documentation change for OracleDB

Fixed several things in Oracle Documentation.
This commit is contained in:
Rohan Aggarwal 2024-05-20 16:01:35 -07:00 committed by GitHub
parent 9799437bc2
commit d8a101074f
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
6 changed files with 224 additions and 115 deletions

View File

@ -6,23 +6,24 @@
"source": [
"# Oracle AI Vector Search with Document Processing\n",
"Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads that allows you to query data based on semantics, rather than keywords.\n",
"One of the biggest benefit of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in one single system. This is not only powerful but also significantly more effective because you don't need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems.\n",
"One of the biggest benefits of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in one single system.\n",
"This is not only powerful but also significantly more effective because you don't need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems.\n",
"\n",
"In addition, because Oracle has been building database technologies for so long, your vectors can benefit from all of Oracle Database's most powerful features, like the following:\n",
"In addition, your vectors can benefit from all of Oracle Databases most powerful features, like the following:\n",
"\n",
" * Partitioning Support\n",
" * Real Application Clusters scalability\n",
" * Exadata smart scans\n",
" * Shard processing across geographically distributed databases\n",
" * Transactions\n",
" * Parallel SQL\n",
" * Disaster recovery\n",
" * Security\n",
" * Oracle Machine Learning\n",
" * Oracle Graph Database\n",
" * Oracle Spatial and Graph\n",
" * Oracle Blockchain\n",
" * JSON\n",
" * [Partitioning Support](https://www.oracle.com/database/technologies/partitioning.html)\n",
" * [Real Application Clusters scalability](https://www.oracle.com/database/real-application-clusters/)\n",
" * [Exadata smart scans](https://www.oracle.com/database/technologies/exadata/software/smartscan/)\n",
" * [Shard processing across geographically distributed databases](https://www.oracle.com/database/distributed-database/)\n",
" * [Transactions](https://docs.oracle.com/en/database/oracle/oracle-database/23/cncpt/transactions.html)\n",
" * [Parallel SQL](https://docs.oracle.com/en/database/oracle/oracle-database/21/vldbg/parallel-exec-intro.html#GUID-D28717E4-0F77-44F5-BB4E-234C31D4E4BA)\n",
" * [Disaster recovery](https://www.oracle.com/database/data-guard/)\n",
" * [Security](https://www.oracle.com/security/database-security/)\n",
" * [Oracle Machine Learning](https://www.oracle.com/artificial-intelligence/database-machine-learning/)\n",
" * [Oracle Graph Database](https://www.oracle.com/database/integrated-graph-database/)\n",
" * [Oracle Spatial and Graph](https://www.oracle.com/database/spatial/)\n",
" * [Oracle Blockchain](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_blockchain_table.html#GUID-B469E277-978E-4378-A8C1-26D3FF96C9A6)\n",
" * [JSON](https://docs.oracle.com/en/database/oracle/oracle-database/23/adjsn/json-in-oracle-database.html)\n",
"\n",
"This guide demonstrates how Oracle AI Vector Search can be used with Langchain to serve an end-to-end RAG pipeline. This guide goes through examples of:\n",
"\n",
@ -33,6 +34,13 @@
" * Storing and Indexing them in a Vector Store and querying them for queries in OracleVS"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you are just starting with Oracle Database, consider exploring the [free Oracle 23 AI](https://www.oracle.com/database/free/#resources) which provides a great introduction to setting up your database environment. While working with the database, it is often advisable to avoid using the system user by default; instead, you can create your own user for enhanced security and customization. For detailed steps on user creation, refer to our [end-to-end guide](https://github.com/langchain-ai/langchain/blob/master/cookbook/oracleai_demo.ipynb) which also shows how to set up a user in Oracle. Additionally, understanding user privileges is crucial for managing database security effectively. You can learn more about this topic in the official [Oracle guide](https://docs.oracle.com/en/database/oracle/oracle-database/19/admqs/administering-user-accounts-and-security.html#GUID-36B21D72-1BBB-46C9-A0C9-F0D2A8591B8D) on administering user accounts and security."
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -131,13 +139,13 @@
"metadata": {},
"source": [
"## Process Documents using Oracle AI\n",
"Let's think about a scenario that the users have some documents in Oracle Database or in a file system. They want to use the data for Oracle AI Vector Search using Langchain.\n",
"Consider the following scenario: users possess documents stored either in an Oracle Database or a file system and intend to utilize this data with Oracle AI Vector Search powered by Langchain.\n",
"\n",
"For that, the users need to do some document preprocessing. The first step would be to read the documents, generate their summary(if needed) and then chunk/split them if needed. After that, they need to generate the embeddings for those chunks and store into Oracle AI Vector Store. Finally, the users will perform some semantic queries on those data. \n",
"To prepare the documents for analysis, a comprehensive preprocessing workflow is necessary. Initially, the documents must be retrieved, summarized (if required), and chunked as needed. Subsequent steps involve generating embeddings for these chunks and integrating them into the Oracle AI Vector Store. Users can then conduct semantic searches on this data.\n",
"\n",
"Oracle AI Vector Search Langchain library provides a range of document processing functionalities including document loading, splitting, generating summary and embeddings.\n",
"The Oracle AI Vector Search Langchain library encompasses a suite of document processing tools that facilitate document loading, chunking, summary generation, and embedding creation.\n",
"\n",
"In the following sections, we will go through how to use Oracle AI Langchain APIs to achieve each of these functionalities individually. "
"In the sections that follow, we will detail the utilization of Oracle AI Langchain APIs to effectively implement each of these processes."
]
},
{
@ -145,7 +153,7 @@
"metadata": {},
"source": [
"### Connect to Demo User\n",
"The following sample code will show how to connect to Oracle Database. "
"The following sample code will show how to connect to Oracle Database. By default, python-oracledb runs in a Thin mode which connects directly to Oracle Database. This mode does not need Oracle Client libraries. However, some additional functionality is available when python-oracledb uses them. Python-oracledb is said to be in Thick mode when Oracle Client libraries are used. Both modes have comprehensive functionality supporting the Python Database API v2.0 Specification. See the following [guide](https://python-oracledb.readthedocs.io/en/latest/user_guide/appendix_a.html#featuresummary) that talks about features supported in each mode. You might want to switch to thick-mode if you are unable to use thin-mode."
]
},
{
@ -242,9 +250,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"Now that we have a demo user and a demo table with some data, we just need to do one more setup. For embedding and summary, we have a few provider options that the users can choose from such as database, 3rd party providers like ocigenai, huggingface, openai, etc. If the users choose to use 3rd party provider, they need to create a credential with corresponding authentication information. On the other hand, if the users choose to use 'database' as provider, they need to load an onnx model to Oracle Database for embeddings; however, for summary, they don't need to do anything."
"With the inclusion of a demo user and a populated sample table, the remaining configuration involves setting up embedding and summary functionalities. Users are presented with multiple provider options, including local database solutions and third-party services such as Ocigenai, Hugging Face, and OpenAI. Should users opt for a third-party provider, they are required to establish credentials containing the necessary authentication details. Conversely, if selecting a database as the provider for embeddings, it is necessary to upload an ONNX model to the Oracle Database. No additional setup is required for summary functionalities when using the database option."
]
},
{
@ -253,13 +259,13 @@
"source": [
"### Load ONNX Model\n",
"\n",
"To generate embeddings, Oracle provides a few provider options for users to choose from. The users can choose 'database' provider or some 3rd party providers like OCIGENAI, HuggingFace, etc.\n",
"Oracle accommodates a variety of embedding providers, enabling users to choose between proprietary database solutions and third-party services such as OCIGENAI and HuggingFace. This selection dictates the methodology for generating and managing embeddings.\n",
"\n",
"***Note*** If the users choose database option, they need to load an ONNX model to Oracle Database. The users do not need to load an ONNX model to Oracle Database if they choose to use 3rd party provider to generate embeddings.\n",
"***Important*** : Should users opt for the database option, they must upload an ONNX model into the Oracle Database. Conversely, if a third-party provider is selected for embedding generation, uploading an ONNX model to Oracle Database is not required.\n",
"\n",
"One of the core benefits of using an ONNX model is that the users do not need to transfer their data to 3rd party to generate embeddings. And also, since it does not involve any network or REST API calls, it may provide better performance.\n",
"A significant advantage of utilizing an ONNX model directly within Oracle is the enhanced security and performance it offers by eliminating the need to transmit data to external parties. Additionally, this method avoids the latency typically associated with network or REST API calls.\n",
"\n",
"Here is the sample code to load an ONNX model to Oracle Database:"
"Below is the example code to upload an ONNX model into Oracle Database:"
]
},
{
@ -298,11 +304,11 @@
"source": [
"### Create Credential\n",
"\n",
"On the other hand, if the users choose to use 3rd party provider to generate embeddings and summary, they need to create credential to access 3rd party provider's end points.\n",
"When selecting third-party providers for generating embeddings, users are required to establish credentials to securely access the provider's endpoints.\n",
"\n",
"***Note:*** The users do not need to create any credential if they choose to use 'database' provider to generate embeddings and summary. Should the users choose to 3rd party provider, they need to create credential for the 3rd party provider they want to use. \n",
"***Important:*** No credentials are necessary when opting for the 'database' provider to generate embeddings. However, should users decide to utilize a third-party provider, they must create credentials specific to the chosen provider.\n",
"\n",
"Here is a sample example:"
"Below is an illustrative example:"
]
},
{
@ -352,11 +358,11 @@
"metadata": {},
"source": [
"### Load Documents\n",
"The users can load the documents from Oracle Database or a file system or both. They just need to set the loader parameters accordingly. Please refer to the Oracle AI Vector Search Guide book for complete information about these parameters.\n",
"Users have the flexibility to load documents from either the Oracle Database, a file system, or both, by appropriately configuring the loader parameters. For comprehensive details on these parameters, please consult the [Oracle AI Vector Search Guide](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-73397E89-92FB-48ED-94BB-1AD960C4EA1F).\n",
"\n",
"The main benefit of using OracleDocLoader is that it can handle 150+ different file formats. You don't need to use different types of loader for different file formats. Here is the list formats that we support: [Oracle Text Supported Document Formats](https://docs.oracle.com/en/database/oracle/oracle-database/23/ccref/oracle-text-supported-document-formats.html)\n",
"A significant advantage of utilizing OracleDocLoader is its capability to process over 150 distinct file formats, eliminating the need for multiple loaders for different document types. For a complete list of the supported formats, please refer to the [Oracle Text Supported Document Formats](https://docs.oracle.com/en/database/oracle/oracle-database/23/ccref/oracle-text-supported-document-formats.html).\n",
"\n",
"The following sample code will show how to do that:"
"Below is a sample code snippet that demonstrates how to use OracleDocLoader"
]
},
{
@ -399,7 +405,7 @@
"metadata": {},
"source": [
"### Generate Summary\n",
"Now that the user loaded the documents, they may want to generate a summary for each document. The Oracle AI Vector Search Langchain library provides an API to do that. There are a few summary generation provider options including Database, OCIGENAI, HuggingFace and so on. The users can choose their preferred provider to generate a summary. Like before, they just need to set the summary parameters accordingly. Please refer to the Oracle AI Vector Search Guide book for complete information about these parameters."
"Now that the user loaded the documents, they may want to generate a summary for each document. The Oracle AI Vector Search Langchain library offers a suite of APIs designed for document summarization. It supports multiple summarization providers such as Database, OCIGENAI, HuggingFace, among others, allowing users to select the provider that best meets their needs. To utilize these capabilities, users must configure the summary parameters as specified. For detailed information on these parameters, please consult the [Oracle AI Vector Search Guide book](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-EC9DDB58-6A15-4B36-BA66-ECBA20D2CE57)."
]
},
{
@ -470,9 +476,9 @@
"metadata": {},
"source": [
"### Split Documents\n",
"The documents can be in different sizes: small, medium, large, or very large. The users like to split/chunk their documents into smaller pieces to generate embeddings. There are lots of different splitting customizations the users can do. Please refer to the Oracle AI Vector Search Guide book for complete information about these parameters.\n",
"The documents may vary in size, ranging from small to very large. Users often prefer to chunk their documents into smaller sections to facilitate the generation of embeddings. A wide array of customization options is available for this splitting process. For comprehensive details regarding these parameters, please consult the [Oracle AI Vector Search Guide](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-4E145629-7098-4C7C-804F-FC85D1F24240).\n",
"\n",
"The following sample code will show how to do that:"
"Below is a sample code illustrating how to implement this:"
]
},
{
@ -513,14 +519,16 @@
"metadata": {},
"source": [
"### Generate Embeddings\n",
"Now that the documents are chunked as per requirements, the users may want to generate embeddings for these chunks. Oracle AI Vector Search provides a number of ways to generate embeddings. The users can load an ONNX embedding model to Oracle Database and use it to generate embeddings or use some 3rd party API's end points to generate embeddings. Please refer to the Oracle AI Vector Search Guide book for complete information about these parameters."
"Now that the documents are chunked as per requirements, the users may want to generate embeddings for these chunks. Oracle AI Vector Search provides multiple methods for generating embeddings, utilizing either locally hosted ONNX models or third-party APIs. For comprehensive instructions on configuring these alternatives, please refer to the [Oracle AI Vector Search Guide](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-C6439E94-4E86-4ECD-954E-4B73D53579DE)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***Note:*** The users may need to set proxy if they want to use some 3rd party embedding generation providers other than 'database' provider (aka using ONNX model)."
"***Note:*** Currently, OracleEmbeddings processes each embedding generation request individually, without batching, by calling REST endpoints separately for each request. This method could potentially lead to exceeding the maximum request per minute quota set by some providers. However, we are actively working to enhance this process by implementing request batching, which will allow multiple embedding requests to be combined into fewer API calls, thereby optimizing our use of provider resources and adhering to their request limits. This update is expected to be rolled out soon, eliminating the current limitation.\n",
"\n",
"***Note:*** Users may need to configure a proxy to utilize third-party embedding generation providers, excluding the 'database' provider that utilizes an ONNX model."
]
},
{
@ -752,20 +760,18 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The above example creates a vector store with DOT_PRODUCT distance strategy. \n",
"\n",
"However, the users can create Oracle AI Vector Store provides different distance strategies. Please see the [comprehensive guide](/docs/integrations/vectorstores/oracle) for more information."
"The example provided illustrates the creation of a vector store using the DOT_PRODUCT distance strategy. Users have the flexibility to employ various distance strategies with the Oracle AI Vector Store, as detailed in our [comprehensive guide](https://python.langchain.com/v0.1/docs/integrations/vectorstores/oracle/)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we have embeddings stored in vector stores, let's create an index on them to get better semantic search performance during query time.\n",
"With embeddings now stored in vector stores, it is advisable to establish an index to enhance semantic search performance during query execution.\n",
"\n",
"***Note*** If you are getting some insufficient memory error, please increase ***vector_memory_size*** in your database.\n",
"***Note*** Should you encounter an \"insufficient memory\" error, it is recommended to increase the ***vector_memory_size*** in your database configuration\n",
"\n",
"Here is the sample code to create an index:"
"Below is a sample code snippet for creating an index:"
]
},
{
@ -785,9 +791,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The above example creates a default HNSW index on the embeddings stored in 'oravs' table. The users can set different parameters as per their requirements. Please refer to the Oracle AI Vector Search Guide book for complete information about these parameters.\n",
"This example demonstrates the creation of a default HNSW index on embeddings within the 'oravs' table. Users may adjust various parameters according to their specific needs. For detailed information on these parameters, please consult the [Oracle AI Vector Search Guide book](https://docs.oracle.com/en/database/oracle/oracle-database/23/vecse/manage-different-categories-vector-indexes.html).\n",
"\n",
"Also, there are different types of vector indices that the users can create. Please see the [comprehensive guide](/docs/integrations/vectorstores/oracle) for more information.\n"
"Additionally, various types of vector indices can be created to meet diverse requirements. More details can be found in our [comprehensive guide](https://python.langchain.com/v0.1/docs/integrations/vectorstores/oracle/).\n"
]
},
{
@ -797,9 +803,9 @@
"## Perform Semantic Search\n",
"All set!\n",
"\n",
"We have processed the documents, stored them to vector store, and then created index to get better query performance. Now let's do some semantic searches.\n",
"We have successfully processed the documents and stored them in the vector store, followed by the creation of an index to enhance query performance. We are now prepared to proceed with semantic searches.\n",
"\n",
"Here is the sample code for this:"
"Below is the sample code for this process:"
]
},
{

View File

@ -5,11 +5,37 @@
"metadata": {},
"source": [
"# Oracle AI Vector Search: Document Processing\n",
"Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads that allows you to query data based on semantics, rather than keywords. One of the biggest benefit of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in one single system. This is not only powerful but also significantly more effective because you don't need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems.\n",
"Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads that allows you to query data based on semantics, rather than keywords.\n",
"One of the biggest benefits of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in one single system.\n",
"This is not only powerful but also significantly more effective because you don't need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems.\n",
"\n",
"In addition, your vectors can benefit from all of Oracle Databases most powerful features, like the following:\n",
"\n",
" * [Partitioning Support](https://www.oracle.com/database/technologies/partitioning.html)\n",
" * [Real Application Clusters scalability](https://www.oracle.com/database/real-application-clusters/)\n",
" * [Exadata smart scans](https://www.oracle.com/database/technologies/exadata/software/smartscan/)\n",
" * [Shard processing across geographically distributed databases](https://www.oracle.com/database/distributed-database/)\n",
" * [Transactions](https://docs.oracle.com/en/database/oracle/oracle-database/23/cncpt/transactions.html)\n",
" * [Parallel SQL](https://docs.oracle.com/en/database/oracle/oracle-database/21/vldbg/parallel-exec-intro.html#GUID-D28717E4-0F77-44F5-BB4E-234C31D4E4BA)\n",
" * [Disaster recovery](https://www.oracle.com/database/data-guard/)\n",
" * [Security](https://www.oracle.com/security/database-security/)\n",
" * [Oracle Machine Learning](https://www.oracle.com/artificial-intelligence/database-machine-learning/)\n",
" * [Oracle Graph Database](https://www.oracle.com/database/integrated-graph-database/)\n",
" * [Oracle Spatial and Graph](https://www.oracle.com/database/spatial/)\n",
" * [Oracle Blockchain](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_blockchain_table.html#GUID-B469E277-978E-4378-A8C1-26D3FF96C9A6)\n",
" * [JSON](https://docs.oracle.com/en/database/oracle/oracle-database/23/adjsn/json-in-oracle-database.html)\n",
"\n",
"\n",
"The guide demonstrates how to use Document Processing Capabilities within Oracle AI Vector Search to load and chunk documents using OracleDocLoader and OracleTextSplitter respectively."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you are just starting with Oracle Database, consider exploring the [free Oracle 23 AI](https://www.oracle.com/database/free/#resources) which provides a great introduction to setting up your database environment. While working with the database, it is often advisable to avoid using the system user by default; instead, you can create your own user for enhanced security and customization. For detailed steps on user creation, refer to our [end-to-end guide](https://github.com/langchain-ai/langchain/blob/master/cookbook/oracleai_demo.ipynb) which also shows how to set up a user in Oracle. Additionally, understanding user privileges is crucial for managing database security effectively. You can learn more about this topic in the official [Oracle guide](https://docs.oracle.com/en/database/oracle/oracle-database/19/admqs/administering-user-accounts-and-security.html#GUID-36B21D72-1BBB-46C9-A0C9-F0D2A8591B8D) on administering user accounts and security."
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -33,7 +59,7 @@
"metadata": {},
"source": [
"### Connect to Oracle Database\n",
"The following sample code will show how to connect to Oracle Database. "
"The following sample code will show how to connect to Oracle Database. By default, python-oracledb runs in a Thin mode which connects directly to Oracle Database. This mode does not need Oracle Client libraries. However, some additional functionality is available when python-oracledb uses them. Python-oracledb is said to be in Thick mode when Oracle Client libraries are used. Both modes have comprehensive functionality supporting the Python Database API v2.0 Specification. See the following [guide](https://python-oracledb.readthedocs.io/en/latest/user_guide/appendix_a.html#featuresummary) that talks about features supported in each mode. You might want to switch to thick-mode if you are unable to use thin-mode."
]
},
{
@ -114,11 +140,12 @@
"metadata": {},
"source": [
"### Load Documents\n",
"The users can load the documents from Oracle Database or a file system or both. They just need to set the loader parameters accordingly. Please refer to the Oracle AI Vector Search Guide book for complete information about these parameters.\n",
"\n",
"The main benefit of using OracleDocLoader is that it can handle 150+ different file formats. You don't need to use different types of loader for different file formats. Here is the list of the formats that we support: [Oracle Text Supported Document Formats](https://docs.oracle.com/en/database/oracle/oracle-database/23/ccref/oracle-text-supported-document-formats.html)\n",
"Users have the flexibility to load documents from either the Oracle Database, a file system, or both, by appropriately configuring the loader parameters. For comprehensive details on these parameters, please consult the [Oracle AI Vector Search Guide](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-73397E89-92FB-48ED-94BB-1AD960C4EA1F).\n",
"\n",
"The following sample code will show how to do that:"
"A significant advantage of utilizing OracleDocLoader is its capability to process over 150 distinct file formats, eliminating the need for multiple loaders for different document types. For a complete list of the supported formats, please refer to the [Oracle Text Supported Document Formats](https://docs.oracle.com/en/database/oracle/oracle-database/23/ccref/oracle-text-supported-document-formats.html).\n",
"\n",
"Below is a sample code snippet that demonstrates how to use OracleDocLoader"
]
},
{
@ -161,9 +188,9 @@
"metadata": {},
"source": [
"### Split Documents\n",
"The documents can be in different sizes: small, medium, large, or very large. The users like to split/chunk their documents into smaller pieces to generate embeddings. There are lots of different splitting customizations the users can do. Please refer to the Oracle AI Vector Search Guide book for complete information about these parameters.\n",
"The documents may vary in size, ranging from small to very large. Users often prefer to chunk their documents into smaller sections to facilitate the generation of embeddings. A wide array of customization options is available for this splitting process. For comprehensive details regarding these parameters, please consult the [Oracle AI Vector Search Guide](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-4E145629-7098-4C7C-804F-FC85D1F24240).\n",
"\n",
"The following sample code will show how to do that:"
"Below is a sample code illustrating how to implement this:"
]
},
{

View File

@ -1,22 +1,24 @@
# OracleAI Vector Search
Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads that allows you to query data based on semantics, rather than keywords. One of the biggest benefit of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in one single system.
This is not only powerful but also significantly more effective because you dont need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems.
In addition, because Oracle has been building database technologies for so long, your vectors can benefit from all of Oracle Database's most powerful features, like the following:
Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads that allows you to query data based on semantics, rather than keywords.
One of the biggest benefits of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in one single system.
This is not only powerful but also significantly more effective because you don't need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems.
* Partitioning Support
* Real Application Clusters scalability
* Exadata smart scans
* Shard processing across geographically distributed databases
* Transactions
* Parallel SQL
* Disaster recovery
* Security
* Oracle Machine Learning
* Oracle Graph Database
* Oracle Spatial and Graph
* Oracle Blockchain
* JSON
In addition, your vectors can benefit from all of Oracle Databases most powerful features, like the following:
* [Partitioning Support](https://www.oracle.com/database/technologies/partitioning.html)
* [Real Application Clusters scalability](https://www.oracle.com/database/real-application-clusters/)
* [Exadata smart scans](https://www.oracle.com/database/technologies/exadata/software/smartscan/)
* [Shard processing across geographically distributed databases](https://www.oracle.com/database/distributed-database/)
* [Transactions](https://docs.oracle.com/en/database/oracle/oracle-database/23/cncpt/transactions.html)
* [Parallel SQL](https://docs.oracle.com/en/database/oracle/oracle-database/21/vldbg/parallel-exec-intro.html#GUID-D28717E4-0F77-44F5-BB4E-234C31D4E4BA)
* [Disaster recovery](https://www.oracle.com/database/data-guard/)
* [Security](https://www.oracle.com/security/database-security/)
* [Oracle Machine Learning](https://www.oracle.com/artificial-intelligence/database-machine-learning/)
* [Oracle Graph Database](https://www.oracle.com/database/integrated-graph-database/)
* [Oracle Spatial and Graph](https://www.oracle.com/database/spatial/)
* [Oracle Blockchain](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_blockchain_table.html#GUID-B469E277-978E-4378-A8C1-26D3FF96C9A6)
* [JSON](https://docs.oracle.com/en/database/oracle/oracle-database/23/adjsn/json-in-oracle-database.html)
## Document Loaders
@ -61,5 +63,5 @@ from langchain_community.vectorstores.oraclevs import OracleVS
## End to End Demo
Please check the [Oracle AI Vector Search End-to-End Demo Guide](https://github.com/langchain-ai/langchain/blob/master/cookbook/oracleai_demo).
Please check the [Oracle AI Vector Search End-to-End Demo Guide](https://github.com/langchain-ai/langchain/blob/master/cookbook/oracleai_demo.ipynb).

View File

@ -5,18 +5,44 @@
"metadata": {},
"source": [
"# Oracle AI Vector Search: Generate Embeddings\n",
"Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads that allows you to query data based on semantics, rather than keywords. One of the biggest benefit of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in one single system. This is not only powerful but also significantly more effective because you don't need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems.\n",
"Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads that allows you to query data based on semantics, rather than keywords.\n",
"One of the biggest benefits of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in one single system.\n",
"This is not only powerful but also significantly more effective because you don't need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems.\n",
"\n",
"In addition, your vectors can benefit from all of Oracle Databases most powerful features, like the following:\n",
"\n",
" * [Partitioning Support](https://www.oracle.com/database/technologies/partitioning.html)\n",
" * [Real Application Clusters scalability](https://www.oracle.com/database/real-application-clusters/)\n",
" * [Exadata smart scans](https://www.oracle.com/database/technologies/exadata/software/smartscan/)\n",
" * [Shard processing across geographically distributed databases](https://www.oracle.com/database/distributed-database/)\n",
" * [Transactions](https://docs.oracle.com/en/database/oracle/oracle-database/23/cncpt/transactions.html)\n",
" * [Parallel SQL](https://docs.oracle.com/en/database/oracle/oracle-database/21/vldbg/parallel-exec-intro.html#GUID-D28717E4-0F77-44F5-BB4E-234C31D4E4BA)\n",
" * [Disaster recovery](https://www.oracle.com/database/data-guard/)\n",
" * [Security](https://www.oracle.com/security/database-security/)\n",
" * [Oracle Machine Learning](https://www.oracle.com/artificial-intelligence/database-machine-learning/)\n",
" * [Oracle Graph Database](https://www.oracle.com/database/integrated-graph-database/)\n",
" * [Oracle Spatial and Graph](https://www.oracle.com/database/spatial/)\n",
" * [Oracle Blockchain](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_blockchain_table.html#GUID-B469E277-978E-4378-A8C1-26D3FF96C9A6)\n",
" * [JSON](https://docs.oracle.com/en/database/oracle/oracle-database/23/adjsn/json-in-oracle-database.html)\n",
"\n",
"\n",
"The guide demonstrates how to use Embedding Capabilities within Oracle AI Vector Search to generate embeddings for your documents using OracleEmbeddings."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you are just starting with Oracle Database, consider exploring the [free Oracle 23 AI](https://www.oracle.com/database/free/#resources) which provides a great introduction to setting up your database environment. While working with the database, it is often advisable to avoid using the system user by default; instead, you can create your own user for enhanced security and customization. For detailed steps on user creation, refer to our [end-to-end guide](https://github.com/langchain-ai/langchain/blob/master/cookbook/oracleai_demo.ipynb) which also shows how to set up a user in Oracle. Additionally, understanding user privileges is crucial for managing database security effectively. You can learn more about this topic in the official [Oracle guide](https://docs.oracle.com/en/database/oracle/oracle-database/19/admqs/administering-user-accounts-and-security.html#GUID-36B21D72-1BBB-46C9-A0C9-F0D2A8591B8D) on administering user accounts and security."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prerequisites\n",
"\n",
"Please install Oracle Python Client driver to use Langchain with Oracle AI Vector Search. "
"Ensure you have the Oracle Python Client driver installed to facilitate the integration of Langchain with Oracle AI Vector Search."
]
},
{
@ -33,7 +59,7 @@
"metadata": {},
"source": [
"### Connect to Oracle Database\n",
"The following sample code will show how to connect to Oracle Database. "
"The following sample code will show how to connect to Oracle Database. By default, python-oracledb runs in a Thin mode which connects directly to Oracle Database. This mode does not need Oracle Client libraries. However, some additional functionality is available when python-oracledb uses them. Python-oracledb is said to be in Thick mode when Oracle Client libraries are used. Both modes have comprehensive functionality supporting the Python Database API v2.0 Specification. See the following [guide](https://python-oracledb.readthedocs.io/en/latest/user_guide/appendix_a.html#featuresummary) that talks about features supported in each mode. You might want to switch to thick-mode if you are unable to use thin-mode."
]
},
{
@ -46,7 +72,7 @@
"\n",
"import oracledb\n",
"\n",
"# please update with your username, password, hostname and service_name\n",
"# Update the following variables with your Oracle database credentials and connection details\n",
"username = \"<username>\"\n",
"password = \"<password>\"\n",
"dsn = \"<hostname>/<service_name>\"\n",
@ -63,7 +89,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"For embedding, we have a few provider options that the users can choose from such as database, 3rd party providers like ocigenai, huggingface, openai, etc. If the users choose to use 3rd party provider, they need to create a credential with corresponding authentication information. On the other hand, if the users choose to use 'database' as provider, they need to load an onnx model to Oracle Database for embeddings."
"For embedding generation, several provider options are available to users, including embedding generation within the database and third-party services such as OcigenAI, Hugging Face, and OpenAI. Users opting for third-party providers must establish credentials that include the requisite authentication information. Alternatively, if users select 'database' as their provider, they are required to load an ONNX model into the Oracle Database to facilitate embeddings."
]
},
{
@ -72,13 +98,13 @@
"source": [
"### Load ONNX Model\n",
"\n",
"To generate embeddings, Oracle provides a few provider options for users to choose from. The users can choose 'database' provider or some 3rd party providers like OCIGENAI, HuggingFace, etc.\n",
"Oracle accommodates a variety of embedding providers, enabling users to choose between proprietary database solutions and third-party services such as OCIGENAI and HuggingFace. This selection dictates the methodology for generating and managing embeddings.\n",
"\n",
"***Note*** If the users choose database option, they need to load an ONNX model to Oracle Database. The users do not need to load an ONNX model to Oracle Database if they choose to use 3rd party provider to generate embeddings.\n",
"***Important*** : Should users opt for the database option, they must upload an ONNX model into the Oracle Database. Conversely, if a third-party provider is selected for embedding generation, uploading an ONNX model to Oracle Database is not required.\n",
"\n",
"One of the core benefits of using an ONNX model is that the users do not need to transfer their data to 3rd party to generate embeddings. And also, since it does not involve any network or REST API calls, it may provide better performance.\n",
"A significant advantage of utilizing an ONNX model directly within Oracle is the enhanced security and performance it offers by eliminating the need to transmit data to external parties. Additionally, this method avoids the latency typically associated with network or REST API calls.\n",
"\n",
"Here is the sample code to load an ONNX model to Oracle Database:"
"Below is the example code to upload an ONNX model into Oracle Database:"
]
},
{
@ -89,7 +115,7 @@
"source": [
"from langchain_community.embeddings.oracleai import OracleEmbeddings\n",
"\n",
"# please update with your related information\n",
"# Update the directory and file names for your ONNX model\n",
"# make sure that you have onnx file in the system\n",
"onnx_dir = \"DEMO_DIR\"\n",
"onnx_file = \"tinybert.onnx\"\n",
@ -109,11 +135,11 @@
"source": [
"### Create Credential\n",
"\n",
"On the other hand, if the users choose to use 3rd party provider to generate embeddings, they need to create credential to access 3rd party provider's end points.\n",
"When selecting third-party providers for generating embeddings, users are required to establish credentials to securely access the provider's endpoints.\n",
"\n",
"***Note:*** The users do not need to create any credential if they choose to use 'database' provider to generate embeddings. Should the users choose to 3rd party provider, they need to create credential for the 3rd party provider they want to use. \n",
"***Important:*** No credentials are necessary when opting for the 'database' provider to generate embeddings. However, should users decide to utilize a third-party provider, they must create credentials specific to the chosen provider.\n",
"\n",
"Here is a sample example:"
"Below is an illustrative example:"
]
},
{
@ -163,14 +189,22 @@
"metadata": {},
"source": [
"### Generate Embeddings\n",
"Oracle AI Vector Search provides a number of ways to generate embeddings. The users can load an ONNX embedding model to Oracle Database and use it to generate embeddings or use some 3rd party API's end points to generate embeddings. Please refer to the Oracle AI Vector Search Guide book for complete information about these parameters."
"\n",
"Oracle AI Vector Search provides multiple methods for generating embeddings, utilizing either locally hosted ONNX models or third-party APIs. For comprehensive instructions on configuring these alternatives, please refer to the [Oracle AI Vector Search Guide](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-C6439E94-4E86-4ECD-954E-4B73D53579DE)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***Note:*** The users may need to set proxy if they want to use some 3rd party embedding generation providers other than 'database' provider (aka using ONNX model)."
"***Note:*** Currently, OracleEmbeddings processes each embedding generation request individually, without batching, by calling REST endpoints separately for each request. This method could potentially lead to exceeding the maximum request per minute quota set by some providers. However, we are actively working to enhance this process by implementing request batching, which will allow multiple embedding requests to be combined into fewer API calls, thereby optimizing our use of provider resources and adhering to their request limits. This update is expected to be rolled out soon, eliminating the current limitation."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***Note:*** Users may need to configure a proxy to utilize third-party embedding generation providers, excluding the 'database' provider that utilizes an ONNX model."
]
},
{
@ -221,7 +255,7 @@
"# using ONNX model loaded to Oracle Database\n",
"embedder_params = {\"provider\": \"database\", \"model\": \"demo_model\"}\n",
"\n",
"# Remove proxy if not required\n",
"# If a proxy is not required for your environment, you can omit the 'proxy' parameter below\n",
"embedder = OracleEmbeddings(conn=conn, params=embedder_params, proxy=proxy)\n",
"embed = embedder.embed_query(\"Hello World!\")\n",
"\n",

View File

@ -5,11 +5,37 @@
"metadata": {},
"source": [
"# Oracle AI Vector Search: Generate Summary\n",
"Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads that allows you to query data based on semantics, rather than keywords. One of the biggest benefit of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in one single system. This is not only powerful but also significantly more effective because you don't need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems.\n",
"\n",
"Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads that allows you to query data based on semantics, rather than keywords.\n",
"One of the biggest benefits of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in one single system.\n",
"This is not only powerful but also significantly more effective because you don't need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems.\n",
"\n",
"In addition, your vectors can benefit from all of Oracle Databases most powerful features, like the following:\n",
"\n",
" * [Partitioning Support](https://www.oracle.com/database/technologies/partitioning.html)\n",
" * [Real Application Clusters scalability](https://www.oracle.com/database/real-application-clusters/)\n",
" * [Exadata smart scans](https://www.oracle.com/database/technologies/exadata/software/smartscan/)\n",
" * [Shard processing across geographically distributed databases](https://www.oracle.com/database/distributed-database/)\n",
" * [Transactions](https://docs.oracle.com/en/database/oracle/oracle-database/23/cncpt/transactions.html)\n",
" * [Parallel SQL](https://docs.oracle.com/en/database/oracle/oracle-database/21/vldbg/parallel-exec-intro.html#GUID-D28717E4-0F77-44F5-BB4E-234C31D4E4BA)\n",
" * [Disaster recovery](https://www.oracle.com/database/data-guard/)\n",
" * [Security](https://www.oracle.com/security/database-security/)\n",
" * [Oracle Machine Learning](https://www.oracle.com/artificial-intelligence/database-machine-learning/)\n",
" * [Oracle Graph Database](https://www.oracle.com/database/integrated-graph-database/)\n",
" * [Oracle Spatial and Graph](https://www.oracle.com/database/spatial/)\n",
" * [Oracle Blockchain](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_blockchain_table.html#GUID-B469E277-978E-4378-A8C1-26D3FF96C9A6)\n",
" * [JSON](https://docs.oracle.com/en/database/oracle/oracle-database/23/adjsn/json-in-oracle-database.html)\n",
"\n",
"The guide demonstrates how to use Summary Capabilities within Oracle AI Vector Search to generate summary for your documents using OracleSummary."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you are just starting with Oracle Database, consider exploring the [free Oracle 23 AI](https://www.oracle.com/database/free/#resources) which provides a great introduction to setting up your database environment. While working with the database, it is often advisable to avoid using the system user by default; instead, you can create your own user for enhanced security and customization. For detailed steps on user creation, refer to our [end-to-end guide](https://github.com/langchain-ai/langchain/blob/master/cookbook/oracleai_demo.ipynb) which also shows how to set up a user in Oracle. Additionally, understanding user privileges is crucial for managing database security effectively. You can learn more about this topic in the official [Oracle guide](https://docs.oracle.com/en/database/oracle/oracle-database/19/admqs/administering-user-accounts-and-security.html#GUID-36B21D72-1BBB-46C9-A0C9-F0D2A8591B8D) on administering user accounts and security."
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -33,7 +59,7 @@
"metadata": {},
"source": [
"### Connect to Oracle Database\n",
"The following sample code will show how to connect to Oracle Database. "
"The following sample code will show how to connect to Oracle Database. By default, python-oracledb runs in a Thin mode which connects directly to Oracle Database. This mode does not need Oracle Client libraries. However, some additional functionality is available when python-oracledb uses them. Python-oracledb is said to be in Thick mode when Oracle Client libraries are used. Both modes have comprehensive functionality supporting the Python Database API v2.0 Specification. See the following [guide](https://python-oracledb.readthedocs.io/en/latest/user_guide/appendix_a.html#featuresummary) that talks about features supported in each mode. You might want to switch to thick-mode if you are unable to use thin-mode."
]
},
{
@ -64,7 +90,7 @@
"metadata": {},
"source": [
"### Generate Summary\n",
"The Oracle AI Vector Search Langchain library provides APIs to generate summaries of documents. There are a few summary generation provider options including Database, OCIGENAI, HuggingFace and so on. The users can choose their preferred provider to generate a summary. They just need to set the summary parameters accordingly. Please refer to the Oracle AI Vector Search Guide book for complete information about these parameters."
"The Oracle AI Vector Search Langchain library offers a suite of APIs designed for document summarization. It supports multiple summarization providers such as Database, OCIGENAI, HuggingFace, among others, allowing users to select the provider that best meets their needs. To utilize these capabilities, users must configure the summary parameters as specified. For detailed information on these parameters, please consult the [Oracle AI Vector Search Guide book](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-EC9DDB58-6A15-4B36-BA66-ECBA20D2CE57)."
]
},
{

View File

@ -8,24 +8,32 @@
"# Oracle AI Vector Search: Vector Store\n",
"\n",
"Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads that allows you to query data based on semantics, rather than keywords.\n",
"One of the biggest benefit of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in one single system.\n",
"This is not only powerful but also significantly more effective because you dont need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems.\n",
"One of the biggest benefits of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in one single system.\n",
"This is not only powerful but also significantly more effective because you don't need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems.\n",
"\n",
"In addition, because Oracle has been building database technologies for so long, your vectors can benefit from all of Oracle Database's most powerful features, like the following:\n",
"In addition, your vectors can benefit from all of Oracle Databases most powerful features, like the following:\n",
"\n",
" * Partitioning Support\n",
" * Real Application Clusters scalability\n",
" * Exadata smart scans\n",
" * Shard processing across geographically distributed databases\n",
" * Transactions\n",
" * Parallel SQL\n",
" * Disaster recovery\n",
" * Security\n",
" * Oracle Machine Learning\n",
" * Oracle Graph Database\n",
" * Oracle Spatial and Graph\n",
" * Oracle Blockchain\n",
" * JSON"
" * [Partitioning Support](https://www.oracle.com/database/technologies/partitioning.html)\n",
" * [Real Application Clusters scalability](https://www.oracle.com/database/real-application-clusters/)\n",
" * [Exadata smart scans](https://www.oracle.com/database/technologies/exadata/software/smartscan/)\n",
" * [Shard processing across geographically distributed databases](https://www.oracle.com/database/distributed-database/)\n",
" * [Transactions](https://docs.oracle.com/en/database/oracle/oracle-database/23/cncpt/transactions.html)\n",
" * [Parallel SQL](https://docs.oracle.com/en/database/oracle/oracle-database/21/vldbg/parallel-exec-intro.html#GUID-D28717E4-0F77-44F5-BB4E-234C31D4E4BA)\n",
" * [Disaster recovery](https://www.oracle.com/database/data-guard/)\n",
" * [Security](https://www.oracle.com/security/database-security/)\n",
" * [Oracle Machine Learning](https://www.oracle.com/artificial-intelligence/database-machine-learning/)\n",
" * [Oracle Graph Database](https://www.oracle.com/database/integrated-graph-database/)\n",
" * [Oracle Spatial and Graph](https://www.oracle.com/database/spatial/)\n",
" * [Oracle Blockchain](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_blockchain_table.html#GUID-B469E277-978E-4378-A8C1-26D3FF96C9A6)\n",
" * [JSON](https://docs.oracle.com/en/database/oracle/oracle-database/23/adjsn/json-in-oracle-database.html)"
]
},
{
"cell_type": "markdown",
"id": "2b9645ac-8d05-4cd7-9b76-bacb6cf4fcf1",
"metadata": {},
"source": [
"If you are just starting with Oracle Database, consider exploring the [free Oracle 23 AI](https://www.oracle.com/database/free/#resources) which provides a great introduction to setting up your database environment. While working with the database, it is often advisable to avoid using the system user by default; instead, you can create your own user for enhanced security and customization. For detailed steps on user creation, refer to our [end-to-end guide](https://github.com/langchain-ai/langchain/blob/master/cookbook/oracleai_demo.ipynb) which also shows how to set up a user in Oracle. Additionally, understanding user privileges is crucial for managing database security effectively. You can learn more about this topic in the official [Oracle guide](https://docs.oracle.com/en/database/oracle/oracle-database/19/admqs/administering-user-accounts-and-security.html#GUID-36B21D72-1BBB-46C9-A0C9-F0D2A8591B8D) on administering user accounts and security."
]
},
{
@ -53,7 +61,9 @@
"id": "0fceaa5a-95da-4ebd-8b8d-5e73bb653172",
"metadata": {},
"source": [
"### Connect to Oracle AI Vector Search"
"### Connect to Oracle AI Vector Search\n",
"\n",
"The following sample code will show how to connect to Oracle Database. By default, python-oracledb runs in a Thin mode which connects directly to Oracle Database. This mode does not need Oracle Client libraries. However, some additional functionality is available when python-oracledb uses them. Python-oracledb is said to be in Thick mode when Oracle Client libraries are used. Both modes have comprehensive functionality supporting the Python Database API v2.0 Specification. See the following [guide](https://python-oracledb.readthedocs.io/en/latest/user_guide/appendix_a.html#featuresummary) that talks about features supported in each mode. You might want to switch to thick-mode if you are unable to use thin-mode."
]
},
{
@ -81,7 +91,7 @@
"id": "b11cf362-01b0-485d-8527-31b0fbb5028e",
"metadata": {},
"source": [
"### Import the required dependencies to play with Oracle AI Vector Search"
"### Import the required dependencies to use Oracle AI Vector Search"
]
},
{
@ -113,7 +123,7 @@
"metadata": {},
"outputs": [],
"source": [
"# Define a list of documents (These dummy examples are 5 random documents from Oracle Concepts Manual )\n",
"# Define a list of documents (The examples below are 5 random documents from Oracle Concepts Manual )\n",
"\n",
"documents_json_list = [\n",
" {\n",
@ -161,11 +171,11 @@
"id": "6823f5e6-997c-4f15-927b-bd44c61f105f",
"metadata": {},
"source": [
"### Using AI Vector Search Create a bunch of Vector Stores with different distance strategies\n",
"### Create Vector Stores with different distance metrics using AI Vector Search\n",
"\n",
"First we will create three vector stores each with different distance functions. Since we have not created indices in them yet, they will just create tables for now. Later we will use these vector stores to create HNSW indicies.\n",
"First we will create three vector stores each with different distance functions. Since we have not created indices in them yet, they will just create tables for now. Later we will use these vector stores to create HNSW indicies. To understand more about the different types of indices Oracle AI Vector Search supports, refer to the following [guide](https://docs.oracle.com/en/database/oracle/oracle-database/23/vecse/manage-different-categories-vector-indexes.html) .\n",
"\n",
"You can manually connect to the Oracle Database and will see three tables \n",
"You can manually connect to the Oracle Database and will see three tables : \n",
"Documents_DOT, Documents_COSINE and Documents_EUCLIDEAN. \n",
"\n",
"We will then create three additional tables Documents_DOT_IVF, Documents_COSINE_IVF and Documents_EUCLIDEAN_IVF which will be used\n",
@ -181,6 +191,10 @@
"source": [
"# Ingest documents into Oracle Vector Store using different distance strategies\n",
"\n",
"# When using our API calls, start by initializing your vector store with a subset of your documents\n",
"# through from_documents(), then incrementally add more documents using add_texts().\n",
"# This approach prevents system overload and ensures efficient document processing.\n",
"\n",
"model = HuggingFaceEmbeddings(model_name=\"sentence-transformers/all-mpnet-base-v2\")\n",
"\n",
"vector_store_dot = OracleVS.from_documents(\n",
@ -234,7 +248,7 @@
"id": "77c29505-8688-4b87-9a99-e648fbb2d425",
"metadata": {},
"source": [
"### Demonstrating add, delete operations for texts, and basic similarity search\n"
"### Demonstrating add and delete operations for texts, along with basic similarity search\n"
]
},
{
@ -384,7 +398,7 @@
"id": "7223d048-5c0b-4e91-a91b-a7daa9f86758",
"metadata": {},
"source": [
"### Now we will conduct a bunch of advanced searches on all six vector stores. Each of these three searches have a with and without filter version. The filter only selects the document with id 101 out and filters out everything else"
"### Demonstrate advanced searches on all six vector stores, with and without attribute filtering with filtering, we only select the document id 101 and nothing else"
]
},
{