mirror of
https://github.com/hwchase17/langchain.git
synced 2025-09-06 05:25:04 +00:00
docs: updates for OracleDB (#21745)
Thank you for contributing to LangChain! Documentation change for OracleDB Fixed several things in Oracle Documentation.
This commit is contained in:
@@ -5,11 +5,37 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Oracle AI Vector Search: Document Processing\n",
|
||||
"Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads that allows you to query data based on semantics, rather than keywords. One of the biggest benefit of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in one single system. This is not only powerful but also significantly more effective because you don't need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems.\n",
|
||||
"Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads that allows you to query data based on semantics, rather than keywords.\n",
|
||||
"One of the biggest benefits of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in one single system.\n",
|
||||
"This is not only powerful but also significantly more effective because you don't need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems.\n",
|
||||
"\n",
|
||||
"In addition, your vectors can benefit from all of Oracle Database’s most powerful features, like the following:\n",
|
||||
"\n",
|
||||
" * [Partitioning Support](https://www.oracle.com/database/technologies/partitioning.html)\n",
|
||||
" * [Real Application Clusters scalability](https://www.oracle.com/database/real-application-clusters/)\n",
|
||||
" * [Exadata smart scans](https://www.oracle.com/database/technologies/exadata/software/smartscan/)\n",
|
||||
" * [Shard processing across geographically distributed databases](https://www.oracle.com/database/distributed-database/)\n",
|
||||
" * [Transactions](https://docs.oracle.com/en/database/oracle/oracle-database/23/cncpt/transactions.html)\n",
|
||||
" * [Parallel SQL](https://docs.oracle.com/en/database/oracle/oracle-database/21/vldbg/parallel-exec-intro.html#GUID-D28717E4-0F77-44F5-BB4E-234C31D4E4BA)\n",
|
||||
" * [Disaster recovery](https://www.oracle.com/database/data-guard/)\n",
|
||||
" * [Security](https://www.oracle.com/security/database-security/)\n",
|
||||
" * [Oracle Machine Learning](https://www.oracle.com/artificial-intelligence/database-machine-learning/)\n",
|
||||
" * [Oracle Graph Database](https://www.oracle.com/database/integrated-graph-database/)\n",
|
||||
" * [Oracle Spatial and Graph](https://www.oracle.com/database/spatial/)\n",
|
||||
" * [Oracle Blockchain](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_blockchain_table.html#GUID-B469E277-978E-4378-A8C1-26D3FF96C9A6)\n",
|
||||
" * [JSON](https://docs.oracle.com/en/database/oracle/oracle-database/23/adjsn/json-in-oracle-database.html)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"The guide demonstrates how to use Document Processing Capabilities within Oracle AI Vector Search to load and chunk documents using OracleDocLoader and OracleTextSplitter respectively."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If you are just starting with Oracle Database, consider exploring the [free Oracle 23 AI](https://www.oracle.com/database/free/#resources) which provides a great introduction to setting up your database environment. While working with the database, it is often advisable to avoid using the system user by default; instead, you can create your own user for enhanced security and customization. For detailed steps on user creation, refer to our [end-to-end guide](https://github.com/langchain-ai/langchain/blob/master/cookbook/oracleai_demo.ipynb) which also shows how to set up a user in Oracle. Additionally, understanding user privileges is crucial for managing database security effectively. You can learn more about this topic in the official [Oracle guide](https://docs.oracle.com/en/database/oracle/oracle-database/19/admqs/administering-user-accounts-and-security.html#GUID-36B21D72-1BBB-46C9-A0C9-F0D2A8591B8D) on administering user accounts and security."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
@@ -33,7 +59,7 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Connect to Oracle Database\n",
|
||||
"The following sample code will show how to connect to Oracle Database. "
|
||||
"The following sample code will show how to connect to Oracle Database. By default, python-oracledb runs in a ‘Thin’ mode which connects directly to Oracle Database. This mode does not need Oracle Client libraries. However, some additional functionality is available when python-oracledb uses them. Python-oracledb is said to be in ‘Thick’ mode when Oracle Client libraries are used. Both modes have comprehensive functionality supporting the Python Database API v2.0 Specification. See the following [guide](https://python-oracledb.readthedocs.io/en/latest/user_guide/appendix_a.html#featuresummary) that talks about features supported in each mode. You might want to switch to thick-mode if you are unable to use thin-mode."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -114,11 +140,12 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Load Documents\n",
|
||||
"The users can load the documents from Oracle Database or a file system or both. They just need to set the loader parameters accordingly. Please refer to the Oracle AI Vector Search Guide book for complete information about these parameters.\n",
|
||||
"\n",
|
||||
"The main benefit of using OracleDocLoader is that it can handle 150+ different file formats. You don't need to use different types of loader for different file formats. Here is the list of the formats that we support: [Oracle Text Supported Document Formats](https://docs.oracle.com/en/database/oracle/oracle-database/23/ccref/oracle-text-supported-document-formats.html)\n",
|
||||
"Users have the flexibility to load documents from either the Oracle Database, a file system, or both, by appropriately configuring the loader parameters. For comprehensive details on these parameters, please consult the [Oracle AI Vector Search Guide](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-73397E89-92FB-48ED-94BB-1AD960C4EA1F).\n",
|
||||
"\n",
|
||||
"The following sample code will show how to do that:"
|
||||
"A significant advantage of utilizing OracleDocLoader is its capability to process over 150 distinct file formats, eliminating the need for multiple loaders for different document types. For a complete list of the supported formats, please refer to the [Oracle Text Supported Document Formats](https://docs.oracle.com/en/database/oracle/oracle-database/23/ccref/oracle-text-supported-document-formats.html).\n",
|
||||
"\n",
|
||||
"Below is a sample code snippet that demonstrates how to use OracleDocLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -161,9 +188,9 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Split Documents\n",
|
||||
"The documents can be in different sizes: small, medium, large, or very large. The users like to split/chunk their documents into smaller pieces to generate embeddings. There are lots of different splitting customizations the users can do. Please refer to the Oracle AI Vector Search Guide book for complete information about these parameters.\n",
|
||||
"The documents may vary in size, ranging from small to very large. Users often prefer to chunk their documents into smaller sections to facilitate the generation of embeddings. A wide array of customization options is available for this splitting process. For comprehensive details regarding these parameters, please consult the [Oracle AI Vector Search Guide](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-4E145629-7098-4C7C-804F-FC85D1F24240).\n",
|
||||
"\n",
|
||||
"The following sample code will show how to do that:"
|
||||
"Below is a sample code illustrating how to implement this:"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
@@ -1,22 +1,24 @@
|
||||
# OracleAI Vector Search
|
||||
Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads that allows you to query data based on semantics, rather than keywords. One of the biggest benefit of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in one single system.
|
||||
This is not only powerful but also significantly more effective because you dont need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems.
|
||||
|
||||
In addition, because Oracle has been building database technologies for so long, your vectors can benefit from all of Oracle Database's most powerful features, like the following:
|
||||
Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads that allows you to query data based on semantics, rather than keywords.
|
||||
One of the biggest benefits of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in one single system.
|
||||
This is not only powerful but also significantly more effective because you don't need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems.
|
||||
|
||||
* Partitioning Support
|
||||
* Real Application Clusters scalability
|
||||
* Exadata smart scans
|
||||
* Shard processing across geographically distributed databases
|
||||
* Transactions
|
||||
* Parallel SQL
|
||||
* Disaster recovery
|
||||
* Security
|
||||
* Oracle Machine Learning
|
||||
* Oracle Graph Database
|
||||
* Oracle Spatial and Graph
|
||||
* Oracle Blockchain
|
||||
* JSON
|
||||
In addition, your vectors can benefit from all of Oracle Database’s most powerful features, like the following:
|
||||
|
||||
* [Partitioning Support](https://www.oracle.com/database/technologies/partitioning.html)
|
||||
* [Real Application Clusters scalability](https://www.oracle.com/database/real-application-clusters/)
|
||||
* [Exadata smart scans](https://www.oracle.com/database/technologies/exadata/software/smartscan/)
|
||||
* [Shard processing across geographically distributed databases](https://www.oracle.com/database/distributed-database/)
|
||||
* [Transactions](https://docs.oracle.com/en/database/oracle/oracle-database/23/cncpt/transactions.html)
|
||||
* [Parallel SQL](https://docs.oracle.com/en/database/oracle/oracle-database/21/vldbg/parallel-exec-intro.html#GUID-D28717E4-0F77-44F5-BB4E-234C31D4E4BA)
|
||||
* [Disaster recovery](https://www.oracle.com/database/data-guard/)
|
||||
* [Security](https://www.oracle.com/security/database-security/)
|
||||
* [Oracle Machine Learning](https://www.oracle.com/artificial-intelligence/database-machine-learning/)
|
||||
* [Oracle Graph Database](https://www.oracle.com/database/integrated-graph-database/)
|
||||
* [Oracle Spatial and Graph](https://www.oracle.com/database/spatial/)
|
||||
* [Oracle Blockchain](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_blockchain_table.html#GUID-B469E277-978E-4378-A8C1-26D3FF96C9A6)
|
||||
* [JSON](https://docs.oracle.com/en/database/oracle/oracle-database/23/adjsn/json-in-oracle-database.html)
|
||||
|
||||
|
||||
## Document Loaders
|
||||
@@ -61,5 +63,5 @@ from langchain_community.vectorstores.oraclevs import OracleVS
|
||||
|
||||
## End to End Demo
|
||||
|
||||
Please check the [Oracle AI Vector Search End-to-End Demo Guide](https://github.com/langchain-ai/langchain/blob/master/cookbook/oracleai_demo).
|
||||
Please check the [Oracle AI Vector Search End-to-End Demo Guide](https://github.com/langchain-ai/langchain/blob/master/cookbook/oracleai_demo.ipynb).
|
||||
|
||||
|
@@ -5,18 +5,44 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Oracle AI Vector Search: Generate Embeddings\n",
|
||||
"Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads that allows you to query data based on semantics, rather than keywords. One of the biggest benefit of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in one single system. This is not only powerful but also significantly more effective because you don't need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems.\n",
|
||||
"Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads that allows you to query data based on semantics, rather than keywords.\n",
|
||||
"One of the biggest benefits of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in one single system.\n",
|
||||
"This is not only powerful but also significantly more effective because you don't need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems.\n",
|
||||
"\n",
|
||||
"In addition, your vectors can benefit from all of Oracle Database’s most powerful features, like the following:\n",
|
||||
"\n",
|
||||
" * [Partitioning Support](https://www.oracle.com/database/technologies/partitioning.html)\n",
|
||||
" * [Real Application Clusters scalability](https://www.oracle.com/database/real-application-clusters/)\n",
|
||||
" * [Exadata smart scans](https://www.oracle.com/database/technologies/exadata/software/smartscan/)\n",
|
||||
" * [Shard processing across geographically distributed databases](https://www.oracle.com/database/distributed-database/)\n",
|
||||
" * [Transactions](https://docs.oracle.com/en/database/oracle/oracle-database/23/cncpt/transactions.html)\n",
|
||||
" * [Parallel SQL](https://docs.oracle.com/en/database/oracle/oracle-database/21/vldbg/parallel-exec-intro.html#GUID-D28717E4-0F77-44F5-BB4E-234C31D4E4BA)\n",
|
||||
" * [Disaster recovery](https://www.oracle.com/database/data-guard/)\n",
|
||||
" * [Security](https://www.oracle.com/security/database-security/)\n",
|
||||
" * [Oracle Machine Learning](https://www.oracle.com/artificial-intelligence/database-machine-learning/)\n",
|
||||
" * [Oracle Graph Database](https://www.oracle.com/database/integrated-graph-database/)\n",
|
||||
" * [Oracle Spatial and Graph](https://www.oracle.com/database/spatial/)\n",
|
||||
" * [Oracle Blockchain](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_blockchain_table.html#GUID-B469E277-978E-4378-A8C1-26D3FF96C9A6)\n",
|
||||
" * [JSON](https://docs.oracle.com/en/database/oracle/oracle-database/23/adjsn/json-in-oracle-database.html)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"The guide demonstrates how to use Embedding Capabilities within Oracle AI Vector Search to generate embeddings for your documents using OracleEmbeddings."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If you are just starting with Oracle Database, consider exploring the [free Oracle 23 AI](https://www.oracle.com/database/free/#resources) which provides a great introduction to setting up your database environment. While working with the database, it is often advisable to avoid using the system user by default; instead, you can create your own user for enhanced security and customization. For detailed steps on user creation, refer to our [end-to-end guide](https://github.com/langchain-ai/langchain/blob/master/cookbook/oracleai_demo.ipynb) which also shows how to set up a user in Oracle. Additionally, understanding user privileges is crucial for managing database security effectively. You can learn more about this topic in the official [Oracle guide](https://docs.oracle.com/en/database/oracle/oracle-database/19/admqs/administering-user-accounts-and-security.html#GUID-36B21D72-1BBB-46C9-A0C9-F0D2A8591B8D) on administering user accounts and security."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Prerequisites\n",
|
||||
"\n",
|
||||
"Please install Oracle Python Client driver to use Langchain with Oracle AI Vector Search. "
|
||||
"Ensure you have the Oracle Python Client driver installed to facilitate the integration of Langchain with Oracle AI Vector Search."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -33,7 +59,7 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Connect to Oracle Database\n",
|
||||
"The following sample code will show how to connect to Oracle Database. "
|
||||
"The following sample code will show how to connect to Oracle Database. By default, python-oracledb runs in a ‘Thin’ mode which connects directly to Oracle Database. This mode does not need Oracle Client libraries. However, some additional functionality is available when python-oracledb uses them. Python-oracledb is said to be in ‘Thick’ mode when Oracle Client libraries are used. Both modes have comprehensive functionality supporting the Python Database API v2.0 Specification. See the following [guide](https://python-oracledb.readthedocs.io/en/latest/user_guide/appendix_a.html#featuresummary) that talks about features supported in each mode. You might want to switch to thick-mode if you are unable to use thin-mode."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -46,7 +72,7 @@
|
||||
"\n",
|
||||
"import oracledb\n",
|
||||
"\n",
|
||||
"# please update with your username, password, hostname and service_name\n",
|
||||
"# Update the following variables with your Oracle database credentials and connection details\n",
|
||||
"username = \"<username>\"\n",
|
||||
"password = \"<password>\"\n",
|
||||
"dsn = \"<hostname>/<service_name>\"\n",
|
||||
@@ -63,7 +89,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"For embedding, we have a few provider options that the users can choose from such as database, 3rd party providers like ocigenai, huggingface, openai, etc. If the users choose to use 3rd party provider, they need to create a credential with corresponding authentication information. On the other hand, if the users choose to use 'database' as provider, they need to load an onnx model to Oracle Database for embeddings."
|
||||
"For embedding generation, several provider options are available to users, including embedding generation within the database and third-party services such as OcigenAI, Hugging Face, and OpenAI. Users opting for third-party providers must establish credentials that include the requisite authentication information. Alternatively, if users select 'database' as their provider, they are required to load an ONNX model into the Oracle Database to facilitate embeddings."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -72,13 +98,13 @@
|
||||
"source": [
|
||||
"### Load ONNX Model\n",
|
||||
"\n",
|
||||
"To generate embeddings, Oracle provides a few provider options for users to choose from. The users can choose 'database' provider or some 3rd party providers like OCIGENAI, HuggingFace, etc.\n",
|
||||
"Oracle accommodates a variety of embedding providers, enabling users to choose between proprietary database solutions and third-party services such as OCIGENAI and HuggingFace. This selection dictates the methodology for generating and managing embeddings.\n",
|
||||
"\n",
|
||||
"***Note*** If the users choose database option, they need to load an ONNX model to Oracle Database. The users do not need to load an ONNX model to Oracle Database if they choose to use 3rd party provider to generate embeddings.\n",
|
||||
"***Important*** : Should users opt for the database option, they must upload an ONNX model into the Oracle Database. Conversely, if a third-party provider is selected for embedding generation, uploading an ONNX model to Oracle Database is not required.\n",
|
||||
"\n",
|
||||
"One of the core benefits of using an ONNX model is that the users do not need to transfer their data to 3rd party to generate embeddings. And also, since it does not involve any network or REST API calls, it may provide better performance.\n",
|
||||
"A significant advantage of utilizing an ONNX model directly within Oracle is the enhanced security and performance it offers by eliminating the need to transmit data to external parties. Additionally, this method avoids the latency typically associated with network or REST API calls.\n",
|
||||
"\n",
|
||||
"Here is the sample code to load an ONNX model to Oracle Database:"
|
||||
"Below is the example code to upload an ONNX model into Oracle Database:"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -89,7 +115,7 @@
|
||||
"source": [
|
||||
"from langchain_community.embeddings.oracleai import OracleEmbeddings\n",
|
||||
"\n",
|
||||
"# please update with your related information\n",
|
||||
"# Update the directory and file names for your ONNX model\n",
|
||||
"# make sure that you have onnx file in the system\n",
|
||||
"onnx_dir = \"DEMO_DIR\"\n",
|
||||
"onnx_file = \"tinybert.onnx\"\n",
|
||||
@@ -109,11 +135,11 @@
|
||||
"source": [
|
||||
"### Create Credential\n",
|
||||
"\n",
|
||||
"On the other hand, if the users choose to use 3rd party provider to generate embeddings, they need to create credential to access 3rd party provider's end points.\n",
|
||||
"When selecting third-party providers for generating embeddings, users are required to establish credentials to securely access the provider's endpoints.\n",
|
||||
"\n",
|
||||
"***Note:*** The users do not need to create any credential if they choose to use 'database' provider to generate embeddings. Should the users choose to 3rd party provider, they need to create credential for the 3rd party provider they want to use. \n",
|
||||
"***Important:*** No credentials are necessary when opting for the 'database' provider to generate embeddings. However, should users decide to utilize a third-party provider, they must create credentials specific to the chosen provider.\n",
|
||||
"\n",
|
||||
"Here is a sample example:"
|
||||
"Below is an illustrative example:"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -163,14 +189,22 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Generate Embeddings\n",
|
||||
"Oracle AI Vector Search provides a number of ways to generate embeddings. The users can load an ONNX embedding model to Oracle Database and use it to generate embeddings or use some 3rd party API's end points to generate embeddings. Please refer to the Oracle AI Vector Search Guide book for complete information about these parameters."
|
||||
"\n",
|
||||
"Oracle AI Vector Search provides multiple methods for generating embeddings, utilizing either locally hosted ONNX models or third-party APIs. For comprehensive instructions on configuring these alternatives, please refer to the [Oracle AI Vector Search Guide](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-C6439E94-4E86-4ECD-954E-4B73D53579DE)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"***Note:*** The users may need to set proxy if they want to use some 3rd party embedding generation providers other than 'database' provider (aka using ONNX model)."
|
||||
"***Note:*** Currently, OracleEmbeddings processes each embedding generation request individually, without batching, by calling REST endpoints separately for each request. This method could potentially lead to exceeding the maximum request per minute quota set by some providers. However, we are actively working to enhance this process by implementing request batching, which will allow multiple embedding requests to be combined into fewer API calls, thereby optimizing our use of provider resources and adhering to their request limits. This update is expected to be rolled out soon, eliminating the current limitation."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"***Note:*** Users may need to configure a proxy to utilize third-party embedding generation providers, excluding the 'database' provider that utilizes an ONNX model."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -221,7 +255,7 @@
|
||||
"# using ONNX model loaded to Oracle Database\n",
|
||||
"embedder_params = {\"provider\": \"database\", \"model\": \"demo_model\"}\n",
|
||||
"\n",
|
||||
"# Remove proxy if not required\n",
|
||||
"# If a proxy is not required for your environment, you can omit the 'proxy' parameter below\n",
|
||||
"embedder = OracleEmbeddings(conn=conn, params=embedder_params, proxy=proxy)\n",
|
||||
"embed = embedder.embed_query(\"Hello World!\")\n",
|
||||
"\n",
|
||||
|
@@ -5,11 +5,37 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Oracle AI Vector Search: Generate Summary\n",
|
||||
"Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads that allows you to query data based on semantics, rather than keywords. One of the biggest benefit of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in one single system. This is not only powerful but also significantly more effective because you don't need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems.\n",
|
||||
"\n",
|
||||
"Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads that allows you to query data based on semantics, rather than keywords.\n",
|
||||
"One of the biggest benefits of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in one single system.\n",
|
||||
"This is not only powerful but also significantly more effective because you don't need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems.\n",
|
||||
"\n",
|
||||
"In addition, your vectors can benefit from all of Oracle Database’s most powerful features, like the following:\n",
|
||||
"\n",
|
||||
" * [Partitioning Support](https://www.oracle.com/database/technologies/partitioning.html)\n",
|
||||
" * [Real Application Clusters scalability](https://www.oracle.com/database/real-application-clusters/)\n",
|
||||
" * [Exadata smart scans](https://www.oracle.com/database/technologies/exadata/software/smartscan/)\n",
|
||||
" * [Shard processing across geographically distributed databases](https://www.oracle.com/database/distributed-database/)\n",
|
||||
" * [Transactions](https://docs.oracle.com/en/database/oracle/oracle-database/23/cncpt/transactions.html)\n",
|
||||
" * [Parallel SQL](https://docs.oracle.com/en/database/oracle/oracle-database/21/vldbg/parallel-exec-intro.html#GUID-D28717E4-0F77-44F5-BB4E-234C31D4E4BA)\n",
|
||||
" * [Disaster recovery](https://www.oracle.com/database/data-guard/)\n",
|
||||
" * [Security](https://www.oracle.com/security/database-security/)\n",
|
||||
" * [Oracle Machine Learning](https://www.oracle.com/artificial-intelligence/database-machine-learning/)\n",
|
||||
" * [Oracle Graph Database](https://www.oracle.com/database/integrated-graph-database/)\n",
|
||||
" * [Oracle Spatial and Graph](https://www.oracle.com/database/spatial/)\n",
|
||||
" * [Oracle Blockchain](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_blockchain_table.html#GUID-B469E277-978E-4378-A8C1-26D3FF96C9A6)\n",
|
||||
" * [JSON](https://docs.oracle.com/en/database/oracle/oracle-database/23/adjsn/json-in-oracle-database.html)\n",
|
||||
"\n",
|
||||
"The guide demonstrates how to use Summary Capabilities within Oracle AI Vector Search to generate summary for your documents using OracleSummary."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If you are just starting with Oracle Database, consider exploring the [free Oracle 23 AI](https://www.oracle.com/database/free/#resources) which provides a great introduction to setting up your database environment. While working with the database, it is often advisable to avoid using the system user by default; instead, you can create your own user for enhanced security and customization. For detailed steps on user creation, refer to our [end-to-end guide](https://github.com/langchain-ai/langchain/blob/master/cookbook/oracleai_demo.ipynb) which also shows how to set up a user in Oracle. Additionally, understanding user privileges is crucial for managing database security effectively. You can learn more about this topic in the official [Oracle guide](https://docs.oracle.com/en/database/oracle/oracle-database/19/admqs/administering-user-accounts-and-security.html#GUID-36B21D72-1BBB-46C9-A0C9-F0D2A8591B8D) on administering user accounts and security."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
@@ -33,7 +59,7 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Connect to Oracle Database\n",
|
||||
"The following sample code will show how to connect to Oracle Database. "
|
||||
"The following sample code will show how to connect to Oracle Database. By default, python-oracledb runs in a ‘Thin’ mode which connects directly to Oracle Database. This mode does not need Oracle Client libraries. However, some additional functionality is available when python-oracledb uses them. Python-oracledb is said to be in ‘Thick’ mode when Oracle Client libraries are used. Both modes have comprehensive functionality supporting the Python Database API v2.0 Specification. See the following [guide](https://python-oracledb.readthedocs.io/en/latest/user_guide/appendix_a.html#featuresummary) that talks about features supported in each mode. You might want to switch to thick-mode if you are unable to use thin-mode."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -64,7 +90,7 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Generate Summary\n",
|
||||
"The Oracle AI Vector Search Langchain library provides APIs to generate summaries of documents. There are a few summary generation provider options including Database, OCIGENAI, HuggingFace and so on. The users can choose their preferred provider to generate a summary. They just need to set the summary parameters accordingly. Please refer to the Oracle AI Vector Search Guide book for complete information about these parameters."
|
||||
"The Oracle AI Vector Search Langchain library offers a suite of APIs designed for document summarization. It supports multiple summarization providers such as Database, OCIGENAI, HuggingFace, among others, allowing users to select the provider that best meets their needs. To utilize these capabilities, users must configure the summary parameters as specified. For detailed information on these parameters, please consult the [Oracle AI Vector Search Guide book](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-EC9DDB58-6A15-4B36-BA66-ECBA20D2CE57)."
|
||||
]
|
||||
},
|
||||
{
|
||||
|
@@ -8,24 +8,32 @@
|
||||
"# Oracle AI Vector Search: Vector Store\n",
|
||||
"\n",
|
||||
"Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads that allows you to query data based on semantics, rather than keywords.\n",
|
||||
"One of the biggest benefit of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in one single system.\n",
|
||||
"This is not only powerful but also significantly more effective because you dont need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems.\n",
|
||||
"One of the biggest benefits of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in one single system.\n",
|
||||
"This is not only powerful but also significantly more effective because you don't need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems.\n",
|
||||
"\n",
|
||||
"In addition, because Oracle has been building database technologies for so long, your vectors can benefit from all of Oracle Database's most powerful features, like the following:\n",
|
||||
"In addition, your vectors can benefit from all of Oracle Database’s most powerful features, like the following:\n",
|
||||
"\n",
|
||||
" * Partitioning Support\n",
|
||||
" * Real Application Clusters scalability\n",
|
||||
" * Exadata smart scans\n",
|
||||
" * Shard processing across geographically distributed databases\n",
|
||||
" * Transactions\n",
|
||||
" * Parallel SQL\n",
|
||||
" * Disaster recovery\n",
|
||||
" * Security\n",
|
||||
" * Oracle Machine Learning\n",
|
||||
" * Oracle Graph Database\n",
|
||||
" * Oracle Spatial and Graph\n",
|
||||
" * Oracle Blockchain\n",
|
||||
" * JSON"
|
||||
" * [Partitioning Support](https://www.oracle.com/database/technologies/partitioning.html)\n",
|
||||
" * [Real Application Clusters scalability](https://www.oracle.com/database/real-application-clusters/)\n",
|
||||
" * [Exadata smart scans](https://www.oracle.com/database/technologies/exadata/software/smartscan/)\n",
|
||||
" * [Shard processing across geographically distributed databases](https://www.oracle.com/database/distributed-database/)\n",
|
||||
" * [Transactions](https://docs.oracle.com/en/database/oracle/oracle-database/23/cncpt/transactions.html)\n",
|
||||
" * [Parallel SQL](https://docs.oracle.com/en/database/oracle/oracle-database/21/vldbg/parallel-exec-intro.html#GUID-D28717E4-0F77-44F5-BB4E-234C31D4E4BA)\n",
|
||||
" * [Disaster recovery](https://www.oracle.com/database/data-guard/)\n",
|
||||
" * [Security](https://www.oracle.com/security/database-security/)\n",
|
||||
" * [Oracle Machine Learning](https://www.oracle.com/artificial-intelligence/database-machine-learning/)\n",
|
||||
" * [Oracle Graph Database](https://www.oracle.com/database/integrated-graph-database/)\n",
|
||||
" * [Oracle Spatial and Graph](https://www.oracle.com/database/spatial/)\n",
|
||||
" * [Oracle Blockchain](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_blockchain_table.html#GUID-B469E277-978E-4378-A8C1-26D3FF96C9A6)\n",
|
||||
" * [JSON](https://docs.oracle.com/en/database/oracle/oracle-database/23/adjsn/json-in-oracle-database.html)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2b9645ac-8d05-4cd7-9b76-bacb6cf4fcf1",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If you are just starting with Oracle Database, consider exploring the [free Oracle 23 AI](https://www.oracle.com/database/free/#resources) which provides a great introduction to setting up your database environment. While working with the database, it is often advisable to avoid using the system user by default; instead, you can create your own user for enhanced security and customization. For detailed steps on user creation, refer to our [end-to-end guide](https://github.com/langchain-ai/langchain/blob/master/cookbook/oracleai_demo.ipynb) which also shows how to set up a user in Oracle. Additionally, understanding user privileges is crucial for managing database security effectively. You can learn more about this topic in the official [Oracle guide](https://docs.oracle.com/en/database/oracle/oracle-database/19/admqs/administering-user-accounts-and-security.html#GUID-36B21D72-1BBB-46C9-A0C9-F0D2A8591B8D) on administering user accounts and security."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -53,7 +61,9 @@
|
||||
"id": "0fceaa5a-95da-4ebd-8b8d-5e73bb653172",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Connect to Oracle AI Vector Search"
|
||||
"### Connect to Oracle AI Vector Search\n",
|
||||
"\n",
|
||||
"The following sample code will show how to connect to Oracle Database. By default, python-oracledb runs in a ‘Thin’ mode which connects directly to Oracle Database. This mode does not need Oracle Client libraries. However, some additional functionality is available when python-oracledb uses them. Python-oracledb is said to be in ‘Thick’ mode when Oracle Client libraries are used. Both modes have comprehensive functionality supporting the Python Database API v2.0 Specification. See the following [guide](https://python-oracledb.readthedocs.io/en/latest/user_guide/appendix_a.html#featuresummary) that talks about features supported in each mode. You might want to switch to thick-mode if you are unable to use thin-mode."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -81,7 +91,7 @@
|
||||
"id": "b11cf362-01b0-485d-8527-31b0fbb5028e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Import the required dependencies to play with Oracle AI Vector Search"
|
||||
"### Import the required dependencies to use Oracle AI Vector Search"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -113,7 +123,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Define a list of documents (These dummy examples are 5 random documents from Oracle Concepts Manual )\n",
|
||||
"# Define a list of documents (The examples below are 5 random documents from Oracle Concepts Manual )\n",
|
||||
"\n",
|
||||
"documents_json_list = [\n",
|
||||
" {\n",
|
||||
@@ -161,11 +171,11 @@
|
||||
"id": "6823f5e6-997c-4f15-927b-bd44c61f105f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Using AI Vector Search Create a bunch of Vector Stores with different distance strategies\n",
|
||||
"### Create Vector Stores with different distance metrics using AI Vector Search\n",
|
||||
"\n",
|
||||
"First we will create three vector stores each with different distance functions. Since we have not created indices in them yet, they will just create tables for now. Later we will use these vector stores to create HNSW indicies.\n",
|
||||
"First we will create three vector stores each with different distance functions. Since we have not created indices in them yet, they will just create tables for now. Later we will use these vector stores to create HNSW indicies. To understand more about the different types of indices Oracle AI Vector Search supports, refer to the following [guide](https://docs.oracle.com/en/database/oracle/oracle-database/23/vecse/manage-different-categories-vector-indexes.html) .\n",
|
||||
"\n",
|
||||
"You can manually connect to the Oracle Database and will see three tables \n",
|
||||
"You can manually connect to the Oracle Database and will see three tables : \n",
|
||||
"Documents_DOT, Documents_COSINE and Documents_EUCLIDEAN. \n",
|
||||
"\n",
|
||||
"We will then create three additional tables Documents_DOT_IVF, Documents_COSINE_IVF and Documents_EUCLIDEAN_IVF which will be used\n",
|
||||
@@ -181,6 +191,10 @@
|
||||
"source": [
|
||||
"# Ingest documents into Oracle Vector Store using different distance strategies\n",
|
||||
"\n",
|
||||
"# When using our API calls, start by initializing your vector store with a subset of your documents\n",
|
||||
"# through from_documents(), then incrementally add more documents using add_texts().\n",
|
||||
"# This approach prevents system overload and ensures efficient document processing.\n",
|
||||
"\n",
|
||||
"model = HuggingFaceEmbeddings(model_name=\"sentence-transformers/all-mpnet-base-v2\")\n",
|
||||
"\n",
|
||||
"vector_store_dot = OracleVS.from_documents(\n",
|
||||
@@ -234,7 +248,7 @@
|
||||
"id": "77c29505-8688-4b87-9a99-e648fbb2d425",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Demonstrating add, delete operations for texts, and basic similarity search\n"
|
||||
"### Demonstrating add and delete operations for texts, along with basic similarity search\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -384,7 +398,7 @@
|
||||
"id": "7223d048-5c0b-4e91-a91b-a7daa9f86758",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Now we will conduct a bunch of advanced searches on all six vector stores. Each of these three searches have a with and without filter version. The filter only selects the document with id 101 out and filters out everything else"
|
||||
"### Demonstrate advanced searches on all six vector stores, with and without attribute filtering – with filtering, we only select the document id 101 and nothing else"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
Reference in New Issue
Block a user