mirror of
https://github.com/hwchase17/langchain.git
synced 2025-05-12 10:37:32 +00:00
Thank you for contributing to LangChain! - [ ] **PR title**: "package: description" - Where "package" is whichever of langchain, community, core, experimental, etc. is being modified. Use "docs: ..." for purely docs changes, "templates: ..." for template changes, "infra: ..." for CI changes. - Example: "community: add foobar LLM" "community/embeddings: update oracleai.py" - [ ] **PR message**: ***Delete this entire checklist*** and replace with - **Description:** a description of the change - **Issue:** the issue # it fixes, if applicable - **Dependencies:** any dependencies required for this change - **Twitter handle:** if your PR gets announced, and you'd like a mention, we'll gladly shout you out! Adding oracle VECTOR_ARRAY_T support. - [ ] **Add tests and docs**: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. Tests are not impacted. - [ ] **Lint and test**: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Done. Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
290 lines
13 KiB
Plaintext
290 lines
13 KiB
Plaintext
{
|
||
"cells": [
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"# Oracle AI Vector Search: Generate Embeddings\n",
|
||
"Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads that allows you to query data based on semantics, rather than keywords.\n",
|
||
"One of the biggest benefits of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in one single system.\n",
|
||
"This is not only powerful but also significantly more effective because you don't need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems.\n",
|
||
"\n",
|
||
"In addition, your vectors can benefit from all of Oracle Database’s most powerful features, like the following:\n",
|
||
"\n",
|
||
" * [Partitioning Support](https://www.oracle.com/database/technologies/partitioning.html)\n",
|
||
" * [Real Application Clusters scalability](https://www.oracle.com/database/real-application-clusters/)\n",
|
||
" * [Exadata smart scans](https://www.oracle.com/database/technologies/exadata/software/smartscan/)\n",
|
||
" * [Shard processing across geographically distributed databases](https://www.oracle.com/database/distributed-database/)\n",
|
||
" * [Transactions](https://docs.oracle.com/en/database/oracle/oracle-database/23/cncpt/transactions.html)\n",
|
||
" * [Parallel SQL](https://docs.oracle.com/en/database/oracle/oracle-database/21/vldbg/parallel-exec-intro.html#GUID-D28717E4-0F77-44F5-BB4E-234C31D4E4BA)\n",
|
||
" * [Disaster recovery](https://www.oracle.com/database/data-guard/)\n",
|
||
" * [Security](https://www.oracle.com/security/database-security/)\n",
|
||
" * [Oracle Machine Learning](https://www.oracle.com/artificial-intelligence/database-machine-learning/)\n",
|
||
" * [Oracle Graph Database](https://www.oracle.com/database/integrated-graph-database/)\n",
|
||
" * [Oracle Spatial and Graph](https://www.oracle.com/database/spatial/)\n",
|
||
" * [Oracle Blockchain](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_blockchain_table.html#GUID-B469E277-978E-4378-A8C1-26D3FF96C9A6)\n",
|
||
" * [JSON](https://docs.oracle.com/en/database/oracle/oracle-database/23/adjsn/json-in-oracle-database.html)\n",
|
||
"\n",
|
||
"\n",
|
||
"The guide demonstrates how to use Embedding Capabilities within Oracle AI Vector Search to generate embeddings for your documents using OracleEmbeddings."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"If you are just starting with Oracle Database, consider exploring the [free Oracle 23 AI](https://www.oracle.com/database/free/#resources) which provides a great introduction to setting up your database environment. While working with the database, it is often advisable to avoid using the system user by default; instead, you can create your own user for enhanced security and customization. For detailed steps on user creation, refer to our [end-to-end guide](https://github.com/langchain-ai/langchain/blob/master/cookbook/oracleai_demo.ipynb) which also shows how to set up a user in Oracle. Additionally, understanding user privileges is crucial for managing database security effectively. You can learn more about this topic in the official [Oracle guide](https://docs.oracle.com/en/database/oracle/oracle-database/19/admqs/administering-user-accounts-and-security.html#GUID-36B21D72-1BBB-46C9-A0C9-F0D2A8591B8D) on administering user accounts and security."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### Prerequisites\n",
|
||
"\n",
|
||
"Ensure you have the Oracle Python Client driver installed to facilitate the integration of Langchain with Oracle AI Vector Search."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# pip install oracledb"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### Connect to Oracle Database\n",
|
||
"The following sample code will show how to connect to Oracle Database. By default, python-oracledb runs in a ‘Thin’ mode which connects directly to Oracle Database. This mode does not need Oracle Client libraries. However, some additional functionality is available when python-oracledb uses them. Python-oracledb is said to be in ‘Thick’ mode when Oracle Client libraries are used. Both modes have comprehensive functionality supporting the Python Database API v2.0 Specification. See the following [guide](https://python-oracledb.readthedocs.io/en/latest/user_guide/appendix_a.html#featuresummary) that talks about features supported in each mode. You might want to switch to thick-mode if you are unable to use thin-mode."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"import sys\n",
|
||
"\n",
|
||
"import oracledb\n",
|
||
"\n",
|
||
"# Update the following variables with your Oracle database credentials and connection details\n",
|
||
"username = \"<username>\"\n",
|
||
"password = \"<password>\"\n",
|
||
"dsn = \"<hostname>/<service_name>\"\n",
|
||
"\n",
|
||
"try:\n",
|
||
" conn = oracledb.connect(user=username, password=password, dsn=dsn)\n",
|
||
" print(\"Connection successful!\")\n",
|
||
"except Exception as e:\n",
|
||
" print(\"Connection failed!\")\n",
|
||
" sys.exit(1)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"For embedding generation, several provider options are available to users, including embedding generation within the database and third-party services such as OcigenAI, Hugging Face, and OpenAI. Users opting for third-party providers must establish credentials that include the requisite authentication information. Alternatively, if users select 'database' as their provider, they are required to load an ONNX model into the Oracle Database to facilitate embeddings."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### Load ONNX Model\n",
|
||
"\n",
|
||
"Oracle accommodates a variety of embedding providers, enabling users to choose between proprietary database solutions and third-party services such as OCIGENAI and HuggingFace. This selection dictates the methodology for generating and managing embeddings.\n",
|
||
"\n",
|
||
"***Important*** : Should users opt for the database option, they must upload an ONNX model into the Oracle Database. Conversely, if a third-party provider is selected for embedding generation, uploading an ONNX model to Oracle Database is not required.\n",
|
||
"\n",
|
||
"A significant advantage of utilizing an ONNX model directly within Oracle is the enhanced security and performance it offers by eliminating the need to transmit data to external parties. Additionally, this method avoids the latency typically associated with network or REST API calls.\n",
|
||
"\n",
|
||
"Below is the example code to upload an ONNX model into Oracle Database:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"from langchain_community.embeddings.oracleai import OracleEmbeddings\n",
|
||
"\n",
|
||
"# Update the directory and file names for your ONNX model\n",
|
||
"# make sure that you have onnx file in the system\n",
|
||
"onnx_dir = \"DEMO_DIR\"\n",
|
||
"onnx_file = \"tinybert.onnx\"\n",
|
||
"model_name = \"demo_model\"\n",
|
||
"\n",
|
||
"try:\n",
|
||
" OracleEmbeddings.load_onnx_model(conn, onnx_dir, onnx_file, model_name)\n",
|
||
" print(\"ONNX model loaded.\")\n",
|
||
"except Exception as e:\n",
|
||
" print(\"ONNX model loading failed!\")\n",
|
||
" sys.exit(1)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### Create Credential\n",
|
||
"\n",
|
||
"When selecting third-party providers for generating embeddings, users are required to establish credentials to securely access the provider's endpoints.\n",
|
||
"\n",
|
||
"***Important:*** No credentials are necessary when opting for the 'database' provider to generate embeddings. However, should users decide to utilize a third-party provider, they must create credentials specific to the chosen provider.\n",
|
||
"\n",
|
||
"Below is an illustrative example:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"try:\n",
|
||
" cursor = conn.cursor()\n",
|
||
" cursor.execute(\n",
|
||
" \"\"\"\n",
|
||
" declare\n",
|
||
" jo json_object_t;\n",
|
||
" begin\n",
|
||
" -- HuggingFace\n",
|
||
" dbms_vector_chain.drop_credential(credential_name => 'HF_CRED');\n",
|
||
" jo := json_object_t();\n",
|
||
" jo.put('access_token', '<access_token>');\n",
|
||
" dbms_vector_chain.create_credential(\n",
|
||
" credential_name => 'HF_CRED',\n",
|
||
" params => json(jo.to_string));\n",
|
||
"\n",
|
||
" -- OCIGENAI\n",
|
||
" dbms_vector_chain.drop_credential(credential_name => 'OCI_CRED');\n",
|
||
" jo := json_object_t();\n",
|
||
" jo.put('user_ocid','<user_ocid>');\n",
|
||
" jo.put('tenancy_ocid','<tenancy_ocid>');\n",
|
||
" jo.put('compartment_ocid','<compartment_ocid>');\n",
|
||
" jo.put('private_key','<private_key>');\n",
|
||
" jo.put('fingerprint','<fingerprint>');\n",
|
||
" dbms_vector_chain.create_credential(\n",
|
||
" credential_name => 'OCI_CRED',\n",
|
||
" params => json(jo.to_string));\n",
|
||
" end;\n",
|
||
" \"\"\"\n",
|
||
" )\n",
|
||
" cursor.close()\n",
|
||
" print(\"Credentials created.\")\n",
|
||
"except Exception as ex:\n",
|
||
" cursor.close()\n",
|
||
" raise"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### Generate Embeddings\n",
|
||
"\n",
|
||
"Oracle AI Vector Search provides multiple methods for generating embeddings, utilizing either locally hosted ONNX models or third-party APIs. For comprehensive instructions on configuring these alternatives, please refer to the [Oracle AI Vector Search Guide](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-C6439E94-4E86-4ECD-954E-4B73D53579DE)."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"***Note:*** Users may need to configure a proxy to utilize third-party embedding generation providers, excluding the 'database' provider that utilizes an ONNX model."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# proxy to be used when we instantiate summary and embedder object\n",
|
||
"proxy = \"<proxy>\""
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"The following sample code will show how to generate embeddings:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"from langchain_community.embeddings.oracleai import OracleEmbeddings\n",
|
||
"from langchain_core.documents import Document\n",
|
||
"\n",
|
||
"\"\"\"\n",
|
||
"# using ocigenai\n",
|
||
"embedder_params = {\n",
|
||
" \"provider\": \"ocigenai\",\n",
|
||
" \"credential_name\": \"OCI_CRED\",\n",
|
||
" \"url\": \"https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/20231130/actions/embedText\",\n",
|
||
" \"model\": \"cohere.embed-english-light-v3.0\",\n",
|
||
"}\n",
|
||
"\n",
|
||
"# using huggingface\n",
|
||
"embedder_params = {\n",
|
||
" \"provider\": \"huggingface\", \n",
|
||
" \"credential_name\": \"HF_CRED\", \n",
|
||
" \"url\": \"https://api-inference.huggingface.co/pipeline/feature-extraction/\", \n",
|
||
" \"model\": \"sentence-transformers/all-MiniLM-L6-v2\", \n",
|
||
" \"wait_for_model\": \"true\"\n",
|
||
"}\n",
|
||
"\"\"\"\n",
|
||
"\n",
|
||
"# using ONNX model loaded to Oracle Database\n",
|
||
"embedder_params = {\"provider\": \"database\", \"model\": \"demo_model\"}\n",
|
||
"\n",
|
||
"# If a proxy is not required for your environment, you can omit the 'proxy' parameter below\n",
|
||
"embedder = OracleEmbeddings(conn=conn, params=embedder_params, proxy=proxy)\n",
|
||
"embed = embedder.embed_query(\"Hello World!\")\n",
|
||
"\n",
|
||
"\"\"\" verify \"\"\"\n",
|
||
"print(f\"Embedding generated by OracleEmbeddings: {embed}\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### End to End Demo\n",
|
||
"Please refer to our complete demo guide [Oracle AI Vector Search End-to-End Demo Guide](https://github.com/langchain-ai/langchain/tree/master/cookbook/oracleai_demo.ipynb) to build an end to end RAG pipeline with the help of Oracle AI Vector Search.\n"
|
||
]
|
||
}
|
||
],
|
||
"metadata": {
|
||
"kernelspec": {
|
||
"display_name": "Python 3 (ipykernel)",
|
||
"language": "python",
|
||
"name": "python3"
|
||
},
|
||
"language_info": {
|
||
"codemirror_mode": {
|
||
"name": "ipython",
|
||
"version": 3
|
||
},
|
||
"file_extension": ".py",
|
||
"mimetype": "text/x-python",
|
||
"name": "python",
|
||
"nbconvert_exporter": "python",
|
||
"pygments_lexer": "ipython3",
|
||
"version": "3.11.9"
|
||
}
|
||
},
|
||
"nbformat": 4,
|
||
"nbformat_minor": 4
|
||
}
|