{ "cells": [ { "cell_type": "markdown", "id": "278b6c63", "metadata": {}, "source": [ "# OpenAI\n", "\n", "Let's load the OpenAI Embedding class." ] }, { "cell_type": "markdown", "id": "40ff98ff-58e9-4716-8788-227a5c3f473d", "metadata": {}, "source": [ "## Setup\n", "\n", "First we install langchain-openai and set the required env vars" ] }, { "cell_type": "code", "execution_count": null, "id": "c66c4613-6c67-40ca-b3b1-c026750d1742", "metadata": {}, "outputs": [], "source": [ "%pip install -qU langchain-openai" ] }, { "cell_type": "code", "execution_count": null, "id": "62e3710e-55a0-44fb-ba51-2f1d520dfc38", "metadata": {}, "outputs": [], "source": [ "import getpass\n", "import os\n", "\n", "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass()" ] }, { "cell_type": "code", "execution_count": 1, "id": "0be1af71", "metadata": {}, "outputs": [], "source": [ "from langchain_openai import OpenAIEmbeddings" ] }, { "cell_type": "code", "execution_count": 5, "id": "2c66e5da", "metadata": {}, "outputs": [], "source": [ "embeddings = OpenAIEmbeddings(model=\"text-embedding-3-large\")" ] }, { "cell_type": "code", "execution_count": 6, "id": "01370375", "metadata": {}, "outputs": [], "source": [ "text = \"This is a test document.\"" ] }, { "cell_type": "markdown", "id": "f012c222-3fa9-470a-935c-758b2048d9af", "metadata": {}, "source": [ "## Usage\n", "### Embed query" ] }, { "cell_type": "code", "execution_count": 7, "id": "bfb6142c", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Warning: model not found. Using cl100k_base encoding.\n" ] } ], "source": [ "query_result = embeddings.embed_query(text)" ] }, { "cell_type": "code", "execution_count": 8, "id": "91bc875d-829b-4c3d-8e6f-fc2dda30a3bd", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[-0.014380056377383358,\n", " -0.027191711627651764,\n", " -0.020042716111860304,\n", " 0.057301379620345545,\n", " -0.022267658631828974]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "query_result[:5]" ] }, { "cell_type": "markdown", "id": "6b733391-1e23-438b-a6bc-0d77eed9426e", "metadata": {}, "source": [ "## Embed documents" ] }, { "cell_type": "code", "execution_count": 9, "id": "0356c3b7", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Warning: model not found. Using cl100k_base encoding.\n" ] } ], "source": [ "doc_result = embeddings.embed_documents([text])" ] }, { "cell_type": "code", "execution_count": 10, "id": "a4b0d49e-0c73-44b6-aed5-5b426564e085", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[-0.014380056377383358,\n", " -0.027191711627651764,\n", " -0.020042716111860304,\n", " 0.057301379620345545,\n", " -0.022267658631828974]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "doc_result[0][:5]" ] }, { "cell_type": "markdown", "id": "e7dc464a-6fa2-4cff-ab2e-49a0566d819b", "metadata": {}, "source": [ "## Specify dimensions\n", "\n", "With the `text-embedding-3` class of models, you can specify the size of the embeddings you want returned. For example by default `text-embedding-3-large` returned embeddings of dimension 3072:" ] }, { "cell_type": "code", "execution_count": 11, "id": "f7be1e7b-54c6-4893-b8ad-b872e6705735", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3072" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(doc_result[0])" ] }, { "cell_type": "markdown", "id": "33287142-0835-4958-962f-385ae4447431", "metadata": {}, "source": [ "But by passing in `dimensions=1024` we can reduce the size of our embeddings to 1024:" ] }, { "cell_type": "code", "execution_count": 15, "id": "854ee772-2de9-4a83-84e0-908033d98e4e", "metadata": {}, "outputs": [], "source": [ "embeddings_1024 = OpenAIEmbeddings(model=\"text-embedding-3-large\", dimensions=1024)" ] }, { "cell_type": "code", "execution_count": 16, "id": "3b464396-8d94-478b-8329-849b56e1ae23", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Warning: model not found. Using cl100k_base encoding.\n" ] }, { "data": { "text/plain": [ "1024" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(embeddings_1024.embed_documents([text])[0])" ] } ], "metadata": { "kernelspec": { "display_name": "poetry-venv", "language": "python", "name": "poetry-venv" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.1" }, "vscode": { "interpreter": { "hash": "e971737741ff4ec9aff7dc6155a1060a59a8a6d52c757dbbe66bf8ee389494b1" } } }, "nbformat": 4, "nbformat_minor": 5 }