Files
langchain/docs/versioned_docs/version-0.2.x/integrations/text_embedding/voyageai.ipynb
Jacob Lee aff771923a Jacob/new docs (#20570)
Use docusaurus versioning with a callout, merged master as well

@hwchase17 @baskaryan

---------

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: Leonid Ganeline <leo.gan.57@gmail.com>
Co-authored-by: Leonid Kuligin <lkuligin@yandex.ru>
Co-authored-by: Averi Kitsch <akitsch@google.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Nuno Campos <nuno@langchain.dev>
Co-authored-by: Nuno Campos <nuno@boringbits.io>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Martín Gotelli Ferenaz <martingotelliferenaz@gmail.com>
Co-authored-by: Fayfox <admin@fayfox.com>
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Co-authored-by: Dawson Bauer <105886620+djbauer2@users.noreply.github.com>
Co-authored-by: Ravindu Somawansa <ravindu.somawansa@gmail.com>
Co-authored-by: Dhruv Chawla <43818888+Dominastorm@users.noreply.github.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: WeichenXu <weichen.xu@databricks.com>
Co-authored-by: Benito Geordie <89472452+benitoThree@users.noreply.github.com>
Co-authored-by: kartikTAI <129414343+kartikTAI@users.noreply.github.com>
Co-authored-by: Kartik Sarangmath <kartik@thirdai.com>
Co-authored-by: Sevin F. Varoglu <sfvaroglu@octoml.ai>
Co-authored-by: MacanPN <martin.triska@gmail.com>
Co-authored-by: Prashanth Rao <35005448+prrao87@users.noreply.github.com>
Co-authored-by: Hyeongchan Kim <kozistr@gmail.com>
Co-authored-by: sdan <git@sdan.io>
Co-authored-by: Guangdong Liu <liugddx@gmail.com>
Co-authored-by: Rahul Triptahi <rahul.psit.ec@gmail.com>
Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: pjb157 <84070455+pjb157@users.noreply.github.com>
Co-authored-by: Eun Hye Kim <ehkim1440@gmail.com>
Co-authored-by: kaijietti <43436010+kaijietti@users.noreply.github.com>
Co-authored-by: Pengcheng Liu <pcliu.fd@gmail.com>
Co-authored-by: Tomer Cagan <tomer@tomercagan.com>
Co-authored-by: Christophe Bornet <cbornet@hotmail.com>
2024-04-18 11:10:55 -07:00

233 lines
6.3 KiB
Plaintext

{
"cells": [
{
"cell_type": "markdown",
"id": "278b6c63",
"metadata": {},
"source": [
"# Voyage AI\n",
"\n",
">[Voyage AI](https://www.voyageai.com/) provides cutting-edge embedding/vectorizations models.\n",
"\n",
"Let's load the Voyage AI Embedding class. (Install the LangChain partner package with `pip install langchain-voyageai`)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "0be1af71",
"metadata": {},
"outputs": [],
"source": [
"from langchain_voyageai import VoyageAIEmbeddings"
]
},
{
"cell_type": "markdown",
"id": "137cfde9-b88c-409a-9394-a9e31a6bf30d",
"metadata": {},
"source": [
"Voyage AI utilizes API keys to monitor usage and manage permissions. To obtain your key, create an account on our [homepage](https://www.voyageai.com). Then, create a VoyageEmbeddings model with your API key. Please refer to the documentation for further details on the available models: https://docs.voyageai.com/embeddings/"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "2c66e5da",
"metadata": {},
"outputs": [],
"source": [
"embeddings = VoyageAIEmbeddings(\n",
" voyage_api_key=\"[ Your Voyage API key ]\", model=\"voyage-2\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "459dffb3-9bff-41f2-8507-642de7431b2d",
"metadata": {},
"source": [
"Prepare the documents and use `embed_documents` to get their embeddings."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "c85e948f-85fd-4d56-8d21-6e2f7e65cab8",
"metadata": {},
"outputs": [],
"source": [
"documents = [\n",
" \"Caching embeddings enables the storage or temporary caching of embeddings, eliminating the necessity to recompute them each time.\",\n",
" \"An LLMChain is a chain that composes basic LLM functionality. It consists of a PromptTemplate and a language model (either an LLM or chat model). It formats the prompt template using the input key values provided (and also memory key values, if available), passes the formatted string to LLM and returns the LLM output.\",\n",
" \"A Runnable represents a generic unit of work that can be invoked, batched, streamed, and/or transformed.\",\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "5a77a12d-6ac6-4ab8-b103-80ff24487019",
"metadata": {},
"outputs": [],
"source": [
"documents_embds = embeddings.embed_documents(documents)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "2c89167c-816c-487e-8704-90908a4190bb",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[0.0562174916267395,\n",
" 0.018221192061901093,\n",
" 0.0025736060924828053,\n",
" -0.009720131754875183,\n",
" 0.04108370840549469]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"documents_embds[0][:5]"
]
},
{
"cell_type": "markdown",
"id": "f8d796d1-4ced-44d3-81bf-282721edb6bb",
"metadata": {},
"source": [
"Similarly, use `embed_query` to embed the query."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "bfb6142c",
"metadata": {},
"outputs": [],
"source": [
"query = \"What's an LLMChain?\""
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "91bc875d-829b-4c3d-8e6f-fc2dda30a3bd",
"metadata": {},
"outputs": [],
"source": [
"query_embd = embeddings.embed_query(query)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "a4b0d49e-0c73-44b6-aed5-5b426564e085",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[-0.0052348352037370205,\n",
" -0.040072452276945114,\n",
" 0.0033957737032324076,\n",
" 0.01763271726667881,\n",
" -0.019235141575336456]"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query_embd[:5]"
]
},
{
"cell_type": "markdown",
"id": "b16ddbb2-61f0-49ec-92c3-a6f236d9517f",
"metadata": {},
"source": [
"## A minimalist retrieval system"
]
},
{
"cell_type": "markdown",
"id": "5464cb0a-6967-4f1e-ac7c-0aab80b2795a",
"metadata": {},
"source": [
"The main feature of the embeddings is that the cosine similarity between two embeddings captures the semantic relatedness of the corresponding original passages. This allows us to use the embeddings to do semantic retrieval / search."
]
},
{
"cell_type": "markdown",
"id": "a0bd3ad2-ca68-4e75-9172-76aea28ba46e",
"metadata": {},
"source": [
" We can find a few closest embeddings in the documents embeddings based on the cosine similarity, and retrieve the corresponding document using the `KNNRetriever` class from LangChain."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "0a3fc579-85a9-4bd0-a944-4e32ac62e2d4",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"An LLMChain is a chain that composes basic LLM functionality. It consists of a PromptTemplate and a language model (either an LLM or chat model). It formats the prompt template using the input key values provided (and also memory key values, if available), passes the formatted string to LLM and returns the LLM output.\n"
]
}
],
"source": [
"from langchain.retrievers import KNNRetriever\n",
"\n",
"retriever = KNNRetriever.from_texts(documents, embeddings)\n",
"\n",
"# retrieve the most relevant documents\n",
"result = retriever.get_relevant_documents(query)\n",
"top1_retrieved_doc = result[0].page_content # return the top1 retrieved result\n",
"\n",
"print(top1_retrieved_doc)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
},
"vscode": {
"interpreter": {
"hash": "e971737741ff4ec9aff7dc6155a1060a59a8a6d52c757dbbe66bf8ee389494b1"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}