langchain/docs/versioned_docs/version-0.2.x/integrations/vectorstores/meilisearch.ipynb
Jacob Lee aff771923a Jacob/new docs (#20570)
Use docusaurus versioning with a callout, merged master as well

@hwchase17 @baskaryan

---------

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: Leonid Ganeline <leo.gan.57@gmail.com>
Co-authored-by: Leonid Kuligin <lkuligin@yandex.ru>
Co-authored-by: Averi Kitsch <akitsch@google.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Nuno Campos <nuno@langchain.dev>
Co-authored-by: Nuno Campos <nuno@boringbits.io>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Martín Gotelli Ferenaz <martingotelliferenaz@gmail.com>
Co-authored-by: Fayfox <admin@fayfox.com>
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Co-authored-by: Dawson Bauer <105886620+djbauer2@users.noreply.github.com>
Co-authored-by: Ravindu Somawansa <ravindu.somawansa@gmail.com>
Co-authored-by: Dhruv Chawla <43818888+Dominastorm@users.noreply.github.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: WeichenXu <weichen.xu@databricks.com>
Co-authored-by: Benito Geordie <89472452+benitoThree@users.noreply.github.com>
Co-authored-by: kartikTAI <129414343+kartikTAI@users.noreply.github.com>
Co-authored-by: Kartik Sarangmath <kartik@thirdai.com>
Co-authored-by: Sevin F. Varoglu <sfvaroglu@octoml.ai>
Co-authored-by: MacanPN <martin.triska@gmail.com>
Co-authored-by: Prashanth Rao <35005448+prrao87@users.noreply.github.com>
Co-authored-by: Hyeongchan Kim <kozistr@gmail.com>
Co-authored-by: sdan <git@sdan.io>
Co-authored-by: Guangdong Liu <liugddx@gmail.com>
Co-authored-by: Rahul Triptahi <rahul.psit.ec@gmail.com>
Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: pjb157 <84070455+pjb157@users.noreply.github.com>
Co-authored-by: Eun Hye Kim <ehkim1440@gmail.com>
Co-authored-by: kaijietti <43436010+kaijietti@users.noreply.github.com>
Co-authored-by: Pengcheng Liu <pcliu.fd@gmail.com>
Co-authored-by: Tomer Cagan <tomer@tomercagan.com>
Co-authored-by: Christophe Bornet <cbornet@hotmail.com>
2024-04-18 11:10:55 -07:00


{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Meilisearch\n",
"\n",
"> [Meilisearch](https://meilisearch.com) is an open-source, lightning-fast, and hyper-relevant search engine. It comes with great defaults to help developers build snappy search experiences.\n",
">\n",
"> You can [self-host Meilisearch](https://www.meilisearch.com/docs/learn/getting_started/installation#local-installation) or run on [Meilisearch Cloud](https://www.meilisearch.com/pricing).\n",
"\n",
"Meilisearch v1.3 supports vector search. This page guides you through integrating Meilisearch as a vector store and using it to perform vector search."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"### Launching a Meilisearch instance\n",
"\n",
"You will need a running Meilisearch instance to use as your vector store. You can run [Meilisearch locally](https://www.meilisearch.com/docs/learn/getting_started/installation#local-installation) or create a [Meilisearch Cloud](https://cloud.meilisearch.com/) account.\n",
"\n",
"As of Meilisearch v1.3, vector storage is an experimental feature. After launching your Meilisearch instance, you need to **enable vector storage**. For self-hosted Meilisearch, read the docs on [enabling experimental features](https://www.meilisearch.com/docs/learn/experimental/overview). On **Meilisearch Cloud**, enable _Vector Store_ via your project _Settings_ page.\n",
"\n",
"You should now have a running Meilisearch instance with vector storage enabled. 🎉\n",
"\n",
"### Credentials\n",
"\n",
"To interact with your Meilisearch instance, the Meilisearch SDK needs a host (URL of your instance) and an API key.\n",
"\n",
"**Host**\n",
"\n",
"- When running **locally**, the default host is `http://localhost:7700`\n",
"- On **Meilisearch Cloud**, find the host in your project _Settings_ page\n",
"\n",
"**API keys**\n",
"\n",
"A Meilisearch instance provides three API keys out of the box:\n",
"- A `MASTER KEY`: it should only be used to create your Meilisearch instance\n",
"- An `ADMIN KEY`: use it only server-side to update your database and its settings\n",
"- A `SEARCH KEY`: a key that you can safely share in front-end applications\n",
"\n",
"You can create [additional API keys](https://www.meilisearch.com/docs/learn/security/master_api_keys) as needed."
]
},
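{
"cell_type": "markdown",
"metadata": {},
"source": [
"For self-hosted Meilisearch, the cell below is a minimal sketch of enabling vector storage over HTTP using only the Python standard library. It assumes a Meilisearch version (v1.3 or later) in which vector storage is still experimental and is switched on by sending `{\"vectorStore\": true}` to `PATCH /experimental-features`, authenticated with the master key. `build_enable_vector_store_request` is a hypothetical helper. Before sending the request, check the experimental features documentation for your exact version."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"import os\n",
"import urllib.request\n",
"\n",
"\n",
"def build_enable_vector_store_request(host: str, master_key: str) -> urllib.request.Request:\n",
"    \"\"\"Build, without sending, a PATCH request that enables the experimental vector store.\n",
"\n",
"    Hypothetical helper; assumes the /experimental-features route of Meilisearch v1.3+.\n",
"    \"\"\"\n",
"    return urllib.request.Request(\n",
"        url=f\"{host.rstrip('/')}/experimental-features\",\n",
"        data=json.dumps({\"vectorStore\": True}).encode(\"utf-8\"),\n",
"        headers={\n",
"            \"Authorization\": f\"Bearer {master_key}\",\n",
"            \"Content-Type\": \"application/json\",\n",
"        },\n",
"        method=\"PATCH\",\n",
"    )\n",
"\n",
"\n",
"# The fallback assumes a local instance; the address must include the http:// scheme.\n",
"request = build_enable_vector_store_request(\n",
"    os.environ.get(\"MEILI_HTTP_ADDR\", \"http://localhost:7700\"),\n",
"    os.environ.get(\"MEILI_MASTER_KEY\", \"\"),\n",
")\n",
"\n",
"# Uncomment to send the request to your running instance and print its JSON reply.\n",
"# with urllib.request.urlopen(request) as response:\n",
"#     print(json.loads(response.read()))"
]
},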
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Installing dependencies\n",
"\n",
"This guide uses the [Meilisearch Python SDK](https://github.com/meilisearch/meilisearch-python). You can install it by running:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet meilisearch"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For more information, refer to the [Meilisearch Python SDK documentation](https://meilisearch.github.io/meilisearch-python/)."
]
},
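{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once the SDK is installed, you can check that your host and API key work before going further. This minimal sketch assumes the `meilisearch.Client(url, api_key)` constructor and the `Client.is_healthy()` method of the Meilisearch Python SDK. It reads the same `MEILI_HTTP_ADDR` and `MEILI_MASTER_KEY` environment variables that the examples below set. The `http://localhost:7700` fallback is an assumption for a local instance."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"import meilisearch\n",
"\n",
"# The fallbacks assume a local instance without a key; the examples below set these variables.\n",
"client = meilisearch.Client(\n",
"    os.environ.get(\"MEILI_HTTP_ADDR\", \"http://localhost:7700\"),\n",
"    os.environ.get(\"MEILI_MASTER_KEY\"),\n",
")\n",
"\n",
"# is_healthy() queries the instance's /health endpoint and returns a boolean.\n",
"print(\"Meilisearch reachable:\", client.is_healthy())"
]
},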
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Examples\n",
"\n",
"You can initialize the Meilisearch vector store either by passing a Meilisearch client or by passing the _URL_ and _API key_ of your instance. In these examples, the credentials are loaded from the environment.\n",
"\n",
"You can make environment variables available in your notebook with `os` and `getpass`. This technique works for all the following examples."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
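"# LangChain's Meilisearch integration reads the instance URL and API key from these variables.\n",
"# Enter the address with its scheme, e.g. http://localhost:7700.\n",
"os.environ[\"MEILI_HTTP_ADDR\"] = getpass.getpass(\"Meilisearch HTTP address and port:\")\n",
"os.environ[\"MEILI_MASTER_KEY\"] = getpass.getpass(\"Meilisearch API Key:\")"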
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"These examples use `OpenAIEmbeddings`, so you also need an OpenAI API key."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Adding text and embeddings\n",
"\n",
"This example adds texts to Meilisearch without instantiating a Meilisearch vector store first: `Meilisearch.from_texts` creates the vector store for you and returns it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.vectorstores import Meilisearch\n",
"from langchain_openai import OpenAIEmbeddings\n",
"from langchain_text_splitters import CharacterTextSplitter\n",
"\n",
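"embeddings = OpenAIEmbeddings()\n",
"# Declare a user-provided embedder: LangChain computes the vectors, Meilisearch stores them.\n",
"# `dimensions` must match the embedding size (1536 for OpenAI's text-embedding-ada-002).\n",
"embedders = {\n",
"    \"default\": {\n",
"        \"source\": \"userProvided\",\n",
"        \"dimensions\": 1536,\n",
"    }\n",
"}\n",
"embedder_name = \"default\""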
]
},
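{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `dimensions` value in `embedders` has to match the length of the vectors your embedding model produces. 1536 corresponds to `text-embedding-ada-002`, the default model of `OpenAIEmbeddings`. The hypothetical helper below is a minimal sketch for catching a mismatch before any documents are sent to Meilisearch. The demonstration uses a dummy vector, so it runs without calling the OpenAI API."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from typing import Any, Dict, List\n",
"\n",
"\n",
"def check_embedder_dimensions(\n",
"    embedders: Dict[str, Dict[str, Any]], embedder_name: str, vector: List[float]\n",
") -> None:\n",
"    \"\"\"Raise ValueError if the vector length differs from the embedder's declared dimensions.\"\"\"\n",
"    expected = embedders[embedder_name][\"dimensions\"]\n",
"    if len(vector) != expected:\n",
"        raise ValueError(\n",
"            f\"Embedder {embedder_name!r} expects {expected} dimensions, got {len(vector)}\"\n",
"        )\n",
"\n",
"\n",
"# Offline check with the same configuration as above and a dummy 1536-dimensional vector.\n",
"example_embedders = {\"default\": {\"source\": \"userProvided\", \"dimensions\": 1536}}\n",
"check_embedder_dimensions(example_embedders, \"default\", [0.0] * 1536)\n",
"\n",
"# Against the real model (calls the OpenAI API):\n",
"# check_embedder_dimensions(embedders, embedder_name, embeddings.embed_query(\"dimension check\"))"
]
},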
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"with open(\"../../modules/state_of_the_union.txt\") as f:\n",
" state_of_the_union = f.read()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"texts = text_splitter.split_text(state_of_the_union)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Embed the texts and store them, with their vectors, in a Meilisearch index\n",
"vector_store = Meilisearch.from_texts(\n",
"    texts=texts, embedding=embeddings, embedders=embedders, embedder_name=embedder_name\n",
")"
]
},
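{
"cell_type": "markdown",
"metadata": {},
"source": [
"`from_texts` embeds every chunk and writes one Meilisearch document per chunk. The cell below is an approximate sketch of that document shape, not LangChain's actual code. It assumes three things: the integration's default `text_key` (`text`) and `metadata_key` (`metadata`); that the chunk text is stored inside the metadata field; and that user-provided vectors go under `_vectors`, keyed by embedder name, as Meilisearch expects when an `embedders` setting is configured. `build_meilisearch_documents` is a hypothetical helper, and the toy 3-dimensional vectors stand in for real 1536-dimensional embeddings."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import uuid\n",
"from typing import Any, Dict, List, Optional\n",
"\n",
"\n",
"def build_meilisearch_documents(\n",
"    texts: List[str],\n",
"    vectors: List[List[float]],\n",
"    embedder_name: str = \"default\",\n",
"    metadatas: Optional[List[Dict[str, Any]]] = None,\n",
"    text_key: str = \"text\",\n",
"    metadata_key: str = \"metadata\",\n",
") -> List[Dict[str, Any]]:\n",
"    \"\"\"Pair each text with its vector in the shape of a Meilisearch document (approximation).\"\"\"\n",
"    if len(texts) != len(vectors):\n",
"        raise ValueError(\"texts and vectors must have the same length\")\n",
"    documents = []\n",
"    for i, (text, vector) in enumerate(zip(texts, vectors)):\n",
"        # Copy the caller's metadata so it is not mutated, then store the text inside it.\n",
"        metadata = dict(metadatas[i]) if metadatas else {}\n",
"        metadata[text_key] = text\n",
"        documents.append(\n",
"            {\n",
"                \"id\": str(uuid.uuid4()),\n",
"                # User-provided vectors are keyed by the embedder they belong to.\n",
"                \"_vectors\": {embedder_name: vector},\n",
"                metadata_key: metadata,\n",
"            }\n",
"        )\n",
"    return documents\n",
"\n",
"\n",
"# Toy 3-dimensional vectors stand in for real OpenAI embeddings.\n",
"payload = build_meilisearch_documents(\n",
"    [\"first chunk\", \"second chunk\"], [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]\n",
")\n",
"print(payload[0][\"metadata\"][\"text\"], payload[0][\"_vectors\"])"
]
},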
{
"cell_type": "markdown",
"metadata": {},
"source": [
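"Behind the scenes, the vector store computes an embedding for each text with the `embeddings` model and stores the text together with its vector in Meilisearch. The following example produces the same result from documents instead of raw texts."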
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Adding documents and embeddings\n",
"\n",
"In this example, we'll use a LangChain text splitter to split the text into multiple documents. Then, we'll store these documents along with their embeddings."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.document_loaders import TextLoader\n",
"\n",
"# Load text\n",
"loader = TextLoader(\"../../modules/state_of_the_union.txt\")\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"\n",
"# Create documents\n",
"docs = text_splitter.split_documents(documents)\n",
"\n",
"# Import the split documents & their embeddings into the vector store\n",
"vector_store = Meilisearch.from_documents(\n",
"    documents=docs,\n",
"    embedding=embeddings,\n",
"    embedders=embedders,\n",
"    embedder_name=embedder_name,\n",
")\n",
"\n",
"# Search in our vector store\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = vector_store.similarity_search(query, embedder_name=embedder_name)\n",
"print(docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Add documents by creating a Meilisearch vector store"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this approach, we create a vector store object and add documents to it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import meilisearch\n",
"from langchain_community.vectorstores import Meilisearch\n",
"\n",
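"# Replace the URL and API key with your own instance's credentials\n",
"client = meilisearch.Client(url=\"http://127.0.0.1:7700\", api_key=\"***\")\n",
"vector_store = Meilisearch(\n",
"    embedding=embeddings,\n",
"    embedders=embedders,\n",
"    client=client,\n",
"    index_name=\"langchain_demo\",\n",
"    text_key=\"text\",\n",
")\n",
"# Add the text as chunks rather than as one large document\n",
"vector_store.add_documents(text_splitter.split_documents(documents))"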
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Similarity Search with score\n",
"\n",
"This method returns the matching documents together with a score indicating how closely each one matches the query. `embedder_name` is the name of the embedder that should be used for semantic search and defaults to \"default\"."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"docs_and_scores = vector_store.similarity_search_with_score(\n",
" query, embedder_name=embedder_name\n",
")\n",
"docs_and_scores[0]"
]
},
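{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each entry of `docs_and_scores` is a `(Document, score)` pair, so you can post-process the results client-side, for example to drop weak matches. This sketch assumes that a higher score means a closer match, as with Meilisearch ranking scores, which range from 0 to 1. `filter_by_min_score` and the 0.5 cut-off are hypothetical and should be tuned for your data. Toy `(label, score)` pairs keep the example runnable without a Meilisearch instance."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from typing import List, Tuple, TypeVar\n",
"\n",
"T = TypeVar(\"T\")\n",
"\n",
"\n",
"def filter_by_min_score(\n",
"    results: List[Tuple[T, float]], min_score: float\n",
") -> List[Tuple[T, float]]:\n",
"    \"\"\"Keep (item, score) pairs scoring at least min_score, best match first.\"\"\"\n",
"    kept = [(item, score) for item, score in results if score >= min_score]\n",
"    return sorted(kept, key=lambda pair: pair[1], reverse=True)\n",
"\n",
"\n",
"# Toy results; in this notebook you could pass docs_and_scores instead.\n",
"example_results = [(\"doc A\", 0.91), (\"doc B\", 0.42), (\"doc C\", 0.77)]\n",
"for item, score in filter_by_min_score(example_results, min_score=0.5):\n",
"    print(f\"{score:.2f} {item}\")"
]
},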
{
"cell_type": "markdown",
"metadata": {},
"source": [
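"## Similarity Search by vector\n",
"This method takes an embedding you have already computed instead of a text query and returns the matching documents without scores. `embedder_name` is the name of the embedder that should be used for semantic search and defaults to \"default\"."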
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"embedding_vector = embeddings.embed_query(query)\n",
"# Returns documents only (no scores), ranked by similarity to the vector\n",
"docs = vector_store.similarity_search_by_vector(\n",
"    embedding_vector, embedder_name=embedder_name\n",
")\n",
"docs[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Additional resources\n",
"\n",
"Documentation\n",
"- [Meilisearch](https://www.meilisearch.com/docs/)\n",
"- [Meilisearch Python SDK](https://python-sdk.meilisearch.com)\n",
"\n",
"Open-source repositories\n",
"- [Meilisearch repository](https://github.com/meilisearch/meilisearch)\n",
"- [Meilisearch Python SDK](https://github.com/meilisearch/meilisearch-python)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 4
}