docs: Fix Milvus vector store initialization (#29511)

- [x] **PR title**:


- [x] **PR message**:

- A change in the Milvus API broke initialization of the local vector
store. When an Ollama embedding model is used, initializing the vector
store fails with the following error:

<img width="978" alt="image"
src="https://github.com/user-attachments/assets/d57e495c-1764-4fbe-ab8c-21ee44f1e686"
/>

- This is fixed by setting the index type explicitly:

`vector_store = Milvus(embedding_function=embeddings,
connection_args={"uri": URI}, index_params={"index_type": "FLAT",
"metric_type": "L2"})`

Other small documentation edits were also made.
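
For reference, the explicit index settings described above can be sketched as a small helper. This is only an illustration, not the notebook's code: the helper names `build_milvus_kwargs` and `create_vector_store` and the example URI are hypothetical, and it assumes `langchain-milvus` is installed before `create_vector_store` is called.

```python
from typing import Any, Dict


def build_milvus_kwargs(
    embeddings: Any,
    uri: str,
    index_type: str = "FLAT",
    metric_type: str = "L2",
) -> Dict[str, Any]:
    """Collect the constructor arguments used in the fix (hypothetical helper)."""
    return {
        "embedding_function": embeddings,
        "connection_args": {"uri": uri},
        # Name the index type explicitly instead of relying on a default.
        "index_params": {"index_type": index_type, "metric_type": metric_type},
    }


def create_vector_store(embeddings: Any, uri: str) -> Any:
    """Build the vector store; assumes langchain-milvus is installed."""
    # Imported lazily so build_milvus_kwargs works without the optional dependency.
    from langchain_milvus import Milvus

    return Milvus(**build_milvus_kwargs(embeddings, uri))


# Inspect the arguments without connecting to Milvus.
# "./milvus_example.db" is an assumed local Milvus Lite path.
example_kwargs = build_milvus_kwargs(embeddings=None, uri="./milvus_example.db")
print(example_kwargs["index_params"])
```

Keeping the arguments in a plain dictionary makes it easy to swap in a different `index_type` or `metric_type` later without touching the call site.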


- [x] **Add tests and docs**:
  N/A


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
Mark Perfect 2025-01-31 12:57:36 -05:00 committed by GitHub
parent 0c405245c4
commit b8e218b09f

@ -3,7 +3,9 @@
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "683953b3", "id": "683953b3",
"metadata": {}, "metadata": {
"id": "683953b3"
},
"source": [ "source": [
"# Milvus\n", "# Milvus\n",
"\n", "\n",
@ -21,6 +23,7 @@
"execution_count": null, "execution_count": null,
"id": "a62cff8a-bcf7-4e33-bbbc-76999c2e3e20", "id": "a62cff8a-bcf7-4e33-bbbc-76999c2e3e20",
"metadata": { "metadata": {
"id": "a62cff8a-bcf7-4e33-bbbc-76999c2e3e20",
"tags": [] "tags": []
}, },
"outputs": [], "outputs": [],
@ -31,9 +34,11 @@
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "633addc3", "id": "633addc3",
"metadata": {}, "metadata": {
"id": "633addc3"
},
"source": [ "source": [
"The latest version of pymilvus comes with a local vector database Milvus Lite, good for prototyping. If you have large scale of data such as more than a million docs, we recommend setting up a more performant Milvus server on [docker or kubernetes](https://milvus.io/docs/install_standalone-docker.md#Start-Milvus).\n", "The latest version of `pymilvus` comes with a local vector database called Milvus Lite, which is good for prototyping. If you have a large amount of data (e.g., more than a million vectors), we recommend setting up a more performant Milvus server on [Docker](https://milvus.io/docs/install_standalone-docker.md#Start-Milvus) or [Kubernetes](https://milvus.io/docs/install_cluster-milvusoperator.md).\n",
"\n", "\n",
"### Credentials\n", "### Credentials\n",
"\n", "\n",
@ -48,9 +53,11 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 25, "execution_count": null,
"id": "a7dd253f", "id": "a7dd253f",
"metadata": {}, "metadata": {
"id": "a7dd253f"
},
"outputs": [], "outputs": [],
"source": [ "source": [
"# | output: false\n", "# | output: false\n",
@ -62,9 +69,10 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 28, "execution_count": null,
"id": "dcf88bdf", "id": "dcf88bdf",
"metadata": { "metadata": {
"id": "dcf88bdf",
"tags": [] "tags": []
}, },
"outputs": [], "outputs": [],
@ -78,32 +86,40 @@
"vector_store = Milvus(\n", "vector_store = Milvus(\n",
" embedding_function=embeddings,\n", " embedding_function=embeddings,\n",
" connection_args={\"uri\": URI},\n", " connection_args={\"uri\": URI},\n",
" # Set index_params if needed\n",
" index_params={\"index_type\": \"FLAT\", \"metric_type\": \"L2\"},\n",
")" ")"
] ]
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "cae1a7d5", "id": "cae1a7d5",
"metadata": {}, "metadata": {
"id": "cae1a7d5"
},
"source": [ "source": [
"### Compartmentalize the data with Milvus Collections\n", "### Compartmentalize the data with Milvus Collections\n",
"\n", "\n",
"You can store different unrelated documents in different collections within same Milvus instance to maintain the context" "You can store unrelated documents in different collections within the same Milvus instance."
] ]
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "c07cd24b", "id": "c07cd24b",
"metadata": {}, "metadata": {
"id": "c07cd24b"
},
"source": [ "source": [
"Here's how you can create a new collection" "Here's how you can create a new collection:"
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 29, "execution_count": null,
"id": "c6f4973d", "id": "c6f4973d",
"metadata": {}, "metadata": {
"id": "c6f4973d"
},
"outputs": [], "outputs": [],
"source": [ "source": [
"from langchain_core.documents import Document\n", "from langchain_core.documents import Document\n",
@ -119,16 +135,20 @@
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "3b12df8c", "id": "3b12df8c",
"metadata": {}, "metadata": {
"id": "3b12df8c"
},
"source": [ "source": [
"And here is how you retrieve that stored collection" "And here is how you retrieve that stored collection:"
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 30, "execution_count": null,
"id": "12817d16", "id": "12817d16",
"metadata": {}, "metadata": {
"id": "12817d16"
},
"outputs": [], "outputs": [],
"source": [ "source": [
"vector_store_loaded = Milvus(\n", "vector_store_loaded = Milvus(\n",
@ -141,7 +161,9 @@
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "f1fc3818", "id": "f1fc3818",
"metadata": {}, "metadata": {
"id": "f1fc3818"
},
"source": [ "source": [
"## Manage vector store\n", "## Manage vector store\n",
"\n", "\n",
@ -154,9 +176,12 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 31, "execution_count": null,
"id": "3ced24f6", "id": "3ced24f6",
"metadata": {}, "metadata": {
"id": "3ced24f6",
"outputId": "9c57a6bb-86eb-456c-f007-6cabd6865299"
},
"outputs": [ "outputs": [
{ {
"data": { "data": {
@ -253,16 +278,21 @@
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "e23c22d8", "id": "e23c22d8",
"metadata": {}, "metadata": {
"id": "e23c22d8"
},
"source": [ "source": [
"### Delete items from vector store" "### Delete items from vector store"
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 32, "execution_count": null,
"id": "1f387fa8", "id": "1f387fa8",
"metadata": {}, "metadata": {
"id": "1f387fa8",
"outputId": "62fee30d-92c9-4efd-df8a-453545ff61d0"
},
"outputs": [ "outputs": [
{ {
"data": { "data": {
@ -282,11 +312,13 @@
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "fb12fa75", "id": "fb12fa75",
"metadata": {}, "metadata": {
"id": "fb12fa75"
},
"source": [ "source": [
"## Query vector store\n", "## Query vector store\n",
"\n", "\n",
"Once your vector store has been created and the relevant documents have been added you will most likely wish to query it during the running of your chain or agent. \n", "Once your vector store has been created and the relevant documents have been added, you will most likely wish to query it during the running of your chain or agent.\n",
"\n", "\n",
"### Query directly\n", "### Query directly\n",
"\n", "\n",
@ -297,9 +329,12 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 33, "execution_count": null,
"id": "35801a55", "id": "35801a55",
"metadata": {}, "metadata": {
"id": "35801a55",
"outputId": "13865abb-11a2-41ae-9ad7-44e8586fd099"
},
"outputs": [ "outputs": [
{ {
"name": "stdout", "name": "stdout",
@ -323,7 +358,9 @@
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "35574409", "id": "35574409",
"metadata": {}, "metadata": {
"id": "35574409"
},
"source": [ "source": [
"#### Similarity search with score\n", "#### Similarity search with score\n",
"\n", "\n",
@ -332,9 +369,12 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 22, "execution_count": null,
"id": "c360af3d", "id": "c360af3d",
"metadata": {}, "metadata": {
"id": "c360af3d",
"outputId": "16cb1961-9f4a-494a-9500-27b98a1158d8"
},
"outputs": [ "outputs": [
{ {
"name": "stdout", "name": "stdout",
@ -355,7 +395,9 @@
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "14db337f", "id": "14db337f",
"metadata": {}, "metadata": {
"id": "14db337f"
},
"source": [ "source": [
"For a full list of all the search options available when using the `Milvus` vector store, you can visit the [API reference](https://python.langchain.com/api_reference/milvus/vectorstores/langchain_milvus.vectorstores.milvus.Milvus.html).\n", "For a full list of all the search options available when using the `Milvus` vector store, you can visit the [API reference](https://python.langchain.com/api_reference/milvus/vectorstores/langchain_milvus.vectorstores.milvus.Milvus.html).\n",
"\n", "\n",
@ -366,9 +408,12 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 34, "execution_count": null,
"id": "f6d9357c", "id": "f6d9357c",
"metadata": {}, "metadata": {
"id": "f6d9357c",
"outputId": "bcaa7620-a1c0-418f-9f54-684a472b0b55"
},
"outputs": [ "outputs": [
{ {
"data": { "data": {
@ -389,7 +434,9 @@
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "8ac953f1", "id": "8ac953f1",
"metadata": {}, "metadata": {
"id": "8ac953f1"
},
"source": [ "source": [
"## Usage for retrieval-augmented generation\n", "## Usage for retrieval-augmented generation\n",
"\n", "\n",
@ -404,6 +451,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"id": "7fb27b941602401d91542211134fc71a", "id": "7fb27b941602401d91542211134fc71a",
"metadata": { "metadata": {
"id": "7fb27b941602401d91542211134fc71a",
"pycharm": { "pycharm": {
"name": "#%% md\n" "name": "#%% md\n"
} }
@ -413,15 +461,16 @@
"\n", "\n",
"When building a retrieval app, you often have to build it with multiple users in mind. This means that you may be storing data not just for one user, but for many different users, and they should not be able to see each others data.\n", "When building a retrieval app, you often have to build it with multiple users in mind. This means that you may be storing data not just for one user, but for many different users, and they should not be able to see each others data.\n",
"\n", "\n",
"Milvus recommends using [partition_key](https://milvus.io/docs/multi_tenancy.md#Partition-key-based-multi-tenancy) to implement multi-tenancy, here is an example.\n", "Milvus recommends using [partition_key](https://milvus.io/docs/multi_tenancy.md#Partition-key-based-multi-tenancy) to implement multi-tenancy. Here is an example:\n",
"> The feature of Partition key is now not available in Milvus Lite, if you want to use it, you need to start Milvus server from [docker or kubernetes](https://milvus.io/docs/install_standalone-docker.md#Start-Milvus)." "> The feature of Partition key is now not available in Milvus Lite, if you want to use it, you need to start Milvus server, as mentioned above."
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 2, "execution_count": null,
"id": "acae54e37e7d407bbb7b55eff062a284", "id": "acae54e37e7d407bbb7b55eff062a284",
"metadata": { "metadata": {
"id": "acae54e37e7d407bbb7b55eff062a284",
"pycharm": { "pycharm": {
"name": "#%%\n" "name": "#%%\n"
} }
@ -447,6 +496,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"id": "9a63283cbaf04dbcab1f6479b197f3a8", "id": "9a63283cbaf04dbcab1f6479b197f3a8",
"metadata": { "metadata": {
"id": "9a63283cbaf04dbcab1f6479b197f3a8",
"pycharm": { "pycharm": {
"name": "#%% md\n" "name": "#%% md\n"
} }
@ -465,9 +515,11 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 3, "execution_count": null,
"id": "8dd0d8092fe74a7c96281538738b07e2", "id": "8dd0d8092fe74a7c96281538738b07e2",
"metadata": { "metadata": {
"id": "8dd0d8092fe74a7c96281538738b07e2",
"outputId": "e38ff0ea-1425-4f12-cfb5-7767d040397b",
"pycharm": { "pycharm": {
"name": "#%%\n" "name": "#%%\n"
} }
@ -493,9 +545,11 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 4, "execution_count": null,
"id": "72eea5119410473aa328ad9291626812", "id": "72eea5119410473aa328ad9291626812",
"metadata": { "metadata": {
"id": "72eea5119410473aa328ad9291626812",
"outputId": "9d3ad63e-fcb9-4f9a-bdf1-1bc263ce832b",
"pycharm": { "pycharm": {
"name": "#%%\n" "name": "#%%\n"
} }
@ -522,7 +576,9 @@
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "f1a873c5", "id": "f1a873c5",
"metadata": {}, "metadata": {
"id": "f1a873c5"
},
"source": [ "source": [
"## API reference\n", "## API reference\n",
"\n", "\n",
@ -531,6 +587,9 @@
} }
], ],
"metadata": { "metadata": {
"colab": {
"provenance": []
},
"kernelspec": { "kernelspec": {
"display_name": "Python 3 (ipykernel)", "display_name": "Python 3 (ipykernel)",
"language": "python", "language": "python",