
community[minor]: ManticoreSearch engine added to vectorstore ()

**Description:** ManticoreSearch engine added to vectorstores
**Issue:** no issue, just a new feature
**Dependencies:** https://pypi.org/project/manticoresearch-dev/
**Twitter handle:** @EvilFreelancer

- Example notebook with integration test:

https://github.com/EvilFreelancer/langchain/blob/manticore-search-vectorstore/docs/docs/integrations/vectorstores/manticore_search.ipynb

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Pavel Zloi 2024-05-23 23:56:18 +03:00 committed by GitHub
parent 95c3e5f85f
commit fe26f937e4
5 changed files with 826 additions and 106 deletions
docs/docs/integrations/vectorstores
libs/community
langchain_community/vectorstores
tests/unit_tests/vectorstores


@@ -0,0 +1,443 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "bf48a5c8c3d125e1",
"metadata": {
"collapsed": false
},
"source": [
"# ManticoreSearch VectorStore\n",
"\n",
"[ManticoreSearch](https://manticoresearch.com/) is an open-source search engine that offers fast, scalable, and user-friendly capabilities. Originating as a fork of [Sphinx Search](http://sphinxsearch.com/), it has evolved to incorporate modern search engine features and improvements. ManticoreSearch distinguishes itself with its robust performance and ease of integration into various applications.\n",
"\n",
"ManticoreSearch has recently introduced [vector search capabilities](https://manual.manticoresearch.com/dev/Searching/KNN), starting with search engine version 6.2 and only with [manticore-columnar-lib](https://github.com/manticoresoftware/columnar) package installed. This feature is a considerable advancement, allowing for the execution of searches based on vector similarity.\n",
"\n",
"As of now, the vector search functionality is only accessible in the developmental (dev) versions of the search engine. Consequently, it is imperative to employ a developmental [manticoresearch-dev](https://pypi.org/project/manticoresearch-dev/) Python client for utilizing this feature effectively."
]
},
{
"cell_type": "markdown",
"id": "d5050b607ca217ad",
"metadata": {
"collapsed": false
},
"source": [
"## Setting up environments"
]
},
{
"cell_type": "markdown",
"id": "b26c5ab7f89a61fc",
"metadata": {
"collapsed": false
},
"source": [
"Starting Docker-container with ManticoreSearch and installing manticore-columnar-lib package (optional)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "initial_id",
"metadata": {
"ExecuteTime": {
"end_time": "2024-03-03T11:28:37.177840Z",
"start_time": "2024-03-03T11:28:26.863511Z"
},
"collapsed": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Get:1 http://repo.manticoresearch.com/repository/manticoresearch_jammy_dev jammy InRelease [3525 kB]\r\n",
"Get:2 http://archive.ubuntu.com/ubuntu jammy InRelease [270 kB] \r\n",
"Get:3 http://security.ubuntu.com/ubuntu jammy-security InRelease [110 kB] \r\n",
"Get:4 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [119 kB] \r\n",
"Get:5 http://security.ubuntu.com/ubuntu jammy-security/universe amd64 Packages [1074 kB]\r\n",
"Get:6 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [109 kB] \r\n",
"Get:7 http://archive.ubuntu.com/ubuntu jammy/universe amd64 Packages [17.5 MB] \r\n",
"Get:8 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages [1517 kB]\r\n",
"Get:9 http://security.ubuntu.com/ubuntu jammy-security/restricted amd64 Packages [1889 kB]\r\n",
"Get:10 http://security.ubuntu.com/ubuntu jammy-security/multiverse amd64 Packages [44.6 kB]\r\n",
"Get:11 http://archive.ubuntu.com/ubuntu jammy/restricted amd64 Packages [164 kB]\r\n",
"Get:12 http://archive.ubuntu.com/ubuntu jammy/multiverse amd64 Packages [266 kB]\r\n",
"Get:13 http://archive.ubuntu.com/ubuntu jammy/main amd64 Packages [1792 kB] \r\n",
"Get:14 http://archive.ubuntu.com/ubuntu jammy-updates/multiverse amd64 Packages [50.4 kB]\r\n",
"Get:15 http://archive.ubuntu.com/ubuntu jammy-updates/restricted amd64 Packages [1927 kB]\r\n",
"Get:16 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 Packages [1346 kB]\r\n",
"Get:17 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages [1796 kB]\r\n",
"Get:18 http://archive.ubuntu.com/ubuntu jammy-backports/universe amd64 Packages [28.1 kB]\r\n",
"Get:19 http://archive.ubuntu.com/ubuntu jammy-backports/main amd64 Packages [50.4 kB]\r\n",
"Get:20 http://repo.manticoresearch.com/repository/manticoresearch_jammy_dev jammy/main amd64 Packages [5020 kB]\r\n",
"Fetched 38.6 MB in 7s (5847 kB/s) \r\n",
"Reading package lists... Done\r\n",
"Reading package lists... Done\r\n",
"Building dependency tree... Done\r\n",
"Reading state information... Done\r\n",
"The following NEW packages will be installed:\r\n",
" manticore-columnar-lib\r\n",
"0 upgraded, 1 newly installed, 0 to remove and 21 not upgraded.\r\n",
"Need to get 1990 kB of archives.\r\n",
"After this operation, 10.0 MB of additional disk space will be used.\r\n",
"Get:1 http://repo.manticoresearch.com/repository/manticoresearch_jammy_dev jammy/main amd64 manticore-columnar-lib amd64 2.2.5-240217-a5342a1 [1990 kB]\r\n",
"Fetched 1990 kB in 1s (1505 kB/s) \r\n",
"debconf: delaying package configuration, since apt-utils is not installed\r\n",
"Selecting previously unselected package manticore-columnar-lib.\r\n",
"(Reading database ... 12260 files and directories currently installed.)\r\n",
"Preparing to unpack .../manticore-columnar-lib_2.2.5-240217-a5342a1_amd64.deb ...\r\n",
"Unpacking manticore-columnar-lib (2.2.5-240217-a5342a1) ...\r\n",
"Setting up manticore-columnar-lib (2.2.5-240217-a5342a1) ...\r\n",
"a546aec22291\r\n"
]
}
],
"source": [
"import time\n",
"\n",
"# Start container\n",
"containers = !docker ps --filter \"name=langchain-manticoresearch-server\" -q\n",
"if len(containers) == 0:\n",
" !docker run -d -p 9308:9308 --name langchain-manticoresearch-server manticoresearch/manticore:dev\n",
" time.sleep(20) # Wait for the container to start up\n",
"\n",
"# Get ID of container\n",
"container_id = containers[0]\n",
"\n",
"# Install manticore-columnar-lib package as root user\n",
"!docker exec -it --user 0 {container_id} apt-get update\n",
"!docker exec -it --user 0 {container_id} apt-get install -y manticore-columnar-lib\n",
"\n",
"# Restart container\n",
"!docker restart {container_id}"
]
},
{
"cell_type": "markdown",
"id": "42284e4c8fd0aeb4",
"metadata": {
"collapsed": false
},
"source": [
"Installing ManticoreSearch python client"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "bc7bd70a63cc8d90",
"metadata": {
"ExecuteTime": {
"end_time": "2024-03-03T11:28:38.544198Z",
"start_time": "2024-03-03T11:28:37.178755Z"
},
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\r\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.0\u001b[0m\r\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\r\n",
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install --upgrade --quiet manticoresearch-dev"
]
},
{
"cell_type": "markdown",
"id": "f90b4793255edcb1",
"metadata": {
"collapsed": false
},
"source": [
"We want to use OpenAIEmbeddings so we have to get the OpenAI API Key."
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "a303c63186fd8abd",
"metadata": {
"ExecuteTime": {
"end_time": "2024-03-03T11:28:38.546877Z",
"start_time": "2024-03-03T11:28:38.544907Z"
},
"collapsed": false
},
"outputs": [],
"source": [
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain_community.embeddings import GPT4AllEmbeddings\n",
"from langchain_community.vectorstores import ManticoreSearch, ManticoreSearchSettings"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "46ad30f36815ed15",
"metadata": {
"ExecuteTime": {
"end_time": "2024-03-03T11:28:38.991083Z",
"start_time": "2024-03-03T11:28:38.547705Z"
},
"collapsed": false
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Created a chunk of size 338, which is longer than the specified 100\n",
"Created a chunk of size 508, which is longer than the specified 100\n",
"Created a chunk of size 277, which is longer than the specified 100\n",
"Created a chunk of size 777, which is longer than the specified 100\n",
"Created a chunk of size 247, which is longer than the specified 100\n",
"Created a chunk of size 228, which is longer than the specified 100\n",
"Created a chunk of size 557, which is longer than the specified 100\n",
"Created a chunk of size 587, which is longer than the specified 100\n",
"Created a chunk of size 173, which is longer than the specified 100\n",
"Created a chunk of size 622, which is longer than the specified 100\n",
"Created a chunk of size 775, which is longer than the specified 100\n",
"Created a chunk of size 292, which is longer than the specified 100\n",
"Created a chunk of size 456, which is longer than the specified 100\n",
"Created a chunk of size 291, which is longer than the specified 100\n",
"Created a chunk of size 367, which is longer than the specified 100\n",
"Created a chunk of size 604, which is longer than the specified 100\n",
"Created a chunk of size 618, which is longer than the specified 100\n",
"Created a chunk of size 340, which is longer than the specified 100\n",
"Created a chunk of size 395, which is longer than the specified 100\n",
"Created a chunk of size 321, which is longer than the specified 100\n",
"Created a chunk of size 453, which is longer than the specified 100\n",
"Created a chunk of size 354, which is longer than the specified 100\n",
"Created a chunk of size 481, which is longer than the specified 100\n",
"Created a chunk of size 233, which is longer than the specified 100\n",
"Created a chunk of size 270, which is longer than the specified 100\n",
"Created a chunk of size 305, which is longer than the specified 100\n",
"Created a chunk of size 520, which is longer than the specified 100\n",
"Created a chunk of size 289, which is longer than the specified 100\n",
"Created a chunk of size 280, which is longer than the specified 100\n",
"Created a chunk of size 417, which is longer than the specified 100\n",
"Created a chunk of size 495, which is longer than the specified 100\n",
"Created a chunk of size 602, which is longer than the specified 100\n",
"Created a chunk of size 1004, which is longer than the specified 100\n",
"Created a chunk of size 272, which is longer than the specified 100\n",
"Created a chunk of size 1203, which is longer than the specified 100\n",
"Created a chunk of size 844, which is longer than the specified 100\n",
"Created a chunk of size 135, which is longer than the specified 100\n",
"Created a chunk of size 306, which is longer than the specified 100\n",
"Created a chunk of size 407, which is longer than the specified 100\n",
"Created a chunk of size 910, which is longer than the specified 100\n",
"Created a chunk of size 398, which is longer than the specified 100\n",
"Created a chunk of size 674, which is longer than the specified 100\n",
"Created a chunk of size 356, which is longer than the specified 100\n",
"Created a chunk of size 474, which is longer than the specified 100\n",
"Created a chunk of size 814, which is longer than the specified 100\n",
"Created a chunk of size 530, which is longer than the specified 100\n",
"Created a chunk of size 469, which is longer than the specified 100\n",
"Created a chunk of size 489, which is longer than the specified 100\n",
"Created a chunk of size 433, which is longer than the specified 100\n",
"Created a chunk of size 603, which is longer than the specified 100\n",
"Created a chunk of size 380, which is longer than the specified 100\n",
"Created a chunk of size 354, which is longer than the specified 100\n",
"Created a chunk of size 391, which is longer than the specified 100\n",
"Created a chunk of size 772, which is longer than the specified 100\n",
"Created a chunk of size 267, which is longer than the specified 100\n",
"Created a chunk of size 571, which is longer than the specified 100\n",
"Created a chunk of size 594, which is longer than the specified 100\n",
"Created a chunk of size 458, which is longer than the specified 100\n",
"Created a chunk of size 386, which is longer than the specified 100\n",
"Created a chunk of size 417, which is longer than the specified 100\n",
"Created a chunk of size 370, which is longer than the specified 100\n",
"Created a chunk of size 402, which is longer than the specified 100\n",
"Created a chunk of size 306, which is longer than the specified 100\n",
"Created a chunk of size 173, which is longer than the specified 100\n",
"Created a chunk of size 628, which is longer than the specified 100\n",
"Created a chunk of size 321, which is longer than the specified 100\n",
"Created a chunk of size 294, which is longer than the specified 100\n",
"Created a chunk of size 689, which is longer than the specified 100\n",
"Created a chunk of size 641, which is longer than the specified 100\n",
"Created a chunk of size 473, which is longer than the specified 100\n",
"Created a chunk of size 414, which is longer than the specified 100\n",
"Created a chunk of size 585, which is longer than the specified 100\n",
"Created a chunk of size 764, which is longer than the specified 100\n",
"Created a chunk of size 502, which is longer than the specified 100\n",
"Created a chunk of size 640, which is longer than the specified 100\n",
"Created a chunk of size 507, which is longer than the specified 100\n",
"Created a chunk of size 564, which is longer than the specified 100\n",
"Created a chunk of size 707, which is longer than the specified 100\n",
"Created a chunk of size 380, which is longer than the specified 100\n",
"Created a chunk of size 615, which is longer than the specified 100\n",
"Created a chunk of size 733, which is longer than the specified 100\n",
"Created a chunk of size 277, which is longer than the specified 100\n",
"Created a chunk of size 497, which is longer than the specified 100\n",
"Created a chunk of size 625, which is longer than the specified 100\n",
"Created a chunk of size 468, which is longer than the specified 100\n",
"Created a chunk of size 289, which is longer than the specified 100\n",
"Created a chunk of size 576, which is longer than the specified 100\n",
"Created a chunk of size 297, which is longer than the specified 100\n",
"Created a chunk of size 534, which is longer than the specified 100\n",
"Created a chunk of size 427, which is longer than the specified 100\n",
"Created a chunk of size 412, which is longer than the specified 100\n",
"Created a chunk of size 381, which is longer than the specified 100\n",
"Created a chunk of size 417, which is longer than the specified 100\n",
"Created a chunk of size 244, which is longer than the specified 100\n",
"Created a chunk of size 307, which is longer than the specified 100\n",
"Created a chunk of size 528, which is longer than the specified 100\n",
"Created a chunk of size 565, which is longer than the specified 100\n",
"Created a chunk of size 487, which is longer than the specified 100\n",
"Created a chunk of size 470, which is longer than the specified 100\n",
"Created a chunk of size 332, which is longer than the specified 100\n",
"Created a chunk of size 552, which is longer than the specified 100\n",
"Created a chunk of size 427, which is longer than the specified 100\n",
"Created a chunk of size 596, which is longer than the specified 100\n",
"Created a chunk of size 192, which is longer than the specified 100\n",
"Created a chunk of size 403, which is longer than the specified 100\n",
"Created a chunk of size 255, which is longer than the specified 100\n",
"Created a chunk of size 1025, which is longer than the specified 100\n",
"Created a chunk of size 438, which is longer than the specified 100\n",
"Created a chunk of size 900, which is longer than the specified 100\n",
"Created a chunk of size 250, which is longer than the specified 100\n",
"Created a chunk of size 614, which is longer than the specified 100\n",
"Created a chunk of size 635, which is longer than the specified 100\n",
"Created a chunk of size 443, which is longer than the specified 100\n",
"Created a chunk of size 478, which is longer than the specified 100\n",
"Created a chunk of size 473, which is longer than the specified 100\n",
"Created a chunk of size 302, which is longer than the specified 100\n",
"Created a chunk of size 549, which is longer than the specified 100\n",
"Created a chunk of size 644, which is longer than the specified 100\n",
"Created a chunk of size 402, which is longer than the specified 100\n",
"Created a chunk of size 489, which is longer than the specified 100\n",
"Created a chunk of size 551, which is longer than the specified 100\n",
"Created a chunk of size 527, which is longer than the specified 100\n",
"Created a chunk of size 563, which is longer than the specified 100\n",
"Created a chunk of size 472, which is longer than the specified 100\n",
"Created a chunk of size 511, which is longer than the specified 100\n",
"Created a chunk of size 419, which is longer than the specified 100\n",
"Created a chunk of size 245, which is longer than the specified 100\n",
"Created a chunk of size 371, which is longer than the specified 100\n",
"Created a chunk of size 484, which is longer than the specified 100\n",
"Created a chunk of size 306, which is longer than the specified 100\n",
"Created a chunk of size 190, which is longer than the specified 100\n",
"Created a chunk of size 499, which is longer than the specified 100\n",
"Created a chunk of size 480, which is longer than the specified 100\n",
"Created a chunk of size 634, which is longer than the specified 100\n",
"Created a chunk of size 611, which is longer than the specified 100\n",
"Created a chunk of size 356, which is longer than the specified 100\n",
"Created a chunk of size 478, which is longer than the specified 100\n",
"Created a chunk of size 369, which is longer than the specified 100\n",
"Created a chunk of size 526, which is longer than the specified 100\n",
"Created a chunk of size 311, which is longer than the specified 100\n",
"Created a chunk of size 181, which is longer than the specified 100\n",
"Created a chunk of size 637, which is longer than the specified 100\n",
"Created a chunk of size 219, which is longer than the specified 100\n",
"Created a chunk of size 305, which is longer than the specified 100\n",
"Created a chunk of size 409, which is longer than the specified 100\n",
"Created a chunk of size 235, which is longer than the specified 100\n",
"Created a chunk of size 302, which is longer than the specified 100\n",
"Created a chunk of size 236, which is longer than the specified 100\n",
"Created a chunk of size 209, which is longer than the specified 100\n",
"Created a chunk of size 366, which is longer than the specified 100\n",
"Created a chunk of size 277, which is longer than the specified 100\n",
"Created a chunk of size 591, which is longer than the specified 100\n",
"Created a chunk of size 232, which is longer than the specified 100\n",
"Created a chunk of size 543, which is longer than the specified 100\n",
"Created a chunk of size 199, which is longer than the specified 100\n",
"Created a chunk of size 214, which is longer than the specified 100\n",
"Created a chunk of size 263, which is longer than the specified 100\n",
"Created a chunk of size 375, which is longer than the specified 100\n",
"Created a chunk of size 221, which is longer than the specified 100\n",
"Created a chunk of size 261, which is longer than the specified 100\n",
"Created a chunk of size 203, which is longer than the specified 100\n",
"Created a chunk of size 758, which is longer than the specified 100\n",
"Created a chunk of size 271, which is longer than the specified 100\n",
"Created a chunk of size 323, which is longer than the specified 100\n",
"Created a chunk of size 275, which is longer than the specified 100\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"bert_load_from_file: gguf version = 2\n",
"bert_load_from_file: gguf alignment = 32\n",
"bert_load_from_file: gguf data offset = 695552\n",
"bert_load_from_file: model name = BERT\n",
"bert_load_from_file: model architecture = bert\n",
"bert_load_from_file: model file type = 1\n",
"bert_load_from_file: bert tokenizer vocab = 30522\n"
]
}
],
"source": [
"from langchain_community.document_loaders import TextLoader\n",
"\n",
"loader = TextLoader(\"../../modules/paul_graham_essay.txt\")\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=100, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = GPT4AllEmbeddings()"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "a06370cae96cbaef",
"metadata": {
"ExecuteTime": {
"end_time": "2024-03-03T11:28:42.366398Z",
"start_time": "2024-03-03T11:28:38.991827Z"
},
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[Document(page_content='Computer Science is an uneasy alliance between two halves, theory and systems. The theory people prove things, and the systems people build things. I wanted to build things. I had plenty of respect for theory — indeed, a sneaking suspicion that it was the more admirable of the two halves — but building things seemed so much more exciting.', metadata={'some': 'metadata'}), Document(page_content=\"I applied to 3 grad schools: MIT and Yale, which were renowned for AI at the time, and Harvard, which I'd visited because Rich Draves went there, and was also home to Bill Woods, who'd invented the type of parser I used in my SHRDLU clone. Only Harvard accepted me, so that was where I went.\", metadata={'some': 'metadata'}), Document(page_content='For my undergraduate thesis, I reverse-engineered SHRDLU. My God did I love working on that program. It was a pleasing bit of code, but what made it even more exciting was my belief — hard to imagine now, but not unique in 1985 — that it was already climbing the lower slopes of intelligence.', metadata={'some': 'metadata'}), Document(page_content=\"The problem with systems work, though, was that it didn't last. Any program you wrote today, no matter how good, would be obsolete in a couple decades at best. People might mention your software in footnotes, but no one would actually use it. And indeed, it would seem very feeble work. Only people with a sense of the history of the field would even realize that, in its time, it had been good.\", metadata={'some': 'metadata'})]\n"
]
}
],
"source": [
"for d in docs:\n",
" d.metadata = {\"some\": \"metadata\"}\n",
"settings = ManticoreSearchSettings(table=\"manticoresearch_vector_search_example\")\n",
"docsearch = ManticoreSearch.from_documents(docs, embeddings, config=settings)\n",
"\n",
"query = \"Robert Morris is\"\n",
"docs = docsearch.similarity_search(query)\n",
"print(docs)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -153,6 +153,10 @@ if TYPE_CHECKING:
from langchain_community.vectorstores.llm_rails import (
LLMRails,
)
from langchain_community.vectorstores.manticore_search import (
ManticoreSearch,
ManticoreSearchSettings,
)
from langchain_community.vectorstores.marqo import (
Marqo,
)
@@ -341,6 +345,8 @@ __all__ = [
"LLMRails",
"LanceDB",
"Lantern",
"ManticoreSearch",
"ManticoreSearchSettings",
"Marqo",
"MatchingEngine",
"Meilisearch",
@@ -439,6 +445,8 @@ _module_lookup = {
"LLMRails": "langchain_community.vectorstores.llm_rails",
"LanceDB": "langchain_community.vectorstores.lancedb",
"Lantern": "langchain_community.vectorstores.lantern",
"ManticoreSearch": "langchain_community.vectorstores.manticore_search",
"ManticoreSearchSettings": "langchain_community.vectorstores.manticore_search",
"Marqo": "langchain_community.vectorstores.marqo",
"MatchingEngine": "langchain_community.vectorstores.matching_engine",
"Meilisearch": "langchain_community.vectorstores.meilisearch",


@@ -0,0 +1,372 @@
from __future__ import annotations
import json
import logging
import uuid
from hashlib import sha1
from typing import Any, Dict, Iterable, List, Optional, Type
from langchain_core.documents import Document
from langchain_core.embeddings import Embeddings
from langchain_core.pydantic_v1 import BaseSettings
from langchain_core.vectorstores import VectorStore
logger = logging.getLogger()
DEFAULT_K = 4 # Number of Documents to return.
class ManticoreSearchSettings(BaseSettings):
proto: str = "http"
host: str = "localhost"
port: int = 9308
username: Optional[str] = None
password: Optional[str] = None
# database: str = "Manticore"
table: str = "langchain"
column_map: Dict[str, str] = {
"id": "id",
"uuid": "uuid",
"document": "document",
"embedding": "embedding",
"metadata": "metadata",
}
# A mandatory setting; currently, only hnsw is supported.
knn_type: str = "hnsw"
# A mandatory setting that specifies the dimensions of the vectors being indexed.
knn_dims: Optional[int] = None # Defaults autodetect
# A mandatory setting that specifies the distance function used by the HNSW index.
hnsw_similarity: str = "L2" # Acceptable values are: L2, IP, COSINE
# An optional setting that defines the maximum amount of outgoing connections
# in the graph.
hnsw_m: int = 16 # The default is 16.
# An optional setting that defines a construction time/accuracy trade-off.
hnsw_ef_construction: int = 100
def get_connection_string(self) -> str:
return self.proto + "://" + self.host + ":" + str(self.port)
def __getitem__(self, item: str) -> Any:
return getattr(self, item)
class Config:
env_file = ".env"
env_prefix = "manticore_"
env_file_encoding = "utf-8"
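With the defaults above, `get_connection_string` joins proto, host, and port into the HTTP endpoint the client connects to. A stdlib-only sketch of the same assembly, outside the settings class:

```python
def get_connection_string(proto="http", host="localhost", port=9308):
    # Same assembly as ManticoreSearchSettings.get_connection_string
    return proto + "://" + host + ":" + str(port)


print(get_connection_string())  # http://localhost:9308
```

Any of the three pieces can also be overridden through the environment thanks to the `manticore_` prefix configured above (e.g. `MANTICORE_HOST`).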
class ManticoreSearch(VectorStore):
"""
`ManticoreSearch Engine` vector store.
To use, you should have the ``manticoresearch`` python package installed.
Example:
.. code-block:: python
from langchain_community.vectorstores import ManticoreSearch
from langchain_community.embeddings.openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()
vectorstore = ManticoreSearch(embeddings)
"""
def __init__(
self,
embedding: Embeddings,
*,
config: Optional[ManticoreSearchSettings] = None,
**kwargs: Any,
) -> None:
"""
ManticoreSearch Wrapper to LangChain
Args:
embedding (Embeddings): Text embedding model.
config (ManticoreSearchSettings): Configuration of ManticoreSearch Client
**kwargs: Other keyword arguments will pass into Configuration of API client
manticoresearch-python. See
https://github.com/manticoresoftware/manticoresearch-python for more.
"""
try:
import manticoresearch.api as ENDPOINTS
import manticoresearch.api_client as API
except ImportError:
raise ImportError(
"Could not import manticoresearch python package. "
"Please install it with `pip install manticoresearch-dev`."
)
try:
from tqdm import tqdm
self.pgbar = tqdm
except ImportError:
# Fall back to a no-op if tqdm is not installed
self.pgbar = lambda x, **kwargs: x
super().__init__()
self.embedding = embedding
if config is not None:
self.config = config
else:
self.config = ManticoreSearchSettings()
assert self.config
assert self.config.host and self.config.port
assert (
self.config.column_map
# and self.config.database
and self.config.table
)
assert (
self.config.knn_type
# and self.config.knn_dims
# and self.config.hnsw_m
# and self.config.hnsw_ef_construction
and self.config.hnsw_similarity
)
for k in ["id", "embedding", "document", "metadata", "uuid"]:
assert k in self.config.column_map
# Detect embeddings dimension
if self.config.knn_dims is None:
self.dim: int = len(self.embedding.embed_query("test"))
else:
self.dim = self.config.knn_dims
# Initialize the schema
self.schema = f"""\
CREATE TABLE IF NOT EXISTS {self.config.table}(
{self.config.column_map['id']} bigint,
{self.config.column_map['document']} text indexed stored,
{self.config.column_map['embedding']} \
float_vector knn_type='{self.config.knn_type}' \
knn_dims='{self.dim}' \
hnsw_similarity='{self.config.hnsw_similarity}' \
hnsw_m='{self.config.hnsw_m}' \
hnsw_ef_construction='{self.config.hnsw_ef_construction}',
{self.config.column_map['metadata']} json,
{self.config.column_map['uuid']} text indexed stored
)\
"""
# Create a connection to ManticoreSearch
self.configuration = API.Configuration(
host=self.config.get_connection_string(),
username=self.config.username,
password=self.config.password,
# disabled_client_side_validations=",",
**kwargs,
)
self.connection = API.ApiClient(self.configuration)
self.client = {
"index": ENDPOINTS.IndexApi(self.connection),
"utils": ENDPOINTS.UtilsApi(self.connection),
"search": ENDPOINTS.SearchApi(self.connection),
}
# Create default schema if not exists
self.client["utils"].sql(self.schema)
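With the default settings, the schema above renders to a single `CREATE TABLE` statement. A sketch of the rendered SQL for the default column map (the 384 here is a hypothetical embedding dimension; the real class autodetects it via `len(embedding.embed_query("test"))` when `knn_dims` is unset):

```python
table = "langchain"
dim = 384  # hypothetical; normally detected from the embedding model

# Render the CREATE TABLE statement the way ManticoreSearch.__init__ does,
# with the default column names and HNSW parameters.
schema = (
    f"CREATE TABLE IF NOT EXISTS {table}("
    f"id bigint, "
    f"document text indexed stored, "
    f"embedding float_vector knn_type='hnsw' knn_dims='{dim}' "
    f"hnsw_similarity='L2' hnsw_m='16' hnsw_ef_construction='100', "
    f"metadata json, "
    f"uuid text indexed stored)"
)
print(schema)
```

Because the statement uses `IF NOT EXISTS`, re-running it against an existing table is a no-op, which is why the constructor can issue it unconditionally.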
@property
def embeddings(self) -> Embeddings:
return self.embedding
def add_texts(
self,
texts: Iterable[str],
metadatas: Optional[List[dict]] = None,
*,
batch_size: int = 32,
text_ids: Optional[List[str]] = None,
**kwargs: Any,
) -> List[str]:
"""
Insert more texts through the embeddings and add to the VectorStore.
Args:
texts: Iterable of strings to add to the VectorStore
metadatas: Optional list of metadata dicts, one per text
batch_size: Batch size of insertion
text_ids: Optional list of ids to associate with the texts
Returns:
List of ids from adding the texts into the VectorStore.
"""
# Embed and create the documents
ids = text_ids or [
# See https://stackoverflow.com/questions/67219691/python-hash-function-that-returns-32-or-64-bits
str(int(sha1(t.encode("utf-8")).hexdigest()[:15], 16))
for t in texts
]
transac = []
for i, text in enumerate(texts):
embed = self.embeddings.embed_query(text)
doc_uuid = str(uuid.uuid1())
doc = {
self.config.column_map["document"]: text,
self.config.column_map["embedding"]: embed,
self.config.column_map["metadata"]: metadatas[i] if metadatas else {},
self.config.column_map["uuid"]: doc_uuid,
}
transac.append(
{"replace": {"index": self.config.table, "id": ids[i], "doc": doc}}
)
if len(transac) == batch_size:
body = "\n".join(map(json.dumps, transac))
try:
self.client["index"].bulk(body)
transac = []
except Exception as e:
logger.error(f"Error indexing documents: {e}")
if len(transac) > 0:
body = "\n".join(map(json.dumps, transac))
try:
self.client["index"].bulk(body)
except Exception as e:
logger.error(f"Error indexing documents: {e}")
return ids
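`add_texts` derives a deterministic 60-bit id from the SHA-1 of each text and ships documents to the server as a newline-delimited JSON bulk body. A self-contained sketch of both steps (no server call here, so the body is just printed; the table name and single `document` field are illustrative):

```python
import json
from hashlib import sha1

texts = ["first document", "second document"]

# Deterministic ids: first 15 hex digits of sha1 -> int (fits in 60 bits),
# so re-adding the same text "replace"s the same row instead of duplicating it.
ids = [str(int(sha1(t.encode("utf-8")).hexdigest()[:15], 16)) for t in texts]

# One "replace" action per document, joined into an NDJSON bulk body
transac = [
    {"replace": {"index": "langchain", "id": ids[i], "doc": {"document": t}}}
    for i, t in enumerate(texts)
]
body = "\n".join(map(json.dumps, transac))
print(body)
```

The batching in `add_texts` simply flushes such a body every `batch_size` documents, plus one final flush for the remainder.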
@classmethod
def from_texts(
cls: Type[ManticoreSearch],
texts: List[str],
embedding: Embeddings,
metadatas: Optional[List[Dict[Any, Any]]] = None,
*,
config: Optional[ManticoreSearchSettings] = None,
text_ids: Optional[List[str]] = None,
batch_size: int = 32,
**kwargs: Any,
) -> ManticoreSearch:
ctx = cls(embedding, config=config, **kwargs)
ctx.add_texts(
texts=texts,
text_ids=text_ids,
batch_size=batch_size,
metadatas=metadatas,
**kwargs,
)
return ctx

    @classmethod
    def from_documents(
        cls: Type[ManticoreSearch],
        documents: List[Document],
        embedding: Embeddings,
        *,
        config: Optional[ManticoreSearchSettings] = None,
        text_ids: Optional[List[str]] = None,
        batch_size: int = 32,
        **kwargs: Any,
    ) -> ManticoreSearch:
        texts = [doc.page_content for doc in documents]
        metadatas = [doc.metadata for doc in documents]
        return cls.from_texts(
            texts=texts,
            embedding=embedding,
            text_ids=text_ids,
            batch_size=batch_size,
            metadatas=metadatas,
            **kwargs,
        )
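Both constructors funnel into `add_texts`, which serializes each batch into Manticore's bulk format: newline-delimited JSON, one `replace` statement per document. A self-contained sketch of that body construction (the table name, column names, and tiny embeddings are illustrative only):

```python
import json

docs = [("first text", [0.1, 0.2]), ("second text", [0.3, 0.4])]
transac = [
    {
        "replace": {
            "index": "langchain",
            "id": i,
            "doc": {"document": text, "embedding": vec, "metadata": {}, "uuid": str(i)},
        }
    }
    for i, (text, vec) in enumerate(docs)
]
# Newline-delimited JSON: the shape passed to the bulk indexing call above
body = "\n".join(map(json.dumps, transac))
print(body.count("\n") + 1)  # → 2 (one line per document)
```

Batching this way keeps each HTTP request bounded by `batch_size` rows regardless of how many texts are added.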

    def __repr__(self) -> str:
        """
        Text representation of the ManticoreSearch vector store: shows the
        backend endpoint, username, and table schema. Easy to use with
        `str(ManticoreSearch(...))`.

        Returns:
            repr: string with connection info and data schema
        """
        _repr = f"\033[92m\033[1m{self.config.table} @ "
        _repr += f"http://{self.config.host}:{self.config.port}\033[0m\n\n"
        _repr += f"\033[1musername: {self.config.username}\033[0m\n\nTable Schema:\n"
        _repr += "-" * 51 + "\n"
        for r in self.client["utils"].sql(f"DESCRIBE {self.config.table}")[0]["data"]:
            _repr += (
                f"|\033[94m{r['Field']:24s}\033[0m|\033["
                f"96m{r['Type'] + ' ' + r['Properties']:24s}\033[0m|\n"
            )
        _repr += "-" * 51 + "\n"
        return _repr

    def similarity_search(
        self, query: str, k: int = DEFAULT_K, **kwargs: Any
    ) -> List[Document]:
        """Perform a similarity search with ManticoreSearch

        Args:
            query (str): query string
            k (int, optional): Top K neighbors to retrieve. Defaults to 4.

        Returns:
            List[Document]: List of Documents
        """
        return self.similarity_search_by_vector(
            self.embedding.embed_query(query), k, **kwargs
        )

    def similarity_search_by_vector(
        self,
        embedding: List[float],
        k: int = DEFAULT_K,
        **kwargs: Any,
    ) -> List[Document]:
        """Perform a similarity search with ManticoreSearch by vectors

        Args:
            embedding (List[float]): Embedding vector
            k (int, optional): Top K neighbors to retrieve. Defaults to 4.

        Returns:
            List[Document]: List of documents
        """
        # Build search request
        request = {
            "index": self.config.table,
            "knn": {
                "field": self.config.column_map["embedding"],
                "k": k,
                "query_vector": embedding,
            },
        }
        # Execute request and convert response to langchain.Document format
        try:
            return [
                Document(
                    page_content=r["_source"][self.config.column_map["document"]],
                    metadata=r["_source"][self.config.column_map["metadata"]],
                )
                for r in self.client["search"].search(request, **kwargs).hits.hits[:k]
            ]
        except Exception as e:
            logger.error(f"\033[91m\033[1m{type(e)}\033[0m \033[95m{str(e)}\033[0m")
            return []
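The KNN request assembled above is plain JSON. A standalone sketch of the payload shape sent to Manticore's search endpoint (the `build_knn_request` helper, table name, and field name are illustrative, not part of the integration):

```python
import json


def build_knn_request(table: str, field: str, query_vector: list, k: int = 4) -> dict:
    # Mirrors the request dict built in similarity_search_by_vector:
    # top-k nearest neighbors of query_vector in the given float_vector column.
    return {
        "index": table,
        "knn": {"field": field, "k": k, "query_vector": query_vector},
    }


request = build_knn_request("langchain", "embedding", [0.1, 0.2, 0.3], k=2)
print(json.dumps(request, sort_keys=True))
```

Keeping the request a plain dict means extra search options can be merged in via `**kwargs` before it is sent.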

    def drop(self) -> None:
        """
        Helper function: Drop data
        """
        self.client["utils"].sql(f"DROP TABLE IF EXISTS {self.config.table}")

    @property
    def metadata_column(self) -> str:
        return self.config.column_map["metadata"]


@@ -50,6 +50,8 @@ EXPECTED_ALL = [
    "LLMRails",
    "LanceDB",
    "Lantern",
    "ManticoreSearch",
    "ManticoreSearchSettings",
    "Marqo",
    "MatchingEngine",
    "Meilisearch",
@@ -112,6 +114,7 @@ def test_all_imports_exclusive() -> None:
        "PathwayVectorClient",
        "DistanceStrategy",
        "KineticaSettings",
        "ManticoreSearchSettings",
    ]:
        assert issubclass(getattr(vectorstores, cls), VectorStore)


@@ -1,106 +0,0 @@
"""Test the public API of the tools package."""

from langchain_community.vectorstores import __all__ as public_api

_EXPECTED = [
    "Aerospike",
    "AlibabaCloudOpenSearch",
    "AlibabaCloudOpenSearchSettings",
    "AnalyticDB",
    "Annoy",
    "ApacheDoris",
    "AtlasDB",
    "AwaDB",
    "AzureSearch",
    "Bagel",
    "BaiduVectorDB",
    "BESVectorStore",
    "BigQueryVectorSearch",
    "Cassandra",
    "AstraDB",
    "Chroma",
    "Clarifai",
    "Clickhouse",
    "ClickhouseSettings",
    "DashVector",
    "DatabricksVectorSearch",
    "DeepLake",
    "Dingo",
    "DistanceStrategy",
    "DocArrayHnswSearch",
    "DocArrayInMemorySearch",
    "DocumentDBVectorSearch",
    "DuckDB",
    "EcloudESVectorStore",
    "ElasticKnnSearch",
    "ElasticVectorSearch",
    "ElasticsearchStore",
    "Epsilla",
    "FAISS",
    "HanaDB",
    "Hologres",
    "InfinispanVS",
    "InMemoryVectorStore",
    "KDBAI",
    "Kinetica",
    "KineticaSettings",
    "LanceDB",
    "Lantern",
    "LLMRails",
    "Marqo",
    "MatchingEngine",
    "Meilisearch",
    "Milvus",
    "MomentoVectorIndex",
    "MongoDBAtlasVectorSearch",
    "MyScale",
    "MyScaleSettings",
    "Neo4jVector",
    "OpenSearchVectorSearch",
    "OracleVS",
    "PGEmbedding",
    "PGVector",
    "PathwayVectorClient",
    "Pinecone",
    "Qdrant",
    "Redis",
    "Relyt",
    "Rockset",
    "SKLearnVectorStore",
    "ScaNN",
    "SemaDB",
    "SingleStoreDB",
    "SQLiteVSS",
    "StarRocks",
    "SupabaseVectorStore",
    "SurrealDBStore",
    "Tair",
    "TiDBVectorStore",
    "TileDB",
    "Tigris",
    "TimescaleVector",
    "Typesense",
    "UpstashVectorStore",
    "USearch",
    "Vald",
    "VDMS",
    "Vearch",
    "Vectara",
    "VespaStore",
    "VLite",
    "Weaviate",
    "ZepVectorStore",
    "Zilliz",
    "TencentVectorDB",
    "AzureCosmosDBVectorSearch",
    "VectorStore",
    "Yellowbrick",
    "NeuralDBClientVectorStore",
    "NeuralDBVectorStore",
    "CouchbaseVectorStore",
]


def test_public_api() -> None:
    """Test for regressions or changes in the public API."""
    # Check that the public API is as expected
    assert set(public_api) == set(_EXPECTED)