
community[minor]: ManticoreSearch engine added to vectorstore ()

**Description:** ManticoreSearch engine added to vectorstores
**Issue:** no issue, just a new feature
**Dependencies:** https://pypi.org/project/manticoresearch-dev/
**Twitter handle:** @EvilFreelancer

- Example notebook with integration test:

https://github.com/EvilFreelancer/langchain/blob/manticore-search-vectorstore/docs/docs/integrations/vectorstores/manticore_search.ipynb

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Pavel Zloi 2024-05-23 23:56:18 +03:00 committed by GitHub
parent 95c3e5f85f
commit fe26f937e4
5 changed files with 826 additions and 106 deletions
docs/docs/integrations/vectorstores
libs/community
langchain_community/vectorstores
tests/unit_tests/vectorstores


@@ -0,0 +1,443 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "bf48a5c8c3d125e1",
"metadata": {
"collapsed": false
},
"source": [
"# ManticoreSearch VectorStore\n",
"\n",
"[ManticoreSearch](https://manticoresearch.com/) is an open-source search engine that offers fast, scalable, and user-friendly capabilities. Originating as a fork of [Sphinx Search](http://sphinxsearch.com/), it has evolved to incorporate modern search engine features and improvements. ManticoreSearch distinguishes itself with its robust performance and ease of integration into various applications.\n",
"\n",
"ManticoreSearch has recently introduced [vector search capabilities](https://manual.manticoresearch.com/dev/Searching/KNN), starting with search engine version 6.2 and only with [manticore-columnar-lib](https://github.com/manticoresoftware/columnar) package installed. This feature is a considerable advancement, allowing for the execution of searches based on vector similarity.\n",
"\n",
"As of now, the vector search functionality is only accessible in the developmental (dev) versions of the search engine. Consequently, it is imperative to employ a developmental [manticoresearch-dev](https://pypi.org/project/manticoresearch-dev/) Python client for utilizing this feature effectively."
]
},
{
"cell_type": "markdown",
"id": "d5050b607ca217ad",
"metadata": {
"collapsed": false
},
"source": [
"## Setting up environments"
]
},
{
"cell_type": "markdown",
"id": "b26c5ab7f89a61fc",
"metadata": {
"collapsed": false
},
"source": [
"Starting Docker-container with ManticoreSearch and installing manticore-columnar-lib package (optional)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "initial_id",
"metadata": {
"ExecuteTime": {
"end_time": "2024-03-03T11:28:37.177840Z",
"start_time": "2024-03-03T11:28:26.863511Z"
},
"collapsed": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Get:1 http://repo.manticoresearch.com/repository/manticoresearch_jammy_dev jammy InRelease [3525 kB]\r\n",
"Get:2 http://archive.ubuntu.com/ubuntu jammy InRelease [270 kB] \r\n",
"Get:3 http://security.ubuntu.com/ubuntu jammy-security InRelease [110 kB] \r\n",
"Get:4 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [119 kB] \r\n",
"Get:5 http://security.ubuntu.com/ubuntu jammy-security/universe amd64 Packages [1074 kB]\r\n",
"Get:6 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [109 kB] \r\n",
"Get:7 http://archive.ubuntu.com/ubuntu jammy/universe amd64 Packages [17.5 MB] \r\n",
"Get:8 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages [1517 kB]\r\n",
"Get:9 http://security.ubuntu.com/ubuntu jammy-security/restricted amd64 Packages [1889 kB]\r\n",
"Get:10 http://security.ubuntu.com/ubuntu jammy-security/multiverse amd64 Packages [44.6 kB]\r\n",
"Get:11 http://archive.ubuntu.com/ubuntu jammy/restricted amd64 Packages [164 kB]\r\n",
"Get:12 http://archive.ubuntu.com/ubuntu jammy/multiverse amd64 Packages [266 kB]\r\n",
"Get:13 http://archive.ubuntu.com/ubuntu jammy/main amd64 Packages [1792 kB] \r\n",
"Get:14 http://archive.ubuntu.com/ubuntu jammy-updates/multiverse amd64 Packages [50.4 kB]\r\n",
"Get:15 http://archive.ubuntu.com/ubuntu jammy-updates/restricted amd64 Packages [1927 kB]\r\n",
"Get:16 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 Packages [1346 kB]\r\n",
"Get:17 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages [1796 kB]\r\n",
"Get:18 http://archive.ubuntu.com/ubuntu jammy-backports/universe amd64 Packages [28.1 kB]\r\n",
"Get:19 http://archive.ubuntu.com/ubuntu jammy-backports/main amd64 Packages [50.4 kB]\r\n",
"Get:20 http://repo.manticoresearch.com/repository/manticoresearch_jammy_dev jammy/main amd64 Packages [5020 kB]\r\n",
"Fetched 38.6 MB in 7s (5847 kB/s) \r\n",
"Reading package lists... Done\r\n",
"Reading package lists... Done\r\n",
"Building dependency tree... Done\r\n",
"Reading state information... Done\r\n",
"The following NEW packages will be installed:\r\n",
" manticore-columnar-lib\r\n",
"0 upgraded, 1 newly installed, 0 to remove and 21 not upgraded.\r\n",
"Need to get 1990 kB of archives.\r\n",
"After this operation, 10.0 MB of additional disk space will be used.\r\n",
"Get:1 http://repo.manticoresearch.com/repository/manticoresearch_jammy_dev jammy/main amd64 manticore-columnar-lib amd64 2.2.5-240217-a5342a1 [1990 kB]\r\n",
"Fetched 1990 kB in 1s (1505 kB/s) \r\n",
"debconf: delaying package configuration, since apt-utils is not installed\r\n",
"Selecting previously unselected package manticore-columnar-lib.\r\n",
"(Reading database ... 12260 files and directories currently installed.)\r\n",
"Preparing to unpack .../manticore-columnar-lib_2.2.5-240217-a5342a1_amd64.deb ...\r\n",
"Unpacking manticore-columnar-lib (2.2.5-240217-a5342a1) ...\r\n",
"Setting up manticore-columnar-lib (2.2.5-240217-a5342a1) ...\r\n",
"a546aec22291\r\n"
]
}
],
"source": [
"import time\n",
"\n",
"# Start container\n",
"containers = !docker ps --filter \"name=langchain-manticoresearch-server\" -q\n",
"if len(containers) == 0:\n",
" !docker run -d -p 9308:9308 --name langchain-manticoresearch-server manticoresearch/manticore:dev\n",
" time.sleep(20) # Wait for the container to start up\n",
"\n",
"# Get ID of container\n",
"container_id = containers[0]\n",
"\n",
"# Install manticore-columnar-lib package as root user\n",
"!docker exec -it --user 0 {container_id} apt-get update\n",
"!docker exec -it --user 0 {container_id} apt-get install -y manticore-columnar-lib\n",
"\n",
"# Restart container\n",
"!docker restart {container_id}"
]
},
{
"cell_type": "markdown",
"id": "42284e4c8fd0aeb4",
"metadata": {
"collapsed": false
},
"source": [
"Installing ManticoreSearch python client"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "bc7bd70a63cc8d90",
"metadata": {
"ExecuteTime": {
"end_time": "2024-03-03T11:28:38.544198Z",
"start_time": "2024-03-03T11:28:37.178755Z"
},
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\r\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.0\u001b[0m\r\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\r\n",
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install --upgrade --quiet manticoresearch-dev"
]
},
{
"cell_type": "markdown",
"id": "f90b4793255edcb1",
"metadata": {
"collapsed": false
},
"source": [
"We want to use OpenAIEmbeddings so we have to get the OpenAI API Key."
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "a303c63186fd8abd",
"metadata": {
"ExecuteTime": {
"end_time": "2024-03-03T11:28:38.546877Z",
"start_time": "2024-03-03T11:28:38.544907Z"
},
"collapsed": false
},
"outputs": [],
"source": [
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain_community.embeddings import GPT4AllEmbeddings\n",
"from langchain_community.vectorstores import ManticoreSearch, ManticoreSearchSettings"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "46ad30f36815ed15",
"metadata": {
"ExecuteTime": {
"end_time": "2024-03-03T11:28:38.991083Z",
"start_time": "2024-03-03T11:28:38.547705Z"
},
"collapsed": false
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Created a chunk of size 338, which is longer than the specified 100\n",
"Created a chunk of size 508, which is longer than the specified 100\n",
"Created a chunk of size 277, which is longer than the specified 100\n",
"Created a chunk of size 777, which is longer than the specified 100\n",
"Created a chunk of size 247, which is longer than the specified 100\n",
"Created a chunk of size 228, which is longer than the specified 100\n",
"Created a chunk of size 557, which is longer than the specified 100\n",
"Created a chunk of size 587, which is longer than the specified 100\n",
"Created a chunk of size 173, which is longer than the specified 100\n",
"Created a chunk of size 622, which is longer than the specified 100\n",
"Created a chunk of size 775, which is longer than the specified 100\n",
"Created a chunk of size 292, which is longer than the specified 100\n",
"Created a chunk of size 456, which is longer than the specified 100\n",
"Created a chunk of size 291, which is longer than the specified 100\n",
"Created a chunk of size 367, which is longer than the specified 100\n",
"Created a chunk of size 604, which is longer than the specified 100\n",
"Created a chunk of size 618, which is longer than the specified 100\n",
"Created a chunk of size 340, which is longer than the specified 100\n",
"Created a chunk of size 395, which is longer than the specified 100\n",
"Created a chunk of size 321, which is longer than the specified 100\n",
"Created a chunk of size 453, which is longer than the specified 100\n",
"Created a chunk of size 354, which is longer than the specified 100\n",
"Created a chunk of size 481, which is longer than the specified 100\n",
"Created a chunk of size 233, which is longer than the specified 100\n",
"Created a chunk of size 270, which is longer than the specified 100\n",
"Created a chunk of size 305, which is longer than the specified 100\n",
"Created a chunk of size 520, which is longer than the specified 100\n",
"Created a chunk of size 289, which is longer than the specified 100\n",
"Created a chunk of size 280, which is longer than the specified 100\n",
"Created a chunk of size 417, which is longer than the specified 100\n",
"Created a chunk of size 495, which is longer than the specified 100\n",
"Created a chunk of size 602, which is longer than the specified 100\n",
"Created a chunk of size 1004, which is longer than the specified 100\n",
"Created a chunk of size 272, which is longer than the specified 100\n",
"Created a chunk of size 1203, which is longer than the specified 100\n",
"Created a chunk of size 844, which is longer than the specified 100\n",
"Created a chunk of size 135, which is longer than the specified 100\n",
"Created a chunk of size 306, which is longer than the specified 100\n",
"Created a chunk of size 407, which is longer than the specified 100\n",
"Created a chunk of size 910, which is longer than the specified 100\n",
"Created a chunk of size 398, which is longer than the specified 100\n",
"Created a chunk of size 674, which is longer than the specified 100\n",
"Created a chunk of size 356, which is longer than the specified 100\n",
"Created a chunk of size 474, which is longer than the specified 100\n",
"Created a chunk of size 814, which is longer than the specified 100\n",
"Created a chunk of size 530, which is longer than the specified 100\n",
"Created a chunk of size 469, which is longer than the specified 100\n",
"Created a chunk of size 489, which is longer than the specified 100\n",
"Created a chunk of size 433, which is longer than the specified 100\n",
"Created a chunk of size 603, which is longer than the specified 100\n",
"Created a chunk of size 380, which is longer than the specified 100\n",
"Created a chunk of size 354, which is longer than the specified 100\n",
"Created a chunk of size 391, which is longer than the specified 100\n",
"Created a chunk of size 772, which is longer than the specified 100\n",
"Created a chunk of size 267, which is longer than the specified 100\n",
"Created a chunk of size 571, which is longer than the specified 100\n",
"Created a chunk of size 594, which is longer than the specified 100\n",
"Created a chunk of size 458, which is longer than the specified 100\n",
"Created a chunk of size 386, which is longer than the specified 100\n",
"Created a chunk of size 417, which is longer than the specified 100\n",
"Created a chunk of size 370, which is longer than the specified 100\n",
"Created a chunk of size 402, which is longer than the specified 100\n",
"Created a chunk of size 306, which is longer than the specified 100\n",
"Created a chunk of size 173, which is longer than the specified 100\n",
"Created a chunk of size 628, which is longer than the specified 100\n",
"Created a chunk of size 321, which is longer than the specified 100\n",
"Created a chunk of size 294, which is longer than the specified 100\n",
"Created a chunk of size 689, which is longer than the specified 100\n",
"Created a chunk of size 641, which is longer than the specified 100\n",
"Created a chunk of size 473, which is longer than the specified 100\n",
"Created a chunk of size 414, which is longer than the specified 100\n",
"Created a chunk of size 585, which is longer than the specified 100\n",
"Created a chunk of size 764, which is longer than the specified 100\n",
"Created a chunk of size 502, which is longer than the specified 100\n",
"Created a chunk of size 640, which is longer than the specified 100\n",
"Created a chunk of size 507, which is longer than the specified 100\n",
"Created a chunk of size 564, which is longer than the specified 100\n",
"Created a chunk of size 707, which is longer than the specified 100\n",
"Created a chunk of size 380, which is longer than the specified 100\n",
"Created a chunk of size 615, which is longer than the specified 100\n",
"Created a chunk of size 733, which is longer than the specified 100\n",
"Created a chunk of size 277, which is longer than the specified 100\n",
"Created a chunk of size 497, which is longer than the specified 100\n",
"Created a chunk of size 625, which is longer than the specified 100\n",
"Created a chunk of size 468, which is longer than the specified 100\n",
"Created a chunk of size 289, which is longer than the specified 100\n",
"Created a chunk of size 576, which is longer than the specified 100\n",
"Created a chunk of size 297, which is longer than the specified 100\n",
"Created a chunk of size 534, which is longer than the specified 100\n",
"Created a chunk of size 427, which is longer than the specified 100\n",
"Created a chunk of size 412, which is longer than the specified 100\n",
"Created a chunk of size 381, which is longer than the specified 100\n",
"Created a chunk of size 417, which is longer than the specified 100\n",
"Created a chunk of size 244, which is longer than the specified 100\n",
"Created a chunk of size 307, which is longer than the specified 100\n",
"Created a chunk of size 528, which is longer than the specified 100\n",
"Created a chunk of size 565, which is longer than the specified 100\n",
"Created a chunk of size 487, which is longer than the specified 100\n",
"Created a chunk of size 470, which is longer than the specified 100\n",
"Created a chunk of size 332, which is longer than the specified 100\n",
"Created a chunk of size 552, which is longer than the specified 100\n",
"Created a chunk of size 427, which is longer than the specified 100\n",
"Created a chunk of size 596, which is longer than the specified 100\n",
"Created a chunk of size 192, which is longer than the specified 100\n",
"Created a chunk of size 403, which is longer than the specified 100\n",
"Created a chunk of size 255, which is longer than the specified 100\n",
"Created a chunk of size 1025, which is longer than the specified 100\n",
"Created a chunk of size 438, which is longer than the specified 100\n",
"Created a chunk of size 900, which is longer than the specified 100\n",
"Created a chunk of size 250, which is longer than the specified 100\n",
"Created a chunk of size 614, which is longer than the specified 100\n",
"Created a chunk of size 635, which is longer than the specified 100\n",
"Created a chunk of size 443, which is longer than the specified 100\n",
"Created a chunk of size 478, which is longer than the specified 100\n",
"Created a chunk of size 473, which is longer than the specified 100\n",
"Created a chunk of size 302, which is longer than the specified 100\n",
"Created a chunk of size 549, which is longer than the specified 100\n",
"Created a chunk of size 644, which is longer than the specified 100\n",
"Created a chunk of size 402, which is longer than the specified 100\n",
"Created a chunk of size 489, which is longer than the specified 100\n",
"Created a chunk of size 551, which is longer than the specified 100\n",
"Created a chunk of size 527, which is longer than the specified 100\n",
"Created a chunk of size 563, which is longer than the specified 100\n",
"Created a chunk of size 472, which is longer than the specified 100\n",
"Created a chunk of size 511, which is longer than the specified 100\n",
"Created a chunk of size 419, which is longer than the specified 100\n",
"Created a chunk of size 245, which is longer than the specified 100\n",
"Created a chunk of size 371, which is longer than the specified 100\n",
"Created a chunk of size 484, which is longer than the specified 100\n",
"Created a chunk of size 306, which is longer than the specified 100\n",
"Created a chunk of size 190, which is longer than the specified 100\n",
"Created a chunk of size 499, which is longer than the specified 100\n",
"Created a chunk of size 480, which is longer than the specified 100\n",
"Created a chunk of size 634, which is longer than the specified 100\n",
"Created a chunk of size 611, which is longer than the specified 100\n",
"Created a chunk of size 356, which is longer than the specified 100\n",
"Created a chunk of size 478, which is longer than the specified 100\n",
"Created a chunk of size 369, which is longer than the specified 100\n",
"Created a chunk of size 526, which is longer than the specified 100\n",
"Created a chunk of size 311, which is longer than the specified 100\n",
"Created a chunk of size 181, which is longer than the specified 100\n",
"Created a chunk of size 637, which is longer than the specified 100\n",
"Created a chunk of size 219, which is longer than the specified 100\n",
"Created a chunk of size 305, which is longer than the specified 100\n",
"Created a chunk of size 409, which is longer than the specified 100\n",
"Created a chunk of size 235, which is longer than the specified 100\n",
"Created a chunk of size 302, which is longer than the specified 100\n",
"Created a chunk of size 236, which is longer than the specified 100\n",
"Created a chunk of size 209, which is longer than the specified 100\n",
"Created a chunk of size 366, which is longer than the specified 100\n",
"Created a chunk of size 277, which is longer than the specified 100\n",
"Created a chunk of size 591, which is longer than the specified 100\n",
"Created a chunk of size 232, which is longer than the specified 100\n",
"Created a chunk of size 543, which is longer than the specified 100\n",
"Created a chunk of size 199, which is longer than the specified 100\n",
"Created a chunk of size 214, which is longer than the specified 100\n",
"Created a chunk of size 263, which is longer than the specified 100\n",
"Created a chunk of size 375, which is longer than the specified 100\n",
"Created a chunk of size 221, which is longer than the specified 100\n",
"Created a chunk of size 261, which is longer than the specified 100\n",
"Created a chunk of size 203, which is longer than the specified 100\n",
"Created a chunk of size 758, which is longer than the specified 100\n",
"Created a chunk of size 271, which is longer than the specified 100\n",
"Created a chunk of size 323, which is longer than the specified 100\n",
"Created a chunk of size 275, which is longer than the specified 100\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"bert_load_from_file: gguf version = 2\n",
"bert_load_from_file: gguf alignment = 32\n",
"bert_load_from_file: gguf data offset = 695552\n",
"bert_load_from_file: model name = BERT\n",
"bert_load_from_file: model architecture = bert\n",
"bert_load_from_file: model file type = 1\n",
"bert_load_from_file: bert tokenizer vocab = 30522\n"
]
}
],
"source": [
"from langchain_community.document_loaders import TextLoader\n",
"\n",
"loader = TextLoader(\"../../modules/paul_graham_essay.txt\")\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=100, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = GPT4AllEmbeddings()"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "a06370cae96cbaef",
"metadata": {
"ExecuteTime": {
"end_time": "2024-03-03T11:28:42.366398Z",
"start_time": "2024-03-03T11:28:38.991827Z"
},
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[Document(page_content='Computer Science is an uneasy alliance between two halves, theory and systems. The theory people prove things, and the systems people build things. I wanted to build things. I had plenty of respect for theory — indeed, a sneaking suspicion that it was the more admirable of the two halves — but building things seemed so much more exciting.', metadata={'some': 'metadata'}), Document(page_content=\"I applied to 3 grad schools: MIT and Yale, which were renowned for AI at the time, and Harvard, which I'd visited because Rich Draves went there, and was also home to Bill Woods, who'd invented the type of parser I used in my SHRDLU clone. Only Harvard accepted me, so that was where I went.\", metadata={'some': 'metadata'}), Document(page_content='For my undergraduate thesis, I reverse-engineered SHRDLU. My God did I love working on that program. It was a pleasing bit of code, but what made it even more exciting was my belief — hard to imagine now, but not unique in 1985 — that it was already climbing the lower slopes of intelligence.', metadata={'some': 'metadata'}), Document(page_content=\"The problem with systems work, though, was that it didn't last. Any program you wrote today, no matter how good, would be obsolete in a couple decades at best. People might mention your software in footnotes, but no one would actually use it. And indeed, it would seem very feeble work. Only people with a sense of the history of the field would even realize that, in its time, it had been good.\", metadata={'some': 'metadata'})]\n"
]
}
],
"source": [
"for d in docs:\n",
" d.metadata = {\"some\": \"metadata\"}\n",
"settings = ManticoreSearchSettings(table=\"manticoresearch_vector_search_example\")\n",
"docsearch = ManticoreSearch.from_documents(docs, embeddings, config=settings)\n",
"\n",
"query = \"Robert Morris is\"\n",
"docs = docsearch.similarity_search(query)\n",
"print(docs)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -153,6 +153,10 @@ if TYPE_CHECKING:
from langchain_community.vectorstores.llm_rails import (
LLMRails,
)
from langchain_community.vectorstores.manticore_search import (
ManticoreSearch,
ManticoreSearchSettings,
)
from langchain_community.vectorstores.marqo import (
Marqo,
)
@@ -341,6 +345,8 @@ __all__ = [
"LLMRails",
"LanceDB",
"Lantern",
"ManticoreSearch",
"ManticoreSearchSettings",
"Marqo",
"MatchingEngine",
"Meilisearch",
@@ -439,6 +445,8 @@ _module_lookup = {
"LLMRails": "langchain_community.vectorstores.llm_rails",
"LanceDB": "langchain_community.vectorstores.lancedb",
"Lantern": "langchain_community.vectorstores.lantern",
"ManticoreSearch": "langchain_community.vectorstores.manticore_search",
"ManticoreSearchSettings": "langchain_community.vectorstores.manticore_search",
"Marqo": "langchain_community.vectorstores.marqo",
"MatchingEngine": "langchain_community.vectorstores.matching_engine",
"Meilisearch": "langchain_community.vectorstores.meilisearch",


@@ -0,0 +1,372 @@
from __future__ import annotations
import json
import logging
import uuid
from hashlib import sha1
from typing import Any, Dict, Iterable, List, Optional, Type
from langchain_core.documents import Document
from langchain_core.embeddings import Embeddings
from langchain_core.pydantic_v1 import BaseSettings
from langchain_core.vectorstores import VectorStore
logger = logging.getLogger()
DEFAULT_K = 4 # Number of Documents to return.
class ManticoreSearchSettings(BaseSettings):
proto: str = "http"
host: str = "localhost"
port: int = 9308
username: Optional[str] = None
password: Optional[str] = None
# database: str = "Manticore"
table: str = "langchain"
column_map: Dict[str, str] = {
"id": "id",
"uuid": "uuid",
"document": "document",
"embedding": "embedding",
"metadata": "metadata",
}
# A mandatory setting; currently, only hnsw is supported.
knn_type: str = "hnsw"
# A mandatory setting that specifies the dimensions of the vectors being indexed.
knn_dims: Optional[int] = None # Defaults autodetect
# A mandatory setting that specifies the distance function used by the HNSW index.
hnsw_similarity: str = "L2" # Acceptable values are: L2, IP, COSINE
# An optional setting that defines the maximum amount of outgoing connections
# in the graph.
hnsw_m: int = 16 # The default is 16.
# An optional setting that defines a construction time/accuracy trade-off.
hnsw_ef_construction: int = 100
def get_connection_string(self) -> str:
return self.proto + "://" + self.host + ":" + str(self.port)
def __getitem__(self, item: str) -> Any:
return getattr(self, item)
class Config:
env_file = ".env"
env_prefix = "manticore_"
env_file_encoding = "utf-8"
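With the defaults above, `get_connection_string` joins proto, host, and port into the HTTP endpoint the client connects to. A stdlib-only sketch of the same assembly, outside the settings class:

```python
def get_connection_string(proto="http", host="localhost", port=9308):
    # Same assembly as ManticoreSearchSettings.get_connection_string
    return proto + "://" + host + ":" + str(port)


print(get_connection_string())  # http://localhost:9308
```

Any of the three pieces can also be overridden through the environment thanks to the `manticore_` prefix configured above (e.g. `MANTICORE_HOST`).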
class ManticoreSearch(VectorStore):
"""
`ManticoreSearch Engine` vector store.
To use, you should have the ``manticoresearch`` python package installed.
Example:
.. code-block:: python
from langchain_community.vectorstores import ManticoreSearch
from langchain_community.embeddings.openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()
vectorstore = ManticoreSearch(embeddings)
"""
def __init__(
self,
embedding: Embeddings,
*,
config: Optional[ManticoreSearchSettings] = None,
**kwargs: Any,
) -> None:
"""
ManticoreSearch Wrapper to LangChain
Args:
embedding (Embeddings): Text embedding model.
config (ManticoreSearchSettings): Configuration of ManticoreSearch Client
**kwargs: Other keyword arguments will pass into Configuration of API client
manticoresearch-python. See
https://github.com/manticoresoftware/manticoresearch-python for more.
"""
try:
import manticoresearch.api as ENDPOINTS
import manticoresearch.api_client as API
except ImportError:
raise ImportError(
"Could not import manticoresearch python package. "
"Please install it with `pip install manticoresearch-dev`."
)
try:
from tqdm import tqdm
self.pgbar = tqdm
except ImportError:
# Fall back to a no-op if tqdm is not installed
self.pgbar = lambda x, **kwargs: x
super().__init__()
self.embedding = embedding
if config is not None:
self.config = config
else:
self.config = ManticoreSearchSettings()
assert self.config
assert self.config.host and self.config.port
assert (
self.config.column_map
# and self.config.database
and self.config.table
)
assert (
self.config.knn_type
# and self.config.knn_dims
# and self.config.hnsw_m
# and self.config.hnsw_ef_construction
and self.config.hnsw_similarity
)
for k in ["id", "embedding", "document", "metadata", "uuid"]:
assert k in self.config.column_map
# Detect embeddings dimension
if self.config.knn_dims is None:
self.dim: int = len(self.embedding.embed_query("test"))
else:
self.dim = self.config.knn_dims
# Initialize the schema
self.schema = f"""\
CREATE TABLE IF NOT EXISTS {self.config.table}(
{self.config.column_map['id']} bigint,
{self.config.column_map['document']} text indexed stored,
{self.config.column_map['embedding']} \
float_vector knn_type='{self.config.knn_type}' \
knn_dims='{self.dim}' \
hnsw_similarity='{self.config.hnsw_similarity}' \
hnsw_m='{self.config.hnsw_m}' \
hnsw_ef_construction='{self.config.hnsw_ef_construction}',
{self.config.column_map['metadata']} json,
{self.config.column_map['uuid']} text indexed stored
)\
"""
# Create a connection to ManticoreSearch
self.configuration = API.Configuration(
host=self.config.get_connection_string(),
username=self.config.username,
password=self.config.password,
# disabled_client_side_validations=",",
**kwargs,
)
self.connection = API.ApiClient(self.configuration)
self.client = {
"index": ENDPOINTS.IndexApi(self.connection),
"utils": ENDPOINTS.UtilsApi(self.connection),
"search": ENDPOINTS.SearchApi(self.connection),
}
# Create default schema if not exists
self.client["utils"].sql(self.schema)
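With the default settings, the schema above renders to a single `CREATE TABLE` statement. A sketch of the rendered SQL for the default column map (the 384 here is a hypothetical embedding dimension; the real class autodetects it via `len(embedding.embed_query("test"))` when `knn_dims` is unset):

```python
table = "langchain"
dim = 384  # hypothetical; normally detected from the embedding model

# Render the CREATE TABLE statement the way ManticoreSearch.__init__ does,
# with the default column names and HNSW parameters.
schema = (
    f"CREATE TABLE IF NOT EXISTS {table}("
    f"id bigint, "
    f"document text indexed stored, "
    f"embedding float_vector knn_type='hnsw' knn_dims='{dim}' "
    f"hnsw_similarity='L2' hnsw_m='16' hnsw_ef_construction='100', "
    f"metadata json, "
    f"uuid text indexed stored)"
)
print(schema)
```

Because the statement uses `IF NOT EXISTS`, re-running it against an existing table is a no-op, which is why the constructor can issue it unconditionally.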
@property
def embeddings(self) -> Embeddings:
return self.embedding
def add_texts(
self,
texts: Iterable[str],
metadatas: Optional[List[dict]] = None,
*,
batch_size: int = 32,
text_ids: Optional[List[str]] = None,
**kwargs: Any,
) -> List[str]:
"""
Insert more texts through the embeddings and add to the VectorStore.
Args:
texts: Iterable of strings to add to the VectorStore
metadatas: Optional list of metadata dicts, one per text
batch_size: Batch size of insertion
text_ids: Optional list of ids to associate with the texts
Returns:
List of ids from adding the texts into the VectorStore.
"""
# Embed and create the documents
ids = text_ids or [
# See https://stackoverflow.com/questions/67219691/python-hash-function-that-returns-32-or-64-bits
str(int(sha1(t.encode("utf-8")).hexdigest()[:15], 16))
for t in texts
]
transac = []
for i, text in enumerate(texts):
embed = self.embeddings.embed_query(text)
doc_uuid = str(uuid.uuid1())
doc = {
self.config.column_map["document"]: text,
self.config.column_map["embedding"]: embed,
self.config.column_map["metadata"]: metadatas[i] if metadatas else {},
self.config.column_map["uuid"]: doc_uuid,
}
transac.append(
{"replace": {"index": self.config.table, "id": ids[i], "doc": doc}}
)
if len(transac) == batch_size:
body = "\n".join(map(json.dumps, transac))
try:
self.client["index"].bulk(body)
transac = []
except Exception as e:
logger.error(f"Error indexing documents: {e}")
if len(transac) > 0:
body = "\n".join(map(json.dumps, transac))
try:
self.client["index"].bulk(body)
except Exception as e:
logger.error(f"Error indexing documents: {e}")
return ids
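`add_texts` derives a deterministic 60-bit id from the SHA-1 of each text and ships documents to the server as a newline-delimited JSON bulk body. A self-contained sketch of both steps (no server call here, so the body is just printed; the table name and single `document` field are illustrative):

```python
import json
from hashlib import sha1

texts = ["first document", "second document"]

# Deterministic ids: first 15 hex digits of sha1 -> int (fits in 60 bits),
# so re-adding the same text "replace"s the same row instead of duplicating it.
ids = [str(int(sha1(t.encode("utf-8")).hexdigest()[:15], 16)) for t in texts]

# One "replace" action per document, joined into an NDJSON bulk body
transac = [
    {"replace": {"index": "langchain", "id": ids[i], "doc": {"document": t}}}
    for i, t in enumerate(texts)
]
body = "\n".join(map(json.dumps, transac))
print(body)
```

The batching in `add_texts` simply flushes such a body every `batch_size` documents, plus one final flush for the remainder.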
@classmethod
def from_texts(
cls: Type[ManticoreSearch],
texts: List[str],
embedding: Embeddings,
metadatas: Optional[List[Dict[Any, Any]]] = None,
*,
config: Optional[ManticoreSearchSettings] = None,
text_ids: Optional[List[str]] = None,
batch_size: int = 32,
**kwargs: Any,
) -> ManticoreSearch:
ctx = cls(embedding, config=config, **kwargs)
ctx.add_texts(
texts=texts,
text_ids=text_ids,
batch_size=batch_size,
metadatas=metadatas,
**kwargs,
)
return ctx

    @classmethod
    def from_documents(
        cls: Type[ManticoreSearch],
        documents: List[Document],
        embedding: Embeddings,
        *,
        config: Optional[ManticoreSearchSettings] = None,
        text_ids: Optional[List[str]] = None,
        batch_size: int = 32,
        **kwargs: Any,
    ) -> ManticoreSearch:
        texts = [doc.page_content for doc in documents]
        metadatas = [doc.metadata for doc in documents]
        return cls.from_texts(
            texts=texts,
            embedding=embedding,
            text_ids=text_ids,
            batch_size=batch_size,
            metadatas=metadatas,
            **kwargs,
        )
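Both constructors funnel into `add_texts`, which serializes each batch into Manticore's bulk format: newline-delimited JSON, one `replace` statement per document. A self-contained sketch of that body construction (the table name, column names, and tiny embeddings are illustrative only):

```python
import json

docs = [("first text", [0.1, 0.2]), ("second text", [0.3, 0.4])]
transac = [
    {
        "replace": {
            "index": "langchain",
            "id": i,
            "doc": {"document": text, "embedding": vec, "metadata": {}, "uuid": str(i)},
        }
    }
    for i, (text, vec) in enumerate(docs)
]
# Newline-delimited JSON: the shape passed to the bulk indexing call above
body = "\n".join(map(json.dumps, transac))
print(body.count("\n") + 1)  # → 2 (one line per document)
```

Batching this way keeps each HTTP request bounded by `batch_size` rows regardless of how many texts are added.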

    def __repr__(self) -> str:
        """
        Text representation of the ManticoreSearch vector store: shows the
        backend endpoint, username, and table schema. Easy to use with
        `str(ManticoreSearch(...))`.

        Returns:
            repr: string with connection info and data schema
        """
        _repr = f"\033[92m\033[1m{self.config.table} @ "
        _repr += f"http://{self.config.host}:{self.config.port}\033[0m\n\n"
        _repr += f"\033[1musername: {self.config.username}\033[0m\n\nTable Schema:\n"
        _repr += "-" * 51 + "\n"
        for r in self.client["utils"].sql(f"DESCRIBE {self.config.table}")[0]["data"]:
            _repr += (
                f"|\033[94m{r['Field']:24s}\033[0m|\033["
                f"96m{r['Type'] + ' ' + r['Properties']:24s}\033[0m|\n"
            )
        _repr += "-" * 51 + "\n"
        return _repr

    def similarity_search(
        self, query: str, k: int = DEFAULT_K, **kwargs: Any
    ) -> List[Document]:
        """Perform a similarity search with ManticoreSearch

        Args:
            query (str): query string
            k (int, optional): Top K neighbors to retrieve. Defaults to 4.

        Returns:
            List[Document]: List of Documents
        """
        return self.similarity_search_by_vector(
            self.embedding.embed_query(query), k, **kwargs
        )

    def similarity_search_by_vector(
        self,
        embedding: List[float],
        k: int = DEFAULT_K,
        **kwargs: Any,
    ) -> List[Document]:
        """Perform a similarity search with ManticoreSearch by vectors

        Args:
            embedding (List[float]): Embedding vector
            k (int, optional): Top K neighbors to retrieve. Defaults to 4.

        Returns:
            List[Document]: List of documents
        """
        # Build search request
        request = {
            "index": self.config.table,
            "knn": {
                "field": self.config.column_map["embedding"],
                "k": k,
                "query_vector": embedding,
            },
        }
        # Execute request and convert response to langchain.Document format
        try:
            return [
                Document(
                    page_content=r["_source"][self.config.column_map["document"]],
                    metadata=r["_source"][self.config.column_map["metadata"]],
                )
                for r in self.client["search"].search(request, **kwargs).hits.hits[:k]
            ]
        except Exception as e:
            logger.error(f"\033[91m\033[1m{type(e)}\033[0m \033[95m{str(e)}\033[0m")
            return []
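The KNN request assembled above is plain JSON. A standalone sketch of the payload shape sent to Manticore's search endpoint (the `build_knn_request` helper, table name, and field name are illustrative, not part of the integration):

```python
import json


def build_knn_request(table: str, field: str, query_vector: list, k: int = 4) -> dict:
    # Mirrors the request dict built in similarity_search_by_vector:
    # top-k nearest neighbors of query_vector in the given float_vector column.
    return {
        "index": table,
        "knn": {"field": field, "k": k, "query_vector": query_vector},
    }


request = build_knn_request("langchain", "embedding", [0.1, 0.2, 0.3], k=2)
print(json.dumps(request, sort_keys=True))
```

Keeping the request a plain dict means extra search options can be merged in via `**kwargs` before it is sent.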

    def drop(self) -> None:
        """
        Helper function: Drop data
        """
        self.client["utils"].sql(f"DROP TABLE IF EXISTS {self.config.table}")

    @property
    def metadata_column(self) -> str:
        return self.config.column_map["metadata"]


@@ -50,6 +50,8 @@ EXPECTED_ALL = [
    "LLMRails",
    "LanceDB",
    "Lantern",
    "ManticoreSearch",
    "ManticoreSearchSettings",
    "Marqo",
    "MatchingEngine",
    "Meilisearch",
@@ -112,6 +114,7 @@ def test_all_imports_exclusive() -> None:
        "PathwayVectorClient",
        "DistanceStrategy",
        "KineticaSettings",
        "ManticoreSearchSettings",
    ]:
        assert issubclass(getattr(vectorstores, cls), VectorStore)


@@ -1,106 +0,0 @@
"""Test the public API of the tools package."""

from langchain_community.vectorstores import __all__ as public_api

_EXPECTED = [
    "Aerospike",
    "AlibabaCloudOpenSearch",
    "AlibabaCloudOpenSearchSettings",
    "AnalyticDB",
    "Annoy",
    "ApacheDoris",
    "AtlasDB",
    "AwaDB",
    "AzureSearch",
    "Bagel",
    "BaiduVectorDB",
    "BESVectorStore",
    "BigQueryVectorSearch",
    "Cassandra",
    "AstraDB",
    "Chroma",
    "Clarifai",
    "Clickhouse",
    "ClickhouseSettings",
    "DashVector",
    "DatabricksVectorSearch",
    "DeepLake",
    "Dingo",
    "DistanceStrategy",
    "DocArrayHnswSearch",
    "DocArrayInMemorySearch",
    "DocumentDBVectorSearch",
    "DuckDB",
    "EcloudESVectorStore",
    "ElasticKnnSearch",
    "ElasticVectorSearch",
    "ElasticsearchStore",
    "Epsilla",
    "FAISS",
    "HanaDB",
    "Hologres",
    "InfinispanVS",
    "InMemoryVectorStore",
    "KDBAI",
    "Kinetica",
    "KineticaSettings",
    "LanceDB",
    "Lantern",
    "LLMRails",
    "Marqo",
    "MatchingEngine",
    "Meilisearch",
    "Milvus",
    "MomentoVectorIndex",
    "MongoDBAtlasVectorSearch",
    "MyScale",
    "MyScaleSettings",
    "Neo4jVector",
    "OpenSearchVectorSearch",
    "OracleVS",
    "PGEmbedding",
    "PGVector",
    "PathwayVectorClient",
    "Pinecone",
    "Qdrant",
    "Redis",
    "Relyt",
    "Rockset",
    "SKLearnVectorStore",
    "ScaNN",
    "SemaDB",
    "SingleStoreDB",
    "SQLiteVSS",
    "StarRocks",
    "SupabaseVectorStore",
    "SurrealDBStore",
    "Tair",
    "TiDBVectorStore",
    "TileDB",
    "Tigris",
    "TimescaleVector",
    "Typesense",
    "UpstashVectorStore",
    "USearch",
    "Vald",
    "VDMS",
    "Vearch",
    "Vectara",
    "VespaStore",
    "VLite",
    "Weaviate",
    "ZepVectorStore",
    "Zilliz",
    "TencentVectorDB",
    "AzureCosmosDBVectorSearch",
    "VectorStore",
    "Yellowbrick",
    "NeuralDBClientVectorStore",
    "NeuralDBVectorStore",
    "CouchbaseVectorStore",
]


def test_public_api() -> None:
    """Test for regressions or changes in the public API."""
    # Check that the public API is as expected
    assert set(public_api) == set(_EXPECTED)