{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "cffb482c-bbd8-4829-b185-0d930a5fe0bc",
   "metadata": {},
   "source": [
    "# Infinispan\n",
    "\n",
    "Infinispan is an open-source key-value data grid that can run as a single node or as a distributed cluster.\n",
    "\n",
    "Vector search has been supported since release 15.x.\n",
    "For more information, see [Infinispan Home](https://infinispan.org)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "03ec8f9a-7641-47ea-9fa0-f43ee9fc79a3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Ensure that everything we need is installed\n",
    "# Skip this cell if the packages are already available\n",
    "%pip install sentence-transformers\n",
    "%pip install langchain\n",
    "%pip install langchain_core\n",
    "%pip install langchain_community"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "180d172e-cca1-481c-87d5-c4f14684604d",
   "metadata": {},
   "source": [
    "# Setup\n",
    "\n",
    "To run this demo we need a data file and a running Infinispan instance with authentication disabled.\n",
    "In the next three cells we're going to:\n",
    "- download the data file\n",
    "- create the configuration\n",
    "- run Infinispan in Docker"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9678d5ce-894c-4e28-bf68-20d45507122f",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "# Download an archive of news articles\n",
    "wget https://raw.githubusercontent.com/rigazilla/infinispan-vector/main/bbc_news.csv.gz"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b251e66e-f056-4e81-a6b4-5f4d95b6537d",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "# Create the Infinispan configuration file (no authentication)\n",
    "echo 'infinispan:\n",
    "  cache-container:\n",
    "    name: default\n",
    "    transport:\n",
    "      cluster: cluster\n",
    "      stack: tcp\n",
    "  server:\n",
    "    interfaces:\n",
    "      interface:\n",
    "        name: public\n",
    "        inet-address:\n",
    "          value: 0.0.0.0\n",
    "    socket-bindings:\n",
    "      default-interface: public\n",
    "      port-offset: 0\n",
    "      socket-binding:\n",
    "        name: default\n",
    "        port: 11222\n",
    "    endpoints:\n",
    "      endpoint:\n",
    "        socket-binding: default\n",
    "        rest-connector:\n",
    "' > infinispan-noauth.yaml"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "009da6d1-9d1a-4392-90f1-5c654dd12654",
   "metadata": {},
   "outputs": [],
   "source": [
    "!docker rm --force infinispanvs-demo\n",
    "!docker run -d --name infinispanvs-demo -v $(pwd):/user-config -p 11222:11222 infinispan/server:15.0 -c /user-config/infinispan-noauth.yaml"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b575cde9-4c62-47b3-af89-109ed39f56b6",
   "metadata": {},
   "source": [
    "# The Code\n",
    "\n",
    "## Pick an embedding model\n",
    "\n",
    "In this demo we're using\n",
    "a HuggingFace embedding model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d2c9f46f-3c78-4865-810b-52408dff5fb7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# HuggingFaceEmbeddings is provided by the langchain_community package installed above\n",
    "from langchain_community.embeddings import HuggingFaceEmbeddings\n",
    "\n",
    "model_name = \"sentence-transformers/all-MiniLM-L12-v2\"\n",
    "hf = HuggingFaceEmbeddings(model_name=model_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "61ce7e1f-51ee-4d3d-ad3c-97088b1120f6",
   "metadata": {},
   "source": [
    "## Set up the Infinispan cache\n",
    "\n",
    "Infinispan is a very flexible key-value store: it can store raw bits as well as complex data types.\n",
    "Users have complete freedom in configuring the data grid, but for simple data types everything is configured\n",
    "automatically by the Python layer. We take advantage of this feature so we can focus on our application."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "456da9e7-baf4-472a-a9ee-8473aed8cabd",
   "metadata": {},
   "source": [
    "## Prepare the data\n",
    "\n",
    "In this demo we rely on the default configuration, so texts, metadata and vectors are stored in the same cache. Other options are possible: for example, the content can be stored somewhere else and the vector store can contain only a reference to the actual content."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0f6a42d3-c5ec-44ec-9b57-ebe5ca8c301a",
   "metadata": {},
   "outputs": [],
   "source": [
    "import csv\n",
    "import gzip\n",
    "\n",
    "# Open the news file and process it as a CSV\n",
    "with gzip.open(\"bbc_news.csv.gz\", \"rt\", newline=\"\") as csvfile:\n",
    "    spamreader = csv.reader(csvfile, delimiter=\",\", quotechar='\"')\n",
    "    i = 0\n",
    "    texts = []\n",
    "    metas = []\n",
    "    for row in spamreader:\n",
    "        # The first and fifth values are joined to form the content\n",
    "        # to be processed\n",
    "        text = row[0] + \".\" + row[4]\n",
    "        texts.append(text)\n",
    "        # Store text and title as metadata\n",
    "        meta = {\"text\": row[4], \"title\": row[0]}\n",
    "        metas.append(meta)\n",
    "        i = i + 1\n",
    "        # Change this limit to load a different number of news items\n",
    "        if i >= 5000:\n",
    "            break"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a6b00299-94db-43ca-9da3-45d12cdf2db1",
   "metadata": {},
   "source": [
    "# Populate the vector store"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "75e135a6-1b38-48eb-96ca-379b6f4a653f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Add the texts and populate the vector store\n",
    "\n",
    "from langchain_community.vectorstores import InfinispanVS\n",
    "\n",
    "ispnvs = InfinispanVS.from_texts(texts, hf, metas)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2bb6f053-208d-407e-b8b7-c6c6443522d8",
   "metadata": {},
   "source": [
    "# A helper function that prints the result documents\n",
    "\n",
    "By default InfinispanVS returns the protobuf `text` field in the `Document.page_content`\n",
    "and all the remaining protobuf fields (except the vector) in the `metadata`. This behaviour is\n",
    "configurable via lambda functions at setup."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "594fad38-37f0-4dd4-9785-a99a2f009ae5",
   "metadata": {},
   "outputs": [],
   "source": [
    "def print_docs(docs):\n",
    "    # Print each result with a 1-based index, its title and its content\n",
    "    for i, res in enumerate(docs, start=1):\n",
    "        print(\"----\" + str(i) + \"----\")\n",
    "        print(\"TITLE: \" + res.metadata[\"title\"])\n",
    "        print(res.page_content)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cfa517c7-e741-4f64-9736-6db7a6bd259a",
   "metadata": {},
   "source": [
    "# Try it!!!\n",
    "\n",
    "Below are some sample queries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "86e782b3-5a74-4ca1-a5d1-c0ee935a659e",
   "metadata": {},
   "outputs": [],
   "source": [
    "docs = ispnvs.similarity_search(\"European nations\", 5)\n",
    "print_docs(docs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b60847f9-ef34-4c79-b276-ac62170e2d6a",
   "metadata": {},
   "outputs": [],
   "source": [
    "print_docs(ispnvs.similarity_search(\"Milan fashion week begins\", 2))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6cbb5607-da55-4879-92cf-79ac690cc0c5",
   "metadata": {},
   "outputs": [],
   "source": [
    "print_docs(ispnvs.similarity_search(\"Stock market is rising today\", 4))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3bb94ca1-7b1e-41ed-9d8f-b845775d11c1",
   "metadata": {},
   "outputs": [],
   "source": [
    "print_docs(ispnvs.similarity_search(\"Why cats are so viral?\", 2))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a4fca208-b580-483d-9be0-786b6b63a31d",
   "metadata": {},
   "outputs": [],
   "source": [
    "print_docs(ispnvs.similarity_search(\"How to stay young\", 5))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d4a460b8-f0c8-4ae9-a7ff-cf550c3195f1",
   "metadata": {},
   "outputs": [],
   "source": [
    "!docker rm --force infinispanvs-demo"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.18"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}