Files
langchain/docs/versioned_docs/version-0.2.x/integrations/vectorstores/hippo.ipynb
Jacob Lee aff771923a Jacob/new docs (#20570)
Use docusaurus versioning with a callout, merged master as well

@hwchase17 @baskaryan

---------

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: Leonid Ganeline <leo.gan.57@gmail.com>
Co-authored-by: Leonid Kuligin <lkuligin@yandex.ru>
Co-authored-by: Averi Kitsch <akitsch@google.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Nuno Campos <nuno@langchain.dev>
Co-authored-by: Nuno Campos <nuno@boringbits.io>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Martín Gotelli Ferenaz <martingotelliferenaz@gmail.com>
Co-authored-by: Fayfox <admin@fayfox.com>
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Co-authored-by: Dawson Bauer <105886620+djbauer2@users.noreply.github.com>
Co-authored-by: Ravindu Somawansa <ravindu.somawansa@gmail.com>
Co-authored-by: Dhruv Chawla <43818888+Dominastorm@users.noreply.github.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: WeichenXu <weichen.xu@databricks.com>
Co-authored-by: Benito Geordie <89472452+benitoThree@users.noreply.github.com>
Co-authored-by: kartikTAI <129414343+kartikTAI@users.noreply.github.com>
Co-authored-by: Kartik Sarangmath <kartik@thirdai.com>
Co-authored-by: Sevin F. Varoglu <sfvaroglu@octoml.ai>
Co-authored-by: MacanPN <martin.triska@gmail.com>
Co-authored-by: Prashanth Rao <35005448+prrao87@users.noreply.github.com>
Co-authored-by: Hyeongchan Kim <kozistr@gmail.com>
Co-authored-by: sdan <git@sdan.io>
Co-authored-by: Guangdong Liu <liugddx@gmail.com>
Co-authored-by: Rahul Triptahi <rahul.psit.ec@gmail.com>
Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: pjb157 <84070455+pjb157@users.noreply.github.com>
Co-authored-by: Eun Hye Kim <ehkim1440@gmail.com>
Co-authored-by: kaijietti <43436010+kaijietti@users.noreply.github.com>
Co-authored-by: Pengcheng Liu <pcliu.fd@gmail.com>
Co-authored-by: Tomer Cagan <tomer@tomercagan.com>
Co-authored-by: Christophe Bornet <cbornet@hotmail.com>
2024-04-18 11:10:55 -07:00

503 lines
14 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
{
"cells": [
{
"cell_type": "markdown",
"id": "357f24224a8e818f",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"# Hippo\n",
"\n",
">[Transwarp Hippo](https://www.transwarp.cn/en/subproduct/hippo) is an enterprise-level cloud-native distributed vector database that supports storage, retrieval, and management of massive vector-based datasets. It efficiently solves problems such as vector similarity search and high-density vector clustering. `Hippo` features high availability, high performance, and easy scalability. It has many functions, such as multiple vector search indexes, data partitioning and sharding, data persistence, incremental data ingestion, vector scalar field filtering, and mixed queries. It can effectively meet the high real-time search demands of enterprises for massive vector data\n",
"\n",
"## Getting Started\n",
"\n",
"The only prerequisite here is an API key from the OpenAI website. Make sure you have already started a Hippo instance."
]
},
{
"cell_type": "markdown",
"id": "a92d2ce26df7ac4c",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"## Installing Dependencies\n",
"\n",
"Initially, we require the installation of certain dependencies, such as OpenAI, Langchain, and Hippo-API. Please note, that you should install the appropriate versions tailored to your environment."
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "13b1d1ae153ff434",
"metadata": {
"ExecuteTime": {
"end_time": "2023-10-30T06:47:54.718488Z",
"start_time": "2023-10-30T06:47:53.563129Z"
},
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: hippo-api==1.1.0.rc3 in /Users/daochengzhang/miniforge3/envs/py310/lib/python3.10/site-packages (1.1.0rc3)\r\n",
"Requirement already satisfied: pyyaml>=6.0 in /Users/daochengzhang/miniforge3/envs/py310/lib/python3.10/site-packages (from hippo-api==1.1.0.rc3) (6.0.1)\r\n"
]
}
],
"source": [
"%pip install --upgrade --quiet langchain tiktoken langchain-openai\n",
"%pip install --upgrade --quiet hippo-api==1.1.0.rc3"
]
},
{
"cell_type": "markdown",
"id": "554081137df2c252",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"Note: Python version needs to be >=3.8.\n",
"\n",
"## Best Practices\n",
"### Importing Dependency Packages"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "5ff3296ce812aeb8",
"metadata": {
"ExecuteTime": {
"end_time": "2023-10-30T06:47:56.003409Z",
"start_time": "2023-10-30T06:47:55.998839Z"
},
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"import os\n",
"\n",
"from langchain_community.document_loaders import TextLoader\n",
"from langchain_community.vectorstores.hippo import Hippo\n",
"from langchain_openai import ChatOpenAI, OpenAIEmbeddings\n",
"from langchain_text_splitters import CharacterTextSplitter"
]
},
{
"cell_type": "markdown",
"id": "dad255dae8aea755",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"### Loading Knowledge Documents"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "f02d66a7fd653dc1",
"metadata": {
"ExecuteTime": {
"end_time": "2023-10-30T06:47:59.027869Z",
"start_time": "2023-10-30T06:47:59.023934Z"
},
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"os.environ[\"OPENAI_API_KEY\"] = \"YOUR OPENAI KEY\"\n",
"loader = TextLoader(\"../../modules/state_of_the_union.txt\")\n",
"documents = loader.load()"
]
},
{
"cell_type": "markdown",
"id": "e9b93c330f1c6160",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"### Segmenting the Knowledge Document\n",
"\n",
"Here, we use Langchain's CharacterTextSplitter for segmentation. The delimiter is a period. After segmentation, the text segment does not exceed 1000 characters, and the number of repeated characters is 0."
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "fe6b43175318331f",
"metadata": {
"ExecuteTime": {
"end_time": "2023-10-30T06:48:00.279351Z",
"start_time": "2023-10-30T06:48:00.275763Z"
},
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)"
]
},
{
"cell_type": "markdown",
"id": "eefe28c7c993ffdf",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"### Declaring the Embedding Model\n",
"Below, we create the OpenAI or Azure embedding model using the OpenAIEmbeddings method from Langchain."
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "8619f16b9f7355ea",
"metadata": {
"ExecuteTime": {
"end_time": "2023-10-30T06:48:11.686166Z",
"start_time": "2023-10-30T06:48:11.664355Z"
},
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"# openai\n",
"embeddings = OpenAIEmbeddings()\n",
"# azure\n",
"# embeddings = OpenAIEmbeddings(\n",
"# openai_api_type=\"azure\",\n",
"# openai_api_base=\"x x x\",\n",
"# openai_api_version=\"x x x\",\n",
"# model=\"x x x\",\n",
"# deployment=\"x x x\",\n",
"# openai_api_key=\"x x x\"\n",
"# )"
]
},
{
"cell_type": "markdown",
"id": "e60235602ed91d3c",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"### Declaring Hippo Client"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "c666b70dcab78129",
"metadata": {
"ExecuteTime": {
"end_time": "2023-10-30T06:48:48.594298Z",
"start_time": "2023-10-30T06:48:48.585267Z"
},
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"HIPPO_CONNECTION = {\"host\": \"IP\", \"port\": \"PORT\"}"
]
},
{
"cell_type": "markdown",
"id": "43ee6dbd765c3172",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"### Storing the Document"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "79372c869844bdc9",
"metadata": {
"ExecuteTime": {
"end_time": "2023-10-30T06:51:12.661741Z",
"start_time": "2023-10-30T06:51:06.257156Z"
},
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"input...\n",
"success\n"
]
}
],
"source": [
"print(\"input...\")\n",
"# insert docs\n",
"vector_store = Hippo.from_documents(\n",
" docs,\n",
" embedding=embeddings,\n",
" table_name=\"langchain_test\",\n",
" connection_args=HIPPO_CONNECTION,\n",
")\n",
"print(\"success\")"
]
},
{
"cell_type": "markdown",
"id": "89077cc9763d5dd0",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"### Conducting Knowledge-based Question and Answer\n",
"#### Creating a Large Language Question-Answering Model\n",
"Below, we create the OpenAI or Azure large language question-answering model respectively using the AzureChatOpenAI and ChatOpenAI methods from Langchain."
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "c9f2c42e9884f628",
"metadata": {
"ExecuteTime": {
"end_time": "2023-10-30T06:51:28.329351Z",
"start_time": "2023-10-30T06:51:28.318713Z"
},
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"# llm = AzureChatOpenAI(\n",
"# openai_api_base=\"x x x\",\n",
"# openai_api_version=\"xxx\",\n",
"# deployment_name=\"xxx\",\n",
"# openai_api_key=\"xxx\",\n",
"# openai_api_type=\"azure\"\n",
"# )\n",
"\n",
"llm = ChatOpenAI(openai_api_key=\"YOUR OPENAI KEY\", model_name=\"gpt-3.5-turbo-16k\")"
]
},
{
"cell_type": "markdown",
"id": "a4c5d73016a9db0c",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"### Acquiring Related Knowledge Based on the Question"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "8656e80519da1f97",
"metadata": {
"ExecuteTime": {
"end_time": "2023-10-30T06:51:33.195634Z",
"start_time": "2023-10-30T06:51:32.196493Z"
},
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"query = \"Please introduce COVID-19\"\n",
"# query = \"Please introduce Hippo Core Architecture\"\n",
"# query = \"What operations does the Hippo Vector Database support for vector data?\"\n",
"# query = \"Does Hippo use hardware acceleration technology? Briefly introduce hardware acceleration technology.\"\n",
"\n",
"\n",
"# Retrieve similar content from the knowledge base,fetch the top two most similar texts.\n",
"res = vector_store.similarity_search(query, 2)\n",
"content_list = [item.page_content for item in res]\n",
"text = \"\".join(content_list)"
]
},
{
"cell_type": "markdown",
"id": "e5adbaaa7086d1ae",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"### Constructing a Prompt Template"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "b915d3001a2741c1",
"metadata": {
"ExecuteTime": {
"end_time": "2023-10-30T06:51:35.649376Z",
"start_time": "2023-10-30T06:51:35.645763Z"
},
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"prompt = f\"\"\"\n",
"Please use the content of the following [Article] to answer my question. If you don't know, please say you don't know, and the answer should be concise.\"\n",
"[Article]:{text}\n",
"Please answer this question in conjunction with the above article:{query}\n",
"\"\"\""
]
},
{
"cell_type": "markdown",
"id": "b36b6a9adbec8a82",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"### Waiting for the Large Language Model to Generate an Answer"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "58eb5d2396321001",
"metadata": {
"ExecuteTime": {
"end_time": "2023-10-30T06:52:17.967885Z",
"start_time": "2023-10-30T06:51:37.692819Z"
},
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"response_with_hippo:COVID-19 is a virus that has impacted every aspect of our lives for over two years. It is a highly contagious and mutates easily, requiring us to remain vigilant in combating its spread. However, due to progress made and the resilience of individuals, we are now able to move forward safely and return to more normal routines.\n",
"==========================================\n",
"response_without_hippo:COVID-19 is a contagious respiratory illness caused by the novel coronavirus SARS-CoV-2. It was first identified in December 2019 in Wuhan, China and has since spread globally, leading to a pandemic. The virus primarily spreads through respiratory droplets when an infected person coughs, sneezes, talks, or breathes, and can also spread by touching contaminated surfaces and then touching the face. COVID-19 symptoms include fever, cough, shortness of breath, fatigue, muscle or body aches, sore throat, loss of taste or smell, headache, and in severe cases, pneumonia and organ failure. While most people experience mild to moderate symptoms, it can lead to severe illness and even death, particularly among older adults and those with underlying health conditions. To combat the spread of the virus, various preventive measures have been implemented globally, including social distancing, wearing face masks, practicing good hand hygiene, and vaccination efforts.\n"
]
}
],
"source": [
"response_with_hippo = llm.predict(prompt)\n",
"print(f\"response_with_hippo:{response_with_hippo}\")\n",
"response = llm.predict(query)\n",
"print(\"==========================================\")\n",
"print(f\"response_without_hippo:{response}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b2b7ce4e1850ecf1",
"metadata": {
"ExecuteTime": {
"start_time": "2023-10-30T06:42:42.172639Z"
},
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}