community[minor]: Rememberizer retriever (#20052)

**Description:** This pull request introduces a new feature for LangChain: the integration with the Rememberizer API through a custom retriever. This enables LangChain applications to allow users to load and sync their data from Dropbox, Google Drive, Slack, their hard drive into a vector database that LangChain can query. Queries involve sending text chunks generated within LangChain and retrieving a collection of semantically relevant user data for inclusion in LLM prompts. User knowledge dramatically improved AI applications. The Rememberizer integration will also allow users to access general purpose vectorized data such as Reddit channel discussions and US patents. **Issue:** N/A **Dependencies:** N/A **Twitter handle:** https://twitter.com/Rememberizer
2025-09-05 04:55:14 +00:00 · 2024-05-01 21:41:44 +07:00
parent 1ce1a10f2b
commit 2a6f78a53f
8 changed files with 368 additions and 0 deletions
--- a/docs/docs/integrations/retrievers/rememberizer.ipynb
+++ b/docs/docs/integrations/retrievers/rememberizer.ipynb
@@ -0,0 +1,221 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Rememberizer\n",
+    "\n",
+    ">[Rememberizer](https://rememberizer.ai/) is a knowledge enhancement service for AI applications created by  SkyDeck AI Inc.\n",
+    "\n",
+    "This notebook shows how to retrieve documents from `Rememberizer` into the Document format that is used downstream."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Preparation\n",
+    "\n",
+    "You will need an API key: you can get one after creating a common knowledge at [https://rememberizer.ai](https://rememberizer.ai/). Once you have an API key, you must set it as an environment variable `REMEMBERIZER_API_KEY` or pass it as `rememberizer_api_key` when initializing `RememberizerRetriever`.\n",
+    "\n",
+    "`RememberizerRetriever` has these arguments:\n",
+    "- optional `top_k_results`: default=10. Use it to limit number of returned documents. \n",
+    "- optional `rememberizer_api_key`: required if you don't set the environment variable `REMEMBERIZER_API_KEY`.\n",
+    "\n",
+    "`get_relevant_documents()` has one argument, `query`: free text which used to find documents in the common knowledge of `Rememberizer.ai`"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Examples\n",
+    "\n",
+    "## Basic usage"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Setup API key\n",
+    "from getpass import getpass\n",
+    "\n",
+    "REMEMBERIZER_API_KEY = getpass()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "\n",
+    "from langchain_community.retrievers import RememberizerRetriever\n",
+    "\n",
+    "os.environ[\"REMEMBERIZER_API_KEY\"] = REMEMBERIZER_API_KEY\n",
+    "retriever = RememberizerRetriever(top_k_results=5)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "docs = retriever.get_relevant_documents(query=\"How does Large Language Models works?\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "{'id': 13646493,\n",
+       " 'document_id': '17s3LlMbpkTk0ikvGwV0iLMCj-MNubIaP',\n",
+       " 'name': 'What is a large language model (LLM)_ _ Cloudflare.pdf',\n",
+       " 'type': 'application/pdf',\n",
+       " 'path': '/langchain/What is a large language model (LLM)_ _ Cloudflare.pdf',\n",
+       " 'url': 'https://drive.google.com/file/d/17s3LlMbpkTk0ikvGwV0iLMCj-MNubIaP/view',\n",
+       " 'size': 337089,\n",
+       " 'created_time': '',\n",
+       " 'modified_time': '',\n",
+       " 'indexed_on': '2024-04-04T03:36:28.886170Z',\n",
+       " 'integration': {'id': 347, 'integration_type': 'google_drive'}}"
+      ]
+     },
+     "execution_count": 4,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "docs[0].metadata  # meta-information of the Document"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "before, or contextualized in new ways. on some level they \" understand \" semantics in that they can associate words and concepts by their meaning, having seen them grouped together in that way millions or billions of times. how developers can quickly start building their own llms to build llm applications, developers need easy access to multiple data sets, and they need places for those data sets \n"
+     ]
+    }
+   ],
+   "source": [
+    "print(docs[0].page_content[:400])  # a content of the Document"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Usage in a chain"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "OPENAI_API_KEY = getpass()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "os.environ[\"OPENAI_API_KEY\"] = OPENAI_API_KEY"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.chains import ConversationalRetrievalChain\n",
+    "from langchain_openai import ChatOpenAI\n",
+    "\n",
+    "model = ChatOpenAI(model_name=\"gpt-3.5-turbo\")\n",
+    "qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "-> **Question**: What is RAG? \n",
+      "\n",
+      "**Answer**: RAG stands for Retrieval-Augmented Generation. It is an AI framework that retrieves facts from an external knowledge base to enhance the responses generated by Large Language Models (LLMs) by providing up-to-date and accurate information. This framework helps users understand the generative process of LLMs and ensures that the model has access to reliable information sources. \n",
+      "\n",
+      "-> **Question**: How does Large Language Models works? \n",
+      "\n",
+      "**Answer**: Large Language Models (LLMs) work by analyzing massive data sets of language to comprehend and generate human language text. They are built on machine learning, specifically deep learning, which involves training a program to recognize features of data without human intervention. LLMs use neural networks, specifically transformer models, to understand context in human language, making them better at interpreting language even in vague or new contexts. Developers can quickly start building their own LLMs by accessing multiple data sets and using services like Cloudflare's Vectorize and Cloudflare Workers AI platform. \n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "questions = [\n",
+    "    \"What is RAG?\",\n",
+    "    \"How does Large Language Models works?\",\n",
+    "]\n",
+    "chat_history = []\n",
+    "\n",
+    "for question in questions:\n",
+    "    result = qa.invoke({\"question\": question, \"chat_history\": chat_history})\n",
+    "    chat_history.append((question, result[\"answer\"]))\n",
+    "    print(f\"-> **Question**: {question} \\n\")\n",
+    "    print(f\"**Answer**: {result['answer']} \\n\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "langchain",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.18"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}