{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# SuperlinkedRetriever Examples\n",
    "\n",
    "This notebook demonstrates how to build a Superlinked App and Query Descriptor and use them with the LangChain `SuperlinkedRetriever`.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup\n",
    "\n",
    "Install the LangChain integration (`langchain-superlinked`) together with its peer dependency, `superlinked`:\n",
    "\n",
    "```bash\n",
    "pip install -U langchain-superlinked superlinked\n",
    "```\n",
    "\n",
    "## Instantiation\n",
    "\n",
    "The first code cell below builds a Superlinked App (passed as `sl_client`) and a `QueryDescriptor` (passed as `sl_query`), then wires them into `SuperlinkedRetriever`.\n",
    "\n",
    "## Usage\n",
    "\n",
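    "Call `retriever.invoke(query_text, **params)` to get a list of `Document` objects. In these examples the query string fills the query descriptor's `sl.Param(\"query_text\")`, and each keyword argument supplies the `sl.Param` of the same name, such as `limit` or a space weight. The examples below cover a single-space and a multi-space setup.\n",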
    "\n",
    "## Use within a chain\n",
    "\n",
    "Because `SuperlinkedRetriever` is a standard LangChain retriever (and therefore a Runnable), it can be composed with a prompt and a chat model using the `|` operator. See the main Superlinked retriever page for a full RAG example.\n",
    "\n",
    "## API reference\n",
    "\n",
    "Refer to the API docs:\n",
    "\n",
    "- https://python.langchain.com/api_reference/superlinked/retrievers/langchain_superlinked.retrievers.SuperlinkedRetriever.html\n"
   ]
  },
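  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the `invoke` call and the `page_content_field` / `metadata_fields` options concrete, here is a simplified, hypothetical model of the mapping the examples rely on. It is a sketch, not the `langchain-superlinked` source: `bind_params`, `to_documents`, the flat row dictionaries, and the `Document` stand-in are illustrative assumptions that use only the standard library.\n",
    "\n",
    "```python\n",
    "from dataclasses import dataclass, field\n",
    "from typing import Any, Optional\n",
    "\n",
    "\n",
    "@dataclass\n",
    "class Document:\n",
    "    \"\"\"Hypothetical stand-in for langchain_core.documents.Document.\"\"\"\n",
    "\n",
    "    page_content: str\n",
    "    metadata: dict = field(default_factory=dict)\n",
    "\n",
    "\n",
    "def bind_params(query: str, **kwargs: Any) -> dict:\n",
    "    # Hypothetical: the query string fills sl.Param(\"query_text\") and each\n",
    "    # keyword argument fills the sl.Param with the same name (limit, weights, ...).\n",
    "    return {\"query_text\": query, **kwargs}\n",
    "\n",
    "\n",
    "def to_documents(\n",
    "    rows: list, page_content_field: str, metadata_fields: Optional[list] = None\n",
    ") -> list:\n",
    "    # Hypothetical: one Document per result row; `page_content_field` becomes the\n",
    "    # text and only the listed `metadata_fields` are copied into metadata.\n",
    "    docs = []\n",
    "    for row in rows:\n",
    "        metadata = {k: row[k] for k in (metadata_fields or []) if k in row}\n",
    "        docs.append(Document(str(row.get(page_content_field, \"\")), metadata))\n",
    "    return docs\n",
    "\n",
    "\n",
    "params = bind_params(\"machine learning\", content_weight=1.0, limit=2)\n",
    "print(params)  # {'query_text': 'machine learning', 'content_weight': 1.0, 'limit': 2}\n",
    "\n",
    "rows = [\n",
    "    {\"title\": \"Intro to ML\", \"content\": \"Machine learning 101\", \"category\": \"technology\"}\n",
    "]\n",
    "print(to_documents(rows, \"content\", [\"title\", \"category\"]))\n",
    "```\n",
    "\n",
    "In this model, an `sl.Param` that the query declares but the call omits is simply absent from the parameter dictionary; how Superlinked itself treats a missing parameter is outside the scope of this sketch."
   ]
  },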
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from datetime import timedelta  # used by the multi-space example below\n",
    "\n",
    "import superlinked.framework as sl\n",
    "from langchain_superlinked import SuperlinkedRetriever\n",
    "\n",
    "\n",
    "# Schema: the fields stored for each document\n",
    "class DocumentSchema(sl.Schema):\n",
    "    id: sl.IdField\n",
    "    content: sl.String\n",
    "\n",
    "\n",
    "doc_schema = DocumentSchema()\n",
    "\n",
    "# Embed `content` with a sentence-transformers model and index that space\n",
    "text_space = sl.TextSimilaritySpace(\n",
    "    text=doc_schema.content, model=\"sentence-transformers/all-MiniLM-L6-v2\"\n",
    ")\n",
    "doc_index = sl.Index([text_space])\n",
    "\n",
    "# Query descriptor: `query_text` and `limit` are sl.Param placeholders filled at invoke time\n",
    "query = (\n",
    "    sl.Query(doc_index)\n",
    "    .find(doc_schema)\n",
    "    .similar(text_space.text, sl.Param(\"query_text\"))\n",
    "    .select([doc_schema.content])\n",
    "    .limit(sl.Param(\"limit\"))\n",
    ")\n",
    "\n",
    "# Minimal in-memory app; `app` is passed to the retriever as `sl_client`\n",
    "source = sl.InMemorySource(schema=doc_schema)\n",
    "executor = sl.InMemoryExecutor(sources=[source], indices=[doc_index])\n",
    "app = executor.run()\n",
    "\n",
    "# Ingest a few sample documents\n",
    "source.put(\n",
    "    [\n",
    "        {\"id\": \"1\", \"content\": \"Machine learning algorithms process data efficiently.\"},\n",
    "        {\n",
    "            \"id\": \"2\",\n",
    "            \"content\": \"Natural language processing understands human language.\",\n",
    "        },\n",
    "        {\"id\": \"3\", \"content\": \"Deep learning models require significant compute.\"},\n",
    "    ]\n",
    ")\n",
    "\n",
    "# Retriever: each result's `content` becomes the Document's page_content\n",
    "retriever = SuperlinkedRetriever(\n",
    "    sl_client=app, sl_query=query, page_content_field=\"content\"\n",
    ")\n",
    "\n",
    "retriever.invoke(\"artificial intelligence\", limit=2)"
   ]
  },
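  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The multi-space example below passes a weight for each space at query time (`content_weight`, `recency_weight`, ...). As a rough mental model, assume each space produces its own similarity between the query and a post, and the weights scale those similarities before they are added up. The scores here are made up, and this is a simplified sketch rather than Superlinked's exact scoring, which may normalize and combine vectors differently.\n",
    "\n",
    "```python\n",
    "# Made-up per-space similarities between one query and two posts (assumption).\n",
    "scores = {\n",
    "    \"p1\": {\"content\": 0.9, \"recency\": 0.2},  # on-topic but older\n",
    "    \"p2\": {\"content\": 0.4, \"recency\": 1.0},  # less on-topic but very recent\n",
    "}\n",
    "\n",
    "\n",
    "def weighted_score(space_scores: dict, weights: dict) -> float:\n",
    "    # Scale each space's similarity by its weight and add them up;\n",
    "    # a space with no weight contributes nothing in this sketch.\n",
    "    return sum(weights.get(space, 0.0) * sim for space, sim in space_scores.items())\n",
    "\n",
    "\n",
    "def rank(weights: dict) -> list:\n",
    "    # Highest combined score first.\n",
    "    return sorted(\n",
    "        scores, key=lambda doc_id: weighted_score(scores[doc_id], weights), reverse=True\n",
    "    )\n",
    "\n",
    "\n",
    "print(rank({\"content\": 1.0, \"recency\": 0.5}))  # ['p1', 'p2'] (1.0 vs 0.9)\n",
    "print(rank({\"content\": 0.2, \"recency\": 1.0}))  # ['p2', 'p1'] (0.38 vs 1.08)\n",
    "```\n",
    "\n",
    "Raising `recency_weight` relative to `content_weight` can therefore push a fresher but less on-topic post above a more relevant older one, which is the trade-off the `blog_retriever.invoke(...)` call below tunes."
   ]
  },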
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Multi-space example: blog posts scored by content, title, category, and recency\n",
    "class BlogPostSchema(sl.Schema):\n",
    "    id: sl.IdField\n",
    "    title: sl.String\n",
    "    content: sl.String\n",
    "    category: sl.String\n",
    "    published_date: sl.Timestamp\n",
    "\n",
    "\n",
    "blog = BlogPostSchema()\n",
    "\n",
    "content_space = sl.TextSimilaritySpace(\n",
    "    text=blog.content, model=\"sentence-transformers/all-MiniLM-L6-v2\"\n",
    ")\n",
    "title_space = sl.TextSimilaritySpace(\n",
    "    text=blog.title, model=\"sentence-transformers/all-MiniLM-L6-v2\"\n",
    ")\n",
    "cat_space = sl.CategoricalSimilaritySpace(\n",
    "    category_input=blog.category, categories=[\"technology\", \"science\", \"business\"]\n",
    ")\n",
    "recency_space = sl.RecencySpace(\n",
    "    timestamp=blog.published_date,\n",
    "    period_time_list=[\n",
    "        sl.PeriodTime(timedelta(days=30)),\n",
    "        sl.PeriodTime(timedelta(days=90)),\n",
    "    ],\n",
    ")\n",
    "\n",
    "blog_index = sl.Index([content_space, title_space, cat_space, recency_space])\n",
    "\n",
    "# Every space weight is an sl.Param, so it can be set per call\n",
    "blog_query = (\n",
    "    sl.Query(\n",
    "        blog_index,\n",
    "        weights={\n",
    "            content_space: sl.Param(\"content_weight\"),\n",
    "            title_space: sl.Param(\"title_weight\"),\n",
    "            cat_space: sl.Param(\"category_weight\"),\n",
    "            recency_space: sl.Param(\"recency_weight\"),\n",
    "        },\n",
    "    )\n",
    "    .find(blog)\n",
    "    .similar(content_space.text, sl.Param(\"query_text\"))\n",
    "    .select([blog.title, blog.content, blog.category, blog.published_date])\n",
    "    .limit(sl.Param(\"limit\"))\n",
    ")\n",
    "\n",
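    "# Separate in-memory app for the blog index; this rebinds `source` and `app`\n",
    "source = sl.InMemorySource(schema=blog)\n",
    "app = sl.InMemoryExecutor(sources=[source], indices=[blog_index]).run()\n",
    "\n",
    "from datetime import datetime\n",
    "\n",
    "# Ingest two posts; timestamps are Unix seconds, 5 and 15 days old\n",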
    "source.put(\n",
    "    [\n",
    "        {\n",
    "            \"id\": \"p1\",\n",
    "            \"title\": \"Intro to ML\",\n",
    "            \"content\": \"Machine learning 101\",\n",
    "            \"category\": \"technology\",\n",
    "            \"published_date\": int((datetime.now() - timedelta(days=5)).timestamp()),\n",
    "        },\n",
    "        {\n",
    "            \"id\": \"p2\",\n",
    "            \"title\": \"AI in Healthcare\",\n",
    "            \"content\": \"Transforming diagnosis\",\n",
    "            \"category\": \"science\",\n",
    "            \"published_date\": int((datetime.now() - timedelta(days=15)).timestamp()),\n",
    "        },\n",
    "    ]\n",
    ")\n",
    "\n",
    "# page_content comes from `content`; the listed fields go into metadata\n",
    "blog_retriever = SuperlinkedRetriever(\n",
    "    sl_client=app,\n",
    "    sl_query=blog_query,\n",
    "    page_content_field=\"content\",\n",
    "    metadata_fields=[\"title\", \"category\", \"published_date\"],\n",
    ")\n",
    "\n",
    "blog_retriever.invoke(\n",
    "    \"machine learning\", content_weight=1.0, recency_weight=0.5, limit=2\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}