langchain/docs/docs/integrations/retrievers/superlinked_examples.ipynb
Filip Makraduli 0be7515abc docs: add superlinked retriever integration (#32433)
# feat(superlinked): add superlinked retriever integration

**Description:** 
Add Superlinked as a custom retriever with full LangChain compatibility.
This integration enables users to leverage Superlinked's multi-modal
vector search capabilities including text similarity, categorical
similarity, recency, and numerical spaces with flexible weighting
strategies. The implementation provides a `SuperlinkedRetriever` class
that extends LangChain's `BaseRetriever` with comprehensive error
handling, parameter validation, and support for various vector databases
(in-memory, Qdrant, Redis, MongoDB).

**Key Features:**
- Full LangChain `BaseRetriever` compatibility with `k` parameter
support
- Multi-modal search spaces (text, categorical, numerical, recency)
- Flexible weighting strategies for complex search scenarios
- Vector database agnostic implementation
- Comprehensive validation and error handling
- Complete test coverage (unit tests, integration tests)
- Detailed documentation with 6 practical usage examples
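The weighting idea in the feature list can be pictured with a small, dependency-free sketch. It is not Superlinked's implementation: it only assumes that each search space yields a per-item similarity score and that query-time weights scale those scores before they are summed. The helper names (`combine_scores`, `rank_items`) and the score values are hypothetical, chosen for illustration.

```python
from typing import Dict, List

# Hypothetical per-space similarity scores for two items (illustrative values only).
ITEMS: List[dict] = [
    {"id": "old_relevant", "scores": {"content": 0.9, "recency": 0.1}},
    {"id": "new_loose", "scores": {"content": 0.6, "recency": 0.9}},
]


def combine_scores(space_scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of per-space scores; spaces without a weight contribute nothing."""
    return sum(weights.get(name, 0.0) * score for name, score in space_scores.items())


def rank_items(items: List[dict], weights: Dict[str, float]) -> List[str]:
    """Return item ids ordered from highest to lowest combined score."""
    ordered = sorted(
        items, key=lambda item: combine_scores(item["scores"], weights), reverse=True
    )
    return [item["id"] for item in ordered]


# Content-only weighting favours the closest textual match.
content_only = rank_items(ITEMS, {"content": 1.0, "recency": 0.0})

# Adding a recency weight lets a fresher, looser match overtake it.
with_recency = rank_items(ITEMS, {"content": 1.0, "recency": 0.5})
```

Changing only the weights reorders the same candidates, which is the behaviour the `*_weight` query parameters are meant to expose.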

**Issue:** N/A (new integration)

**Dependencies:** 
- `superlinked==33.5.1` (peer dependency, imported within functions)
- `pandas^2.2.0` (required by superlinked)

**Linkedin handle:** https://www.linkedin.com/in/filipmakraduli/

## Implementation Details

### Files Added/Modified:
- `libs/partners/superlinked/` - Complete package structure
- `libs/partners/superlinked/langchain_superlinked/retrievers.py` - Main
retriever implementation
- `libs/partners/superlinked/tests/unit_tests/test_retrievers.py` - Unit tests
- `libs/partners/superlinked/tests/integration_tests/test_retrievers.py` - Integration tests with mocking
- `docs/docs/integrations/retrievers/superlinked.ipynb` - Documentation with a few usage examples

### Testing:
- `make format` - passing
- `make lint` - passing 
- `make test` - passing (16 unit tests, integration tests)
- Comprehensive test coverage including error handling, validation, and
edge cases

### Documentation:
- Example notebook with 6 practical scenarios:
  1. Simple text search
  2. Multi-space blog search (content + category + recency)
  3. E-commerce product search (price + brand + ratings)
  4. News article search (sentiment + topics + recency)
  5. LangChain RAG integration example
  6. Qdrant vector database integration

### Code Quality:
- Follows LangChain contribution guidelines
- Backwards compatible
- Optional dependencies imported within functions
- Comprehensive error handling and validation
- Type hints and docstrings throughout
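As a rough illustration of "optional dependencies imported within functions", the sketch below defers the import until the dependency is needed and turns a missing package into an actionable error. The helper name `require_optional` is hypothetical and is not taken from the package; the standard-library `json` module stands in for the real peer dependency so the snippet runs anywhere.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, pip_name: str) -> ModuleType:
    """Import a module on first use, with an install hint if it is missing.

    Hypothetical helper for illustration; not part of langchain-superlinked.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"Could not import '{module_name}'. "
            f"Install it with `pip install {pip_name}`."
        ) from exc


def serialize(payload: dict) -> str:
    # The import happens inside the function, so importing this file never
    # fails just because the optional package is absent.
    json_module = require_optional("json", "json")  # stdlib stand-in
    return json_module.dumps(payload)
```

With this pattern, importing the package stays cheap and safe, and the error only appears when a code path that needs the dependency actually runs.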

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-09-15 13:54:04 +00:00


{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# SuperlinkedRetriever Examples\n",
"\n",
"This notebook demonstrates how to build a Superlinked App and Query Descriptor and use them with the LangChain `SuperlinkedRetriever`.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"Install the integration and its peer dependency:\n",
"\n",
"```bash\n",
"pip install -U langchain-superlinked superlinked\n",
"```\n",
"\n",
"## Instantiation\n",
"\n",
"See below for creating a Superlinked App (`sl_client`) and a `QueryDescriptor` (`sl_query`), then wiring them into `SuperlinkedRetriever`.\n",
"\n",
"## Usage\n",
"\n",
"Call `retriever.invoke(query_text, **params)` to retrieve `Document` objects. Examples below show single-space and multi-space setups.\n",
"\n",
"## Use within a chain\n",
"\n",
"The retriever can be used in LangChain chains by piping it into your prompt and model. See the main Superlinked retriever page for a full RAG example.\n",
"\n",
"## API reference\n",
"\n",
"Refer to the API docs:\n",
"\n",
"- https://python.langchain.com/api_reference/superlinked/retrievers/langchain_superlinked.retrievers.SuperlinkedRetriever.html\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import superlinked.framework as sl\n",
"from langchain_superlinked import SuperlinkedRetriever\n",
"from datetime import timedelta\n",
"\n",
"\n",
"# Define schema\n",
"class DocumentSchema(sl.Schema):\n",
"    id: sl.IdField\n",
"    content: sl.String\n",
"\n",
"\n",
"doc_schema = DocumentSchema()\n",
"\n",
"# Space + index\n",
"text_space = sl.TextSimilaritySpace(\n",
"    text=doc_schema.content, model=\"sentence-transformers/all-MiniLM-L6-v2\"\n",
")\n",
"doc_index = sl.Index([text_space])\n",
"\n",
"# Query descriptor\n",
"query = (\n",
"    sl.Query(doc_index)\n",
"    .find(doc_schema)\n",
"    .similar(text_space.text, sl.Param(\"query_text\"))\n",
"    .select([doc_schema.content])\n",
"    .limit(sl.Param(\"limit\"))\n",
")\n",
"\n",
"# Minimal app\n",
"source = sl.InMemorySource(schema=doc_schema)\n",
"executor = sl.InMemoryExecutor(sources=[source], indices=[doc_index])\n",
"app = executor.run()\n",
"\n",
"# Data\n",
"source.put(\n",
"    [\n",
"        {\"id\": \"1\", \"content\": \"Machine learning algorithms process data efficiently.\"},\n",
"        {\n",
"            \"id\": \"2\",\n",
"            \"content\": \"Natural language processing understands human language.\",\n",
"        },\n",
"        {\"id\": \"3\", \"content\": \"Deep learning models require significant compute.\"},\n",
"    ]\n",
")\n",
"\n",
"# Retriever\n",
"retriever = SuperlinkedRetriever(\n",
"    sl_client=app, sl_query=query, page_content_field=\"content\"\n",
")\n",
"\n",
"retriever.invoke(\"artificial intelligence\", limit=2)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from datetime import datetime\n",
"\n",
"\n",
"# Multi-space example (blog posts)\n",
"class BlogPostSchema(sl.Schema):\n",
"    id: sl.IdField\n",
"    title: sl.String\n",
"    content: sl.String\n",
"    category: sl.String\n",
"    published_date: sl.Timestamp\n",
"\n",
"\n",
"blog = BlogPostSchema()\n",
"\n",
"# One space per signal: content text, title text, category, and recency\n",
"content_space = sl.TextSimilaritySpace(\n",
"    text=blog.content, model=\"sentence-transformers/all-MiniLM-L6-v2\"\n",
")\n",
"title_space = sl.TextSimilaritySpace(\n",
"    text=blog.title, model=\"sentence-transformers/all-MiniLM-L6-v2\"\n",
")\n",
"cat_space = sl.CategoricalSimilaritySpace(\n",
"    category_input=blog.category, categories=[\"technology\", \"science\", \"business\"]\n",
")\n",
"recency_space = sl.RecencySpace(\n",
"    timestamp=blog.published_date,\n",
"    period_time_list=[\n",
"        sl.PeriodTime(timedelta(days=30)),\n",
"        sl.PeriodTime(timedelta(days=90)),\n",
"    ],\n",
")\n",
"\n",
"blog_index = sl.Index([content_space, title_space, cat_space, recency_space])\n",
"\n",
"# Each space's weight is a query parameter supplied when the retriever is invoked\n",
"blog_query = (\n",
"    sl.Query(\n",
"        blog_index,\n",
"        weights={\n",
"            content_space: sl.Param(\"content_weight\"),\n",
"            title_space: sl.Param(\"title_weight\"),\n",
"            cat_space: sl.Param(\"category_weight\"),\n",
"            recency_space: sl.Param(\"recency_weight\"),\n",
"        },\n",
"    )\n",
"    .find(blog)\n",
"    .similar(content_space.text, sl.Param(\"query_text\"))\n",
"    .select([blog.title, blog.content, blog.category, blog.published_date])\n",
"    .limit(sl.Param(\"limit\"))\n",
")\n",
"\n",
"# Separate source and app for the blog index\n",
"blog_source = sl.InMemorySource(schema=blog)\n",
"blog_app = sl.InMemoryExecutor(sources=[blog_source], indices=[blog_index]).run()\n",
"\n",
"# Publication dates are passed as Unix timestamps in seconds\n",
"blog_source.put(\n",
"    [\n",
"        {\n",
"            \"id\": \"p1\",\n",
"            \"title\": \"Intro to ML\",\n",
"            \"content\": \"Machine learning 101\",\n",
"            \"category\": \"technology\",\n",
"            \"published_date\": int((datetime.now() - timedelta(days=5)).timestamp()),\n",
"        },\n",
"        {\n",
"            \"id\": \"p2\",\n",
"            \"title\": \"AI in Healthcare\",\n",
"            \"content\": \"Transforming diagnosis\",\n",
"            \"category\": \"science\",\n",
"            \"published_date\": int((datetime.now() - timedelta(days=15)).timestamp()),\n",
"        },\n",
"    ]\n",
")\n",
"\n",
"blog_retriever = SuperlinkedRetriever(\n",
"    sl_client=blog_app,\n",
"    sl_query=blog_query,\n",
"    page_content_field=\"content\",\n",
"    metadata_fields=[\"title\", \"category\", \"published_date\"],\n",
")\n",
"\n",
"blog_retriever.invoke(\n",
"    \"machine learning\", content_weight=1.0, recency_weight=0.5, limit=2\n",
")"
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 2
}