mirror of
https://github.com/hwchase17/langchain.git
synced 2025-09-24 12:01:54 +00:00
docs: add superlinked retriever integration (#32433)
# feat(superlinked): add superlinked retriever integration

**Description:** Add Superlinked as a custom retriever with full LangChain compatibility. This integration enables users to leverage Superlinked's multi-modal vector search capabilities, including text similarity, categorical similarity, recency, and numerical spaces, with flexible weighting strategies. The implementation provides a `SuperlinkedRetriever` class that extends LangChain's `BaseRetriever` with comprehensive error handling, parameter validation, and support for various vector databases (in-memory, Qdrant, Redis, MongoDB).

**Key Features:**
- Full LangChain `BaseRetriever` compatibility with `k` parameter support
- Multi-modal search spaces (text, categorical, numerical, recency)
- Flexible weighting strategies for complex search scenarios
- Vector database agnostic implementation
- Comprehensive validation and error handling
- Complete test coverage (unit tests, integration tests)
- Detailed documentation with 6 practical usage examples

**Issue:** N/A (new integration)

**Dependencies:**
- `superlinked==33.5.1` (peer dependency, imported within functions)
- `pandas^2.2.0` (required by superlinked)

**LinkedIn handle:** https://www.linkedin.com/in/filipmakraduli/

## Implementation Details

### Files Added/Modified:
- `libs/partners/superlinked/` - Complete package structure
- `libs/partners/superlinked/langchain_superlinked/retrievers.py` - Main retriever implementation
- `libs/partners/superlinked/tests/unit_tests/test_retrievers.py` - Unit tests
- `libs/partners/superlinked/tests/integration_tests/test_retrievers.py` - Integration tests with mocking
- `docs/docs/integrations/retrievers/superlinked.ipynb` - Documentation with a few usage examples

### Testing:
- `make format` - passing
- `make lint` - passing
- `make test` - passing (16 unit tests, integration tests)
- Comprehensive test coverage including error handling, validation, and edge cases

### Documentation:
- Example notebook with 6 practical scenarios:
  1. Simple text search
  2. Multi-space blog search (content + category + recency)
  3. E-commerce product search (price + brand + ratings)
  4. News article search (sentiment + topics + recency)
  5. LangChain RAG integration example
  6. Qdrant vector database integration

### Code Quality:
- Follows LangChain contribution guidelines
- Backwards compatible
- Optional dependencies imported within functions
- Comprehensive error handling and validation
- Type hints and docstrings throughout

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
140
docs/docs/integrations/providers/superlinked.mdx
Normal file
@@ -0,0 +1,140 @@
---
title: Superlinked
description: LangChain integration package for the Superlinked retrieval stack
---

import Link from '@docusaurus/Link';

### Overview

Superlinked enables context‑aware retrieval using multiple space types (text similarity, categorical, numerical, recency, and more). The `langchain-superlinked` package provides a LangChain‑native `SuperlinkedRetriever` that plugs directly into your RAG chains.

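As a rough intuition for how several spaces and their weights can interact, the standard-library sketch below ranks items by a weighted sum of per-space scores. The item names, the scores, and the `weighted_score` and `rank` helpers are all made up for illustration; this is a generic picture of weighting and not a description of how Superlinked computes similarity internally.

```python
# Illustrative only: made-up per-space scores in [0, 1] for three items.
scores = {
    "post_a": {"text": 0.90, "recency": 0.20},
    "post_b": {"text": 0.60, "recency": 0.95},
    "post_c": {"text": 0.30, "recency": 0.50},
}


def weighted_score(space_scores, weights):
    """Weighted sum of per-space scores; spaces without a weight count as 0."""
    return sum(weights.get(space, 0.0) * value for space, value in space_scores.items())


def rank(all_scores, weights):
    """Return item ids ordered from best to worst under the given weights."""
    return sorted(
        all_scores,
        key=lambda item: weighted_score(all_scores[item], weights),
        reverse=True,
    )


# Text relevance only, then text and recency weighted equally.
text_only = rank(scores, {"text": 1.0})
balanced = rank(scores, {"text": 1.0, "recency": 1.0})
print(text_only)
print(balanced)
```

With text alone, `post_a` ranks first; once recency carries the same weight, the fresher `post_b` overtakes it. Changing weights at query time shifts the ranking in this way without re-indexing the data.
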
### Links

- <Link to="https://github.com/superlinked/langchain-superlinked">Integration repository</Link>
- <Link to="https://links.superlinked.com/langchain_repo_sl">Superlinked core repository</Link>
- <Link to="https://links.superlinked.com/langchain_article">Article: Build RAG using LangChain & Superlinked</Link>

### Install

```bash
pip install -U langchain-superlinked superlinked
```

### Quickstart

```python
import superlinked.framework as sl
from langchain_superlinked import SuperlinkedRetriever

# 1) Define schema
class DocumentSchema(sl.Schema):
    id: sl.IdField
    content: sl.String

doc_schema = DocumentSchema()

# 2) Define space and index
text_space = sl.TextSimilaritySpace(
    text=doc_schema.content, model="sentence-transformers/all-MiniLM-L6-v2"
)
doc_index = sl.Index([text_space])

# 3) Define query
query = (
    sl.Query(doc_index)
    .find(doc_schema)
    .similar(text_space.text, sl.Param("query_text"))
    .select([doc_schema.content])
    .limit(sl.Param("limit"))
)

# 4) Minimal app setup
source = sl.InMemorySource(schema=doc_schema)
executor = sl.InMemoryExecutor(sources=[source], indices=[doc_index])
app = executor.run()
source.put([
    {"id": "1", "content": "Machine learning algorithms process data efficiently."},
    {"id": "2", "content": "Natural language processing understands human language."},
])

# 5) LangChain retriever
retriever = SuperlinkedRetriever(
    sl_client=app, sl_query=query, page_content_field="content"
)

# Search
docs = retriever.invoke("artificial intelligence", limit=2)
for d in docs:
    print(d.page_content)
```

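In the quickstart, the search text is passed positionally and `limit` as a keyword argument, which lines up with the `sl.Param("query_text")` and `sl.Param("limit")` placeholders in the query. The standard-library sketch below illustrates that name-to-value binding. `bind_params` and its `declared_params` argument are hypothetical stand-ins, not part of `langchain-superlinked`, and rejecting unknown names is a choice made for this sketch; the real retriever may behave differently.

```python
def bind_params(query_text, declared_params, **kwargs):
    """Hypothetical helper: map call-time values onto named placeholders.

    `declared_params` stands in for the names used in `sl.Param(...)`.
    """
    # The positional search text fills the "query_text" placeholder.
    values = {"query_text": query_text, **kwargs}
    # Sketch-only design choice: fail loudly on names the query never declared.
    unknown = sorted(set(values) - set(declared_params))
    if unknown:
        raise ValueError(f"No placeholder declared for: {unknown}")
    return values


bound = bind_params("artificial intelligence", {"query_text", "limit"}, limit=2)
print(bound)
```

The practical takeaway is that every keyword passed at call time should match a placeholder name used when the query was built.
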
### What the retriever expects (App and Query)

The retriever takes two core inputs:

- `sl_client`: a Superlinked App created by running an executor (e.g., `InMemoryExecutor(...).run()`)
- `sl_query`: a `QueryDescriptor` returned by chaining `sl.Query(...).find(...).similar(...).select(...).limit(...)`

Minimal setup:

```python
import superlinked.framework as sl
from langchain_superlinked import SuperlinkedRetriever

class Doc(sl.Schema):
    id: sl.IdField
    content: sl.String

doc = Doc()
space = sl.TextSimilaritySpace(text=doc.content, model="sentence-transformers/all-MiniLM-L6-v2")
index = sl.Index([space])

query = (
    sl.Query(index)
    .find(doc)
    .similar(space.text, sl.Param("query_text"))
    .select([doc.content])
    .limit(sl.Param("limit"))
)

source = sl.InMemorySource(schema=doc)
app = sl.InMemoryExecutor(sources=[source], indices=[index]).run()

retriever = SuperlinkedRetriever(sl_client=app, sl_query=query, page_content_field="content")
```

Note: For a persistent vector DB, pass `vector_database=...` to the executor (e.g., Qdrant) before `.run()`.

### Use within a chain

```python
from langchain_core.runnables import RunnablePassthrough
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

prompt = ChatPromptTemplate.from_template(
    """
    Answer based on context:\n\nContext: {context}\nQuestion: {question}
    """
)

chain = ({"context": retriever | format_docs, "question": RunnablePassthrough()}
         | prompt
         | ChatOpenAI())

answer = chain.invoke("How does machine learning work?")
```

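To see what the `{context}` slot receives, the `format_docs` helper can be tried on its own without a model or a running Superlinked app. This is a minimal sketch: `FakeDocument` is a made-up stand-in that models only the `page_content` attribute of a LangChain `Document`, and the sample texts are borrowed from the quickstart data.

```python
from dataclasses import dataclass


@dataclass
class FakeDocument:
    """Stand-in for a LangChain `Document`; only `page_content` is modelled."""

    page_content: str


def format_docs(docs):
    # Same helper as in the chain above: join page contents with blank lines.
    return "\n\n".join(doc.page_content for doc in docs)


docs = [
    FakeDocument("Machine learning algorithms process data efficiently."),
    FakeDocument("Natural language processing understands human language."),
]
context = format_docs(docs)
print(context)
```

Each retrieved document becomes one paragraph of the context, separated by a blank line, and an empty result list yields an empty string.
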
### Resources

- <Link to="https://pypi.org/project/langchain-superlinked/">PyPI: langchain-superlinked</Link>
- <Link to="https://pypi.org/project/superlinked/">PyPI: superlinked</Link>
- <Link to="https://github.com/superlinked/langchain-superlinked">Source repository</Link>
- <Link to="https://links.superlinked.com/langchain_repo_sl">Superlinked core repository</Link>
- <Link to="https://links.superlinked.com/langchain_article">Build RAG using LangChain & Superlinked (article)</Link>

1292
docs/docs/integrations/retrievers/superlinked.ipynb
Normal file
File diff suppressed because it is too large
204
docs/docs/integrations/retrievers/superlinked_examples.ipynb
Normal file
@@ -0,0 +1,204 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# SuperlinkedRetriever Examples\n",
    "\n",
    "This notebook demonstrates how to build a Superlinked App and Query Descriptor and use them with the LangChain `SuperlinkedRetriever`.\n",
    "\n",
    "Install the integration from PyPI:\n",
    "\n",
    "```bash\n",
    "pip install -U langchain-superlinked superlinked\n",
    "```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup\n",
    "\n",
    "Install the integration and its peer dependency:\n",
    "\n",
    "```bash\n",
    "pip install -U langchain-superlinked superlinked\n",
    "```\n",
    "\n",
    "## Instantiation\n",
    "\n",
    "See below for creating a Superlinked App (`sl_client`) and a `QueryDescriptor` (`sl_query`), then wiring them into `SuperlinkedRetriever`.\n",
    "\n",
    "## Usage\n",
    "\n",
    "Call `retriever.invoke(query_text, **params)` to retrieve `Document` objects. Examples below show single-space and multi-space setups.\n",
    "\n",
    "## Use within a chain\n",
    "\n",
    "The retriever can be used in LangChain chains by piping it into your prompt and model. See the main Superlinked retriever page for a full RAG example.\n",
    "\n",
    "## API reference\n",
    "\n",
    "Refer to the API docs:\n",
    "\n",
    "- https://python.langchain.com/api_reference/superlinked/retrievers/langchain_superlinked.retrievers.SuperlinkedRetriever.html\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import superlinked.framework as sl\n",
    "from langchain_superlinked import SuperlinkedRetriever\n",
    "from datetime import timedelta\n",
    "\n",
    "\n",
    "# Define schema\n",
    "class DocumentSchema(sl.Schema):\n",
    "    id: sl.IdField\n",
    "    content: sl.String\n",
    "\n",
    "\n",
    "doc_schema = DocumentSchema()\n",
    "\n",
    "# Space + index\n",
    "text_space = sl.TextSimilaritySpace(\n",
    "    text=doc_schema.content, model=\"sentence-transformers/all-MiniLM-L6-v2\"\n",
    ")\n",
    "doc_index = sl.Index([text_space])\n",
    "\n",
    "# Query descriptor\n",
    "query = (\n",
    "    sl.Query(doc_index)\n",
    "    .find(doc_schema)\n",
    "    .similar(text_space.text, sl.Param(\"query_text\"))\n",
    "    .select([doc_schema.content])\n",
    "    .limit(sl.Param(\"limit\"))\n",
    ")\n",
    "\n",
    "# Minimal app\n",
    "source = sl.InMemorySource(schema=doc_schema)\n",
    "executor = sl.InMemoryExecutor(sources=[source], indices=[doc_index])\n",
    "app = executor.run()\n",
    "\n",
    "# Data\n",
    "source.put(\n",
    "    [\n",
    "        {\"id\": \"1\", \"content\": \"Machine learning algorithms process data efficiently.\"},\n",
    "        {\n",
    "            \"id\": \"2\",\n",
    "            \"content\": \"Natural language processing understands human language.\",\n",
    "        },\n",
    "        {\"id\": \"3\", \"content\": \"Deep learning models require significant compute.\"},\n",
    "    ]\n",
    ")\n",
    "\n",
    "# Retriever\n",
    "retriever = SuperlinkedRetriever(\n",
    "    sl_client=app, sl_query=query, page_content_field=\"content\"\n",
    ")\n",
    "\n",
    "retriever.invoke(\"artificial intelligence\", limit=2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Multi-space example (blog posts)\n",
    "class BlogPostSchema(sl.Schema):\n",
    "    id: sl.IdField\n",
    "    title: sl.String\n",
    "    content: sl.String\n",
    "    category: sl.String\n",
    "    published_date: sl.Timestamp\n",
    "\n",
    "\n",
    "blog = BlogPostSchema()\n",
    "\n",
    "content_space = sl.TextSimilaritySpace(\n",
    "    text=blog.content, model=\"sentence-transformers/all-MiniLM-L6-v2\"\n",
    ")\n",
    "title_space = sl.TextSimilaritySpace(\n",
    "    text=blog.title, model=\"sentence-transformers/all-MiniLM-L6-v2\"\n",
    ")\n",
    "cat_space = sl.CategoricalSimilaritySpace(\n",
    "    category_input=blog.category, categories=[\"technology\", \"science\", \"business\"]\n",
    ")\n",
    "recency_space = sl.RecencySpace(\n",
    "    timestamp=blog.published_date,\n",
    "    period_time_list=[\n",
    "        sl.PeriodTime(timedelta(days=30)),\n",
    "        sl.PeriodTime(timedelta(days=90)),\n",
    "    ],\n",
    ")\n",
    "\n",
    "blog_index = sl.Index([content_space, title_space, cat_space, recency_space])\n",
    "\n",
    "blog_query = (\n",
    "    sl.Query(\n",
    "        blog_index,\n",
    "        weights={\n",
    "            content_space: sl.Param(\"content_weight\"),\n",
    "            title_space: sl.Param(\"title_weight\"),\n",
    "            cat_space: sl.Param(\"category_weight\"),\n",
    "            recency_space: sl.Param(\"recency_weight\"),\n",
    "        },\n",
    "    )\n",
    "    .find(blog)\n",
    "    .similar(content_space.text, sl.Param(\"query_text\"))\n",
    "    .select([blog.title, blog.content, blog.category, blog.published_date])\n",
    "    .limit(sl.Param(\"limit\"))\n",
    ")\n",
    "\n",
    "source = sl.InMemorySource(schema=blog)\n",
    "app = sl.InMemoryExecutor(sources=[source], indices=[blog_index]).run()\n",
    "\n",
    "from datetime import datetime\n",
    "\n",
    "source.put(\n",
    "    [\n",
    "        {\n",
    "            \"id\": \"p1\",\n",
    "            \"title\": \"Intro to ML\",\n",
    "            \"content\": \"Machine learning 101\",\n",
    "            \"category\": \"technology\",\n",
    "            \"published_date\": int((datetime.now() - timedelta(days=5)).timestamp()),\n",
    "        },\n",
    "        {\n",
    "            \"id\": \"p2\",\n",
    "            \"title\": \"AI in Healthcare\",\n",
    "            \"content\": \"Transforming diagnosis\",\n",
    "            \"category\": \"science\",\n",
    "            \"published_date\": int((datetime.now() - timedelta(days=15)).timestamp()),\n",
    "        },\n",
    "    ]\n",
    ")\n",
    "\n",
    "blog_retriever = SuperlinkedRetriever(\n",
    "    sl_client=app,\n",
    "    sl_query=blog_query,\n",
    "    page_content_field=\"content\",\n",
    "    metadata_fields=[\"title\", \"category\", \"published_date\"],\n",
    ")\n",
    "\n",
    "blog_retriever.invoke(\n",
    "    \"machine learning\", content_weight=1.0, recency_weight=0.5, limit=2\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}