docs: add hybrid search documentation to PGVectorStore (#32549)

Adding documentation for Hybrid Search in the PGVectorStore Notebook

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
dishaprakash
2025-09-12 01:12:58 +00:00
committed by GitHub
parent 15d558ff16
commit bea72bac3e

@@ -496,7 +496,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"### Create a custom Vector Store\n",
+"## Create a custom Vector Store\n",
"\n",
"Customize the vectorstore with special column names or with custom metadata columns.\n",
"\n",
@@ -617,7 +617,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"### Create a Vector Store using existing table\n",
+"## Create a Vector Store using existing table\n",
"\n",
"A Vector Store can be built up on an existing table.\n",
"\n",
@@ -713,6 +713,260 @@
"1. For new records, added via `VectorStore` embeddings are automatically generated."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Hybrid Search with PGVectorStore\n",
"\n",
"Hybrid search combines multiple lookup strategies to return more comprehensive and relevant results. Specifically, it uses both dense embedding vector search (for semantic similarity) and TSV (Text Search Vector) based keyword search (for lexical matching). This approach is particularly useful for applications that search customized text and metadata, especially when a specialized embedding model isn't feasible or necessary.\n",
"\n",
"By integrating both semantic and lexical capabilities, hybrid search helps overcome the limitations of each individual method:\n",
"* **Semantic Search**: Excellent for understanding the meaning of a query, even if the exact keywords aren't present. However, it can sometimes miss highly relevant documents that contain the precise keywords but have a slightly different semantic context.\n",
"* **Keyword Search**: Highly effective for finding documents with exact keyword matches and is generally fast. Its weakness lies in its inability to understand synonyms, misspellings, or conceptual relationships."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Hybrid Search Config\n",
"\n",
"You can take advantage of hybrid search with PGVectorStore using the `HybridSearchConfig`.\n",
"\n",
"With a `HybridSearchConfig` provided, the `PGVectorStore` class can efficiently manage a hybrid search vector store using PostgreSQL as the backend, automatically handling the creation and population of the necessary TSV columns when possible."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Building the config\n",
"\n",
"Here are the parameters of the hybrid search config:\n",
"* **tsv_column:** Name of the TSV column. Default: `<content_column>_tsv`\n",
"* **tsv_lang:** Value representing a supported language. Default: `pg_catalog.english`\n",
"* **fts_query:** If provided, this is used for secondary retrieval instead of the user-provided query.\n",
"* **fusion_function:** Determines how the results are merged. Default: equal-weighted sum ranking (`weighted_sum_ranking`)\n",
"* **fusion_function_parameters:** Parameters for the fusion function\n",
"* **primary_top_k:** Max results fetched for primary retrieval. Default: `4`\n",
"* **secondary_top_k:** Max results fetched for secondary retrieval. Default: `4`\n",
"* **index_name:** Name of the index built on the `tsv_column`\n",
"* **index_type:** GIN or GIST. Default: `GIN`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is an example `HybridSearchConfig`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_postgres.v2.hybrid_search_config import (\n",
"    HybridSearchConfig,\n",
"    reciprocal_rank_fusion,\n",
")\n",
"\n",
"hybrid_search_config = HybridSearchConfig(\n",
"    tsv_column=\"hybrid_description\",\n",
"    tsv_lang=\"pg_catalog.english\",\n",
"    fusion_function=reciprocal_rank_fusion,\n",
"    fusion_function_parameters={\n",
"        \"rrf_k\": 60,\n",
"        \"fetch_top_k\": 10,\n",
"    },\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note:** This example sets the fusion function to `reciprocal_rank_fusion`, but you can also use `weighted_sum_ranking`.\n",
"\n",
"Make sure the fusion function parameters match the fusion function you choose:\n",
"\n",
"`reciprocal_rank_fusion`:\n",
"* rrf_k: The RRF parameter k. Defaults to 60\n",
"* fetch_top_k: The number of documents to fetch after merging the results. Defaults to 4\n",
"\n",
"`weighted_sum_ranking`:\n",
"* primary_results_weight: The weight for the primary source's scores. Defaults to 0.5\n",
"* secondary_results_weight: The weight for the secondary source's scores. Defaults to 0.5\n",
"* fetch_top_k: The number of documents to fetch after merging the results. Defaults to 4\n"
]
},
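To make the two fusion strategies concrete, here is a minimal, self-contained sketch of how reciprocal rank fusion and weighted sum ranking can merge two result lists. It is an illustration only, not the `langchain_postgres` implementation: the function names `rrf_sketch` and `weighted_sum_sketch` are hypothetical, the parameter names simply mirror the ones listed above, and the sketch assumes that higher scores are better and that both sources use comparable score scales.

```python
from collections import defaultdict


def rrf_sketch(primary_ids, secondary_ids, rrf_k=60, fetch_top_k=4):
    """Merge two ranked ID lists (best first) with reciprocal rank fusion."""
    scores = defaultdict(float)
    for ranked_ids in (primary_ids, secondary_ids):
        for rank, doc_id in enumerate(ranked_ids):
            # rank is 0-based, so the best hit contributes 1 / (rrf_k + 1).
            scores[doc_id] += 1.0 / (rrf_k + rank + 1)
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ordered[:fetch_top_k]


def weighted_sum_sketch(
    primary_scores,
    secondary_scores,
    primary_results_weight=0.5,
    secondary_results_weight=0.5,
    fetch_top_k=4,
):
    """Merge two {doc_id: score} mappings by a weighted sum of their scores.

    Assumption: higher scores are better and both scales are comparable.
    """
    scores = defaultdict(float)
    for doc_id, score in primary_scores.items():
        scores[doc_id] += primary_results_weight * score
    for doc_id, score in secondary_scores.items():
        scores[doc_id] += secondary_results_weight * score
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ordered[:fetch_top_k]


if __name__ == "__main__":
    # Hypothetical document IDs from a vector search and a keyword search.
    print(rrf_sketch(["a", "b", "c"], ["c", "a", "d"], fetch_top_k=3))
    print(weighted_sum_sketch({"a": 0.9, "b": 0.5}, {"b": 0.8, "c": 0.4}))
```

Reciprocal rank fusion only looks at rank positions, so it works even when the two sources score on unrelated scales. Weighted sum ranking uses the raw scores, which is why the weights and the score scales matter more there.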
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Usage\n",
"\n",
"Let's assume we are using the previously mentioned table [`products`](#create-a-vector-store-using-existing-table), which stores product details for an e-commerce venture.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### With a new hybrid search table\n",
"To create a new PostgreSQL table with the TSV column, specify the hybrid search config during the initialization of the vector store.\n",
"\n",
"In this case, all the similarity searches will make use of hybrid search."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_postgres import PGVectorStore\n",
"\n",
"TABLE_NAME = \"hybrid_search_products\"\n",
"\n",
"await pg_engine.ainit_vectorstore_table(\n",
"    table_name=TABLE_NAME,\n",
"    # schema_name=SCHEMA_NAME,\n",
"    vector_size=VECTOR_SIZE,\n",
"    id_column=\"product_id\",\n",
"    content_column=\"description\",\n",
"    embedding_column=\"embed\",\n",
"    metadata_columns=[\"name\", \"category\", \"price_usd\", \"quantity\", \"sku\", \"image_url\"],\n",
"    metadata_json_column=\"metadata\",\n",
"    hybrid_search_config=hybrid_search_config,\n",
"    store_metadata=True,\n",
")\n",
"\n",
"vs_hybrid = await PGVectorStore.create(\n",
"    pg_engine,\n",
"    table_name=TABLE_NAME,\n",
"    # schema_name=SCHEMA_NAME,\n",
"    embedding_service=embedding,\n",
"    # Connect to existing VectorStore by customizing below column names\n",
"    id_column=\"product_id\",\n",
"    content_column=\"description\",\n",
"    embedding_column=\"embed\",\n",
"    metadata_columns=[\"name\", \"category\", \"price_usd\", \"quantity\", \"sku\", \"image_url\"],\n",
"    metadata_json_column=\"metadata\",\n",
"    hybrid_search_config=hybrid_search_config,\n",
")\n",
"\n",
"# Fetch product documents from the previously created custom store\n",
"docs = await custom_store.asimilarity_search(\"products\", k=5)\n",
"# Add data normally to the hybrid search vector store, which will also add the tsv values in tsv_column\n",
"await vs_hybrid.aadd_documents(docs)\n",
"\n",
"# Use hybrid search\n",
"hybrid_docs = await vs_hybrid.asimilarity_search(\"products\", k=5)\n",
"print(hybrid_docs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### With a pre-existing table\n",
"\n",
"If a hybrid search config is **NOT** provided during `init_vectorstore_table` while creating a table, the table will not contain a tsv_column. In this case you can still take advantage of hybrid search using the `HybridSearchConfig`.\n",
"\n",
"Because the table has no TSV column, the TSV vectors are computed on the fly at query time for hybrid search."
]
},
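As a rough illustration of the difference between a stored TSV column and on-the-fly TSV vectors, the sketch below builds the kind of keyword-search SQL that PostgreSQL full-text search supports. The helper `build_keyword_query` and its exact query shape are assumptions made for illustration; the SQL that `PGVectorStore` actually issues may differ. Only the PostgreSQL functions `to_tsvector`, `plainto_tsquery`, and `ts_rank_cd` and the `@@` match operator are standard.

```python
def quote_ident(name):
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_keyword_query(
    table,
    content_column,
    tsv_lang="pg_catalog.english",
    tsv_column=None,
    top_k=4,
):
    """Build a keyword-search query (hypothetical helper, for illustration).

    With a stored tsv_column the query reads the precomputed vector;
    without one it computes to_tsvector(...) for each row at query time.
    """
    if tsv_column:
        tsv_expr = quote_ident(tsv_column)
    else:
        tsv_expr = "to_tsvector('" + tsv_lang + "', " + quote_ident(content_column) + ")"
    ts_query = "plainto_tsquery('" + tsv_lang + "', :fts_query)"
    return (
        "SELECT *, ts_rank_cd(" + tsv_expr + ", " + ts_query + ") AS rank "
        "FROM " + quote_ident(table) + " "
        "WHERE " + tsv_expr + " @@ " + ts_query + " "
        "ORDER BY rank DESC LIMIT " + str(int(top_k))
    )


if __name__ == "__main__":
    # No stored TSV column: the vector is computed dynamically.
    print(build_keyword_query("products", "description"))
    # Stored TSV column: the precomputed vector is used directly.
    print(
        build_keyword_query(
            "hybrid_search_products", "description", tsv_column="hybrid_description"
        )
    )
```

Here `:fts_query` is a bind parameter placeholder for the search text. Computing `to_tsvector` per row at query time avoids a schema change but cannot use an index on a stored TSV column.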
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_postgres import PGVectorStore\n",
"\n",
"# Set the existing table name\n",
"TABLE_NAME = \"products\"\n",
"# SCHEMA_NAME = \"my_schema\"\n",
"\n",
"hybrid_search_config = HybridSearchConfig(\n",
"    tsv_lang=\"pg_catalog.english\",\n",
"    fusion_function=reciprocal_rank_fusion,\n",
"    fusion_function_parameters={\n",
"        \"rrf_k\": 60,\n",
"        \"fetch_top_k\": 10,\n",
"    },\n",
")\n",
"\n",
"# Initialize PGVectorStore with the hybrid search config\n",
"custom_hybrid_store = await PGVectorStore.create(\n",
"    pg_engine,\n",
"    table_name=TABLE_NAME,\n",
"    # schema_name=SCHEMA_NAME,\n",
"    embedding_service=embedding,\n",
"    # Connect to existing VectorStore by customizing below column names\n",
"    id_column=\"product_id\",\n",
"    content_column=\"description\",\n",
"    embedding_column=\"embed\",\n",
"    metadata_columns=[\"name\", \"category\", \"price_usd\", \"quantity\", \"sku\", \"image_url\"],\n",
"    metadata_json_column=\"metadata\",\n",
"    hybrid_search_config=hybrid_search_config,\n",
")\n",
"\n",
"# Use hybrid search\n",
"hybrid_docs = await custom_hybrid_store.asimilarity_search(\"products\", k=5)\n",
"print(hybrid_docs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this case, all the similarity searches will make use of hybrid search."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Applying Hybrid Search to Specific Queries\n",
"\n",
"To use hybrid search only for certain queries, omit the configuration during initialization and pass it directly to the search method when needed.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Use hybrid search\n",
"hybrid_docs = await custom_store.asimilarity_search(\n",
"    \"products\", k=5, hybrid_search_config=hybrid_search_config\n",
")\n",
"print(hybrid_docs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Hybrid Search Index\n",
"\n",
"Optionally, if you have created a Postgres table with a `tsv_column`, you can create an index on that column to speed up keyword search."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"await vs_hybrid.aapply_hybrid_search_index()"
]
},
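For orientation, the sketch below shows the kind of DDL statement an index on a TSV column corresponds to in PostgreSQL. The helper `build_tsv_index_ddl` and its default index name are hypothetical, and the statement that `aapply_hybrid_search_index` actually runs may differ; the `CREATE INDEX ... USING GIN` syntax itself is standard PostgreSQL.

```python
def build_tsv_index_ddl(table, tsv_column, index_name=None, index_type="GIN"):
    """Build a CREATE INDEX statement for a TSV column (illustrative only)."""
    method = index_type.upper()
    if method not in {"GIN", "GIST"}:
        raise ValueError("index_type must be GIN or GIST")
    # Hypothetical default name; the library's own default may differ.
    name = index_name or f"{table}_{tsv_column}_idx"
    return (
        f'CREATE INDEX IF NOT EXISTS "{name}" '
        f'ON "{table}" USING {method} ("{tsv_column}");'
    )


if __name__ == "__main__":
    print(build_tsv_index_ddl("hybrid_search_products", "hybrid_description"))
```

GIN indexes are generally the preferred choice for text search lookups, while GiST indexes can be cheaper to update, which is consistent with `GIN` being the default `index_type` listed above.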
{
"cell_type": "markdown",
"metadata": {},