mirror of
https://github.com/hwchase17/langchain.git
synced 2025-05-16 04:21:52 +00:00
393 lines
9.9 KiB
Plaintext
393 lines
9.9 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "14d3fd06",
|
|
"metadata": {
|
|
"id": "14d3fd06"
|
|
},
|
|
"source": [
|
|
"# Hybrid Search\n",
|
|
"\n",
|
|
"The standard search in LangChain is done by vector similarity. However, a number of vectorstores implementations (Astra DB, ElasticSearch, Neo4J, AzureSearch, ...) also support more advanced search combining vector similarity search and other search techniques (full-text, BM25, and so on). This is generally referred to as \"Hybrid\" search.\n",
|
|
"\n",
|
|
"**Step 1: Make sure the vectorstore you are using supports hybrid search**\n",
|
|
"\n",
|
|
"At the moment, there is no unified way to perform hybrid search in LangChain. Each vectorstore may have their own way to do it. This is generally exposed as a keyword argument that is passed in during `similarity_search`. By reading the documentation or source code, figure out whether the vectorstore you are using supports hybrid search, and, if so, how to use it.\n",
|
|
"\n",
|
|
"**Step 2: Add that parameter as a configurable field for the chain**\n",
|
|
"\n",
|
|
"This will let you easily call the chain and configure any relevant flags at runtime. See [this documentation](/docs/how_to/configure) for more information on configuration.\n",
|
|
"\n",
|
|
"**Step 3: Call the chain with that configurable field**\n",
|
|
"\n",
|
|
"Now, at runtime you can call this chain with configurable field.\n",
|
|
"\n",
|
|
"## Code Example\n",
|
|
"\n",
|
|
"Let's see a concrete example of what this looks like in code. We will use the Cassandra/CQL interface of Astra DB for this example.\n",
|
|
"\n",
|
|
"Install the following Python package:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "c2efe35eea197769",
|
|
"metadata": {
|
|
"id": "c2efe35eea197769",
|
|
"outputId": "527275b4-076e-4b22-945c-e41a59188116"
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"!pip install \"cassio>=0.1.7\""
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "b4ef96d44341cd84",
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"id": "b4ef96d44341cd84"
|
|
},
|
|
"source": [
|
|
"Get the [connection secrets](https://docs.datastax.com/en/astra/astra-db-vector/get-started/quickstart.html).\n",
|
|
"\n",
|
|
"Initialize cassio:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "cb2cef097277c32e",
|
|
"metadata": {
|
|
"id": "cb2cef097277c32e",
|
|
"outputId": "4c3d05a0-319a-44a0-8ec3-0a9c78453132"
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"import cassio\n",
|
|
"\n",
|
|
"cassio.init(\n",
|
|
" database_id=\"Your database ID\",\n",
|
|
" token=\"Your application token\",\n",
|
|
" keyspace=\"Your key space\",\n",
|
|
")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "e1e51444877f45eb",
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"id": "e1e51444877f45eb"
|
|
},
|
|
"source": [
|
|
"Create the Cassandra VectorStore with a standard [index analyzer](https://docs.datastax.com/en/astra/astra-db-vector/cql/use-analyzers-with-cql.html). The index analyzer is needed to enable term matching."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "7345de3c",
|
|
"metadata": {
|
|
"id": "7345de3c",
|
|
"outputId": "d38bcee0-0134-4ac6-8d35-afcce282481b"
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"from cassio.table.cql import STANDARD_ANALYZER\n",
|
|
"from langchain_community.vectorstores import Cassandra\n",
|
|
"from langchain_openai import OpenAIEmbeddings\n",
|
|
"\n",
|
|
"embeddings = OpenAIEmbeddings()\n",
|
|
"vectorstore = Cassandra(\n",
|
|
" embedding=embeddings,\n",
|
|
" table_name=\"test_hybrid\",\n",
|
|
" body_index_options=[STANDARD_ANALYZER],\n",
|
|
" session=None,\n",
|
|
" keyspace=None,\n",
|
|
")\n",
|
|
"\n",
|
|
"vectorstore.add_texts(\n",
|
|
" [\n",
|
|
" \"In 2023, I visited Paris\",\n",
|
|
" \"In 2022, I visited New York\",\n",
|
|
" \"In 2021, I visited New Orleans\",\n",
|
|
" ]\n",
|
|
")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "73887f23bbab978c",
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"id": "73887f23bbab978c"
|
|
},
|
|
"source": [
|
|
"If we do a standard similarity search, we get all the documents:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "3c2a39fa",
|
|
"metadata": {
|
|
"id": "3c2a39fa",
|
|
"outputId": "5290085b-896c-4c81-9b40-c315331b7009"
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"[Document(page_content='In 2022, I visited New York'),\n",
|
|
"Document(page_content='In 2023, I visited Paris'),\n",
|
|
"Document(page_content='In 2021, I visited New Orleans')]"
|
|
]
|
|
},
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"vectorstore.as_retriever().invoke(\"What city did I visit last?\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "78d4c3c79e67d8c3",
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"id": "78d4c3c79e67d8c3"
|
|
},
|
|
"source": [
|
|
"The Astra DB vectorstore `body_search` argument can be used to filter the search on the term `new`."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "56393baa",
|
|
"metadata": {
|
|
"id": "56393baa",
|
|
"outputId": "d1c939f3-342f-4df4-94a3-d25429b5a25e"
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"[Document(page_content='In 2022, I visited New York'),\n",
|
|
"Document(page_content='In 2021, I visited New Orleans')]"
|
|
]
|
|
},
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"vectorstore.as_retriever(search_kwargs={\"body_search\": \"new\"}).invoke(\n",
|
|
" \"What city did I visit last?\"\n",
|
|
")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "88ae97ed",
|
|
"metadata": {
|
|
"id": "88ae97ed"
|
|
},
|
|
"source": [
|
|
"We can now create the chain that we will use to do question-answering over"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "62707b4f",
|
|
"metadata": {
|
|
"id": "62707b4f"
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"from langchain_core.output_parsers import StrOutputParser\n",
|
|
"from langchain_core.prompts import ChatPromptTemplate\n",
|
|
"from langchain_core.runnables import (\n",
|
|
" ConfigurableField,\n",
|
|
" RunnablePassthrough,\n",
|
|
")\n",
|
|
"from langchain_openai import ChatOpenAI"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "b6778ffa",
|
|
"metadata": {
|
|
"id": "b6778ffa"
|
|
},
|
|
"source": [
|
|
"This is basic question-answering chain set up."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "44a865f6",
|
|
"metadata": {
|
|
"id": "44a865f6"
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"template = \"\"\"Answer the question based only on the following context:\n",
|
|
"{context}\n",
|
|
"Question: {question}\n",
|
|
"\"\"\"\n",
|
|
"prompt = ChatPromptTemplate.from_template(template)\n",
|
|
"\n",
|
|
"model = ChatOpenAI()\n",
|
|
"\n",
|
|
"retriever = vectorstore.as_retriever()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "72125166",
|
|
"metadata": {
|
|
"id": "72125166"
|
|
},
|
|
"source": [
|
|
"Here we mark the retriever as having a configurable field. All vectorstore retrievers have `search_kwargs` as a field. This is just a dictionary, with vectorstore specific fields"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "babbadff",
|
|
"metadata": {
|
|
"id": "babbadff"
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"configurable_retriever = retriever.configurable_fields(\n",
|
|
" search_kwargs=ConfigurableField(\n",
|
|
" id=\"search_kwargs\",\n",
|
|
" name=\"Search Kwargs\",\n",
|
|
" description=\"The search kwargs to use\",\n",
|
|
" )\n",
|
|
")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "2d481b70",
|
|
"metadata": {
|
|
"id": "2d481b70"
|
|
},
|
|
"source": [
|
|
"We can now create the chain using our configurable retriever"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "210b0446",
|
|
"metadata": {
|
|
"id": "210b0446"
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"chain = (\n",
|
|
" {\"context\": configurable_retriever, \"question\": RunnablePassthrough()}\n",
|
|
" | prompt\n",
|
|
" | model\n",
|
|
" | StrOutputParser()\n",
|
|
")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "a38037b2",
|
|
"metadata": {
|
|
"id": "a38037b2",
|
|
"outputId": "1ea14996-5965-4a5e-9678-b9c35ce5c6de"
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"Paris"
|
|
]
|
|
},
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"chain.invoke(\"What city did I visit last?\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "7f6458c3",
|
|
"metadata": {
|
|
"id": "7f6458c3"
|
|
},
|
|
"source": [
|
|
"We can now invoke the chain with configurable options. `search_kwargs` is the id of the configurable field. The value is the search kwargs to use for Astra DB."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "9gYLqBTH8BFz",
|
|
"metadata": {
|
|
"id": "9gYLqBTH8BFz",
|
|
"outputId": "4358a2e6-f306-48f1-dd5c-781ac8a33e89"
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"New York"
|
|
]
|
|
},
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"chain.invoke(\n",
|
|
" \"What city did I visit last?\",\n",
|
|
" config={\"configurable\": {\"search_kwargs\": {\"body_search\": \"new\"}}},\n",
|
|
")"
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3 (ipykernel)",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.9.1"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 5
|
|
}
|