Mirror of https://github.com/hwchase17/langchain.git (synced 2025-07-31 00:29:57 +00:00)
docs: update Elasticsearch strategy names (#21530)
Update documentation with the [new names for retrieval strategies](https://github.com/langchain-ai/langchain-elastic/pull/22)

Co-authored-by: Erick Friis <erick@langchain.dev>
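For readers migrating code by hand, the renames in this commit can be summarized as a lookup table. This is only a sketch: the class and keyword names are taken from the diff, while the dictionaries and the helper `migrate_strategy_name` are illustrative and not part of `langchain-elasticsearch`.

```python
# Old strategy names (previously accessed as ElasticsearchStore.<Name>) mapped
# to the top-level class names used after this commit. The names come from the
# diff; the dictionaries and helper below are illustrative only.
STRATEGY_RENAMES = {
    "ApproxRetrievalStrategy": "DenseVectorStrategy",
    "SparseVectorRetrievalStrategy": "SparseVectorStrategy",
    "ExactRetrievalStrategy": "DenseVectorScriptScoreStrategy",
}

# Keyword argument that the diff shows being renamed (old -> new).
KWARG_RENAMES = {"query_model_id": "model_id"}


def migrate_strategy_name(old_name: str) -> str:
    """Return the new class name, or the input unchanged if it was not renamed."""
    return STRATEGY_RENAMES.get(old_name, old_name)


if __name__ == "__main__":
    for old in ("ApproxRetrievalStrategy", "BM25Strategy"):
        print(f"{old} -> {migrate_strategy_name(old)}")
```

`BM25Strategy` passes through unchanged because the diff introduces it under that name rather than renaming an older class.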
parent cdc8e2d0c2
commit e6b7a1769b
@@ -161,7 +161,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 1,
+"execution_count": 3,
 "id": "67ab8afa-f7c6-4fbf-b596-cb512da949da",
 "metadata": {
 "id": "67ab8afa-f7c6-4fbf-b596-cb512da949da",
@@ -194,7 +194,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 2,
+"execution_count": 4,
 "id": "aac9563e",
 "metadata": {
 "id": "aac9563e",
@@ -208,7 +208,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 3,
+"execution_count": 5,
 "id": "a3c3999a",
 "metadata": {
 "id": "a3c3999a",
@@ -229,7 +229,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 4,
+"execution_count": 6,
 "id": "12eb86d8",
 "metadata": {
 "id": "12eb86d8",
@@ -271,7 +271,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 5,
+"execution_count": 7,
 "id": "5d076412",
 "metadata": {},
 "outputs": [
@@ -313,7 +313,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 6,
+"execution_count": 8,
 "id": "b2a4bd1b",
 "metadata": {},
 "outputs": [
@@ -345,7 +345,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 8,
+"execution_count": 9,
 "id": "f3d294ff",
 "metadata": {},
 "outputs": [
@@ -375,7 +375,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 59,
+"execution_count": 10,
 "id": "55b63a61",
 "metadata": {},
 "outputs": [
@@ -405,7 +405,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 60,
+"execution_count": 11,
 "id": "9b831b3d",
 "metadata": {},
 "outputs": [
@@ -435,7 +435,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": null,
+"execution_count": 12,
 "id": "fb1482e7",
 "metadata": {},
 "outputs": [],
@@ -504,27 +504,29 @@
 "metadata": {},
 "source": [
 "# Retrieval Strategies\n",
-"Elasticsearch has big advantages over other vector only databases from its ability to support a wide range of retrieval strategies. In this notebook we will configure `ElasticsearchStore` to support some of the most common retrieval strategies. \n",
+"Elasticsearch has big advantages over other vector only databases from its ability to support a wide range of retrieval strategies. In this notebook we will configure `ElasticsearchStore` to support some of the most common retrieval strategies. \n",
 "\n",
-"By default, `ElasticsearchStore` uses the `ApproxRetrievalStrategy`.\n",
+"By default, `ElasticsearchStore` uses the `DenseVectorStrategy` (was called `ApproxRetrievalStrategy` prior to version 0.2.0).\n",
 "\n",
-"## ApproxRetrievalStrategy\n",
-"This will return the top `k` most similar vectors to the query vector. The `k` parameter is set when the `ElasticsearchStore` is initialized. The default value is `10`."
+"## DenseVectorStrategy\n",
+"This will return the top `k` most similar vectors to the query vector. The `k` parameter is set when the `ElasticsearchStore` is initialized. The default value is `10`."
 ]
 },
 {
 "cell_type": "code",
-"execution_count": null,
+"execution_count": 13,
 "id": "999b5ef5",
 "metadata": {},
 "outputs": [],
 "source": [
+"from langchain_elasticsearch import DenseVectorStrategy\n",
+"\n",
 "db = ElasticsearchStore.from_documents(\n",
 "    docs,\n",
 "    embeddings,\n",
 "    es_url=\"http://localhost:9200\",\n",
 "    index_name=\"test\",\n",
-"    strategy=ElasticsearchStore.ApproxRetrievalStrategy(),\n",
+"    strategy=DenseVectorStrategy(),\n",
 ")\n",
 "\n",
 "docs = db.similarity_search(\n",
@@ -537,12 +539,12 @@
 "id": "9b651be5",
 "metadata": {},
 "source": [
-"### Example: Approx with hybrid\n",
+"### Example: Hybrid retrieval with dense vector and keyword search\n",
 "This example will show how to configure `ElasticsearchStore` to perform a hybrid retrieval, using a combination of approximate semantic search and keyword based search. \n",
 "\n",
 "We use RRF to balance the two scores from different retrieval methods.\n",
 "\n",
-"To enable hybrid retrieval, we need to set `hybrid=True` in `ElasticsearchStore` `ApproxRetrievalStrategy` constructor.\n",
+"To enable hybrid retrieval, we need to set `hybrid=True` in the `DenseVectorStrategy` constructor.\n",
 "\n",
 "```python\n",
 "\n",
@@ -551,9 +553,7 @@
 "    embeddings, \n",
 "    es_url=\"http://localhost:9200\", \n",
 "    index_name=\"test\",\n",
-"    strategy=ElasticsearchStore.ApproxRetrievalStrategy(\n",
-"        hybrid=True,\n",
-"    )\n",
+"    strategy=DenseVectorStrategy(hybrid=True)\n",
 ")\n",
 "```\n",
 "\n",
@@ -582,22 +582,22 @@
 "}\n",
 "```\n",
 "\n",
-"### Example: Approx with Embedding Model in Elasticsearch\n",
-"This example will show how to configure `ElasticsearchStore` to use the embedding model deployed in Elasticsearch for approximate retrieval. \n",
+"### Example: Dense vector search with Embedding Model in Elasticsearch\n",
+"This example will show how to configure `ElasticsearchStore` to use the embedding model deployed in Elasticsearch for dense vector retrieval.\n",
 "\n",
-"To use this, specify the model_id in `ElasticsearchStore` `ApproxRetrievalStrategy` constructor via the `query_model_id` argument.\n",
+"To use this, specify the model ID in the `DenseVectorStrategy` constructor via the `model_id` argument.\n",
 "\n",
 "**NOTE** This requires the model to be deployed and running in Elasticsearch ml node. See [notebook example](https://github.com/elastic/elasticsearch-labs/blob/main/notebooks/integrations/hugging-face/loading-model-from-hugging-face.ipynb) on how to deploy the model with eland.\n"
 ]
 },
 {
 "cell_type": "code",
-"execution_count": null,
+"execution_count": 14,
 "id": "0a0c85e7",
 "metadata": {},
 "outputs": [],
 "source": [
-"APPROX_SELF_DEPLOYED_INDEX_NAME = \"test-approx-self-deployed\"\n",
+"DENSE_SELF_DEPLOYED_INDEX_NAME = \"test-dense-self-deployed\"\n",
 "\n",
 "# Note: This does not have an embedding function specified\n",
 "# Instead, we will use the embedding model deployed in Elasticsearch\n",
@@ -605,12 +605,10 @@
 "    es_cloud_id=\"<your cloud id>\",\n",
 "    es_user=\"elastic\",\n",
 "    es_password=\"<your password>\",\n",
-"    index_name=APPROX_SELF_DEPLOYED_INDEX_NAME,\n",
+"    index_name=DENSE_SELF_DEPLOYED_INDEX_NAME,\n",
 "    query_field=\"text_field\",\n",
 "    vector_query_field=\"vector_query_field.predicted_value\",\n",
-"    strategy=ElasticsearchStore.ApproxRetrievalStrategy(\n",
-"        query_model_id=\"sentence-transformers__all-minilm-l6-v2\"\n",
-"    ),\n",
+"    strategy=DenseVectorStrategy(model_id=\"sentence-transformers__all-minilm-l6-v2\"),\n",
 ")\n",
 "\n",
 "# Setup a Ingest Pipeline to perform the embedding\n",
@@ -631,7 +629,7 @@
 "# creating a new index with the pipeline,\n",
 "# not relying on langchain to create the index\n",
 "db.client.indices.create(\n",
-"    index=APPROX_SELF_DEPLOYED_INDEX_NAME,\n",
+"    index=DENSE_SELF_DEPLOYED_INDEX_NAME,\n",
 "    mappings={\n",
 "        \"properties\": {\n",
 "            \"text_field\": {\"type\": \"text\"},\n",
@@ -655,12 +653,10 @@
 "    es_cloud_id=\"<cloud id>\",\n",
 "    es_user=\"elastic\",\n",
 "    es_password=\"<cloud password>\",\n",
-"    index_name=APPROX_SELF_DEPLOYED_INDEX_NAME,\n",
+"    index_name=DENSE_SELF_DEPLOYED_INDEX_NAME,\n",
 "    query_field=\"text_field\",\n",
 "    vector_query_field=\"vector_query_field.predicted_value\",\n",
-"    strategy=ElasticsearchStore.ApproxRetrievalStrategy(\n",
-"        query_model_id=\"sentence-transformers__all-minilm-l6-v2\"\n",
-"    ),\n",
+"    strategy=DenseVectorStrategy(model_id=\"sentence-transformers__all-minilm-l6-v2\"),\n",
 ")\n",
 "\n",
 "# Perform search\n",
@@ -672,12 +668,12 @@
 "id": "53959de6",
 "metadata": {},
 "source": [
-"## SparseVectorRetrievalStrategy (ELSER)\n",
+"## SparseVectorStrategy (ELSER)\n",
 "This strategy uses Elasticsearch's sparse vector retrieval to retrieve the top-k results. We only support our own \"ELSER\" embedding model for now.\n",
 "\n",
 "**NOTE** This requires the ELSER model to be deployed and running in Elasticsearch ml node. \n",
 "\n",
-"To use this, specify `SparseVectorRetrievalStrategy` in `ElasticsearchStore` constructor."
+"To use this, specify `SparseVectorStrategy` (was called `SparseVectorRetrievalStrategy` prior to version 0.2.0) in the `ElasticsearchStore` constructor. You will need to provide a model ID."
 ]
 },
 {
@@ -695,15 +691,17 @@
 }
 ],
 "source": [
+"from langchain_elasticsearch import SparseVectorStrategy\n",
+"\n",
 "# Note that this example doesn't have an embedding function. This is because we infer the tokens at index time and at query time within Elasticsearch.\n",
 "# This requires the ELSER model to be loaded and running in Elasticsearch.\n",
 "db = ElasticsearchStore.from_documents(\n",
 "    docs,\n",
-"    es_cloud_id=\"<redacted>\",\n",
+"    es_cloud_id=\"<cloud id>\",\n",
 "    es_user=\"elastic\",\n",
-"    es_password=\"<redacted>\",\n",
+"    es_password=\"<cloud password>\",\n",
 "    index_name=\"test-elser\",\n",
-"    strategy=ElasticsearchStore.SparseVectorRetrievalStrategy(),\n",
+"    strategy=SparseVectorStrategy(model_id=\".elser_model_2\"),\n",
 ")\n",
 "\n",
 "db.client.indices.refresh(index=\"test-elser\")\n",
@@ -719,19 +717,42 @@
 "id": "edf3a093",
 "metadata": {},
 "source": [
-"## ExactRetrievalStrategy\n",
-"This strategy uses Elasticsearch's exact retrieval (also known as brute force) to retrieve the top-k results.\n",
+"## DenseVectorScriptScoreStrategy\n",
+"This strategy uses Elasticsearch's script score query to perform exact vector retrieval (also known as brute force) to retrieve the top-k results. (This strategy was called `ExactRetrievalStrategy` prior to version 0.2.0.)\n",
 "\n",
-"To use this, specify `ExactRetrievalStrategy` in `ElasticsearchStore` constructor.\n",
+"To use this, specify `DenseVectorScriptScoreStrategy` in `ElasticsearchStore` constructor.\n",
 "\n",
 "```python\n",
+"from langchain_elasticsearch import DenseVectorScriptScoreStrategy\n",
 "\n",
 "db = ElasticsearchStore.from_documents(\n",
 "    docs, \n",
 "    embeddings, \n",
 "    es_url=\"http://localhost:9200\", \n",
 "    index_name=\"test\",\n",
-"    strategy=ElasticsearchStore.ExactRetrievalStrategy()\n",
+"    strategy=DenseVectorScriptScoreStrategy(),\n",
+")\n",
+"```"
+]
+},
+{
+"cell_type": "markdown",
+"id": "11b51c47",
+"metadata": {},
+"source": [
+"## BM25Strategy\n",
+"Finally, you can use full-text keyword search.\n",
+"\n",
+"To use this, specify `BM25Strategy` in `ElasticsearchStore` constructor.\n",
+"\n",
+"```python\n",
+"from langchain_elasticsearch import BM25Strategy\n",
+"\n",
+"db = ElasticsearchStore.from_documents(\n",
+"    docs, \n",
+"    es_url=\"http://localhost:9200\", \n",
+"    index_name=\"test\",\n",
+"    strategy=BM25Strategy(),\n",
 ")\n",
 "```"
 ]
@@ -924,9 +945,9 @@
 "\n",
 "## What's new?\n",
 "\n",
-"The new implementation is now one class called `ElasticsearchStore` which can be used for approx, exact, and ELSER search retrieval, via strategies.\n",
+"The new implementation is now one class called `ElasticsearchStore` which can be used for approximate dense vector, exact dense vector, sparse vector (ELSER), BM25 retrieval and hybrid retrieval, via strategies.\n",
 "\n",
-"## Im using ElasticKNNSearch\n",
+"## I am using ElasticKNNSearch\n",
 "\n",
 "Old implementation:\n",
 "\n",
@@ -946,21 +967,21 @@
 "\n",
 "```python\n",
 "\n",
-"from langchain_elasticsearch import ElasticsearchStore\n",
+"from langchain_elasticsearch import ElasticsearchStore, DenseVectorStrategy\n",
 "\n",
 "db = ElasticsearchStore(\n",
 "    es_url=\"http://localhost:9200\",\n",
 "    index_name=\"test_index\",\n",
 "    embedding=embedding,\n",
 "    # if you use the model_id\n",
-"    # strategy=ElasticsearchStore.ApproxRetrievalStrategy( query_model_id=\"test_model\" )\n",
+"    # strategy=DenseVectorStrategy(model_id=\"test_model\")\n",
 "    # if you use hybrid search\n",
-"    # strategy=ElasticsearchStore.ApproxRetrievalStrategy( hybrid=True )\n",
+"    # strategy=DenseVectorStrategy(hybrid=True)\n",
 ")\n",
 "\n",
 "```\n",
 "\n",
-"## Im using ElasticVectorSearch\n",
+"## I am using ElasticVectorSearch\n",
 "\n",
 "Old implementation:\n",
 "\n",
@@ -980,13 +1001,13 @@
 "\n",
 "```python\n",
 "\n",
-"from langchain_elasticsearch import ElasticsearchStore\n",
+"from langchain_elasticsearch import ElasticsearchStore, DenseVectorScriptScoreStrategy\n",
 "\n",
 "db = ElasticsearchStore(\n",
 "    es_url=\"http://localhost:9200\",\n",
 "    index_name=\"test_index\",\n",
 "    embedding=embedding,\n",
-"    strategy=ElasticsearchStore.ExactRetrievalStrategy()\n",
+"    strategy=DenseVectorScriptScoreStrategy()\n",
 ")\n",
 "\n",
 "```"
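
The hybrid example in this diff says RRF (reciprocal rank fusion) balances the dense-vector and keyword results. As a rough, self-contained illustration of the idea (not the Elasticsearch implementation), each document receives the sum of `1 / (k + rank)` over every ranking it appears in. The function name `rrf_fuse`, the sample document IDs, and the constant `k = 60` are assumptions chosen for this sketch.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.

    rankings: iterable of lists, each ordered best-first.
    k: smoothing constant (assumed value for this sketch).
    Returns document IDs ordered by fused score, best first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a stable result.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    dense_hits = ["doc_a", "doc_b", "doc_c"]    # hypothetical vector ranking
    keyword_hits = ["doc_b", "doc_d", "doc_a"]  # hypothetical keyword ranking
    print(rrf_fuse([dense_hits, keyword_hits]))
```

Because only ranks are used, the two retrieval methods do not need comparable raw scores, which is why rank fusion suits combining vector similarity with keyword relevance.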