From 6c11c8dac686673a83e92884c8fdaa8864b6662b Mon Sep 17 00:00:00 2001
From: Shotaro Sano
Date: Wed, 10 Apr 2024 02:37:15 +0900
Subject: [PATCH] docs: Add documentation of `ElasticsearchStore.BM25RetrievalStrategy` (#20098)

This pull request follows up on https://github.com/langchain-ai/langchain/pull/19314 and https://github.com/langchain-ai/langchain-elastic/pull/6, adding documentation for `ElasticsearchStore.BM25RetrievalStrategy`. As with the other retrieval strategies, this introduces `BM25RetrievalStrategy` in the documentation.

### Background
- `BM25RetrievalStrategy` was introduced to `langchain-elastic` via the pull request https://github.com/langchain-ai/langchain-elastic/pull/6.
  - That PR was initially created in the main `langchain` repository but was moved to `langchain-elastic` during the review process due to the migration of the partner package.
  - The original PR can be found at https://github.com/langchain-ai/langchain/pull/19314.
- As [commented](https://github.com/langchain-ai/langchain/pull/19314#issuecomment-2023202401) by @joemcelroy, documenting the new retrieval strategy is part of the requirements for its introduction.

Although `BM25RetrievalStrategy` has been merged into `langchain-elastic`, its documentation is still maintained in the main `langchain` repository. Therefore, this pull request adds the documentation portion of `BM25RetrievalStrategy`. The content of the documentation remains the same as that included in the original PR, https://github.com/langchain-ai/langchain/pull/19314.

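### Illustration (not part of the patch)
For reviewers less familiar with BM25, the ranking that this strategy delegates to Elasticsearch can be roughly sketched in plain Python. This is only an illustration under stated assumptions: `bm25_rank` is a hypothetical helper, not an API of `langchain-elastic`; it assumes lowercase whitespace tokenization, the commonly used defaults `k1=1.2` and `b=0.75`, and a Lucene-style IDF. Elasticsearch's analyzers and per-shard statistics may yield different scores.

```python
import math
from collections import Counter


def bm25_rank(query, docs, k1=1.2, b=0.75):
    """Rank docs against query with a simplified BM25 (hypothetical helper).

    Documents that share no term with the query are dropped, mirroring a
    pure keyword match with no vector search involved.
    """
    if not docs:
        return []

    # Assumption: naive lowercase whitespace tokenization, not a real analyzer.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scored = []
    for doc, tokens in zip(docs, tokenized):
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            # Lucene-style IDF, always positive.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scored.append((score, doc))

    # Highest score first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]
```

Called as `bm25_rank("foo", ["foo", "foo bar", "foo bar baz", "bar", "bar baz", "baz"])` on the corpus from the notebook example, this sketch keeps only the three texts containing `foo`, shortest first, because length normalization favors shorter documents.
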
---------

Co-authored-by: Max Jakob
---
 .../vectorstores/elasticsearch.ipynb          | 46 ++++++++++++++++++-
 1 file changed, 45 insertions(+), 1 deletion(-)

diff --git a/docs/docs/integrations/vectorstores/elasticsearch.ipynb b/docs/docs/integrations/vectorstores/elasticsearch.ipynb
index 0cf0cb146b9..2cf6b282426 100644
--- a/docs/docs/integrations/vectorstores/elasticsearch.ipynb
+++ b/docs/docs/integrations/vectorstores/elasticsearch.ipynb
@@ -736,6 +736,50 @@
     "```"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "05cdb43d-5e46-46f6-a2dc-91df4aa56ec7",
+   "metadata": {},
+   "source": [
+    "## BM25RetrievalStrategy\n",
+    "This strategy allows the user to perform searches using pure BM25 without vector search.\n",
+    "\n",
+    "To use this, specify `BM25RetrievalStrategy` in `ElasticsearchStore` constructor.\n",
+    "\n",
+    "Note that in the example below, the embedding option is not specified, indicating that the search is conducted without using embeddings."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "4464a657-08c5-4a1a-b0e8-dba65f5b7ec0",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[Document(page_content='foo'), Document(page_content='foo bar'), Document(page_content='foo bar baz')]\n"
+     ]
+    }
+   ],
+   "source": [
+    "from langchain_elasticsearch import ElasticsearchStore\n",
+    "\n",
+    "db = ElasticsearchStore(\n",
+    "    es_url=\"http://localhost:9200\",\n",
+    "    index_name=\"test_index\",\n",
+    "    strategy=ElasticsearchStore.BM25RetrievalStrategy(),\n",
+    ")\n",
+    "\n",
+    "db.add_texts(\n",
+    "    [\"foo\", \"foo bar\", \"foo bar baz\", \"bar\", \"bar baz\", \"baz\"],\n",
+    ")\n",
+    "\n",
+    "results = db.similarity_search(query=\"foo\", k=10)\n",
+    "print(results)"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "0960fa0a",
@@ -993,7 +1037,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.9.7"
+   "version": "3.11.8"
   }
  },
  "nbformat": 4,