mirror of
https://github.com/hwchase17/langchain.git
synced 2025-09-07 05:52:15 +00:00
Harrison/as retriever docstring (#8840)
Co-authored-by: Bytestorm <31070777+Bytestorm5@users.noreply.github.com>
@@ -117,3 +117,38 @@ qa.run(query)
```

</CodeOutputBlock>

## Vectorstore Retriever Options

You can adjust how documents are retrieved from your vectorstore depending on the specific task.

There are two main ways to retrieve documents relevant to a query: Similarity Search and Maximal Marginal Relevance Search (MMR Search). Similarity Search is the default, but you can use MMR by setting the `search_type` parameter:

```python
docsearch.as_retriever(search_type="mmr")
```

You can also modify the search by passing specific search arguments through the retriever to the search function, using the `search_kwargs` keyword argument.

- `k` defines how many documents are returned; defaults to 4.
- `score_threshold` sets a minimum relevance score for documents returned by the retriever. It applies only when the search type is "similarity_score_threshold".
- `fetch_k` determines the number of documents passed to the MMR algorithm; defaults to 20.
- `lambda_mult` controls the diversity of results returned by the MMR algorithm, with 1 being minimum diversity and 0 being maximum. Defaults to 0.5.
- `filter` lets you restrict which documents are retrieved, based on the documents' metadata. This has no effect if the Vectorstore doesn't store any metadata.

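To make the interaction between `k`, `fetch_k`, and `lambda_mult` more concrete, here is a minimal, dependency-free sketch of an MMR-style selection loop. It is not LangChain's implementation: the `cosine` and `mmr` helpers and the toy vectors are hypothetical names and data made up for illustration, and a real vectorstore may score candidates differently.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec, doc_vecs, k=4, fetch_k=20, lambda_mult=0.5):
    """Hypothetical MMR sketch: returns the indices of the selected documents."""
    # Step 1: plain similarity search narrows the pool to the fetch_k best matches.
    by_relevance = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    candidates = by_relevance[:fetch_k]

    # Step 2: greedily pick up to k documents, penalizing similarity to earlier picks.
    selected = []
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data (assumed for illustration): index 1 is a near-duplicate of index 0.
query = [1.0, 0.0]
docs = [[1.0, 0.0], [0.98, 0.2], [0.6, 0.8], [0.0, 1.0]]

print(mmr(query, docs, k=2, lambda_mult=1.0))              # [0, 1]
print(mmr(query, docs, k=2, lambda_mult=0.25))             # [0, 3]
print(mmr(query, docs, k=2, fetch_k=2, lambda_mult=0.25))  # [0, 1]
```

With `lambda_mult=1.0` the loop reduces to plain similarity ranking. At 0.25 the near-duplicate at index 1 is skipped in favor of a very different vector, and shrinking `fetch_k` to 2 removes that alternative from the pool entirely, so the near-duplicate is returned after all.
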
Some examples of how these parameters can be used:

```python
# Retrieve more documents with higher diversity; useful if your dataset has many similar documents
docsearch.as_retriever(search_type="mmr", search_kwargs={'k': 6, 'lambda_mult': 0.25})

# Fetch more documents for the MMR algorithm to consider, but only return the top 5
docsearch.as_retriever(search_type="mmr", search_kwargs={'k': 5, 'fetch_k': 50})

# Only retrieve documents that have a relevance score above a certain threshold
docsearch.as_retriever(search_type="similarity_score_threshold", search_kwargs={'score_threshold': 0.8})

# Only get the single most similar document from the dataset
docsearch.as_retriever(search_kwargs={'k': 1})

# Use a filter to only retrieve documents from a specific paper
docsearch.as_retriever(search_kwargs={'filter': {'paper_title':'GPT-4 Technical Report'}})
```

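For intuition about what `score_threshold`, `filter`, and `k` do to a result set, the following is a rough sketch over pre-scored documents. The `retrieve` function, the `(score, metadata)` data layout, the inclusive threshold, and the exact-match filter semantics are all assumptions for illustration. Actual filter syntax and score scales depend on the vectorstore backend.

```python
def retrieve(scored_docs, k=4, score_threshold=None, metadata_filter=None):
    """Hypothetical post-processing of (relevance_score, metadata) pairs."""
    results = []
    # Rank by score only, so metadata dicts are never compared with each other.
    for score, metadata in sorted(scored_docs, key=lambda pair: pair[0], reverse=True):
        # Drop documents below the minimum relevance (assumed inclusive bound).
        if score_threshold is not None and score < score_threshold:
            continue
        # Assumed semantics: every filter key must match the metadata exactly.
        if metadata_filter and any(
            metadata.get(key) != value for key, value in metadata_filter.items()
        ):
            continue
        results.append((score, metadata))
    # k caps how many of the surviving documents are returned.
    return results[:k]


# Made-up scores and metadata; "Some Other Paper" is a placeholder title.
scored = [
    (0.61, {"paper_title": "GPT-4 Technical Report", "page": 7}),
    (0.92, {"paper_title": "GPT-4 Technical Report", "page": 1}),
    (0.85, {"paper_title": "Some Other Paper", "page": 3}),
]

above_threshold = retrieve(scored, score_threshold=0.8)
from_one_paper = retrieve(scored, metadata_filter={"paper_title": "GPT-4 Technical Report"})
single_best = retrieve(scored, k=1)

print([score for score, _ in above_threshold])  # [0.92, 0.85]
print([score for score, _ in from_one_paper])   # [0.92, 0.61]
print([score for score, _ in single_best])      # [0.92]
```
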
@@ -3,7 +3,7 @@ Additionally, we can return the source documents used to answer the question by
```python
-qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type="stuff", retriever=docsearch.as_retriever(), return_source_documents=True)
+qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type="stuff", retriever=docsearch.as_retriever(search_type="mmr", search_kwargs={'fetch_k': 30}), return_source_documents=True)
```